2013-11-11 // Nagios Monitoring - IBM Power Systems and HMC
We use several IBM Power Systems servers in our datacenters to run mission-critical application and database services. Some time ago I wrote a (rather crude) Nagios plugin to monitor the IBM Power Systems hardware via the Hardware Management Console (HMC) appliance, as well as the HMC itself. In order to run the Nagios plugin, you need to have “Remote Command Execution” via SSH activated on the HMC appliances. Also, a network connection from the Nagios system to the HMC appliances on port TCP/22 must be allowed.
The whole setup for monitoring IBM Power Systems servers and HMC appliances looks like this:
Enable remote command execution on the HMC. Log in to the HMC WebGUI and navigate to:
-> HMC Management -> Administration -> Remote Command Execution -> Check selection: "Enable remote command execution using the ssh facility" -> OK
Verify the port TCP/22 on the HMC appliance can be reached from the Nagios system.
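One way to check this from the Nagios host, assuming the nc (netcat) utility is installed there, is a simple TCP probe; hmc below is a placeholder for your HMC hostname or IP:

```shell
# Probe TCP/22 on the HMC appliance from the Nagios system.
# "hmc" is a placeholder hostname; replace it with your HMC address.
if nc -z -w 5 hmc 22; then
    echo "HMC SSH port reachable"
else
    echo "HMC SSH port NOT reachable"
fi
```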
Create an SSH private/public key pair on the Nagios system to authenticate the monitoring user against the HMC appliance:
$ mkdir -p /etc/nagios3/.ssh/
$ chown -R nagios:nagios /etc/nagios3/.ssh/
$ chmod 700 /etc/nagios3/.ssh/
$ cd /etc/nagios3/.ssh/
$ ssh-keygen -t rsa -b 2048 -f id_rsa_hmc
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in id_rsa_hmc.
Your public key has been saved in id_rsa_hmc.pub.
The key fingerprint is:
xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx user@host
$ chown -R nagios:nagios /etc/nagios3/.ssh/id_rsa_hmc*
Prepare an SSH authorized keys file and transfer it to the HMC.
Optional: Create a separate user on the HMC for monitoring purposes. The new user needs permission to run the HMC commands lsled, lssvcevents, lssyscfg and monhmc. If a separate user is used, replace hscroot in the following text with the name of the new user.

Optional: If there is already an SSH authorized keys file set up on the HMC, be sure to retrieve it first so it doesn't get overwritten:
$ cd /tmp
$ scp -r hscroot@hmc:~/.ssh /tmp/
Password:
authorized_keys2                              100%  600   0.6KB/s   00:00
Insert the SSH public key created above into an SSH authorized keys file and push it back to the HMC:
$ echo -n 'from="<IP of your Nagios system>",no-agent-forwarding,no-port-forwarding,no-X11-forwarding ' \
  >> /tmp/.ssh/authorized_keys2
$ cat /etc/nagios3/.ssh/id_rsa_hmc.pub >> /tmp/.ssh/authorized_keys2
$ scp -r /tmp/.ssh hscroot@hmc:~/
Password:
authorized_keys2                              100% 1079   1.1KB/s   00:00
$ rm -r /tmp/.ssh
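The resulting entry in authorized_keys2 is a single line that looks roughly like the following (key material shortened; the comment field depends on where the key was generated). The from= option restricts the key to logins coming from the Nagios host, and the no-* options disable agent, port and X11 forwarding for this key:

```
from="<IP of your Nagios system>",no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-rsa AAAAB3N...[cut]... nagios@nagioshost
```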
Add the SSH host public key from the HMC appliance:
HMC$ cat /etc/ssh/ssh_host_rsa_key.pub
ssh-rsa AAAAB3N...[cut]...jxFAz+4O1X root@hmc
to the SSH ssh_known_hosts file of the Nagios system:

$ vi /etc/ssh/ssh_known_hosts
hmc,FQDN,<IP of your HMC> ssh-rsa AAAAB3N...[cut]...jxFAz+4O1X
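Alternatively, ssh-keyscan can fetch the host key over the network instead of copying it by hand. Since the scan itself is unauthenticated, compare the result against the key printed on the HMC before trusting it; hmc is again a placeholder hostname:

```shell
# Fetch the HMC's RSA host key and append it to the system-wide known hosts file.
# Verify the key against the one shown on the HMC before relying on it.
ssh-keyscan -t rsa hmc >> /etc/ssh/ssh_known_hosts
```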
Verify that login to the HMC appliance is possible from the Nagios system with the SSH key pair created above:
$ ssh -i /etc/nagios3/.ssh/id_rsa_hmc hscroot@hmc 'lshmc -V'
"version= Version: 7
 Release: 7.7.0
 Service Pack: 2
HMC Build level 20130730.1
MH01373: Fix for HMC V7R7.7.0 SP2 (08-06-2013)
","base_version=V7R7.7.0"
Download the Nagios plugin check_hmc.sh and place it in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:

$ mv -i check_hmc.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_hmc.sh
Define the following Nagios commands. In this example this is done in the file /etc/nagios-plugins/config/check_hmc.cfg:

# check HMC disk
define command {
    command_name  check_hmc_disk
    command_line  $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C disk -w $ARG1$ -c $ARG2$ -a $ARG3$
}

# check HMC processor
define command {
    command_name  check_hmc_proc
    command_line  $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C proc -w $ARG1$ -c $ARG2$
}

# check HMC memory
define command {
    command_name  check_hmc_mem
    command_line  $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C mem -w $ARG1$ -c $ARG2$
}

# check HMC swap
define command {
    command_name  check_hmc_swap
    command_line  $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C swap -w $ARG1$ -c $ARG2$
}

# check HMC rmc processes
define command {
    command_name  check_hmc_rmc
    command_line  $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C rmc -w $ARG1$ -c $ARG2$ -a $ARG3$
}

# check HMC System Attention LED for physical system
define command {
    command_name  check_hmc_ledphys
    command_line  $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C ledphys
}

# check HMC System Attention LED for virtual partition
define command {
    command_name  check_hmc_ledvlpar
    command_line  $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C ledvlpar
}

# check HMC System Attention LED for virtual system
define command {
    command_name  check_hmc_ledvsys
    command_line  $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C ledvsys
}

# check HMC serviceable events
define command {
    command_name  check_hmc_events
    command_line  $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C events
}
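Before wiring these commands into services, it can help to run the plugin once by hand from the Nagios host to confirm the SSH setup works end to end. The thresholds below are only examples:

```shell
# Example manual run as the nagios user; -w/-c are the warning/critical thresholds.
sudo -u nagios /usr/lib/nagios/plugins/check_hmc.sh -H hmc1 -C mem -w 20 -c 10
# The exit code follows the Nagios plugin convention:
# 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
echo $?
```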
Define a group of services in your Nagios configuration to be checked for each HMC appliance:
# check sshd
define service {
    use                  generic-service
    hostgroup_name       hmc
    service_description  Check_SSH
    check_command        check_ssh
}

# check_hmc_mem
define service {
    use                  generic-service-pnp
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_memory
    check_command        check_hmc_mem!20!10
}

# check_hmc_proc
define service {
    use                  generic-service-pnp
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_CPU_statistics
    check_command        check_hmc_proc!10!5
}

# check_hmc_disk
define service {
    use                  generic-service-pnp
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_space_on_/
    check_command        check_hmc_disk!10%!5%!'/$'
}

define service {
    use                  generic-service-pnp
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_space_on_/dump
    check_command        check_hmc_disk!10%!5%!/dump
}

define service {
    use                  generic-service-pnp
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_space_on_/extra
    check_command        check_hmc_disk!10%!5%!/extra
}

define service {
    use                  generic-service-pnp
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_space_on_/var
    check_command        check_hmc_disk!10%!5%!/var
}

# check_hmc_swap
define service {
    use                  generic-service-pnp
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_space_on_swap
    check_command        check_hmc_swap!200!100
}

# check_hmc_rmc: IBM.CSMAgentRMd
define service {
    use                  generic-service
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_process_IBM.CSMAgentRMd
    check_command        check_hmc_rmc!1!1!IBM.CSMAgentRMd
}

# check_hmc_rmc: IBM.LparCmdRMd
define service {
    use                  generic-service
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_process_IBM.LparCmdRMd
    check_command        check_hmc_rmc!1!1!IBM.LparCmdRMd
}

# check_hmc_rmc: IBM.ServiceRMd
define service {
    use                  generic-service
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_process_IBM.ServiceRMd
    check_command        check_hmc_rmc!1!1!IBM.ServiceRMd
}

# check_hmc_ledphys
define service {
    use                  generic-service
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_SA_LED_physical
    check_command        check_hmc_ledphys
}

# check_hmc_ledvlpar
define service {
    use                  generic-service
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_SA_LED_vpartition
    check_command        check_hmc_ledvlpar
}

# check_hmc_ledvsys
define service {
    use                  generic-service
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_SA_LED_vsystem
    check_command        check_hmc_ledvsys
}

# check_hmc_events
define service {
    use                  generic-service
    servicegroups        sshchecks
    hostgroup_name       hmc
    service_description  Check_serviceable_events
    check_command        check_hmc_events
}
Replace generic-service with your Nagios service template. Replace generic-service-pnp with your Nagios service template that has performance data processing enabled.

Define a service dependency to run the above checks only if the Check_SSH check ran successfully:

# HMC SSH dependencies
define servicedependency {
    hostgroup_name                 hmc
    service_description            Check_SSH
    dependent_service_description  Check_process_.*
    execution_failure_criteria     c,p,u,w
    notification_failure_criteria  c,p,u,w
}

define servicedependency {
    hostgroup_name                 hmc
    service_description            Check_SSH
    dependent_service_description  Check_space_on_.*
    execution_failure_criteria     c,p,u,w
    notification_failure_criteria  c,p,u,w
}

define servicedependency {
    hostgroup_name                 hmc
    service_description            Check_SSH
    dependent_service_description  Check_memory
    execution_failure_criteria     c,p,u,w
    notification_failure_criteria  c,p,u,w
}

define servicedependency {
    hostgroup_name                 hmc
    service_description            Check_SSH
    dependent_service_description  Check_CPU_statistics
    execution_failure_criteria     c,p,u,w
    notification_failure_criteria  c,p,u,w
}

define servicedependency {
    hostgroup_name                 hmc
    service_description            Check_SSH
    dependent_service_description  Check_SA_LED_.*
    execution_failure_criteria     c,p,u,w
    notification_failure_criteria  c,p,u,w
}

define servicedependency {
    hostgroup_name                 hmc
    service_description            Check_SSH
    dependent_service_description  Check_serviceable_events
    execution_failure_criteria     c,p,u,w
    notification_failure_criteria  c,p,u,w
}
Define a host in your Nagios configuration for each HMC appliance. In this example it is named hmc1:

define host {
    use        hmc
    host_name  hmc1
    alias      HMC 1
    address    10.0.0.1
    parents    parent_lan
}
Replace hmc with your Nagios host template for HMC appliances. Adjust the address and parents parameters according to your environment.

Define a hostgroup in your Nagios configuration for all HMC appliances. In this example it is named hmc. The above checks are run against each member of the hostgroup:

define hostgroup {
    hostgroup_name  hmc
    alias           Hardware Management Console
    members         hmc1
}
Run a configuration check and, if successful, reload the Nagios process:

$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload
The new hosts and services should soon show up in the Nagios web interface.
Optional: If you're running PNP4Nagios v0.6 or later to graph Nagios performance data, you can use the check_hmc_disk.php, check_hmc_mem.php, check_hmc_proc.php and check_hmc_swap.php PNP4Nagios templates to beautify the graphs. Download the templates and place them in the PNP4Nagios template directory, in this example /usr/share/pnp4nagios/html/templates/:

$ mv -i check_hmc_disk.php check_hmc_mem.php check_hmc_proc.php check_hmc_swap.php \
    /usr/share/pnp4nagios/html/templates/
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_hmc_disk.php
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_hmc_mem.php
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_hmc_proc.php
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_hmc_swap.php
The following image shows an example of what the PNP4Nagios graphs look like for an IBM HMC appliance:
All done. You should now have a complete Nagios-based monitoring solution for your IBM HMC appliances and IBM Power Systems servers.