bityard Blog

// Nagios Monitoring - IBM Power Systems and HMC

We use several IBM Power Systems servers in our datacenters to run mission-critical application and database services. Some time ago I wrote a rather crude Nagios plugin to monitor the IBM Power Systems hardware via the Hardware Management Console (HMC) appliance, as well as the HMC itself. To run the Nagios plugin, “Remote Command Execution” via SSH must be activated on the HMC appliances, and network connections from the Nagios system to the HMC appliances on port TCP/22 must be allowed.

The whole setup for monitoring IBM Power Systems servers and HMC appliances looks like this:

  1. Enable remote command execution on the HMC. Log in to the HMC WebGUI and navigate to:

    -> HMC Management
       -> Administration
          -> Remote Command Execution
             -> Check selection: "Enable remote command execution using the ssh facility"
             -> OK

    Verify that port TCP/22 on the HMC appliance can be reached from the Nagios system.
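
The reachability check can be scripted from the Nagios system. This is only a minimal sketch and not part of the plugin: the function name check_tcp_port is made up for illustration, and it assumes that bash and the coreutils timeout command are installed. It relies on bash's /dev/tcp redirection, so no extra tool such as nc is needed.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Assumes bash (for the /dev/tcp redirection) and coreutils' timeout.
check_tcp_port() {
    # $1 = host name or IP address, $2 = port (defaults to 22)
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "${2:-22}" 2>/dev/null
}

# Usage, with "hmc" being the host name used throughout this article:
#   check_tcp_port hmc 22 && echo "TCP/22 reachable" || echo "TCP/22 not reachable"
```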

  2. Create an SSH private/public key pair on the Nagios system to authenticate the monitoring user against the HMC appliance:

    $ mkdir -p /etc/nagios3/.ssh/
    $ chown -R nagios:nagios /etc/nagios3/.ssh/
    $ chmod 700 /etc/nagios3/.ssh/
    $ cd /etc/nagios3/.ssh/
    $ ssh-keygen -t rsa -b 2048 -f id_rsa_hmc
    Generating public/private rsa key pair.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in id_rsa_hmc.
    Your public key has been saved in id_rsa_hmc.pub.
    The key fingerprint is:
    xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx user@host
    
    $ chown -R nagios:nagios /etc/nagios3/.ssh/id_rsa_hmc*
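
OpenSSH refuses to use a private key file that is accessible by other users, so it can be worth double-checking the file modes once the key pair exists. A minimal sketch, assuming GNU stat (for the -c option); the function name key_perms_ok is hypothetical, and the expected modes are the 700 set above plus the 600 that ssh-keygen normally gives a private key:

```shell
# Hypothetical helper: verify the modes of the key directory and private key.
# Assumes GNU stat, whose -c '%a' prints the octal permission bits.
key_perms_ok() {
    dir=$1    # e.g. /etc/nagios3/.ssh
    key=$2    # e.g. id_rsa_hmc
    [ "$(stat -c '%a' "$dir")" = "700" ] || { echo "unexpected mode on $dir"; return 1; }
    [ "$(stat -c '%a' "$dir/$key")" = "600" ] || { echo "unexpected mode on $dir/$key"; return 1; }
    echo "permissions ok"
}

# Usage:
#   key_perms_ok /etc/nagios3/.ssh id_rsa_hmc
```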

  3. Prepare an SSH authorized keys file and transfer it to the HMC.

    Optional: Create a separate user on the HMC for monitoring purposes. The new user needs permission to run the HMC commands lsled, lssvcevents, lssyscfg and monhmc. If a separate user is used, replace hscroot in the following text with the name of the new user.

    Optional: If an SSH authorized keys file is already set up on the HMC, be sure to retrieve it first so it doesn't get overwritten:

    $ cd /tmp
    $ scp -r hscroot@hmc:~/.ssh /tmp/
    Password:
    authorized_keys2                            100%  600     0.6KB/s   00:00

    Insert the SSH public key created above into an SSH authorized keys file and push it back to the HMC:

    $ echo -n 'from="<IP of your Nagios system>",no-agent-forwarding,no-port-forwarding,no-X11-forwarding ' \
        >> /tmp/.ssh/authorized_keys2
    $ cat /etc/nagios3/.ssh/id_rsa_hmc.pub >> /tmp/.ssh/authorized_keys2
    $ scp -r /tmp/.ssh hscroot@hmc:~/
    Password:
    authorized_keys2                            100% 1079     1.1KB/s   00:00
    
    $ rm -r /tmp/.ssh
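
The restricted entry written above is a single line: the option list, a space, then the public key. As a small sketch, the line can also be assembled in one go, which avoids leaving a dangling option list in the file if the second command is forgotten. The function name make_authkeys_entry is hypothetical, and 192.0.2.10 in the usage line is just a placeholder address; the options are the ones used above.

```shell
# Hypothetical helper: print one authorized_keys line that restricts the key
# to a single source address and disables agent, port and X11 forwarding.
make_authkeys_entry() {
    nagios_ip=$1      # IP address of the Nagios system
    pubkey_file=$2    # e.g. /etc/nagios3/.ssh/id_rsa_hmc.pub
    printf 'from="%s",no-agent-forwarding,no-port-forwarding,no-X11-forwarding %s\n' \
        "$nagios_ip" "$(cat "$pubkey_file")"
}

# Usage (192.0.2.10 is a placeholder for the IP of your Nagios system):
#   make_authkeys_entry 192.0.2.10 /etc/nagios3/.ssh/id_rsa_hmc.pub >> /tmp/.ssh/authorized_keys2
```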

    Add the SSH host public key from the HMC appliance:

    HMC$ cat /etc/ssh/ssh_host_rsa_key.pub
    ssh-rsa AAAAB3N...[cut]...jxFAz+4O1X root@hmc

    to the SSH ssh_known_hosts file of the Nagios system:

    $ vi /etc/ssh/ssh_known_hosts
    hmc,FQDN,<IP of your HMC> ssh-rsa AAAAB3N...[cut]...jxFAz+4O1X

    Verify that logging in to the HMC appliance from the Nagios system works with the SSH key pair created above:

    $ ssh -i /etc/nagios3/.ssh/id_rsa_hmc hscroot@hmc 'lshmc -V'
    "version= Version: 7
     Release: 7.7.0
     Service Pack: 2
    HMC Build level 20130730.1
    MH01373: Fix for HMC V7R7.7.0 SP2 (08-06-2013)
    ","base_version=V7R7.7.0"

  4. Download the Nagios plugin check_hmc.sh and place it in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:

    $ mv -i check_hmc.sh /usr/lib/nagios/plugins/
    $ chmod 755 /usr/lib/nagios/plugins/check_hmc.sh

  5. Define the following Nagios commands. In this example this is done in the file /etc/nagios-plugins/config/check_hmc.cfg:

    # check HMC disk
    define command {
        command_name    check_hmc_disk
        command_line    $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C disk -w $ARG1$ -c $ARG2$ -a $ARG3$
    }
    # check HMC processor
    define command {
        command_name    check_hmc_proc
        command_line    $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C proc -w $ARG1$ -c $ARG2$
    }
    # check HMC memory
    define command {
        command_name    check_hmc_mem
        command_line    $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C mem -w $ARG1$ -c $ARG2$
    }
    # check HMC swap
    define command {
        command_name    check_hmc_swap
        command_line    $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C swap -w $ARG1$ -c $ARG2$
    }
    # check HMC rmc processes
    define command {
        command_name    check_hmc_rmc
        command_line    $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C rmc -w $ARG1$ -c $ARG2$ -a $ARG3$
    }
    # check HMC System Attention LED for physical system
    define command {
        command_name    check_hmc_ledphys
        command_line    $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C ledphys
    }
    # check HMC System Attention LED for virtual partition
    define command {
        command_name    check_hmc_ledvlpar
        command_line    $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C ledvlpar
    }
    # check HMC System Attention LED for virtual system
    define command {
        command_name    check_hmc_ledvsys
        command_line    $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C ledvsys
    }
    # check HMC serviceable events
    define command {
        command_name    check_hmc_events
        command_line    $USER1$/check_hmc.sh -H $HOSTADDRESS$ -C events
    }
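
Before wiring the commands into Nagios, it can help to run the plugin by hand as the nagios user. Nagios derives the service state from a plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); that is the general plugin convention and nothing specific to check_hmc.sh. A minimal sketch of a wrapper that prints the state name after the plugin output; the name run_check is hypothetical, and the thresholds in the usage line are simply the ones used for the memory check in this article.

```shell
# Hypothetical wrapper: run a check command and translate its exit code
# into the state name Nagios would assign.
run_check() {
    rc=0
    "$@" || rc=$?
    case $rc in
        0) echo "state: OK" ;;
        1) echo "state: WARNING" ;;
        2) echo "state: CRITICAL" ;;
        3) echo "state: UNKNOWN" ;;
        *) echo "state: out of range ($rc)" ;;
    esac
    return "$rc"
}

# Usage:
#   run_check /usr/lib/nagios/plugins/check_hmc.sh -H 10.0.0.1 -C mem -w 20 -c 10
```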

  6. Define a group of services in your Nagios configuration to be checked for each HMC appliance:

    # check sshd
    define service {
        use                     generic-service
        hostgroup_name          hmc
        service_description     Check_SSH
        check_command           check_ssh
    }
    # check_hmc_mem
    define service {
        use                     generic-service-pnp
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_memory
        check_command           check_hmc_mem!20!10
    }
    # check_hmc_proc
    define service {
        use                     generic-service-pnp
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_CPU_statistics
        check_command           check_hmc_proc!10!5
    }
    # check_hmc_disk
    define service {
        use                     generic-service-pnp
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_space_on_/
        check_command           check_hmc_disk!10%!5%!'/$'
    }
    define service {
        use                     generic-service-pnp
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_space_on_/dump
        check_command           check_hmc_disk!10%!5%!/dump
    }
    define service {
        use                     generic-service-pnp
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_space_on_/extra
        check_command           check_hmc_disk!10%!5%!/extra
    }
    define service {
        use                     generic-service-pnp
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_space_on_/var
        check_command           check_hmc_disk!10%!5%!/var
    }
    # check_hmc_swap
    define service {
        use                     generic-service-pnp
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_space_on_swap
        check_command           check_hmc_swap!200!100
    }
    # check_hmc_rmc: IBM.CSMAgentRMd
    define service {
        use                     generic-service
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_process_IBM.CSMAgentRMd
        check_command           check_hmc_rmc!1!1!IBM.CSMAgentRMd
    }
    # check_hmc_rmc: IBM.LparCmdRMd
    define service {
        use                     generic-service
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_process_IBM.LparCmdRMd
        check_command           check_hmc_rmc!1!1!IBM.LparCmdRMd
    }
    # check_hmc_rmc: IBM.ServiceRMd
    define service {
        use                     generic-service
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_process_IBM.ServiceRMd
        check_command           check_hmc_rmc!1!1!IBM.ServiceRMd
    }
    # check_hmc_ledphys
    define service {
        use                     generic-service
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_SA_LED_physical
        check_command           check_hmc_ledphys
    }
    # check_hmc_ledvlpar
    define service {
        use                     generic-service
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_SA_LED_vpartition
        check_command           check_hmc_ledvlpar
    }
    # check_hmc_ledvsys
    define service {
        use                     generic-service
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_SA_LED_vsystem
        check_command           check_hmc_ledvsys
    }
    # check_hmc_events
    define service {
        use                     generic-service
        servicegroups           sshchecks
        hostgroup_name          hmc
        service_description     Check_serviceable_events
        check_command           check_hmc_events
    }

    Replace generic-service with your Nagios service template. Replace generic-service-pnp with your Nagios service template that has performance data processing enabled.
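
In the service definitions above, the check_command value is the command name followed by its arguments, separated by exclamation marks. Nagios substitutes these for $ARG1$, $ARG2$ and so on in the matching command_line. The following rough sketch shows that mapping for the disk check. It is purely illustrative: the function expand_disk_check is hypothetical, it hard-codes the plugin directory used in this article in place of $USER1$, and it does not reproduce Nagios' real macro processing.

```shell
# Hypothetical helper: show which command line a check_hmc_disk service
# definition roughly expands to.
expand_disk_check() {
    host_address=$1     # value of $HOSTADDRESS$
    check_command=$2    # e.g. 'check_hmc_disk!10%!5%!/var'
    printf '%s\n' "$check_command" | {
        # Split on "!" into the command name (unused here) and $ARG1$..$ARG3$.
        IFS='!' read -r name arg1 arg2 arg3
        printf '/usr/lib/nagios/plugins/check_hmc.sh -H %s -C disk -w %s -c %s -a %s\n' \
            "$host_address" "$arg1" "$arg2" "$arg3"
    }
}

# Usage:
#   expand_disk_check 10.0.0.1 'check_hmc_disk!10%!5%!/var'
```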

  7. Define service dependencies so that the above checks are only run if the Check_SSH service check succeeded:

    # HMC SSH dependencies
    define servicedependency {
        hostgroup_name                  hmc
        service_description             Check_SSH
        dependent_service_description   Check_process_.*
        execution_failure_criteria      c,p,u,w
        notification_failure_criteria   c,p,u,w
    }
    define servicedependency {
        hostgroup_name                  hmc
        service_description             Check_SSH
        dependent_service_description   Check_space_on_.*
        execution_failure_criteria      c,p,u,w
        notification_failure_criteria   c,p,u,w
    }
    define servicedependency {
        hostgroup_name                  hmc
        service_description             Check_SSH
        dependent_service_description   Check_memory
        execution_failure_criteria      c,p,u,w
        notification_failure_criteria   c,p,u,w
    }
    define servicedependency {
        hostgroup_name                  hmc
        service_description             Check_SSH
        dependent_service_description   Check_CPU_statistics
        execution_failure_criteria      c,p,u,w
        notification_failure_criteria   c,p,u,w
    }
    define servicedependency {
        hostgroup_name                  hmc
        service_description             Check_SSH
        dependent_service_description   Check_SA_LED_.*
        execution_failure_criteria      c,p,u,w
        notification_failure_criteria   c,p,u,w
    }
    define servicedependency {
        hostgroup_name                  hmc
        service_description             Check_SSH
        dependent_service_description   Check_serviceable_events
        execution_failure_criteria      c,p,u,w
        notification_failure_criteria   c,p,u,w
    }
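
Some of the dependency definitions above use patterns such as Check_process_.* in dependent_service_description. Nagios only treats these as regular expressions when use_regexp_matching=1 is set in the main configuration file; otherwise the pattern is looked up as a literal service name. The failure criteria c,p,u,w stand for the critical, pending, unknown and warning states. A minimal sketch for checking the setting; the function name regexp_matching_enabled is hypothetical, and it only looks for an uncommented line in the given file.

```shell
# Hypothetical helper: succeed if use_regexp_matching=1 is set (uncommented)
# in the given Nagios main configuration file.
regexp_matching_enabled() {
    grep -Eq '^[[:space:]]*use_regexp_matching[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Usage:
#   regexp_matching_enabled /etc/nagios3/nagios.cfg || echo "use_regexp_matching is not enabled"
```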

  8. Define a host in your Nagios configuration for each HMC appliance. In this example it is named hmc1:

    define host {
        use         hmc
        host_name   hmc1
        alias       HMC 1
        address     10.0.0.1
        parents     parent_lan
    }

    Replace hmc with your Nagios host template for HMC appliances. Adjust the address and parents parameters according to your environment.

  9. Define a hostgroup in your Nagios configuration for all HMC appliances. In this example it is named hmc. The above checks are run against each member of the hostgroup:

    define hostgroup {
        hostgroup_name  hmc
        alias           Hardware Management Console
        members         hmc1
    }

  10. Run a configuration check and, if it succeeds, reload the Nagios process:

    $ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
    $ /etc/init.d/nagios3 reload

    The new hosts and services should soon show up in the Nagios web interface.
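
To avoid reloading with a broken configuration, the two commands above can be chained so that the reload only happens when the check exits successfully. A minimal sketch; the function name verify_and_reload is hypothetical, and both commands are passed in as strings so that it is not tied to the nagios3 paths used in this article.

```shell
# Hypothetical helper: run the verify command and reload only if it succeeds.
verify_and_reload() {
    verify_cmd=$1
    reload_cmd=$2
    if sh -c "$verify_cmd"; then
        sh -c "$reload_cmd"
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}

# Usage:
#   verify_and_reload '/usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg' '/etc/init.d/nagios3 reload'
```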

  11. Optional: If you're running PNP4Nagios v0.6 or later to graph Nagios performance data, you can use PNP4Nagios templates to beautify the graphs. Download the templates check_hmc_disk.php, check_hmc_mem.php, check_hmc_proc.php and check_hmc_swap.php and place them in the PNP4Nagios template directory, in this example /usr/share/pnp4nagios/html/templates/:

    $ mv -i check_hmc_disk.php check_hmc_mem.php check_hmc_proc.php check_hmc_swap.php \
        /usr/share/pnp4nagios/html/templates/
    $ chmod 644 /usr/share/pnp4nagios/html/templates/check_hmc_disk.php
    $ chmod 644 /usr/share/pnp4nagios/html/templates/check_hmc_mem.php
    $ chmod 644 /usr/share/pnp4nagios/html/templates/check_hmc_proc.php
    $ chmod 644 /usr/share/pnp4nagios/html/templates/check_hmc_swap.php

    The following images show examples of what the PNP4Nagios graphs look like for an IBM HMC appliance:

    PNP4Nagios graphs for the filesystems in an IBM HMC appliance

    PNP4Nagios graphs for the memory usage of an IBM HMC appliance

    PNP4Nagios graphs for the CPU statistics of an IBM HMC appliance

    PNP4Nagios graphs for the swap usage of an IBM HMC appliance
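
The template installation in this step can also be written as a loop. This is only a sketch under a few assumptions: the function name install_pnp_templates is hypothetical, it uses install from coreutils, and unlike the mv commands above it copies the files and sets mode 644 in one go. As far as I know, PNP4Nagios picks a template by the name of the check command, which is why the file names match the command names defined earlier.

```shell
# Hypothetical helper: copy the four PNP4Nagios templates from a download
# directory into the template directory with mode 644.
install_pnp_templates() {
    src_dir=$1      # directory holding the downloaded templates
    dest_dir=$2     # e.g. /usr/share/pnp4nagios/html/templates
    for check in disk mem proc swap; do
        install -m 644 "$src_dir/check_hmc_$check.php" "$dest_dir/" || return 1
    done
}

# Usage:
#   install_pnp_templates . /usr/share/pnp4nagios/html/templates
```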

All done. You should now have a complete Nagios-based monitoring solution for your IBM HMC appliances and IBM Power Systems servers.
