====== Nagios Monitoring - EMC Centera ======
Some time ago I wrote a rather crude Nagios plugin to monitor EMC Centera [[wp>Object_storage|object storage]] or [[wp>Content-addressable_storage|CAS]] systems. The plugin was targeted at the Gen3 hardware nodes we used back then. Since the initial implementation we have upgraded our Centera systems to Gen4 hardware nodes. The plugin still works with Gen4 nodes, but unfortunately the newer nodes no longer seem to provide certain information such as CPU and case temperature values.

At the time of implementation there was also a software issue in the CentraStar OS that caused the CLI tools to hang when ending a session ([[https://community.emc.com/message/378696#378696|Centera CLI hangs when ends the session with the "quit" command]]). So instead of using an existing Nagios plugin for the EMC Centera like [[https://www.netways.org/repositories/entry/plugins/plugins/check_emc/check_emc_centera.pl|check_emc_centera.pl]], I decided to gather the necessary information via the Centera API with a small application written in C, using function calls from the [[https://community.emc.com/community/edn/centera|Centera SDK]].

After the initial attempts it turned out that gathering the information on the Centera can be quite time-consuming. To address this and keep the Nagios system from waiting a long time for the results, I decided on an asynchronous setup, where:
- a recurring cron job for each Centera system calls the wrapper script ''get_centera_info.sh'', which uses the binary ''/opt/Centera/bin/GetInfo'' to gather the necessary information via the Centera API and writes the results into a file.
- the Nagios plugin checks that the result file is up to date and issues a warning if it is not.
- the Nagios plugin reads and evaluates the different sections of the result file according to the defined Nagios checks.
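The freshness check from the second point of the list above can be sketched in a few lines of shell. This is only an illustration, not the code of ''check_centera.pl'' (which is written in Perl): the function name ''check_file_freshness'' and the default limit of 1200 seconds are assumptions, and ''stat -c %Y'' requires GNU coreutils.

```shell
#!/bin/sh
# Sketch of a staleness check for a result file written by a cron job.
# check_file_freshness and the 1200 s default are illustrative assumptions,
# they are not taken from check_centera.pl.

# Usage: check_file_freshness FILE [MAX_AGE_SECONDS]
# Prints a Nagios-style status line and returns
# 0 (OK), 1 (WARNING) or 3 (UNKNOWN).
check_file_freshness() {
    file=$1
    max_age=${2:-1200}

    if [ ! -r "$file" ]; then
        echo "UNKNOWN - result file $file is missing or unreadable"
        return 3
    fi

    now=$(date +%s)
    mtime=$(stat -c %Y "$file")   # GNU stat: modification time in epoch seconds
    age=$((now - mtime))

    if [ "$age" -gt "$max_age" ]; then
        echo "WARNING - result file $file is ${age}s old (limit ${max_age}s)"
        return 1
    fi
    echo "OK - result file $file is ${age}s old"
    return 0
}
```

With a cron job that refreshes the file every ten minutes, a limit of two missed runs (1200 seconds) gives the collector some slack before the check turns to WARNING.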
In order to run the wrapper script ''get_centera_info.sh'' and the ''GetInfo'' binary, you need to have the [[https://community.emc.com/community/edn/centera|Centera SDK]] installed on the Nagios system. A network connection from the Nagios system to the Centera system on port TCP/3218 must also be allowed.
Since the Nagios server in my setup runs on Debian/PPC on an IBM Power LPAR and there is no native Linux/PPC version of the Centera SDK, I had to run those tools through the [[wp>PowerVM_Lx86|IBM PowerVM LX86]] emulator. If the plugin is run in an x86 environment, the variables ''POWERVM_PATH'' and ''RUNX86'' have to be set to an empty value.
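Why an empty ''RUNX86'' is enough on x86 can be illustrated with a small shell sketch. The helper ''x86_command'' is hypothetical, and how ''get_centera_info.sh'' really uses ''RUNX86'' and ''POWERVM_PATH'' is not shown here; the sketch only assumes that the emulator launcher is prepended to the command when the variable is set.

```shell
#!/bin/sh
# Hypothetical helper: build the command line for an x86 binary.
# RUNX86 is the variable name used in get_centera_info.sh; the way it is
# applied here is an assumption for illustration only.
x86_command() {
    if [ -n "${RUNX86:-}" ]; then
        # Non-x86 host: hand the binary to the emulator launcher.
        printf '%s %s\n' "$RUNX86" "$*"
    else
        # Native x86 host: run the binary directly.
        printf '%s\n' "$*"
    fi
}
```

With ''RUNX86=/usr/local/bin/runx86'' the helper prints the launcher followed by the binary, with ''RUNX86='' it prints only the binary, so the same script would work in both environments.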
The whole setup looks like this:
- Verify that the user ''anonymous'' on the Centera has the ''Monitor'' role, which is the default. Log in to the Centera CLI and issue the following command:
Config$ show profile detail anonymous
Centera Profile Detail Report
------------------------------------------------------
Profile Name: anonymous
Profile Enabled: yes
[...]
Cluster Management Roles:
Accesscontrol Role: off
Audit Role: off
Compliance Role: off
Configuration Role: off
Migration Role: off
Monitor Role: on
Replication Role: off
- **Optional**: Enable SNMP traps to be sent from the Centera to the Nagios system. This requires SNMPD and SNMPTT to be already set up on the Nagios system. Log in to the Centera CLI and issue the following commands:
Config$ set snmp
Enable SNMP (yes, no) [yes]:
Management station [:162]:
Community name []:
Heartbeat trap interval [30 minutes]:
Issue the command?
(yes, no) [no]: yes
Config$ show snmp
SNMP enabled: Enabled
Management Station: :162
Community name:
Heartbeat trap interval: 30 minutes
Verify that port **UDP/162** on the Nagios system can be reached from the Centera system.
- Install the [[https://community.emc.com/community/edn/centera|Centera SDK]] on the Nagios system, in this example in ''/opt/Centera''. Only the subdirectories ''lib'' and ''include'' are actually needed. Verify that port **TCP/3218** on the Centera system can be reached from the Nagios system.
- **Optional**: Download the {{:2013:11:21:getinfo.c|GetInfo source}}, place it in the source directory of the Centera SDK and compile the ''GetInfo'' binary against the Centera API:
$ mkdir /opt/Centera/src
$ mv -i getinfo.c /opt/Centera/src/GetInfo.c
$ cd /opt/Centera/src/
$ gcc -Wall -DPOSIX -I /opt/Centera/include -Wl,-rpath /opt/Centera/lib \
-L/opt/Centera/lib GetInfo.c -lFPLibrary32 -lFPXML32 -lFPStreams32 \
-lFPUtils32 -lFPParser32 -lFPCore32 -o GetInfo
$ mkdir /opt/Centera/bin
$ mv -i GetInfo /opt/Centera/bin/
$ chmod 755 /opt/Centera/bin/GetInfo
- Download the {{:2013:11:21:getinfo.bz2|GetInfo binary}} and place it in the binary directory of the Centera SDK:
$ mkdir /opt/Centera/bin
$ bzip2 -d getinfo.bz2
$ mv -i getinfo /opt/Centera/bin/GetInfo
$ chmod 755 /opt/Centera/bin/GetInfo
- Download the {{:2013:11:21:check_centera.pl|Nagios plugin check_centera.pl}} and the cron job {{:2013:11:21:get_centera_info.sh|wrapper script get_centera_info.sh}} and place them in the plugins directory of your Nagios system, in this example ''/usr/lib/nagios/plugins/'':
$ mv -i check_centera.pl get_centera_info.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_centera.pl
$ chmod 755 /usr/lib/nagios/plugins/get_centera_info.sh
- Adjust the plugin settings according to your environment. Edit the following variable assignments in ''get_centera_info.sh'':
POWERVM_PATH=/srv/powervm-lx86/i386
RUNX86=/usr/local/bin/runx86
- Perform a manual test run of the wrapper script ''get_centera_info.sh'' against the primary and the secondary IP address of the Centera system. In this example the IP addresses are ''10.0.0.1'' and ''10.0.0.2'' and the hostname of the Centera system is ''centera1'':
$ /usr/lib/nagios/plugins/get_centera_info.sh -p 10.0.0.1 -s 10.0.0.2 -f /tmp/centera1.xml
$ ls -al /tmp/centera1.xml
-rw-r--r-- 1 root root 478665 Nov 21 21:21 /tmp/centera1.xml
$ less /tmp/centera1.xml
[...]
If the result file was successfully written, define a cron job to periodically update the necessary information from each Centera system:
0,10,20,30,40,50 * * * * /usr/lib/nagios/plugins/get_centera_info.sh -p 10.0.0.1 -s 10.0.0.2 -f /tmp/centera1.xml
- Define the following Nagios commands. In this example this is done in the file ''/etc/nagios-plugins/config/check_centera.cfg'':
# check Centera case temperature
define command {
command_name check_centera_casetemp
command_line $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C caseTemp -w $ARG1$ -c $ARG2$
}
# check Centera CPU temperature
define command {
command_name check_centera_cputemp
command_line $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C cpuTemp -w $ARG1$ -c $ARG2$
}
# check Centera storage capacity
define command {
command_name check_centera_capacity
command_line $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C capacity -w $ARG1$ -c $ARG2$
}
# check Centera drive status
define command {
command_name check_centera_drive
command_line $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C drive
}
# check Centera node status
define command {
command_name check_centera_node
command_line $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C node
}
# check Centera object capacity
define command {
command_name check_centera_objects
command_line $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C objects -w $ARG1$ -c $ARG2$
}
# check Centera power status
define command {
command_name check_centera_power
command_line $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C power
}
# check Centera stealthCorruption
define command {
command_name check_centera_sc
command_line $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C stealthCorruption -w $ARG1$ -c $ARG2$
}
- Define a group of services in your Nagios configuration to be checked for each Centera system:
# check case temperature
define service {
use generic-service-pnp
hostgroup_name centera
service_description Check_case_temperature
check_command check_centera_casetemp!35!40
}
# check CPU temperature
define service {
use generic-service-pnp
hostgroup_name centera
service_description Check_CPU_temperature
check_command check_centera_cputemp!50!55
}
# check storage capacity
define service {
use generic-service-pnp
hostgroup_name centera
service_description Check_storage_capacity
check_command check_centera_capacity!100G!200G
}
# check drive status
define service {
use generic-service
hostgroup_name centera
service_description Check_drive_status
check_command check_centera_drive
}
# check node status
define service {
use generic-service
hostgroup_name centera
service_description Check_node_status
check_command check_centera_node
}
# check object capacity
define service {
use generic-service-pnp
hostgroup_name centera
service_description Check_object_capacity
check_command check_centera_objects!10%!5%
}
# check power status
define service {
use generic-service
hostgroup_name centera
service_description Check_power_status
check_command check_centera_power
}
# check stealthCorruption
define service {
use generic-service
hostgroup_name centera
service_description Check_stealth_corruptions
check_command check_centera_sc!50!100
}
Replace ''generic-service'' with your Nagios service template. Replace ''generic-service-pnp'' with your Nagios service template that has performance data processing enabled.
- Define hosts in your Nagios configuration for each IP address of the Centera system. In this example they are named ''centera1-pri'' and ''centera1-sec'':
define host {
use disk
host_name centera1-pri
alias Centera 1 (Primary)
address 10.0.0.1
parents parent_lan
}
define host {
use disk
host_name centera1-sec
alias Centera 1 (Secondary)
address 10.0.0.2
parents parent_lan
}
Replace ''disk'' with your Nagios host template for storage devices. Adjust the ''address'' and ''parents'' parameters according to your environment.
Define a host in your Nagios configuration for each Centera system. In this example it is named ''centera1'':
define host {
use disk
host_name centera1
alias Centera 1
address 10.0.0.1
parents centera1-pri, centera1-sec
}
Replace ''disk'' with your Nagios host template for storage devices. Adjust the ''address'' parameter to the IP of the previously defined host ''centera1-pri'' and set the two previously defined hosts ''centera1-pri'' and ''centera1-sec'' as parents.
The three-host configuration is necessary because the services are checked against the main system ''centera1'', but the main system can be reached via both the primary and the secondary IP address. Also, the optional SNMP traps originate from either the primary or the secondary address, so there must be a host entity for each address to check the SNMP traps against.
- Define a hostgroup in your Nagios configuration for all Centera systems. In this example it is named ''centera''. The above checks are run against each member of the hostgroup:
define hostgroup {
hostgroup_name centera
alias EMC Centera Devices
members centera1
}
Define a second hostgroup in your Nagios configuration for the primary and secondary IP addresses of all Centera systems. In this example the hostgroup is named ''centera-node'' and its members are ''centera1-pri'' and ''centera1-sec'':
define hostgroup {
hostgroup_name centera-node
alias EMC Centera Devices
members centera1-pri, centera1-sec
}
- Run a configuration check and if successful reload the Nagios process:
$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload
The new hosts and services should soon show up in the Nagios web interface.
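The ''-w'' and ''-c'' arguments in the command definitions above carry the warning and critical thresholds of each check. For the temperature checks, where a higher value is worse, the evaluation can be sketched as follows. This is a generic illustration of the usual Nagios convention (exit code 0 for OK, 1 for WARNING, 2 for CRITICAL), not the logic of ''check_centera.pl''; the function name ''evaluate_upper_threshold'' is hypothetical and only integer values are handled.

```shell
#!/bin/sh
# Generic "higher is worse" threshold evaluation, integers only.
# evaluate_upper_threshold is a hypothetical name, not part of check_centera.pl.

# Usage: evaluate_upper_threshold LABEL VALUE WARN CRIT
evaluate_upper_threshold() {
    label=$1
    value=$2
    warn=$3
    crit=$4

    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL - $label is $value (critical above $crit)"
        return 2
    elif [ "$value" -gt "$warn" ]; then
        echo "WARNING - $label is $value (warning above $warn)"
        return 1
    fi
    echo "OK - $label is $value"
    return 0
}
```

For example, ''evaluate_upper_threshold cpuTemp 52 50 55'' reports a WARNING. Checks on free resources, such as the object capacity check with ''10%!5%'', appear to work the other way round, with lower values being worse.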
If the optional step number 2 in the above list was done, SNMPTT also needs to be configured to be able to understand the incoming SNMP traps from Centera systems. This can be achieved by the following steps:
- Convert the EMC Centera SNMP MIB definitions in ''emc-centera.mib'' into a format that SNMPTT can understand.
$ /opt/snmptt/snmpttconvertmib --in=MIB/emc-centera.mib --out=/opt/snmptt/conf/snmptt.conf.emc-centera
...
Done
Total translations: 2
Successful translations: 2
Failed translations: 0
- Edit the trap severity according to your requirements, e.g.:
$ vim /opt/snmptt/conf/snmptt.conf.emc-centera
...
EVENT trapNotification .1.3.6.1.4.1.1139.5.0.1 "Status Events" Normal
...
EVENT trapAlarmNotification .1.3.6.1.4.1.1139.5.0.2 "Status Events" Critical
...
- Add the new configuration file to be included in the global SNMPTT configuration and restart the SNMPTT daemon:
$ vim /opt/snmptt/snmptt.ini
...
[TrapFiles]
snmptt_conf_files = <<END
...
/opt/snmptt/conf/snmptt.conf.emc-centera
...
END
- Download the {{:2012:10:02:check_snmp_traps.sh|Nagios plugin check_snmp_traps.sh}} and the {{:2013:11:21:check_snmp_traps_heartbeat.sh|Nagios plugin check_snmp_traps_heartbeat.sh}} and place them in the plugins directory of your Nagios system, in this example ''/usr/lib/nagios/plugins/'':
$ mv -i check_snmp_traps.sh check_snmp_traps_heartbeat.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_snmp_traps.sh
$ chmod 755 /usr/lib/nagios/plugins/check_snmp_traps_heartbeat.sh
- Define the following Nagios command to check for SNMP traps in the SNMPTT database. In this example this is done in the file ''/etc/nagios-plugins/config/check_snmp_traps.cfg'':
# check for snmp traps
define command {
command_name check_snmp_traps
command_line $USER1$/check_snmp_traps.sh -H $HOSTNAME$:$HOSTADDRESS$ -u user -p pass -d snmptt_db
}
# check for EMC Centera heartbeat snmp traps
define command {
command_name check_snmp_traps_heartbeat
command_line $USER1$/check_snmp_traps_heartbeat.sh -H $HOSTNAME$ -S $ARG1$ -u user -p pass -d snmptt_db
}
Replace ''user'', ''pass'' and ''snmptt_db'' with values suitable for your SNMPTT database environment.
- Add another service in your Nagios configuration to be checked for each Centera system:
# check heartbeat snmptraps
define service {
use generic-service
hostgroup_name centera
service_description Check_SNMP_heartbeat_traps
check_command check_snmp_traps_heartbeat!3630
}
- Add another service in your Nagios configuration to be checked for each IP address of the Centera system:
# check snmptraps
define service {
use generic-service
hostgroup_name centera-node
service_description Check_SNMP_traps
check_command check_snmp_traps
}
- **Optional**: Define a serviceextinfo to display a folder icon next to the ''Check_SNMP_traps'' service check for each Centera system. This icon provides a direct link to the SNMPTT web interface with a filter for the selected host:
define serviceextinfo {
hostgroup_name centera-node
service_description Check_SNMP_traps
notes SNMP Alerts
#notes_url http://hostname/nagios3/nagtrap/index.php?hostname=$HOSTNAME$
#notes_url http://hostname/nagios3/nsti/index.php?perpage=100&hostname=$HOSTNAME$
}
Uncomment the ''notes_url'' depending on which web interface (nagtrap or nsti) is used. Replace ''hostname'' with the FQDN or IP address of the server running the web interface.
- Run a configuration check and if successful reload the Nagios process:
$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload
- **Optional**: If you're running PNP4Nagios v0.6 or later to graph Nagios performance data, you can use the ''check_centera_capacity.php'', ''check_centera_casetemp.php'', ''check_centera_cputemp.php'' and ''check_centera_objects.php'' PNP4Nagios templates to beautify the graphs. Download the PNP4Nagios templates {{:2013:11:21:check_centera_capacity.php|check_centera_capacity.php}}, {{:2013:11:21:check_centera_casetemp.php|check_centera_casetemp.php}}, {{:2013:11:21:check_centera_cputemp.php|check_centera_cputemp.php}} and {{:2013:11:21:check_centera_objects.php|check_centera_objects.php}} and place them in the PNP4Nagios template directory, in this example ''/usr/share/pnp4nagios/html/templates/'':
$ mv -i check_centera_capacity.php check_centera_casetemp.php check_centera_cputemp.php \
check_centera_objects.php /usr/share/pnp4nagios/html/templates/
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_centera_capacity.php
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_centera_casetemp.php
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_centera_cputemp.php
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_centera_objects.php
The following images show examples of what the PNP4Nagios graphs look like for an EMC Centera system:
{{:2013:11:21:check_centera_capacity.png?600x|PNP4Nagios graphs for the free space in an EMC Centera}}
{{:2013:11:21:check_centera_cputemp.png?600x|PNP4Nagios graphs for the CPU temperature in an EMC Centera}}
{{:2013:11:21:check_centera_objects.png?600x|PNP4Nagios graphs for the number of free objects in an EMC Centera}}
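A note on the value ''3630'' passed to ''check_snmp_traps_heartbeat'' in the service definition above: the script itself is not shown here, so the following is only one plausible reading. With a heartbeat trap interval of 30 minutes, 3630 seconds corresponds to two missed intervals plus 30 seconds of slack. The helper names ''heartbeat_max_age'' and ''heartbeat_state'' are hypothetical, and whether the real plugin reports WARNING or CRITICAL on a missing heartbeat is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers illustrating a heartbeat age check.
# They are not taken from check_snmp_traps_heartbeat.sh.

# Maximum tolerated silence in seconds:
# MISSED intervals of INTERVAL_MIN minutes plus SLACK seconds.
# Usage: heartbeat_max_age INTERVAL_MIN MISSED SLACK
heartbeat_max_age() {
    interval_min=$1
    missed=$2
    slack=$3
    echo $((interval_min * 60 * missed + slack))
}

# Usage: heartbeat_state LAST_TRAP_EPOCH NOW_EPOCH MAX_AGE
# Returns 0 (OK) or 2 (CRITICAL), following the Nagios exit code convention.
heartbeat_state() {
    age=$(($2 - $1))
    if [ "$age" -gt "$3" ]; then
        echo "CRITICAL - last heartbeat trap ${age}s ago (limit ${3}s)"
        return 2
    fi
    echo "OK - last heartbeat trap ${age}s ago"
    return 0
}
```

If the heartbeat interval on the Centera is changed with ''set snmp'', the threshold passed to the Nagios check should be adjusted accordingly.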
All done, you should now have a complete Nagios-based monitoring solution for your EMC Centera system.
A side note: the result files and their XML-formatted content are definitely worth a look. Besides the rudimentary information extracted for the Nagios health checks described above, they contain an abundance of other promising information.
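A quick way to get an overview of a result file without knowing its structure is to count the element names it contains. The following sketch is purely text-based and makes no assumptions about the XML schema; the function name ''xml_element_counts'' is made up for this example.

```shell
#!/bin/sh
# List the element names found in an XML file together with their number
# of occurrences, most frequent first. Only opening tags are matched,
# closing tags are skipped because they start with "</".
# Usage: xml_element_counts FILE
xml_element_counts() {
    grep -o '<[A-Za-z_][A-Za-z0-9_.:-]*' "$1" | tr -d '<' | sort | uniq -c | sort -rn
}
```

For example, ''xml_element_counts /tmp/centera1.xml | head -n 20'' shows the twenty most frequent elements, which is a reasonable starting point for deciding what else might be worth monitoring.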