Nagios Monitoring - EMC Centera

Some time ago I wrote a rather crude Nagios plugin to monitor EMC Centera object storage (CAS) systems. The plugin was targeted at the Gen3 hardware nodes we used back then. Since the initial implementation we have upgraded our Centera systems to Gen4 hardware nodes. The plugin still works with Gen4 nodes, but unfortunately the newer nodes no longer seem to provide certain information such as CPU and case temperature values.

At the time of implementation there was also a software issue in the CentraStar OS which caused the CLI tools to hang when ending a session (Centera CLI hangs when ends the session with the "quit" command). So instead of using an existing Nagios plugin for the EMC Centera such as check_emc_centera.pl, I decided to gather the necessary information via the Centera API with a small application written in C, using function calls from the Centera SDK.

After the initial attempts it turned out that gathering the information on the Centera can be quite time-consuming. To address this and keep the Nagios system from waiting a long time for the results, I decided on an asynchronous setup, where:

- A recurring cron job for each Centera system calls the wrapper script get_centera_info.sh, which uses the binary /opt/Centera/bin/GetInfo to gather the necessary information via the Centera API and writes the results into a file.
- The Nagios plugin checks that the file is up to date and issues a warning if it isn't.
- The Nagios plugin reads and evaluates the different sections of the result file according to the defined Nagios checks.
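
The freshness test in the second step can be sketched in shell roughly as follows. This is only an illustration of the idea, not the code of check_centera.pl. The function name check_file_fresh, the 900 second limit in the usage line and the use of GNU stat are assumptions made for this example.

```shell
# Hypothetical helper: warn if the result file is older than max_age seconds.
# Return codes follow the Nagios plugin convention
# (0 = OK, 1 = WARNING, 3 = UNKNOWN).
check_file_fresh() {
    local file="$1" max_age="$2" now mtime age
    if [ ! -r "$file" ]; then
        echo "UNKNOWN - cannot read $file"
        return 3
    fi
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")   # GNU stat: modification time in epoch seconds
    age=$(( now - mtime ))
    if [ "$age" -gt "$max_age" ]; then
        echo "WARNING - $file is ${age}s old (limit ${max_age}s)"
        return 1
    fi
    echo "OK - $file is ${age}s old"
    return 0
}

# Example (limit chosen arbitrarily for a 10 minute cron interval):
# check_file_fresh /tmp/centera1.xml 900
```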

In order to run the wrapper script get_centera_info.sh and GetInfo binary, you need to have the Centera SDK installed on the Nagios system. A network connection from the Nagios system to the Centera system on port TCP/3218 must also be allowed.

Since the Nagios server in my setup runs on Debian/PPC on an IBM Power LPAR and there is no native Linux/PPC version of the Centera SDK, I had to run those tools through the IBM PowerVM LX86 emulator. If the plugin is run in an x86 environment, the variables POWERVM_PATH and RUNX86 have to be set to an empty value.

The whole setup looks like this:

Verify that the user anonymous on the Centera has the Monitor role, which is the default. Log in to the Centera CLI and issue the following command:

Config$ show profile detail anonymous
Centera Profile Detail Report 
------------------------------------------------------
Profile Name:                            anonymous
Profile Enabled:                         yes
[...]

Cluster Management Roles:                
    Accesscontrol Role:                  off
    Audit Role:                          off
    Compliance Role:                     off
    Configuration Role:                  off
    Migration Role:                      off
    Monitor Role:                        on
    Replication Role:                    off

Optional: Enable SNMP traps to be sent from the Centera to the Nagios system. This requires SNMPD and SNMPTT to already be set up on the Nagios system. Log in to the Centera CLI and issue the following commands:

Config$ set snmp
Enable SNMP (yes, no) [yes]: 
Management station [<IP of the Nagios system>:162]: 
Community name [<SNMPDs community string>]: 
Heartbeat trap interval [30 minutes]: 
Issue the command?
 (yes, no) [no]: yes

Config$ show snmp
SNMP enabled:            Enabled
Management Station:      <IP of the Nagios system>:162
Community name:          <SNMPDs community string>
Heartbeat trap interval: 30 minutes

Verify the port UDP/162 on the Nagios system can be reached from the Centera system.

Install the Centera SDK – only the subdirectories lib and include are actually needed – on the Nagios system, in this example in /opt/Centera. Verify the port TCP/3218 on the Centera system can be reached from the Nagios system.
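
One way to test the TCP connection from the Nagios system is sketched below. It relies on the /dev/tcp pseudo-device of bash and on the timeout command from GNU coreutils; the helper name port_open and the default of 5 seconds are assumptions made for this example. UDP ports such as 162 cannot be verified this way.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within the given number of seconds (default 5).
port_open() {
    local host="$1" port="$2" secs="${3:-5}"
    # /dev/tcp/HOST/PORT is a bash feature, so bash is invoked explicitly
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example: Centera API port on the primary address used in this article
# port_open 10.0.0.1 3218 && echo "TCP/3218 reachable"
```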

Optional: Download the GetInfo source, place it in the source directory of the Centera SDK and compile the GetInfo binary against the Centera API:

$ mkdir /opt/Centera/src
$ mv -i getinfo.c /opt/Centera/src/GetInfo.c
$ cd /opt/Centera/src/
$ gcc -Wall -DPOSIX -I /opt/Centera/include -Wl,-rpath /opt/Centera/lib \
    -L/opt/Centera/lib GetInfo.c -lFPLibrary32 -lFPXML32 -lFPStreams32 \
    -lFPUtils32 -lFPParser32 -lFPCore32 -o GetInfo
$ mkdir /opt/Centera/bin
$ mv -i GetInfo /opt/Centera/bin/
$ chmod 755 /opt/Centera/bin/GetInfo

If you did not compile GetInfo yourself, download the GetInfo binary and place it in the binary directory of the Centera SDK:

$ mkdir -p /opt/Centera/bin
$ bzip2 -d getinfo.bz2
$ mv -i getinfo /opt/Centera/bin/GetInfo
$ chmod 755 /opt/Centera/bin/GetInfo

Download the Nagios plugin check_centera.pl and the cron job wrapper script get_centera_info.sh and place them in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:
```
$ mv -i check_centera.pl get_centera_info.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_centera.pl
$ chmod 755 /usr/lib/nagios/plugins/get_centera_info.sh
```
Adjust the plugin settings according to your environment. Edit the following variable assignments in get_centera_info.sh:
```
POWERVM_PATH=/srv/powervm-lx86/i386
RUNX86=/usr/local/bin/runx86
```

Perform a manual test run of the wrapper script get_centera_info.sh against the primary and the secondary IP address of the Centera system. In this example the IP addresses are 10.0.0.1 and 10.0.0.2 and the hostname of the Centera system is centera1:

$ /usr/lib/nagios/plugins/get_centera_info.sh -p 10.0.0.1 -s 10.0.0.2 -f /tmp/centera1.xml
$ ls -al /tmp/centera1.xml
-rw-r--r-- 1 root root 478665 Nov 21 21:21 /tmp/centera1.xml
$ less /tmp/centera1.xml
<?xml version="1.0"?>

<health>
  <reportIdentification
    type="health"
    formatVersion="1.7.1"
    formatDTD="health-1.7.1.dtd"
    sequenceNumber="2994"
    reportInterval="86400000"
    creationDateTime="1385025078472"/>
  <cluster>

[...]

  </cluster>
</health>

If the result file was successfully written, define a cron job to periodically update the information from each Centera system:

0,10,20,30,40,50 * * * * /usr/lib/nagios/plugins/get_centera_info.sh -p 10.0.0.1 -s 10.0.0.2 -f /tmp/centera1.xml
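
Because gathering the data can take a while, the Nagios plugin might read the result file while it is still being written. One possible way around this is to write to a temporary file first and rename it afterwards. The following is a minimal sketch; the helper name update_atomically is hypothetical, and it assumes that the producer command takes the output path as its last argument and exits with a non-zero status on failure, which has not been verified for get_centera_info.sh.

```shell
# Hypothetical helper: run a producer command that writes to the path given
# as its last argument, then move the finished file into place.
update_atomically() {
    local target="$1" tmp
    shift
    # temporary file in the same directory, so mv is a plain rename
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" "$tmp" && [ -s "$tmp" ]; then
        chmod 644 "$tmp"          # mktemp creates the file with mode 600
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"              # keep the previous result file untouched
        return 1
    fi
}

# Example: the temporary path is appended after the -f option
# update_atomically /tmp/centera1.xml \
#     /usr/lib/nagios/plugins/get_centera_info.sh -p 10.0.0.1 -s 10.0.0.2 -f
```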

Define the following Nagios commands. In this example this is done in the file /etc/nagios-plugins/config/check_centera.cfg:

# check Centera case temperature
define command {
    command_name    check_centera_casetemp
    command_line    $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C caseTemp -w $ARG1$ -c $ARG2$
}
# check Centera CPU temperature
define command {
    command_name    check_centera_cputemp
    command_line    $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C cpuTemp -w $ARG1$ -c $ARG2$
}
# check Centera storage capacity
define command {
    command_name    check_centera_capacity
    command_line    $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C capacity -w $ARG1$ -c $ARG2$
}
# check Centera drive status
define command {
    command_name    check_centera_drive
    command_line    $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C drive
}
# check Centera node status
define command {
    command_name    check_centera_node
    command_line    $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C node
}
# check Centera object capacity
define command {
    command_name    check_centera_objects
    command_line    $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C objects -w $ARG1$ -c $ARG2$
}
# check Centera power status
define command {
    command_name    check_centera_power
    command_line    $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C power
}
# check Centera stealthCorruption
define command {
    command_name    check_centera_sc
    command_line    $USER1$/check_centera.pl -f /tmp/$HOSTNAME$.xml -C stealthCorruption -w $ARG1$ -c $ARG2$
}

Define a group of services in your Nagios configuration to be checked for each Centera system:

# check case temperature
define service {
    use                     generic-service-pnp
    hostgroup_name          centera
    service_description     Check_case_temperature
    check_command           check_centera_casetemp!35!40
}
# check CPU temperature
define service {
    use                     generic-service-pnp
    hostgroup_name          centera
    service_description     Check_CPU_temperature
    check_command           check_centera_cputemp!50!55
}
# check storage capacity
define service {
    use                     generic-service-pnp
    hostgroup_name          centera
    service_description     Check_storage_capacity
    check_command           check_centera_capacity!100G!200G
}
# check drive status
define service {
    use                     generic-service
    hostgroup_name          centera
    service_description     Check_drive_status
    check_command           check_centera_drive
}
# check node status
define service {
    use                     generic-service
    hostgroup_name          centera
    service_description     Check_node_status
    check_command           check_centera_node
}
# check object capacity
define service {
    use                     generic-service-pnp
    hostgroup_name          centera
    service_description     Check_object_capacity
    check_command           check_centera_objects!10%!5%
}
# check power status
define service {
    use                     generic-service
    hostgroup_name          centera
    service_description     Check_power_status
    check_command           check_centera_power
}
# check stealthCorruption
define service {
    use                     generic-service
    hostgroup_name          centera
    service_description     Check_stealth_corruptions
    check_command           check_centera_sc!50!100
}

Replace generic-service with your Nagios service template. Replace generic-service-pnp with your Nagios service template that has performance data processing enabled.
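
The values after the exclamation marks in check_command, for example 35!40, are passed to the plugin as $ARG1$ and $ARG2$ and end up in its -w and -c options. How such thresholds typically map to Nagios states can be sketched as follows. This is not the logic of check_centera.pl; the helper name check_upper_threshold, the restriction to integer values and the "at or above" comparison are assumptions made for this example.

```shell
# Hypothetical helper: compare an integer value against upper warning and
# critical limits and report it in Nagios plugin style, including
# performance data (label=value;warn;crit).
check_upper_threshold() {
    local label="$1" value="$2" warn="$3" crit="$4" state="OK" rc=0
    if [ "$value" -ge "$crit" ]; then
        state="CRITICAL"; rc=2
    elif [ "$value" -ge "$warn" ]; then
        state="WARNING"; rc=1
    fi
    echo "$state - $label is $value | $label=$value;$warn;$crit"
    return "$rc"
}

# Example with the case temperature limits from the service definition:
# check_upper_threshold caseTemp 36 35 40
```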

Define hosts in your Nagios configuration for each IP address of the Centera system. In this example they are named centera1-pri and centera1-sec:
```
define host {
    use         disk
    host_name   centera1-pri
    alias       Centera 1 (Primary)
    address     10.0.0.1
    parents     parent_lan
}
define host {
    use         disk
    host_name   centera1-sec
    alias       Centera 1 (Secondary)
    address     10.0.0.2
    parents     parent_lan
}
```
Replace disk with your Nagios host template for storage devices. Adjust the address and parents parameters according to your environment.

Define a host in your Nagios configuration for each Centera system. In this example it is named centera1:
```
define host {
    use         disk
    host_name   centera1
    alias       Centera 1
    address     10.0.0.1
    parents     centera1-pri, centera1-sec
}
```
Replace disk with your Nagios host template for storage devices. Adjust the address parameter to the IP of the previously defined host centera1-pri and set the two previously defined hosts centera1-pri and centera1-sec as parents.

The three-host configuration is necessary because the services are checked against the main system centera1, but the main system can be reached via both the primary and the secondary IP address. Also, the optional SNMP traps originate from either the primary or the secondary address, so there must be a host entity for each to check the SNMP traps against.

Define a hostgroup in your Nagios configuration for all Centera systems. In this example it is named centera and contains the host centera1. The above checks are run against each member of the hostgroup:
```
define hostgroup {
    hostgroup_name  centera
    alias           EMC Centera Devices
    members         centera1
}
```
Define a second hostgroup in your Nagios configuration for the primary and secondary IP addresses of all Centera systems. In this example it is named centera-node and contains the hosts centera1-pri and centera1-sec:
```
define hostgroup {
    hostgroup_name  centera-node
    alias           EMC Centera Devices
    members         centera1-pri, centera1-sec
}
```

Run a configuration check and if successful reload the Nagios process:

$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload

The new hosts and services should soon show up in the Nagios web interface.

If the optional step above for sending SNMP traps from the Centera to the Nagios system was done, SNMPTT also needs to be configured to understand the incoming SNMP traps from Centera systems. This can be achieved with the following steps:

Convert the EMC Centera SNMP MIB definitions in emc-centera.mib into a format that SNMPTT can understand.

$ /opt/snmptt/snmpttconvertmib --in=MIB/emc-centera.mib --out=/opt/snmptt/conf/snmptt.conf.emc-centera

...
Done

Total translations:        2
Successful translations:   2
Failed translations:       0

Edit the trap severity according to your requirements, e.g.:

$ vim /opt/snmptt/conf/snmptt.conf.emc-centera

...
EVENT trapNotification .1.3.6.1.4.1.1139.5.0.1 "Status Events" Normal
...
EVENT trapAlarmNotification .1.3.6.1.4.1.1139.5.0.2 "Status Events" Critical
...

Add the new configuration file to the global SNMPTT configuration and reload the SNMPTT daemon:

$ vim /opt/snmptt/snmptt.ini

...
[TrapFiles]
snmptt_conf_files = <<END
...
/opt/snmptt/conf/snmptt.conf.emc-centera
...
END

$ /etc/init.d/snmptt reload

Download the Nagios plugins check_snmp_traps.sh and check_snmp_traps_heartbeat.sh and place them in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:

$ mv -i check_snmp_traps.sh check_snmp_traps_heartbeat.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_snmp_traps.sh
$ chmod 755 /usr/lib/nagios/plugins/check_snmp_traps_heartbeat.sh

Define the following Nagios command to check for SNMP traps in the SNMPTT database. In this example this is done in the file /etc/nagios-plugins/config/check_snmp_traps.cfg:

# check for snmp traps
define command {
    command_name    check_snmp_traps
    command_line    $USER1$/check_snmp_traps.sh -H $HOSTNAME$:$HOSTADDRESS$ -u <user> -p <pass> -d <snmptt_db>
}
# check for EMC Centera heartbeat snmp traps
define command {
    command_name    check_snmp_traps_heartbeat
    command_line    $USER1$/check_snmp_traps_heartbeat.sh -H $HOSTNAME$ -S $ARG1$ -u <user> -p <pass> -d <snmptt_db>
}

Replace <user>, <pass> and <snmptt_db> with values suitable for your SNMPTT database environment.

Add another service in your Nagios configuration to be checked for each Centera system:

# check heartbeat snmptraps
define service {
    use                     generic-service
    hostgroup_name          centera
    service_description     Check_SNMP_heartbeat_traps
    check_command           check_snmp_traps_heartbeat!3630
}

Add another service in your Nagios configuration to be checked for each IP address of the Centera system:

# check snmptraps
define service {
    use                     generic-service
    hostgroup_name          centera-node
    service_description     Check_SNMP_traps
    check_command           check_snmp_traps
}

Optional: Define a serviceextinfo to display a folder icon next to the Check_SNMP_traps service check for each Centera system. This icon provides a direct link to the SNMPTT web interface with a filter for the selected host:
```
define  serviceextinfo {
    hostgroup_name          centera-node
    service_description     Check_SNMP_traps
    notes                   SNMP Alerts
    #notes_url               http://<hostname>/nagios3/nagtrap/index.php?hostname=$HOSTNAME$
    #notes_url               http://<hostname>/nagios3/nsti/index.php?perpage=100&hostname=$HOSTNAME$
}
```
Uncomment the notes_url depending on which web interface (nagtrap or nsti) is used. Replace hostname with the FQDN or IP address of the server running the web interface.

Run a configuration check and if successful reload the Nagios process:

$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload

Optional: If you're running PNP4Nagios v0.6 or later to graph Nagios performance data, you can use the PNP4Nagios templates check_centera_capacity.php, check_centera_casetemp.php, check_centera_cputemp.php and check_centera_objects.php to beautify the graphs. Download the templates and place them in the PNP4Nagios template directory, in this example /usr/share/pnp4nagios/html/templates/:
```
$ mv -i check_centera_capacity.php check_centera_casetemp.php check_centera_cputemp.php \
    check_centera_objects.php /usr/share/pnp4nagios/html/templates/
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_centera_capacity.php
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_centera_casetemp.php
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_centera_cputemp.php
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_centera_objects.php
```
[Image: example of the PNP4Nagios graphs for an EMC Centera system]

All done. You should now have a complete Nagios-based monitoring solution for your EMC Centera system.

A note on the side: The result files and their XML-formatted content are definitely worth a look. Besides the rudimentary information extracted for the Nagios health checks described above, there is an abundance of other promising information in there.
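
As a small illustration of reading the result file, the following sketch extracts the creationDateTime attribute from the report header shown in the test run above. It assumes that the value is the number of milliseconds since the Unix epoch, which fits the example value but is not confirmed here. The helper name report_created_epoch is hypothetical.

```shell
# Hypothetical helper: print the creationDateTime attribute of the report,
# converted from (assumed) milliseconds to whole seconds since the epoch.
report_created_epoch() {
    local file="$1" ms
    ms=$(sed -n 's/.*creationDateTime="\([0-9][0-9]*\)".*/\1/p' "$file" | head -n 1)
    [ -n "$ms" ] || return 1      # attribute not found
    echo $(( ms / 1000 ))
}

# Example: show the creation time in readable form (GNU date)
# date -d "@$(report_created_epoch /tmp/centera1.xml)"
```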

2013-11-21 written by Frank Fegert

2012-10-02 // Nagios Monitoring - EMC Clariion

Some time ago I wrote a rather crude Nagios plugin to monitor EMC Clariion storage arrays, specifically the CX4-120 model. The plugin isn't very pretty, but it'll do in a pinch. In order to run it, you need to have the command line tools navicli and naviseccli, which are provided by EMC, installed on the Nagios system and SNMP activated on the SPs of the array. A network connection from the Nagios system to the Clariion device on ports TCP/6389 and UDP/161 must be allowed.

Since the Nagios server in my setup runs on Debian/PPC on an IBM Power LPAR and there is no native Linux/PPC version of navicli or naviseccli, I had to run those tools through the IBM PowerVM LX86 emulator. If the plugin is run in an x86 environment, the variable RUNX86 has to be set to an empty value.

The whole setup looks like this:

Enable SNMP queries on each SP of the Clariion devices. Log in to Navisphere and for each Clariion device navigate to:

-> Storage tab
   -> Storage Domains
      -> Local Domain
         -> <Name of the Clariion device>
            -> SP A
               -> Properties
                  -> Network tab
                     -> Check box "Enable / Disable processing of SNMP MIB read requests"
                        -> Apply or OK
            -> SP B
               -> Properties
                  -> Network tab
                     -> Check box "Enable / Disable processing of SNMP MIB read requests"
                        -> Apply or OK

Verify the port UDP/161 on the Clariion device can be reached from the Nagios system.

Optional: Enable SNMP traps to be sent to the Nagios system on each SP of the Clariion devices. This requires SNMPD and SNMPTT to already be set up on the Nagios system. Log in to Navisphere and for each Clariion device navigate to:

-> Monitors tab
   -> Monitor
      -> Templates
         -> Create New Template Based On ...
            -> Call_Home_Template_6.28.5
               -> General tab
                  -> <Select the events you're interested in>
               -> SNMP tab
                  -> <Enter IP of the Nagios system and the SNMPDs community string>
               -> Apply or OK
      -> SP A
         -> <Name of the Clariion device>
            -> Monitor Using Template ...
               -> <Template name>
                  -> OK
      -> SP B
         -> <Name of the Clariion device>
            -> Monitor Using Template ...
               -> <Template name>
                  -> OK

Verify the port UDP/162 on the Nagios system can be reached from the Clariion devices.

Install navicli or naviseccli on the Nagios system, in this example /opt/Navisphere/bin/navicli and /opt/Navisphere/bin/naviseccli. Verify the port TCP/6389 on the Clariion device can be reached from the Nagios system.

Download the Nagios plugin check_cx.sh and place it in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:
```
$ mv -i check_cx.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_cx.sh
```

Adjust the plugin settings according to your environment. Edit the following variable assignments:

NAVICLI=/opt/Navisphere/bin/navicli
NAVISECCLI=/opt/Navisphere/bin/naviseccli
NAVIUSER="adminuser"
NAVIPASS="adminpassword"
RUNX86=/usr/local/bin/runx86
SNMPGETNEXT_ARGS="-On -v 1 -c public -t 15"

Define the following Nagios commands. In this example this is done in the file /etc/nagios-plugins/config/check_cx.cfg:

# check CX4 status
define command {
        command_name    check_cx_status
        command_line    $USER1$/check_cx.sh -H $HOSTNAME$ -C snmp
}
define command {
        command_name    check_cx_cache
        command_line    $USER1$/check_cx.sh -H $HOSTNAME$ -C cache
}
define command {
        command_name    check_cx_disk
        command_line    $USER1$/check_cx.sh -H $HOSTNAME$ -C disk
}
define command {
        command_name    check_cx_faults
        command_line    $USER1$/check_cx.sh -H $HOSTNAME$ -C faults
}
define command {
        command_name    check_cx_sp
        command_line    $USER1$/check_cx.sh -H $HOSTNAME$ -C sp
}

Verify that a generic check command for a running SNMPD is already present in your Nagios configuration. If not, add a new check command like this:

define command {
        command_name    check_snmpd
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.1.3.0 -P 1 -C public -t 30
}

Define a group of services in your Nagios configuration to be checked for each Clariion device:

# check snmpd
define service {
    use                     generic-service
    hostgroup_name          cx4-ctrl
    service_description     Check_SNMPD
    check_command           check_snmpd
}
# check CX status
define service {
    use                     generic-service
    hostgroup_name          cx4-ctrl
    service_description     Check_CX_status
    check_command           check_cx_status
}
# check CX cache
define service {
    use                     generic-service
    hostgroup_name          cx4-ctrl
    service_description     Check_CX_cache
    check_command           check_cx_cache
}
# check CX disk
define service {
    use                     generic-service
    hostgroup_name          cx4-ctrl
    service_description     Check_CX_disk
    check_command           check_cx_disk
}
# check CX faults
define service {
    use                     generic-service
    hostgroup_name          cx4-ctrl
    service_description     Check_CX_faults
    check_command           check_cx_faults
}
# check CX sp
define service {
    use                     generic-service
    hostgroup_name          cx4-ctrl
    service_description     Check_CX_sp
    check_command           check_cx_sp
}

Replace generic-service with your Nagios service template.

Define a service dependency to run the check Check_CX_status only if the Check_SNMPD was run successfully:

# CX4 SNMPD dependencies
define servicedependency {
    hostgroup_name                  cx4-ctrl
    service_description             Check_SNMPD
    dependent_service_description   Check_CX_status
    execution_failure_criteria      c,p,u,w
    notification_failure_criteria   c,p,u,w
}

Define hosts in your Nagios configuration for each SP in the Clariion device. In this example they are named cx1-spa and cx1-spb:

define host {
    use         disk
    host_name   cx1-spa
    alias       CX4-120 Disk Storage
    address     10.0.0.1
    parents     parent_lan
}
define host {
    use         disk
    host_name   cx1-spb
    alias       CX4-120 Disk Storage
    address     10.0.0.2
    parents     parent_lan
}

Replace disk with your Nagios host template for storage devices. Adjust the address and parents parameters according to your environment.

Define a hostgroup in your Nagios configuration for all Clariion devices. In this example it is named cx4-ctrl. The above checks are run against each member of the hostgroup:
```
define hostgroup {
    hostgroup_name  cx4-ctrl
    alias           CX4 Disk Storages
    members         cx1-spa, cx1-spb
}
```

Run a configuration check and if successful reload the Nagios process:

$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload

The new hosts and services should soon show up in the Nagios web interface.

If the optional step above for sending SNMP traps from the Clariion devices to the Nagios system was done, SNMPTT also needs to be configured to understand the incoming SNMP traps from Clariion devices. This can be achieved with the following steps:

Convert the EMC Clariion SNMP MIB definitions in emc-clariion.mib into a format that SNMPTT can understand.

$ /opt/snmptt/snmpttconvertmib --in=MIB/emc-clariion.mib --out=/opt/snmptt/conf/snmptt.conf.emc-clariion

...
Done

Total translations:        5
Successful translations:   5
Failed translations:       0

Edit the trap severity according to your requirements, e.g.:

$ vim /opt/snmptt/conf/snmptt.conf.emc-clariion

...
EVENT EventMonitorTrapWarn .1.3.6.1.4.1.1981.0.4 "Status Events" Warning
...
EVENT EventMonitorTrapFault .1.3.6.1.4.1.1981.0.6 "Status Events" Critical
...

Optional: Apply the following patch to the configuration to reduce the number of false positives:

snmptt.conf.emc-clariion

diff -u snmptt.conf.emc-clariion_1 snmptt.conf.emc-clariion
--- snmptt.conf.emc-clariion.orig   2012-10-02 19:04:15.000000000 +0200
+++ snmptt.conf.emc-clariion        2009-07-21 10:28:44.000000000 +0200
@@ -54,8 +54,31 @@
 #
 #
 #
+EVENT EventMonitorTrapError .1.3.6.1.4.1.1981.0.5 "Status Events" Major
+FORMAT An Error EventMonitorTrap is generated in $*
+MATCH MODE=and
+MATCH $*: !(( Power [AB] : Faulted|Disk Array Enclosure .Bus [0-9] Enclosure [0-9]. is faulted))
+MATCH $X: !(0(2:5|3:0|3:1)[0-9]:[0-9][0-9])
+SDESC
+An Error EventMonitorTrap is generated in
+response to a user-specified event.
+Details can be found in Variables data.
+Variables:
+  1: hostName
+  2: deviceID
+  3: eventID
+  4: eventText
+  5: storageSystem
+EDESC
+#
+# Filter and ignore the following events
+#   02:50 - 03:15 Navisphere Power Supply Checks
+#
 EVENT EventMonitorTrapError .1.3.6.1.4.1.1981.0.5 "Status Events" Normal
 FORMAT An Error EventMonitorTrap is generated in $*
+MATCH MODE=and
+MATCH $*: (( Power [AB] : Faulted|Disk Array Enclosure .Bus [0-9] Enclosure [0-9]. is faulted))
+MATCH $X: (0(2:5|3:0|3:1)[0-9]:[0-9][0-9])
 SDESC
 An Error EventMonitorTrap is generated in
 response to a user-specified event.

The reason for this is that the Clariion performs a power supply check every Friday around 3:00 am. This triggers an SNMP trap, even if the power supplies check out fine. In my opinion this behaviour is defective, but a case opened on this issue showed that EMC tends to think otherwise. Since there was very little hope of EMC coming to at least some sense, I simply applied the above patch to the SNMPTT configuration file. It lowers the severity of all “Major” traps that are power supply related and sent around 3:00 am to “Normal”. All other “Major” traps keep their original severity.
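
The effect of the two MATCH conditions can be tried outside of SNMPTT with a small shell sketch. The regular expressions are copied from the patch, except that the time expression is anchored here. The function name classify_trap and the sample messages in the usage lines are made up for illustration. Note that the time expression covers 02:50:00 to 03:19:59.

```shell
# Hypothetical helper mirroring the MATCH logic of the patch: prints "Normal"
# for power supply related traps spooled between 02:50:00 and 03:19:59 and
# "Major" for everything else.
classify_trap() {
    local time="$1" text="$2"
    local msg_re=' Power [AB] : Faulted|Disk Array Enclosure .Bus [0-9] Enclosure [0-9]. is faulted'
    local time_re='^0(2:5|3:0|3:1)[0-9]:[0-9][0-9]$'
    if printf '%s\n' "$text" | grep -Eq "$msg_re" &&
       printf '%s\n' "$time" | grep -Eq "$time_re"; then
        echo "Normal"
    else
        echo "Major"
    fi
}

# Examples with made-up trap texts:
# classify_trap "03:01:12" "Bus 0 Enclosure 0 Power A : Faulted"
# classify_trap "14:30:00" "Bus 0 Enclosure 0 Power A : Faulted"
```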

Add the new configuration file to the global SNMPTT configuration and reload the SNMPTT daemon:

$ vim /opt/snmptt/snmptt.ini

...
[TrapFiles]
snmptt_conf_files = <<END
...
/opt/snmptt/conf/snmptt.conf.emc-clariion
...
END

$ /etc/init.d/snmptt reload

Download the Nagios plugin check_snmp_traps.sh and place it in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:
```
$ mv -i check_snmp_traps.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_snmp_traps.sh
```
Define the following Nagios command to check for SNMP traps in the SNMPTT database. In this example this is done in the file /etc/nagios-plugins/config/check_snmp_traps.cfg:
```
# check for snmp traps
define command{
    command_name    check_snmp_traps
    command_line    $USER1$/check_snmp_traps.sh -H $HOSTNAME$:$HOSTADDRESS$ -u <user> -p <pass> -d <snmptt_db>
}
```
Replace <user>, <pass> and <snmptt_db> with values suitable for your SNMPTT database environment.

Add another service in your Nagios configuration to be checked for each Clariion device:

# check snmptraps
define service {
    use                     generic-service
    hostgroup_name          cx4-ctrl
    service_description     Check_SNMP_traps
    check_command           check_snmp_traps
}

Optional: Define a serviceextinfo to display a folder icon next to the Check_SNMP_traps service check for each Clariion device. This icon provides a direct link to the SNMPTT web interface with a filter for the selected host:
```
define serviceextinfo {
    hostgroup_name          cx4-ctrl
    service_description     Check_SNMP_traps
    notes                   SNMP Alerts
    #notes_url               http://<hostname>/nagios3/nagtrap/index.php?hostname=$HOSTNAME$
    #notes_url               http://<hostname>/nagios3/nsti/index.php?perpage=100&hostname=$HOSTNAME$
}
```
Uncomment the notes_url depending on which web interface (nagtrap or nsti) is used. Replace hostname with the FQDN or IP address of the server running the web interface.

Run a configuration check and if successful reload the Nagios process:

$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload

All done. You should now have a complete Nagios-based monitoring solution for your EMC Clariion devices.

2012-10-02 written by Frank Fegert
