====== Nagios Monitoring - EMC Clariion ======
Some time ago I wrote a rather crude Nagios plugin to monitor EMC Clariion storage arrays, specifically the CX4-120 model. The plugin isn't very pretty, but it'll do in a pinch ;-) To run it, the command line tools ''navicli'' and ''naviseccli'', which are provided by EMC, must be installed on the Nagios system, and SNMP must be activated on the SPs of the array. A network connection from the Nagios system to the Clariion device on ports TCP/6389 and UDP/161 must be allowed.
Since the Nagios server in my setup runs on Debian/PPC on an IBM Power LPAR and there is no native Linux/PPC version of ''navicli'' or ''naviseccli'', I had to run those tools through the [[wp>PowerVM_Lx86|IBM PowerVM LX86]] emulator. If the plugin is run in an x86 environment, the variable ''RUNX86'' has to be set to an empty value.
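The effect of ''RUNX86'' can be pictured roughly as follows. This is only a sketch and not taken from ''check_cx.sh'' itself: the function name ''run_navi'' is made up for this example, ''echo'' stands in for the real ''naviseccli'' binary, and ''env'' stands in for the emulator so that the snippet runs on any system.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual code of check_cx.sh.
# RUNX86 holds the path of the emulator wrapper, or is empty on native x86.
RUNX86=""

# Run a command, prefixed with the emulator only if RUNX86 is non-empty.
run_navi() {
    if [ -n "$RUNX86" ]; then
        "$RUNX86" "$@"
    else
        "$@"
    fi
}

# 'echo' stands in for /opt/Navisphere/bin/naviseccli here.
run_navi echo "native call"

# Simulate the emulator with 'env', which simply runs its arguments.
RUNX86="env"
run_navi echo "emulated call"
```

With an empty ''RUNX86'' the tool is called directly, otherwise the call is handed to whatever wrapper the variable points to.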
The whole setup looks like this:
- Enable SNMP queries on both SPs of each Clariion device. Log in to NaviSphere and for each Clariion device navigate to:
-> Storage tab
-> Storage Domains
-> Local Domain
->
-> SP A
-> Properties
-> Network tab
-> Check box "Enable / Disable processing of SNMP MIB read requests"
-> Apply or OK
-> SP B
-> Properties
-> Network tab
-> Check box "Enable / Disable processing of SNMP MIB read requests"
-> Apply or OK
Verify that port **UDP/161** on the Clariion device can be reached from the Nagios system.
- **Optional**: Enable SNMP traps to be sent to the Nagios system on both SPs of each Clariion device. This requires SNMPD and SNMPTT to be already set up on the Nagios system. Log in to NaviSphere and for each Clariion device navigate to:
-> Monitors tab
-> Monitor
-> Templates
-> Create New Template Based On ...
-> Call_Home_Template_6.28.5
-> General tab
->
Verify that port **UDP/162** on the Nagios system can be reached from the Clariion devices.
- Install ''navicli'' or ''naviseccli'' on the Nagios system, in this example ''/opt/Navisphere/bin/navicli'' and ''/opt/Navisphere/bin/naviseccli''. Verify that port **TCP/6389** on the Clariion device can be reached from the Nagios system.
- Download the {{:2012:10:02:check_cx.sh|Nagios plugin check_cx.sh}} and place it in the plugins directory of your Nagios system, in this example ''/usr/lib/nagios/plugins/'':
$ mv -i check_cx.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_cx.sh
- Adjust the plugin settings according to your environment. Edit the following variable assignments:
NAVICLI=/opt/Navisphere/bin/navicli
NAVISECCLI=/opt/Navisphere/bin/naviseccli
NAVIUSER="adminuser"
NAVIPASS="adminpassword"
RUNX86=/usr/local/bin/runx86
SNMPGETNEXT_ARGS="-On -v 1 -c public -t 15"
- Define the following Nagios commands. In this example this is done in the file ''/etc/nagios-plugins/config/check_cx.cfg'':
# check CX4 status
define command {
command_name check_cx_status
command_line $USER1$/check_cx.sh -H $HOSTNAME$ -C snmp
}
define command {
command_name check_cx_cache
command_line $USER1$/check_cx.sh -H $HOSTNAME$ -C cache
}
define command {
command_name check_cx_disk
command_line $USER1$/check_cx.sh -H $HOSTNAME$ -C disk
}
define command {
command_name check_cx_faults
command_line $USER1$/check_cx.sh -H $HOSTNAME$ -C faults
}
define command {
command_name check_cx_sp
command_line $USER1$/check_cx.sh -H $HOSTNAME$ -C sp
}
- Verify that a generic check command for a running SNMPD is already present in your Nagios configuration. If not, add a new check command like this:
define command {
command_name check_snmpd
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.1.3.0 -P 1 -C public -t 30
}
- Define a group of services in your Nagios configuration to be checked for each Clariion device:
# check snmpd
define service {
use generic-service
hostgroup_name cx4-ctrl
service_description Check_SNMPD
check_command check_snmpd
}
# check CX status
define service {
use generic-service
hostgroup_name cx4-ctrl
service_description Check_CX_status
check_command check_cx_status
}
# check CX cache
define service {
use generic-service
hostgroup_name cx4-ctrl
service_description Check_CX_cache
check_command check_cx_cache
}
# check CX disk
define service {
use generic-service
hostgroup_name cx4-ctrl
service_description Check_CX_disk
check_command check_cx_disk
}
# check CX faults
define service {
use generic-service
hostgroup_name cx4-ctrl
service_description Check_CX_faults
check_command check_cx_faults
}
# check CX sp
define service {
use generic-service
hostgroup_name cx4-ctrl
service_description Check_CX_sp
check_command check_cx_sp
}
Replace ''generic-service'' with your Nagios service template.
- Define a service dependency so that the check ''Check_CX_status'' is only run if ''Check_SNMPD'' succeeded:
# CX4 SNMPD dependencies
define servicedependency {
hostgroup_name cx4-ctrl
service_description Check_SNMPD
dependent_service_description Check_CX_status
execution_failure_criteria c,p,u,w
notification_failure_criteria c,p,u,w
}
- Define hosts in your Nagios configuration for each SP in the Clariion device. In this example they are named ''cx1-spa'' and ''cx1-spb'':
define host {
use disk
host_name cx1-spa
alias CX4-120 Disk Storage
address 10.0.0.1
parents parent_lan
}
define host {
use disk
host_name cx1-spb
alias CX4-120 Disk Storage
address 10.0.0.2
parents parent_lan
}
Replace ''disk'' with your Nagios host template for storage devices. Adjust the ''address'' and ''parents'' parameters according to your environment.
- Define a hostgroup in your Nagios configuration for all Clariion devices. In this example it is named ''cx4-ctrl''. The above checks are run against each member of the hostgroup:
define hostgroup {
hostgroup_name cx4-ctrl
alias CX4 Disk Storages
members cx1-spa, cx1-spb
}
- Run a configuration check and if successful reload the Nagios process:
$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload
The new hosts and services should soon show up in the Nagios web interface.
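If a service does not show the expected state, it can help to run the plugin by hand on the Nagios system and look at its exit code. Nagios plugins report their state through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The following is a small sketch for such manual tests. The function names ''nagios_state'' and ''run_check'' are made up for this example, and the commented call assumes the example paths and host names used in the steps above.

```shell
#!/bin/bash
# Map a plugin exit code to the Nagios state name.
# Hypothetical helper for manual testing, not part of check_cx.sh.
# Anything outside 0-2 is reported as UNKNOWN by this helper.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run any check command and print the resulting state plus the plugin output.
run_check() {
    local output rc
    output=$("$@" 2>&1)
    rc=$?
    echo "$(nagios_state "$rc"): $output"
    return "$rc"
}

# Example against a real array (paths and host name as in the steps above):
#   run_check /usr/lib/nagios/plugins/check_cx.sh -H cx1-spa -C faults
# Stand-in that runs anywhere: a command that prints a message and exits with 2.
run_check sh -c 'echo "simulated fault"; exit 2' || true
```

Running the check as the same user the Nagios daemon uses tends to surface permission or path problems with ''navicli'' and ''naviseccli'' early.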
If the optional step number 2 in the above list was done, SNMPTT also needs to be configured so that it understands the incoming SNMP traps from the Clariion devices. This can be achieved by the following steps:
- Convert the EMC Clariion SNMP MIB definitions in ''emc-clariion.mib'' into a format that SNMPTT can understand:
$ /opt/snmptt/snmpttconvertmib --in=MIB/emc-clariion.mib --out=/opt/snmptt/conf/snmptt.conf.emc-clariion
...
Done
Total translations: 5
Successful translations: 5
Failed translations: 0
- Edit the trap severity according to your requirements, e.g.:
$ vim /opt/snmptt/conf/snmptt.conf.emc-clariion
...
EVENT EventMonitorTrapWarn .1.3.6.1.4.1.1981.0.4 "Status Events" Warning
...
EVENT EventMonitorTrapFault .1.3.6.1.4.1.1981.0.6 "Status Events" Critical
...
- **Optional**: Apply the following patch to the configuration to reduce the number of false positives:
diff -u snmptt.conf.emc-clariion_1 snmptt.conf.emc-clariion
--- snmptt.conf.emc-clariion.orig 2012-10-02 19:04:15.000000000 +0200
+++ snmptt.conf.emc-clariion 2009-07-21 10:28:44.000000000 +0200
@@ -54,8 +54,31 @@
#
#
#
+EVENT EventMonitorTrapError .1.3.6.1.4.1.1981.0.5 "Status Events" Major
+FORMAT An Error EventMonitorTrap is generated in $*
+MATCH MODE=and
+MATCH $*: !(( Power [AB] : Faulted|Disk Array Enclosure .Bus [0-9] Enclosure [0-9]. is faulted))
+MATCH $X: !(0(2:5|3:0|3:1)[0-9]:[0-9][0-9])
+SDESC
+An Error EventMonitorTrap is generated in
+response to a user-specified event.
+Details can be found in Variables data.
+Variables:
+ 1: hostName
+ 2: deviceID
+ 3: eventID
+ 4: eventText
+ 5: storageSystem
+EDESC
+#
+# Filter and ignore the following events
+# 02:50 - 03:15 Navisphere Power Supply Checks
+#
EVENT EventMonitorTrapError .1.3.6.1.4.1.1981.0.5 "Status Events" Normal
FORMAT An Error EventMonitorTrap is generated in $*
+MATCH MODE=and
+MATCH $*: (( Power [AB] : Faulted|Disk Array Enclosure .Bus [0-9] Enclosure [0-9]. is faulted))
+MATCH $X: (0(2:5|3:0|3:1)[0-9]:[0-9][0-9])
SDESC
An Error EventMonitorTrap is generated in
response to a user-specified event.
The reason for this is that the Clariion performs a power supply check every Friday around 3:00 am. This triggers an SNMP trap, even if the power supplies check out fine. In my opinion this behaviour is defective, but a case opened on this issue showed that EMC tends to think otherwise. Since there was very little hope for EMC to come to at least some sense, I just applied the above patch to the SNMPTT configuration file. It lowers the severity of all "Major" traps that are power supply related and sent around 3:00 am to "Normal". All other "Major" traps keep their original severity.
- Add the new configuration file to be included in the global SNMPTT configuration and restart the SNMPTT daemon:
$ vim /opt/snmptt/snmptt.ini
...
[TrapFiles]
snmptt_conf_files = < <
- Download the {{:2012:10:02:check_snmp_traps.sh|Nagios plugin check_snmp_traps.sh}} and place it in the plugins directory of your Nagios system, in this example ''/usr/lib/nagios/plugins/'':
$ mv -i check_snmp_traps.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_snmp_traps.sh
- Define the following Nagios command to check for SNMP traps in the SNMPTT database. In this example this is done in the file ''/etc/nagios-plugins/config/check_snmp_traps.cfg'':
# check for snmp traps
define command{
command_name check_snmp_traps
command_line $USER1$/check_snmp_traps.sh -H $HOSTNAME$:$HOSTADDRESS$ -u <user> -p <pass> -d <snmptt_db>
}
Replace ''user'', ''pass'' and ''snmptt_db'' with values suitable for your SNMPTT database environment.
- Add another service in your Nagios configuration to be checked for each Clariion device:
# check snmptraps
define service {
use generic-service
hostgroup_name cx4-ctrl
service_description Check_SNMP_traps
check_command check_snmp_traps
}
- **Optional**: Define a serviceextinfo to display a folder icon next to the ''Check_SNMP_traps'' service check for each Clariion device. This icon provides a direct link to the SNMPTT web interface with a filter for the selected host:
define serviceextinfo {
hostgroup_name cx4-ctrl
service_description Check_SNMP_traps
notes SNMP Alerts
#notes_url http://<hostname>/nagios3/nagtrap/index.php?hostname=$HOSTNAME$
#notes_url http://<hostname>/nagios3/nsti/index.php?perpage=100&hostname=$HOSTNAME$
}
Uncomment the ''notes_url'' depending on which web interface (nagtrap or nsti) is used. Replace ''hostname'' with the FQDN or IP address of the server running the web interface.
- Run a configuration check and if successful reload the Nagios process:
$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload
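To get a feeling for which traps the optional SNMPTT patch above actually downgrades, the time pattern from its ''MATCH $X'' lines can be tried out in the shell. This is only a sketch: it assumes that ''$X'' expands to a time string in the form ''HH:MM:SS'', the function name ''in_window'' is made up for this example, and bash's regex matching is used here in place of the Perl regex engine of SNMPTT.

```shell
#!/bin/bash
# Time-of-day pattern copied from the MATCH $X lines of the patch.
re='0(2:5|3:0|3:1)[0-9]:[0-9][0-9]'

# Print whether a HH:MM:SS timestamp falls into the filtered window.
in_window() {
    if [[ "$1" =~ $re ]]; then
        echo "$1 filtered"
    else
        echo "$1 kept"
    fi
}

# A few sample timestamps around the edges of the window.
for t in 02:49:59 02:50:00 03:00:12 03:19:59 03:20:00 15:05:00; do
    in_window "$t"
done
```

Under these assumptions the pattern covers 02:50:00 through 03:19:59, which is slightly wider than the "02:50 - 03:15" mentioned in the patch comment. It also only looks at the time of day, so a matching power supply trap would be downgraded on any weekday, not just on Fridays.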
All done, you should now have a complete Nagios-based monitoring solution for your EMC Clariion devices.