2012-10-23 // Nagios Monitoring - Rittal CMC-TC with LCP
We use Rittal LCP (Liquid Cooling Package) units to chill the 19" racks and equipment in the datacenters. The LCPs come with their own Rittal CMC-TC units for management and monitoring purposes. With check_rittal_health there is already a Nagios plugin to monitor Rittal CMC-TC units. Unfortunately this plugin didn't cover the LCPs, which come with a plethora of built-in sensors. The existing plugin also didn't allow setting individual monitoring thresholds. Therefore I modified the existing plugin to accommodate our needs. The modified version can be downloaded here: check_rittal_health.pl.
The whole setup for monitoring Rittal CMC-TC and LCP with Nagios looks like this:
Configure your CMC-TC unit; see the manual: CMC-TC Basic CMC DK 7320.111 - Montage, Installation und Bedienung (assembly, installation and operation). The essential settings are the network configuration, a user for SNMPv3 access and an SNMP trap receiver (your Nagios server running SNMPTT). Optional, but highly recommended, are configuring an NTP server, changing the default user passwords and disabling insecure services (Telnet, FTP, HTTP).
Download the Nagios plugin check_rittal_health.pl and place it in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:
$ mv -i check_rittal_health.pl /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_rittal_health.pl
Define the following Nagios command. In this example this is done in the file /etc/nagios-plugins/config/check_cmc.cfg:
# check Rittal CMC status
define command {
    command_name    check_cmc_status
    command_line    $USER1$/check_rittal_health.pl --hostname $HOSTNAME$ --protocol $ARG1$ --username $ARG2$ --authpassword $ARG3$ --customthresholds "$ARG4$"
}
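The $ARGn$ macros in this command definition are filled from the !-separated fields of a service's check_command. The following Python sketch mimics that substitution in simplified form to show which command line Nagios ends up running. The function name expand_command_line, the credentials monitor and secret, and the shortened threshold string are made up for illustration; real Nagios macro handling covers more cases (for example escaped \! characters) than this sketch does.

```python
import re


def expand_command_line(command_line, check_command, macros):
    """Fill $ARGn$ and other $NAME$ macros, in a simplified Nagios-like way."""
    # "name!a!b!c" -> command name plus positional arguments
    name, *args = check_command.split("!")
    values = dict(macros)
    for position, arg in enumerate(args, start=1):
        values[f"ARG{position}"] = arg
    # Replace every known $MACRO$; unknown macros are left untouched here
    expanded = re.sub(
        r"\$(\w+)\$",
        lambda match: values.get(match.group(1), match.group(0)),
        command_line,
    )
    return name, expanded


# Command line taken from the check_cmc_status definition above
COMMAND_LINE = (
    '$USER1$/check_rittal_health.pl --hostname $HOSTNAME$ --protocol $ARG1$ '
    '--username $ARG2$ --authpassword $ARG3$ --customthresholds "$ARG4$"'
)

# Hypothetical credentials and a shortened threshold string
name, line = expand_command_line(
    COMMAND_LINE,
    "check_cmc_status!3!monitor!secret!fan,450:2000,400:2500",
    {"USER1": "/usr/lib/nagios/plugins", "HOSTNAME": "cmc-host1"},
)
print(name)
print(line)
```

Running the printed command line by hand is a quick way to test the plugin against a device before wiring it into Nagios.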
Verify that a generic check command for a running SNMPD is already present in your Nagios configuration. If not, add a new check command like this:
define command {
    command_name    check_snmpdv3
    command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.1.3.0 -P 3 -t 30 -L $ARG1$ -U $ARG2$ -A $ARG3$
}
Verify that a generic check command for an SSH service is already present in your Nagios configuration. If not, add a new check command like this:
# 'check_ssh' command definition
define command {
    command_name    check_ssh
    command_line    /usr/lib/nagios/plugins/check_ssh -t 20 '$HOSTADDRESS$'
}
Verify that a generic check command for an HTTPS service is already present in your Nagios configuration. If not, add a new check command like this:
# 'check_https_port_uri' command definition
define command {
    command_name    check_https_port_uri
    command_line    /usr/lib/nagios/plugins/check_http --ssl -I '$HOSTADDRESS$' -p '$ARG1$' -u '$ARG2$'
}
Define a group of services in your Nagios configuration to be checked for each CMC-TC device:
# check host alive
define service {
    use                     generic-service-pnp
    hostgroup_name          cmc
    service_description     Check_host_alive
    check_command           check-host-alive
}
# check sshd
define service {
    use                     generic-service
    hostgroup_name          cmc
    service_description     Check_SSH
    check_command           check_ssh
}
# check snmpd
define service {
    use                     generic-service
    hostgroup_name          cmc
    service_description     Check_SNMPDv3
    check_command           check_snmpdv3!authNoPriv!<user>!<pass>
}
# check httpd
define service {
    use                     generic-service-pnp
    hostgroup_name          cmc
    service_description     Check_service_https
    check_command           check_https_port_uri!443!/
}
# check Rittal CMC status
define service {
    use                     generic-service-pnp
    servicegroups           snmpchecks
    hostgroup_name          cmc
    service_description     Check_CMC_Status
    check_command           check_cmc_status!3!<user>!<pass>!airTemp,15:32,10:35\;coolingCapacity,0:10000,0:15000\;events,0:1,0:2\;fan,450:2000,400:2500\;temp,15:30,10:35\;waterFlow,0:70,0:100\;waterTemp,10:25,5:30
}
Replace generic-service-pnp with your Nagios service template that has performance data processing enabled. Replace <user> and <pass> with the user credentials configured on the CMC-TC devices for SNMPv3 access. Adjust the sensor threshold settings according to your requirements; see the output of check_rittal_health.pl -h for an explanation of the threshold settings format.
Define a service dependency so that the check Check_CMC_Status only runs if Check_SNMPDv3 succeeded:
# Rittal CMC SNMPD dependencies
define servicedependency {
    hostgroup_name                  cmc
    service_description             Check_SNMPDv3
    dependent_service_description   Check_CMC_Status
    execution_failure_criteria      c,p,u,w
    notification_failure_criteria   c,p,u,w
}
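The --customthresholds value in the Check_CMC_Status service packs one entry per sensor type into a single string. The backslash before each semicolon is needed because Nagios treats an unescaped ; in object configuration files as the start of a comment. The Python sketch below shows one way to read such a string, assuming each entry has the form name,low:high,low:high with the first range meaning warning and the second critical, and that a reading outside a range triggers the corresponding state. Both assumptions are inferred from the example values, not taken from the plugin; the output of check_rittal_health.pl -h remains the authoritative description. The function names are hypothetical.

```python
def parse_custom_thresholds(spec):
    """Parse 'name,low:high,low:high;...' into a dict keyed by sensor type."""
    thresholds = {}
    # Drop the backslashes that only protect ';' inside Nagios config files
    for entry in spec.replace("\\;", ";").split(";"):
        entry = entry.strip()
        if not entry:
            continue
        name, warn, crit = entry.split(",")
        thresholds[name] = {
            # Assumption: first range is warning, second range is critical
            "warning": tuple(float(v) for v in warn.split(":")),
            "critical": tuple(float(v) for v in crit.split(":")),
        }
    return thresholds


def classify(value, limits):
    """Assumed semantics: a reading outside a range raises that state."""
    crit_low, crit_high = limits["critical"]
    warn_low, warn_high = limits["warning"]
    if not crit_low <= value <= crit_high:
        return "CRITICAL"
    if not warn_low <= value <= warn_high:
        return "WARNING"
    return "OK"


limits = parse_custom_thresholds(r"airTemp,15:32,10:35\;fan,450:2000,400:2500")
for sensor, reading in (("airTemp", 20), ("airTemp", 33), ("fan", 390)):
    print(sensor, reading, classify(reading, limits[sensor]))
```

Under these assumptions an air temperature of 33 would sit outside the warning range 15:32 but still inside the critical range 10:35.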
Define hosts in your Nagios configuration for each CMC-TC device. In this example it is named cmc-host1:
define host {
    use         cmc
    host_name   cmc-host1
    alias       Rittal CMC LCP
    address     10.0.0.1
    parents     parent_lan
}
Replace cmc with your Nagios host template for the CMC-TC devices. Adjust the address and parents parameters according to your environment.
Define a hostgroup in your Nagios configuration for all CMC-TC devices. In this example it is named cmc. The above checks are run against each member of the hostgroup:
define hostgroup {
    hostgroup_name  cmc
    alias           Rittal CMC
    members         cmc-host1
}
Run a configuration check and, if successful, reload the Nagios process:
$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload
The new hosts and services should soon show up in the Nagios web interface.
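The check-then-reload sequence can also be scripted so that a broken configuration never reaches the running daemon. This is a minimal Python sketch, assuming the nagios3 paths used in this example and that the configuration check exits with a non-zero status on errors; the function name verify_and_reload and its run parameter are hypothetical.

```python
import subprocess


def verify_and_reload(run=subprocess.run):
    """Reload Nagios only when the configuration check exits with status 0.

    The 'run' parameter exists so the command runner can be swapped out,
    for example in a dry run or a test.
    """
    check = run(["/usr/sbin/nagios3", "-v", "/etc/nagios3/nagios.cfg"])
    if check.returncode != 0:
        # Leave the running configuration untouched
        return False
    reload_result = run(["/etc/init.d/nagios3", "reload"])
    return reload_result.returncode == 0
```

Calling verify_and_reload() without arguments runs the two commands from the step above; it returns False as soon as the configuration check fails, so the reload is never attempted in that case.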
If the Nagios server is running SNMPTT and was configured as an SNMP trap receiver in the first step above, SNMPTT also needs to be configured to understand the incoming SNMP traps from the CMC-TC devices. This can be achieved with the following steps:
Convert the Rittal SNMP MIB definitions in CMC-TC_MIB_v1.1h.txt into a format that SNMPTT can understand:
$ /opt/snmptt/snmpttconvertmib --in=MIB/CMC-TC_MIB_v1.1h.txt --out=/opt/snmptt/conf/snmptt.conf.rittal-cmc-tc
...
Done
Total translations: 10
Successful translations: 10
Failed translations: 0
The trap severity settings should be pretty reasonable by default, but you can edit them according to your requirements with:
$ vim /opt/snmptt/conf/snmptt.conf.rittal-cmc-tc
Add the new configuration file to the list of files included by the global SNMPTT configuration and reload the SNMPTT daemon:
$ vim /opt/snmptt/snmptt.ini
...
[TrapFiles]
snmptt_conf_files = <<END
...
/opt/snmptt/conf/snmptt.conf.rittal-cmc-tc
...
END
$ /etc/init.d/snmptt reload
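To confirm that the new file really is listed in snmptt.ini, the snmptt_conf_files here-document can be read back. The Python sketch below returns the listed paths. It is a simplified reader written for this layout only, the function name listed_conf_files is hypothetical, and the snmptt.conf.generic entry in the sample is a placeholder for whatever files are already configured.

```python
def listed_conf_files(ini_text):
    """Return the non-empty lines of the snmptt_conf_files here-document."""
    files = []
    inside = False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not inside:
            # The list starts at 'snmptt_conf_files = <<END'
            if line.replace(" ", "").startswith("snmptt_conf_files=<<END"):
                inside = True
        elif line == "END":
            break
        elif line:
            files.append(line)
    return files


# Sample content; the first path is a placeholder, not from the article
SAMPLE = """[TrapFiles]
snmptt_conf_files = <<END
/opt/snmptt/conf/snmptt.conf.generic
/opt/snmptt/conf/snmptt.conf.rittal-cmc-tc
END
"""

wanted = "/opt/snmptt/conf/snmptt.conf.rittal-cmc-tc"
print(wanted in listed_conf_files(SAMPLE))
```

On a real system the text would come from reading /opt/snmptt/snmptt.ini instead of the SAMPLE string.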
Download the Nagios plugin check_snmp_traps.sh and place it in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:
$ mv -i check_snmp_traps.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_snmp_traps.sh
Define the following Nagios command to check for SNMP traps in the SNMPTT database. In this example this is done in the file /etc/nagios-plugins/config/check_snmp_traps.cfg:
# check for snmp traps
define command {
    command_name    check_snmp_traps
    command_line    $USER1$/check_snmp_traps.sh -H $HOSTNAME$:$HOSTADDRESS$ -u <user> -p <pass> -d <snmptt_db>
}
Replace <user>, <pass> and <snmptt_db> with values suitable for your SNMPTT database environment.
Add another service in your Nagios configuration to be checked for each CMC device:
# check snmptraps
define service {
    use                     generic-service
    hostgroup_name          cmc
    service_description     Check_SNMP_traps
    check_command           check_snmp_traps
}
Optional: Define a serviceextinfo to display a folder icon next to the Check_SNMP_traps service check for each CMC device. This icon provides a direct link to the SNMPTT web interface with a filter for the selected host:
define serviceextinfo {
    hostgroup_name          cmc
    service_description     Check_SNMP_traps
    notes                   SNMP Alerts
    #notes_url              http://<hostname>/nagios3/nagtrap/index.php?hostname=$HOSTNAME$
    #notes_url              http://<hostname>/nagios3/nsti/index.php?perpage=100&hostname=$HOSTNAME$
}
Uncomment the notes_url line that matches the web interface in use (nagtrap or nsti). Replace <hostname> with the FQDN or IP address of the server running the web interface.
Run a configuration check and, if successful, reload the Nagios process:
$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload
Optional: If you're running PNP4Nagios v0.6 or later to graph Nagios performance data, you can use the check_cmc_status.php PNP4Nagios template to beautify the graphs. Download the PNP4Nagios template check_cmc_status.php and place it in the PNP4Nagios template directory, in this example /usr/share/pnp4nagios/html/templates/:
$ mv -i check_cmc_status.php /usr/share/pnp4nagios/html/templates/
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_cmc_status.php
The following image shows an example of what the PNP4Nagios graphs look like for a Rittal CMC-TC with a LCP-T3+ unit:
All done, you should now have a complete Nagios-based monitoring solution for your Rittal CMC-TC and LCP devices.