Nagios Monitoring - Infortrend EonStor (Update)

After gaining some operational experience with the previously introduced Nagios plugin for Infortrend EonStor and EonStor DS storage arrays, a few issues became apparent that needed to be addressed. The updated version of the Nagios plugin check_infortrend.sh fixes the following issues:

Newer array firmware versions use different severity strings for event log entries. The Nagios plugin has been adapted to be able to cope with these.
Due to a limitation in the array firmware, event log entries available via SNMP are not cleared when the corresponding event log entries are cleared via RAIDWatch or SANWatch. They are only cleared during a controller reboot. A support case with Infortrend confirmed this and it is currently under investigation whether this feature could be added for future releases of the array firmware.

In the meantime a workaround for this behaviour has been added to the Nagios plugin. To avoid constant alarms about events that have already been dealt with, the date of the event is now taken into consideration along with the already used severity of the event. By default only events within the past 60 minutes are evaluated. This time window can be overridden with the new “-t <minutes>” command line option of the Nagios plugin, e.g. “-t 120” evaluates events within the past 2 hours. You might need to adjust the definition of the check_infortrend_events command to suit your environment, e.g.:
```
# check Infortrend ESDS event status
define command {
    command_name    check_infortrend_events
    command_line    $USER1$/check_infortrend.sh -H $HOSTNAME$ -C events -t <minutes>
}
```
There are several pitfalls here though:
- Make sure the timeframe aligns with your Nagios configuration options max_check_attempts, normal_check_interval and retry_check_interval, so that non-normal events can actually trigger an alarm. Generally speaking, the cumulative time span resulting from these three options for your Check_IFT_Events service check must be smaller than the timeframe given in the definition of the check_infortrend_events command.
- Make sure you set the date and time on the array to the correct values or preferably use an SNTP server. Also, make sure the Nagios server and the array use the same timezone, preferably UTC in both cases.
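
The first pitfall comes down to a bit of arithmetic. The following is a minimal sketch, assuming that the worst-case delay before Nagios raises a hard alarm is roughly one normal check interval plus (max_check_attempts - 1) retry intervals, all given in minutes. This formula and the function names worst_case_minutes and window_is_sufficient are illustrative assumptions and not part of the plugin:

```shell
#!/bin/sh
# Sketch: verify that the -t window exceeds the time Nagios may need to
# raise a hard alarm. The formula is an assumption about how the three
# options add up; it expects intervals in minutes (interval_length=60).
worst_case_minutes() {
    max_check_attempts=$1
    normal_check_interval=$2
    retry_check_interval=$3
    echo $(( normal_check_interval + (max_check_attempts - 1) * retry_check_interval ))
}

# Succeeds if the window (first argument, in minutes) is larger than the
# worst-case delay computed from the remaining three arguments.
window_is_sufficient() {
    window=$1
    needed=$(worst_case_minutes "$2" "$3" "$4")
    [ "$needed" -lt "$window" ]
}

# Example: 3 attempts, checks every 10 minutes, retries every 5 minutes
if window_is_sufficient 60 3 10 5; then
    echo "OK: 60 minute window covers $(worst_case_minutes 3 10 5) minutes"
else
    echo "WARNING: window too small"
fi
```

With these example values the worst case is 20 minutes, so the default window of 60 minutes would be sufficient.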

Not exactly an issue with the Nagios plugin per se, but related to the monitoring of Infortrend storage arrays in general, is an adapted SNMPTT configuration. Since all SNMP traps generated from events of Infortrend arrays are indiscriminately reported with the same OID, a bit more extensive parsing of the SNMP traps needs to occur. In order to achieve this, the following entries should be used in the snmptt.conf.infortrend SNMPTT configuration file:

/opt/snmptt/conf/snmptt.conf.infortrend

EVENT iftEventText .1.3.6.1.4.1.1714.0.1 "Status Events" Critical
FORMAT The description of the event $*
MATCH $*: ( \[Alert Condition\])
MATCH $*: ( \[Critical\])
MATCH $*: ( \[Critical Error\])
MATCH $*: ( \[Error\])
SDESC
The description of the event
Variables:
EDESC
 
EVENT iftEventText .1.3.6.1.4.1.1714.0.1 "Status Events" Warning
FORMAT The description of the event $*
MATCH $*: ( \[Warning Condition\])
MATCH $*: ( \[Warning\])
SDESC
The description of the event
Variables:
EDESC
 
EVENT iftEventText .1.3.6.1.4.1.1714.0.1 "Status Events" Normal
FORMAT The description of the event $*
MATCH $*: ( \[Information\])
MATCH $*: ( \[Notification\])
SDESC
The description of the event
Variables:
EDESC

and the SNMPTT daemon should be restarted.
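
To illustrate the intended effect of the MATCH lines above, here is a small sketch that maps the bracketed severity tag in an event text to one of the three severity classes. It is not how SNMPTT works internally, the function name classify_event is made up for this example and the sample event texts are hypothetical:

```shell
#!/bin/sh
# Sketch: map the bracketed severity tag of an Infortrend event text to a
# severity class, mirroring the MATCH lines of the SNMPTT configuration.
# This is an illustration only, not SNMPTT code.
classify_event() {
    case $1 in
        *"[Alert Condition]"*|*"[Critical]"*|*"[Critical Error]"*|*"[Error]"*)
            echo Critical ;;
        *"[Warning Condition]"*|*"[Warning]"*)
            echo Warning ;;
        *"[Information]"*|*"[Notification]"*)
            echo Normal ;;
        *)
            # no known severity tag found
            echo Unknown ;;
    esac
}

# Hypothetical event texts, only the bracketed tag matters here
classify_event "[Warning] hypothetical event text"
classify_event "[Critical Error] hypothetical event text"
```

A helper like this can be handy to test which class a severity string from a newer firmware version would end up in before changing the SNMPTT configuration.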

If you happen to find any additional issues with monitoring your Infortrend systems with this Nagios plugin, please feel free to leave a comment or drop me a note via email.

2014-07-22 written by Frank Fegert

2014-01-12 // Nagios Monitoring - Infortrend EonStor

Please be sure to also read the update Nagios Monitoring - Infortrend EonStor (Update) to this blog post.

We use several Infortrend EonStor and EonStor DS storage arrays as low-cost, bulk storage units in our datacenters. With check_infortrend.pl, check_infortrend and check_ift_{dev|hdd|ld}.pl there are already several Nagios plugins to monitor Infortrend EonStor storage arrays. Since I wanted a low overhead, shell-based plugin with support for performance data, I decided to write my own version check_infortrend.sh. In order to run the Nagios plugin, you need to have SNMP activated on the Infortrend storage array, and a network connection from the Nagios system to the Infortrend device on port UDP/161 must be allowed.

The whole setup looks like this:

Enable SNMP queries on the Infortrend storage array. Login via Telnet or SSH and navigate to:

-> view and edit Configuration parameters
   -> Communication Parameters
      -> Network Protocol Support
         -> SNMP - Disabled
            -> Enable SNMP Protocol?
               -> Yes

Verify the port UDP/161 on the Infortrend device can be reached from the Nagios system.

Optional: Enable SNMP traps to be sent to the Nagios system on the Infortrend storage array. This requires SNMPD and SNMPTT to be already set up on the Nagios system. Verify the port UDP/162 on the Nagios system can be reached from the Infortrend device.
Download the Nagios plugin check_infortrend.sh and place it in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:
```
$ mv -i check_infortrend.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_infortrend.sh
```

Define the following Nagios commands. In this example this is done in the file /etc/nagios-plugins/config/check_infortrend.cfg:

# check Infortrend ESDS cache status
define command {
    command_name    check_infortrend_cache
    command_line    $USER1$/check_infortrend.sh -H $HOSTNAME$ -C cache
}
# check Infortrend ESDS controller status
define command {
    command_name    check_infortrend_controller
    command_line    $USER1$/check_infortrend.sh -H $HOSTNAME$ -C controller
}
# check Infortrend ESDS disk status
define command {
    command_name    check_infortrend_disk
    command_line    $USER1$/check_infortrend.sh -H $HOSTNAME$ -C disk
}
# check Infortrend ESDS logicaldrive status
define command {
    command_name    check_infortrend_logicaldrive
    command_line    $USER1$/check_infortrend.sh -H $HOSTNAME$ -C logicaldrive
}
# check Infortrend ESDS logicalunit status
define command {
    command_name    check_infortrend_logicalunit
    command_line    $USER1$/check_infortrend.sh -H $HOSTNAME$ -C logicalunit
}
# check Infortrend ESDS event status
define command {
    command_name    check_infortrend_events
    command_line    $USER1$/check_infortrend.sh -H $HOSTNAME$ -C events
}
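
Before wiring the commands into Nagios it can help to run each check once by hand. The following sketch assumes the plugin path and the host name esds1 used in this post. The variable names PLUGIN and IFT_HOST as well as the helper state_name are made up for this example. The mapping of exit codes to states follows the usual Nagios plugin convention:

```shell
#!/bin/sh
# Sketch: run every check type once and print the resulting Nagios state.
# PLUGIN and IFT_HOST are illustrative defaults, override them as needed.
PLUGIN=${PLUGIN:-/usr/lib/nagios/plugins/check_infortrend.sh}
IFT_HOST=${IFT_HOST:-esds1}

# Translate a plugin exit code into the conventional Nagios state name
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

if [ -x "$PLUGIN" ]; then
    for check in cache controller disk logicaldrive logicalunit events; do
        "$PLUGIN" -H "$IFT_HOST" -C "$check"
        rc=$?
        echo "-> $check: $(state_name "$rc")"
    done
else
    echo "plugin not found at $PLUGIN" >&2
fi
```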

Define a group of services in your Nagios configuration to be checked for each Infortrend system:

# check snmpd
define service {
    use                     generic-service
    hostgroup_name          infortrend
    service_description     Check_SNMPD
    check_command           check_snmpd
}
# check_infortrend_cache
define service {
    use                     generic-service-pnp
    hostgroup_name          infortrend
    service_description     Check_IFT_Cache
    check_command           check_infortrend_cache
}
# check_infortrend_controller
define service {
    use                     generic-service
    hostgroup_name          infortrend
    service_description     Check_IFT_Controller
    check_command           check_infortrend_controller
}
# check_infortrend_disk
define service {
    use                     generic-service-pnp
    hostgroup_name          infortrend
    service_description     Check_IFT_Disk
    check_command           check_infortrend_disk
}
# check_infortrend_logicaldrive
define service {
    use                     generic-service-pnp
    hostgroup_name          infortrend
    service_description     Check_IFT_LogicalDrive
    check_command           check_infortrend_logicaldrive
}
# check_infortrend_logicalunit
define service {
    use                     generic-service-pnp
    hostgroup_name          infortrend
    service_description     Check_IFT_LogicalUnit
    check_command           check_infortrend_logicalunit
}
# check_infortrend_events
define service {
    use                     generic-service
    hostgroup_name          infortrend
    service_description     Check_IFT_Events
    check_command           check_infortrend_events
}

Replace generic-service with your Nagios service template. Replace generic-service-pnp with your Nagios service template that has performance data processing enabled.

Define a service dependency to run the above checks only if the Check_SNMPD was run successfully:

# Infortrend SNMPD dependencies
define servicedependency {
    hostgroup_name                  infortrend
    service_description             Check_SNMPD
    dependent_service_description   Check_IFT_.*
    execution_failure_criteria      c,p,u,w
    notification_failure_criteria   c,p,u,w
}
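
Note that the pattern Check_IFT_.* in the dependency above is a regular expression. As far as I know, Nagios only expands such patterns if use_regexp_matching=1 is set in the main configuration file. The following sketch checks for that setting. The function name regexp_matching_enabled is made up and the default path is the one used in this post:

```shell
#!/bin/sh
# Sketch: report whether regular expression matching is enabled in the
# main Nagios configuration file. The default path is an assumption taken
# from the examples in this post.
regexp_matching_enabled() {
    cfg=${1:-/etc/nagios3/nagios.cfg}
    grep -Eq '^[[:space:]]*use_regexp_matching[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$cfg" 2>/dev/null
}

if regexp_matching_enabled; then
    echo "use_regexp_matching is enabled"
else
    echo "use_regexp_matching=1 not found, Check_IFT_.* may not be expanded"
fi
```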

Define hosts in your Nagios configuration for each Infortrend device. In this example it is named esds1:
```
define host {
    use         disk
    host_name   esds1
    alias       Infortrend Disk Storage 1
    address     10.0.0.1
    parents     parent_lan
}
```
Replace disk with your Nagios host template for storage devices. Adjust the address and parents parameters according to your environment.
Define a hostgroup in your Nagios configuration for all Infortrend devices. In this example it is named infortrend. The above checks are run against each member of the hostgroup:
```
define hostgroup {
    hostgroup_name  infortrend
    alias           Infortrend Disk Storages
    members         esds1
}
```

Run a configuration check and if successful reload the Nagios process:

$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload

The new hosts and services should soon show up in the Nagios web interface.
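
The check-and-reload step above can also be chained, so that a reload only happens after a successful configuration check. A minimal sketch, with the hypothetical helper reload_if_valid taking both commands as arguments so it can be tried out without touching a running Nagios:

```shell
#!/bin/sh
# Sketch: run the reload command only if the verification command succeeds.
# Both commands are passed as strings, nothing here is Nagios specific.
reload_if_valid() {
    verify_cmd=$1
    reload_cmd=$2
    if sh -c "$verify_cmd"; then
        sh -c "$reload_cmd"
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}

# Usage with the paths from this post:
# reload_if_valid "/usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg" "/etc/init.d/nagios3 reload"
```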

If the optional step in the above list (sending SNMP traps to the Nagios system) was done, SNMPTT also needs to be configured to be able to understand the incoming SNMP traps from Infortrend devices. This can be achieved by the following steps:

Request a current version of the Infortrend SNMP MIB file from Infortrend support. In this example it's IFT_MIB_v1.40A02.mib. Transfer the file IFT_MIB_v1.40A02.mib to the Nagios server.

Convert the SNMP MIB definitions in IFT_MIB_v1.40A02.mib into a format that SNMPTT can understand.

$ /opt/snmptt/snmpttconvertmib --in=MIB/IFT_MIB_v1.40A02.mib --out=/opt/snmptt/conf/snmptt.conf.infortrend
...
Done

Total translations:        1
Successful translations:   1
Failed translations:       0

Edit the trap severity according to your requirements, e.g.:

$ vim /opt/snmptt/conf/snmptt.conf.infortrend

...
EVENT iftEventText .1.3.6.1.4.1.1714.2.1.1 "Status Events" Warning
...

Add the new configuration file to be included in the global SNMPTT configuration and restart the SNMPTT daemon:

$ vim /opt/snmptt/snmptt.ini

...
[TrapFiles]
snmptt_conf_files = <<END
...
/opt/snmptt/conf/snmptt.conf.infortrend
...
END

$ /etc/init.d/snmptt reload

Download the Nagios plugin check_snmp_traps.sh and place it in the plugins directory of your Nagios system, in this example /usr/lib/nagios/plugins/:
```
$ mv -i check_snmp_traps.sh /usr/lib/nagios/plugins/
$ chmod 755 /usr/lib/nagios/plugins/check_snmp_traps.sh
```
Define the following Nagios command to check for SNMP traps in the SNMPTT database. In this example this is done in the file /etc/nagios-plugins/config/check_snmp_traps.cfg:
```
# check for snmp traps
define command {
    command_name    check_snmp_traps
    command_line    $USER1$/check_snmp_traps.sh -H $HOSTNAME$:$HOSTADDRESS$ -u <user> -p <pass> -d <snmptt_db>
}
```
Replace user, pass and snmptt_db with values suitable for your SNMPTT database environment.

Add another service in your Nagios configuration to be checked for each Infortrend device:

# check snmptraps
define service {
    use                     generic-service
    hostgroup_name          infortrend
    service_description     Check_SNMP_traps
    check_command           check_snmp_traps
}

Optional: Define a serviceextinfo to display a folder icon next to the Check_SNMP_traps service check for each Infortrend device. This icon provides a direct link to the SNMPTT web interface with a filter for the selected host:
```
define serviceextinfo {
    hostgroup_name          infortrend
    service_description     Check_SNMP_traps
    notes                   SNMP Alerts
    #notes_url               http://<hostname>/nagios3/nagtrap/index.php?hostname=$HOSTNAME$
    #notes_url               http://<hostname>/nagios3/nsti/index.php?perpage=100&hostname=$HOSTNAME$
}
```
Uncomment the notes_url depending on which web interface (nagtrap or nsti) is used. Replace hostname with the FQDN or IP address of the server running the web interface.

Run a configuration check and if successful reload the Nagios process:

$ /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
$ /etc/init.d/nagios3 reload

Optional: If you're running PNP4Nagios v0.6 or later to graph Nagios performance data, you can use the PNP4Nagios templates check_infortrend_cache.php, check_infortrend_disk.php, check_infortrend_logicaldrive.php and check_infortrend_logicalunit.php to beautify the graphs. Download the templates and place them in the PNP4Nagios template directory, in this example /usr/share/pnp4nagios/html/templates/:
```
$ mv -i check_infortrend_*.php /usr/share/pnp4nagios/html/templates/
$ chmod 644 /usr/share/pnp4nagios/html/templates/check_infortrend_*.php
```
The following image shows an example of what the PNP4Nagios graphs look like for an Infortrend EonStor unit: