Check_MK Monitoring - RMON Interface Statistics

Check_MK provides the check rmon_stats to collect and monitor Remote Network MONitoring (RMON) statistics for interfaces of network equipment. While this stock check probably works fine with Cisco network devices, out of the box it does not work with network equipment from other vendors, e.g. Dell PowerConnect switches. This article shows the modifications necessary to make the rmon_stats check work with non-Cisco network equipment. Along the way, other shortcomings of the stock rmon_stats check are discussed, together with the modifications that address them.

For the impatient (TL;DR), here is the enhanced version of the rmon_stats check along with a slightly beautified version of the accompanying PNP4Nagios template:

Enhanced version of the rmon_stats check
Slightly beautified version of the rmon_stats PNP4Nagios template

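The limitation of the rmon_stats check to Cisco network devices is due to the fact that the Check_MK inventory is limited to vendor-specific OIDs in the check's snmp_scan_function: a device is only considered if its sysDescr starts with "cisco" or its sysObjectID matches one hard-coded value, and if netDefaultGateway is present as well. The following code snippet shows the respective lines: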

rmon_stats

check_info["rmon_stats"] = {
    'check_function'        : check_rmon_stats,
    'inventory_function'    : inventory_rmon_stats,
    'service_description'   : 'RMON Stats IF %s',
    'has_perfdata'          : True,
    'snmp_info'             : ('.1.3.6.1.2.1.16.1.1.1', [ #
                                        '1',    # etherStatsIndex = Item
                                        '6',    # etherStatsBroadcastPkts
                                        '7',    # etherStatsMulticastPkts
                                        '14',   # etherStatsPkts64Octets
                                        '15',   # etherStatsPkts65to127Octets
                                        '16',   # etherStatsPkts128to255Octets
                                        '17',   # etherStatsPkts256to511Octets
                                        '18',   # etherStatsPkts512to1023Octets
                                        '19',   # etherStatsPkts1024to1518Octets
                                        ]),
    # for the scan we need to check for any single object in the RMON tree,
    # we choose netDefaultGateway in the hope that it will always be present
    'snmp_scan_function'    : lambda oid: ( oid(".1.3.6.1.2.1.1.1.0").lower().startswith("cisco") \
                    or oid(".1.3.6.1.2.1.1.2.0") == ".1.3.6.1.4.1.11863.1.1.3" \
                    ) and oid(".1.3.6.1.2.1.16.19.12.0") != None,
}

By replacing the snmp_scan_function lines at the bottom of the code above with the following lines:

rmon_stats

    # if at least one interface within the RMON tree is present (the previous
    # "netDefaultGateway" is not present in every device implementing the RMON
    # MIB
    'snmp_scan_function'    : lambda oid: oid(".1.3.6.1.2.1.16.1.1.1.1.1") > 0,

the check will now be able to successfully inventorize the RMON representation of interfaces even on non-Cisco network equipment.
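The `> 0` comparison in the lambda above only behaves like a presence test because of how Python 2 orders `None` and strings against integers; under Python 3 the same comparison raises a TypeError. A minimal sketch of a more explicit test follows. The `make_oid_lookup` helper and the sample values are hypothetical stand-ins for the `oid` callable that Check_MK passes to the scan function and exist only to make the snippet runnable on its own. Like the original, it only probes the entry with etherStatsIndex 1.

```python
# Explicit presence test for the first etherStatsIndex entry.
ETHER_STATS_INDEX_1 = ".1.3.6.1.2.1.16.1.1.1.1.1"


def rmon_scan(oid):
    """Return True if the device answers for etherStatsIndex.1."""
    return oid(ETHER_STATS_INDEX_1) is not None


def make_oid_lookup(values):
    """Hypothetical stand-in for oid(): dict lookup, None if the OID is absent."""
    return lambda name: values.get(name)


with_rmon = make_oid_lookup({ETHER_STATS_INDEX_1: "1"})
without_rmon = make_oid_lookup({".1.3.6.1.2.1.1.1.0": "Some switch"})

print(rmon_scan(with_rmon))
print(rmon_scan(without_rmon))
```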

To actually execute such an inventory, there first needs to be a Check_MK configuration rule in place, which enables the appropriate code in the inventory_rmon_stats function of the check. The following code snippet shows the respective lines:

rmon_stats

def inventory_rmon_stats(info):
    settings = host_extra_conf_merged(g_hostname, inventory_if_rules)
    if settings.get("rmon"):
        [...]

I guess this is supposed to be sort of a safeguard, since the number of interfaces can grow quite large in RMON and querying them can put a lot of strain on the service processor of the network device.

The configuration option to enable RMON based checks is neatly tucked away in the WATO WebGUI at:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Inventory - automatic service detection
         -> Network Interface and Switch Port Discovery
            -> Create rule in folder ...
               -> Value
                  [x] Collect RMON statistics data
                      [x] Create extra service with RMON statistics data (if available for the device)

After creating a global or folder-specific configuration rule, the next run of the Check_MK inventory should discover a (possibly large) number of new interfaces and create the individual service checks, one for each new interface. If not, the network device probably has the RMON statistics feature disabled by default or has no such feature at all. Since the procedure to enable the collection and presentation of RMON statistics via SNMP on a network device is different for each vendor, one has to consult the corresponding vendor documentation.

Taking a look at the list of newly discovered interfaces and the resulting service checks reveals two other issues of the rmon_stats check.

The first is that the check indiscriminately collects RMON interface statistics on all interfaces of a network device, regardless of the actual link state of each interface. While the RMON-MIB lacks direct information about the link state of an interface, it fortunately carries a reference to the IF-MIB for each interface (etherStatsDataSource). The IF-MIB in turn provides the information about the link state (ifOperStatus), which can be used to determine if RMON statistics should be collected for a particular interface.

The other issue is the check's use of the RMON interface index (etherStatsIndex) as the basis for the name of the associated service, e.g. RMON Stats IF <etherStatsIndex> in the following example output:

OK  RMON Stats IF 1     OK - bcast=0 mcast=0 0-63b=5 64-127b=124 128-255b=28 256-511b=8 512-1023b=2 1024-1518b=94 octets/sec
OK  RMON Stats IF 10    OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 100   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 101   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 102   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 103   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 104   OK - bcast=0 mcast=0 0-63b=0 64-127b=11 128-255b=3 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 105   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 106   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 107   OK - bcast=0 mcast=1 0-63b=1 64-127b=8 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 108   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 109   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 11    OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 110   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 111   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Stats IF 112   OK - bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
[...]

Unfortunately, in RMON the values of the interface index etherStatsIndex are not guaranteed to be consistent across reboots of the network device, much less across the addition of new or the removal of existing interfaces. The interface numbering definition according to the original MIB-II, on the other hand, is much stricter in this regard, although subsequent RFCs like the IF-MIB (see section “3.1.5. Interface Numbering”) have weakened the original definition to some degree.

Another drawback of the pure RMON interface index number as an interface identifier in the service check name is that it is less descriptive than e.g. the interface description ifDescr from the IF-MIB. This makes the visual mapping of the service check to the respective interface of the network device rather tedious and error-prone.

An obvious solution to both issues would be to use the already mentioned reference to the IF-MIB for each interface and, in the process of the Check_MK inventory, check for the interface link state (ifOperStatus). The Check_MK inventory process would return only those interfaces with a value of up(1) in the ifOperStatus variable. For those particular interfaces it would also return the interface description ifDescr to be used as a service name, instead of the previously used RMON interface index.
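The join described above can be sketched in a few lines of standalone Python. This is only an illustration and not the code of the check itself: the function names and the sample rows are made up, and it assumes that etherStatsDataSource is delivered as a dotted OID string ending in the ifIndex.

```python
IF_INDEX_PREFIX = "1.3.6.1.2.1.2.2.1.1."  # ifIndex column referenced by etherStatsDataSource


def ifindex_from_datasource(datasource):
    """Extract the ifIndex from an etherStatsDataSource OID string."""
    oid = datasource.lstrip(".")  # tolerate a leading dot
    if not oid.startswith(IF_INDEX_PREFIX):
        raise ValueError("unexpected data source: %r" % datasource)
    return int(oid[len(IF_INDEX_PREFIX):])


def discover(if_table, rmon_table):
    """Return (ifDescr, params) for every RMON entry whose interface is up."""
    # ifIndex -> (ifDescr, ifOperStatus)
    by_ifindex = {int(idx): (descr, int(status)) for idx, descr, status in if_table}
    found = []
    for rmon_idx, datasource in rmon_table:
        entry = by_ifindex.get(ifindex_from_datasource(datasource))
        if entry is not None and entry[1] == 1:  # ifOperStatus up(1)
            found.append((entry[0], {"rmon_if_idx": int(rmon_idx)}))
    return found


# Made-up rows: (ifIndex, ifDescr, ifOperStatus) and (etherStatsIndex, etherStatsDataSource)
if_table = [("1", "Port 1", "1"), ("2", "Port 2", "2"), ("3", "Port 3", "1")]
rmon_table = [
    ("10", ".1.3.6.1.2.1.2.2.1.1.1"),
    ("11", ".1.3.6.1.2.1.2.2.1.1.2"),
    ("12", ".1.3.6.1.2.1.2.2.1.1.3"),
]

for service in discover(if_table, rmon_table):
    print(service)
```

Matching the prefix as a plain string rather than as a regular expression avoids the unescaped dots acting as wildcards.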

As a side effect, this approach unfortunately breaks the necessary mapping from the service name, which is now based on the IF-MIB interface description (ifDescr) instead of the RMON interface index (etherStatsIndex), back to the RMON MIB where the interface statistics values are stored. This is mainly due to the fact that the IF-MIB itself does not provide a native reference to the RMON MIB. Without such a back reference, the statistics values collected from the RMON MIB cannot be unambiguously assigned to the appropriate interface during normal service check runs. It is therefore necessary to implement such a back reference manually within the service check. In this case the solution was to store the value of the RMON interface index etherStatsIndex in the parameters section (params) of each interface service check's inventory entry. The following excerpt shows an example of such a new inventory data structure for several RMON interface service checks:

[
  [...]
  ('rmon_stats', 'Link Aggregate 1', {'rmon_if_idx': 89}),
  ('rmon_stats', 'Link Aggregate 16', {'rmon_if_idx': 104}),
  ('rmon_stats', 'Link Aggregate 19', {'rmon_if_idx': 107}),
  ('rmon_stats', 'Link Aggregate 2', {'rmon_if_idx': 90}),
  ('rmon_stats', 'Link Aggregate 31', {'rmon_if_idx': 119}),
  ('rmon_stats', 'Link Aggregate 32', {'rmon_if_idx': 120}),
  ('rmon_stats', 'Link Aggregate 7', {'rmon_if_idx': 95}),
  ('rmon_stats', 'Link Aggregate 8', {'rmon_if_idx': 96}),
  ('rmon_stats', 'Unit: 1 Slot: 0 Port: 1 10G - Level', {'rmon_if_idx': 1}),
  ('rmon_stats', 'Unit: 1 Slot: 0 Port: 10 10G - Level', {'rmon_if_idx': 10}),
  ('rmon_stats', 'Unit: 1 Slot: 0 Port: 16 10G - Level', {'rmon_if_idx': 16}),
  ('rmon_stats', 'Unit: 1 Slot: 0 Port: 19 10G - Level', {'rmon_if_idx': 19}),
  ('rmon_stats', 'Unit: 1 Slot: 0 Port: 2 10G - Level', {'rmon_if_idx': 2}),
  ('rmon_stats', 'Unit: 1 Slot: 0 Port: 20 10G - Level', {'rmon_if_idx': 20}),
  ('rmon_stats', 'Unit: 1 Slot: 0 Port: 7 10G - Level', {'rmon_if_idx': 7}),
  ('rmon_stats', 'Unit: 1 Slot: 0 Port: 8 10G - Level', {'rmon_if_idx': 8}),
  ('rmon_stats', 'Unit: 1 Slot: 0 Port: 9 10G - Level', {'rmon_if_idx': 9}),
  ('rmon_stats', 'Unit: 1 Slot: 1 Port: 3 10G - Level', {'rmon_if_idx': 23}),
  ('rmon_stats', 'Unit: 1 Slot: 1 Port: 4 10G - Level', {'rmon_if_idx': 24}),
  ('rmon_stats', 'Unit: 2 Slot: 0 Port: 1 10G - Level', {'rmon_if_idx': 25}),
  ('rmon_stats', 'Unit: 2 Slot: 0 Port: 10 10G - Level', {'rmon_if_idx': 34}),
  ('rmon_stats', 'Unit: 2 Slot: 0 Port: 16 10G - Level', {'rmon_if_idx': 40}),
  ('rmon_stats', 'Unit: 2 Slot: 0 Port: 19 10G - Level', {'rmon_if_idx': 43}),
  ('rmon_stats', 'Unit: 2 Slot: 0 Port: 2 10G - Level', {'rmon_if_idx': 26}),
  ('rmon_stats', 'Unit: 2 Slot: 0 Port: 20 10G - Level', {'rmon_if_idx': 44}),
  ('rmon_stats', 'Unit: 2 Slot: 0 Port: 7 10G - Level', {'rmon_if_idx': 31}),
  ('rmon_stats', 'Unit: 2 Slot: 0 Port: 8 10G - Level', {'rmon_if_idx': 32}),
  ('rmon_stats', 'Unit: 2 Slot: 0 Port: 9 10G - Level', {'rmon_if_idx': 33}),
  ('rmon_stats', 'Unit: 2 Slot: 1 Port: 3 10G - Level', {'rmon_if_idx': 47}),
  ('rmon_stats', 'Unit: 2 Slot: 1 Port: 4 10G - Level', {'rmon_if_idx': 48}),
  ('rmon_stats', 'Unit: 3 Slot: 0 Port: 1 10G - Level', {'rmon_if_idx': 49}),
  ('rmon_stats', 'Unit: 3 Slot: 0 Port: 2 10G - Level', {'rmon_if_idx': 50}),
  ('rmon_stats', 'Unit: 3 Slot: 0 Port: 9 10G - Level', {'rmon_if_idx': 57}),
  ('rmon_stats', 'Unit: 4 Slot: 0 Port: 1 10G - Level', {'rmon_if_idx': 69}),
  ('rmon_stats', 'Unit: 4 Slot: 0 Port: 2 10G - Level', {'rmon_if_idx': 70}),
  ('rmon_stats', 'Unit: 4 Slot: 0 Port: 9 10G - Level', {'rmon_if_idx': 77}),
  [...]
]
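How the stored `rmon_if_idx` can be used at check time is sketched below. This is a simplified, standalone illustration rather than the check's actual code: the function name `resolve_rmon_row` and the ifIndex values 601 and 602 are invented for the example, and only the first two RMON columns are interpreted.

```python
def resolve_rmon_row(item, params, if_table, rmon_table):
    """Return the RMON row for a service, or None if the stored mapping no longer fits."""
    # ifIndex values that currently carry the service's ifDescr
    if_indexes = {int(idx) for idx, descr in if_table if descr == item}
    wanted = str(params["rmon_if_idx"])
    for row in rmon_table:
        if row[0] == wanted:
            # the last OID component of etherStatsDataSource is the ifIndex
            if_index = int(row[1].rsplit(".", 1)[1])
            return row if if_index in if_indexes else None
    return None


# Made-up rows: (ifIndex, ifDescr) and (etherStatsIndex, etherStatsDataSource, counters...)
if_table = [("601", "Link Aggregate 1"), ("602", "Link Aggregate 2")]
rmon_table = [
    ("89", ".1.3.6.1.2.1.2.2.1.1.601", "12", "34"),
    ("90", ".1.3.6.1.2.1.2.2.1.1.602", "56", "78"),
]

print(resolve_rmon_row("Link Aggregate 1", {"rmon_if_idx": 89}, if_table, rmon_table))
print(resolve_rmon_row("Link Aggregate 1", {"rmon_if_idx": 90}, if_table, rmon_table))
```

A `None` result corresponds to the situation in which the service should report that the mapping has changed and the inventory needs to be re-run.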

The changes to the rmon_stats check necessary to implement the features described above are shown in the following patch:

rmon_stats.patch

--- rmon_stats.orig 2015-12-21 13:41:46.000000000 +0100
+++ rmon_stats  2016-02-02 11:18:56.789188249 +0100
@@ -34,20 +34,43 @@
     settings = host_extra_conf_merged(g_hostname, inventory_if_rules)
     if settings.get("rmon"):
         inventory = []
-        for line in info:
-            inventory.append((line[0], None))
+        for line in info[1]:
+            rmon_if_idx = int(re.sub('.1.3.6.1.2.1.2.2.1.1.','',line[1]))
+            for iface in info[0]:
+                if int(iface[0]) == rmon_if_idx and int(iface[2]) == 1:
+                    params = {}
+                    params["rmon_if_idx"] = int(line[0])
+                    inventory.append((iface[1], "%r" % params))
         return inventory
 
-def check_rmon_stats(item, _no_params, info):
-    bytes = { 1: 'bcast', 2: 'mcast', 3: '0-63b', 4: '64-127b', 5: '128-255b', 6: '256-511b', 7: '512-1023b', 8: '1024-1518b' }
-    perfdata = []
+def check_rmon_stats(item, params, info):
+    bytes = { 2: 'bcast', 3: 'mcast', 4: '0-63b', 5: '64-127b', 6: '128-255b', 7: '256-511b', 8: '512-1023b', 9: '1024-1518b' }
+    rmon_if_idx = str(params.get("rmon_if_idx"))
+    if_alias = ''
     infotext = ''
+    perfdata = []
     now = time.time()
-    for line in info:
-        if line[0] == item:
+
+    for line in info[0]:
+        ifIndex, ifDescr, ifOperStatus, ifAlias = line
+        if item == ifDescr:
+            if_alias = ifAlias
+            if_index = int(ifIndex)
+
+    for line in info[1]:
+        if line[0] == rmon_if_idx:
+            if_mib_idx = int(re.sub('.1.3.6.1.2.1.2.2.1.1.','',line[1]))
+            if if_mib_idx != if_index:
+                return (3, "RMON interface index mapping to IF interface index mapping changed. Re-run Check_MK inventory.")
+
+            if item != if_alias and if_alias != '':
+                infotext = "[%s, RMON: %s] " % (if_alias, rmon_if_idx)
+            else:
+                infotext = "[RMON: %s] " % rmon_if_idx
+
             for i, val in bytes.items():
                 octets = int(re.sub(' Packets','',line[i]))
-                rate = get_rate("%s-%s" % (item, val), now, octets)
+                rate = get_rate("%s-%s" % (rmon_if_idx, val), now, octets)
                 perfdata.append((val, rate, 0, 0, 0))
                 infotext += "%s=%.0f " % (val, rate)
             infotext += 'octets/sec'
@@ -58,10 +81,18 @@
 check_info["rmon_stats"] = {
     'check_function'        : check_rmon_stats,
     'inventory_function'    : inventory_rmon_stats,
-    'service_description'   : 'RMON Stats IF %s',
+    'service_description'   : 'RMON Interface %s',
     'has_perfdata'          : True,
-    'snmp_info'             : ('.1.3.6.1.2.1.16.1.1.1', [ #
+    'snmp_info'             : [
+        ( '.1.3.6.1.2.1', [ #
+                                        '2.2.1.1',      # ifNumber
+                                        '2.2.1.2',      # ifDescr
+                                        '2.2.1.8',      # ifOperStatus
+                                        '31.1.1.1.18',  # ifAlias
+                          ]),
+        ('.1.3.6.1.2.1.16.1.1.1', [ #
                                         '1',    # etherStatsIndex = Item
+                                        '2',    # etherStatsDataSource = Interface in the RFC1213-MIB
                                         '6',    # etherStatsBroadcastPkts
                                         '7',    # etherStatsMulticastPkts
                                         '14',   # etherStatsPkts64Octets
@@ -70,10 +101,11 @@
                                         '17',   # etherStatsPkts256to511Octets
                                         '18',   # etherStatsPkts512to1023Octets
                                         '19',   # etherStatsPkts1024to1518Octets
-                                        ]),
-    # for the scan we need to check for any single object in the RMON tree,
-    # we choose netDefaultGateway in the hope that it will always be present
-    'snmp_scan_function'    : lambda oid: ( oid(".1.3.6.1.2.1.1.1.0").lower().startswith("cisco") \
-                    or oid(".1.3.6.1.2.1.1.2.0") == ".1.3.6.1.4.1.11863.1.1.3" \
-                    ) and oid(".1.3.6.1.2.1.16.19.12.0") != None,
+                                  ]),
+        ],
+    # if at least one interface within the RMON tree is present (the previous
+    # "netDefaultGateway" is not present in every device implementing the RMON
+    # MIB
+    'snmp_scan_function'    : lambda oid: oid(".1.3.6.1.2.1.16.1.1.1.1.1") > 0,
+
 }
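Note that the patch keys the rate counters by the RMON index (`"%s-%s" % (rmon_if_idx, val)`) instead of by the service item. The following sketch is not Check_MK's `get_rate` implementation; it only illustrates, with a hypothetical in-memory store, how a per-second rate can be derived from two counter samples stored under such a key.

```python
samples = {}  # hypothetical store: "<rmon_if_idx>-<counter name>" -> (timestamp, counter)


def counter_rate(previous, current):
    """Per-second rate between two (timestamp, counter) samples, None if not computable."""
    (t0, c0), (t1, c1) = previous, current
    if t1 <= t0 or c1 < c0:
        # no elapsed time, or the counter wrapped or was reset: skip this interval
        return None
    return (c1 - c0) / float(t1 - t0)


def update(key, timestamp, value):
    """Store the new sample and return the rate against the previous one, if any."""
    previous = samples.get(key)
    samples[key] = (timestamp, value)
    if previous is None:
        return None
    return counter_rate(previous, (timestamp, value))


print(update("89-bcast", 100.0, 1000))
print(update("89-bcast", 160.0, 1600))
```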

The following output shows examples of the new service check names. The ifAlias and the etherStatsIndex have been added as auxiliary information to the service check output. The ifAlias entries in the examples have – in order to protect the innocent – been redacted with xxxxxxxx though.

OK  RMON Interface Link Aggregate 1                        OK - [xxxxxxxx, RMON: 89] bcast=0 mcast=0 0-63b=85 64-127b=390 128-255b=136 256-511b=25 512-1023b=20 1024-1518b=407 octets/sec
OK  RMON Interface Link Aggregate 16                       OK - [xxxxxxxx, RMON: 104] bcast=0 mcast=0 0-63b=1 64-127b=11 128-255b=3 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
UNKN    RMON Interface Link Aggregate 19                       UNKNOWN - RMON interface index mapping to IF interface index mapping changed. Re-run Check_MK inventory.
OK  RMON Interface Link Aggregate 2                        OK - [xxxxxxxx, RMON: 90] bcast=0 mcast=0 0-63b=111 64-127b=480 128-255b=207 256-511b=44 512-1023b=41 1024-1518b=484 octets/sec
[...]
OK  RMON Interface Unit: 1 Slot: 0 Port: 1 10G - Level     OK - [xxxxxxxx, RMON: 1] bcast=0 mcast=0 0-63b=49 64-127b=232 128-255b=57 256-511b=14 512-1023b=4 1024-1518b=185 octets/sec
OK  RMON Interface Unit: 1 Slot: 0 Port: 10 10G - Level    OK - [xxxxxxxx, RMON: 10] bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 1 Slot: 0 Port: 16 10G - Level    OK - [xxxxxxxx, RMON: 16] bcast=0 mcast=0 0-63b=0 64-127b=2 128-255b=1 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 1 Slot: 0 Port: 19 10G - Level    OK - [xxxxxxxx, RMON: 19] bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 1 Slot: 0 Port: 2 10G - Level     OK - [xxxxxxxx, RMON: 2] bcast=0 mcast=0 0-63b=62 64-127b=293 128-255b=89 256-511b=28 512-1023b=9 1024-1518b=252 octets/sec
OK  RMON Interface Unit: 1 Slot: 0 Port: 20 10G - Level    OK - [xxxxxxxx, RMON: 20] bcast=0 mcast=0 0-63b=75 64-127b=1640 128-255b=217 256-511b=41 512-1023b=79 1024-1518b=2096 octets/sec
OK  RMON Interface Unit: 1 Slot: 0 Port: 7 10G - Level     OK - [xxxxxxxx, RMON: 7] bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 1 Slot: 0 Port: 8 10G - Level     OK - [xxxxxxxx, RMON: 8] bcast=0 mcast=0 0-63b=0 64-127b=5 128-255b=1 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 1 Slot: 0 Port: 9 10G - Level     OK - [xxxxxxxx, RMON: 9] bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 1 Slot: 1 Port: 3 10G - Level     OK - [xxxxxxxx, RMON: 23] bcast=0 mcast=0 0-63b=0 64-127b=3 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 1 Slot: 1 Port: 4 10G - Level     OK - [xxxxxxxx, RMON: 24] bcast=0 mcast=0 0-63b=0 64-127b=3 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 2 Slot: 0 Port: 1 10G - Level     OK - [xxxxxxxx, RMON: 25] bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 2 Slot: 0 Port: 10 10G - Level    OK - [xxxxxxxx, RMON: 34] bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 2 Slot: 0 Port: 16 10G - Level    OK - [xxxxxxxx, RMON: 40] bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
OK  RMON Interface Unit: 2 Slot: 0 Port: 19 10G - Level    OK - [xxxxxxxx, RMON: 43] bcast=0 mcast=1 0-63b=1 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec
[...]
OK  RMON Interface Unit: 4 Slot: 0 Port: 9 10G - Level     OK - [xxxxxxxx, RMON: 77] bcast=0 mcast=0 0-63b=0 64-127b=0 128-255b=0 256-511b=0 512-1023b=0 1024-1518b=0 octets/sec

The enhanced version of the rmon_stats check can be downloaded here along with a slightly beautified version of the accompanying PNP4Nagios template:

Enhanced version of the rmon_stats check
Patch to enhance the original version of the rmon_stats check
Slightly beautified version of the rmon_stats PNP4Nagios template
Patch to beautify the rmon_stats PNP4Nagios template

Finally, the following two screenshots show examples of PNP4Nagios graphs using the modified version of the PNP4Nagios template:

The sources for both the enhanced version of the rmon_stats check and the beautified version of the rmon_stats PNP4Nagios template can be found on GitHub in my Check_MK Plugins repository.

2016-02-05 written by Frank Fegert

2016-01-21 // World + dog on GitHub - me too now

World + dog seems to be on GitHub, me too now. Fork me on GitHub.

Fair warning though: it currently looks a bit deserted there, because I'm still getting to grips with Git. Over time I think I'll be moving my source code from Subversion (SVN) to Git. It'll probably take some time, but make sure to stay tuned for the repositories emerging there.

2016-01-21 written by Frank Fegert
Tags: Git, GitHub

2015-11-13 // QLogic iSCSI HBA and Limitations in Bi-Directional Authentication

In the past the QLogic QConvergeConsole (qaucli) was used as an administration tool for the hardware initiator part of the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs. Unfortunately this tool was only supported on the so-called “enterprise Linux distributions” like RHEL and SLES. If you were running any other Linux distribution, e.g. Debian, or even one of the BSD distributions, you were out of luck.

Thankfully QLogic addressed this support issue indirectly, by first announcing and since then actually moving from an IOCTL based management method towards the Open-iSCSI based management method via the iscsiadm command. The announcement QLogic iSCSI Solution for Transitioning to the Open-iSCSI Model and the User's Guide IOCTL to Open-iSCSI Interface can be found at the QLogic web site.

While trying to test and use the new management method for the hardware initiator via the Open-iSCSI iscsiadm command, I soon ran into the issue that the packaged version of Open-iSCSI shipped with Debian Wheezy is based on the last stable release v2.0.873 from Open-iSCSI and is thus hopelessly out of date. The Open-iSCSI package shipped with Debian Jessie is a bit better, since it's already based on a newer version from the project's GitHub repository. Still, the Git commit used there dates back to August 23rd of 2013, which is also fairly old. After updating my system to Debian Jessie, I soon decided to rebuild the Open-iSCSI package from a much more recent version from the project's GitHub repository. With this, the management of the QLogic hardware initiators worked very well via the Open-iSCSI iscsiadm command and its now enhanced host mode.

In the host mode there are now three sub-modes: chap, flashnode and stats. See man iscsiadm and /usr/share/doc/open-iscsi/README.gz for more details on how to use them. When the host mode is called without any sub-mode, iscsiadm prints a list of available iSCSI HBAs along with the host number (shown in the first pair of square brackets) associated with each host by the OS kernel:

```
root@host:~$ iscsiadm -m host
qla4xxx: [1] 10.0.0.5,[84:8f:69:35:fc:70],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2c.4
qla4xxx: [2] 10.0.0.6,[84:8f:69:35:fc:71],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2d.5
```

The host number (1 and 2 in the above example) is used in the following examples showing the three sub-modes.
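When scripting against several HBAs, the host numbers can be picked out of the host listing. The following is a minimal sketch that assumes the line format shown in the listing above; the function name `parse_hosts` is made up for this example, and the remainder of each line is kept as an uninterpreted string.

```python
import re

# "<driver>: [<host number>] <rest of the line>"
HOST_LINE = re.compile(r"^(?P<driver>\S+): \[(?P<host_no>\d+)\] (?P<rest>.*)$")


def parse_hosts(output):
    """Return a list of (host number, driver, rest) tuples."""
    hosts = []
    for line in output.splitlines():
        match = HOST_LINE.match(line.strip())
        if match:
            hosts.append((int(match.group("host_no")), match.group("driver"), match.group("rest")))
    return hosts


sample = """\
qla4xxx: [1] 10.0.0.5,[84:8f:69:35:fc:70],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2c.4
qla4xxx: [2] 10.0.0.6,[84:8f:69:35:fc:71],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2d.5
"""

for host_no, driver, _rest in parse_hosts(sample):
    print(host_no, driver)
```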

The stats sub-mode displays various statistics values of the given HBA port, e.g. for TCP/IP and iSCSI sessions:

```
root@host:~$ iscsiadm -m host -H 1 -C stats

Host Statistics:
    mactx_frames: 2351750
    mactx_bytes: 233065914
    mactx_multicast_frames: 1209409
    mactx_broadcast_frames: 0
    mactx_pause_frames: 0
    mactx_control_frames: 0
    mactx_deferral: 0
    mactx_excess_deferral: 0
    mactx_late_collision: 0
    mactx_abort: 0
    mactx_single_collision: 0
    mactx_multiple_collision: 0
    mactx_collision: 0
    mactx_frames_dropped: 0
    mactx_jumbo_frames: 0
    macrx_frames: 4037613
    macrx_bytes: 1305799553
    macrx_unknown_control_frames: 0
    macrx_pause_frames: 0
    macrx_control_frames: 0
    macrx_dribble: 0
    macrx_frame_length_error: 0
    macrx_jabber: 0
    macrx_carrier_sense_error: 0
    macrx_frame_discarded: 0
    macrx_frames_dropped: 2409752
    mac_crc_error: 0
    mac_encoding_error: 0
    macrx_length_error_large: 0
    macrx_length_error_small: 0
    macrx_multicast_frames: 0
    macrx_broadcast_frames: 0
    iptx_packets: 1694187
    iptx_bytes: 112412836
    iptx_fragments: 0
    iprx_packets: 1446806
    iprx_bytes: 721191324
    iprx_fragments: 0
    ip_datagram_reassembly: 0
    ip_invalid_address_error: 0
    ip_error_packets: 0
    ip_fragrx_overlap: 0
    ip_fragrx_outoforder: 0
    ip_datagram_reassembly_timeout: 0
    ipv6tx_packets: 0
    ipv6tx_bytes: 0
    ipv6tx_fragments: 0
    ipv6rx_packets: 0
    ipv6rx_bytes: 0
    ipv6rx_fragments: 0
    ipv6_datagram_reassembly: 0
    ipv6_invalid_address_error: 0
    ipv6_error_packets: 0
    ipv6_fragrx_overlap: 0
    ipv6_fragrx_outoforder: 0
    ipv6_datagram_reassembly_timeout: 0
    tcptx_segments: 1694187
    tcptx_bytes: 69463008
    tcprx_segments: 1446806
    tcprx_byte: 692255204
    tcp_duplicate_ack_retx: 8
    tcp_retx_timer_expired: 28
    tcprx_duplicate_ack: 0
    tcprx_pure_ackr: 0
    tcptx_delayed_ack: 247594
    tcptx_pure_ack: 247710
    tcprx_segment_error: 0
    tcprx_segment_outoforder: 0
    tcprx_window_probe: 0
    tcprx_window_update: 2248673
    tcptx_window_probe_persist: 0
    ecc_error_correction: 0
    iscsi_pdu_tx: 1446486
    iscsi_data_bytes_tx: 30308
    iscsi_pdu_rx: 1446510
    iscsi_data_bytes_rx: 622721801
    iscsi_io_completed: 253632
    iscsi_unexpected_io_rx: 0
    iscsi_format_error: 0
    iscsi_hdr_digest_error: 0
    iscsi_data_digest_error: 0
    iscsi_sequence_error: 0
```
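Since the statistics are printed as plain `name: value` pairs, they are easy to feed into a monitoring script. A minimal sketch, assuming exactly the layout shown above; the function name `parse_host_stats` is invented here, and the ratio printed at the end merely relates two of the counters without claiming anything about how the firmware counts them.

```python
def parse_host_stats(output):
    """Turn 'name: value' lines into a dict of integers, ignoring everything else."""
    stats = {}
    for line in output.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and value.isdigit():
            stats[name] = int(value)
    return stats


sample = """\
Host Statistics:
    macrx_frames: 4037613
    macrx_frames_dropped: 2409752
    tcp_retx_timer_expired: 28
"""

stats = parse_host_stats(sample)
ratio = stats["macrx_frames_dropped"] / float(stats["macrx_frames"])
print("macrx_frames_dropped / macrx_frames = %.3f" % ratio)
```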

The chap sub-mode displays and alters a table containing authentication information. Calling this sub-mode with the -o show option displays the current contents of the table:
```
root@host:~$ iscsiadm -m host -H 1 -C chap -o show
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 0
host.auth.username_in = <empty>
host.auth.password_in = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 1
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD

[...]
```
Why show isn't the default option in the context of the chap sub-mode, like it is in many other iscsiadm modes and sub-modes, is something I haven't quite understood yet. Maybe it's a security measure to not accidentally divulge sensitive information, maybe it has just been overlooked by the developers.

Usually, there are already two initial records with the indexes 0 and 1 present on an HBA. As shown in the example above, each authentication record consists of three parameters: a record index host.auth.tbl_idx to reference it, a username (host.auth.username or host.auth.username_in) and a password (host.auth.password or host.auth.password_in). Depending on whether the record is used for outgoing authentication of an initiator against a target or, the other way around, for incoming authentication of a target against an initiator, the parameter pairs username/password or username_in/password_in are used. Apparently both types of parameter pairs, incoming and outgoing, cannot be mixed together in a single record. My guess is that this isn't a limitation in Open-iSCSI, but rather a limitation in the specification and/or of the underlying hardware.
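The record structure described above lends itself to a small parser. The following sketch assumes the `# BEGIN RECORD` / `# END RECORD` framing and the `key = value` lines of the listing above; the function names are made up, and treating `<empty>` as `None` is a choice made for this example.

```python
def parse_chap_records(output):
    """Split 'chap -o show' output into one dict per authentication record."""
    records, current = [], None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("# BEGIN RECORD"):
            current = {}
        elif line.startswith("# END RECORD"):
            if current is not None:
                records.append(current)
            current = None
        elif current is not None and " = " in line:
            key, value = line.split(" = ", 1)
            current[key] = None if value == "<empty>" else value
    return records


def direction(record):
    """'in' for username_in/password_in records, 'out' for username/password records."""
    return "in" if "host.auth.username_in" in record else "out"


sample = """\
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 0
host.auth.username_in = <empty>
host.auth.password_in = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 1
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD
"""

for record in parse_chap_records(sample):
    print(record["host.auth.tbl_idx"], direction(record))
```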

New authentication records can be added with the -o new option:
```
root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o new
```
```
root@host:~$ iscsiadm -m host -H 1 -C chap -o show
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 0
host.auth.username_in = <empty>
host.auth.password_in = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 1
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 2
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD

[...]
```
Parameters of existing authentication records can be set or updated with the -o update option. The particular record to be set or updated is selected with the -x <host.auth.tbl_idx> option, which references the record's host.auth.tbl_idx value. Multiple parameters can be set or updated with a single iscsiadm command by calling it with multiple pairs of -n <parameter-name> and -v <parameter-value>:
```
root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o update -n host.auth.username -v testuser -n host.auth.password -v testpassword
```
```
root@host:~$ iscsiadm -m host -H 1 -C chap -o show
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 0
host.auth.username_in = <empty>
host.auth.password_in = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 1
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 2
host.auth.username = testuser
host.auth.password = testpassword
# END RECORD

[...]
```
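If such updates are to be scripted, the repeated `-n`/`-v` pairs can be generated from a list of settings. The sketch below only assembles the argument list and uses no options beyond those shown above; the function name `chap_update_command` is hypothetical, and the command is printed rather than executed.

```python
def chap_update_command(host_no, tbl_idx, settings):
    """Build the iscsiadm argument list for updating one CHAP record."""
    argv = ["iscsiadm", "-m", "host", "-H", str(host_no), "-C", "chap",
            "-x", str(tbl_idx), "-o", "update"]
    for name, value in settings:
        argv += ["-n", name, "-v", value]
    return argv


argv = chap_update_command(1, 2, [
    ("host.auth.username", "testuser"),
    ("host.auth.password", "testpassword"),
])
print(" ".join(argv))
```

Keeping the arguments as a list, instead of one string, means they could be handed to a process-spawning call without any shell quoting of the password.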
Finally, existing authentication records can be deleted with the -o delete option:
```
root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o delete
```
The flashnode sub-mode displays and alters a table containing information about the iSCSI targets. Calling this sub-mode without any other options displays an overview of the currently configured flash nodes (i.e. targets) on a particular HBA:
```
root@host:~$ iscsiadm -m host -H 1 -C flashnode
qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002
```
Similar to the previously mentioned host mode, each output line of the flashnode sub-mode contains an index number for each flash node entry (i.e. iSCSI target), which is shown in the first pair of square brackets. With this index number the individual flash node entries are referenced in all further operations.
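A minimal sketch of how the flash node overview could be parsed follows. It assumes the IPv4 line format of the listings in this article; the field name `tag` for the number after the comma is a placeholder chosen for this example, as is the function name `parse_flashnodes`.

```python
import re

# "<driver>: [<index>] <address>:<port>,<tag> <target name>"
FLASHNODE_LINE = re.compile(
    r"^(?P<driver>\S+): \[(?P<index>\d+)\] "
    r"(?P<address>\S+):(?P<port>\d+),(?P<tag>\d+) (?P<target>\S+)$"
)


def parse_flashnodes(output):
    """Return one dict per flash node line; '<empty>' targets become None."""
    nodes = []
    for line in output.splitlines():
        match = FLASHNODE_LINE.match(line.strip())
        if match:
            node = match.groupdict()
            node["index"] = int(node["index"])
            node["port"] = int(node["port"])
            if node["target"] == "<empty>":
                node["target"] = None
            nodes.append(node)
    return nodes


sample = """\
qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002
qla4xxx: [1] 0.0.0.0:3260,0 <empty>
"""

for node in parse_flashnodes(sample):
    print(node["index"], node["address"], node["port"], node["target"])
```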

New flash nodes or target entries can be added with the -o new option. This operation also needs the information on whether the target addressed via the flash node will be reached via IPv4 or IPv6 addresses. This is accomplished with the -A ipv4 or -A ipv6 option:
```
root@host:~$ iscsiadm -m host -H 1 -C flashnode -o new -A ipv4
Create new flashnode for host 1.
New flashnode for host 1 added at index 1.
```
If the operation of adding a new flash node is successful, the index under which the new flash node is addressable is returned.
```
root@host:~$ iscsiadm -m host -H 1 -C flashnode
qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002
qla4xxx: [1] 0.0.0.0:3260,0 <empty>
```
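The overview lines can be turned into structured data with a small regular expression. This is only a sketch based on the two sample lines above; the field name `tag` for the number after the comma and the helper name `parse_flashnodes` are my own choices, and IPv6 portals may well be formatted differently.

```python
import re

# Layout assumed from the sample: "<driver>: [<idx>] <address>:<port>,<tag> <target>"
FLASHNODE_RE = re.compile(
    r"^(?P<driver>\S+): \[(?P<idx>\d+)\] "
    r"(?P<address>\S+):(?P<port>\d+),(?P<tag>\d+) (?P<target>\S+)$"
)


def parse_flashnodes(text):
    """Return one dict per flash node line found in the overview output."""
    nodes = []
    for line in text.splitlines():
        match = FLASHNODE_RE.match(line.strip())
        if not match:
            continue
        node = match.groupdict()
        node["idx"] = int(node["idx"])
        node["port"] = int(node["port"])
        if node["target"] == "<empty>":
            node["target"] = None  # freshly created, not yet configured
        nodes.append(node)
    return nodes


sample = """\
qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002
qla4xxx: [1] 0.0.0.0:3260,0 <empty>
"""

nodes = parse_flashnodes(sample)
unconfigured = [n["idx"] for n in nodes if n["target"] is None]
print(unconfigured)
```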
Unlike authentication records, the flash node or target records contain a lot more parameters. They can be displayed by selecting a specific record by its index with the -x <flashnode_idx> option. The -o show option is the default and is thus optional:
```
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1
# BEGIN RECORD 2.0-873
flashnode.session.auto_snd_tgt_disable = 0
flashnode.session.discovery_session = 0
flashnode.session.portal_type = ipv4
flashnode.session.entry_enable = 0
flashnode.session.immediate_data = 0
flashnode.session.initial_r2t = 0
flashnode.session.data_seq_in_order = 1
flashnode.session.data_pdu_in_order = 1
flashnode.session.chap_auth_en = 1
flashnode.session.discovery_logout_en = 0
flashnode.session.bidi_chap_en = 0
flashnode.session.discovery_auth_optional = 0
flashnode.session.erl = 0
flashnode.session.first_burst_len = 0
flashnode.session.def_time2wait = 0
flashnode.session.def_time2retain = 0
flashnode.session.max_outstanding_r2t = 0
flashnode.session.isid = 000e1e17da2c
flashnode.session.tsid = 0
flashnode.session.max_burst_len = 0
flashnode.session.def_taskmgmt_tmo = 10
flashnode.session.targetalias = <empty>
flashnode.session.targetname = <empty>
flashnode.session.discovery_parent_idx = 0
flashnode.session.discovery_parent_type = Sendtarget
flashnode.session.tpgt = 0
flashnode.session.chap_out_idx = 2
flashnode.session.chap_in_idx = 65535
flashnode.session.username = <empty>
flashnode.session.username_in = <empty>
flashnode.session.password = <empty>
flashnode.session.password_in = <empty>
flashnode.session.is_boot_target = 0
flashnode.conn[0].is_fw_assigned_ipv6 = 0
flashnode.conn[0].header_digest_en = 0
flashnode.conn[0].data_digest_en = 0
flashnode.conn[0].snack_req_en = 0
flashnode.conn[0].tcp_timestamp_stat = 0
flashnode.conn[0].tcp_nagle_disable = 0
flashnode.conn[0].tcp_wsf_disable = 0
flashnode.conn[0].tcp_timer_scale = 0
flashnode.conn[0].tcp_timestamp_en = 0
flashnode.conn[0].fragment_disable = 0
flashnode.conn[0].max_xmit_dlength = 0
flashnode.conn[0].max_recv_dlength = 65536
flashnode.conn[0].keepalive_tmo = 0
flashnode.conn[0].port = 3260
flashnode.conn[0].ipaddress = 0.0.0.0
flashnode.conn[0].redirect_ipaddr = 0.0.0.0
flashnode.conn[0].max_segment_size = 0
flashnode.conn[0].local_port = 0
flashnode.conn[0].ipv4_tos = 0
flashnode.conn[0].ipv6_traffic_class = 0
flashnode.conn[0].ipv6_flow_label = 0
flashnode.conn[0].link_local_ipv6 = <empty>
flashnode.conn[0].tcp_xmit_wsf = 0
flashnode.conn[0].tcp_recv_wsf = 0
flashnode.conn[0].statsn = 0
flashnode.conn[0].exp_statsn = 0
# END RECORD
```
From the various parameters of a flash node or target record, the following are the most relevant in day-to-day use:
- flashnode.session.chap_auth_en: Controls whether the initiator should authenticate against the target. This is enabled by default.
- flashnode.session.bidi_chap_en: Controls whether the target should also authenticate itself against the initiator. This is disabled by default.
- flashnode.session.targetname: The IQN of the target to be logged into and to be accessed.
- flashnode.session.chap_out_idx: The index number (i.e. the value of the host.auth.tbl_idx parameter) of the authentication record to be used for authentication of the initiator against the target.
- flashnode.conn[0].port: The TCP port of the target portal. The default is port 3260.
- flashnode.conn[0].ipaddress: The IP address of the target portal.
The parameter pairs flashnode.session.username/flashnode.session.password and flashnode.session.username_in/flashnode.session.password_in are handled differently from all the other parameters. They are not set or updated directly, but are rather filled in automatically. This is done by setting the respective flashnode.session.chap_out_idx or flashnode.session.chap_in_idx parameter to a value which references the index (i.e. the value of the host.auth.tbl_idx parameter) of an appropriate authentication record.

Parameters of existing flash nodes or target entries can be set or updated with the -o update option. The particular record whose parameters are to be set or updated is selected with the -x <flashnode_idx> option. This references an index number gathered from the list of flash nodes or an index number returned at the time of creation of a particular flash node. Multiple parameters can be set or updated with a single iscsiadm command by calling it with multiple pairs of -n <parameter-name> and -v <parameter-value>:
```
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o update -n flashnode.session.chap_out_idx -v 2 -n flashnode.session.targetname -v iqn.2001-05.com.equallogic:0-fe83b6-d63c152cc-7ce004e1102558d4-lun-000003 -n flashnode.conn[0].ipaddress -v 10.0.0.2
```
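Since such command lines quickly become long, they can also be assembled programmatically. The sketch below only uses the options shown in this article; the helper name `build_flashnode_update` is hypothetical.

```python
def build_flashnode_update(host_no, flashnode_idx, params):
    """Build the iscsiadm argument list for updating one flash node record."""
    cmd = ["iscsiadm", "-m", "host", "-H", str(host_no),
           "-C", "flashnode", "-x", str(flashnode_idx), "-o", "update"]
    # One -n/-v pair per parameter, in the order given
    for name, value in params.items():
        cmd += ["-n", name, "-v", str(value)]
    return cmd


cmd = build_flashnode_update(1, 1, {
    "flashnode.session.chap_out_idx": 2,
    "flashnode.session.targetname":
        "iqn.2001-05.com.equallogic:0-fe83b6-d63c152cc-7ce004e1102558d4-lun-000003",
    "flashnode.conn[0].ipaddress": "10.0.0.2",
})
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing it as a list avoids shell quoting issues with the square brackets in parameter names.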
The flash node or target entry updated by this command is shown below in a cut-down fashion for brevity:
```
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1
# BEGIN RECORD 2.0-873
[...]
flashnode.session.chap_auth_en = 1
flashnode.session.discovery_logout_en = 0
flashnode.session.bidi_chap_en = 0
[...]
flashnode.session.targetname = iqn.2001-05.com.equallogic:0-fe83b6-d63c152cc-7ce004e1102558d4-lun-000003
[...]
flashnode.session.chap_out_idx = 4
flashnode.session.chap_in_idx = 65535
flashnode.session.username = testuser
flashnode.session.username_in = <empty>
flashnode.session.password = testpassword
flashnode.session.password_in = <empty>
[...]
flashnode.conn[0].port = 3260
flashnode.conn[0].ipaddress = 10.0.0.2
flashnode.conn[0].redirect_ipaddr = 0.0.0.0
[...]
# END RECORD
```
Once configured, login and logout actions can be performed on the flash node (i.e. target) with the respective command options:
```
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o login
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o logout
```
Unfortunately, the iscsiadm command returns a success status for the login and logout actions as soon as it has successfully passed them to the HBA. This status does not reflect the result of the actual login and logout actions subsequently taken by the HBA against the target configured in the respective flash node! To my knowledge, there is currently no information passed back to the command line about the result of the login and logout actions at the HBA level.

iSCSI sessions established via the hardware initiator with the login action shown above, whether new or already existing, are listed together with all other Open-iSCSI session information in the output of the iscsiadm command in its session mode:
```
root@host:~$ iscsiadm -m session
qla4xxx: [1] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)
qla4xxx: [2] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)

[...]
```
The fact that the session information is about an iSCSI session established via a hardware initiator is only signified by the flash label in the parentheses at the end of each line. In case of an iSCSI session established via a software initiator, the label in the parentheses reads non-flash.
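Because the exit status of the login action says nothing about its outcome on the HBA, one possible workaround is to look for the target in the session list afterwards. The following sketch assumes the five-column line layout of the sample output above; the function name `flash_sessions` is invented for this example, and how long the HBA needs before a session shows up is not something I can vouch for.

```python
def flash_sessions(session_output, target_iqn):
    """Return the session lines for target_iqn that carry the (flash) label."""
    matches = []
    for line in session_output.splitlines():
        fields = line.split()
        # Assumed layout: "<driver>: [<sid>] <portal> <iqn> (<label>)"
        if len(fields) >= 5 and fields[3] == target_iqn and fields[-1] == "(flash)":
            matches.append(line)
    return matches


sample = """\
qla4xxx: [1] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)
qla4xxx: [2] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)
"""

iqn = "iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002"
print(len(flash_sessions(sample, iqn)))
```

An empty result some time after a login action would then hint at a failed login on the HBA side.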

Finally, existing flash nodes can be deleted with the -o delete option:
```
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o delete
```

The records from both the chap and the flashnode table are stored in the HBA's flash memory. For the limits on how many entries can be stored in each table, see the specification of the particular HBA.

In my opinion, the integration of the management of the QLogic hardware initiators into the Open-iSCSI iscsiadm command improves and simplifies administration and configuration a lot compared to the previous IOCTL-based management method via the QLogic QConvergeConsole (qaucli). It finally opens management access to the QLogic hardware initiators to non-“enterprise Linux distributions” like Debian. Definitely a big step in the right direction! The importance of using a current version of Open-iSCSI cannot, in my experience, be stressed enough. Building and maintaining a package based on a current version from the project's GitHub repository is definitely worth the effort.

One thing I couldn't get to work, though, was the incoming part of a bi-directional CHAP authentication. In this scenario, not only does the initiator authenticate itself at the target for an iSCSI session to be successfully established, the target also has to authenticate itself against the initiator. My initial thought was that a setup with bi-directional CHAP authentication should be easily accomplished by just performing the following three steps:

1. Creating an incoming authentication record with the parameters host.auth.username_in and host.auth.password_in set to the respective values configured at the target storage system.
2. Setting the value of the flash node parameter flashnode.session.bidi_chap_en to 1.
3. Setting the value of the flash node parameter flashnode.session.chap_in_idx to the value of the parameter host.auth.tbl_idx, gathered from the incoming authentication record newly created in step 1.

The first two of the above tasks were indeed easily accomplished. The third one seemed easy too, but turned out to be more of a challenge. Setting the flash node parameter flashnode.session.chap_in_idx to the value of the host.auth.tbl_idx parameter from the previously created incoming authentication record just didn't work. Any attempt to change the default value 65535 failed. Neither was the flashnode.session.username_in/flashnode.session.password_in parameter pair automatically updated with the values from the parameters host.auth.username_in and host.auth.password_in. Oddly enough bi-directional CHAP authentication worked as long as there only was one storage system with one set of incoming authentication credentials! Adding another set of flash nodes for a second storage system with its own set of incoming authentication credentials would cause the bi-directional CHAP authentication to fail for all targets on this second storage system.

Being unable to debug this weird behaviour any further on my own, I turned to the Open-iSCSI mailing list for help. See the thread on the Open-iSCSI mailing list for the details. Don't be confused by the fact that the thread was initially about getting network communication to work with jumbo frames. That initial issue was resolved for me by switching to a more recent Open-iSCSI version, as already mentioned above. My last question there was not answered publicly on the mailing list, but Adheer Chandravanshi from development at QLogic got in touch via email. Here's his explanation of the observed behaviour:

[…]

I see that you have added multiple incoming CHAP entries for both hosts 1 and 2.
But looks like even in case of multiple incoming CHAP entries in flash only the first entry takes effect for that particular host for all the target node entries in flash.
This seems to be a limitation with the flash target node entries.
So you cannot map different incoming CHAP entry for different target nodes in HBA flash.

In your case, only following incoming CHAP entry will be effective for Host 1 as it's the first incoming chap entry. Same goes for Host 2.
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 2
host.auth.username_in = <username-from-target1>
host.auth.password_in = <password-from-target1>
# END RECORD

Try using only one incoming CHAP entry per host, you can have different outgoing CHAP entries though for each flash target node.

[…]

To sum this up, the QLogic HBAs basically use a first-match approach when it comes to the incoming part of a bi-directional CHAP authentication. After finding the first incoming authentication record that is configured, the HBA uses the credentials stored there. Any other, and possibly more suitable, records for incoming authentication are ignored. There's also no way to override this behaviour on a case-by-case basis via the flash node entries (i.e. iSCSI targets).
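The consequence of this first-match behaviour can be illustrated with a toy model. This is purely a sketch of the behaviour as described in the email above, not of the actual firmware logic; the record contents and the function name `effective_incoming_chap` are made up.

```python
def effective_incoming_chap(chap_table):
    """Return the first record with incoming credentials (first-match model)."""
    for record in chap_table:
        if record.get("host.auth.username_in"):
            return record
    return None


# Hypothetical CHAP table of one host with two incoming records
chap_table = [
    {"host.auth.tbl_idx": 1, "host.auth.username": "testuser"},
    {"host.auth.tbl_idx": 2, "host.auth.username_in": "target1-user"},
    {"host.auth.tbl_idx": 3, "host.auth.username_in": "target2-user"},
]

# Regardless of which target a flash node points to, record 2 wins
print(effective_incoming_chap(chap_table)["host.auth.tbl_idx"])
```

In this model the record with index 3 is never consulted, which matches the observed failure of bi-directional CHAP for the second storage system.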

In my humble opinion, this is a rather serious limitation of the security features in the QLogic hardware initiators. No security policy I have ever encountered in any organisation would allow for the reuse of authentication credentials across different systems. Unfortunately, I have no further information as to why the implementation turned out this way. Maybe there was no feature request for this yet, maybe it was just an oversight, or maybe there is a limitation in the hardware preventing a more flexible implementation. Unfortunately, my reply to the above email, inquiring whether such a feature might be implemented in future firmware versions, has, up to now, not been answered.

2015-11-13 written by Frank Fegert

2015-10-01 // Integration of Dell PowerConnect M-Series Switches with RANCID

An – almost – out of the box integration of Dell PowerConnect M-Series switches (specifically M6348 and M8024-k) with version 3.2 of the popular, open source switch and router configuration management tool RANCID.

RANCID has, in the previous version 2.3, already been able to integrate Dell PowerConnect switches through the use of the custom dlogin and drancid script addons. There are already several howtos on the net on how to use those addons, as well as slightly tweaked versions of them.

With version 3.2 of RANCID, either built from source or installed pre-packaged, e.g. from Debian testing (stretch), everything has become much more straightforward. The only modification needed now is a small patch to the script /usr/lib/rancid/bin/srancid, adapting its output postprocessing part to those particular switch models. This modification is necessary in order to remove constantly changing output (in this case uptime and temperature information) from the commands show version and show system, which are issued by srancid for those particular switch models. For the purpose of clarification, here is sample output from show version:

switch1# show version

System Description................ Dell Ethernet Switch
System Up Time.................... 90 days, 04h:48m:41s
System Contact.................... <contact email>
System Name....................... <system name>
System Location................... <location>
Burned In MAC Address............. F8B1.566E.4AFB
System Object ID.................. 1.3.6.1.4.1.674.10895.3041
System Model ID................... PCM8024-k
Machine Type...................... PowerConnect M8024-k

unit image1      image2      current-active next-active
---- ----------- ----------- -------------- --------------
1    5.1.3.7     5.1.8.2     image2         image2
2    5.1.3.7     5.1.8.2     image2         image2

and show system:

switch1# show system

System Description: Dell Ethernet Switch
System Up Time: 90 days, 04h:48m:19s
System Contact: <contact email>
System Name: <system name>
System Location: <location>
Burned In MAC Address: F8B1.566E.4AFB
System Object ID: 1.3.6.1.4.1.674.10895.3041
System Model ID: PCM8024-k
Machine Type: PowerConnect M8024-k
Temperature Sensors:

Unit     Description       Temperature    Status
                            (Celsius)
----     -----------       -----------    ------
1        System            39             Good
2        System            39             Good

Power Supplies:

Unit  Description    Status

----  -----------  -----------
NA        NA            NA
NA        NA            NA

Both commands were issued on a stacked pair of Dell PowerConnect M8024-k switches. Without the patch to srancid, the lines starting with System Up Time as well as the lines for each temperature sensor unit would trigger a configuration change in RANCID on every run, even if there was no real change in configuration. The provided patch adds handling for those ever-changing values and removes them from the configuration which is stored in the RCS used by RANCID.
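The filtering idea behind the patch can be sketched independently of RANCID. The actual patch is written in Perl and works inside srancid; the following Python fragment merely illustrates the same principle on the show system sample, and the names `TEMP_ROW` and `strip_volatile` are invented for it.

```python
import re

# A temperature row: unit number, description, temperature reading, status
TEMP_ROW = re.compile(r"^\s*(\d+)\s+(\w+)\s+\d+\s+(\S.*)$")


def strip_volatile(lines):
    """Drop uptime lines and remove the temperature reading from sensor rows."""
    cleaned = []
    for line in lines:
        if re.search(r"up time", line, re.IGNORECASE):
            continue  # uptime differs on every run
        row = TEMP_ROW.match(line)
        if row:
            # keep unit, description and status, drop the reading itself
            cleaned.append("\t".join(row.groups()))
            continue
        cleaned.append(line)
    return cleaned


sample = [
    "System Description: Dell Ethernet Switch",
    "System Up Time: 90 days, 04h:48m:19s",
    "1        System            39             Good",
    "2        System            39             Good",
]

result = strip_volatile(sample)
print(result)
```

What remains after filtering only changes when the sensor status changes, which is information worth tracking.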

Aside from this, the only pitfall with Dell PowerConnect switch models is following your intuition and using the RANCID device type dell (see man router.db). This device type is intended for D-Link switches OEMed by Dell and will not work with Dell PowerConnect switch models. The correct device type for Dell PowerConnect switch models is smc.

For the sake of completeness, here is a full step-by-step configuration example for Dell PowerConnect M6348 and M8024-k switches:

Add the Debian testing (stretch) package sources to the APT configuration:

root@host:~$ echo 'deb http://ftp.debian.org:80/debian testing main contrib non-free' >> /etc/apt/sources.list.d/testing.list
root@host:~$ apt-get update

Install RANCID v3.2.x from Debian testing (stretch):
```
root@host:~$ apt-get -y install rancid/testing
```
Optional: In case Subversion should be used as a revision control system (RCS) to store the switch configuration, install it:
```
root@host:~$ apt-get -y install subversion
```

Download and apply the patch to the script /usr/lib/rancid/bin/srancid for proper handling of the system information and configuration output of Dell PowerConnect M6348 and M8024-k switches:

srancid.patch

--- /usr/lib/rancid/bin/srancid.orig	2015-05-07 00:00:19.000000000 +0200
+++ /usr/lib/rancid/bin/srancid	2015-09-24 16:02:23.379156524 +0200
@@ -49,6 +49,8 @@
 #
 # Code tested and working fine on these models:
 #
+#	DELL PowerConnect M8024 / M8024-k
+#	DELL PowerConnect M6348
 #	DELL PowerConnect 62xx
 #	DELL 34xx (partially; configuration is incomplete)
 #
@@ -174,6 +176,7 @@
 sub Dir {
     print STDERR "    In Dir: $_" if ($debug);
 
+	ProcessHistory("COMMENTS","keysort","D1","!Directory contents:\n");
     while (<INPUT>) {
 	s/^\s+\015//g;
 	tr/\015//d;
@@ -184,6 +187,7 @@
 
 	ProcessHistory("COMMENTS","keysort","D1","! $_");
     }
+	ProcessHistory("COMMENTS","keysort","D1","! \n");
     return(0);
 }
 
@@ -198,6 +202,9 @@
 	# pager remnants like: ^H^H^H    ^H^H^H content
 	s/[\b]+\s*[\b]*//g;
 
+	# Remove Uptime
+	/ up time/i && next;
+
 	ProcessHistory("COMMENTS","keysort","B1","! $_");
     }
     return(0);
@@ -218,9 +225,12 @@
 	/ up time/i && next;
 
 	# filter temperature sensor info for Dell 6428 stacks
-	/Temperature Sensors:/ && next;
+	/Temperature Sensors:/ &&
+        ProcessHistory("COMMENTS","keysort","C1","\n! $_") &&
+        next;
 	if (/Temperature \(Celsius\)/ &&
-	    ProcessHistory("COMMENTS","keysort","C1","! Unit\tStatus\n")) {
+	    ProcessHistory("COMMENTS","keysort","C1","! Unit\tStatus\n") &&
+	    ProcessHistory("COMMENTS","keysort","C1","! ----\t------\n")) {
 	    while (<INPUT>) {
 		s/^\s+\015//g;
 		tr/\015//d;
@@ -228,6 +238,26 @@
 		ProcessHistory("COMMENTS","keysort","C1","! $1\t$2\n");
 		/^\s*$/ && last;
 	    }
+    } 
+    # Filter temperature sensor info for Dell M6348 and M8024 blade switches
+    #
+    # M6348 and M8024 sample lines:
+    #   Unit     Description       Temperature    Status
+    #                               (Celsius)
+    #   ----     -----------       -----------    ------
+    #   1        System            39             Good
+    #   2        System            39             Good
+    elsif (/Temperature/ &&
+	    ProcessHistory("COMMENTS","keysort","C1","! Unit\tDescription\tStatus\n") &&
+	    ProcessHistory("COMMENTS","keysort","C1","! ----\t-----------\t------\n")) {
+	    while (<INPUT>) {
+		/\(celsius\)/i && next;
+		s/^\s+\015//g;
+		tr/\015//d;
+		/(\d+)\s+(\w+)\s+\d+\s+(.*)$/ &&
+		ProcessHistory("COMMENTS","keysort","C1","! $1\t$2\t\t$3\n");
+		/^\s*$/ && last;
+	    }
 	}
 
 	/system description: (.*)/i &&
@@ -242,6 +272,7 @@
 sub ShowVlan {
     print STDERR "    In ShowVlan: $_" if ($debug);
 
+	ProcessHistory("COMMENTS","keysort","D1","!VLAN definitions:\n");
     while (<INPUT>) {
 	s/^\s+\015//g;
 	tr/\015//d;
@@ -254,6 +285,7 @@
 	/ up time/i && next;
 	ProcessHistory("COMMENTS","keysort","D1","! $_");
     }
+	ProcessHistory("COMMENTS","keysort","D1","! \n");
     return(0);
 }

root@host:~$ patch /usr/lib/rancid/bin/srancid < srancid.patch

Edit the global RANCID configuration:
```
root@host:~$ vi /etc/rancid/rancid.conf
```
Select the RCS (CVS, SVN or Git) of your choice. In this example SVN is used:
```
RCSSYS=svn; export RCSSYS
```
Define a name for your Dell PowerConnect device group in the LIST_OF_GROUPS configuration variable. In this example we'll use the name dell-sw:
```
LIST_OF_GROUPS="dell-sw"; export LIST_OF_GROUPS
```
Create the cloginrc configuration file, which will contain the login information for your Dell PowerConnect devices and some default values:
```
root@host:~$ touch /etc/rancid/cloginrc
root@host:~$ chmod 660 /etc/rancid/cloginrc
root@host:~$ chown root:rancid /etc/rancid/cloginrc
root@host:~$ vi /etc/rancid/cloginrc
```
Example:
```
add user        dell-switch-1           dell-user
add password    dell-switch-1           <login-password> <enable-password>
add noenable    dell-switch-1           0

[...]

add user        *                       <default-user>
add password    *                       <default-login-password>
add noenable    *                       1
add method      *                       ssh
```
For the device named dell-switch-1, log in as user dell-user with the password <login-password>. Only for this system, change into the enable mode of the switch (add noenable dell-switch-1 0) and use the password <enable-password> to do so.

For all other systems, log in as user <default-user> with the password <default-login-password>. Deactivate the use of the enable mode on all of those systems (add noenable * 1). The login method for all systems is SSH.

Since the cloginrc configuration file is parsed in a first-match fashion, the default values must always be at the bottom of the file.
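Why the order matters can be shown with a small model of first-match lookup. This is only an illustration of the semantics described above, using Python's fnmatch for the glob patterns; the real parsing is done by RANCID's own login scripts, and the function name `lookup` as well as the placeholder values are made up.

```python
from fnmatch import fnmatch


def lookup(entries, directive, hostname):
    """Return the values of the first entry matching directive and hostname."""
    for entry_directive, pattern, values in entries:
        if entry_directive == directive and fnmatch(hostname, pattern):
            return values
    return None


entries = [
    ("user", "dell-switch-1", ["dell-user"]),
    ("noenable", "dell-switch-1", ["0"]),
    ("user", "*", ["default-user"]),
    ("noenable", "*", ["1"]),
    ("method", "*", ["ssh"]),
]

print(lookup(entries, "user", "dell-switch-1"))
print(lookup(entries, "user", "some-other-switch"))
# With the defaults first, the wildcard entry shadows the specific one
print(lookup(list(reversed(entries)), "user", "dell-switch-1"))
```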
Determine the name of the device type. See man router.db and /etc/rancid/rancid.types.base. In this example and in the general case of Dell PowerConnect M6348 and M8024-k switches the name of the device type is smc:
```
root@host:~$ grep smc /etc/rancid/rancid.types.base
smc;script;srancid
smc;login;hlogin
```
From this we can also determine the login script hlogin and the postprocessing script srancid, which will be used for this device type.

Change to the user rancid:

root@host:~$ su - rancid

Create a symbolic link to the login configuration previously created in /etc/rancid/:
```
rancid@host:~$ ln -s /etc/rancid/cloginrc /var/lib/rancid/.cloginrc
```

Initialize the directory structure for the RCS (CVS, SVN or Git) selected above. This will automatically be done for each device group configured in the LIST_OF_GROUPS configuration variable. The example shown here only creates the directory structure for the device group dell-sw defined above:

rancid@host:~$ /usr/lib/rancid/bin/rancid-cvs
Committed revision 1.
Checked out revision 1.
Updating '.':
At revision 1.
A         configs
Adding         configs

Committed revision 2.
A         router.db
Adding         router.db
Transmitting file data .
Committed revision 3.

rancid@host:~$ find /var/lib/rancid/dell-sw/
/var/lib/rancid/dell-sw
/var/lib/rancid/dell-sw/configs
/var/lib/rancid/dell-sw/router.db
/var/lib/rancid/dell-sw/routers.all
/var/lib/rancid/dell-sw/routers.down
/var/lib/rancid/dell-sw/routers.up
/var/lib/rancid/dell-sw/.svn
/var/lib/rancid/dell-sw/.svn/entries
/var/lib/rancid/dell-sw/.svn/format
/var/lib/rancid/dell-sw/.svn/pristine
/var/lib/rancid/dell-sw/.svn/pristine/da
/var/lib/rancid/dell-sw/.svn/pristine/da/da39a3ee5e6b4b0d3255bfef95601890afd80709.svn-base
/var/lib/rancid/dell-sw/.svn/tmp
/var/lib/rancid/dell-sw/.svn/wc.db

Add switch devices by their hostname to the configuration file router.db of the corresponding device group:
```
rancid@host:~$ vi /var/lib/rancid/dell-sw/router.db
```
In this example the device group dell-sw, the device type smc and the system dell-switch-1:
```
dell-switch-1;smc;up;A comment describing the system dell-switch-1
```
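A quick sanity check of router.db can catch the device type pitfall mentioned earlier. The sketch below assumes the semicolon-separated layout of the example line and that lines starting with # are comments; the function name `parse_router_db` is hypothetical.

```python
def parse_router_db(text):
    """Parse 'name;type;state;comment' lines into a list of dicts."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split(";", 3)
        if len(fields) < 3:
            continue  # not a complete device entry
        devices.append({
            "name": fields[0],
            "type": fields[1],
            "state": fields[2],
            "comment": fields[3] if len(fields) > 3 else "",
        })
    return devices


sample = "dell-switch-1;smc;up;A comment describing the system dell-switch-1\n"
devices = parse_router_db(sample)

# Flag entries using the device type that does not work for PowerConnect models
suspicious = [d["name"] for d in devices if d["type"] == "dell"]
print(devices[0]["type"], suspicious)
```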

Perform a login test with the previously determined login script hlogin on the newly defined system dell-switch-1. The following example output shows the steps that should automatically be performed by the hlogin expect script. No manual intervention should be necessary.

rancid@host:~$ /usr/lib/rancid/bin/hlogin dell-switch-1
dell-switch-1
spawn ssh -c 3des -x -l root dell-switch-1
The authenticity of host 'dell-switch-1 (<ip address>)' can't be established.
RSA key fingerprint is <rsa key fingerprint>
Are you sure you want to continue connecting (yes/no)?
Host dell-switch-1 added to the list of known hosts.
yes
Warning: Permanently added 'dell-switch-1,<ip address>' (RSA) to the list of known hosts.
root@dell-switch-1's password:

dell-switch-1>enable
Password:**********

dell-switch-1#

Finish the login test by manually logging out of the system:
```
dell-switch-1#exit
dell-switch-1>exit
rancid@host:~$ 
```
Manually perform an initial RANCID run to make sure everything works as expected:
```
rancid@host:~$ rancid-run
```
If everything ran successfully, there should now be a file /var/lib/rancid/dell-sw/configs/dell-switch-1 containing the output of the commands show version, show system and show running-config for the system dell-switch-1.

Create the email aliases necessary for the proper delivery of the emails generated by RANCID. Again in this example for the device group dell-sw:
```
root@host:~$ vi /etc/aliases
```
```
rancid-dell-sw:       <email>@<domain>
rancid-admin-dell-sw: <email>@<domain>
```
Recreate your aliases DB. In case postfix is used as an MTA:
```
root@host:~$ postalias /etc/aliases
```
Enable the RANCID cron jobs. Adjust the execution times and intervals according to your needs:
```
root@host:~$ vi /etc/cron.d/rancid
```

Some final words: The contents of the directories /var/lib/rancid/<device group>/ and /var/lib/rancid/<device group>/configs/ are maintained in the RCS – CVS, SVN or Git – of your choice. You can operate on those directories with the usual commands of the selected RCS. There are also some really nice and intuitive web frontends to the RCS of choice. For me, the combination of SVN as RCS and WebSVN as a web frontend worked out very well.

2015-10-01 written by Frank Fegert

2015-08-15 // Dell EqualLogic PS Series - Security

An initial setup is always a good opportunity to take a more in-depth look at systems while they are not yet in production. In this case I'm looking at one of the advertised security measures of Dell EqualLogic PS Series systems.

We're currently in the process of implementing iSCSI-based Dell EqualLogic PS Series systems. Specifically, we're using PS-M4110E and PS-M4110X models in several Dell M1000e blade chassis. The EqualLogic storage systems and the hosts they provide storage space for are connected via PowerConnect M8024-k switches. The switches are located in slots B1 and B2 of the blade chassis and are dedicated to iSCSI traffic over 10 Gigabit Ethernet.

Although the separation of SAN and LAN could be considered enough security, I was really pleased to read in the Dell EqualLogic Group Manager Administrator's Manual - PS Series Firmware Version 7.0 (110-6152-EN-R1) that the folks at EqualLogic thought otherwise and added further security measures in order to protect the EqualLogic PS Series systems from unauthorized access to their management functions over the iSCSI network. A quote from page 76 of said document reads:

[…]

About Dedicated Management Networks

For increased security, or if your environment requires the separation of management traffic and iSCSI traffic, you can configure a dedicated management network (DMN) that is used only for administrative access to the group. The management network is separate from the network that handles iSCSI traffic to the group.

- Without a dedicated management network (the default configuration), administrators connect to the group IP address for both administrative access to the group and iSCSI initiator access to iSCSI targets (volumes and snapshots).
- With a dedicated management network, administrators do not use the group IP address for administrative access to the group. Instead, administrators connect to the management network address. All iSCSI traffic, including traffic by replication partners, and access to Dell EqualLogic Auto-Snapshot Manager/Linux Edition (ASM/LE), Dell EqualLogic Auto-Snapshot Manager/Microsoft Edition (ASM/ME), and Dell EqualLogic Virtual Storage Manager for VMware (formerly ASM/VE), continues to use the group IP address. SAN Headquarters can connect to the group using either the management network address or the iSCSI address.

[…]

And further on page 78 of the same document:

[…]

When you complete the management network configuration, administrators cannot log in to the group using the group IP address. Instead, administrators must use the new management IP address. Any open GUI or CLI sessions using the group IP address eventually time out and close.
After configuring a dedicated management network, you might need to:
- Inform administrators of the new management network IP address.
- If you run the Group manager GUI as a standalone application and have a shortcut on the computer's desktop, the group address in the shortcut is not updated with the new management address. You must uninstall and then reinstall the GUI application.
- If you are running SAN Headquarters, you must update the group IP address in the application to the dedicated management address. For more information, see the SAN Headquarters documentation.

[…]

Judging from those two sections it would appear that the EqualLogic PS Series systems have no or at least a very small attack surface on the – potentially untrustworthy – host-facing network dedicated to iSCSI traffic. In open systems this is usually achieved by binding the services which are necessary for the management function to a specific network interface, instead of letting them listen on all available interfaces.
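The difference between the two approaches is easy to demonstrate with plain sockets. The following is a generic sketch and says nothing about how the EqualLogic firmware is actually implemented; the loopback address stands in for a dedicated management interface, and the helper name `listen_on` is made up.

```python
import socket


def listen_on(address, port=0):
    """Open a listening TCP socket bound to the given local address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((address, port))  # port 0 lets the kernel pick a free port
    sock.listen(1)
    return sock


if __name__ == "__main__":
    # Reachable only via the one interface owning this address
    restricted = listen_on("127.0.0.1")
    # Reachable via every interface of the host
    everywhere = listen_on("0.0.0.0")
    print(restricted.getsockname()[0], everywhere.getsockname()[0])
    restricted.close()
    everywhere.close()
```

A port scan against any other address of the host would only find the second listener.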

In our case, using a DMN and thus separating the management traffic from the iSCSI traffic resulted in the following configuration:

group1-grp(member_group1)> eth show
Name ifType          ifSpeed    Mtu  Ipaddress                     Status Errors DCB
---- --------------- ---------- ---- ----------------------------- ------ ------ ------
eth0 ethernet-csmacd 10 Gbps    9000 10.0.0.1                      up     0      off
eth1 ethernet-csmacd 100 Mbps   1500 123.123.123.123               up     0      off

group1-grp(member_group1)> eth select 0 show
_______________________________ Eth Information _______________________________
Name: eth0
Status: up
Changed: Mon Jul 20 12:23:15 2015
Type: ethernet-csmacd
DesiredStatus: up
Mtu: 9000
Speed: 10 Gbps
HardwareAddress: B0:83:FE:CC:52:C1
IPAddress: 10.0.0.1
NetMask: 255.255.0.0
IPv6Address:
Description: group1 iSCSI Interface
SupportsManagement: no
ManagementStatus: normal
DCB: off

group1-grp(member_group1)> eth select 1 show
_______________________________ Eth Information _______________________________
Name: eth1
Status: up
Changed: Wed Jul 29 08:33:35 2015
Type: ethernet-csmacd
DesiredStatus: up
Mtu: 1500
Speed: 100 Mbps
HardwareAddress: B0:83:FE:CC:52:C2
IPAddress: 123.123.123.123
NetMask: 255.255.255.0
IPv6Address:
Description: group1 Management Interface
SupportsManagement: only
ManagementStatus: mgmt
DCB: off

Here, the lines with the SupportsManagement options could be construed to mean that management access is not possible over the eth0 interface, which in this configuration is the connection to the host-facing iSCSI network.

Having learned from previous experience not to be overly trusting of lofty vendor promises, I decided to double-check this with the nmap network and port scanner. The results of the TCP and UDP port scans against both the member and the group IP address are shown below.

Member IP - TCP scan:

root@host:~$ nmap -sS -p 0-65535 10.0.0.1

Starting Nmap 6.00 ( http://nmap.org ) at 2015-07-15 11:14 CEST
Nmap scan report for ******** (10.0.0.1)
Host is up (0.000097s latency).
Not shown: 9991 closed ports
PORT     STATE SERVICE
21/tcp   open  ftp
22/tcp   open  ssh
80/tcp   open  http
443/tcp  open  https
2606/tcp open  netmon
3002/tcp open  exlm-agent
3003/tcp open  cgms
3260/tcp open  iscsi
9876/tcp open  sd
20002/tcp open  commtact-http
20003/tcp open  unknown
25555/tcp open  unknown
MAC Address: B0:83:FE:CC:52:C1 (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 107.66 seconds

Member IP - UDP scan:

root@host:~$ nmap -sU -p 0-65535 10.0.0.1

Starting Nmap 6.00 ( http://nmap.org ) at 2015-07-20 09:27 CEST
Nmap scan report for ******** (10.0.0.1)
Host is up (0.000086s latency).
Not shown: 65532 closed ports
PORT      STATE         SERVICE
0/udp     open|filtered unknown
123/udp   open          ntp
161/udp   open          snmp
65519/udp open|filtered unknown
MAC Address: B0:83:FE:CC:52:C1 (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 2.55 seconds

Group IP - TCP scan:

root@host:~$ nmap -sS -p 0-65535 10.0.0.2

Starting Nmap 6.00 ( http://nmap.org ) at 2015-07-15 11:17 CEST
Nmap scan report for ******** (10.0.0.2)
Host is up (0.00010s latency).
Not shown: 9991 closed ports
PORT     STATE SERVICE
21/tcp   open  ftp
22/tcp   open  ssh
80/tcp   open  http
443/tcp  open  https
2606/tcp open  netmon
3002/tcp open  exlm-agent
3003/tcp open  cgms
3260/tcp open  iscsi
9876/tcp open  sd
20002/tcp open  commtact-http
20003/tcp open  unknown
25555/tcp open  unknown
MAC Address: B0:83:FE:CC:52:C1 (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 103.71 seconds

Group IP - UDP scan:

root@host:~$ nmap -sU -p 0-65535 10.0.0.2

Starting Nmap 6.00 ( http://nmap.org ) at 2015-07-20 09:28 CEST
Nmap scan report for ******** (10.0.0.2)
Host is up (0.00013s latency).
Not shown: 65532 closed ports
PORT      STATE         SERVICE
0/udp     open|filtered unknown
123/udp   open          ntp
161/udp   open          snmp
65519/udp open|filtered unknown
MAC Address: B0:83:FE:CC:52:C1 (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 3.07 seconds

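These scan results are pretty disappointing with regard to the expected additional security measures mentioned above. The management services (FTP, SSH, HTTP, HTTPS and SNMP) all answer on the iSCSI interface, on both the member and the group IP address.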
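To make such a comparison repeatable, the nmap output can be filtered for ports that belong to management services. The following is only a minimal sketch: the set of ports counted as "management" and the function names are assumptions made for this illustration, and the sample text is a shortened, combined excerpt of the TCP and UDP scans shown above.

```python
import re

# Assumption for this sketch: ports regarded as management services.
MANAGEMENT_PORTS = {
    21: "ftp", 22: "ssh", 23: "telnet",
    80: "http", 161: "snmp", 443: "https",
}

# Matches nmap port table lines such as "22/tcp   open  ssh".
PORT_LINE = re.compile(r"^(\d+)/(tcp|udp)\s+(\S+)\s+(\S+)")


def open_ports(nmap_output):
    """Return a set of (port, protocol) tuples whose state is exactly 'open'."""
    found = set()
    for line in nmap_output.splitlines():
        match = PORT_LINE.match(line.strip())
        # "open|filtered" is ambiguous and is deliberately not counted.
        if match and match.group(3) == "open":
            found.add((int(match.group(1)), match.group(2)))
    return found


def exposed_management(nmap_output):
    """Return the sorted open ports that are listed in MANAGEMENT_PORTS."""
    return sorted(p for p in open_ports(nmap_output) if p[0] in MANAGEMENT_PORTS)


# Shortened, combined excerpt of the TCP and UDP scan results.
SAMPLE = """\
PORT     STATE SERVICE
21/tcp   open  ftp
22/tcp   open  ssh
80/tcp   open  http
443/tcp  open  https
3260/tcp open  iscsi
0/udp     open|filtered unknown
161/udp   open          snmp
"""

if __name__ == "__main__":
    for port, proto in exposed_management(SAMPLE):
        name = MANAGEMENT_PORTS[port]
        print(f"{port}/{proto} ({name}) is reachable on the iSCSI interface")
```

The iSCSI port 3260/tcp is expected to be open on this interface and is therefore not part of the assumed management set.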

An actual login test via SSH and FTP was successful, and I guess a login via Telnet would have been successful too if the protocol hadn't already been disabled in our configuration. This means there is also no filter on the application layer preventing access to the management functions. Such a filter would not have shown up in the simple port scans above, since they only report whether a port accepts connections.
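A step between a port scan and a full login test is to read the greeting that a service sends right after the TCP connection is established. SSH and FTP servers both announce themselves this way. The helper below is a hypothetical sketch, not a tool used in the original test, and a received banner only shows that the service answers, not that a login would succeed.

```python
import socket


def read_banner(host, port, timeout=3.0):
    """Connect via TCP and return the first data the service sends.

    Returns None if the connection fails or nothing arrives in time.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = conn.recv(256)
    except OSError:
        # Covers refused connections as well as timeouts.
        return None
    return data.decode("ascii", errors="replace").strip()


# Example usage against the member IP on the iSCSI network (addresses
# taken from the configuration above, not executed here):
#   read_banner("10.0.0.1", 22)   # SSH identification string, if any
#   read_banner("10.0.0.1", 21)   # FTP greeting, if any
```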

As described earlier, it shouldn't be too hard to bind the services which are necessary for the management function to a specific network interface. I can't help but wonder why EqualLogic didn't follow through with this, especially since it's already described in the product manual. Maybe there were too many feature requests from the customer side or even technical requirements to better integrate the EqualLogic PS Series systems with certain host systems. Even then, I wonder why EqualLogic didn't at least provide a configuration parameter to selectively enable or disable access to the management function from the iSCSI network. In any case, the situation at hand, describing one thing and implementing another, seems to be the least favorable one to me.
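To illustrate what binding a service to a specific interface means at the socket level, here is a minimal generic sketch. It says nothing about how the EqualLogic firmware is actually implemented. The function name is made up for this example, and 127.0.0.1 merely stands in for the address of a management interface.

```python
import socket


def listen_on(address, port=0):
    """Create a TCP listener bound to exactly one local address.

    Binding to a concrete address instead of the wildcard "0.0.0.0"
    means connections arriving for other local addresses are not accepted.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((address, port))  # port 0 lets the OS pick a free port
    sock.listen(1)
    return sock


if __name__ == "__main__":
    # Stand-in for the management interface address (assumption).
    mgmt = listen_on("127.0.0.1")
    print("management listener bound to", mgmt.getsockname())
    mgmt.close()
```

A service started this way on the management address only would not show up in a scan of the iSCSI address, which is the behaviour one would expect from the SupportsManagement settings.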

2015-08-15 written by Frank Fegert