bityard Blog

// Check_MK Monitoring - Open-iSCSI

The Open-iSCSI project provides a high-performance, transport-independent implementation of RFC 3720 iSCSI for Linux. It allows remote access to SCSI targets via TCP/IP over several different transport technologies. This article introduces a new Check_MK service check to monitor the status of Open-iSCSI sessions, as well as several statistical metrics of Open-iSCSI sessions and iSCSI hardware initiator hosts.

For the impatient and as a TL;DR, here is the Check_MK package of the Open-iSCSI monitoring checks:

Open-iSCSI monitoring checks (Compatible with Check_MK versions 1.2.8 and later)

The sources can be found in my Check_MK repository on GitHub.


The Check_MK service check to monitor Open-iSCSI consists of two major parts: an agent plugin and three check plugins.

The first part, a Check_MK agent plugin named open-iscsi, is a simple Bash shell script. It calls the Open-iSCSI administration tool iscsiadm in order to retrieve a list of currently active iSCSI sessions. The exact call to iscsiadm to retrieve the session list is:

/usr/bin/iscsiadm -m session -P 1

If there are any active iSCSI sessions, the open-iscsi agent plugin also tries to collect several statistics for each iSCSI session. This is done by another call to iscsiadm for each iSCSI session ${SID}, as shown in the following example:

/usr/bin/iscsiadm -m session -r ${SID} -s

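Taken together, the core of the agent plugin can be sketched roughly as follows. This is a simplified sketch and not the actual plugin source; in particular, the extraction of the session IDs assumes the usual "<transport>: [<SID>] <portal> <target IQN>" line format of iscsiadm -m session:

#!/bin/bash
# Simplified sketch of the agent plugin logic (not the actual plugin
# source). The sed expression assumes the session ID is printed in
# square brackets by "iscsiadm -m session".
echo '<<<open-iscsi_sessions>>>'
/usr/bin/iscsiadm -m session -P 1
echo '<<<open-iscsi_session_stats>>>'
for SID in $(/usr/bin/iscsiadm -m session 2>/dev/null | sed -e 's/^.*\[\([0-9]\+\)\].*$/\1/'); do
    /usr/bin/iscsiadm -m session -r ${SID} -s
done

The real agent plugin additionally reformats the output of both calls into the section format shown further below.
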
Unfortunately, iSCSI session statistics are currently only supported for Open-iSCSI software initiators or dependent hardware iSCSI initiators like the Broadcom BCM577xx or BCM578xx adapters, which are covered by the bnx2i kernel module. See Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell and Backporting Open-iSCSI to Debian 8 "Jessie" for additional information on those dependent hardware iSCSI initiators.

For hardware iSCSI initiators, like the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs, which provide a full iSCSI offload engine (iSOE) implementation in the adapter's firmware, there is currently no support for iSCSI session statistics. Instead, the open-iscsi agent plugin collects several global statistics on each iSOE host ${HST} covered by the qla4xxx kernel module, with the command shown in the following example:

/usr/bin/iscsiadm -m host -H ${HST} -C stats

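The enumeration of the iSOE hosts can be sketched in a similar fashion; again, this is only a sketch, which assumes that iscsiadm -m host prints the host number in square brackets at the beginning of each line:

echo '<<<open-iscsi_host_stats>>>'
for HST in $(/usr/bin/iscsiadm -m host 2>/dev/null | grep '^qla4xxx' | sed -e 's/^.*\[\([0-9]\+\)\].*$/\1/'); do
    /usr/bin/iscsiadm -m host -H ${HST} -C stats
done
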
The output of the above commands is parsed and reformatted by the agent plugin for easier processing in the check plugins. The following example shows the agent plugin output for a system with two BCM578xx dependent hardware iSCSI initiators:

<<<open-iscsi_sessions>>>
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:bf:2d eth2 10.0.3.52 LOGGED_IN LOGGED_IN NO_CHANGE
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:c2:34 eth3 10.0.3.53 LOGGED_IN LOGGED_IN NO_CHANGE

<<<open-iscsi_session_stats>>>
[session stats f8:ca:b8:7d:bf:2d iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 40960
rxdata_octets: 461171313
noptx_pdus: 0
scsicmd_pdus: 153967
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 153967
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 112420
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0

[session stats f8:ca:b8:7d:c2:34 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 16384
rxdata_octets: 255666052
noptx_pdus: 0
scsicmd_pdus: 84312
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 84312
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 62418
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0

The next example shows the agent plugin output for a system with two QLogic 8200 Series hardware iSCSI initiators:

<<<open-iscsi_sessions>>>
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7d.ipv4.0 none 10.0.3.50 LOGGED_IN Unknown Unknown
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7e.ipv4.0 none 10.0.3.51 LOGGED_IN Unknown Unknown

<<<open-iscsi_host_stats>>>
[host stats f8:ca:b8:7d:c1:7d iqn.2000-04.com.qlogic:isp8214.000e1e3574ac.4]
mactx_frames: 563454
mactx_bytes: 52389948
mactx_multicast_frames: 877513
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573455
macrx_bytes: 440845678
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508160
iptx_bytes: 29474232
iptx_fragments: 0
iprx_packets: 401785
iprx_bytes: 354673156
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508160
tcptx_bytes: 19310736
tcprx_segments: 401785
tcprx_byte: 346637456
tcp_duplicate_ack_retx: 1
tcp_retx_timer_expired: 1
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106449
tcptx_pure_ack: 106489
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 695915
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401697
iscsi_data_bytes_tx: 29225
iscsi_pdu_rx: 401697
iscsi_data_bytes_rx: 327355963
iscsi_io_completed: 101
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0

[host stats f8:ca:b8:7d:c1:7e iqn.2000-04.com.qlogic:isp8214.000e1e3574ad.5]
mactx_frames: 563608
mactx_bytes: 52411412
mactx_multicast_frames: 877517
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573572
macrx_bytes: 441630442
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508310
iptx_bytes: 29490504
iptx_fragments: 0
iprx_packets: 401925
iprx_bytes: 355436636
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508310
tcptx_bytes: 19323952
tcprx_segments: 401925
tcprx_byte: 347398136
tcp_duplicate_ack_retx: 2
tcp_retx_timer_expired: 4
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106466
tcptx_pure_ack: 106543
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 696035
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401787
iscsi_data_bytes_tx: 37970
iscsi_pdu_rx: 401791
iscsi_data_bytes_rx: 328112050
iscsi_io_completed: 127
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0

Although a simple Bash shell script, the agent plugin open-iscsi has several dependencies which need to be installed in order for it to work properly: the commands iscsiadm, sed, tr and egrep. On Debian-based systems, the necessary packages can be installed with the following command:

root@host:~# apt-get install coreutils grep open-iscsi sed

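Like any other Check_MK agent plugin, open-iscsi must be placed in the plugin directory of the Check_MK agent – usually /usr/lib/check_mk_agent/plugins/ on Debian systems – and be made executable. A quick test run of the agent should then show the two new sections:

root@host:~# cp open-iscsi /usr/lib/check_mk_agent/plugins/
root@host:~# chmod 755 /usr/lib/check_mk_agent/plugins/open-iscsi
root@host:~# check_mk_agent | grep '^<<<open-iscsi'
<<<open-iscsi_sessions>>>
<<<open-iscsi_session_stats>>>
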
The second part of the Check_MK service check for Open-iSCSI provides the necessary check logic through individual inventory and check functions. This is implemented in the three Check_MK check plugins open-iscsi_sessions, open-iscsi_host_stats and open-iscsi_session_stats, which will be discussed separately in the following sections.

Open-iSCSI Session Status

The check plugin open-iscsi_sessions is responsible for the monitoring of individual iSCSI sessions and their internal session states. Upon inventory, this check plugin creates a service check for each pair of iSCSI network interface name and IQN of the iSCSI target volume. Unlike the iSCSI session ID, which changes over time (e.g. after iSCSI logout and login), this pair uniquely identifies an iSCSI session on a host. During normal check execution, the list of currently active iSCSI sessions on a host is compared to the list of active iSCSI sessions gathered during inventory on that host. If a session is missing or if the session has an erroneous internal state, an alarm is raised accordingly.

For all types of initiators – software, dependent hardware and hardware – there is the state session_state, which can take on the following values:

ISCSI_STATE_FREE
ISCSI_STATE_LOGGED_IN
ISCSI_STATE_FAILED
ISCSI_STATE_TERMINATE
ISCSI_STATE_IN_RECOVERY
ISCSI_STATE_RECOVERY_FAILED
ISCSI_STATE_LOGGING_OUT

An alarm is raised if the session is in any state other than ISCSI_STATE_LOGGED_IN. For software and dependent hardware initiators there are two additional states – connection_state and internal_state. The state connection_state can take on the values:

FREE
TRANSPORT WAIT
IN LOGIN
LOGGED IN
IN LOGOUT
LOGOUT REQUESTED
CLEANUP WAIT

and internal_state can take on the values:

NO CHANGE
CLEANUP       
REOPEN
REDIRECT

In addition to the above session_state check, an alarm is raised if the connection_state is in any state other than LOGGED IN or if the internal_state is in any state other than NO CHANGE.

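The three states can also be inspected manually with iscsiadm. The relevant lines in the verbose session listing look like the following example from a dependent hardware initiator:

user@host:~$ /usr/bin/iscsiadm -m session -P 1 | grep -i 'state'
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
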
No performance data is currently reported by this check.

Open-iSCSI Host Statistics

The check plugin open-iscsi_host_stats is responsible for the monitoring of the global statistics on an iSOE host. Upon inventory, this check plugin creates a service check for each pair of MAC address and iSCSI network interface name. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is determined for each inventorized item. If the rate of one of the statistics values is above the configured warning and critical threshold values, an alarm is raised accordingly. For all statistics, performance data is reported by the check.

With the additional WATO plugin open-iscsi_host_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSOE host statistics levels can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Storage, Filesystems and Files
         -> Open-iSCSI Host Statistics
            -> Create Rule in Folder ...
               -> The levels for the Open-iSCSI host statistics values
                  [x] The levels for the number of transmitted MAC/Layer2 frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 bytes on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 bytes on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 multicast frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 multicast frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 broadcast frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 broadcast frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 pause frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 pause frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 control frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 control frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 dropped frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 dropped frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host.
                   [x] The levels for the number of transmitted MAC/Layer2 excess deferral frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 abort frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 jumbo frames on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 late transmit collisions on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 single transmit collisions on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 multiple transmit collisions on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 collisions on an iSOE host.
                   [x] The levels for the number of received MAC/Layer2 unknown control frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 dribble on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 frame length errors on an iSOE host.
                  [x] The levels for the number of discarded received MAC/Layer2 frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 jabber on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 carrier sense errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 CRC errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 encoding errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 length too large errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 length too small errors on an iSOE host.
                  [x] The levels for the number of transmitted IP packets on an iSOE host.
                  [x] The levels for the number of received IP packets on an iSOE host.
                  [x] The levels for the number of transmitted IP bytes on an iSOE host.
                  [x] The levels for the number of received IP bytes on an iSOE host.
                  [x] The levels for the number of transmitted IP fragments on an iSOE host.
                  [x] The levels for the number of received IP fragments on an iSOE host.
                  [x] The levels for the number of IP datagram reassemblies on an iSOE host.
                  [x] The levels for the number of IP invalid address errors on an iSOE host.
                  [x] The levels for the number of IP packet errors on an iSOE host.
                  [x] The levels for the number of IP fragmentation overlaps on an iSOE host.
                  [x] The levels for the number of IP fragmentation out-of-order on an iSOE host.
                  [x] The levels for the number of IP datagram reassembly timeouts on an iSOE host.
                  [x] The levels for the number of transmitted IPv6 packets on an iSOE host.
                  [x] The levels for the number of received IPv6 packets on an iSOE host.
                  [x] The levels for the number of transmitted IPv6 bytes on an iSOE host.
                  [x] The levels for the number of received IPv6 bytes on an iSOE host.
                  [x] The levels for the number of transmitted IPv6 fragments on an iSOE host.
                  [x] The levels for the number of received IPv6 fragments on an iSOE host.
                  [x] The levels for the number of IPv6 datagram reassemblies on an iSOE host.
                  [x] The levels for the number of IPv6 invalid address errors on an iSOE host.
                  [x] The levels for the number of IPv6 packet errors on an iSOE host.
                  [x] The levels for the number of IPv6 fragmentation overlaps on an iSOE host.
                  [x] The levels for the number of IPv6 fragmentation out-of-order on an iSOE host.
                  [x] The levels for the number of IPv6 datagram reassembly timeouts on an iSOE host.
                  [x] The levels for the number of transmitted TCP segments on an iSOE host.
                  [x] The levels for the number of received TCP segments on an iSOE host.
                  [x] The levels for the number of transmitted TCP bytes on an iSOE host.
                  [x] The levels for the number of received TCP bytes on an iSOE host.
                  [x] The levels for the number of duplicate TCP ACK retransmits on an iSOE host.
                  [x] The levels for the number of received TCP retransmit timer expiries on an iSOE host.
                  [x] The levels for the number of received TCP duplicate ACKs on an iSOE host.
                  [x] The levels for the number of received TCP pure ACKs on an iSOE host.
                  [x] The levels for the number of transmitted TCP delayed ACKs on an iSOE host.
                  [x] The levels for the number of transmitted TCP pure ACKs on an iSOE host.
                  [x] The levels for the number of received TCP segment errors on an iSOE host.
                  [x] The levels for the number of received TCP segment out-of-order on an iSOE host.
                  [x] The levels for the number of received TCP window probe on an iSOE host.
                  [x] The levels for the number of received TCP window update on an iSOE host.
                  [x] The levels for the number of transmitted TCP window probe persist on an iSOE host.
                  [x] The levels for the number of transmitted iSCSI PDUs on an iSOE host.
                  [x] The levels for the number of received iSCSI PDUs on an iSOE host.
                  [x] The levels for the number of transmitted iSCSI Bytes on an iSOE host.
                  [x] The levels for the number of received iSCSI Bytes on an iSOE host.
                  [x] The levels for the number of iSCSI I/Os completed on an iSOE host.
                  [x] The levels for the number of iSCSI unexpected I/Os on an iSOE host.
                  [x] The levels for the number of iSCSI format errors on an iSOE host.
                  [x] The levels for the number of iSCSI header digest (CRC) errors on an iSOE host.
                  [x] The levels for the number of iSCSI data digest (CRC) errors on an iSOE host.
                  [x] The levels for the number of iSCSI sequence errors on an iSOE host.
                  [x] The levels for the number of ECC error corrections on an iSOE host.

The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_host_stats (iSCSI Host Stats) service checks over two QLogic 8200 Series hardware iSCSI initiators:

Status output example for open-iscsi_sessions and open-iscsi_host_stats service checks over QLogic 8200 Series hardware iSCSI initiators

This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state – in this example LOGGED_IN – is shown. There are also two iSCSI Host Stats service check items in the example, which are pairs of MAC addresses and iSCSI network interface names. For each of those items the current throughput rate on the MAC, IP/IPv6, TCP and iSCSI protocol layers is shown. The throughput rate on the MAC protocol layer is also visualized in the Perf-O-Meter, with received traffic growing from the middle to the left and transmitted traffic growing from the middle to the right.

The following three images show examples of the PNP4Nagios graphs for the open-iscsi_host_stats (iSCSI Host Stats) service check.

Example PNP4Nagios graphs for an open-iscsi_host_stats service check (MAC Frames, Traffic, MAC Errors)

The middle graph shows a combined view of the throughput rate for received and transmitted traffic on the different MAC, IP/IPv6, TCP and iSCSI protocol layers. The upper graph shows the throughput rate for various frame types on the MAC protocol layer. The lower graph shows the rate for various error frame types on the MAC protocol layer.

Example PNP4Nagios graphs for an open-iscsi_host_stats service check (IP Packets and Fragments, IP Errors, TCP Segments)

The upper graph shows the throughput rate for received and transmitted traffic on the IP/IPv6 protocol layer. The middle graph shows the rate for various error packet types on the IP/IPv6 protocol layer. The lower graph shows the throughput rate for received and transmitted traffic on the TCP protocol layer.

Example PNP4Nagios graphs for an open-iscsi_host_stats service check (TCP Errors, ECC Error Correction, iSCSI PDUs, iSCSI Errors)

The first graph shows the rate for various protocol control and error segment types on the TCP protocol layer. The second graph shows the rate of ECC error corrections that occurred on the QLogic 8200 Series hardware iSCSI initiator. The third graph shows the throughput rate for received and transmitted traffic on the iSCSI protocol layer. The fourth and last graph shows the rate for various control and error PDUs on the iSCSI protocol layer.

Open-iSCSI Session Statistics

The check plugin open-iscsi_session_stats is responsible for the monitoring of the statistics on individual iSCSI sessions. Upon inventory this check plugin creates a service check for each pair of MAC address of the network interface and IQN of the iSCSI target volume. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is collected for each inventorized item. If the rate of one of the statistics values is above the configured warning and critical threshold values, an alarm is raised accordingly. For all statistics, performance data is reported by the check.

With the additional WATO plugin open-iscsi_session_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSCSI session statistics levels can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Storage, Filesystems and Files
         -> Open-iSCSI Session Statistics
            -> Create Rule in Folder ...
               -> The levels for the Open-iSCSI session statistics values
                  [x] The levels for the number of transmitted bytes in an Open-iSCSI session
                  [x] The levels for the number of received bytes in an Open-iSCSI session
                  [x] The levels for the number of digest (CRC) errors in an Open-iSCSI session
                  [x] The levels for the number of timeout errors in an Open-iSCSI session
                  [x] The levels for the number of transmitted NOP commands in an Open-iSCSI session
                  [x] The levels for the number of received NOP commands in an Open-iSCSI session
                  [x] The levels for the number of transmitted SCSI command requests in an Open-iSCSI session
                   [x] The levels for the number of received SCSI command responses in an Open-iSCSI session
                  [x] The levels for the number of transmitted task management function commands in an Open-iSCSI session
                  [x] The levels for the number of received task management function responses in an Open-iSCSI session
                  [x] The levels for the number of transmitted login requests in an Open-iSCSI session
                  [x] The levels for the number of transmitted logout requests in an Open-iSCSI session
                  [x] The levels for the number of received logout responses in an Open-iSCSI session
                  [x] The levels for the number of transmitted text PDUs in an Open-iSCSI session
                  [x] The levels for the number of received text PDUs in an Open-iSCSI session
                  [x] The levels for the number of transmitted data PDUs in an Open-iSCSI session
                  [x] The levels for the number of received data PDUs in an Open-iSCSI session
                  [x] The levels for the number of transmitted single negative ACKs in an Open-iSCSI session
                  [x] The levels for the number of received ready to transfer PDUs in an Open-iSCSI session
                  [x] The levels for the number of received reject PDUs in an Open-iSCSI session
                  [x] The levels for the number of received asynchronous messages in an Open-iSCSI session

The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_session_stats (iSCSI Session Stats) service checks over two BCM578xx dependent hardware iSCSI initiators:

Status output example for open-iscsi_sessions and open-iscsi_session_stats service checks over BCM578xx dependent hardware iSCSI initiators

This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state, connection_state and internal_state – in this example with the respective values LOGGED_IN, LOGGED_IN and NO_CHANGE – are shown. There is also an equivalent number of iSCSI Session Stats service check items in the example, which are pairs of MAC addresses of the network interfaces and IQNs of the iSCSI target volumes. For each of those items the current throughput rate of the individual iSCSI session is shown. As long as the rate of the digest (CRC) and timeout error counters is zero, the string no protocol errors is displayed. Otherwise the name and throughput rate of any non-zero error counter is shown. The throughput rate of the iSCSI session is also visualized in the Perf-O-Meter, with received traffic growing from the middle to the left and transmitted traffic growing from the middle to the right.

The following image shows an example of the three PNP4Nagios graphs for a single open-iscsi_session_stats (iSCSI Session Stats) service check.

Example PNP4Nagios graphs for a single open-iscsi_session_stats service check

The upper graph shows the throughput rate for received and transmitted traffic of the iSCSI session. The middle graph shows the rate for received and transmitted iSCSI PDUs, broken down by the different types of PDUs on the iSCSI protocol layer. The lower graph shows the rate for the digest (CRC) and timeout errors on the iSCSI protocol layer.

The described Check_MK service check to monitor the status of Open-iSCSI sessions, Open-iSCSI session metrics and iSCSI hardware initiator host metrics has been verified to work with version 2.0.874-2~bpo8+1 of the open-iscsi package from the backports repository of Debian stable (Jessie) on the client side and the Check_MK versions 1.2.6 and 1.2.8 on the server side.

I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.

// Check_MK Monitoring - XFS Filesystem Quotas

XFS, like most other modern filesystems, offers the ability to configure and use disk quotas. Usually those quotas set limits on the amount of allocatable disk space or the number of filesystem objects. In most filesystems, those limits are bound to users or groups of users. XFS is one of the few filesystems which offer quota limits on a per-directory basis; in the case of XFS, directory quotas are implemented via projects. This article introduces a new Check_MK service check to monitor the use of XFS filesystem quotas, with a strong focus on directory-based quotas.

For the impatient and as a TL;DR, here is the Check_MK package of the XFS filesystem quota monitoring checks:

XFS filesystem quota monitoring checks (Compatible with Check_MK versions 1.2.6 and earlier)
XFS filesystem quota monitoring checks (Compatible with Check_MK versions 1.2.8 and later)

The sources can be found in my Check_MK repository on GitHub.


The Check_MK service check to monitor XFS filesystem quotas consists of two major parts: an agent plugin and a check plugin.

The Check_MK agent plugin named xfs_quota is a simple Bash shell script. It calls the XFS quota administration tool xfs_quota in order to retrieve a report of the current quota usage. The exact call to xfs_quota is:

/usr/sbin/xfs_quota -x -c 'report -p -b -i -a'

which currently reports only block and inode quotas of projects or directories on all available XFS filesystems. An example output of the above command is:

Project quota on /srv/xfs (/dev/mapper/vg00-xfs)
                               Blocks                                          Inodes                     
Project ID       Used       Soft       Hard    Warn/Grace           Used       Soft       Hard    Warn/ Grace     
---------- -------------------------------------------------- -------------------------------------------------- 
test1               0          0       1024     00 [--------]          1          0          0     00 [--------]
test2               0          0       2048     00 [--------]          1          0          0     00 [--------]
test3               0          0       3072     00 [--------]          1          0          0     00 [--------]

The output of the above command is parsed and reformatted by the agent plugin for easier processing in the check plugin. The above example output would thus be transformed into the agent plugin output shown in the following example:

<<<xfs_quota>>>
/srv/xfs:/dev/mapper/vg00-xfs:test1:0:0:1024:1:0:0
/srv/xfs:/dev/mapper/vg00-xfs:test2:0:0:2048:1:0:0
/srv/xfs:/dev/mapper/vg00-xfs:test3:0:0:3072:1:0:0

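For reference, a directory quota like the one of the project test1 shown above can be set up roughly as follows. This is only a sketch – the project ID 42 and the mountpoint /srv/xfs are arbitrary examples – and the XFS filesystem must be mounted with the prjquota mount option for project quotas to be enforced:

root@host:~# echo '42:/srv/xfs/test1' >> /etc/projects
root@host:~# echo 'test1:42' >> /etc/projid
root@host:~# xfs_quota -x -c 'project -s test1' /srv/xfs
root@host:~# xfs_quota -x -c 'limit -p bhard=1024k test1' /srv/xfs
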
Although a simple Bash shell script, the agent plugin xfs_quota has several dependencies which need to be installed in order for it to work properly: the commands xfs_quota, sed and egrep. On Debian-based systems, the necessary packages can be installed with the following command:

root@host:~# apt-get install xfsprogs sed grep

The second part, a Check_MK check plugin also named xfs_quota, provides the necessary inventory and check functions. Upon inventory it creates a service check for each pair of XFS filesystem mountpoint and quota project ID. During normal check execution, the number of used XFS filesystem blocks and inodes is determined for each inventorized item (pair of XFS filesystem mountpoint and quota project ID). If the hard and soft block and inode quotas for a particular item are all set to zero, no further checks are carried out and only performance data is reported by the check. If any of the hard or soft block or inode quotas is set to a non-zero value, the number of remaining free XFS filesystem blocks or inodes is compared to the warning and critical threshold values and an alarm is raised accordingly.

With the additional WATO plugin it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for blocks_hard and blocks_soft are zero (0) free blocks for both warning and critical thresholds. The default values for inodes_hard and inodes_soft are zero (0) free inodes for both warning and critical thresholds. The configuration options for the free block or inode levels can be found under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Storage, Filesystems and Files
         -> XFS Quota Utilization
            -> Create Rule in Folder ...
               -> The levels for the soft/hard block/inode quotas on XFS filesystems
                  [x] The levels for the hard block quotas on XFS filesystems
                  [x] The levels for the soft block quotas on XFS filesystems
                  [x] The levels for the hard inode quotas on XFS filesystems
                  [x] The levels for the soft inode quotas on XFS filesystems 

The following image shows a status output example for several xfs_quota service checks from the WATO WebUI:

Status output example for xfs_quota service checks

This example shows several service check items, which again are pairs of XFS filesystem mountpoints (here: /backup) and anonymized quota project IDs. For each item the number of blocks and inodes used is shown along with the appropriate hard and soft quota values. The number of blocks and inodes used is also visualized in the Perf-O-Meter, with blocks on a logarithmic scale growing from the middle to the left and inodes on a logarithmic scale growing from the middle to the right.

The following image shows an example of the two PNP4Nagios graphs for a single service check:

Example PNP4Nagios graph for a single xfs_quota service check

The upper graph shows the number of used XFS filesystem blocks for a pair of XFS filesystem mountpoint (here: /backup) and a – again anonymized – quota project ID. The lower graph shows the number of used inodes for the same pair. Both graphs show warning and critical threshold values, which are in this example at their default value of zero. If configured – like in the upper graph of the example – block or inode quotas are also shown as blue horizontal lines in the respective graphs.

The described Check_MK service check to monitor XFS filesystem quotas has been verified to work with version 3.2.1 of the xfsprogs package on Debian stable (Jessie) on the client side and the Check_MK versions 1.2.6 and 1.2.8 on the server side.

I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.

// Dell iDRAC and Limitation in Certificate Signing Request Field Lengths

Current versions of the Dell iDRAC have limitations with regard to the field lengths for the SSL CSR. These limitations violate RFC 5280 and lead to an invalid CSR and possibly an invalid SSL certificate.


The Dell iDRAC allows the protection of browser-based access via HTTPS. In order to facilitate this, SSL certificates are used. Initially, and by default, there is a self-signed SSL certificate installed on the iDRAC, which is signed by the Dell CA. For additional security this self-signed SSL certificate can be replaced with one provided by the system administrator. There are two ways to create one's own SSL certificate for the Dell iDRAC.

The first one is to use the iDRAC's integrated facility to create an SSL CSR. The appropriate form can be found in the iDRAC WebUI under:

iDRAC Settings → Network → SSL → Generate Certificate Signing Request (CSR) → Next

After filling in the fields Common Name, Organization Name, Organization Unit, Locality, State Name, Country Code and Email with the appropriate values and clicking on Generate, a CSR is generated in the background and after a short time offered for download. This CSR can now be fed into a CA – one's own or that of a service provider – in order to generate the actual certificate. The generated certificate is then uploaded into the iDRAC via the WebUI.

So far so easy, and done countless times without any issues. Until the day one of the values of the aforementioned fields Common Name, Organization Name, Organization Unit, Locality, State Name, Country Code or Email for the CSR changes to a significantly longer string. In my case the value of Organization Name changed from 11 characters to 60 characters. It appears that the developers of the iDRAC unfortunately didn't expect values of that length, although those are absolutely in conformance with the appropriate standard RFC 5280 and well within the upper bounds mentioned there:

[...]
--  specifications of Upper Bounds MUST be regarded as mandatory
--  from Annex B of ITU-T X.411 Reference Definition of MTS Parameter
--  Upper Bounds

-- Upper Bounds
ub-name INTEGER ::= 32768
ub-common-name INTEGER ::= 64
ub-locality-name INTEGER ::= 128
ub-state-name INTEGER ::= 128
ub-organization-name INTEGER ::= 64
ub-organizational-unit-name INTEGER ::= 64
[...]
ub-emailaddress-length INTEGER ::= 255
ub-common-name-length INTEGER ::= 64
ub-country-name-alpha-length INTEGER ::= 2
[...]

-- Note - upper bounds on string types, such as TeletexString, are
-- measured in characters.  Excepting PrintableString or IA5String, a
-- significantly greater number of octets will be required to hold
-- such a value.  As a minimum, 16 octets, or twice the specified
-- upper bound, whichever is the larger, should be allowed for
-- TeletexString.  For UTF8String or UniversalString at least four
-- times the upper bound should be allowed.
[...]

Even worse, the iDRAC doesn't issue an error or a warning if it's given an exceedingly long value. On the contrary, it silently accepts the longer value and just cuts off everything after the 48th or 51st character.

I've checked various versions of the Dell iDRAC and encountered the issue on at least the following Dell PowerEdge hardware and iDRAC software versions:

Host   Model  iDRAC  iDRAC Version
host1  R815   6      1.95 (Build 05)
host2  R815   6      2.85 (Build 04)
host3  M915   6      3.75 (Build 5)
host4  M620   7      2.21.21.21
host5  M630   8      2.30.30.30
host6  M630   8      2.41.40.40

Here are some examples of the Subject: from the CSRs generated on different Dell PowerEdge hardware and iDRAC software versions:

user@host:~$ for FL in *.csr; do echo -n "${FL}: "; openssl req -text -in ${FL} -noout | grep "Subject: "; done
host1.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host2.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host3.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, FQDN/emailAddress=EMAIL
host4.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host5.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host6.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL

The values in the different fields have been anonymized with the characters of the field name, but have otherwise been preserved in the length they appeared in the CSR. The observed behaviour is mostly consistent, with only one exception where the field value is strangely cut off after the 51st character.

Needless to say, cropping the field values like this results in an invalid CSR, which in turn leads to issues in the certificate generation process at the CA.

After raising a support case with Dell, the problem was verified and the issue was acknowledged as a defect. A future iDRAC update is supposed to fix this issue. Unfortunately I have no information on which version that will be, nor when it will be released.

In the meantime, until there is a proper fix for this issue, the second way can be used: manually creating all three components – the SSL private key, the CSR and the certificate – for the Dell iDRAC. Unfortunately this involves significantly more steps, which are described below.

  • If necessary, install some additional software packages. For Debian-based systems:

    root@host:~# apt-get install openssl wsmancli rpl
    
  • In the following steps and commands, replace <FQDN> and <IP> with the FQDN and IP address of the Dell iDRAC where necessary.

  • Create an SSL private key:

    user@host:~$ openssl genrsa -out <FQDN>.key.pem 4096
    
  • Generate a CSR for the FQDN and IP address of the Dell iDRAC. Replace <C>, <ST>, <L>, <O>, <OU> and <EMAIL> with the country, state, locality, organization, organizational unit and a valid email address, respectively.

    user@host:~$ openssl req -new -sha256 -nodes -key <FQDN>.key.pem -out <FQDN>.csr.pem -subj '/C=<C>/ST=<ST>/L=<L>/O=<O>/OU=<OU>/CN=<FQDN>/emailAddress=<EMAIL>' -config <(
    cat <<-EOF
    [req]
    req_extensions = req_ext
    distinguished_name = dn
    [ dn ]
    [ req_ext ]
    subjectAltName = @alt_names
    [alt_names]
    DNS.1 = <FQDN>
    IP.1 = <IP>
    EOF
    )
    

    Additional DNS.X = <FQDN> or IP.X = <IP> lines can be added if necessary. Increment the value of X accordingly.

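    The resulting CSR – including the full, untruncated field values and the subjectAltName extension – can be double-checked before it is submitted to the CA:

    user@host:~$ openssl req -text -noout -in <FQDN>.csr.pem | grep -E 'Subject:|DNS:|IP Address:'
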
  • With the generated CSR, request a certificate from your own CA or the one of your service provider. The range of different CAs and organizational processes involved is just too broad to cover this step in greater detail.

  • Merge the received certificate and the private key which was generated in one of the previous steps:

    user@host:~$ cat <FQDN>.cert.pem <FQDN>.key.pem > <FQDN>.certkey.pem
    
  • Convert the combined certificate and the private key into the PKCS #12 format:

    user@host:~$ openssl pkcs12 -export -in <FQDN>.certkey.pem -out <FQDN>.certkey.pfx
    

    This command will request a password at the command line, which will be needed again later on.

  • Convert the PKCS #12 file into a base64-encoded file:

    user@host:~$ openssl base64 -in <FQDN>.certkey.pfx -out <FQDN>.certkey.p12
    
  • Prepare the base64-encoded PKCS #12 file for the upload to the Dell iDRAC over its SOAP interface:

    user@host:~$ echo '<p:SetCertificateAndPrivateKey_INPUT xmlns:p="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_LCService"><p:Type>server</p:Type><p:PKCS12>' > <FQDN>.certkey.xml
    user@host:~$ cat <FQDN>.certkey.p12 >> <FQDN>.certkey.xml
    user@host:~$ echo '</p:PKCS12><p:PKCS12pin>PASSWORD_AT_PKCS12_EXPORT</p:PKCS12pin></p:SetCertificateAndPrivateKey_INPUT>' >> <FQDN>.certkey.xml
    

    Replace the string PASSWORD_AT_PKCS12_EXPORT with the password that was given in the previous step where the certificate was exported into the PKCS #12 format:

    user@host:~$ rpl PASSWORD_AT_PKCS12_EXPORT <PASSWORD> <FQDN>.certkey.xml
    
  • Upload the base64-encoded PKCS #12 file to the Dell iDRAC over its SOAP interface. Replace <IP> with the IP address of the Dell iDRAC and <USER> and <PASSWORD> with the appropriate credentials of a user allowed to access the Dell iDRAC:

    user@host:~$ wsman invoke -a SetCertificateAndPrivateKey http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_LCService?SystemCreationClassName=DCIM_ComputerSystem,CreationClassName=DCIM_LCService,SystemName=DCIM:ComputerSystem,Name=DCIM:LCService -h <IP> -P 443 -u <USER> -p <PASSWORD> -c dummy.cert -y basic -V -v -J <FQDN>.certkey.xml -j utf-8
    

The last two steps are optional and only necessary if you don't want to or cannot install the Dell OpenManage DRAC Tools, which include the racadm command, or if remote access via racadm is not or cannot be enabled.

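With the Dell OpenManage DRAC Tools installed and remote racadm access enabled, the upload can – to my knowledge – alternatively be done along the lines of the following sketch. Check the racadm reference of your particular iDRAC version before using it:

user@host:~$ racadm -r <IP> -u <USER> -p <PASSWORD> sslkeyupload -t 1 -f <FQDN>.key.pem
user@host:~$ racadm -r <IP> -u <USER> -p <PASSWORD> sslcertupload -t 1 -f <FQDN>.cert.pem
user@host:~$ racadm -r <IP> -u <USER> -p <PASSWORD> racreset
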
This workaround can also be applied generally and independently of the mentioned bug in the Dell iDRAC in order to create a valid SSL certificate for Dell iDRAC and Dell Blade CMC systems.

// Integration of Dell EqualLogic PS-Series Storages with RANCID

Adding support for Dell EqualLogic PS-Series storage arrays to version 3.5.1 of the popular, open source switch and router configuration management tool RANCID.

For the impatient and as a TL;DR, here are the extensions to RANCID for the management of Dell EqualLogic PS-Series storage arrays:

Login script for Dell EqualLogic PS-Series storage arrays
Perl module to generate, process and save the configuration of Dell EqualLogic PS-Series storage arrays

The sources can be found in my RANCID repository on GitHub.


RANCID has, in its current version 3.5.1, support for a large variety of network devices like routers, switches, load-balancers, etc. Unfortunately there is currently little or no support for the management of storage devices, even though a lot of them offer a command line interface which can be used by RANCID.

Although there are probably a couple of reasons for this, I suppose it is largely due to the fact that network and storage admins are – in most organizations – still in different groups, each with their own set of management and support tools. With RANCID originating from the realm of network administration, probably only few storage admins know about this very valuable tool to begin with. There is probably also very little movement from the position of network administrator into the area of storage administration, and thus only a limited amount of knowledge transfer between those two fields.

This blog post describes how to extend and configure RANCID in order to add support for Dell EqualLogic PS-Series storage arrays. The extensions are based on the – at the time of writing – current version 3.5.1 of RANCID. RANCID can either be built from source or installed pre-packaged, e.g. from the backports repository of Debian stable (jessie). Basically, the extension to RANCID consists of only the two files linked above.

Besides those two extensions, only a small change to /etc/rancid/rancid.types.conf, as well as the standard RANCID configuration of a new device group and device, is necessary. See the following full step-by-step configuration example for Dell EqualLogic PS-Series storage arrays:

  • Add the backports repository of Debian stable (jessie) to the APT configuration:

    root@host:~# echo 'deb http://http.debian.net/debian jessie-backports main non-free contrib' >> /etc/apt/sources.list.d/jessie-backports.list
    root@host:~# apt-get update
    
  • Install RANCID v3.5.1 from the backports repository of Debian stable (jessie):

    root@host:~# apt-get -y install rancid/jessie-backports
    
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    Selected version '3.5.1-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'rancid'
    The following extra packages will be installed:
      expect tcl-expect
    The following NEW packages will be installed:
      expect rancid tcl-expect
    0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
    Need to get 511 kB of archives.
    After this operation, 2,178 kB of additional disk space will be used.
    [...]
    

    Optional: In case Subversion is to be used as the revision control system (RCS) to store the device configurations, install it:

    root@host:~# apt-get -y install subversion
    
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    The following extra packages will be installed:
      libapr1 libaprutil1 libldap-2.4-2 libsasl2-2 libsasl2-modules libsasl2-modules-db libserf-1-1 libsvn1
    Suggested packages:
      libsasl2-modules-otp libsasl2-modules-ldap libsasl2-modules-sql libsasl2-modules-gssapi-mit libsasl2-modules-gssapi-heimdal subversion-tools db5.3-util patch
    The following NEW packages will be installed:
      libapr1 libaprutil1 libldap-2.4-2 libsasl2-2 libsasl2-modules libsasl2-modules-db libserf-1-1 libsvn1 subversion
    0 upgraded, 9 newly installed, 0 to remove and 0 not upgraded.
    Need to get 2,723 kB of archives.
    After this operation, 9,683 kB of additional disk space will be used.
    [...]
    
  • Download the login script for Dell EqualLogic PS-Series storage arrays and store it under the path /usr/lib/rancid/bin/.

  • Download the Perl module to process and save the configuration of Dell EqualLogic PS-Series storage arrays and store it under the path /usr/share/perl5/rancid/.

  • Edit the global RANCID configuration:

    root@host:~# vi /etc/rancid/rancid.conf
    

    Select the RCS (CVS, SVN or Git) of your choice. In this example SVN is used:

    RCSSYS=svn; export RCSSYS

    Define a name for your Dell EqualLogic device group in the LIST_OF_GROUPS configuration variable. In this example we'll use the name dell-storage:

    LIST_OF_GROUPS="dell-storage"; export LIST_OF_GROUPS
  • Create the cloginrc configuration file, which will contain the login information for your Dell EqualLogic PS-Series devices and some default values:

    root@host:~# touch /etc/rancid/cloginrc
    root@host:~# chmod 660 /etc/rancid/cloginrc
    root@host:~# chown root:rancid /etc/rancid/cloginrc
    root@host:~# vi /etc/rancid/cloginrc
    

    Example:

    add user        dell-eql-1      dell-user
    add password    dell-eql-1      <login-password>
    
    [...]
    
    add user        *               <default-user>
    add password    *               <default-login-password>
    add method      *               ssh

    For the device named dell-eql-1, login as user dell-user with the password <login-password>.

    For all other systems, login as user <default-user> with the password <default-login-password>. The login method for all systems is SSH.

    Since the cloginrc configuration file is parsed in a first-match fashion, the default values must always be at the bottom of the file.
  • Add a new device type for Dell EqualLogic PS-Series storage arrays to the RANCID configuration. See man router.db and /etc/rancid/rancid.types.conf. In this example, and in the general case of Dell EqualLogic PS-Series storage arrays, the name of the device type is equallogic:

    root@host:~# vi /etc/rancid/rancid.types.conf
    

    Here we set the login script to be used to the new eqllogin. The postprocessing script is set to rancid -t equallogic in order to call the new Perl module equallogic, which will do the actual processing. The command to be issued on the Dell EqualLogic device is set to save-config -verbose. The -verbose part is essential here, otherwise the configuration of the device will only be saved to a file on the Dell EqualLogic device and not be printed to the terminal:

    equallogic;login;eqllogin
    equallogic;script;rancid -t equallogic
    equallogic;module;equallogic
    equallogic;inloop;equallogic::inloop
    equallogic;command;equallogic::SaveConfiguration;save-config -verbose
  • Change to the user rancid:

    root@host:~# su - rancid
    
    • Create a symbolic link to the login configuration previously created in /etc/rancid/:

      rancid@host:~$ ln -s /etc/rancid/cloginrc /var/lib/rancid/.cloginrc
      
    • Initialize the directory structure for the RCS (CVS, SVN or Git) selected above. This will automatically be done for each device group configured in the LIST_OF_GROUPS configuration variable. The example shown here only creates the directory structure for the device group dell-storage defined above:

      rancid@host:~$ /usr/lib/rancid/bin/rancid-cvs
      Committed revision 1.
      Checked out revision 1.
      Updating '.':
      At revision 1.
      A         configs
      Adding         configs
      
      Committed revision 2.
      A         router.db
      Adding         router.db
      Transmitting file data .
      Committed revision 3.
      
      rancid@host:~$ find /var/lib/rancid/dell-storage/
      /var/lib/rancid/dell-storage
      /var/lib/rancid/dell-storage/configs
      /var/lib/rancid/dell-storage/router.db
      /var/lib/rancid/dell-storage/routers.all
      /var/lib/rancid/dell-storage/routers.down
      /var/lib/rancid/dell-storage/routers.up
      /var/lib/rancid/dell-storage/.svn
      /var/lib/rancid/dell-storage/.svn/entries
      /var/lib/rancid/dell-storage/.svn/format
      /var/lib/rancid/dell-storage/.svn/pristine
      /var/lib/rancid/dell-storage/.svn/pristine/da
      /var/lib/rancid/dell-storage/.svn/pristine/da/da39a3ee5e6b4b0d3255bfef95601890afd80709.svn-base
      /var/lib/rancid/dell-storage/.svn/tmp
      /var/lib/rancid/dell-storage/.svn/wc.db
      
    • Add Dell EqualLogic storage devices by their hostname to the configuration file router.db of the corresponding device group:

      rancid@host:~$ vi /var/lib/rancid/dell-storage/router.db
      

      In this example the device group dell-storage, the device type equallogic and the system dell-eql-1:

      dell-eql-1;equallogic;up;A comment describing the system dell-eql-1
    • Perform a login test with the previously configured new login script eqllogin for Dell EqualLogic devices on the newly defined system dell-eql-1. The following example output shows the steps that should automatically be performed by the eqllogin expect script. No manual intervention should be necessary.

      rancid@host:~$ /usr/lib/rancid/bin/eqllogin dell-eql-1
      spawn ssh -x -l grpadmin dell-eql-1
      The authenticity of host 'dell-eql-1 (<ip address>)' can't be established.
      RSA key fingerprint is <rsa key fingerprint>
      Are you sure you want to continue connecting (yes/no)?  yes
      Host dell-eql-1 added to the list of known hosts.
      Warning: Permanently added 'dell-eql-1,<ip address>' (RSA) to the list of known hosts.
      grpadmin@dell-eql-1's password: 
      Last login: Thu Dec  8 22:32:39 2016 from <ip address> on ttyp1
       
      
                 Welcome to Group Manager
      
              Copyright 2001-2015 Dell Inc.
      
      
      
      dell-eql-1-grp> 
      
    • Finish the login test by manually logging out of the system:

      dell-eql-1-grp> logout
      Do you really want to logout? (y/n) [n]y
      
      dell-eql-1-grp> Connection to dell-eql-1 closed.
      
    • Manually perform an initial RANCID run to make sure everything works as expected:

      rancid@host:~$ rancid-run
      

      If everything ran successfully, there should now be a file /var/lib/rancid/dell-storage/configs/dell-eql-1 containing the output of the command save-config -verbose for the system dell-eql-1.

  • Create the email aliases necessary for the proper delivery of the emails generated by RANCID. Again in this example for the device group dell-storage:

    root@host:~# vi /etc/aliases
    
    rancid-dell-storage:       <email>@<domain>
    rancid-admin-dell-storage: <email>@<domain>

    Recreate your aliases DB. In case Postfix is used as an MTA:

    root@host:~# postalias /etc/aliases
    
  • Enable the RANCID cron jobs. Adjust the execution times and intervals according to your needs:

    root@host:~# vi /etc/cron.d/rancid
    
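
    The Debian rancid package already ships a default cron file. A minimal schedule might look like the following sketch; the exact times and the log cleanup policy are a matter of taste:

    # Run RANCID for all device groups once an hour
    0 * * * *   rancid  /usr/lib/rancid/bin/rancid-run
    # Remove RANCID logs older than two days, once a day
    50 23 * * * rancid  /usr/bin/find /var/log/rancid -type f -mtime +2 -delete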

Some final words: The contents of the directories /var/lib/rancid/<device group>/ and /var/lib/rancid/<device group>/configs/ are maintained in the RCS – CVS, SVN or Git – of your choice. You can operate on those directories with the usual commands of the selected RCS. There are also some really nice and intuitive web frontends to the RCS of choice. For me, the combination of SVN as RCS and WebSVN as a web frontend worked out very well.

// Backporting Open-iSCSI to Debian 8 "Jessie"

Starting with the Debian open-iscsi release 2.0.874-1, which is now available in the backports repository for Debian 8 "Jessie", the manual backport of Open-iSCSI described below is no longer necessary.

The Debian Open-iSCSI package is now based on the current upstream version of Open-iSCSI. Open-iSCSI's iscsiuio is now provided through its own Debian package. Several improvements (Git commit d05fe0e1, Git commit 6004a7e7) have been made in handling hardware initiator based iSCSI sessions.

Thanks to Christian Seiler for his work on bringing the Debian Open-iSCSI package up to a current upstream version and for helping to sort out some issues related to the use of hardware initiators!

In the previous article Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell I mentioned using a backported version of Open-iSCSI on Debian 8 (“Jessie”). This new post describes the backport and the changes provided by it in greater detail. All the changes to the original Debian package from “unstable” (“Sid”) can be found in my Debian Open-iSCSI Packaging repository on GitHub.

Starting point was a clone of the Debian Open-iSCSI Packaging repository at Git commit df150d90. Mind, though, that in the meantime between creating the backport and writing this article, the Debian Open-iSCSI maintainers have been busy, and a more recent version of the Debian Open-iSCSI package from “unstable” (“Sid”) has become available.

Within this particular version of the Debian Open-iSCSI package, I first enabled the build of Open-iSCSI's iscsiuio. On the one hand, this ensured that the iscsiuio code would build successfully even at this old level of the Open-iSCSI code. On the other hand, it served as a differentiator for any issues surfacing later on, after the move to the more recent upstream Open-iSCSI sources: the root cause of such issues would then have to lie solely with the newer upstream version of Open-iSCSI. Some integration into the general system environment was also added at this point. In detail, the changes were:

  • Git commit 32c96e6c removes the Debian patch 05-disable-iscsiuio.patch which disables the build of iscsiuio.

  • Git commit 984344a1 enables the build of iscsiuio, extends the cleanup build targets and adds iscsiuio to the dh_systemd build targets.

  • Git commit 89d845a9 adds the results from the successful build – the iscsiuio binary, the iscsiuio manual page, a readme file and a logrotate configuration file – to the Debian package. It also adds the kernel modules bnx2i and cnic to the list of kernel modules to be loaded at installation time.

  • Git commit 89195bbe adds the systemd service and socket unit files for iscsiuio. Those files have been taken from this discussion on the Open-iSCSI mailing list and have been slightly altered. A minimal sketch of such a unit file pair is shown after this list.
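
For illustration only, a minimal sketch of what such a unit file pair for iscsiuio might look like. The binary path /sbin/iscsiuio, the abstract socket name and the dependency ordering are assumptions here; the actual unit files from the mailing list discussion differ in detail:

iscsiuio.socket
[Unit]
Description=Open-iSCSI iscsiuio Socket

[Socket]
# iscsid and iscsiuio communicate over an abstract namespace Unix socket (assumed name)
ListenStream=@ISCSID_UIP_ABSTRACT_NAMESPACE

[Install]
WantedBy=sockets.target

iscsiuio.service
[Unit]
Description=iSCSI UserSpace I/O driver
Requires=iscsid.service
Before=iscsid.service

[Service]
Type=simple
# run iscsiuio in the foreground so systemd can supervise the process
ExecStart=/sbin/iscsiuio -f

[Install]
WantedBy=multi-user.target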

With the above changes an intermediate package was built for testing purposes. During the following tests, sometimes all currently mounted filesystems – even those distinctly not based on iSCSI volumes – would suddenly be unmounted. For some filesystems this would succeed; for others, like e.g. the /var and the root filesystem, it would fail due to them being in use. The issue occurred particularly while stopping the open-iscsi service, either via its regular init script or via its systemd service, which is usually done at system shutdown or during the uninstallation of the Open-iSCSI package. Tracking down the root cause of this issue led to an unhandled case in the umountiscsi.sh script, which is called while stopping the open-iscsi service. Specifically, the following code section is responsible for the observed behaviour:

debian/extra/umountiscsi.sh
256    if [ $HAVE_LVM -eq 1 ] ; then
257        # Look for all LVM volume groups that have a backing store
258        # on any iSCSI device we found. Also, add $LVMGROUPS set in
259        # /etc/default/open-iscsi (for more complicated stacking
260        # configurations we don't automatically detect).
261        for _vg in $(cd /dev ; $PVS --noheadings -o vg_name $iscsi_disks $iscsi_partitions $iscsi_multipath_disks $iscsi_multipath_partitions 2>/dev/null) $LVMGROUPS ; do
262            add_to_set iscsi_lvm_vgs "$_vg"
263        done

The heuristics of the umountiscsi.sh script try to identify iSCSI based disk devices which are valid candidates for proper deactivation upon system shutdown. It turned out that in LVM based setups where there are currently no iSCSI based disk devices present, the variables $iscsi_disks, $iscsi_partitions, $iscsi_multipath_disks and $iscsi_multipath_partitions are left empty by the script's logic. In line 261 of the above code snippet, this leads to a call to the pvs --noheadings -o vg_name command without any additional arguments limiting its output of volume groups. Hence, the returned output is instead a complete list of all volume groups currently present on the system. Based on this list, the associated logical volumes for each volume group are determined and added to the list of devices to be unmounted. Finally, all devices in this list are actually unmounted.
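
This behaviour can easily be reproduced manually: called without any physical volume arguments, pvs simply reports one line for each physical volume present on the system. A brief sketch, assuming a hypothetical system with a single local volume group vg00 spanning two physical volumes:

root@host:~$ pvs --noheadings -o vg_name
  vg00
  vg00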

Without making too invasive changes to the script logic of umountiscsi.sh, a quick'n'dirty solution was to introduce a check before the call to pvs which determines whether the variables $iscsi_disks, $iscsi_partitions, $iscsi_multipath_disks and $iscsi_multipath_partitions are all empty. If this is the case, the call to pvs is simply skipped. The following patch shows the necessary code changes, which are also available in Git commit 5118af7f:

umountiscsi.sh.patch
diff --git a/debian/extra/umountiscsi.sh b/debian/extra/umountiscsi.sh
index 1206fa1..485069c 100755
--- a/debian/extra/umountiscsi.sh
+++ b/debian/extra/umountiscsi.sh
@@ -258,9 +258,11 @@ enumerate_iscsi_devices() {
                # on any iSCSI device we found. Also, add $LVMGROUPS set in
                # /etc/default/open-iscsi (for more complicated stacking
                # configurations we don't automatically detect).
-               for _vg in $(cd /dev ; $PVS --noheadings -o vg_name $iscsi_disks $iscsi_partitions $iscsi_multipath_disks $iscsi_multipath_partitions 2>/dev/null) $LVMGROUPS ; do
-                       add_to_set iscsi_lvm_vgs "$_vg"
-               done
+               if [ -n "$iscsi_disks" -o -n "$iscsi_partitions" -o -n "$iscsi_multipath_disks" -o -n "$iscsi_multipath_partitions" ]; then
+                   for _vg in $(cd /dev ; $PVS --noheadings -o vg_name $iscsi_disks $iscsi_partitions $iscsi_multipath_disks $iscsi_multipath_partitions 2>/dev/null) $LVMGROUPS ; do
+                           add_to_set iscsi_lvm_vgs "$_vg"
+                   done
+               fi
 
                # $iscsi_lvm_vgs is now unique list
                for _vg in $iscsi_lvm_vgs ; do

After this was fixed, the last step was to finally move to the more recent upstream Open-iSCSI sources. In detail the changes in this last step were:

  • Git commit f5ab51ff moves the code to version 2.0.873+git1.1dfb88a4, which is based upon the upstream Git commit 1dfb88a4. This is the last commit before the externalization of the Open-iSNS library. Since I didn't want to also backport the Open-iSNS packages from Debian “unstable” (“Sid”), I decided to just skip the next two upstream commits 76832662 and c6d1117b and stick with the locally delivered Open-iSNS library.

  • Git commit 8c1e6974 removes the local Debian patches 01_spelling-errors-and-manpage-hyphen-fixes.patch, 02_make-iscsistart-a-dynamic-binary.patch and 03_respect-build-flags.patch which have already been merged into the more recent upstream Open-iSCSI sources. The remaining local Debian patches were renamed and reordered to 01_fix_iscsi_path.patch, 02_var-lock_var-run_transition.patch and 03_makefile_reproducibility_issues.patch. A whole bunch of new patches named {04,05,06,07,08,09,10,11,12,13,14,15}_upstream_git_commit_<Git commit ID>.patch were added in order to bring the sources up to the – by then most recent – upstream Git commit 0fa43f29.

  • Git commit d051dece removes some files from the Debian package, which were dynamically generated during the build of iscsiuio.

  • Finally, Git commit 0fabb948 deals with the issue described in Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell.

With the steps and changes described above, a backported version of Open-iSCSI using its most recent sources was created as a package for Debian 8 (“Jessie”). This package also supports offloaded iSCSI connections via the Broadcom BCM577xx and BCM578xx iSOEs with the use of iscsiuio. The package has been in production use for over a month now, and no major issues – neither with the newer upstream Open-iSCSI sources, nor with the use of Broadcom BCM577xx and BCM578xx iSOEs through iscsiuio – have emerged so far.
