2017-04-12 // Check_MK Monitoring - Open-iSCSI
The Open-iSCSI project provides a high-performance, transport-independent implementation of RFC 3720 iSCSI for Linux. It allows remote access to SCSI targets via TCP/IP over several different transport technologies. This article introduces a new Check_MK service check that monitors the status of Open-iSCSI sessions as well as several statistical metrics of Open-iSCSI sessions and iSCSI hardware initiator hosts.
For the impatient (TL;DR), here is the Check_MK package of the Open-iSCSI monitoring checks:
Open-iSCSI monitoring checks (Compatible with Check_MK versions 1.2.8 and later)
The sources are to be found in my Check_MK repository on GitHub
The Check_MK service check to monitor Open-iSCSI consists of two major parts, an agent plugin and three check plugins.
The first part, a Check_MK agent plugin named open-iscsi, is a simple Bash shell script. It calls the Open-iSCSI administration tool iscsiadm in order to retrieve a list of currently active iSCSI sessions. The exact call to iscsiadm to retrieve the session list is:
/usr/bin/iscsiadm -m session -P 1
If there are any active iSCSI sessions, the open-iscsi agent plugin also tries to collect several statistics for each iSCSI session. This is done by another call to iscsiadm for each iSCSI session ${SID}, which is shown in the following example:
/usr/bin/iscsiadm -m session -r ${SID} -s
Unfortunately, iSCSI session statistics are currently only supported for Open-iSCSI software initiators and for dependent hardware iSCSI initiators such as the Broadcom BCM577xx or BCM578xx adapters, which are covered by the bnx2i kernel module. See Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell and Backporting Open-iSCSI to Debian 8 "Jessie" for additional information on those dependent hardware iSCSI initiators.
For hardware iSCSI initiators such as the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs, which provide a full iSCSI offload engine (iSOE) implementation in the adapter's firmware, there is currently no support for iSCSI session statistics. Instead, the open-iscsi agent plugin collects several global statistics on each iSOE host ${HST} that is covered by the qla4xxx kernel module, using the command shown in the following example:
/usr/bin/iscsiadm -m host -H ${HST} -C stats
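The choice between the two statistics commands can be sketched in Python as follows. This is only an illustration and not the plugin's code: the actual agent plugin is a Bash script, and the function names as well as the idea of keying the decision on the driver name are assumptions made for this example.

```python
# Illustrative sketch only; the real agent plugin is written in Bash.
# All function names below are hypothetical.
ISCSIADM = "/usr/bin/iscsiadm"


def session_list_command():
    """Command that lists the currently active iSCSI sessions."""
    return [ISCSIADM, "-m", "session", "-P", "1"]


def stats_command(driver, ident):
    """Return the statistics command for one session or one iSOE host.

    driver: driver/transport name, e.g. "tcp", "bnx2i" or "qla4xxx"
    ident:  session ID for session statistics, host number for qla4xxx
    """
    if driver == "qla4xxx":
        # Full offload HBAs only provide host-wide statistics.
        return [ISCSIADM, "-m", "host", "-H", str(ident), "-C", "stats"]
    # Software and dependent hardware initiators provide session statistics.
    return [ISCSIADM, "-m", "session", "-r", str(ident), "-s"]
```

The returned argument lists could be handed to `subprocess.check_output()` on a host where iscsiadm is installed.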
The output of the above commands is parsed and reformatted by the agent plugin for easier processing in the check plugins. The following example shows the agent plugin output for a system with two BCM578xx dependent hardware iSCSI initiators:
<<<open-iscsi_sessions>>>
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:bf:2d eth2 10.0.3.52 LOGGED_IN LOGGED_IN NO_CHANGE
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:c2:34 eth3 10.0.3.53 LOGGED_IN LOGGED_IN NO_CHANGE
<<<open-iscsi_session_stats>>>
[session stats f8:ca:b8:7d:bf:2d iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 40960
rxdata_octets: 461171313
noptx_pdus: 0
scsicmd_pdus: 153967
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 153967
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 112420
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0
[session stats f8:ca:b8:7d:c2:34 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 16384
rxdata_octets: 255666052
noptx_pdus: 0
scsicmd_pdus: 84312
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 84312
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 62418
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0
The next example shows the agent plugin output for a system with two QLogic 8200 Series hardware iSCSI initiators:
<<<open-iscsi_sessions>>>
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7d.ipv4.0 none 10.0.3.50 LOGGED_IN Unknown Unknown
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7e.ipv4.0 none 10.0.3.51 LOGGED_IN Unknown Unknown
<<<open-iscsi_host_stats>>>
[host stats f8:ca:b8:7d:c1:7d iqn.2000-04.com.qlogic:isp8214.000e1e3574ac.4]
mactx_frames: 563454
mactx_bytes: 52389948
mactx_multicast_frames: 877513
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573455
macrx_bytes: 440845678
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508160
iptx_bytes: 29474232
iptx_fragments: 0
iprx_packets: 401785
iprx_bytes: 354673156
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508160
tcptx_bytes: 19310736
tcprx_segments: 401785
tcprx_byte: 346637456
tcp_duplicate_ack_retx: 1
tcp_retx_timer_expired: 1
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106449
tcptx_pure_ack: 106489
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 695915
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401697
iscsi_data_bytes_tx: 29225
iscsi_pdu_rx: 401697
iscsi_data_bytes_rx: 327355963
iscsi_io_completed: 101
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0
[host stats f8:ca:b8:7d:c1:7e iqn.2000-04.com.qlogic:isp8214.000e1e3574ad.5]
mactx_frames: 563608
mactx_bytes: 52411412
mactx_multicast_frames: 877517
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573572
macrx_bytes: 441630442
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508310
iptx_bytes: 29490504
iptx_fragments: 0
iprx_packets: 401925
iprx_bytes: 355436636
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508310
tcptx_bytes: 19323952
tcprx_segments: 401925
tcprx_byte: 347398136
tcp_duplicate_ack_retx: 2
tcp_retx_timer_expired: 4
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106466
tcptx_pure_ack: 106543
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 696035
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401787
iscsi_data_bytes_tx: 37970
iscsi_pdu_rx: 401791
iscsi_data_bytes_rx: 328112050
iscsi_io_completed: 127
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0
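To illustrate how a check plugin might consume the statistics sections shown above, here is a minimal Python sketch. It assumes that the section arrives as a list of whitespace-split lines, with each counter on its own line, and that every block starts with a bracketed header carrying the MAC address and the IQN. The function name parse_stats and the returned data layout are hypothetical and not taken from the actual check plugins.

```python
def parse_stats(info):
    """Group 'name: value' lines under their '[... stats MAC IQN]' header.

    info: list of whitespace-split agent lines (assumed input format).
    Returns {(mac, iqn): {counter_name: int_value}}.
    """
    parsed = {}
    current = None
    for line in info:
        if line and line[0].startswith("["):
            # Header looks like: [session stats <MAC> <IQN>]
            mac = line[2]
            iqn = line[3].rstrip("]")
            current = parsed.setdefault((mac, iqn), {})
        elif current is not None and len(line) == 2 and line[0].endswith(":"):
            # Counter line looks like: txdata_octets: 40960
            current[line[0][:-1]] = int(line[1])
    return parsed
```

Keying the result on the MAC address and IQN pair mirrors the way the service check items are described in the following sections.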
Although a simple Bash shell script, the agent plugin open-iscsi has several dependencies which need to be installed in order for the agent plugin to work properly. Namely, those are the commands iscsiadm, sed, tr and egrep. On Debian-based systems, the necessary packages can be installed with the following command:
root@host:~# apt-get install coreutils grep open-iscsi sed
The second part of the Check_MK service check for Open-iSCSI provides the necessary check logic through individual inventory and check functions. This is implemented in the three Check_MK check plugins open-iscsi_sessions, open-iscsi_host_stats and open-iscsi_session_stats, which will be discussed separately in the following sections.
Open-iSCSI Session Status
The check plugin open-iscsi_sessions is responsible for the monitoring of individual iSCSI sessions and their internal session states. Upon inventory this check plugin creates a service check for each pair of iSCSI network interface name and IQN of the iSCSI target volume. Unlike the iSCSI session ID, which changes over time (e.g. after iSCSI logout and login), this pair uniquely identifies an iSCSI session on a host. During normal check execution, the list of currently active iSCSI sessions on a host is compared to the list of active iSCSI sessions gathered during inventory on that host. If a session is missing or if the session has an erroneous internal state, an alarm is raised accordingly.
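The comparison between inventoried and currently active sessions could look roughly like the following Python sketch. The field positions follow the example agent output shown earlier (driver, portal, IQN, iSCSI interface name, ...). The item format "interface IQN" and both function names are assumptions made for illustration only.

```python
def session_item(line):
    """Build a service item from one whitespace-split session line.

    Assumed field order: driver, portal, target IQN, iSCSI interface
    name, network device, IP address, followed by the state fields.
    """
    iqn, iface = line[2], line[3]
    return "%s %s" % (iface, iqn)


def find_missing(inventoried_items, info):
    """Return the inventoried items that have no active session."""
    active = set(session_item(line) for line in info)
    return sorted(item for item in inventoried_items if item not in active)
```

Because the item is built from the interface name and the IQN instead of the session ID, a session that logs out and back in maps to the same service.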
For all types of initiators – software, dependent hardware and hardware – there is the state session_state, which can take on the following values:
ISCSI_STATE_FREE
ISCSI_STATE_LOGGED_IN
ISCSI_STATE_FAILED
ISCSI_STATE_TERMINATE
ISCSI_STATE_IN_RECOVERY
ISCSI_STATE_RECOVERY_FAILED
ISCSI_STATE_LOGGING_OUT
An alarm is raised if the session is in any state other than ISCSI_STATE_LOGGED_IN. For software and dependent hardware initiators there are two additional states – connection_state and internal_state. The state connection_state can take on the values:
FREE
TRANSPORT WAIT
IN LOGIN
LOGGED IN
IN LOGOUT
LOGOUT REQUESTED
CLEANUP WAIT
and internal_state can take on the values:
NO CHANGE
CLEANUP
REOPEN
REDIRECT
In addition to the above session_state check, an alarm is raised if connection_state is in any state other than LOGGED IN, and likewise if internal_state is in any state other than NO CHANGE.
No performance data is currently reported by this check.
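The state evaluation described in this section can be sketched in Python as shown below. This is a simplified illustration under several assumptions: the state names use the underscore form seen in the agent output example (LOGGED_IN, NO_CHANGE), the value Unknown reported for hardware initiators is skipped, every problem is mapped to a critical state, and the function name evaluate_session is hypothetical.

```python
OK, CRIT = 0, 2


def evaluate_session(session_state, connection_state=None, internal_state=None):
    """Return (state, text) for one iSCSI session.

    Mapping every deviation to CRIT is an assumption of this sketch;
    the source text only says that an alarm is raised.
    """
    problems = []
    if session_state != "LOGGED_IN":
        problems.append("session_state is %s" % session_state)
    # Hardware initiators show "Unknown" for the two additional states
    # in the example output, so those values are not evaluated here.
    if connection_state not in (None, "Unknown", "LOGGED_IN"):
        problems.append("connection_state is %s" % connection_state)
    if internal_state not in (None, "Unknown", "NO_CHANGE"):
        problems.append("internal_state is %s" % internal_state)
    if problems:
        return CRIT, ", ".join(problems)
    return OK, "all session states are fine"
```

Each of the three states is evaluated independently, so a single deviating state is enough to raise the alarm.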
Open-iSCSI Hosts Statistics
The check plugin open-iscsi_host_stats is responsible for the monitoring of the global statistics on an iSOE host. Upon inventory this check plugin creates a service check for each pair of MAC address and iSCSI network interface name. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is determined for each inventoried item. If the rate of one of the statistics values is above the configured warning or critical threshold, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
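Since the agent delivers absolute counters while the check compares rates, the underlying arithmetic can be sketched in Python as follows. Both function names are hypothetical. Treating a level of zero as "no threshold set" is an assumption made for this sketch and is not confirmed by the plugin source.

```python
def counter_rate(prev_value, prev_time, cur_value, cur_time):
    """Return the per-second rate between two counter samples.

    Returns None when no usable interval exists or when the counter
    went backwards (e.g. after a reset).
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0 or cur_value < prev_value:
        return None
    return (cur_value - prev_value) / float(elapsed)


def level_state(rate, warn, crit):
    """Map a rate to 0 (OK), 1 (WARN) or 2 (CRIT).

    Assumption: a level of 0 means that no threshold is configured.
    """
    if crit > 0 and rate > crit:
        return 2
    if warn > 0 and rate > warn:
        return 1
    return 0
```

For example, a counter that grows from 100 to 160 within 60 seconds yields a rate of 1.0 units per second, which is then compared against the configured levels.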
With the additional WATO plugin open-iscsi_host_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSOE host statistics levels can be found in the WATO WebUI under:
-> Host & Service Parameters
-> Parameters for discovered services
-> Storage, Filesystems and Files
-> Open-iSCSI Host Statistics
-> Create Rule in Folder ...
-> The levels for the Open-iSCSI host statistics values
  [x] The levels for the number of transmitted MAC/Layer2 frames on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 frames on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 bytes on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 bytes on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 multicast frames on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 multicast frames on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 broadcast frames on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 broadcast frames on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 pause frames on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 pause frames on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 control frames on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 control frames on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 dropped frames on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 dropped frames on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 abort frames on an iSOE host.
  [x] The levels for the number of transmitted MAC/Layer2 jumbo frames on an iSOE host.
  [x] The levels for the number of MAC/Layer2 late transmit collisions on an iSOE host.
  [x] The levels for the number of MAC/Layer2 single transmit collisions on an iSOE host.
  [x] The levels for the number of MAC/Layer2 multiple transmit collisions on an iSOE host.
  [x] The levels for the number of MAC/Layer2 collisions on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 control frames on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 dribble on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 frame length errors on an iSOE host.
  [x] The levels for the number of discarded received MAC/Layer2 frames on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 jabber on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 carrier sense errors on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 CRC errors on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 encoding errors on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 length too large errors on an iSOE host.
  [x] The levels for the number of received MAC/Layer2 length too small errors on an iSOE host.
  [x] The levels for the number of transmitted IP packets on an iSOE host.
  [x] The levels for the number of received IP packets on an iSOE host.
  [x] The levels for the number of transmitted IP bytes on an iSOE host.
  [x] The levels for the number of received IP bytes on an iSOE host.
  [x] The levels for the number of transmitted IP fragments on an iSOE host.
  [x] The levels for the number of received IP fragments on an iSOE host.
  [x] The levels for the number of IP datagram reassemblies on an iSOE host.
  [x] The levels for the number of IP invalid address errors on an iSOE host.
  [x] The levels for the number of IP packet errors on an iSOE host.
  [x] The levels for the number of IP fragmentation overlaps on an iSOE host.
  [x] The levels for the number of IP fragmentation out-of-order on an iSOE host.
  [x] The levels for the number of IP datagram reassembly timeouts on an iSOE host.
  [x] The levels for the number of transmitted IPv6 packets on an iSOE host.
  [x] The levels for the number of received IPv6 packets on an iSOE host.
  [x] The levels for the number of transmitted IPv6 bytes on an iSOE host.
  [x] The levels for the number of received IPv6 bytes on an iSOE host.
  [x] The levels for the number of transmitted IPv6 fragments on an iSOE host.
  [x] The levels for the number of received IPv6 fragments on an iSOE host.
  [x] The levels for the number of IPv6 datagram reassemblies on an iSOE host.
  [x] The levels for the number of IPv6 invalid address errors on an iSOE host.
  [x] The levels for the number of IPv6 packet errors on an iSOE host.
  [x] The levels for the number of IPv6 fragmentation overlaps on an iSOE host.
  [x] The levels for the number of IPv6 fragmentation out-of-order on an iSOE host.
  [x] The levels for the number of IPv6 datagram reassembly timeouts on an iSOE host.
  [x] The levels for the number of transmitted TCP segments on an iSOE host.
  [x] The levels for the number of received TCP segments on an iSOE host.
  [x] The levels for the number of transmitted TCP bytes on an iSOE host.
  [x] The levels for the number of received TCP bytes on an iSOE host.
  [x] The levels for the number of duplicate TCP ACK retransmits on an iSOE host.
  [x] The levels for the number of received TCP retransmit timer expiries on an iSOE host.
  [x] The levels for the number of received TCP duplicate ACKs on an iSOE host.
  [x] The levels for the number of received TCP pure ACKs on an iSOE host.
  [x] The levels for the number of transmitted TCP delayed ACKs on an iSOE host.
  [x] The levels for the number of transmitted TCP pure ACKs on an iSOE host.
  [x] The levels for the number of received TCP segment errors on an iSOE host.
  [x] The levels for the number of received TCP segment out-of-order on an iSOE host.
  [x] The levels for the number of received TCP window probe on an iSOE host.
  [x] The levels for the number of received TCP window update on an iSOE host.
  [x] The levels for the number of transmitted TCP window probe persist on an iSOE host.
  [x] The levels for the number of transmitted iSCSI PDUs on an iSOE host.
  [x] The levels for the number of received iSCSI PDUs on an iSOE host.
  [x] The levels for the number of transmitted iSCSI Bytes on an iSOE host.
  [x] The levels for the number of received iSCSI Bytes on an iSOE host.
  [x] The levels for the number of iSCSI I/Os completed on an iSOE host.
  [x] The levels for the number of iSCSI unexpected I/Os on an iSOE host.
  [x] The levels for the number of iSCSI format errors on an iSOE host.
  [x] The levels for the number of iSCSI header digest (CRC) errors on an iSOE host.
  [x] The levels for the number of iSCSI data digest (CRC) errors on an iSOE host.
  [x] The levels for the number of iSCSI sequence errors on an iSOE host.
  [x] The levels for the number of ECC error corrections on an iSOE host.
The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_host_stats (iSCSI Host Stats) service checks over two QLogic 8200 Series hardware iSCSI initiators:
This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state – in this example LOGGED_IN – is shown. There are also two iSCSI Host Stats service check items in the example, which are pairs of MAC addresses and iSCSI network interface names. For each of those items the current throughput rate on the MAC, IP/IPv6, TCP and iSCSI protocol layer is shown. The throughput rate on the MAC protocol layer is also visualized in the Perf-O-Meter, received traffic growing from the middle to the left, transmitted traffic growing from the middle to the right.
The following three images show examples of the PNP4Nagios graphs for the open-iscsi_host_stats (iSCSI Host Stats) service check.
The middle graph shows a combined view of the throughput rate for received and transmitted traffic on the different MAC, IP/IPv6, TCP and iSCSI protocol layers. The upper graph shows the throughput rate for various frame types on the MAC protocol layer. The lower graph shows the rate for various error frame types on the MAC protocol layer.
The upper graph shows the throughput rate for received and transmitted traffic on the IP/IPv6 protocol layer. The middle graph shows the rate for various error packet types on the IP/IPv6 protocol layer. The lower graph shows the throughput rate for received and transmitted traffic on the TCP protocol layer.
The first graph shows the rate for various protocol control and error segment types on the TCP protocol layer. The second graph shows the rate of ECC error corrections that occurred on the QLogic 8200 Series hardware iSCSI initiator. The third graph shows the throughput rate for received and transmitted traffic on the iSCSI protocol layer. The fourth and last graph shows the rate for various control and error PDUs on the iSCSI protocol layer.
Open-iSCSI Session Statistics
The check plugin open-iscsi_session_stats is responsible for the monitoring of the statistics on individual iSCSI sessions. Upon inventory this check plugin creates a service check for each pair of MAC address of the network interface and IQN of the iSCSI target volume. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is collected for each inventoried item. If the rate of one of the statistics values is above the configured warning or critical threshold, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
With the additional WATO plugin open-iscsi_session_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSCSI session statistics levels can be found in the WATO WebUI under:
-> Host & Service Parameters
-> Parameters for discovered services
-> Storage, Filesystems and Files
-> Open-iSCSI Session Statistics
-> Create Rule in Folder ...
-> The levels for the Open-iSCSI session statistics values
  [x] The levels for the number of transmitted bytes in an Open-iSCSI session
  [x] The levels for the number of received bytes in an Open-iSCSI session
  [x] The levels for the number of digest (CRC) errors in an Open-iSCSI session
  [x] The levels for the number of timeout errors in an Open-iSCSI session
  [x] The levels for the number of transmitted NOP commands in an Open-iSCSI session
  [x] The levels for the number of received NOP commands in an Open-iSCSI session
  [x] The levels for the number of transmitted SCSI command requests in an Open-iSCSI session
  [x] The levels for the number of received SCSI command responses in an Open-iSCSI session
  [x] The levels for the number of transmitted task management function commands in an Open-iSCSI session
  [x] The levels for the number of received task management function responses in an Open-iSCSI session
  [x] The levels for the number of transmitted login requests in an Open-iSCSI session
  [x] The levels for the number of transmitted logout requests in an Open-iSCSI session
  [x] The levels for the number of received logout responses in an Open-iSCSI session
  [x] The levels for the number of transmitted text PDUs in an Open-iSCSI session
  [x] The levels for the number of received text PDUs in an Open-iSCSI session
  [x] The levels for the number of transmitted data PDUs in an Open-iSCSI session
  [x] The levels for the number of received data PDUs in an Open-iSCSI session
  [x] The levels for the number of transmitted single negative ACKs in an Open-iSCSI session
  [x] The levels for the number of received ready to transfer PDUs in an Open-iSCSI session
  [x] The levels for the number of received reject PDUs in an Open-iSCSI session
  [x] The levels for the number of received asynchronous messages in an Open-iSCSI session
The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_session_stats (iSCSI Session Stats) service checks over two BCM578xx dependent hardware iSCSI initiators:
This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state, connection_state and internal_state – in this example with the respective values LOGGED_IN, LOGGED_IN and NO_CHANGE – are shown. There is also an equal number of iSCSI Session Stats service check items in the example, which are pairs of MAC addresses of the network interfaces and IQNs of the iSCSI target volumes. For each of those items the current throughput rate of the individual iSCSI session is shown. As long as the rate of the digest (CRC) and timeout error counters is zero, the string no protocol errors is displayed. Otherwise the name and rate of any non-zero error counter is shown. The throughput rate of the iSCSI session is also visualized in the Perf-O-Meter, received traffic growing from the middle to the left, transmitted traffic growing from the middle to the right.
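The rule for the error part of the status text can be sketched in a few lines of Python. The counter names digest_err and timeout_err are taken from the agent output example, while the function name error_summary and the exact output format are assumptions made for illustration.

```python
ERROR_COUNTERS = ("digest_err", "timeout_err")


def error_summary(rates):
    """Build the error part of the status text from per-second rates.

    rates: {counter_name: rate}; missing counters are treated as 0.
    """
    errors = [
        "%s: %.2f/s" % (name, rates[name])
        for name in ERROR_COUNTERS
        if rates.get(name, 0.0) > 0
    ]
    if errors:
        return ", ".join(errors)
    return "no protocol errors"
```

With all error rates at zero the function returns the string no protocol errors, otherwise only the non-zero counters are listed.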
The following image shows an example of the three PNP4Nagios graphs for a single open-iscsi_session_stats (iSCSI Session Stats) service check.
The upper graph shows the throughput rate for received and transmitted traffic of the iSCSI session. The middle graph shows the rate for received and transmitted iSCSI PDUs, broken down by the different types of PDUs on the iSCSI protocol layer. The lower graph shows the rate for the digest (CRC) and timeout errors on the iSCSI protocol layer.
The described Check_MK service check to monitor the status of Open-iSCSI sessions, Open-iSCSI session metrics and iSCSI hardware initiator host metrics has been verified to work with version 2.0.874-2~bpo8+1 of the open-iscsi package from the backports repository of Debian stable (Jessie) on the client side and the Check_MK versions 1.2.6 and 1.2.8 on the server side.
I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.
2015-11-13 // QLogic iSCSI HBA and Limitations in Bi-Directional Authentication
In the past the QLogic QConvergeConsole (qaucli) was used as an administration tool for the hardware initiator part of the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs. Unfortunately, this tool was only supported on the so-called “enterprise Linux distributions” like RHEL and SLES. If you were running any other Linux distribution such as Debian, or even one of the BSD distributions, you were out of luck.
Thankfully, QLogic addressed this support issue indirectly, by first announcing and then actually moving from an IOCTL-based management method towards the Open-iSCSI-based management method via the iscsiadm command. The announcement QLogic iSCSI Solution for Transitioning to the Open-iSCSI Model and the User's Guide IOCTL to Open-iSCSI Interface can be found at the QLogic web site.
While trying to test and use the new management method for the hardware initiator via the Open-iSCSI iscsiadm command, I soon ran into the issue that the packaged version of Open-iSCSI shipped with Debian Wheezy is based on the last stable release v2.0.873 from Open-iSCSI and is thus hopelessly out of date. The Open-iSCSI package shipped with Debian Jessie is a bit better, since it's already based on a newer version from the project's GitHub repository. Still, the Git commit used there dates back to August 23rd of 2013, which is also fairly old. After updating my system to Debian Jessie, I soon decided to rebuild the Open-iSCSI package from a much more recent version from the project's GitHub repository. With this, the management of the QLogic hardware initiators worked very well via the Open-iSCSI iscsiadm command and its now enhanced host mode.
In the host mode there are now three sub-modes: chap, flashnode and stats. See man iscsiadm and /usr/share/doc/open-iscsi/README.gz for more details on how to use them. When called in host mode without any sub-mode, iscsiadm prints a list of available iSCSI HBAs along with the host number – shown in the first pair of square brackets – associated with each host by the OS kernel:
root@host:~$ iscsiadm -m host
qla4xxx: [1] 10.0.0.5,[84:8f:69:35:fc:70],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2c.4
qla4xxx: [2] 10.0.0.6,[84:8f:69:35:fc:71],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2d.5
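For scripted use, the host numbers can be extracted from this listing. The following Python sketch parses lines of the shape shown above. The regular expression is derived only from this example output, and the group names (driver, hostno, ip, mac, netdev, iqn) as well as the function name parse_host_list are labels chosen for this illustration, not official field names.

```python
import re

# Pattern derived from the example output above; other iscsiadm
# versions or drivers may print a different layout (assumption).
HOST_LINE = re.compile(
    r"^(?P<driver>\S+): \[(?P<hostno>\d+)\] "
    r"(?P<ip>[^,]*),\[(?P<mac>[^\]]*)\],(?P<netdev>\S*) "
    r"(?P<iqn>\S+)$"
)


def parse_host_list(output):
    """Return one dict per host line found in the given output text."""
    hosts = []
    for line in output.splitlines():
        match = HOST_LINE.match(line.strip())
        if match:
            host = match.groupdict()
            host["hostno"] = int(host["hostno"])
            hosts.append(host)
    return hosts
```

Lines that do not match the pattern, such as a shell prompt, are silently skipped.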
The host number – in the above example 1 and 2 – is used in the following examples showing the three sub-modes:
The stats sub-mode displays various statistics, e.g. on TCP/IP and iSCSI sessions, for the given HBA port:
root@host:~$ iscsiadm -m host -H 1 -C stats
Host Statistics:
mactx_frames: 2351750
mactx_bytes: 233065914
mactx_multicast_frames: 1209409
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 4037613
macrx_bytes: 1305799553
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 2409752
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 1694187
iptx_bytes: 112412836
iptx_fragments: 0
iprx_packets: 1446806
iprx_bytes: 721191324
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 1694187
tcptx_bytes: 69463008
tcprx_segments: 1446806
tcprx_byte: 692255204
tcp_duplicate_ack_retx: 8
tcp_retx_timer_expired: 28
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 247594
tcptx_pure_ack: 247710
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 2248673
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 1446486
iscsi_data_bytes_tx: 30308
iscsi_pdu_rx: 1446510
iscsi_data_bytes_rx: 622721801
iscsi_io_completed: 253632
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0
The chap sub-mode displays and alters a table containing authentication information. Calling this sub-mode with the -o show option displays the current contents of the table:
root@host:~$ iscsiadm -m host -H 1 -C chap -o show
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 0
host.auth.username_in = <empty>
host.auth.password_in = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 1
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD
[...]
Why show isn't the default option in the context of the chap sub-mode, like it is in many other iscsiadm modes and sub-modes, is something I haven't quite understood yet. Maybe it's a security measure to avoid accidentally divulging sensitive information, maybe it has just been overlooked by the developers.
Usually, there are already two initial records with the indexes 0 and 1 present on an HBA. As shown in the example above, each authentication record consists of three parameters: a record index host.auth.tbl_idx to reference it, a username host.auth.username or host.auth.username_in, and a password host.auth.password or host.auth.password_in. Depending on whether the record is used for outgoing authentication of an initiator against a target or, the other way around, for incoming authentication of a target against an initiator, the parameter pair username/password or username_in/password_in is used. Apparently the two types of parameter pairs – incoming and outgoing – cannot be mixed in a single record. My guess is that this isn't a limitation in Open-iSCSI, but rather a limitation in the specification and/or of the underlying hardware.
New authentication records can be added with the -o new option:
root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o new
root@host:~$ iscsiadm -m host -H 1 -C chap -o show
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 0
host.auth.username_in = <empty>
host.auth.password_in = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 1
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 2
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD
[...]
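The record layout shown above is regular enough to be read by a script. As a rough sketch, assuming one "key = value" pair per line between the BEGIN and END markers, the table could be parsed in Python as follows. The names parse_records and direction are hypothetical helpers for this example. Telling incoming from outgoing records by the presence of the username_in parameter is my own reading of the output, not documented behaviour.

```python
def parse_records(output):
    """Split -o show output into a list of dicts, one per BEGIN/END RECORD block."""
    records, current = [], None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("# BEGIN RECORD"):
            current = {}
        elif line.startswith("# END RECORD"):
            if current is not None:
                records.append(current)
            current = None
        elif current is not None and " = " in line:
            key, _, value = line.partition(" = ")
            # "<empty>" is how iscsiadm displays an unset parameter.
            current[key] = None if value == "<empty>" else value
    return records


def direction(record):
    """Assumption: a record carrying the *_in parameters is an incoming one."""
    return "incoming" if "host.auth.username_in" in record else "outgoing"


sample = """# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 0
host.auth.username_in = <empty>
host.auth.password_in = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 1
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 2
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD
"""

records = parse_records(sample)
for record in records:
    print(record["host.auth.tbl_idx"], direction(record))
```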
Parameters of existing authentication records can be set or updated with the -o update option. The particular record to be set or updated is selected with the -x <host.auth.tbl_idx> option, which references the record's host.auth.tbl_idx value. Multiple parameters can be set or updated with a single iscsiadm command by calling it with multiple pairs of -n <parameter-name> and -v <parameter-value>:
root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o update -n host.auth.username -v testuser -n host.auth.password -v testpassword
root@host:~$ iscsiadm -m host -H 1 -C chap -o show
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 0
host.auth.username_in = <empty>
host.auth.password_in = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 1
host.auth.username = <empty>
host.auth.password = <empty>
# END RECORD
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 2
host.auth.username = testuser
host.auth.password = testpassword
# END RECORD
[...]
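When such updates are scripted, the repeated -n/-v pairs can be generated from a dictionary. The following Python sketch only uses the options shown above. The function name chap_update_argv is an invention for this example, and the actual call to iscsiadm is left commented out because it needs root privileges and a matching HBA.

```python
def chap_update_argv(host_no, tbl_idx, params):
    """Build the iscsiadm argument list for updating one CHAP record."""
    argv = ["iscsiadm", "-m", "host", "-H", str(host_no),
            "-C", "chap", "-x", str(tbl_idx), "-o", "update"]
    # One -n/-v pair per parameter, in the insertion order of the dict.
    for name, value in params.items():
        argv += ["-n", name, "-v", value]
    return argv


argv = chap_update_argv(1, 2, {
    "host.auth.username": "testuser",
    "host.auth.password": "testpassword",
})
print(" ".join(argv))

# To actually run it (not executed here):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing the arguments as a list instead of a single shell string avoids quoting problems with passwords that contain special characters.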
Finally, existing authentication records can be deleted with the -o delete option:
root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o delete
The flashnode sub-mode displays and alters a table containing information about the iSCSI targets. Calling this sub-mode without any other options displays an overview of the currently configured flash nodes (i.e. targets) on a particular HBA:
root@host:~$ iscsiadm -m host -H 1 -C flashnode
qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002
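The overview lines follow a fixed pattern, so they can be picked apart with a regular expression. The Python sketch below is derived purely from the example lines in this article. The function name parse_flashnodes is hypothetical, and I am assuming that the number after the comma is the target portal group tag.

```python
import re

# driver: [index] address:port,tpgt targetname
FLASHNODE_LINE = re.compile(
    r"^(?P<driver>\S+): \[(?P<index>\d+)\] "
    r"(?P<address>\S+):(?P<port>\d+),(?P<tpgt>\d+) (?P<target>\S+)$"
)


def parse_flashnodes(output):
    """Map the flash node index to address, port and target name."""
    nodes = {}
    for line in output.splitlines():
        match = FLASHNODE_LINE.match(line.strip())
        if not match:
            continue
        target = match["target"]
        nodes[int(match["index"])] = {
            "address": match["address"],
            "port": int(match["port"]),
            # "<empty>" marks a flash node without a configured target name.
            "target": None if target == "<empty>" else target,
        }
    return nodes


sample = (
    "qla4xxx: [0] 10.0.0.2:3260,0 "
    "iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002\n"
    "qla4xxx: [1] 0.0.0.0:3260,0 <empty>\n"
)

nodes = parse_flashnodes(sample)
print(nodes)
```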
Similar to the previously mentioned host mode, each output line of the flashnode sub-mode contains an index number for the flash node entry (i.e. iSCSI target), which is shown in the first pair of square brackets. With this index number the individual flash node entries are referenced in all further operations.
New flash nodes or target entries can be added with the -o new option. This operation also needs to be told whether the target addressed via the flash node will be reached via an IPv4 or an IPv6 address. This is accomplished with the -A ipv4 or -A ipv6 option:
root@host:~$ iscsiadm -m host -H 1 -C flashnode -o new -A ipv4
Create new flashnode for host 1.
New flashnode for host 1 added at index 1.
If the operation of adding a new flash node is successful, the index under which the new flash node is addressable is returned.
root@host:~$ iscsiadm -m host -H 1 -C flashnode
qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002
qla4xxx: [1] 0.0.0.0:3260,0 <empty>
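A script that creates flash nodes needs the returned index for all subsequent operations. A minimal Python sketch for extracting it from the message shown above could look like this. The helper name new_flashnode_index is made up, and the message wording is taken from the single example in this article, so other Open-iSCSI versions may phrase it differently.

```python
import re


def new_flashnode_index(output):
    """Return the index reported by -o new, or None if no index was reported."""
    match = re.search(r"New flashnode for host \d+ added at index (\d+)", output)
    return int(match.group(1)) if match else None


sample = "Create new flashnode for host 1.\nNew flashnode for host 1 added at index 1.\n"
index = new_flashnode_index(sample)
print(index)
```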
Unlike authentication records, flash node or target records contain many more parameters. They can be displayed by selecting a specific record by its index with the -x <flashnode_idx> option. The -o show option is the default and is thus optional:
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1
# BEGIN RECORD 2.0-873
flashnode.session.auto_snd_tgt_disable = 0
flashnode.session.discovery_session = 0
flashnode.session.portal_type = ipv4
flashnode.session.entry_enable = 0
flashnode.session.immediate_data = 0
flashnode.session.initial_r2t = 0
flashnode.session.data_seq_in_order = 1
flashnode.session.data_pdu_in_order = 1
flashnode.session.chap_auth_en = 1
flashnode.session.discovery_logout_en = 0
flashnode.session.bidi_chap_en = 0
flashnode.session.discovery_auth_optional = 0
flashnode.session.erl = 0
flashnode.session.first_burst_len = 0
flashnode.session.def_time2wait = 0
flashnode.session.def_time2retain = 0
flashnode.session.max_outstanding_r2t = 0
flashnode.session.isid = 000e1e17da2c
flashnode.session.tsid = 0
flashnode.session.max_burst_len = 0
flashnode.session.def_taskmgmt_tmo = 10
flashnode.session.targetalias = <empty>
flashnode.session.targetname = <empty>
flashnode.session.discovery_parent_idx = 0
flashnode.session.discovery_parent_type = Sendtarget
flashnode.session.tpgt = 0
flashnode.session.chap_out_idx = 2
flashnode.session.chap_in_idx = 65535
flashnode.session.username = <empty>
flashnode.session.username_in = <empty>
flashnode.session.password = <empty>
flashnode.session.password_in = <empty>
flashnode.session.is_boot_target = 0
flashnode.conn[0].is_fw_assigned_ipv6 = 0
flashnode.conn[0].header_digest_en = 0
flashnode.conn[0].data_digest_en = 0
flashnode.conn[0].snack_req_en = 0
flashnode.conn[0].tcp_timestamp_stat = 0
flashnode.conn[0].tcp_nagle_disable = 0
flashnode.conn[0].tcp_wsf_disable = 0
flashnode.conn[0].tcp_timer_scale = 0
flashnode.conn[0].tcp_timestamp_en = 0
flashnode.conn[0].fragment_disable = 0
flashnode.conn[0].max_xmit_dlength = 0
flashnode.conn[0].max_recv_dlength = 65536
flashnode.conn[0].keepalive_tmo = 0
flashnode.conn[0].port = 3260
flashnode.conn[0].ipaddress = 0.0.0.0
flashnode.conn[0].redirect_ipaddr = 0.0.0.0
flashnode.conn[0].max_segment_size = 0
flashnode.conn[0].local_port = 0
flashnode.conn[0].ipv4_tos = 0
flashnode.conn[0].ipv6_traffic_class = 0
flashnode.conn[0].ipv6_flow_label = 0
flashnode.conn[0].link_local_ipv6 = <empty>
flashnode.conn[0].tcp_xmit_wsf = 0
flashnode.conn[0].tcp_recv_wsf = 0
flashnode.conn[0].statsn = 0
flashnode.conn[0].exp_statsn = 0
# END RECORD
Of the various parameters of a flash node or target record, the following are the most relevant in day-to-day use:
flashnode.session.chap_auth_en: Controls whether the initiator should authenticate against the target. This is enabled by default.
flashnode.session.bidi_chap_en: Controls whether the target should also authenticate itself against the initiator. This is disabled by default.
flashnode.session.targetname: The IQN of the target to be logged into and to be accessed.
flashnode.session.chap_out_idx: The index number (i.e. the value of the host.auth.tbl_idx parameter) of the authentication record to be used for authentication of the initiator against the target.
flashnode.conn[0].port: The TCP port of the target portal. The default is port 3260.
flashnode.conn[0].ipaddress: The IP address of the target portal.
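To keep an eye on just these parameters, the long record can be filtered down to the handful listed above. This is only an illustrative Python sketch that assumes one "key = value" pair per line. The names RELEVANT and relevant_parameters are hypothetical.

```python
# The parameters named in the list above.
RELEVANT = (
    "flashnode.session.chap_auth_en",
    "flashnode.session.bidi_chap_en",
    "flashnode.session.targetname",
    "flashnode.session.chap_out_idx",
    "flashnode.conn[0].port",
    "flashnode.conn[0].ipaddress",
)


def relevant_parameters(record_text):
    """Return only the day-to-day parameters of a flash node record."""
    values = {}
    for line in record_text.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep and key in RELEVANT:
            values[key] = value
    return values


# Excerpt of the freshly created record shown above.
sample = """# BEGIN RECORD 2.0-873
flashnode.session.chap_auth_en = 1
flashnode.session.discovery_logout_en = 0
flashnode.session.bidi_chap_en = 0
flashnode.session.targetname = <empty>
flashnode.session.chap_out_idx = 2
flashnode.conn[0].port = 3260
flashnode.conn[0].ipaddress = 0.0.0.0
# END RECORD
"""

summary = relevant_parameters(sample)
for key in RELEVANT:
    print(key, "=", summary.get(key))
```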
The parameter pairs flashnode.session.username/flashnode.session.password and flashnode.session.username_in/flashnode.session.password_in are handled differently than all the other parameters. They are not set or updated directly, but are rather filled in automatically. This is done by setting the respective flashnode.session.chap_out_idx or flashnode.session.chap_in_idx parameter to a value which references the index (i.e. the value of the host.auth.tbl_idx parameter) of an appropriate authentication record.
Parameters of existing flash nodes or target entries can be set or updated with the -o update option. The particular record whose parameters are to be set or updated is selected with the -x <flashnode_idx> option. This references an index number gathered from the list of flash nodes or an index number returned at the time of creation of a particular flash node. Multiple parameters can be set or updated with a single iscsiadm command by calling it with multiple pairs of -n <parameter-name> and -v <parameter-value>:
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o update -n flashnode.session.chap_out_idx -v 2 -n flashnode.session.targetname -v iqn.2001-05.com.equallogic:0-fe83b6-d63c152cc-7ce004e1102558d4-lun-000003 -n flashnode.conn[0].ipaddress -v 10.0.0.2
The flash node or target entry updated by this command is shown below in a cut-down fashion for brevity:
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1
# BEGIN RECORD 2.0-873
[...]
flashnode.session.chap_auth_en = 1
flashnode.session.discovery_logout_en = 0
flashnode.session.bidi_chap_en = 0
[...]
flashnode.session.targetname = iqn.2001-05.com.equallogic:0-fe83b6-d63c152cc-7ce004e1102558d4-lun-000003
[...]
flashnode.session.chap_out_idx = 4
flashnode.session.chap_in_idx = 65535
flashnode.session.username = testuser
flashnode.session.username_in = <empty>
flashnode.session.password = testpassword
flashnode.session.password_in = <empty>
[...]
flashnode.conn[0].port = 3260
flashnode.conn[0].ipaddress = 10.0.0.2
flashnode.conn[0].redirect_ipaddr = 0.0.0.0
[...]
# END RECORD
Once configured, login and logout actions can be performed on the flash node (i.e. target) with the respective command options:
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o login
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o logout
Unfortunately the iscsiadm command will return with a success status on the login and logout actions once it has passed them successfully to the HBA. This does not reflect the status of the actual login and logout actions subsequently taken by the HBA against the target configured in the respective flash node! To my knowledge there is currently no information passed back to the command line about the result of the login and logout actions at the HBA level.
Newly established as well as already existing iSCSI sessions via the hardware initiator, which were set up with the login action shown above, are shown along with all other Open-iSCSI session information in the output of the iscsiadm command in its session mode:
root@host:~$ iscsiadm -m session
qla4xxx: [1] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)
qla4xxx: [2] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)
[...]
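Because the exit status of the login action says nothing about its outcome, one possible workaround is to look for the target in the session list afterwards. The Python sketch below only illustrates that idea under the assumption that session lines look like the ones above, including the trailing flash or non-flash label in parentheses. The function name flash_sessions is hypothetical, and how long the HBA needs to establish a session is something I can't give a number for.

```python
import re

# driver: [sid] portal,tpgt targetname (flash|non-flash)
SESSION_LINE = re.compile(
    r"^(?P<driver>\S+): \[(?P<sid>\d+)\] (?P<portal>\S+),(?P<tpgt>\d+) "
    r"(?P<target>\S+) \((?P<kind>flash|non-flash)\)$"
)


def flash_sessions(output, target):
    """Return the session ids of hardware initiator sessions to the given target."""
    sids = []
    for line in output.splitlines():
        match = SESSION_LINE.match(line.strip())
        if match and match["kind"] == "flash" and match["target"] == target:
            sids.append(int(match["sid"]))
    return sids


target = "iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002"
sample = (
    "qla4xxx: [1] 10.0.0.2:3260,1 " + target + " (flash)\n"
    "qla4xxx: [2] 10.0.0.2:3260,1 " + target + " (flash)\n"
)

sids = flash_sessions(sample, target)
print(sids if sids else "no session found, login probably failed")
```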
The fact that the session information is about an iSCSI session established via a hardware initiator is only signified by the flash label in the parentheses at the end of each line. In case of an iSCSI session established via a software initiator, the label in the parentheses reads non-flash.
Finally, existing flash nodes can be deleted with the -o delete option:
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o delete
The records from both the chap and the flashnode table are stored in the HBA's flash memory. For the limits on how many entries can be stored in each table, see the specification of the particular HBA.
In my opinion, the integration of the management of the QLogic hardware initiators into the Open-iSCSI iscsiadm command improves and simplifies the administration and configuration a lot over the previous IOCTL based management method via the QLogic QConvergeConsole (qaucli). It finally opens management access to the QLogic hardware initiators to non-“enterprise Linux distributions” like Debian. Definitely a big step in the right direction! The importance of using a current version of Open-iSCSI can – in my experience – not be stressed enough. Building and maintaining a package based on a current version from the project's GitHub repository is definitely worth the effort.
One thing I couldn't get to work though was the incoming part of a bi-directional CHAP authentication. In this scenario, not only does the initiator authenticate itself at the target for an iSCSI session to be successfully established, the target also has to authenticate itself against the initiator. My initial thought was that a setup with bi-directional CHAP authentication should be easily accomplished by just performing the following three steps:
creating an incoming authentication record with the value of the parameters host.auth.username_in and host.auth.password_in set to the respective values configured at the target storage system.
setting the value of the flash node parameter flashnode.session.bidi_chap_en to 1.
setting the value of the flash node parameter flashnode.session.chap_in_idx to the value of the parameter host.auth.tbl_idx, gathered from the newly created incoming authentication record in step 1.
The first two of the above tasks were indeed easily accomplished. The third one seemed easy too, but turned out to be more of a challenge. Setting the flash node parameter flashnode.session.chap_in_idx to the value of the host.auth.tbl_idx parameter from the previously created incoming authentication record just didn't work. Any attempt to change the default value 65535 failed. Nor was the flashnode.session.username_in/flashnode.session.password_in parameter pair automatically updated with the values from the parameters host.auth.username_in and host.auth.password_in. Oddly enough, bi-directional CHAP authentication worked as long as there was only one storage system with one set of incoming authentication credentials! Adding another set of flash nodes for a second storage system with its own set of incoming authentication credentials would cause the bi-directional CHAP authentication to fail for all targets on this second storage system.
Being unable to debug this weird behaviour any further on my own, I turned to the Open-iSCSI mailing list for help. See the thread on the Open-iSCSI mailing list for the details. Don't be confused by the fact that the thread was initially about getting the network communication to work with jumbo frames. This initial issue was resolved for me by switching to a more recent Open-iSCSI version, as already mentioned above. My last question there was not answered publicly on the mailing list, but Adheer Chandravanshi from development at QLogic got in touch via email. Here's his explanation of the observed behaviour:
[…]
I see that you have added multiple incoming CHAP entries for both hosts 1 and 2.
But looks like even in case of multiple incoming CHAP entries in flash only the first entry takes effect for that particular host for all
the target node entries in flash.
This seems to be a limitation with the flash target node entries.
So you cannot map different incoming CHAP entry for different target nodes in HBA flash.
In your case, only following incoming CHAP entry will be effective for Host 1 as it's the first incoming chap entry. Same goes for Host 2.
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 2
host.auth.username_in = <username-from-target1>
host.auth.password_in = <password-from-target1>
# END RECORD
Try using only one incoming CHAP entry per host, you can have different outgoing CHAP entries though for each flash target node.
[…]
To sum this up, the QLogic HBAs basically use a first match approach when it comes to the incoming part of a bi-directional CHAP authentication. After finding the first incoming authentication record that is configured, it uses the credentials stored there. Any other – and possibly more suitable – records for incoming authentication are ignored. There's also no way to override this behaviour on a case by case basis via the flash node entries (i.e. iSCSI targets).
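The first-match behaviour can be expressed in a few lines of Python. This is merely a model of what the email describes, not the firmware's actual logic. I am assuming that "first" means the lowest host.auth.tbl_idx and that records without a username are skipped. The function name effective_incoming_record and the credentials in the sample table are made up.

```python
def effective_incoming_record(chap_table):
    """Model: the first configured incoming record wins, all later ones are ignored."""
    for record in sorted(chap_table, key=lambda r: r["host.auth.tbl_idx"]):
        # Outgoing records have no username_in key, unset incoming ones hold None.
        if record.get("host.auth.username_in"):
            return record
    return None


# Hypothetical CHAP table with incoming credentials for two storage systems.
chap_table = [
    {"host.auth.tbl_idx": 0, "host.auth.username_in": None, "host.auth.password_in": None},
    {"host.auth.tbl_idx": 1, "host.auth.username": None, "host.auth.password": None},
    {"host.auth.tbl_idx": 2, "host.auth.username_in": "user-target1",
     "host.auth.password_in": "secret-target1"},
    {"host.auth.tbl_idx": 3, "host.auth.username_in": "user-target2",
     "host.auth.password_in": "secret-target2"},
]

effective = effective_incoming_record(chap_table)
print(effective["host.auth.tbl_idx"])
```

In this model the record with index 3 is never consulted, which matches the observation that bi-directional CHAP authentication failed for all targets on the second storage system.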
In my humble opinion this is a rather serious limitation of the security features in the QLogic hardware initiators. No security policy I have ever encountered in any organisation would allow for the reuse of authentication credentials over different systems. Unfortunately I have no further information as to why the implementation turned out this way. Maybe there was no feature request for this yet, maybe it was just an oversight or maybe there is a limitation in the hardware, preventing a more flexible implementation. Unfortunately my reply to the above email with an inquiry whether such a feature would possibly be implemented in future firmware versions has – up to now – not been answered.