2017-04-12 // Check_MK Monitoring - Open-iSCSI
The Open-iSCSI project provides a high-performance, transport-independent implementation of RFC 3720 iSCSI for Linux. It allows remote access to SCSI targets via TCP/IP over several different transport technologies. This article introduces a new Check_MK service check to monitor the status of Open-iSCSI sessions, as well as several statistical metrics of Open-iSCSI sessions and iSCSI hardware initiator hosts.
For the impatient, here is the TL;DR: the Check_MK package of the Open-iSCSI monitoring checks:
Open-iSCSI monitoring checks (Compatible with Check_MK versions 1.2.8 and later)
The sources can be found in my Check_MK repository on GitHub.
The Check_MK service check to monitor Open-iSCSI consists of two major parts: an agent plugin and three check plugins.
The first part, a Check_MK agent plugin named open-iscsi, is a simple Bash shell script. It calls the Open-iSCSI administration tool iscsiadm in order to retrieve a list of currently active iSCSI sessions. The exact call to iscsiadm to retrieve the session list is:
/usr/bin/iscsiadm -m session -P 1
If there are any active iSCSI sessions, the open-iscsi agent plugin also tries to collect several statistics for each iSCSI session. This is done by another call to iscsiadm for each iSCSI session ${SID}, as shown in the following example:
/usr/bin/iscsiadm -m session -r ${SID} -s
Unfortunately, iSCSI session statistics are currently only supported for Open-iSCSI software initiators and for dependent hardware iSCSI initiators like the Broadcom BCM577xx and BCM578xx adapters, which are covered by the bnx2i kernel module. See Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell and Backporting Open-iSCSI to Debian 8 "Jessie" for additional information on those dependent hardware iSCSI initiators.
For hardware iSCSI initiators like the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs, which provide a full iSCSI offload engine (iSOE) implementation in the adapter firmware, there is currently no support for iSCSI session statistics. Instead, the open-iscsi agent plugin collects several global statistics for each iSOE host ${HST} covered by the qla4xxx kernel module, using the command shown in the following example:
/usr/bin/iscsiadm -m host -H ${HST} -C stats
The output of the above commands is parsed and reformatted by the agent plugin for easier processing in the check plugins. The following example shows the agent plugin output for a system with two BCM578xx dependent hardware iSCSI initiators:
<<<open-iscsi_sessions>>>
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:bf:2d eth2 10.0.3.52 LOGGED_IN LOGGED_IN NO_CHANGE
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:c2:34 eth3 10.0.3.53 LOGGED_IN LOGGED_IN NO_CHANGE
<<<open-iscsi_session_stats>>>
[session stats f8:ca:b8:7d:bf:2d iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 40960
rxdata_octets: 461171313
noptx_pdus: 0
scsicmd_pdus: 153967
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 153967
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 112420
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0
[session stats f8:ca:b8:7d:c2:34 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 16384
rxdata_octets: 255666052
noptx_pdus: 0
scsicmd_pdus: 84312
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 84312
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 62418
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0
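As a rough illustration of how a check plugin can consume this agent output, the following Python sketch splits one line of the <<<open-iscsi_sessions>>> section into named fields. The function name, the field order and the example IQN are assumptions derived solely from the example output above; this is not the actual plugin code.

```python
# Hypothetical sketch: split one line of the <<<open-iscsi_sessions>>>
# agent section into named fields (field order inferred from the example).
def parse_session_line(line):
    (transport, portal, target_iqn, iface, netdev,
     ip_address, session_state, connection_state, internal_state) = line.split()
    return {
        # the interface/IQN pair uniquely identifies the session on a host
        "item": "%s %s" % (iface, target_iqn),
        "session_state": session_state,
        "connection_state": connection_state,
        "internal_state": internal_state,
    }

info = parse_session_line(
    "bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.example:volume "
    "bnx2i.f8:ca:b8:7d:bf:2d eth2 10.0.3.52 LOGGED_IN LOGGED_IN NO_CHANGE")
```

Keying the service check on the interface/IQN pair rather than the session ID is what keeps the check item stable across iSCSI logouts and logins.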
The next example shows the agent plugin output for a system with two QLogic 8200 Series hardware iSCSI initiators:
<<<open-iscsi_sessions>>>
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7d.ipv4.0 none 10.0.3.50 LOGGED_IN Unknown Unknown
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7e.ipv4.0 none 10.0.3.51 LOGGED_IN Unknown Unknown
<<<open-iscsi_host_stats>>>
[host stats f8:ca:b8:7d:c1:7d iqn.2000-04.com.qlogic:isp8214.000e1e3574ac.4]
mactx_frames: 563454
mactx_bytes: 52389948
mactx_multicast_frames: 877513
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573455
macrx_bytes: 440845678
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508160
iptx_bytes: 29474232
iptx_fragments: 0
iprx_packets: 401785
iprx_bytes: 354673156
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508160
tcptx_bytes: 19310736
tcprx_segments: 401785
tcprx_byte: 346637456
tcp_duplicate_ack_retx: 1
tcp_retx_timer_expired: 1
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106449
tcptx_pure_ack: 106489
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 695915
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401697
iscsi_data_bytes_tx: 29225
iscsi_pdu_rx: 401697
iscsi_data_bytes_rx: 327355963
iscsi_io_completed: 101
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0
[host stats f8:ca:b8:7d:c1:7e iqn.2000-04.com.qlogic:isp8214.000e1e3574ad.5]
mactx_frames: 563608
mactx_bytes: 52411412
mactx_multicast_frames: 877517
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573572
macrx_bytes: 441630442
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508310
iptx_bytes: 29490504
iptx_fragments: 0
iprx_packets: 401925
iprx_bytes: 355436636
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508310
tcptx_bytes: 19323952
tcprx_segments: 401925
tcprx_byte: 347398136
tcp_duplicate_ack_retx: 2
tcp_retx_timer_expired: 4
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106466
tcptx_pure_ack: 106543
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 696035
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401787
iscsi_data_bytes_tx: 37970
iscsi_pdu_rx: 401791
iscsi_data_bytes_rx: 328112050
iscsi_io_completed: 127
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0
Although a simple Bash shell script, the agent plugin open-iscsi has several dependencies which need to be installed in order for it to work properly, namely the commands iscsiadm, sed, tr and egrep. On Debian-based systems, the necessary packages can be installed with the following command:
root@host:~# apt-get install coreutils grep open-iscsi sed
The second part of the Check_MK service check for Open-iSCSI provides the necessary check logic through individual inventory and check functions. This is implemented in the three Check_MK check plugins open-iscsi_sessions, open-iscsi_host_stats and open-iscsi_session_stats, which will be discussed separately in the following sections.
Open-iSCSI Session Status
The check plugin open-iscsi_sessions is responsible for monitoring individual iSCSI sessions and their internal session states. Upon inventory, this check plugin creates a service check for each pair of iSCSI network interface name and IQN of the iSCSI target volume. Unlike the iSCSI session ID, which changes over time (e.g. after an iSCSI logout and login), this pair uniquely identifies an iSCSI session on a host. During normal check execution, the list of currently active iSCSI sessions on a host is compared to the list of active iSCSI sessions gathered during inventory on that host. If a session is missing or has an erroneous internal state, an alarm is raised accordingly.
For all types of initiators – software, dependent hardware and hardware – there is the state session_state, which can take on the following values:
ISCSI_STATE_FREE
ISCSI_STATE_LOGGED_IN
ISCSI_STATE_FAILED
ISCSI_STATE_TERMINATE
ISCSI_STATE_IN_RECOVERY
ISCSI_STATE_RECOVERY_FAILED
ISCSI_STATE_LOGGING_OUT
An alarm is raised if the session is in any state other than ISCSI_STATE_LOGGED_IN. For software and dependent hardware initiators there are two additional states – connection_state and internal_state. The state connection_state can take on the values:
FREE
TRANSPORT WAIT
IN LOGIN
LOGGED IN
IN LOGOUT
LOGOUT REQUESTED
CLEANUP WAIT
and internal_state can take on the values:
NO CHANGE
CLEANUP
REOPEN
REDIRECT
In addition to the above session_state, an alarm is raised if the connection_state is in any state other than LOGGED IN, or if the internal_state is in any state other than NO CHANGE.
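The state evaluation described above can be sketched as follows; the function and constant names are made up for illustration and this is not the actual plugin code. Note that the agent plugin reformats the state values, e.g. "NO CHANGE" appears as NO_CHANGE and session_state without its ISCSI_STATE_ prefix:

```python
# Expected healthy values as they appear in the agent plugin output.
EXPECTED = {
    "session_state": "LOGGED_IN",
    "connection_state": "LOGGED_IN",
    "internal_state": "NO_CHANGE",
}

def check_session_states(states):
    # 'states' maps state names to reported values; software and dependent
    # hardware initiators report all three states, pure hardware (iSOE)
    # initiators only report a meaningful session_state.
    problems = ["%s is %s" % (name, value)
                for name, value in sorted(states.items())
                if name in EXPECTED and value != EXPECTED[name]]
    if problems:
        return (2, "CRIT - " + ", ".join(problems))
    return (0, "OK - session logged in")
```

A missing session (present at inventory, absent at check time) would be flagged by the surrounding check logic before this state comparison even runs.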
No performance data is currently reported by this check.
Open-iSCSI Host Statistics
The check plugin open-iscsi_host_stats is responsible for monitoring the global statistics on an iSOE host. Upon inventory, this check plugin creates a service check for each pair of MAC address and iSCSI network interface name. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is determined for each inventoried item. If the rate of one of the statistics values exceeds the configured warning or critical threshold, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
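The per-counter rate checking described here can be sketched roughly as follows. Check_MK provides its own helpers for counter handling, so the function names and the explicit state dictionary below are purely illustrative assumptions, not the actual plugin code:

```python
# Simplified sketch of turning monotonically increasing counters into
# per-second rates and mapping them to Nagios states.
def counter_rate(state, name, now, value):
    """Return the per-second rate of a counter since the previous sample."""
    last = state.get(name)
    state[name] = (now, value)
    if last is None or now <= last[0]:
        return 0.0  # first sample or bogus timestamp: no rate yet
    return max(0.0, value - last[1]) / (now - last[0])

def check_levels(rate, warn, crit):
    """Map a rate against warning/critical thresholds to a Nagios state."""
    if rate > crit:
        return 2  # CRIT
    if rate > warn:
        return 1  # WARN
    return 0      # OK

state = {}
counter_rate(state, "mactx_frames", 0, 563454)          # first sample, no rate
rate = counter_rate(state, "mactx_frames", 60, 563514)  # 60 frames in 60 s
```

With the default warning and critical thresholds of zero units per second, any non-zero rate immediately raises an alarm, which is why overriding the levels via WATO is usually desirable for the traffic counters.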
With the additional WATO plugin open-iscsi_host_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSOE host statistics levels can be found in the WATO WebUI under:
-> Host & Service Parameters -> Parameters for discovered services -> Storage, Filesystems and Files -> Open-iSCSI Host Statistics -> Create Rule in Folder ... -> The levels for the Open-iSCSI host statistics values
[x] The levels for the number of transmitted MAC/Layer2 frames on an iSOE host.
[x] The levels for the number of received MAC/Layer2 frames on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 bytes on an iSOE host.
[x] The levels for the number of received MAC/Layer2 bytes on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 multicast frames on an iSOE host.
[x] The levels for the number of received MAC/Layer2 multicast frames on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 broadcast frames on an iSOE host.
[x] The levels for the number of received MAC/Layer2 broadcast frames on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 pause frames on an iSOE host.
[x] The levels for the number of received MAC/Layer2 pause frames on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 control frames on an iSOE host.
[x] The levels for the number of received MAC/Layer2 control frames on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 dropped frames on an iSOE host.
[x] The levels for the number of received MAC/Layer2 dropped frames on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 excess deferral frames on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 abort frames on an iSOE host.
[x] The levels for the number of transmitted MAC/Layer2 jumbo frames on an iSOE host.
[x] The levels for the number of MAC/Layer2 late transmit collisions on an iSOE host.
[x] The levels for the number of MAC/Layer2 single transmit collisions on an iSOE host.
[x] The levels for the number of MAC/Layer2 multiple transmit collisions on an iSOE host.
[x] The levels for the number of MAC/Layer2 collisions on an iSOE host.
[x] The levels for the number of received MAC/Layer2 control frames on an iSOE host.
[x] The levels for the number of received MAC/Layer2 dribble on an iSOE host.
[x] The levels for the number of received MAC/Layer2 frame length errors on an iSOE host.
[x] The levels for the number of discarded received MAC/Layer2 frames on an iSOE host.
[x] The levels for the number of received MAC/Layer2 jabber on an iSOE host.
[x] The levels for the number of received MAC/Layer2 carrier sense errors on an iSOE host.
[x] The levels for the number of received MAC/Layer2 CRC errors on an iSOE host.
[x] The levels for the number of received MAC/Layer2 encoding errors on an iSOE host.
[x] The levels for the number of received MAC/Layer2 length too large errors on an iSOE host.
[x] The levels for the number of received MAC/Layer2 length too small errors on an iSOE host.
[x] The levels for the number of transmitted IP packets on an iSOE host.
[x] The levels for the number of received IP packets on an iSOE host.
[x] The levels for the number of transmitted IP bytes on an iSOE host.
[x] The levels for the number of received IP bytes on an iSOE host.
[x] The levels for the number of transmitted IP fragments on an iSOE host.
[x] The levels for the number of received IP fragments on an iSOE host.
[x] The levels for the number of IP datagram reassemblies on an iSOE host.
[x] The levels for the number of IP invalid address errors on an iSOE host.
[x] The levels for the number of IP packet errors on an iSOE host.
[x] The levels for the number of IP fragmentation overlaps on an iSOE host.
[x] The levels for the number of IP fragmentation out-of-order on an iSOE host.
[x] The levels for the number of IP datagram reassembly timeouts on an iSOE host.
[x] The levels for the number of transmitted IPv6 packets on an iSOE host.
[x] The levels for the number of received IPv6 packets on an iSOE host.
[x] The levels for the number of transmitted IPv6 bytes on an iSOE host.
[x] The levels for the number of received IPv6 bytes on an iSOE host.
[x] The levels for the number of transmitted IPv6 fragments on an iSOE host.
[x] The levels for the number of received IPv6 fragments on an iSOE host.
[x] The levels for the number of IPv6 datagram reassemblies on an iSOE host.
[x] The levels for the number of IPv6 invalid address errors on an iSOE host.
[x] The levels for the number of IPv6 packet errors on an iSOE host.
[x] The levels for the number of IPv6 fragmentation overlaps on an iSOE host.
[x] The levels for the number of IPv6 fragmentation out-of-order on an iSOE host.
[x] The levels for the number of IPv6 datagram reassembly timeouts on an iSOE host.
[x] The levels for the number of transmitted TCP segments on an iSOE host.
[x] The levels for the number of received TCP segments on an iSOE host.
[x] The levels for the number of transmitted TCP bytes on an iSOE host.
[x] The levels for the number of received TCP bytes on an iSOE host.
[x] The levels for the number of duplicate TCP ACK retransmits on an iSOE host.
[x] The levels for the number of received TCP retransmit timer expiries on an iSOE host.
[x] The levels for the number of received TCP duplicate ACKs on an iSOE host.
[x] The levels for the number of received TCP pure ACKs on an iSOE host.
[x] The levels for the number of transmitted TCP delayed ACKs on an iSOE host.
[x] The levels for the number of transmitted TCP pure ACKs on an iSOE host.
[x] The levels for the number of received TCP segment errors on an iSOE host.
[x] The levels for the number of received TCP segment out-of-order on an iSOE host.
[x] The levels for the number of received TCP window probe on an iSOE host.
[x] The levels for the number of received TCP window update on an iSOE host.
[x] The levels for the number of transmitted TCP window probe persist on an iSOE host.
[x] The levels for the number of transmitted iSCSI PDUs on an iSOE host.
[x] The levels for the number of received iSCSI PDUs on an iSOE host.
[x] The levels for the number of transmitted iSCSI Bytes on an iSOE host.
[x] The levels for the number of received iSCSI Bytes on an iSOE host.
[x] The levels for the number of iSCSI I/Os completed on an iSOE host.
[x] The levels for the number of iSCSI unexpected I/Os on an iSOE host.
[x] The levels for the number of iSCSI format errors on an iSOE host.
[x] The levels for the number of iSCSI header digest (CRC) errors on an iSOE host.
[x] The levels for the number of iSCSI data digest (CRC) errors on an iSOE host.
[x] The levels for the number of iSCSI sequence errors on an iSOE host.
[x] The levels for the number of ECC error corrections on an iSOE host.
The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_host_stats (iSCSI Host Stats) service checks over two QLogic 8200 Series hardware iSCSI initiators:
This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state – in this example LOGGED_IN – is shown. There are also two iSCSI Host Stats service check items in the example, which are pairs of MAC addresses and iSCSI network interface names. For each of those items the current throughput rate on the MAC, IP/IPv6, TCP and iSCSI protocol layers is shown. The throughput rate on the MAC protocol layer is also visualized in the Perf-O-Meter, with received traffic growing from the middle to the left and transmitted traffic growing from the middle to the right.
The following three images show examples of the PNP4Nagios graphs for the open-iscsi_host_stats (iSCSI Host Stats) service check.
The middle graph shows a combined view of the throughput rate for received and transmitted traffic on the different MAC, IP/IPv6, TCP and iSCSI protocol layers. The upper graph shows the throughput rate for various frame types on the MAC protocol layer. The lower graph shows the rate for various error frame types on the MAC protocol layer.
The upper graph shows the throughput rate for received and transmitted traffic on the IP/IPv6 protocol layer. The middle graph shows the rate for various error packet types on the IP/IPv6 protocol layer. The lower graph shows the throughput rate for received and transmitted traffic on the TCP protocol layer.
The first graph shows the rate for various protocol control and error segment types on the TCP protocol layer. The second graph shows the rate of ECC error corrections that occurred on the QLogic 8200 Series hardware iSCSI initiator. The third graph shows the throughput rate for received and transmitted traffic on the iSCSI protocol layer. The fourth and last graph shows the rate for various control and error PDUs on the iSCSI protocol layer.
Open-iSCSI Session Statistics
The check plugin open-iscsi_session_stats is responsible for monitoring the statistics of individual iSCSI sessions. Upon inventory, this check plugin creates a service check for each pair of MAC address of the network interface and IQN of the iSCSI target volume. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is collected for each inventoried item. If the rate of one of the statistics values exceeds the configured warning or critical threshold, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
With the additional WATO plugin open-iscsi_session_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSCSI session statistics levels can be found in the WATO WebUI under:
-> Host & Service Parameters -> Parameters for discovered services -> Storage, Filesystems and Files -> Open-iSCSI Session Statistics -> Create Rule in Folder ... -> The levels for the Open-iSCSI session statistics values
[x] The levels for the number of transmitted bytes in an Open-iSCSI session
[x] The levels for the number of received bytes in an Open-iSCSI session
[x] The levels for the number of digest (CRC) errors in an Open-iSCSI session
[x] The levels for the number of timeout errors in an Open-iSCSI session
[x] The levels for the number of transmitted NOP commands in an Open-iSCSI session
[x] The levels for the number of received NOP commands in an Open-iSCSI session
[x] The levels for the number of transmitted SCSI command requests in an Open-iSCSI session
[x] The levels for the number of received SCSI command responses in an Open-iSCSI session
[x] The levels for the number of transmitted task management function commands in an Open-iSCSI session
[x] The levels for the number of received task management function responses in an Open-iSCSI session
[x] The levels for the number of transmitted login requests in an Open-iSCSI session
[x] The levels for the number of transmitted logout requests in an Open-iSCSI session
[x] The levels for the number of received logout responses in an Open-iSCSI session
[x] The levels for the number of transmitted text PDUs in an Open-iSCSI session
[x] The levels for the number of received text PDUs in an Open-iSCSI session
[x] The levels for the number of transmitted data PDUs in an Open-iSCSI session
[x] The levels for the number of received data PDUs in an Open-iSCSI session
[x] The levels for the number of transmitted single negative ACKs in an Open-iSCSI session
[x] The levels for the number of received ready to transfer PDUs in an Open-iSCSI session
[x] The levels for the number of received reject PDUs in an Open-iSCSI session
[x] The levels for the number of received asynchronous messages in an Open-iSCSI session
The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_session_stats (iSCSI Session Stats) service checks over two BCM578xx dependent hardware iSCSI initiators:
This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state, connection_state and internal_state – in this example with the respective values LOGGED_IN, LOGGED_IN and NO_CHANGE – are shown. There is also an equivalent number of iSCSI Session Stats service check items in the example, which are pairs of MAC addresses of the network interfaces and IQNs of the iSCSI target volumes. For each of those items the current throughput rate of the individual iSCSI session is shown. As long as the rate of the digest (CRC) and timeout error counters is zero, the string no protocol errors is displayed. Otherwise, the name and rate of any non-zero error counter is shown. The throughput rate of the iSCSI session is also visualized in the Perf-O-Meter, with received traffic growing from the middle to the left and transmitted traffic growing from the middle to the right.
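The summary-text logic described above can be sketched like this; the function name and input format are illustrative assumptions, not the actual plugin code:

```python
# Sketch: only digest/timeout error counters with a non-zero rate are
# named in the check output, otherwise "no protocol errors" is shown.
def error_summary(error_rates):
    # error_rates: per-second rates, e.g. {"digest_err": 0.0, "timeout_err": 0.2}
    nonzero = ["%s: %.1f/s" % (name, rate)
               for name, rate in sorted(error_rates.items()) if rate > 0]
    if not nonzero:
        return "no protocol errors"
    return ", ".join(nonzero)
```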
The following image shows an example of the three PNP4Nagios graphs for a single open-iscsi_session_stats (iSCSI Session Stats) service check.
The upper graph shows the throughput rate for received and transmitted traffic of the iSCSI session. The middle graph shows the rate for received and transmitted iSCSI PDUs, broken down by the different types of PDUs on the iSCSI protocol layer. The lower graph shows the rate for the digest (CRC) and timeout errors on the iSCSI protocol layer.
The described Check_MK service check to monitor the status of Open-iSCSI sessions, Open-iSCSI session metrics and iSCSI hardware initiator host metrics has been verified to work with version 2.0.874-2~bpo8+1 of the open-iscsi package from the backports repository of Debian stable (Jessie) on the client side and with Check_MK versions 1.2.6 and 1.2.8 on the server side.
I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.
2016-08-28 // Backporting Open-iSCSI to Debian 8 "Jessie"
The Debian Open-iSCSI package is now based on a current upstream version of Open-iSCSI. Open-iSCSI's iscsiuio is now provided through its own Debian package. Several improvements (Git commit d05fe0e1, Git commit 6004a7e7) have been made in handling hardware initiator based iSCSI sessions.
Thanks to Christian Seiler for his work on bringing the Debian Open-iSCSI package up to a current upstream version and for helping to sort out some issues related to the use of hardware initiators!
In the previous article Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell I mentioned using a backported version of Open-iSCSI on Debian 8 ("Jessie"). This new post describes the backport and the changes provided by it in greater detail. All the changes to the original Debian package from "unstable" ("Sid") can be found in my Debian Open-iSCSI Packaging repository on GitHub.
Starting point was a clone of the Debian Open-iSCSI Packaging repository at Git commit df150d90. Mind though, that in the meantime between creating the backport and writing this, the Debian Open-iSCSI maintainers have been busy and a more recent version of the Debian Open-iSCSI package from “unstable” (“Sid”) is now available.
Within this particular version of the Debian Open-iSCSI package, I first enabled the build of Open-iSCSI's iscsiuio. On the one hand, this was done to ensure that the iscsiuio code would successfully build even at this old level of the Open-iSCSI code. On the other hand, it would serve as a differentiator for any issues surfacing later on, after the move to the more recent upstream Open-iSCSI sources: the root cause of such issues would then lie solely with the newer upstream version of Open-iSCSI. Some integration into the general system environment was also added at this point. In detail, the changes were:
Git commit 32c96e6c removes the Debian patch 05-disable-iscsiuio.patch which disables the build of iscsiuio.
Git commit 984344a1 enables the build of iscsiuio, extends the cleanup build targets and adds iscsiuio to the dh_systemd build targets.
Git commit 89d845a9 adds the results from the successful build – the iscsiuio binary, the iscsiuio manual page, a readme file and a logrotate configuration file – to the Debian package. It also adds the kernel modules bnx2i and cnic to the list of kernel modules to be loaded at installation time.
Git commit 89195bbe adds the systemd service and socket unit files for iscsiuio. Those files have been taken from this discussion on the Open-iSCSI mailing list and have been slightly altered.
With the above changes an intermediary package was built for testing purposes. During the following tests, sometimes all currently mounted filesystems – even those distinctly not based on iSCSI volumes – would suddenly be unmounted. For some filesystems this would succeed; for others, e.g. /var and the root filesystem, it would fail because they were currently in use. The issue particularly occurred while stopping the open-iscsi service, either via its regular init script or via its systemd service, which usually happens at system shutdown or during uninstallation of the Open-iSCSI package. Tracking down the root cause of this issue led to an unhandled case in the umountiscsi.sh script, which is called while stopping the open-iscsi service. Specifically, the following code section is responsible for the observed behaviour:
- debian/extra/umountiscsi.sh
256	if [ $HAVE_LVM -eq 1 ] ; then
257		# Look for all LVM volume groups that have a backing store
258		# on any iSCSI device we found. Also, add $LVMGROUPS set in
259		# /etc/default/open-iscsi (for more complicated stacking
260		# configurations we don't automatically detect).
261		for _vg in $(cd /dev ; $PVS --noheadings -o vg_name $iscsi_disks $iscsi_partitions $iscsi_multipath_disks $iscsi_multipath_partitions 2>/dev/null) $LVMGROUPS ; do
262			add_to_set iscsi_lvm_vgs "$_vg"
263		done
The heuristics of the umountiscsi.sh script try to identify iSCSI-based disk devices which are valid candidates for proper deactivation upon system shutdown. It turned out that in LVM-based setups where no iSCSI-based disk devices are currently present, the variables $iscsi_disks, $iscsi_partitions, $iscsi_multipath_disks and $iscsi_multipath_partitions are left empty by the script's logic. In line 261 of the above code snippet, this leads to a call to the pvs --noheadings -o vg_name command without any additional arguments limiting its output of volume groups. Hence, the returned output is a complete list of all volume groups currently present on the system. Based on this list, the associated logical volumes for each volume group are determined and added to the list of devices to be unmounted. Finally, all devices in this list are actually unmounted.
Without making too invasive changes to the script logic of umountiscsi.sh, a quick'n'dirty solution was to introduce a check before the call to pvs which determines whether the variables $iscsi_disks, $iscsi_partitions, $iscsi_multipath_disks and $iscsi_multipath_partitions are all empty. If this is the case, the call to pvs is simply skipped. The following patch shows the necessary code changes, which are also available in Git commit 5118af7f:
- umountiscsi.sh.patch
diff --git a/debian/extra/umountiscsi.sh b/debian/extra/umountiscsi.sh
index 1206fa1..485069c 100755
--- a/debian/extra/umountiscsi.sh
+++ b/debian/extra/umountiscsi.sh
@@ -258,9 +258,11 @@ enumerate_iscsi_devices() {
 	# on any iSCSI device we found. Also, add $LVMGROUPS set in
 	# /etc/default/open-iscsi (for more complicated stacking
 	# configurations we don't automatically detect).
-	for _vg in $(cd /dev ; $PVS --noheadings -o vg_name $iscsi_disks $iscsi_partitions $iscsi_multipath_disks $iscsi_multipath_partitions 2>/dev/null) $LVMGROUPS ; do
-		add_to_set iscsi_lvm_vgs "$_vg"
-	done
+	if [ -n "$iscsi_disks" -o -n "$iscsi_partitions" -o -n "$iscsi_multipath_disks" -o -n "$iscsi_multipath_partitions" ]; then
+		for _vg in $(cd /dev ; $PVS --noheadings -o vg_name $iscsi_disks $iscsi_partitions $iscsi_multipath_disks $iscsi_multipath_partitions 2>/dev/null) $LVMGROUPS ; do
+			add_to_set iscsi_lvm_vgs "$_vg"
+		done
+	fi
 
 	# $iscsi_lvm_vgs is now unique list
 	for _vg in $iscsi_lvm_vgs ; do
After this was fixed, the last step was to finally move to the more recent upstream Open-iSCSI sources. In detail, the changes in this last step were:
Git commit f5ab51ff moves the code to version 2.0.873+git1.1dfb88a4, which is based upon the upstream Git commit 1dfb88a4. This is the last commit before the externalization of the Open-iSNS library. Since I didn't want to also backport the Open-iSNS packages from Debian "unstable" ("Sid"), I decided to just skip the next two upstream commits 76832662 and c6d1117b and stick with the locally delivered Open-iSNS library.
Git commit 8c1e6974 removes the local Debian patches 01_spelling-errors-and-manpage-hyphen-fixes.patch, 02_make-iscsistart-a-dynamic-binary.patch and 03_respect-build-flags.patch, which have already been merged into the more recent upstream Open-iSCSI sources. The remaining local Debian patches were renamed and reordered to 01_fix_iscsi_path.patch, 02_var-lock_var-run_transition.patch and 03_makefile_reproducibility_issues.patch. A whole bunch of new patches named {04,05,06,07,08,09,10,11,12,13,14,15}_upstream_git_commit_<Git commit ID>.patch were added in order to bring the sources up to the – by then most recent – upstream Git commit 0fa43f29.
Git commit d051dece removes some files from the Debian package which were dynamically generated during the build of iscsiuio.
Finally, Git commit 0fabb948 deals with the issue described in Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell.
With the steps and changes described above, a backported version of Open-iSCSI using its most recent sources was created as a package for Debian 8 (“Jessie”). This package also supports offloaded iSCSI connections via the Broadcom BCM577xx and BCM578xx iSOEs with the use of iscsiuio. The package has been in production use for over a month now and no major issues – neither with the newer upstream Open-iSCSI sources, nor with use of Broadcom BCM577xx and BCM578xx iSOEs through iscsiuio – have emerged so far.
2016-08-05 // Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell
Open-iSCSI's tool iscsiuio, which is used to configure and manage Broadcom BCM577xx and BCM578xx iSCSI offload engines (iSOE), currently crashes with a segmentation fault upon target login on Intel Broadwell based systems. Comparable system setups based on the older Intel Haswell chips do not show this issue.
In the past I've been using QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs, which provide a full iSCSI offload engine (iSOE) implementation in the adapter's firmware. Unfortunately, the QLogic 8200 Series network adapters are no longer available for Dell M-Series blade servers. The alternatives offered by Dell are the Intel X520 and Intel X540 series adapters, or the Broadcom BCM57810S series adapters. Instead of using the Intel X520/X540, which provide no iSOE at all, I decided to go with the Broadcom BCM57810S, which at least provide some kind of iSOE. According to the VMware terminology, the Broadcom BCM57810S are dependent hardware iSCSI initiators. Dependent in this context means that the iSOE does not implement all the necessary features and thus cannot perform all the tasks (e.g. TCP/IP stack, configuration and session management, authentication, etc.) necessary for target handling by itself. Rather, some of these tasks are provided by a third party on which this kind of iSOE depends. In case of the Broadcom BCM57810S this third party is the iscsiuio daemon, which has for some time been part of the Open-iSCSI project. Simply put, the iscsiuio daemon acts as an intermediary between iscsid on the one side and the QLogic1) NetXtreme II driver (kernel module bnx2 or bnx2x) and the QLogic2) CNIC driver (kernel module cnic) on the other side, facilitating the creation and overall management of offloaded iSCSI sessions. Very simplified, the flow of information is as follows:
iscsiadm ←→ iscsid ←→ iscsiuio ←→ bnx2/bnx2x ←→ cnic ←→ Broadcom BCM57810S adapter ←→ Network ←→ Target
In my environment the Broadcom BCM57810S adapters are installed and used on six hosts (host1 and host{5,6,7,8,9}). They all connect to the same Dell EqualLogic storage systems in the backend, using the same network dedicated to iSCSI traffic. All hosts are Dell M630 blade servers with exactly the same firmware, operating system (Debian 8) and software versions. I'm using a backported version of Open-iSCSI, which is based on Git commit 0fa43f29, but excludes the commits 76832662 and c6d1117b, which just implement the externalization of the Open-iSNS code. The systems originate from the same install image, so their configuration is – to a large extent – exactly the same. The only difference between the hosts is that host1 has Intel E5 v3 (aka Haswell) CPUs, while host{5,6,7,8,9} have Intel E5 v4 (aka Broadwell) CPUs.
On host1 everything works fine: iscsiuio runs as expected and access to targets via the Broadcom BCM57810S iSOEs works flawlessly. On host{5,6,7,8,9}, on the other hand, I was getting segmentation faults like the one in the example shown below while trying to log in to any target.
host5:~# gdb /sbin/iscsiuio
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /sbin/iscsiuio...(no debugging symbols found)...done.
(gdb) # run -d 4 -f
Starting program: /sbin/iscsiuio -d 4 -f
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
INFO [Wed Jul 27 10:01:45 2016]Initialize logger using log file: /var/log/iscsiuio.log
INFO [Wed Jul 27 10:01:45 2016]Started iSCSI uio stack: Ver 0.7.8.2
INFO [Wed Jul 27 10:01:45 2016]Build date: Fri Jul 22 15:40:04 CEST 2016
INFO [Wed Jul 27 10:01:45 2016]Debug mode enabled
INFO [Wed Jul 27 10:01:45 2016]Running on sysname: 'Linux', release: '3.16.0-4-amd64', version '#1 SMP Debian 3.16.7-ckt 25-2+deb8u3 (2016-07-02)' machine: 'x86_64'
DBG  [Wed Jul 27 10:01:45 2016]Loaded nic library 'bnx2' Version: '0.7.8.2' build on Fri Jul 22 15:40:04 CEST 2016'
DBG  [Wed Jul 27 10:01:45 2016]Added 'bnx2' nic library
DBG  [Wed Jul 27 10:01:45 2016]Loaded nic library 'bnx2x' Version: '0.7.8.2' build on Fri Jul 22 15:40:04 CEST 2016'
DBG  [Wed Jul 27 10:01:45 2016]Added 'bnx2x' nic library
[New Thread 0x7ffff760f700 (LWP 4942)]
INFO [Wed Jul 27 10:01:45 2016]signal handling thread ready
INFO [Wed Jul 27 10:01:45 2016]nic_utils Found host[11]: host11
INFO [Wed Jul 27 10:01:45 2016]Done capturing /sys/class/iscsi_host/host11/netdev
INFO [Wed Jul 27 10:01:45 2016]Done capturing /sys/class/iscsi_host/host11/netdev
INFO [Wed Jul 27 10:01:45 2016]nic_utils looking for uio device for eth3
WARN [Wed Jul 27 10:01:45 2016]Could not find assoicate uio device with eth3
ERR  [Wed Jul 27 10:01:45 2016]nic_utils Could not determine UIO name for eth3
INFO [Wed Jul 27 10:01:45 2016]nic_utils Found host[12]: host12
INFO [Wed Jul 27 10:01:45 2016]Done capturing /sys/class/iscsi_host/host12/netdev
INFO [Wed Jul 27 10:01:45 2016]Done capturing /sys/class/iscsi_host/host12/netdev
INFO [Wed Jul 27 10:01:45 2016]nic_utils looking for uio device for eth2
INFO [Wed Jul 27 10:01:45 2016]nic_utils eth2 associated with uio0
INFO [Wed Jul 27 10:01:45 2016]nic_utils NIC not found creating an instance for host_no: 12 eth2
DBG  [Wed Jul 27 10:01:45 2016]Could not increase process priority: Success
[New Thread 0x7ffff6e0e700 (LWP 4943)]
DBG  [Wed Jul 27 10:01:45 2016]iscsi_ipc Started iscsid listening thread
DBG  [Wed Jul 27 10:01:45 2016]iscsi_ipc Waiting for iscsid command
INFO [Wed Jul 27 10:01:45 2016]NIC_NL Netlink to CNIC on pid 4938 is ready
DBG  [Wed Jul 27 10:01:57 2016]iscsi_ipc recv iscsid request: cmd: 1, payload_len: 11720
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc Received request for 'eth2' to set IP address: '10.0.1.62' VLAN: '0'
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc Using netmask: 0.0.0.0
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2, using existing NIC
INFO [Wed Jul 27 10:01:57 2016]nic_utils looking for uio device for eth2
INFO [Wed Jul 27 10:01:57 2016]nic_utils eth2 associated with uio0
INFO [Wed Jul 27 10:01:57 2016]Done capturing /sys/class/uio/uio0/name
INFO [Wed Jul 27 10:01:57 2016]nic_utils eth2: Verified uio name bnx2x_cnic with library bnx2x
INFO [Wed Jul 27 10:01:57 2016]eth2: found NIC with library 'bnx2x'
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2 library set using transport_name bnx2i
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: requesting configuration using static IP address
DBG  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2 couldn't find interface with ip_type: 0x2 creating it
INFO [Wed Jul 27 10:01:57 2016]nic eth2: Added nic interface for VLAN: 0, protocol: 2
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: created network interface
[New Thread 0x7ffff660d700 (LWP 4947)]
WARN [Wed Jul 27 10:01:57 2016]nic_utils eth2: device already disabled: flag: 0x1088 state: 0x1
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: configuring using static IP IPv4 address :10.0.1.62
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc netmask: 255.255.255.0
[New Thread 0x7ffff5e0c700 (LWP 4948)]
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc ISCSID_UIP_IPC_GET_IFACE: command: 1 name: bnx2i.d0:43:1e:51:98:53, netdev: eth2 ipaddr: 10.0.1.62 vlan: 0 transport_name:bnx2i
INFO [Wed Jul 27 10:01:57 2016]nic_utils eth2: spinning up thread for nic
DBG  [Wed Jul 27 10:01:57 2016]iscsi_ipc Waiting for iscsid command
[New Thread 0x7ffff560b700 (LWP 4949)]
DBG  [Wed Jul 27 10:01:57 2016]nic eth2: Waiting to be enabled
INFO [Wed Jul 27 10:01:57 2016]Created nic thread: eth2
INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: started NIC enable thread state: 0x1
DBG  [Wed Jul 27 10:01:57 2016]nic eth2: is now enabled
INFO [Wed Jul 27 10:01:57 2016]bnx2x eth2: bnx2x driver using version 1.78.19
ERR  [Wed Jul 27 10:01:58 2016]bnx2x /dev/uio0: uio device has been brought up via pid: 4938 on fd: 7
INFO [Wed Jul 27 10:01:58 2016]Done capturing /sys/class/uio/uio0/name
INFO [Wed Jul 27 10:01:58 2016]bnx2x eth2: Verified is a cnic_uio device
DBG  [Wed Jul 27 10:01:58 2016]bnx2x eth2: using rx ring size: 15, rx buffer size: 1024
INFO [Wed Jul 27 10:01:58 2016]Done capturing /sys/class/uio/uio0/event
DBG  [Wed Jul 27 10:01:58 2016]bnx2x Chip ID: 168e1000
INFO [Wed Jul 27 10:01:58 2016]nic_id eth2: is found at 03:00.00
INFO [Wed Jul 27 10:01:58 2016]bnx2x eth2: func 0x0, pfid 0x0, client_id 0x88, cid 0x1
DBG  [Wed Jul 27 10:01:58 2016]bnx2x eth2: mode = 0x100
INFO [Wed Jul 27 10:01:58 2016]bnx2x eth2: Using mac address: d0:43:1e:51:98:53
INFO [Wed Jul 27 10:01:58 2016]eth2: bnx2x initialized
INFO [Wed Jul 27 10:01:58 2016]nic eth2: Initialized ip stack: VLAN: 0
INFO [Wed Jul 27 10:01:58 2016]nic eth2: mac: d0:43:1e:51:98:53
INFO [Wed Jul 27 10:01:58 2016]nic eth2: Using IP address: 10.0.1.62
INFO [Wed Jul 27 10:01:58 2016]nic eth2: Using netmask: 255.255.255.0
INFO [Wed Jul 27 10:01:58 2016]nic eth2: enabled vlan 0 protocol: 2
INFO [Wed Jul 27 10:01:58 2016]nic eth2: entering main nic loop
DBG  [Wed Jul 27 10:01:58 2016]nic_utils eth2: device enabled
[Thread 0x7ffff5e0c700 (LWP 4948) exited]
DBG  [Wed Jul 27 10:01:59 2016]iscsi_ipc recv iscsid request: cmd: 1, payload_len: 11720
INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc Received request for 'eth2' to set IP address: '10.0.1.62' VLAN: '0'
INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc Using netmask: 0.0.0.0
INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2, using existing NIC
INFO [Wed Jul 27 10:01:59 2016]nic_utils looking for uio device for eth2
INFO [Wed Jul 27 10:01:59 2016]nic_utils eth2 associated with uio0
INFO [Wed Jul 27 10:01:59 2016]eth2: Have NIC library 'bnx2x'
INFO [Wed Jul 27 10:01:59 2016]Done capturing /sys/class/uio/uio0/name
INFO [Wed Jul 27 10:01:59 2016]nic_utils eth2: Verified uio name bnx2x_cnic with library bnx2x
INFO [Wed Jul 27 10:01:59 2016]eth2: found NIC with library 'bnx2x'
INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2 library set using transport_name bnx2i
INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: requesting configuration using static IP address
INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: using existing network interface
INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: IP configuration didn't change using 0x2
INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: NIC already enabled flags: 0x1084 state: 0x4
INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc ISCSID_UIP_IPC_GET_IFACE: command: 1 name: bnx2i.d0:43:1e:51:98:53, netdev: eth2 ipaddr: 10.0.1.62 vlan: 0 transport_name:bnx2i
DBG  [Wed Jul 27 10:01:59 2016]iscsi_ipc Waiting for iscsid command
INFO [Wed Jul 27 10:02:00 2016]NIC_NL Received path_req for host 12
INFO [Wed Jul 27 10:02:00 2016]Done capturing /sys/class/iscsi_host/host12/netdev
DBG  [Wed Jul 27 10:02:00 2016]NIC_NL Pulled nl event
INFO [Wed Jul 27 10:02:00 2016]NIC_NL eth2: Processing 'path_req'
DBG  [Wed Jul 27 10:02:00 2016]NIC_NL eth2: PATH_REQ with iface_num -1 VLAN 32768
DBG  [Wed Jul 27 10:02:00 2016]CNIC eth2: Netlink message with VLAN ID: 0, path MTU: 9000 minor: 0 ip_addr_len: 4
DBG  [Wed Jul 27 10:02:00 2016]CNIC eth2: src=10.0.1.62
DBG  [Wed Jul 27 10:02:00 2016]CNIC eth2: dst=10.0.1.2
DBG  [Wed Jul 27 10:02:00 2016]CNIC eth2: nm=255.255.255.0
INFO [Wed Jul 27 10:02:00 2016]CNIC eth2: Didn't find IPv4: '10.0.1.2' in ARP table
DBG  [Wed Jul 27 10:02:00 2016]CNIC eth2: Sent cnic arp request for IP: 10.0.1.2
INFO [Wed Jul 27 10:02:00 2016]Found 10.0.1.2 at b0:83:fe:cc:57:bb
DBG  [Wed Jul 27 10:02:00 2016]CNIC neighbor reply sent back to kernel 10.0.1.62 at b0:83:fe:cc:57:bb with vlan 0
INFO [Wed Jul 27 10:02:00 2016]NIC_NL eth2: 'path_req' operation finished

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff660d700 (LWP 4947)]
__lll_unlock_elision (lock=0x55555577fd40, private=0) at ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
29      ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c: No such file or directory.
(gdb) # info threads
  Id   Target Id         Frame
  6    Thread 0x7ffff560b700 (LWP 4949) "iscsiuio" 0x00007ffff76f1ae3 in select () at ../sysdeps/unix/syscall-template.S:81
* 4    Thread 0x7ffff660d700 (LWP 4947) "iscsiuio" __lll_unlock_elision (lock=0x55555577fd40, private=0) at ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
  3    Thread 0x7ffff6e0e700 (LWP 4943) "iscsiuio" 0x00007ffff79c9ccd in accept () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x7ffff760f700 (LWP 4942) "iscsiuio" do_sigwait (set=<optimized out>, sig=0x7ffff760eeac) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:63
  1    Thread 0x7ffff7fea700 (LWP 4938) "iscsiuio" 0x00007ffff79c9e9d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
(gdb) # thread apply all bt

Thread 6 (Thread 0x7ffff560b700 (LWP 4949)):
#0  0x00007ffff76f1ae3 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x000055555555ac06 in ?? ()
#2  0x000055555555d39e in nic_loop ()
#3  0x00007ffff79c30a4 in start_thread (arg=0x7ffff560b700) at pthread_create.c:309
#4  0x00007ffff76f887d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7ffff660d700 (LWP 4947)):
#0  __lll_unlock_elision (lock=0x55555577fd40, private=0) at ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
#1  0x00007ffff79c7007 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:94
#2  0x000055555555e803 in nl_process_handle_thread ()
#3  0x00007ffff79c30a4 in start_thread (arg=0x7ffff660d700) at pthread_create.c:309
#4  0x00007ffff76f887d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7ffff6e0e700 (LWP 4943)):
#0  0x00007ffff79c9ccd in accept () at ../sysdeps/unix/syscall-template.S:81
#1  0x00005555555641b0 in ?? ()
#2  0x00007ffff79c30a4 in start_thread (arg=0x7ffff6e0e700) at pthread_create.c:309
#3  0x00007ffff76f887d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7ffff760f700 (LWP 4942)):
#0  do_sigwait (set=<optimized out>, sig=0x7ffff760eeac) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:63
#1  0x00007ffff79ca693 in __sigwait (set=0x7ffff760eeb0, sig=0x0) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:97
#2  0x000055555555a49c in _start ()

Thread 1 (Thread 0x7ffff7fea700 (LWP 4938)):
#0  0x00007ffff79c9e9d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1  0x000055555555e5e9 in ?? ()
#2  0x000055555555eea8 in nic_nl_open ()
#3  0x000055555555a1b8 in main ()
The Open-iSCSI command which led to this segmentation fault was a simple login to a previously defined target node:
host5:~# iscsiadm -m node -T <target iqn> -I <interface> --login
Double-checking the configuration, the firmware and software versions, as well as the general hardware setup didn't yield any usable indication as to where the root cause of this issue might be. Searching the web for __lll_unlock_elision in conjunction with the pthread_* function calls led me to the following resources:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574
https://lwn.net/Articles/534758/
Those point towards a CPU-specific (Broadwell and Skylake) problem that surfaces when mutexes are not used carefully. The general opinion there, and in other related bug reports, seems to be that the source of such issues is almost always an improper use of mutex locking, which – up to now – has either been tolerated or just by chance not led to a failure. More recent CPUs and software implementations (e.g. the GNU libc) appear to be less forgiving in this regard. Thus the advice is to change the application behaviour towards a proper use of mutex locking in order to address such an issue.
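One way to make such latent locking bugs visible deterministically, regardless of CPU generation, is to use a POSIX error-checking mutex, which reports misuse instead of silently invoking undefined behaviour. The following standalone sketch is purely illustrative (it is not part of the Open-iSCSI sources); it shows that unlocking a mutex the calling thread does not own – undefined behaviour for the default mutex type – is reported as EPERM by an error-checking mutex:

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

int main(void)
{
	pthread_mutex_t m;
	pthread_mutexattr_t attr;

	/* Request an error-checking mutex instead of the default type. */
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);

	/* Unlocking a mutex this thread never locked: undefined behaviour
	 * for PTHREAD_MUTEX_DEFAULT, but a reported error (EPERM) here. */
	int rc = pthread_mutex_unlock(&m);
	assert(rc == EPERM);
	puts("unlock without ownership returned EPERM");

	pthread_mutex_destroy(&m);
	pthread_mutexattr_destroy(&attr);
	return 0;
}

Running iscsiuio with its mutexes switched to PTHREAD_MUTEX_ERRORCHECK during debugging would likely have flagged the misuse on Haswell as well, instead of it only crashing on Broadwell.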
The article Intel's Broadwell Xeon E5-2600 v4 chips: So what's in it for you, smartie-pants coders offers a rather nice write-up of the new features introduced in the Intel Broadwell CPUs.
Tracking this issue further down in the Open-iSCSI sources, I ended up in the function nl_process_handle_thread() in iscsiuio/src/unix/nic_nl.c, and specifically in the following code section:
- iscsiuio/src/unix/nic_nl.c
474 /* NIC specific nl processing thread */
475 void *nl_process_handle_thread(void *arg)
476 {
[...]
483         while (!event_loop_stop) {
484                 char *data = NULL;
485 
486                 rc = pthread_cond_wait(&nic->nl_process_cond,
487                                        &nic->nl_process_mutex);
488                 if (rc != 0) {
489                         LOG_ERR("Fatal error in NL processing thread "
490                                 "during wait[%s]", strerror(rc));
491                         break;
492                 }
[...]
499                 pthread_mutex_unlock(&nic->nl_process_mutex);
[...]
Debugging this revealed that the call to pthread_cond_wait() from the above GDB backtrace output of thread number 4 is the one from line 486 in the above code snippet.
Looking at the pthread_cond_wait() manpage showed the following constraint for its proper use:
[…]
The pthread_cond_timedwait() and pthread_cond_wait() functions shall
block on a condition variable. They shall be called with mutex locked
by the calling thread or undefined behavior results.
[…]
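The canonical pattern implied by this constraint is: lock the mutex first, wait on the condition variable inside a loop that re-checks the predicate (to cope with spurious wakeups), and unlock only after the wait has returned. The following standalone sketch shows that pattern; it is illustrative only – the names (data_ready, producer) are made up and it is not the Open-iSCSI code:

#include <assert.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int data_ready = 0;

static void *producer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);    /* take the mutex before touching shared state */
	data_ready = 1;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t tid;
	assert(pthread_create(&tid, NULL, producer, NULL) == 0);

	/* Correct pattern: the mutex MUST be held when calling
	 * pthread_cond_wait(); the wait atomically releases it while
	 * blocking and re-acquires it before returning. The predicate
	 * is re-checked in a loop to guard against spurious wakeups. */
	pthread_mutex_lock(&lock);
	while (!data_ready)
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);

	pthread_join(tid, NULL);
	puts("consumer saw data_ready");
	return 0;
}

The code in nl_process_handle_thread() violates this pattern: it enters pthread_cond_wait() without holding nic->nl_process_mutex, yet unlocks that mutex at the end of each loop pass.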
Although not shown in the above GDB output, this would on occasion – and again, probably just by chance – work on the first pass of the loop. At the end of the loop, at line 499 in the above code snippet, the mutex is then unlocked. Thus the cited constraint from the pthread_cond_wait() manpage is no longer met on subsequent passes of the loop. On Intel E5 v3 (aka Haswell) CPUs this seemed to be tolerated, without any impact. But on Intel E5 v4 (aka Broadwell) – and probably other CPUs implementing HLE and RTM – it causes the observed segmentation fault.
In order to verify my analysis and test this theory, I added a call to pthread_mutex_lock() right before the call to pthread_cond_wait() in line 486. The resulting change is available in Git commit 9f770f9e of my Open-iSCSI fork on GitHub and is also shown in the following patch:
- nic_nl.c.patch
diff --git a/iscsiuio/src/unix/nic_nl.c b/iscsiuio/src/unix/nic_nl.c
index 391003f..581ddb0 100644
--- a/iscsiuio/src/unix/nic_nl.c
+++ b/iscsiuio/src/unix/nic_nl.c
@@ -483,6 +483,7 @@ void *nl_process_handle_thread(void *arg)
 	while (!event_loop_stop) {
 		char *data = NULL;
 
+		pthread_mutex_lock(&nic->nl_process_mutex);
 		rc = pthread_cond_wait(&nic->nl_process_cond,
 				       &nic->nl_process_mutex);
 		if (rc != 0) {
Posting this on the Open-iSCSI mailing list led to this discussion. The suggestion from there was an additional change to the error handling code of nl_process_handle_thread(), starting at line 488 in the above code snippet. This adds proper handling of the locked mutex in case the loop is left due to an error returned from the call to pthread_cond_wait(). The resulting additional change is available in Git commit 4191ca6b of my Open-iSCSI fork on GitHub. The following patch shows the summarised code changes:
- nic_nl.c.patch
diff --git a/iscsiuio/src/unix/nic_nl.c b/iscsiuio/src/unix/nic_nl.c
index 391003f..1a920c7 100644
--- a/iscsiuio/src/unix/nic_nl.c
+++ b/iscsiuio/src/unix/nic_nl.c
@@ -483,9 +483,11 @@ void *nl_process_handle_thread(void *arg)
 	while (!event_loop_stop) {
 		char *data = NULL;
 
+		pthread_mutex_lock(&nic->nl_process_mutex);
 		rc = pthread_cond_wait(&nic->nl_process_cond,
 				       &nic->nl_process_mutex);
 		if (rc != 0) {
+			pthread_mutex_unlock(&nic->nl_process_mutex);
 			LOG_ERR("Fatal error in NL processing thread "
 				"during wait[%s]", strerror(rc));
 			break;
With those two small changes to the sources of Open-iSCSI's iscsiuio, the iSCSI connections via the Broadcom BCM57810S iSOE now work flawlessly even on newer Intel E5 v4 (aka Broadwell) based systems. Hopefully the original authors of the iscsiuio code at Broadcom/QLogic will also take part in the discussion and provide their feedback on the proposed code changes.