2017-04-12 // Check_MK Monitoring - Open-iSCSI
The Open-iSCSI project provides a high-performance, transport-independent implementation of RFC 3720 iSCSI for Linux. It allows remote access to SCSI targets via TCP/IP over several different transport technologies. This article introduces a new Check_MK service check that monitors the status of Open-iSCSI sessions as well as several statistical metrics on Open-iSCSI sessions and iSCSI hardware initiator hosts.
For the impatient (TL;DR), here is the Check_MK package of the Open-iSCSI monitoring checks:
Open-iSCSI monitoring checks (Compatible with Check_MK versions 1.2.8 and later)
The sources can be found in my Check_MK repository on GitHub.
The Check_MK service check to monitor Open-iSCSI consists of two major parts, an agent plugin and three check plugins.
The first part, a Check_MK agent plugin named open-iscsi, is a simple Bash shell script. It calls the Open-iSCSI administration tool iscsiadm in order to retrieve a list of currently active iSCSI sessions. The exact call to iscsiadm to retrieve the session list is:
/usr/bin/iscsiadm -m session -P 1
If there are any active iSCSI sessions, the open-iscsi agent plugin also tries to collect several statistics for each iSCSI session. This is done by another call to iscsiadm for each iSCSI session ${SID}, which is shown in the following example:
/usr/bin/iscsiadm -m session -r ${SID} -s
Unfortunately, the iSCSI session statistics are currently only supported for Open-iSCSI software initiators or dependent hardware iSCSI initiators like the Broadcom BCM577xx or BCM578xx adapters, which are covered by the bnx2i kernel module. See Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell and Backporting Open-iSCSI to Debian 8 "Jessie" for additional information on those dependent hardware iSCSI initiators.
For hardware iSCSI initiators, like the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs, which provide a full iSCSI offload engine (iSOE) implementation in the adapter's firmware, there is currently no support for iSCSI session statistics. Instead, the open-iscsi agent plugin collects several global statistics on each iSOE host ${HST} covered by the qla4xxx kernel module, using the command shown in the following example:
/usr/bin/iscsiadm -m host -H ${HST} -C stats
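The choice between the two statistics calls can be sketched in Python as follows. This is only an illustration of the decision described above, not the Bash code of the agent plugin. The function name `stats_command` and the idea of switching on the driver name are assumptions; the `iscsiadm` arguments are the ones shown in the examples.

```python
ISCSIADM = "/usr/bin/iscsiadm"


def stats_command(driver, session_id=None, host=None):
    """Return the iscsiadm argument list used to fetch statistics.

    Assumption: the kernel driver name (e.g. "bnx2i" or "qla4xxx")
    decides which kind of statistics is available.
    """
    if driver == "qla4xxx":
        # Full offload (iSOE) hosts: only global per-host statistics.
        return [ISCSIADM, "-m", "host", "-H", str(host), "-C", "stats"]
    # Software and dependent hardware initiators: per-session statistics.
    return [ISCSIADM, "-m", "session", "-r", str(session_id), "-s"]
```

The returned list could be handed to `subprocess.run()` on a host where `iscsiadm` is installed.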
The output of the above commands is parsed and reformatted by the agent plugin for easier processing in the check plugins. The following example shows the agent plugin output for a system with two BCM578xx dependent hardware iSCSI initiators:
<<<open-iscsi_sessions>>>
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:bf:2d eth2 10.0.3.52 LOGGED_IN LOGGED_IN NO_CHANGE
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:c2:34 eth3 10.0.3.53 LOGGED_IN LOGGED_IN NO_CHANGE
<<<open-iscsi_session_stats>>>
[session stats f8:ca:b8:7d:bf:2d iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 40960
rxdata_octets: 461171313
noptx_pdus: 0
scsicmd_pdus: 153967
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 153967
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 112420
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0
[session stats f8:ca:b8:7d:c2:34 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 16384
rxdata_octets: 255666052
noptx_pdus: 0
scsicmd_pdus: 84312
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 84312
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 62418
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0
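To illustrate why this layout is easy to process, here is a minimal Python sketch that groups the counters of a statistics section under their bracketed header. It is not the code of the actual check plugins. The function name `parse_stats` and the IQN in the sample are made up for illustration, and the sketch assumes one header or one `name: value` pair per line.

```python
def parse_stats(lines):
    """Group 'name: value' counters under their '[... stats MAC IQN]' header."""
    parsed = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            # Header layout: [session stats <MAC> <IQN>]
            _kind, _stats, mac, iqn = line[1:-1].split()
            current = parsed.setdefault((mac, iqn), {})
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            current[name.strip()] = int(value)
    return parsed


# Hypothetical sample in the same layout as the agent output above.
sample = [
    "[session stats f8:ca:b8:7d:bf:2d iqn.2001-05.com.example:volume-1]",
    "txdata_octets: 40960",
    "rxdata_octets: 461171313",
    "digest_err: 0",
]
stats = parse_stats(sample)
```

Keying the result by the MAC address and IQN pair mirrors the way the statistics checks name their items.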
The next example shows the agent plugin output for a system with two QLogic 8200 Series hardware iSCSI initiators:
<<<open-iscsi_sessions>>>
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7d.ipv4.0 none 10.0.3.50 LOGGED_IN Unknown Unknown
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7e.ipv4.0 none 10.0.3.51 LOGGED_IN Unknown Unknown
<<<open-iscsi_host_stats>>>
[host stats f8:ca:b8:7d:c1:7d iqn.2000-04.com.qlogic:isp8214.000e1e3574ac.4]
mactx_frames: 563454
mactx_bytes: 52389948
mactx_multicast_frames: 877513
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573455
macrx_bytes: 440845678
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508160
iptx_bytes: 29474232
iptx_fragments: 0
iprx_packets: 401785
iprx_bytes: 354673156
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508160
tcptx_bytes: 19310736
tcprx_segments: 401785
tcprx_byte: 346637456
tcp_duplicate_ack_retx: 1
tcp_retx_timer_expired: 1
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106449
tcptx_pure_ack: 106489
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 695915
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401697
iscsi_data_bytes_tx: 29225
iscsi_pdu_rx: 401697
iscsi_data_bytes_rx: 327355963
iscsi_io_completed: 101
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0
[host stats f8:ca:b8:7d:c1:7e iqn.2000-04.com.qlogic:isp8214.000e1e3574ad.5]
mactx_frames: 563608
mactx_bytes: 52411412
mactx_multicast_frames: 877517
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573572
macrx_bytes: 441630442
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508310
iptx_bytes: 29490504
iptx_fragments: 0
iprx_packets: 401925
iprx_bytes: 355436636
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508310
tcptx_bytes: 19323952
tcprx_segments: 401925
tcprx_byte: 347398136
tcp_duplicate_ack_retx: 2
tcp_retx_timer_expired: 4
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106466
tcptx_pure_ack: 106543
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 696035
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401787
iscsi_data_bytes_tx: 37970
iscsi_pdu_rx: 401791
iscsi_data_bytes_rx: 328112050
iscsi_io_completed: 127
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0
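The session lines in both examples consist of nine whitespace-separated columns. A rough Python sketch of turning them into records could look like this. The field names are inferred from the example output and are not taken from the plugin sources, and the IQN in the usage note is a placeholder.

```python
from collections import namedtuple

# Assumed column meaning, derived from the example agent output.
Session = namedtuple(
    "Session",
    "driver portal iqn iface netdev ip "
    "session_state connection_state internal_state",
)


def parse_sessions(lines):
    """Turn the lines of the open-iscsi_sessions section into records."""
    sessions = []
    for line in lines:
        fields = line.split()
        if len(fields) != len(Session._fields):
            continue  # skip anything that does not look like a session line
        sessions.append(Session(*fields))
    return sessions
```

For a hardware initiator line such as `qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.example:vol qla4xxx.f8:ca:b8:7d:c1:7d.ipv4.0 none 10.0.3.50 LOGGED_IN Unknown Unknown`, the last two fields come back as `Unknown`, matching the fact that only `session_state` is available for those adapters.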
Although a simple Bash shell script, the agent plugin open-iscsi has several dependencies which need to be installed in order for the agent plugin to work properly. Namely, those are the commands iscsiadm, sed, tr and egrep. On Debian-based systems, the necessary packages can be installed with the following command:
root@host:~# apt-get install coreutils grep open-iscsi sed
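To verify that the four commands are actually available on a host, a small check along the following lines may help. It is a convenience sketch and not part of the Check_MK package; the function name `missing_commands` is made up, and the lookup relies on Python's standard `shutil.which`, which searches the current `PATH`.

```python
import shutil

# The commands the agent plugin depends on, as listed above.
REQUIRED_COMMANDS = ("iscsiadm", "sed", "tr", "egrep")


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands that cannot be found in the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("missing commands: " + ", ".join(missing))
    else:
        print("all dependencies found")
```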
The second part of the Check_MK service check for Open-iSCSI provides the necessary check logic through individual inventory and check functions. This is implemented in the three Check_MK check plugins open-iscsi_sessions, open-iscsi_host_stats and open-iscsi_session_stats, which will be discussed separately in the following sections.
Open-iSCSI Session Status
The check plugin open-iscsi_sessions is responsible for the monitoring of individual iSCSI sessions and their internal session states. Upon inventory this check plugin creates a service check for each pair of iSCSI network interface name and IQN of the iSCSI target volume. Unlike the iSCSI session ID, which changes over time (e.g. after iSCSI logout and login), this pair uniquely identifies an iSCSI session on a host. During normal check execution, the list of currently active iSCSI sessions on a host is compared to the list of active iSCSI sessions gathered during inventory on that host. If a session is missing or if the session has an erroneous internal state, an alarm is raised accordingly.
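The comparison between inventoried and currently active sessions can be pictured with the following Python sketch. It only illustrates the idea and is not the check plugin itself. The function names, the dictionary keys `iface` and `iqn`, and the item format of interface name and IQN joined by a space are assumptions.

```python
def inventory_items(sessions):
    """Build one item per (interface name, target IQN) pair."""
    return sorted({"%s %s" % (s["iface"], s["iqn"]) for s in sessions})


def missing_items(inventoried, current_sessions):
    """Return the inventoried items that have no active session anymore."""
    current = set(inventory_items(current_sessions))
    return [item for item in inventoried if item not in current]
```

Because the item is built from the interface name and the IQN rather than the session ID, a session that logs out and back in maps to the same item again.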
For all types of initiators – software, dependent hardware and hardware – there is the state session_state which can take on the following values:
ISCSI_STATE_FREE
ISCSI_STATE_LOGGED_IN
ISCSI_STATE_FAILED
ISCSI_STATE_TERMINATE
ISCSI_STATE_IN_RECOVERY
ISCSI_STATE_RECOVERY_FAILED
ISCSI_STATE_LOGGING_OUT
An alarm is raised if the session is in any state other than ISCSI_STATE_LOGGED_IN. For software and dependent hardware initiators there are two additional states – connection_state and internal_state. The state connection_state can take on the values:
FREE
TRANSPORT WAIT
IN LOGIN
LOGGED IN
IN LOGOUT
LOGOUT REQUESTED
CLEANUP WAIT
and internal_state can take on the values:
NO CHANGE
CLEANUP
REOPEN
REDIRECT
In addition to the above session_state, an alarm is raised if connection_state is in any state other than LOGGED IN or if internal_state is in any state other than NO CHANGE.
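The state evaluation described above can be sketched in Python as follows. This is an approximation and not the plugin's code. It assumes the underscore spelling used in the agent output (`LOGGED_IN`, `NO_CHANGE`), assumes that each state is checked on its own, skips the two extra states when the agent reports `Unknown` for them, and picks CRIT as the alarm level, which the article does not specify.

```python
OK, CRIT = 0, 2  # Nagios-style status codes; CRIT is an assumed severity


def evaluate_session(session_state, connection_state="Unknown",
                     internal_state="Unknown"):
    """Return (status, problems) for the states of one iSCSI session."""
    problems = []
    if session_state != "LOGGED_IN":
        problems.append("session_state is %s" % session_state)
    # Hardware (iSOE) initiators report "Unknown" for the two extra
    # states, so those are only evaluated when a real value is present.
    if connection_state not in ("Unknown", "LOGGED_IN"):
        problems.append("connection_state is %s" % connection_state)
    if internal_state not in ("Unknown", "NO_CHANGE"):
        problems.append("internal_state is %s" % internal_state)
    return (CRIT if problems else OK), problems
```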
No performance data is currently reported by this check.
Open-iSCSI Host Statistics
The check plugin open-iscsi_host_stats is responsible for the monitoring of the global statistics on an iSOE host. Upon inventory this check plugin creates a service check for each pair of MAC address and iSCSI network interface name. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is determined for each inventoried item. If the rate of one of the statistics values is above the configured warning or critical threshold value, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
With the additional WATO plugin open-iscsi_host_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSOE host statistics levels can be found in the WATO WebUI under:
-> Host & Service Parameters
   -> Parameters for discovered services
      -> Storage, Filesystems and Files
         -> Open-iSCSI Host Statistics
            -> Create Rule in Folder ...
               -> The levels for the Open-iSCSI host statistics values
                  [x] The levels for the number of transmitted MAC/Layer2 frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 bytes on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 bytes on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 multicast frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 multicast frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 broadcast frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 broadcast frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 pause frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 pause frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 control frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 control frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 dropped frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 dropped frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 abort frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 jumbo frames on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 late transmit collisions on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 single transmit collisions on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 multiple transmit collisions on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 collisions on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 control frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 dribble on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 frame length errors on an iSOE host.
                  [x] The levels for the number of discarded received MAC/Layer2 frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 jabber on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 carrier sense errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 CRC errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 encoding errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 length too large errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 length too small errors on an iSOE host.
                  [x] The levels for the number of transmitted IP packets on an iSOE host.
                  [x] The levels for the number of received IP packets on an iSOE host.
                  [x] The levels for the number of transmitted IP bytes on an iSOE host.
                  [x] The levels for the number of received IP bytes on an iSOE host.
                  [x] The levels for the number of transmitted IP fragments on an iSOE host.
                  [x] The levels for the number of received IP fragments on an iSOE host.
                  [x] The levels for the number of IP datagram reassemblies on an iSOE host.
                  [x] The levels for the number of IP invalid address errors on an iSOE host.
                  [x] The levels for the number of IP packet errors on an iSOE host.
                  [x] The levels for the number of IP fragmentation overlaps on an iSOE host.
                  [x] The levels for the number of IP fragmentation out-of-order on an iSOE host.
                  [x] The levels for the number of IP datagram reassembly timeouts on an iSOE host.
                  [x] The levels for the number of transmitted IPv6 packets on an iSOE host.
                  [x] The levels for the number of received IPv6 packets on an iSOE host.
                  [x] The levels for the number of transmitted IPv6 bytes on an iSOE host.
                  [x] The levels for the number of received IPv6 bytes on an iSOE host.
                  [x] The levels for the number of transmitted IPv6 fragments on an iSOE host.
                  [x] The levels for the number of received IPv6 fragments on an iSOE host.
                  [x] The levels for the number of IPv6 datagram reassemblies on an iSOE host.
                  [x] The levels for the number of IPv6 invalid address errors on an iSOE host.
                  [x] The levels for the number of IPv6 packet errors on an iSOE host.
                  [x] The levels for the number of IPv6 fragmentation overlaps on an iSOE host.
                  [x] The levels for the number of IPv6 fragmentation out-of-order on an iSOE host.
                  [x] The levels for the number of IPv6 datagram reassembly timeouts on an iSOE host.
                  [x] The levels for the number of transmitted TCP segments on an iSOE host.
                  [x] The levels for the number of received TCP segments on an iSOE host.
                  [x] The levels for the number of transmitted TCP bytes on an iSOE host.
                  [x] The levels for the number of received TCP bytes on an iSOE host.
                  [x] The levels for the number of duplicate TCP ACK retransmits on an iSOE host.
                  [x] The levels for the number of received TCP retransmit timer expiries on an iSOE host.
                  [x] The levels for the number of received TCP duplicate ACKs on an iSOE host.
                  [x] The levels for the number of received TCP pure ACKs on an iSOE host.
                  [x] The levels for the number of transmitted TCP delayed ACKs on an iSOE host.
                  [x] The levels for the number of transmitted TCP pure ACKs on an iSOE host.
                  [x] The levels for the number of received TCP segment errors on an iSOE host.
                  [x] The levels for the number of received TCP segment out-of-order on an iSOE host.
                  [x] The levels for the number of received TCP window probe on an iSOE host.
                  [x] The levels for the number of received TCP window update on an iSOE host.
                  [x] The levels for the number of transmitted TCP window probe persist on an iSOE host.
                  [x] The levels for the number of transmitted iSCSI PDUs on an iSOE host.
                  [x] The levels for the number of received iSCSI PDUs on an iSOE host.
                  [x] The levels for the number of transmitted iSCSI Bytes on an iSOE host.
                  [x] The levels for the number of received iSCSI Bytes on an iSOE host.
                  [x] The levels for the number of iSCSI I/Os completed on an iSOE host.
                  [x] The levels for the number of iSCSI unexpected I/Os on an iSOE host.
                  [x] The levels for the number of iSCSI format errors on an iSOE host.
                  [x] The levels for the number of iSCSI header digest (CRC) errors on an iSOE host.
                  [x] The levels for the number of iSCSI data digest (CRC) errors on an iSOE host.
                  [x] The levels for the number of iSCSI sequence errors on an iSOE host.
                  [x] The levels for the number of ECC error corrections on an iSOE host.
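Since the agent delivers absolute counters while the levels apply to rates, the check has to turn two consecutive counter readings into a per-second value before comparing it to the levels. The following Python sketch shows one way this might work. It is not the plugin's implementation: the function names are made up, a counter reset is simply mapped to a rate of zero, and a level of zero is treated as "no alerting", which is an assumption about how the default of zero is meant and is not stated in the article.

```python
OK, WARN, CRIT = 0, 1, 2


def counter_rate(previous, current, seconds):
    """Per-second rate between two readings of a monotonic counter."""
    if seconds <= 0 or current < previous:
        # Assumption: no usable rate after a counter reset or zero interval.
        return 0.0
    return (current - previous) / float(seconds)


def check_rate(rate, warn=0.0, crit=0.0):
    """Compare a rate to the levels; a level of zero is assumed to be off."""
    if crit > 0 and rate > crit:
        return CRIT
    if warn > 0 and rate > warn:
        return WARN
    return OK
```

With two readings of 100 and 400 taken 60 seconds apart, `counter_rate(100, 400, 60)` yields 5.0 units per second, which `check_rate(5.0, warn=1, crit=10)` would report as a warning.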
The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_host_stats (iSCSI Host Stats) service checks over two QLogic 8200 Series hardware iSCSI initiators:
This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state – in this example LOGGED_IN – is shown. There are also two iSCSI Host Stats service check items in the example, which are pairs of MAC addresses and iSCSI network interface names. For each of those items the current throughput rate on the MAC, IP/IPv6, TCP and iSCSI protocol layer is shown. The throughput rate on the MAC protocol layer is also visualized in the Perf-O-Meter, received traffic growing from the middle to the left, transmitted traffic growing from the middle to the right.
The following three images show examples of the PNP4Nagios graphs for the open-iscsi_host_stats (iSCSI Host Stats) service check.
The middle graph shows a combined view of the throughput rate for received and transmitted traffic on the different MAC, IP/IPv6, TCP and iSCSI protocol layers. The upper graph shows the throughput rate for various frame types on the MAC protocol layer. The lower graph shows the rate for various error frame types on the MAC protocol layer.
The upper graph shows the throughput rate for received and transmitted traffic on the IP/IPv6 protocol layer. The middle graph shows the rate for various error packet types on the IP/IPv6 protocol layer. The lower graph shows the throughput rate for received and transmitted traffic on the TCP protocol layer.
The first graph shows the rate for various protocol control and error segment types on the TCP protocol layer. The second graph shows the rate of ECC error corrections that occurred on the QLogic 8200 Series hardware iSCSI initiator. The third graph shows the throughput rate for received and transmitted traffic on the iSCSI protocol layer. The fourth and last graph shows the rate for various control and error PDUs on the iSCSI protocol layer.
Open-iSCSI Session Statistics
The check plugin open-iscsi_session_stats is responsible for the monitoring of the statistics on individual iSCSI sessions. Upon inventory this check plugin creates a service check for each pair of MAC address of the network interface and IQN of the iSCSI target volume. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is collected for each inventoried item. If the rate of one of the statistics values is above the configured warning or critical threshold value, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
With the additional WATO plugin open-iscsi_session_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSCSI session statistics levels can be found in the WATO WebUI under:
-> Host & Service Parameters
   -> Parameters for discovered services
      -> Storage, Filesystems and Files
         -> Open-iSCSI Session Statistics
            -> Create Rule in Folder ...
               -> The levels for the Open-iSCSI session statistics values
                  [x] The levels for the number of transmitted bytes in an Open-iSCSI session
                  [x] The levels for the number of received bytes in an Open-iSCSI session
                  [x] The levels for the number of digest (CRC) errors in an Open-iSCSI session
                  [x] The levels for the number of timeout errors in an Open-iSCSI session
                  [x] The levels for the number of transmitted NOP commands in an Open-iSCSI session
                  [x] The levels for the number of received NOP commands in an Open-iSCSI session
                  [x] The levels for the number of transmitted SCSI command requests in an Open-iSCSI session
                  [x] The levels for the number of received SCSI command reponses in an Open-iSCSI session
                  [x] The levels for the number of transmitted task management function commands in an Open-iSCSI session
                  [x] The levels for the number of received task management function responses in an Open-iSCSI session
                  [x] The levels for the number of transmitted login requests in an Open-iSCSI session
                  [x] The levels for the number of transmitted logout requests in an Open-iSCSI session
                  [x] The levels for the number of received logout responses in an Open-iSCSI session
                  [x] The levels for the number of transmitted text PDUs in an Open-iSCSI session
                  [x] The levels for the number of received text PDUs in an Open-iSCSI session
                  [x] The levels for the number of transmitted data PDUs in an Open-iSCSI session
                  [x] The levels for the number of received data PDUs in an Open-iSCSI session
                  [x] The levels for the number of transmitted single negative ACKs in an Open-iSCSI session
                  [x] The levels for the number of received ready to transfer PDUs in an Open-iSCSI session
                  [x] The levels for the number of received reject PDUs in an Open-iSCSI session
                  [x] The levels for the number of received asynchronous messages in an Open-iSCSI session
The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_session_stats (iSCSI Session Stats) service checks over two BCM578xx dependent hardware iSCSI initiators:
This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state, connection_state and internal_state – in this example with the respective values LOGGED_IN, LOGGED_IN and NO_CHANGE – are shown. There is also an equal number of iSCSI Session Stats service check items in the example, which are pairs of MAC addresses of the network interfaces and IQNs of the iSCSI target volumes. For each of those items the current throughput rate of the individual iSCSI session is shown. As long as the rate of the digest (CRC) and timeout error counters is zero, the string no protocol errors is displayed. Otherwise the name and throughput rate of any non-zero error counter is shown. The throughput rate of the iSCSI session is also visualized in the Perf-O-Meter, received traffic growing from the middle to the left, transmitted traffic growing from the middle to the right.
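The rule for the error part of the status line can be sketched in a few lines of Python. This is an illustration only; the function name `error_summary` and the exact output format for non-zero counters are assumptions, while the counter names `digest_err` and `timeout_err` are the ones from the agent output.

```python
def error_summary(rates, error_counters=("digest_err", "timeout_err")):
    """Describe the error counters of one session as a short string.

    `rates` maps counter names to their current rate per second.
    """
    nonzero = [
        "%s: %.2f/s" % (name, rates[name])
        for name in error_counters
        if rates.get(name, 0.0) != 0.0
    ]
    return ", ".join(nonzero) if nonzero else "no protocol errors"
```

A session whose error rates are all zero yields the string `no protocol errors`; any counter with a non-zero rate is listed by name instead.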
The following image shows an example of the three PNP4Nagios graphs for a single open-iscsi_session_stats (iSCSI Session Stats) service check.
The upper graph shows the throughput rate for received and transmitted traffic of the iSCSI session. The middle graph shows the rate for received and transmitted iSCSI PDUs, broken down by the different types of PDUs on the iSCSI protocol layer. The lower graph shows the rate for the digest (CRC) and timeout errors on the iSCSI protocol layer.
The described Check_MK service check to monitor the status of Open-iSCSI sessions, Open-iSCSI session metrics and iSCSI hardware initiator host metrics has been verified to work with version 2.0.874-2~bpo8+1 of the open-iscsi package from the backports repository of Debian stable (Jessie) on the client side and the Check_MK versions 1.2.6 and 1.2.8 on the server side.
I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.