bityard Blog

// In-place Upgrade from RHEL 6 to RHEL 7 with separate /usr Filesystem

Officially, an in-place upgrade from RHEL 6 to RHEL 7 is supported neither within a Hyper-V VM nor with /usr residing on a separate filesystem. The latter issue is described in the Red Hat knowledge base article Why does Red Hat Enterprise Linux 7 in-place upgrade fails if /usr is on separate partition?. With several tweaks in the right places, an in-place upgrade with /usr residing on a separate filesystem is still possible. This article describes the necessary steps.

Disclaimer: This guide comes with no fitness for purpose or guarantee whatsoever! I'm not liable for any damage resulting from following the described procedure. Before starting an upgrade ensure that you understand the steps to be performed and have arranged a service downtime as well as a proper fallback scenario and have a working backup in place!
  1. Create a backup and/or a snapshot of the system to be upgraded.

  2. Follow the procedure outlined in How do I upgrade from Red Hat Enterprise Linux 6 to Red Hat Enterprise Linux 7?.

  3. The preupgrade assistant will fail with at least the following messages:

    root@rhel6:# preupg
    
    The Preupgrade Assistant is a diagnostics tool
    and does not perform the actual upgrade.
    Do you want to continue? [Y/n]
    y
    Gathering logs used by the Preupgrade Assistant:
    
    [...]
    
    |In-place upgrade requirements for the /usr/ directory                        |fail              |
    |Hyper-V                                                                      |fail              |
    --------------------------------------------------------------------------------------------------
    The tarball with results is stored in '/root/preupgrade-results/preupg_results-170731111749.tar.gz' .
    The latest assessment is stored in the '/root/preupgrade' directory.
    Summary information:
    We have found some critical issues. In-place upgrade or migration is not advised.
    Read the file /root/preupgrade/result.html for more details.
    Read the admin report file /root/preupgrade/result-admin.html for more details.
    Please ensure you have backed up your system and/or data
    before doing a system upgrade to prevent loss of data in
    case the upgrade fails and full re-install of the system
    from installation media is needed.
    Upload results to UI by the command:
    e.g. preupg -u http://example.com:8099/submit/ -r /root/preupgrade-results/preupg_results-170731111749.tar.gz .

    Make sure any other issues or conflicts are resolved, and re-run the preupgrade assistant iteratively until all issues except the two shown above are resolved.

  4. Force the start of the upgrade process, ignoring the two remaining issues shown above:

    root@rhel6:# redhat-upgrade-tool -f --network <VERSION> --instrepo <URL>
  5. After redhat-upgrade-tool has finished preparing the upgrade, do not reboot immediately!

  6. Modify the GRUB configuration in /boot/grub/grub.conf first. This is necessary in order to activate the logical volume that contains the /usr filesystem at boot time:

    root@rhel6:# vi /boot/grub/grub.conf

    In the section titled System Upgrade (redhat-upgrade-tool), add a rd.lvm.lv=VG/LV stanza to the kernel line. E.g. for the logical volume lv_usr within the volume group vg00:

    [...]
    title System Upgrade (redhat-upgrade-tool)
            root (hd0,0)
            kernel /vmlinuz-redhat-upgrade-tool [...] rd.lvm.lv=vg00/lv_usr [...]
            initrd /initramfs-redhat-upgrade-tool.img
    [...]
  7. Modify the system init script /etc/rc.d/rc.sysinit in order to prevent filesystem checks at boot time. Otherwise the startup of the upgrade system will fail, because /usr is now already mounted from the boot stage.

    root@rhel6:# vi /etc/rc.d/rc.sysinit

    Comment out the fsck command on line 422 of the init script /etc/rc.d/rc.sysinit, as shown in the following example:

    /etc/rc.d/rc.sysinit
    420         STRING=$"Checking filesystems"
    421         echo $STRING
    422         #fsck -T -t noopts=_netdev -A $fsckoptions
    423         rc=$?
    424

    The line numbers are only shown for illustration purposes. Do not insert the line numbers at the beginning of the lines!

  8. Reboot the system to start the actual upgrade process.

  9. After the upgrade to RHEL 7 has finished, the system might hang on the post-upgrade reboot, because the logical volume containing the /usr filesystem is not available. To circumvent this issue, interrupt the automatic boot of the upgraded system and perform a manual boot with modified GRUB parameters.

    • In the GRUB menu navigate to the entry Red Hat Enterprise Linux Server ([…]) and press E to enter the command line editing mode.

    • Navigate to the kernel /vmlinuz[…] entry and again press E.

    • As above, add a rd.lvm.lv=VG/LV stanza for the logical volume containing the /usr filesystem to the kernel line. E.g. for the logical volume lv_usr within the volume group vg00, add the stanza rd.lvm.lv=vg00/lv_usr.

    • Press RETURN followed by B to boot the RHEL 7 system.

  10. After the RHEL 7 system has successfully booted, don't forget to make the GRUB command line changes from the previous step permanent by adding them to the GRUB configuration /boot/grub/grub.conf:

    root@rhel7:# vi /boot/grub/grub.conf

    For every title section add a rd.lvm.lv=VG/LV stanza to the kernel line. E.g. for the logical volume lv_usr within the volume group vg00:

    [...]
    title Red Hat Enterprise Linux Server 7.3 Rescue 3b8941354040abfa3882f5ca00000039 (3.10.0-514.el7.x86_64)
            [...]
            kernel /vmlinuz-0-rescue-[...] rd.lvm.lv=vg00/lv_usr [...]
            [...]
    title Red Hat Enterprise Linux Server (3.10.0-514.el7.x86_64) 7.3 (Maipo)
            [...]
            kernel /vmlinuz-[...] rd.lvm.lv=vg00/lv_usr [...]
            [...]
    [...]
  11. Finish the procedure outlined in How do I upgrade from Red Hat Enterprise Linux 6 to Red Hat Enterprise Linux 7?.
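Steps 6 and 10 above both involve appending the same kernel parameter to kernel lines in /boot/grub/grub.conf. The following is a minimal sketch, assuming the GRUB legacy layout shown above (each title line is followed by an indented kernel line). The hypothetical Python helpers below add the parameter either to one named section or to all sections, and they list any sections that still lack it. They are not part of redhat-upgrade-tool. Editing the file with vi, as described above, achieves the same result.

```python
from typing import List, Optional


def add_kernel_param(grub_conf: str, param: str, title: Optional[str] = None) -> str:
    """Append `param` to kernel lines of a GRUB legacy grub.conf text.

    With `title` set, only the section whose title matches exactly is changed
    (as in step 6); with `title=None`, every section is changed (as in step 10).
    Kernel lines that already carry the parameter are left untouched.
    """
    out = []
    in_target = title is None
    for line in grub_conf.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("title "):
            # A new boot entry starts here: is it one we want to edit?
            in_target = title is None or stripped[len("title "):] == title
        elif (in_target and stripped.startswith("kernel ")
              and param not in stripped.split()):
            body = line.rstrip("\r\n")
            ending = line[len(body):]  # preserve the original line ending
            line = body + " " + param + ending
        out.append(line)
    return "".join(out)


def sections_missing_param(grub_conf: str, param: str) -> List[str]:
    """Return the titles of sections whose kernel line lacks `param`."""
    missing = []
    title = None
    for raw in grub_conf.splitlines():
        line = raw.strip()
        if line.startswith("title "):
            title = line[len("title "):]
        elif line.startswith("kernel ") and param not in line.split():
            if title not in missing:
                missing.append(title)
    return missing
```

To use the helpers, follow these steps:

1. Read /boot/grub/grub.conf and keep a copy of the original.
2. Pass the content through add_kernel_param(conf, "rd.lvm.lv=vg00/lv_usr", "System Upgrade (redhat-upgrade-tool)") for step 6, or through add_kernel_param(conf, "rd.lvm.lv=vg00/lv_usr") for step 10.
3. Write the result back.
4. Confirm with sections_missing_param() that no section was missed.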

All done, your system should now have been successfully upgraded from RHEL 6 to RHEL 7.
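The rc.sysinit change from step 7 can also be scripted. This minimal Python sketch assumes the fsck call looks exactly like the one in the example in step 7 (fsck -T -t noopts=_netdev -A $fsckoptions). It matches the call by its text rather than by line number. The function name is hypothetical, and the function is meant to be applied on the RHEL 6 system before the reboot in step 8.

```python
import re
from typing import Tuple

# Matches the active filesystem check call in /etc/rc.d/rc.sysinit, e.g.
#     fsck -T -t noopts=_netdev -A $fsckoptions
# Lines that are already commented out do not match.
FSCK_RE = re.compile(r"^(\s*)(fsck\s+-T\s+-t\s+noopts=_netdev\s+-A\b.*)$")


def comment_out_fsck(script: str) -> Tuple[str, int]:
    """Return the script with the fsck call commented out, plus the number of changed lines."""
    out = []
    changed = 0
    for line in script.splitlines(keepends=True):
        body = line.rstrip("\n")
        ending = line[len(body):]
        match = FSCK_RE.match(body)
        if match:
            # Keep the indentation and prefix the command with '#'.
            body = match.group(1) + "#" + match.group(2)
            changed += 1
        out.append(body + ending)
    return "".join(out), changed
```

A change count of exactly 1 means the expected fsck call was found. Any other count is a reason to inspect the file manually before rebooting.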

// Check_MK - Race Condition in Processing of Piggyback Data

In the current versions of Check_MK (namely 1.2.8p24 and 1.4.0p8), which are included in the Open Monitoring Distribution, there are several race conditions in the code responsible for processing the piggyback data sent by the agents. These race conditions are usually triggered when a host is monitored more than once, e.g. by the use of cluster services, or when a manual check on the command line happens to coincide with a scheduled check from the monitoring core. While not critical to the monitoring process as a whole, the effects of these races are intermittent and annoying UNKNOWN - [Errno 2] No such file or directory errors for the monitored host. This article provides patches for both Check_MK versions in order to deal with these spurious error messages.

The normal order of processing of piggybacked data received from agents seems to be as follows:

  1. The Check_MK server calls the agent on the monitored host.

  2. The agent on the monitored host responds with the output of its own host as well as the piggybacked output from other entities.

  3. The Check_MK server processes the agent's response and extracts the piggybacked data.

  4. If there is already data stored from the piggybacked host, it's considered stale and is thus removed.

  5. The piggybacked data is stored in a file named <AGENT_HOSTNAME> in the directory named <PIGGYBACKED_HOSTNAME> under the path ${OMD_ROOT}/var/check_mk/piggyback/ (e.g.: ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/<AGENT_HOSTNAME>).

  6. The Check_MK server lists the contents of the directory ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/. For each file in the list, it processes the data contained in the file.

  7. The Check_MK server removes the files in the directory ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/ and subsequently the directory itself.
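The following minimal Python 3 sketch illustrates the reading side of the directory layout from steps 5 and 6. The function name and the race tolerance are assumptions for illustration, not Check_MK's actual code. Files that vanish between listing and opening, or a directory that vanishes entirely, are skipped instead of raising ENOENT. Temporary files with a leading dot, such as the .new.<AGENT_HOSTNAME> files that appear in the tracebacks of this article, are ignored.

```python
import os
from typing import Dict


def read_piggyback_data(piggyback_root: str, piggybacked_host: str) -> Dict[str, str]:
    """Read all piggyback files stored for `piggybacked_host`.

    Layout: <piggyback_root>/<PIGGYBACKED_HOSTNAME>/<AGENT_HOSTNAME>
    Returns a mapping of agent hostname -> file content.
    """
    host_dir = os.path.join(piggyback_root, piggybacked_host)
    try:
        names = os.listdir(host_dir)
    except FileNotFoundError:
        # The whole directory was removed by a concurrent process.
        return {}
    data = {}
    for name in sorted(names):
        if name.startswith("."):
            # Temporary file (e.g. ".new.<AGENT_HOSTNAME>") still being written.
            continue
        try:
            with open(os.path.join(host_dir, name)) as f:
                data[name] = f.read()
        except FileNotFoundError:
            # Removed by another process between listdir() and open().
            continue
    return data
```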

In case a host is monitored more than once, e.g. through the use of cluster services, steps 4 through 7 above can overlap for two or more concurrent monitoring processes. Such an overlap can cause one process at step 4 or step 7 to delete the file ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/<AGENT_HOSTNAME> or the directory ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/ while another process is still working on it at step 5 or step 6.

The effects of this issue are mainly intermittent and annoying UNKNOWN - [Errno 2] No such file or directory errors in the Check_MK WebUI, as well as errors like the following examples in the log files:

2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> An exception occured while processing host "hostA"
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> Traceback (most recent call last):
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 125, in do_keepalive
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     status = command_function(command_tuple)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 437, in execute_keepalive_command
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     return mode_function(hostname, ipaddress)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1290, in do_check
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     do_all_checks_on_host(hostname, ipaddress, only_check_types)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1506, in do_all_checks_on_host
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     info = get_info_for_check(hostname, ipaddress, infotype)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 319, in get_info_for_check
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     info = apply_parse_function(get_host_info(hostname, ipaddress, section_name, max_cachefile_age, ignore_check_interval), section_name)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 371, in get_host_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     ignore_check_interval=True)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 541, in get_realhost_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     store_persisted_info(hostname, persisted)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 567, in store_persisted_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     os.rename("%s.#new" % file_path, file_path)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> OSError: [Errno 2] No such file or directory
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> An exception occured while processing host "hostA"
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> Traceback (most recent call last):
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 125, in do_keepalive
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     status = command_function(command_tuple)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 437, in execute_keepalive_command
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     return mode_function(hostname, ipaddress)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1241, in do_check
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     do_all_checks_on_host(hostname, ipaddress, only_check_types)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1457, in do_all_checks_on_host
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     info = get_info_for_check(hostname, ipaddress, infotype)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 319, in get_info_for_check
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     info = apply_parse_function(get_host_info(hostname, ipaddress, section_name, max_cachefile_age, ignore_check_interval), section_name)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 371, in get_host_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     ignore_check_interval=True)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 540, in get_realhost_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     store_piggyback_info(hostname, piggybacked)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 651, in store_piggyback_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     os.rename(dir + "/.new." + sourcehost, dir + "/" + sourcehost)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> OSError: [Errno 2] No such file or directory
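The second traceback shows os.rename() failing with ENOENT on the temporary .new.<AGENT_HOSTNAME> file. Presumably a concurrent process removed the file or its directory in the meantime. One possible way to tolerate this is to recreate the directory and retry the atomic write a few times. The Python 3 sketch below shows that approach. It is not the patches mentioned in this article, and the function name and retry count are hypothetical. The Check_MK versions discussed here run on Python 2, so os.replace() and exist_ok would need adapting there.

```python
import os
from typing import Iterable


def store_piggyback_file(piggyback_root: str, piggybacked_host: str,
                         agent_host: str, lines: Iterable[str],
                         retries: int = 3) -> str:
    """Atomically write piggyback data, retrying if the directory vanishes.

    Target path: <piggyback_root>/<piggybacked_host>/<agent_host>
    """
    if retries < 1:
        raise ValueError("retries must be >= 1")
    host_dir = os.path.join(piggyback_root, piggybacked_host)
    target = os.path.join(host_dir, agent_host)
    tmp = os.path.join(host_dir, ".new." + agent_host)
    content = "".join(line + "\n" for line in lines)
    for attempt in range(1, retries + 1):
        try:
            os.makedirs(host_dir, exist_ok=True)
            with open(tmp, "w") as f:
                f.write(content)
            # Atomic on POSIX as long as tmp and target are on one filesystem.
            os.replace(tmp, target)
            return target
        except FileNotFoundError:
            # A concurrent process removed the directory or the temp file;
            # recreate and try again, giving up after the last attempt.
            if attempt == retries:
                raise
    return target  # not reached
```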

In version 1.4.0p8 of Check_MK there is already a partial fix for the described issue in the form of Werk 4755 (Git commit 77b3bfc3). For both Check_MK versions the following two trivial patches are provided in order to deal with the spurious error messages:

In the current development version of Check_MK, probably to be named 1.5, a rather large amount of code has been rewritten and moved around. At first glance I couldn't determine whether the issue still exists there.

// Check_MK Monitoring - Open-iSCSI

The Open-iSCSI project provides a high-performance, transport-independent implementation of RFC 3720 iSCSI for Linux. It allows remote access to SCSI targets via TCP/IP over several different transport technologies. This article introduces a new Check_MK service check to monitor the status of Open-iSCSI sessions, as well as several statistical metrics on Open-iSCSI sessions and iSCSI hardware initiator hosts.

For the impatient and TL;DR here is the Check_MK package of the Open-iSCSI monitoring checks:

Open-iSCSI monitoring checks (Compatible with Check_MK versions 1.2.8 and later)

The sources can be found in my Check_MK repository on GitHub.


The Check_MK service check to monitor Open-iSCSI consists of two major parts: an agent plugin and three check plugins.

The first part, a Check_MK agent plugin named open-iscsi, is a simple Bash shell script. It calls the Open-iSCSI administration tool iscsiadm in order to retrieve a list of currently active iSCSI sessions. The exact call to iscsiadm to retrieve the session list is:

/usr/bin/iscsiadm -m session -P 1

If there are any active iSCSI sessions, the open-iscsi agent plugin also tries to collect several statistics for each iSCSI session. This is done by an additional call to iscsiadm for each iSCSI session ${SID}, as shown in the following example:

/usr/bin/iscsiadm -m session -r ${SID} -s

Unfortunately, the iSCSI session statistics are currently only supported for Open-iSCSI software initiators or dependent hardware iSCSI initiators like the Broadcom BCM577xx or BCM578xx adapters which are covered by the bnx2i kernel module. See Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell and Backporting Open-iSCSI to Debian 8 "Jessie" for additional information on those dependent hardware iSCSI initiators.

For hardware iSCSI initiators, like the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs, which provide a full iSCSI offload engine (iSOE) implementation in the adapter's firmware, there is currently no support for iSCSI session statistics. Instead, the open-iscsi agent plugin collects several global statistics on each iSOE host ${HST} covered by the qla4xxx kernel module, with the command shown in the following example:

/usr/bin/iscsiadm -m host -H ${HST} -C stats

The output of the above commands is parsed and reformatted by the agent plugin for easier processing in the check plugins. The following example shows the agent plugin output for a system with two BCM578xx dependent hardware iSCSI initiators:

<<<open-iscsi_sessions>>>
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:bf:2d eth2 10.0.3.52 LOGGED_IN LOGGED_IN NO_CHANGE
bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:c2:34 eth3 10.0.3.53 LOGGED_IN LOGGED_IN NO_CHANGE

<<<open-iscsi_session_stats>>>
[session stats f8:ca:b8:7d:bf:2d iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 40960
rxdata_octets: 461171313
noptx_pdus: 0
scsicmd_pdus: 153967
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 153967
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 112420
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0

[session stats f8:ca:b8:7d:c2:34 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>]
txdata_octets: 16384
rxdata_octets: 255666052
noptx_pdus: 0
scsicmd_pdus: 84312
tmfcmd_pdus: 0
login_pdus: 0
text_pdus: 0
dataout_pdus: 0
logout_pdus: 0
snack_pdus: 0
noprx_pdus: 0
scsirsp_pdus: 84312
tmfrsp_pdus: 0
textrsp_pdus: 0
datain_pdus: 62418
logoutrsp_pdus: 0
r2t_pdus: 0
async_pdus: 0
rjt_pdus: 0
digest_err: 0
timeout_err: 0

The next example shows the agent plugin output for a system with two QLogic 8200 Series hardware iSCSI initiators:

<<<open-iscsi_sessions>>>
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7d.ipv4.0 none 10.0.3.50 LOGGED_IN Unknown Unknown
qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7e.ipv4.0 none 10.0.3.51 LOGGED_IN Unknown Unknown

<<<open-iscsi_host_stats>>>
[host stats f8:ca:b8:7d:c1:7d iqn.2000-04.com.qlogic:isp8214.000e1e3574ac.4]
mactx_frames: 563454
mactx_bytes: 52389948
mactx_multicast_frames: 877513
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573455
macrx_bytes: 440845678
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508160
iptx_bytes: 29474232
iptx_fragments: 0
iprx_packets: 401785
iprx_bytes: 354673156
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508160
tcptx_bytes: 19310736
tcprx_segments: 401785
tcprx_byte: 346637456
tcp_duplicate_ack_retx: 1
tcp_retx_timer_expired: 1
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106449
tcptx_pure_ack: 106489
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 695915
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401697
iscsi_data_bytes_tx: 29225
iscsi_pdu_rx: 401697
iscsi_data_bytes_rx: 327355963
iscsi_io_completed: 101
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0

[host stats f8:ca:b8:7d:c1:7e iqn.2000-04.com.qlogic:isp8214.000e1e3574ad.5]
mactx_frames: 563608
mactx_bytes: 52411412
mactx_multicast_frames: 877517
mactx_broadcast_frames: 0
mactx_pause_frames: 0
mactx_control_frames: 0
mactx_deferral: 0
mactx_excess_deferral: 0
mactx_late_collision: 0
mactx_abort: 0
mactx_single_collision: 0
mactx_multiple_collision: 0
mactx_collision: 0
mactx_frames_dropped: 0
mactx_jumbo_frames: 0
macrx_frames: 1573572
macrx_bytes: 441630442
macrx_unknown_control_frames: 0
macrx_pause_frames: 0
macrx_control_frames: 0
macrx_dribble: 0
macrx_frame_length_error: 0
macrx_jabber: 0
macrx_carrier_sense_error: 0
macrx_frame_discarded: 0
macrx_frames_dropped: 1755017
mac_crc_error: 0
mac_encoding_error: 0
macrx_length_error_large: 0
macrx_length_error_small: 0
macrx_multicast_frames: 0
macrx_broadcast_frames: 0
iptx_packets: 508310
iptx_bytes: 29490504
iptx_fragments: 0
iprx_packets: 401925
iprx_bytes: 355436636
iprx_fragments: 0
ip_datagram_reassembly: 0
ip_invalid_address_error: 0
ip_error_packets: 0
ip_fragrx_overlap: 0
ip_fragrx_outoforder: 0
ip_datagram_reassembly_timeout: 0
ipv6tx_packets: 0
ipv6tx_bytes: 0
ipv6tx_fragments: 0
ipv6rx_packets: 0
ipv6rx_bytes: 0
ipv6rx_fragments: 0
ipv6_datagram_reassembly: 0
ipv6_invalid_address_error: 0
ipv6_error_packets: 0
ipv6_fragrx_overlap: 0
ipv6_fragrx_outoforder: 0
ipv6_datagram_reassembly_timeout: 0
tcptx_segments: 508310
tcptx_bytes: 19323952
tcprx_segments: 401925
tcprx_byte: 347398136
tcp_duplicate_ack_retx: 2
tcp_retx_timer_expired: 4
tcprx_duplicate_ack: 0
tcprx_pure_ackr: 0
tcptx_delayed_ack: 106466
tcptx_pure_ack: 106543
tcprx_segment_error: 0
tcprx_segment_outoforder: 0
tcprx_window_probe: 0
tcprx_window_update: 696035
tcptx_window_probe_persist: 0
ecc_error_correction: 0
iscsi_pdu_tx: 401787
iscsi_data_bytes_tx: 37970
iscsi_pdu_rx: 401791
iscsi_data_bytes_rx: 328112050
iscsi_io_completed: 127
iscsi_unexpected_io_rx: 0
iscsi_format_error: 0
iscsi_hdr_digest_error: 0
iscsi_data_digest_error: 0
iscsi_sequence_error: 0
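The bracketed statistics blocks above follow a simple name: value layout. The following Python sketch illustrates how a check plugin could consume them. It is a hypothetical parse_stats() function, not the actual parse code of the open-iscsi check plugins. It groups the counters by the kind of block and by the MAC address and IQN from the header line, and it ignores the lines of the <<<open-iscsi_sessions>>> section.

```python
import re
from typing import Dict, Iterable, Tuple

# Header lines such as "[session stats <MAC> <IQN>]" or "[host stats <MAC> <IQN>]".
HEADER_RE = re.compile(r"^\[(session|host) stats (\S+) (\S+)\]$")

StatsKey = Tuple[str, str, str]  # (kind, MAC address, IQN)


def parse_stats(lines: Iterable[str]) -> Dict[StatsKey, Dict[str, int]]:
    """Group 'name: value' counters by the bracketed header preceding them."""
    result: Dict[StatsKey, Dict[str, int]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<<<"):
            current = None  # a new agent section starts; wait for a header
            continue
        match = HEADER_RE.match(line)
        if match:
            current = result.setdefault(match.groups(), {})
            continue
        if current is not None:
            name, sep, value = line.partition(":")
            value = value.strip()
            if sep and value.isdigit():
                current[name.strip()] = int(value)
    return result
```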

Although it is a simple Bash shell script, the agent plugin open-iscsi has several dependencies that need to be installed for it to work properly, namely the commands iscsiadm, sed, tr and egrep. On Debian-based systems, the necessary packages can be installed with the following command:

root@host:~# apt-get install coreutils grep open-iscsi sed
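You can check whether these dependencies are present before deploying the agent plugin. The minimal Python sketch below uses shutil.which() from the standard library. The command list mirrors the commands named above, and iscsiadm is given by the absolute path /usr/bin/iscsiadm that the agent plugin calls. The function name is hypothetical.

```python
import shutil
from typing import Iterable, List

# Commands the open-iscsi agent plugin depends on. iscsiadm is checked by the
# absolute path used in the agent plugin; the others are looked up in $PATH.
REQUIRED_COMMANDS = ("/usr/bin/iscsiadm", "sed", "tr", "egrep")


def missing_commands(commands: Iterable[str] = REQUIRED_COMMANDS) -> List[str]:
    """Return the commands that are not found or not executable."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]
```

An empty list from missing_commands() means all commands were found. Otherwise, install the packages with the apt-get command above.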

The second part of the Check_MK service check for Open-iSCSI provides the necessary check logic through individual inventory and check functions. This is implemented in the three Check_MK check plugins open-iscsi_sessions, open-iscsi_host_stats and open-iscsi_session_stats, which will be discussed separately in the following sections.

Open-iSCSI Session Status

The check plugin open-iscsi_sessions is responsible for monitoring individual iSCSI sessions and their internal session states. Upon inventory, this check plugin creates a service check for each pair of iSCSI network interface name and IQN of the iSCSI target volume. Unlike the iSCSI session ID, which changes over time (e.g. after an iSCSI logout and login), this pair uniquely identifies an iSCSI session on a host. During normal check execution, the list of currently active iSCSI sessions on a host is compared to the list of active iSCSI sessions gathered during inventory on that host. If a session is missing or has an erroneous internal state, an alarm is raised accordingly.
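The following Python sketch illustrates the inventory and comparison logic described above. It is a simplified version, not the plugin's actual code. It assumes the nine whitespace-separated fields shown in the <<<open-iscsi_sessions>>> example output, and it treats the fourth field (e.g. bnx2i.f8:ca:b8:7d:bf:2d) as the iSCSI network interface name. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Item = Tuple[str, str]  # (iSCSI interface name, target IQN)


def parse_sessions(lines: Iterable[str]) -> Dict[Item, List[str]]:
    """Key each <<<open-iscsi_sessions>>> line by (interface, target IQN).

    Assumed field order, taken from the example agent output: transport,
    portal, target IQN, iSCSI interface, network device, IP address,
    session state, connection state, internal state.
    """
    sessions = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip section headers, blank or malformed lines
        sessions[(fields[3], fields[2])] = fields
    return sessions


def inventory_sessions(lines: Iterable[str]) -> Set[Item]:
    """One service item per (interface, target IQN) pair found at inventory time."""
    return set(parse_sessions(lines))


def missing_sessions(inventoried: Set[Item], current_lines: Iterable[str]) -> List[Item]:
    """Items that were inventoried but are no longer among the active sessions."""
    current = parse_sessions(current_lines)
    return sorted(item for item in inventoried if item not in current)
```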

For all types of initiators (software, dependent hardware and hardware), there is the state session_state, which can take on the following values:

ISCSI_STATE_FREE
ISCSI_STATE_LOGGED_IN
ISCSI_STATE_FAILED
ISCSI_STATE_TERMINATE
ISCSI_STATE_IN_RECOVERY
ISCSI_STATE_RECOVERY_FAILED
ISCSI_STATE_LOGGING_OUT

An alarm is raised if the session is in any state other than ISCSI_STATE_LOGGED_IN. For software and dependent hardware initiators, there are two additional states, connection_state and internal_state. The state connection_state can take on the values:

FREE
TRANSPORT WAIT
IN LOGIN
LOGGED IN
IN LOGOUT
LOGOUT REQUESTED
CLEANUP WAIT

and internal_state can take on the values:

NO CHANGE
CLEANUP
REOPEN
REDIRECT

In addition to the above session_state, an alarm is raised if connection_state is in any state other than LOGGED IN, or if internal_state is in any state other than NO CHANGE.
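The state evaluation can be expressed as a small function. The Python sketch below assumes the agent output format shown earlier. In that format, the states appear as the last three fields, with underscores instead of spaces and without the ISCSI_STATE_ prefix (e.g. LOGGED_IN LOGGED_IN NO_CHANGE). Hardware initiators report Unknown for the two additional states. Reporting CRIT for any deviation is an assumption, since the text above only says that an alarm is raised. The function name is hypothetical.

```python
from typing import List, Tuple

OK, CRIT = 0, 2  # Check_MK-style state codes; CRIT as alarm level is an assumption


def check_session_states(fields: List[str]) -> Tuple[int, str]:
    """Evaluate the three state fields of one <<<open-iscsi_sessions>>> line.

    fields[6], fields[7] and fields[8] are assumed to be the session state,
    the connection state and the internal state as printed by the agent.
    """
    session_state, conn_state, internal_state = fields[6:9]
    problems = []
    # session_state applies to all initiator types.
    if session_state != "LOGGED_IN":
        problems.append("session state " + session_state)
    # Hardware initiators (qla4xxx) report "Unknown" here, so accept that value.
    if conn_state not in ("LOGGED_IN", "Unknown"):
        problems.append("connection state " + conn_state)
    if internal_state not in ("NO_CHANGE", "Unknown"):
        problems.append("internal state " + internal_state)
    if problems:
        return CRIT, "; ".join(problems)
    return OK, "all states OK"
```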

No performance data is currently reported by this check.

Open-iSCSI Hosts Statistics

The check plugin open-iscsi_host_stats is responsible for monitoring the global statistics on an iSOE host. Upon inventory, this check plugin creates a service check for each pair of MAC address and iSCSI network interface name. During normal check execution, an extensive list of statistics (see the above example output of the Check_MK agent plugin) is determined for each inventoried item. If the rate of one of the statistics values is above the configured warning or critical threshold, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
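The agent reports cumulative counters, so a rate has to be derived from two consecutive samples before it can be compared against the thresholds. The following Python sketch shows that arithmetic together with a simple warn/crit comparison. The function names are hypothetical. In a real check plugin, Check_MK's own counter handling would persist the previous sample between check runs. Skipping counters that decrease (e.g. after an adapter reset) is a design assumption.

```python
from typing import Dict


def counter_rates(prev: Dict[str, int], curr: Dict[str, int],
                  interval: float) -> Dict[str, float]:
    """Per-second rates of cumulative counters between two samples.

    Counters missing from the previous sample or going backwards are skipped.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    rates = {}
    for name, value in curr.items():
        old = prev.get(name)
        if old is None or value < old:
            continue
        rates[name] = (value - old) / interval
    return rates


def check_levels(rate: float, warn: float, crit: float) -> int:
    """0 = OK, 1 = WARN, 2 = CRIT; a rate strictly above a threshold triggers it."""
    if rate > crit:
        return 2
    if rate > warn:
        return 1
    return 0
```

Under this comparison, the default levels of zero for both thresholds mean that any non-zero rate results in CRIT, while a rate of exactly zero stays OK.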

With the additional WATO plugin open-iscsi_host_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSOE host statistics levels can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Storage, Filesystems and Files
         -> Open-iSCSI Host Statistics
            -> Create Rule in Folder ...
               -> The levels for the Open-iSCSI host statistics values
                  [x] The levels for the number of transmitted MAC/Layer2 frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 bytes on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 bytes on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 multicast frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 multicast frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 broadcast frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 broadcast frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 pause frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 pause frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 control frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 control frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 dropped frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 dropped frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 abort frames on an iSOE host.
                  [x] The levels for the number of transmitted MAC/Layer2 jumbo frames on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 late transmit collisions on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 single transmit collisions on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 multiple transmit collisions on an iSOE host.
                  [x] The levels for the number of MAC/Layer2 collisions on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 control frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 dribble on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 frame length errors on an iSOE host.
                  [x] The levels for the number of discarded received MAC/Layer2 frames on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 jabber on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 carrier sense errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 CRC errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 encoding errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 length too large errors on an iSOE host.
                  [x] The levels for the number of received MAC/Layer2 length too small errors on an iSOE host.
                  [x] The levels for the number of transmitted IP packets on an iSOE host.
                  [x] The levels for the number of received IP packets on an iSOE host.
                  [x] The levels for the number of transmitted IP bytes on an iSOE host.
                  [x] The levels for the number of received IP bytes on an iSOE host.
                  [x] The levels for the number of transmitted IP fragments on an iSOE host.
                  [x] The levels for the number of received IP fragments on an iSOE host.
                  [x] The levels for the number of IP datagram reassemblies on an iSOE host.
                  [x] The levels for the number of IP invalid address errors on an iSOE host.
                  [x] The levels for the number of IP packet errors on an iSOE host.
                  [x] The levels for the number of IP fragmentation overlaps on an iSOE host.
                  [x] The levels for the number of IP fragmentation out-of-order on an iSOE host.
                  [x] The levels for the number of IP datagram reassembly timeouts on an iSOE host.
                  [x] The levels for the number of transmitted IPv6 packets on an iSOE host.
                  [x] The levels for the number of received IPv6 packets on an iSOE host.
                  [x] The levels for the number of transmitted IPv6 bytes on an iSOE host.
                  [x] The levels for the number of received IPv6 bytes on an iSOE host.
                  [x] The levels for the number of transmitted IPv6 fragments on an iSOE host.
                  [x] The levels for the number of received IPv6 fragments on an iSOE host.
                  [x] The levels for the number of IPv6 datagram reassemblies on an iSOE host.
                  [x] The levels for the number of IPv6 invalid address errors on an iSOE host.
                  [x] The levels for the number of IPv6 packet errors on an iSOE host.
                  [x] The levels for the number of IPv6 fragmentation overlaps on an iSOE host.
                  [x] The levels for the number of IPv6 fragmentation out-of-order on an iSOE host.
                  [x] The levels for the number of IPv6 datagram reassembly timeouts on an iSOE host.
                  [x] The levels for the number of transmitted TCP segments on an iSOE host.
                  [x] The levels for the number of received TCP segments on an iSOE host.
                  [x] The levels for the number of transmitted TCP bytes on an iSOE host.
                  [x] The levels for the number of received TCP bytes on an iSOE host.
                  [x] The levels for the number of duplicate TCP ACK retransmits on an iSOE host.
                  [x] The levels for the number of received TCP retransmit timer expiries on an iSOE host.
                  [x] The levels for the number of received TCP duplicate ACKs on an iSOE host.
                  [x] The levels for the number of received TCP pure ACKs on an iSOE host.
                  [x] The levels for the number of transmitted TCP delayed ACKs on an iSOE host.
                  [x] The levels for the number of transmitted TCP pure ACKs on an iSOE host.
                  [x] The levels for the number of received TCP segment errors on an iSOE host.
                  [x] The levels for the number of received TCP segment out-of-order on an iSOE host.
                  [x] The levels for the number of received TCP window probe on an iSOE host.
                  [x] The levels for the number of received TCP window update on an iSOE host.
                  [x] The levels for the number of transmitted TCP window probe persist on an iSOE host.
                  [x] The levels for the number of transmitted iSCSI PDUs on an iSOE host.
                  [x] The levels for the number of received iSCSI PDUs on an iSOE host.
                  [x] The levels for the number of transmitted iSCSI Bytes on an iSOE host.
                  [x] The levels for the number of received iSCSI Bytes on an iSOE host.
                  [x] The levels for the number of iSCSI I/Os completed on an iSOE host.
                  [x] The levels for the number of iSCSI unexpected I/Os on an iSOE host.
                  [x] The levels for the number of iSCSI format errors on an iSOE host.
                  [x] The levels for the number of iSCSI header digest (CRC) errors on an iSOE host.
                  [x] The levels for the number of iSCSI data digest (CRC) errors on an iSOE host.
                  [x] The levels for the number of iSCSI sequence errors on an iSOE host.
                  [x] The levels for the number of ECC error corrections on an iSOE host.

The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_host_stats (iSCSI Host Stats) service checks over two QLogic 8200 Series hardware iSCSI initiators:

Status output example for open-iscsi_sessions and open-iscsi_host_stats service checks over QLogic 8200 Series hardware iSCSI initiators

This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state – in this example LOGGED_IN – is shown. There are also two iSCSI Host Stats service check items in the example, which are pairs of MAC addresses and iSCSI network interface names. For each of those items the current throughput rate on the MAC, IP/IPv6, TCP and iSCSI protocol layer is shown. The throughput rate on the MAC protocol layer is also visualized in the Perf-O-Meter, received traffic growing from the middle to the left, transmitted traffic growing from the middle to the right.

The following three images show examples of the PNP4Nagios graphs for the open-iscsi_host_stats (iSCSI Host Stats) service check.

Example PNP4Nagios graphs for an open-iscsi_host_stats service check (MAC Frames, Traffic, MAC Errors)

The middle graph shows a combined view of the throughput rate for received and transmitted traffic on the different MAC, IP/IPv6, TCP and iSCSI protocol layers. The upper graph shows the throughput rate for various frame types on the MAC protocol layer. The lower graph shows the rate for various error frame types on the MAC protocol layer.

Example PNP4Nagios graphs for an open-iscsi_host_stats service check (IP Packets and Fragments, IP Errors, TCP Segments)

The upper graph shows the throughput rate for received and transmitted traffic on the IP/IPv6 protocol layer. The middle graph shows the rate for various error packet types on the IP/IPv6 protocol layer. The lower graph shows the throughput rate for received and transmitted traffic on the TCP protocol layer.

Example PNP4Nagios graphs for an open-iscsi_host_stats service check (TCP Errors, ECC Error Correction, iSCSI PDUs, iSCSI Errors)

The first graph shows the rate for various protocol control and error segment types on the TCP protocol layer. The second graph shows the rate of ECC error corrections that occurred on the QLogic 8200 Series hardware iSCSI initiator. The third graph shows the throughput rate for received and transmitted traffic on the iSCSI protocol layer. The fourth and last graph shows the rate for various control and error PDUs on the iSCSI protocol layer.

Open-iSCSI Session Statistics

The check plugin open-iscsi_session_stats is responsible for the monitoring of the statistics on individual iSCSI sessions. Upon inventory this check plugin creates a service check for each pair of MAC address of the network interface and IQN of the iSCSI target volume. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is collected for each inventorized item. If the rate of one of the statistics values is above the configured warning and critical threshold values, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
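The "rate above threshold" logic can be illustrated with a minimal Bash sketch. It derives a per-second rate from two samples of an increasing counter and compares it with warning and critical levels. The function name counter_rate_state, its argument order and the strict "greater than" comparison are assumptions of this sketch; how the actual check plugin treats the default levels of zero is not stated here, so the sketch should not be read as the plugin's behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the open-iscsi_session_stats plugin:
# derive a per-second rate from two samples of an increasing counter and
# compare it with WARN/CRIT levels using a strict "greater than".
# Usage: counter_rate_state PREV_VALUE PREV_TIME CUR_VALUE CUR_TIME WARN CRIT
counter_rate_state() {
    awk -v pv="$1" -v pt="$2" -v cv="$3" -v ct="$4" -v w="$5" -v c="$6" 'BEGIN {
        pv += 0; pt += 0; cv += 0; ct += 0; w += 0; c += 0   # force numeric values
        dt = ct - pt
        if (dt <= 0) { print "UNKNOWN no time elapsed"; exit }
        if (cv < pv) { print "UNKNOWN counter reset"; exit }
        rate = (cv - pv) / dt
        if (rate > c)      state = "CRIT"
        else if (rate > w) state = "WARN"
        else               state = "OK"
        printf "%s rate=%.2f/s\n", state, rate
    }'
}
```

For example, counter_rate_state 0 0 90 60 1 2 reports a rate of 1.5 units per second, which lies above the warning level of 1 but not above the critical level of 2, and therefore prints WARN rate=1.50/s.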

With the additional WATO plugin open-iscsi_session_stats.py it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSCSI session statistics levels can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Storage, Filesystems and Files
         -> Open-iSCSI Session Statistics
            -> Create Rule in Folder ...
               -> The levels for the Open-iSCSI session statistics values
                  [x] The levels for the number of transmitted bytes in an Open-iSCSI session
                  [x] The levels for the number of received bytes in an Open-iSCSI session
                  [x] The levels for the number of digest (CRC) errors in an Open-iSCSI session
                  [x] The levels for the number of timeout errors in an Open-iSCSI session
                  [x] The levels for the number of transmitted NOP commands in an Open-iSCSI session
                  [x] The levels for the number of received NOP commands in an Open-iSCSI session
                  [x] The levels for the number of transmitted SCSI command requests in an Open-iSCSI session
                  [x] The levels for the number of received SCSI command responses in an Open-iSCSI session
                  [x] The levels for the number of transmitted task management function commands in an Open-iSCSI session
                  [x] The levels for the number of received task management function responses in an Open-iSCSI session
                  [x] The levels for the number of transmitted login requests in an Open-iSCSI session
                  [x] The levels for the number of transmitted logout requests in an Open-iSCSI session
                  [x] The levels for the number of received logout responses in an Open-iSCSI session
                  [x] The levels for the number of transmitted text PDUs in an Open-iSCSI session
                  [x] The levels for the number of received text PDUs in an Open-iSCSI session
                  [x] The levels for the number of transmitted data PDUs in an Open-iSCSI session
                  [x] The levels for the number of received data PDUs in an Open-iSCSI session
                  [x] The levels for the number of transmitted single negative ACKs in an Open-iSCSI session
                  [x] The levels for the number of received ready to transfer PDUs in an Open-iSCSI session
                  [x] The levels for the number of received reject PDUs in an Open-iSCSI session
                  [x] The levels for the number of received asynchronous messages in an Open-iSCSI session

The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions (iSCSI Session Status) and open-iscsi_session_stats (iSCSI Session Stats) service checks over two BCM578xx dependent hardware iSCSI initiators:

Status output example for open-iscsi_sessions and open-iscsi_session_stats service checks over BCM578xx dependent hardware iSCSI initiators

This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state, connection_state and internal_state – in this example with the respective values LOGGED_IN, LOGGED_IN and NO_CHANGE – are shown. There is also an equivalent number of iSCSI Session Stats service check items in the example, which are also pairs of MAC addresses of the network interfaces and IQNs of the iSCSI target volumes. For each of those items the current throughput rate of the individual iSCSI session is shown. As long as the rate of the digest (CRC) and timeout error counters is zero, the string "no protocol errors" is displayed. Otherwise the name and rate of every non-zero error counter is shown. The throughput rate of the iSCSI session is also visualized in the Perf-O-Meter, with received traffic growing from the middle to the left and transmitted traffic growing from the middle to the right.

The following image shows an example of the three PNP4Nagios graphs for a single open-iscsi_session_stats (iSCSI Session Stats) service check.

Example PNP4Nagios graphs for a single open-iscsi_session_stats service check

The upper graph shows the throughput rate for received and transmitted traffic of the iSCSI session. The middle graph shows the rate for received and transmitted iSCSI PDUs, broken down by the different types of PDUs on the iSCSI protocol layer. The lower graph shows the rate for the digest (CRC) and timeout errors on the iSCSI protocol layer.

The described Check_MK service check to monitor the status of Open-iSCSI sessions, Open-iSCSI session metrics and iSCSI hardware initiator host metrics has been verified to work with version 2.0.874-2~bpo8+1 of the open-iscsi package from the backports repository of Debian stable (Jessie) on the client side and the Check_MK versions 1.2.6 and 1.2.8 on the server side.

I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.

// Check_MK Monitoring - XFS Filesystem Quotas

XFS, like most other modern filesystems, offers the ability to configure and use disk quotas. Usually those quotas limit the amount of allocatable disk space or the number of filesystem objects. In most filesystems those limits are bound to users or groups of users. XFS is one of the few filesystems that offer quota limits on a per-directory basis. In the case of XFS, directory quotas are implemented via projects. This article introduces a new Check_MK service check to monitor the use of XFS filesystem quotas, with a strong focus on directory-based quotas.

For the impatient and TL;DR here is the Check_MK package of the XFS filesystem quota monitoring checks:

XFS filesystem quota monitoring checks (Compatible with Check_MK versions 1.2.6 and earlier)
XFS filesystem quota monitoring checks (Compatible with Check_MK versions 1.2.8 and later)

The sources can be found in my Check_MK repository on GitHub.


The Check_MK service check to monitor XFS filesystem quotas consists of two major parts, an agent plugin and a check plugin.

The Check_MK agent plugin named xfs_quota is a simple Bash shell script. It calls the XFS quota administration tool xfs_quota in order to retrieve a report of the current quota usage. The exact call to xfs_quota is:

/usr/sbin/xfs_quota -x -c 'report -p -b -i -a'

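which currently reports only the block (-b) and inode (-i) quotas of projects (-p), i.e. directories, on all available (-a) XFS filesystems. The -x option enables the expert mode of xfs_quota, which is needed for administrator commands such as report. An example output of the above command is: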

Project quota on /srv/xfs (/dev/mapper/vg00-xfs)
                               Blocks                                          Inodes                     
Project ID       Used       Soft       Hard    Warn/Grace           Used       Soft       Hard    Warn/ Grace     
---------- -------------------------------------------------- -------------------------------------------------- 
test1               0          0       1024     00 [--------]          1          0          0     00 [--------]
test2               0          0       2048     00 [--------]          1          0          0     00 [--------]
test3               0          0       3072     00 [--------]          1          0          0     00 [--------]

The output of the above command is parsed and reformatted by the agent plugin for easier processing in the check plugin. The above example output is thus transformed into the agent plugin output shown in the following example:

<<<xfs_quota>>>
/srv/xfs:/dev/mapper/vg00-xfs:test1:0:0:1024:1:0:0
/srv/xfs:/dev/mapper/vg00-xfs:test2:0:0:2048:1:0:0
/srv/xfs:/dev/mapper/vg00-xfs:test3:0:0:3072:1:0:0
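The transformation can be sketched in Bash as follows. This is not the author's agent plugin, only an illustration that assumes the report layout shown above: a "Project quota on MOUNTPOINT (DEVICE)" line per filesystem, two header lines, a dashed separator, and grace period columns enclosed in square brackets. The function name xfs_quota_to_agent is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical function, not the published agent plugin):
# read "xfs_quota -x -c 'report -p -b -i -a'" output on stdin and print
# mountpoint:device:project:bused:bsoft:bhard:iused:isoft:ihard lines.
xfs_quota_to_agent() {
    local line rest mnt="" dev=""
    local id bused bsoft bhard bwarn iused isoft ihard iwarn
    while IFS= read -r line; do
        # Skip blank lines between filesystems
        [[ -z ${line//[[:space:]]/} ]] && continue
        case $line in
            "Project quota on "*)
                # e.g. "Project quota on /srv/xfs (/dev/mapper/vg00-xfs)"
                rest=${line#Project quota on }
                mnt=${rest%% \(*}
                dev=${rest##* \(}
                dev=${dev%\)}
                ;;
            *Blocks*|"Project ID"*|-*)
                ;;  # column headers and the dashed separator line
            *)
                # Drop the "[...]" grace period fields, then split on whitespace;
                # bwarn and iwarn hold the warning counts, which are not printed
                read -r id bused bsoft bhard bwarn iused isoft ihard iwarn \
                    <<< "$(sed 's/\[[^]]*\]//g' <<< "$line")"
                echo "${mnt}:${dev}:${id}:${bused}:${bsoft}:${bhard}:${iused}:${isoft}:${ihard}"
                ;;
        esac
    done
}
```

A possible invocation on a monitored host, run as root, would be: echo '<<<xfs_quota>>>'; /usr/sbin/xfs_quota -x -c 'report -p -b -i -a' | xfs_quota_to_agent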

Although a simple Bash shell script, the agent plugin xfs_quota has several dependencies which need to be installed in order for the agent plugin to work properly. Namely those are the commands xfs_quota, sed and egrep. On Debian based systems, the necessary packages can be installed with the following command:

root@host:~# apt-get install xfsprogs sed grep
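To check beforehand whether the required commands are available on a monitored host, a small Bash helper like the following can be used. The name missing_commands is hypothetical and not part of the agent plugin. Note that command -v searches the current PATH, and xfs_quota lives in /usr/sbin, which may not be in an unprivileged user's PATH.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper): print every given command name
# that cannot be found in the current PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

Running missing_commands xfs_quota sed egrep as root prints nothing if all dependencies are in place. Otherwise it prints the missing names, which map to the Debian packages xfsprogs, sed and grep respectively.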

The second part, a Check_MK check plugin also named xfs_quota, provides the necessary inventory and check functions. Upon inventory it creates a service check for each pair of XFS filesystem mountpoint and quota project ID. During normal check execution, the number of used XFS filesystem blocks and inodes is determined for each inventorized item (pair of XFS filesystem mountpoint and quota project ID). If the hard and soft block and inode quotas for a particular item are all set to zero, no further checks are carried out and only performance data is reported by the check. If any of the hard or soft block or inode quotas is set to a non-zero value, the number of remaining free XFS filesystem blocks or inodes is compared to the warning and critical threshold values and an alarm is raised accordingly.
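The check logic described above can be restated as a Bash sketch operating on one line of agent output. The actual check plugin runs inside Check_MK and is not reproduced here. The function name xfs_quota_state is hypothetical, and two details are assumptions of this sketch: the same warning and critical levels apply to all four quota types, and an alarm is raised once the number of free blocks or inodes drops to or below a level.

```shell
#!/usr/bin/env bash
# Sketch only: a Bash restatement of the described logic, not the author's
# Check_MK check plugin.
# Arguments: one agent line (mnt:dev:project:bused:bsoft:bhard:iused:isoft:ihard),
# WARN and CRIT levels for the number of remaining free blocks/inodes.
# Assumption: an alarm is raised when the free count is <= the level.
xfs_quota_state() {
    local line=$1 warn=$2 crit=$3
    local mnt dev id bused bsoft bhard iused isoft ihard
    local state=0 pair used limit free
    IFS=: read -r mnt dev id bused bsoft bhard iused isoft ihard <<< "$line"
    for pair in "$bused $bhard" "$bused $bsoft" "$iused $ihard" "$iused $isoft"; do
        read -r used limit <<< "$pair"
        # A limit of zero means no quota of this type is set, so skip it
        (( limit == 0 )) && continue
        free=$(( limit - used ))
        if (( free <= crit )); then
            state=2
        elif (( free <= warn && state < 1 )); then
            state=1
        fi
    done
    case $state in
        0) echo OK ;;
        1) echo WARN ;;
        2) echo CRIT ;;
    esac
}
```

With the agent line for test1 from the example above and the default levels of zero, xfs_quota_state prints OK, since 1024 blocks of the hard block quota are still free and all other quotas are unset.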

With the additional WATO plugin it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for blocks_hard and blocks_soft are zero (0) free blocks for both warning and critical thresholds. The default values for inodes_hard and inodes_soft are zero (0) free inodes for both warning and critical thresholds. The configuration options for the free block or inode levels can be found under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Storage, Filesystems and Files
         -> XFS Quota Utilization
            -> Create Rule in Folder ...
               -> The levels for the soft/hard block/inode quotas on XFS filesystems
                  [x] The levels for the hard block quotas on XFS filesystems
                  [x] The levels for the soft block quotas on XFS filesystems
                  [x] The levels for the hard inode quotas on XFS filesystems
                  [x] The levels for the soft inode quotas on XFS filesystems 

The following image shows a status output example for several xfs_quota service checks from the WATO WebUI:

Status output example for xfs_quota service checks

This example shows several service check items, which again are pairs of XFS filesystem mountpoints (here: /backup) and anonymized quota project IDs. For each item the number of blocks and inodes used is shown along with the corresponding hard and soft quota values. The number of blocks and inodes used is also visualized in the Perf-O-Meter, with blocks on a logarithmic scale growing from the middle to the left and inodes on a logarithmic scale growing from the middle to the right.

The following image shows an example of the two PNP4Nagios graphs for a single service check:

Example PNP4Nagios graph for a single xfs_quota service check

The upper graph shows the number of used XFS filesystem blocks for a pair of XFS filesystem mountpoint (here: /backup) and a - again anonymized - quota project ID. The lower graph shows the number of used inodes for the same pair. Both graphs show the warning and critical threshold values, which in this example are at their default value of zero. If configured - like in the upper graph of the example - block or inode quotas are also shown as blue horizontal lines in the respective graphs.

The described Check_MK service check to monitor XFS filesystem quotas has been verified to work with version 3.2.1 of the xfsprogs package on Debian stable (Jessie) on the client side and the Check_MK versions 1.2.6 and 1.2.8 on the server side.

I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.

// Dell iDRAC and Limitation in Certificate Signing Request Field Lengths

Current versions of the Dell iDRAC have limitations with regard to the field length for the SSL CSR. These limitations violate RFC 5280 and lead to an invalid CSR and possibly an invalid SSL certificate.


The Dell iDRAC protects browser-based access via HTTPS, which relies on SSL certificates. By default, an SSL certificate signed by the Dell CA is installed on the iDRAC. For additional security, this default certificate can be replaced with one provided by the systems administrator. There are two ways to create your own SSL certificate for the Dell iDRAC.

The first one is to use the iDRAC's integrated facility to create an SSL CSR. The appropriate form can be found in the iDRAC WebUI under:

iDRAC Settings → Network → SSL → Generate Certificate Signing Request (CSR) → Next

After filling in the fields Common Name, Organization Name, Organization Unit, Locality, State Name, Country Code and Email with the appropriate values and clicking on Generate, a CSR is generated in the background and offered for download after a short time. This CSR can then be fed into a CA – your own or one operated by a service provider – in order to generate the actual certificate. The generated certificate is then uploaded to the iDRAC via the WebUI.

So far, so easy, and done countless times without any issues. Until the day one of the values of the aforementioned fields Common Name, Organization Name, Organization Unit, Locality, State Name, Country Code or Email for the CSR changes to a significantly longer string. In my case the value of Organization Name changed from 11 characters to 60 characters. Unfortunately, it appears that the developers of the iDRAC didn't expect values of that length, although they fully conform to the relevant standard, RFC 5280, and are well within the upper bounds specified there:

[...]
--  specifications of Upper Bounds MUST be regarded as mandatory
--  from Annex B of ITU-T X.411 Reference Definition of MTS Parameter
--  Upper Bounds

-- Upper Bounds
ub-name INTEGER ::= 32768
ub-common-name INTEGER ::= 64
ub-locality-name INTEGER ::= 128
ub-state-name INTEGER ::= 128
ub-organization-name INTEGER ::= 64
ub-organizational-unit-name INTEGER ::= 64
[...]
ub-emailaddress-length INTEGER ::= 255
ub-common-name-length INTEGER ::= 64
ub-country-name-alpha-length INTEGER ::= 2
[...]

-- Note - upper bounds on string types, such as TeletexString, are
-- measured in characters.  Excepting PrintableString or IA5String, a
-- significantly greater number of octets will be required to hold
-- such a value.  As a minimum, 16 octets, or twice the specified
-- upper bound, whichever is the larger, should be allowed for
-- TeletexString.  For UTF8String or UniversalString at least four
-- times the upper bound should be allowed.
[...]

Even worse, the iDRAC doesn't issue an error or a warning if it's given an exceedingly long value. On the contrary, it silently accepts the longer value and just cuts off everything after the 48th or 51st character.
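Since the iDRAC gives no feedback, it can help to check the intended field values against the RFC 5280 upper bounds quoted above before a CSR is generated. The following Bash sketch takes a subject string in the /C=../ST=../O=../CN=.. syntax accepted by the -subj option of openssl req. The function name check_subj is hypothetical, the sketch assumes that no field value contains "/" or "=", and it requires Bash 4 for the associative array. Passing this check only means the values are standard-conformant; affected iDRACs truncate values well below these bounds.

```shell
#!/usr/bin/env bash
# Sketch only: check the fields of an openssl-style subject string against
# the RFC 5280 upper bounds (in characters). Returns 1 if any field is too long.
# Assumes no field value contains "/" or "=". Requires Bash 4.
declare -A UB=( [C]=2 [ST]=128 [L]=128 [O]=64 [OU]=64 [CN]=64 [emailAddress]=255 )

check_subj() {
    local subj=$1 rc=0 field name value bound
    local -a fields
    IFS=/ read -r -a fields <<< "${subj#/}"
    for field in "${fields[@]}"; do
        name=${field%%=*}
        value=${field#*=}
        bound=${UB[$name]:-}
        if [[ -z $bound ]]; then
            echo "$name: no upper bound known to this sketch"
        elif (( ${#value} > bound )); then
            echo "$name: ${#value} > $bound characters (too long)"
            rc=1
        else
            echo "$name: ${#value} <= $bound characters"
        fi
    done
    return $rc
}
```

An Organization Name of 60 characters, as in the case described above, passes with "O: 60 <= 64 characters", while a three-letter country code such as DEU is reported as too long.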

I've checked various versions of the Dell iDRAC and encountered the issue on at least the following Dell PowerEdge hardware and iDRAC software versions:

Host   Model  iDRAC Generation  iDRAC Firmware Version
host1  R815   6                 1.95 (Build 05)
host2  R815   6                 2.85 (Build 04)
host3  M915   6                 3.75 (Build 5)
host4  M620   7                 2.21.21.21
host5  M630   8                 2.30.30.30
host6  M630   8                 2.41.40.40

Here are some examples of the Subject: from the CSRs generated on different Dell PowerEdge hardware and iDRAC software versions:

user@host:~$ for FL in *.csr; do echo -n "${FL}: "; openssl req -text -in ${FL} -noout | grep "Subject: "; done
host1.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host2.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host3.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, FQDN/emailAddress=EMAIL
host4.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host5.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host6.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL

The values in the different fields have been anonymized with the characters of the field name, but have otherwise been preserved in the length they appeared in the CSR. The observed behaviour is mostly consistent, with only one exception where the field value is strangely cut off after the 51st character.

Needless to say, cropping the field values like this results in an invalid CSR which in turn leads to issues in the certificate generation process at the CA.

After I raised a support case with Dell, the problem was verified and acknowledged as a defect. A future iDRAC update is supposed to fix this issue. Unfortunately, I have no information on which version that will be, nor when it will be released.

In the meantime, until there is a proper fix for this issue, the second way can be used: manually creating all three components – your own SSL private key, your own CSR and a certificate – for the Dell iDRAC. Unfortunately, this involves significantly more steps, which are described below.

  • If necessary install some additional software packages. For Debian based systems:

    root@host:~# apt-get install openssl wsmancli rpl
  • In the following steps and commands, replace <FQDN> and <IP> with the FQDN and IP address of the Dell iDRAC where necessary.

  • Create an SSL private key:

    user@host:~$ openssl genrsa -out <FQDN>.key.pem 4096
  • Generate a CSR for the FQDN and IP address of the Dell iDRAC. Replace <C>, <ST>, <L>, <O>, <OU>, and <EMAIL> with the country, state, locality, organization, organizational-unit and a valid email address respectively.

    user@host:~$ openssl req -new -sha256 -nodes -key <FQDN>.key.pem -out <FQDN>.csr.pem -subj '/C=<C>/ST=<ST>/L=<L>/O=<O>/OU=<OU>/CN=<FQDN>/emailAddress=<EMAIL>' -config <(
    cat <<-EOF
    [req]
    req_extensions = req_ext
    distinguished_name = dn
    [ dn ]
    [ req_ext ]
    subjectAltName = @alt_names
    [alt_names]
    DNS.1 = <FQDN>
    IP.1 = <IP>
    EOF
    )

    Additional DNS.X = <FQDN> or IP.X = <IP> lines can be added if necessary. Increment the value of X accordingly.

  • With the generated CSR, request a certificate from your own CA or the one of your service provider. The range of different CAs and organizational processes involved is just too broad to cover this step in greater detail.

  • Merge the received certificate and the private key which was generated in one of the previous steps:

    user@host:~$ cat <FQDN>.cert.pem <FQDN>.key.pem > <FQDN>.certkey.pem
  • Convert the combined certificate and the private key into the PKCS #12 format:

    user@host:~$ openssl pkcs12 -export -in <FQDN>.certkey.pem -out <FQDN>.certkey.pfx

    This command prompts for an export password, which will be needed again later on.

  • Convert the PKCS #12 file into a base64-encoded file:

    user@host:~$ openssl base64 -in <FQDN>.certkey.pfx -out <FQDN>.certkey.p12
  • Prepare the base64-encoded PKCS #12 file for the upload to the Dell iDRAC over its SOAP interface:

    user@host:~$ echo '<p:SetCertificateAndPrivateKey_INPUT xmlns:p="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_LCService"><p:Type>server</p:Type><p:PKCS12>' > <FQDN>.certkey.xml
    user@host:~$ cat <FQDN>.certkey.p12 >> <FQDN>.certkey.xml
    user@host:~$ echo "</p:PKCS12><p:PKCS12pin>PASSWORD_AT_PKCS12_EXPORT</p:PKCS12pin></p:SetCertificateAndPrivateKey_INPUT>" >> <FQDN>.certkey.xml

    Replace the string PASSWORD_AT_PKCS12_EXPORT with the password that was entered when the certificate was exported into the PKCS #12 format. In the following command, replace <PKCS12_PASSWORD> with that password:

    user@host:~$ rpl PASSWORD_AT_PKCS12_EXPORT <PKCS12_PASSWORD> <FQDN>.certkey.xml
  • Upload the base64-encoded PKCS #12 file to the Dell iDRAC over its SOAP interface. Replace <IP> with the IP address of the Dell iDRAC and <USER> and <PASSWORD> with the appropriate credentials of a user allowed to access the Dell iDRAC:

    user@host:~$ wsman invoke -a SetCertificateAndPrivateKey http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_LCService?SystemCreationClassName=DCIM_ComputerSystem,CreationClassName=DCIM_LCService,SystemName=DCIM:ComputerSystem,Name=DCIM:LCService -h <IP> -P 443 -u <USER> -p <PASSWORD> -c dummy.cert -y basic -V -v -J <FQDN>.certkey.xml -j utf-8

The last two steps are optional and only necessary if you don't want to or cannot install the Dell OpenManage DRAC Tools, which include the racadm command, or if the remote access via racadm is not or cannot be enabled.
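As an alternative to the echo and rpl sequence in the XML preparation step, the SOAP payload can be assembled in a single step. The following Bash sketch writes the same SetCertificateAndPrivateKey_INPUT document to standard output. The function name build_certkey_xml is hypothetical. Using printf with a single-quoted format string keeps the double quotes around the namespace URI intact, and the sketch escapes &, < and > in the PKCS #12 password so the resulting XML stays well-formed.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper): build the SetCertificateAndPrivateKey_INPUT
# payload for the Dell iDRAC SOAP interface.
# Arguments: base64-encoded PKCS #12 file, PKCS #12 export password.
build_certkey_xml() {
    local p12_file=$1 pin=$2
    local ns='http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_LCService'
    # Escape the XML special characters in the password ("&" must come first)
    pin=$(printf '%s' "$pin" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
    printf '<p:SetCertificateAndPrivateKey_INPUT xmlns:p="%s"><p:Type>server</p:Type><p:PKCS12>' "$ns"
    cat "$p12_file"
    printf '</p:PKCS12><p:PKCS12pin>%s</p:PKCS12pin></p:SetCertificateAndPrivateKey_INPUT>\n' "$pin"
}
```

A possible invocation is build_certkey_xml <FQDN>.certkey.p12 '<PKCS12_PASSWORD>' > <FQDN>.certkey.xml. Keep in mind that a password given on the command line may end up in the shell history.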

This workaround can also be applied generally and independently of the mentioned bug in the Dell iDRAC in order to create a valid SSL certificate for Dell iDRAC and Dell Blade CMC systems.
