2017-08-05 // In-place Upgrade from RHEL 6 to RHEL 7 with separate /usr Filesystem
Officially, there is no support for an in-place upgrade from RHEL 6 to RHEL 7 within a Hyper-V VM and with /usr residing on a separate filesystem. The latter issue is described in the Red Hat knowledge base article Why does Red Hat Enterprise Linux 7 in-place upgrade fails if /usr is on separate partition?. With several tweaks in the right places, an in-place upgrade with /usr residing on a separate filesystem is still possible. This article describes the necessary steps.
Create a backup and/or a snapshot of the system to be upgraded.
Follow the procedure outlined in How do I upgrade from Red Hat Enterprise Linux 6 to Red Hat Enterprise Linux 7?.
The preupgrade assistant will fail with at least the following messages:
root@rhel6:# preupg
The Preupgrade Assistant is a diagnostics tool and does not perform the actual upgrade. Do you want to continue? [Y/n] y
Gathering logs used by the Preupgrade Assistant:
[...]
|In-place upgrade requirements for the /usr/ directory |fail |
|Hyper-V |fail |
--------------------------------------------------------------------------------------------------
The tarball with results is stored in '/root/preupgrade-results/preupg_results-170731111749.tar.gz' .
The latest assessment is stored in the '/root/preupgrade' directory.
Summary information:
We have found some critical issues. In-place upgrade or migration is not advised.
Read the file /root/preupgrade/result.html for more details.
Read the admin report file /root/preupgrade/result-admin.html for more details.
Please ensure you have backed up your system and/or data before doing a system upgrade to prevent loss of data in case the upgrade fails and full re-install of the system from installation media is needed.
Upload results to UI by the command:
e.g. preupg -u http://example.com:8099/submit/ -r /root/preupgrade-results/preupg_results-170731111749.tar.gz .
Make sure any other issues or conflicts are resolved and re-run the preupgrade assistant iteratively until all but the two issues shown above are resolved.
Force-start the upgrade process by ignoring the two remaining issues shown above:
root@rhel6:# redhat-upgrade-tool -f --network <VERSION> --instrepo <URL>
After the upgrade is finished do not reboot immediately!
Modify the GRUB configuration in /boot/grub/grub.conf first. This is necessary in order to activate the logical volume that contains the /usr filesystem at boot time:

root@rhel6:# vi /boot/grub/grub.conf
In the section title System Upgrade (redhat-upgrade-tool), add a rd.lvm.lv=VG/LV stanza to the kernel line. E.g. for the logical volume lv_usr within the volume group vg00:

[...]
title System Upgrade (redhat-upgrade-tool)
        root (hd0,0)
        kernel /vmlinuz-redhat-upgrade-tool [...] rd.lvm.lv=vg00/lv_usr [...]
        initrd /initramfs-redhat-upgrade-tool.img
[...]
Modify the system init script /etc/rc.d/rc.sysinit in order to prevent filesystem checks at boot time. Otherwise the startup of the upgrade system will fail, because /usr is now already mounted from the boot stage.

root@rhel6:# vi /etc/rc.d/rc.sysinit
Comment out the fsck command in line 422 of the init script /etc/rc.d/rc.sysinit, as shown in the following example:

- /etc/rc.d/rc.sysinit

420 STRING=$"Checking filesystems"
421 echo $STRING
422 #fsck -T -t noopts=_netdev -A $fsckoptions
423 rc=$?
424
The line numbers are shown only for illustration purposes. Do not insert them at the beginning of the lines!
Reboot the system to start the actual upgrade process.
After the upgrade to RHEL 7 has finished, the system might hang on the post-upgrade reboot because the logical volume containing the /usr filesystem is not available. In order to circumvent this issue, stop the automatic boot of the upgraded system and perform a manual boot with modified GRUB parameters:
In the GRUB menu navigate to the entry Red Hat Enterprise Linux Server ([…]) and press E to enter the command line editing mode.
Navigate to the kernel /vmlinuz[…] entry and again press E.
Like above, add a rd.lvm.lv=VG/LV stanza for the logical volume containing the /usr filesystem to the kernel line. E.g. for the logical volume lv_usr within the volume group vg00, add the stanza rd.lvm.lv=vg00/lv_usr.
Press RETURN followed by B to start the boot of the RHEL 7 system.
After the RHEL 7 system has successfully booted, don't forget to permanently add the changes to the GRUB command line from the previous step to the GRUB configuration /boot/grub/grub.conf:

root@rhel7:# vi /boot/grub/grub.conf

For every title section add a rd.lvm.lv=VG/LV stanza to the kernel line. E.g. for the logical volume lv_usr within the volume group vg00:

[...]
title Red Hat Enterprise Linux Server 7.3 Rescue 3b8941354040abfa3882f5ca00000039 (3.10.0-514.el7.x86_64)
        [...]
        kernel /vmlinuz-0-rescue-[...] rd.lvm.lv=vg00/lv_usr [...]
        [...]
title Red Hat Enterprise Linux Server (3.10.0-514.el7.x86_64) 7.3 (Maipo)
        [...]
        kernel /vmlinuz-[...] rd.lvm.lv=vg00/lv_usr [...]
        [...]
[...]
Finish the procedure outlined in How do I upgrade from Red Hat Enterprise Linux 6 to Red Hat Enterprise Linux 7?.
All done, your system should now have been successfully upgraded from RHEL 6 to RHEL 7.
2017-07-21 // Check_MK - Race Condition in Processing of Piggyback Data
In current versions of Check_MK – namely 1.2.8p24 and 1.4.0p8 – which are included in the Open Monitoring Distribution, there are several race conditions in the code parts responsible for the processing of piggyback data sent by the agents. These race conditions are usually triggered when a host is monitored more than once, e.g. through the use of cluster services, or if by chance a manual check on the command line coincides with a scheduled check from the monitoring core. While not critical to the whole monitoring process, the effects of these races are intermittent and annoying UNKNOWN - [Errno 2] No such file or directory
errors for the monitored host. This article provides patches for both Check_MK versions in order to deal with the spurious error messages.
The normal order of processing of piggybacked data received from agents seems to be as follows:
The Check_MK server calls the agent on the monitored host.
The agent on the monitored host responds with the output of its own host as well as the piggybacked output from other entities.
The Check_MK server processes the agent's response and extracts the piggybacked data.
If there is already data stored from the piggybacked host, it's considered stale and is thus removed.
The piggybacked data is stored in a file named <AGENT_HOSTNAME> in the directory named <PIGGYBACKED_HOSTNAME> under the path ${OMD_ROOT}/var/check_mk/piggyback/ (e.g. ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/<AGENT_HOSTNAME>).
The Check_MK server lists the contents of the directory ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/<AGENT_HOSTNAME>. For each file item in the list it processes the data contained in the file.
The Check_MK server removes the files in and subsequently the directory ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/ itself.
In case a host is monitored more than once, e.g. through the use of cluster services, steps 4 through 7 above can overlap for two or more concurrent monitoring processes. Such an overlap can cause one process at step 4 to delete the directory ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/<AGENT_HOSTNAME>
while another process is still working on the directory at step 5 or step 6.
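To make the overlap easier to picture, the following is a minimal, self-contained Python sketch of the list-and-process part of the sequence described above (my own simplification, not the actual Check_MK code; the site path and hostnames are placeholders). The ENOENT guards mark exactly the windows in which a concurrent monitoring process can already have removed the file or directory:

import errno
import os

PIGGYBACK_DIR = "/omd/sites/SITE/var/check_mk/piggyback"  # placeholder site path

def process_piggyback_data(piggybacked_host):
    # Step 6: list the per-host directory and read every file found in it.
    host_dir = os.path.join(PIGGYBACK_DIR, piggybacked_host)
    try:
        source_files = os.listdir(host_dir)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return []   # directory already removed by a concurrent check (step 7)
        raise
    sections = []
    for source_host in source_files:
        try:
            with open(os.path.join(host_dir, source_host)) as f:
                sections.append(f.read())
        except IOError as e:
            # A concurrent check may have removed the file between the
            # listdir() and the open() -- the window that produces the
            # spurious "[Errno 2] No such file or directory" errors.
            if e.errno != errno.ENOENT:
                raise
    return sections

The tracebacks below show the same class of failure, only on the os.rename() calls that move freshly written data files into a directory which has just been removed by a competing process.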
The effects of this issue are mainly intermittent and annoying UNKNOWN - [Errno 2] No such file or directory
errors in the Check_MK WebUI, as well as errors like the following examples in the log files:
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> An exception occured while processing host "hostA"
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> Traceback (most recent call last):
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 125, in do_keepalive
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> status = command_function(command_tuple)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 437, in execute_keepalive_command
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> return mode_function(hostname, ipaddress)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1290, in do_check
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> do_all_checks_on_host(hostname, ipaddress, only_check_types)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1506, in do_all_checks_on_host
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> info = get_info_for_check(hostname, ipaddress, infotype)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 319, in get_info_for_check
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> info = apply_parse_function(get_host_info(hostname, ipaddress, section_name, max_cachefile_age, ignore_check_interval), section_name)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 371, in get_host_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> ignore_check_interval=True)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 541, in get_realhost_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> store_persisted_info(hostname, persisted)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 567, in store_persisted_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> os.rename("%s.#new" % file_path, file_path)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> OSError: [Errno 2] No such file or directory
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> An exception occured while processing host "hostA"
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> Traceback (most recent call last):
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 125, in do_keepalive
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> status = command_function(command_tuple)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 437, in execute_keepalive_command
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> return mode_function(hostname, ipaddress)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1241, in do_check
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> do_all_checks_on_host(hostname, ipaddress, only_check_types)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1457, in do_all_checks_on_host
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> info = get_info_for_check(hostname, ipaddress, infotype)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 319, in get_info_for_check
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> info = apply_parse_function(get_host_info(hostname, ipaddress, section_name, max_cachefile_age, ignore_check_interval), section_name)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 371, in get_host_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> ignore_check_interval=True)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 540, in get_realhost_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> store_piggyback_info(hostname, piggybacked)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 651, in store_piggyback_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> os.rename(dir + "/.new." + sourcehost, dir + "/" + sourcehost)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> OSError: [Errno 2] No such file or directory
In version 1.4.0p8 of Check_MK there is already a partial fix for the described issue in the form of Werk 4755 (Git commit 77b3bfc3). For both Check_MK versions the following two trivial patches are provided in order to deal with the spurious error messages:
- check_mk_base.py_1.2.8p24.patch
to be applied by the command:
root@host:~# patch -b -z .race < check_mk_base.py_1.2.8p24.patch
- check_mk_base.py_1.4.0p8.patch
to be applied by the command:
root@host:~# patch -b -z .race < check_mk_base.py_1.4.0p8.patch
In the current development version of Check_MK – probably to be named 1.5 – there has been a rather large amount of code rewriting and restructuring. From a first glance I couldn't determine whether the issue still exists there.
2017-04-12 // Check_MK Monitoring - Open-iSCSI
The Open-iSCSI project provides a high-performance, transport-independent implementation of RFC 3720 iSCSI for Linux. It allows remote access to SCSI targets via TCP/IP over several different transport technologies. This article introduces a new Check_MK service check to monitor the status of Open-iSCSI sessions, as well as the monitoring of several statistical metrics on Open-iSCSI sessions and iSCSI hardware initiator hosts.
For the impatient and TL;DR here is the Check_MK package of the Open-iSCSI monitoring checks:
Open-iSCSI monitoring checks (Compatible with Check_MK versions 1.2.8 and later)
The sources are to be found in my Check_MK repository on GitHub
The Check_MK service check to monitor Open-iSCSI consists of two major parts, an agent plugin and three check plugins.
The first part, a Check_MK agent plugin named open-iscsi
, is a simple Bash shell script. It calls the Open-iSCSI administration tool iscsiadm
in order to retrieve a list of currently active iSCSI sessions. The exact call to iscsiadm
to retrieve the session list is:
/usr/bin/iscsiadm -m session -P 1
If there are any active iSCSI sessions, the open-iscsi
agent plugin also tries to collect several statistics for each iSCSI session. This is done by another call to iscsiadm
for each iSCSI Session ${SID}
, which is shown in the following example:
/usr/bin/iscsiadm -m session -r ${SID} -s
Unfortunately, the iSCSI session statistics are currently only supported for Open-iSCSI software initiators or dependent hardware iSCSI initiators like the Broadcom BCM577xx or BCM578xx adapters which are covered by the bnx2i
kernel module. See Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell and Backporting Open-iSCSI to Debian 8 "Jessie" for additional information on those dependent hardware iSCSI initiators.
For hardware iSCSI initiators, like the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs, which provide a full iSCSI offload engine (iSOE) implementation in the adapter's firmware, there is currently no support for iSCSI session statistics. Instead, the open-iscsi
agent plugin collects several global statistics on each iSOE host ${HST}
which is covered by the qla4xxx
kernel module with the command shown in the following example:
/usr/bin/iscsiadm -m host -H ${HST} -C stats
The output of the above commands is parsed and reformatted by the agent plugin for easier processing in the check plugins. The following example shows the agent plugin output for a system with two BCM578xx dependent hardware iSCSI initiators:
<<<open-iscsi_sessions>>> bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:bf:2d eth2 10.0.3.52 LOGGED_IN LOGGED_IN NO_CHANGE bnx2i 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS> bnx2i.f8:ca:b8:7d:c2:34 eth3 10.0.3.53 LOGGED_IN LOGGED_IN NO_CHANGE <<<open-iscsi_session_stats>>> [session stats f8:ca:b8:7d:bf:2d iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>] txdata_octets: 40960 rxdata_octets: 461171313 noptx_pdus: 0 scsicmd_pdus: 153967 tmfcmd_pdus: 0 login_pdus: 0 text_pdus: 0 dataout_pdus: 0 logout_pdus: 0 snack_pdus: 0 noprx_pdus: 0 scsirsp_pdus: 153967 tmfrsp_pdus: 0 textrsp_pdus: 0 datain_pdus: 112420 logoutrsp_pdus: 0 r2t_pdus: 0 async_pdus: 0 rjt_pdus: 0 digest_err: 0 timeout_err: 0 [session stats f8:ca:b8:7d:c2:34 iqn.2001-05.com.equallogic:8-da6616-807572d50-5080000001758a32-<ISCSI-ALIAS>] txdata_octets: 16384 rxdata_octets: 255666052 noptx_pdus: 0 scsicmd_pdus: 84312 tmfcmd_pdus: 0 login_pdus: 0 text_pdus: 0 dataout_pdus: 0 logout_pdus: 0 snack_pdus: 0 noprx_pdus: 0 scsirsp_pdus: 84312 tmfrsp_pdus: 0 textrsp_pdus: 0 datain_pdus: 62418 logoutrsp_pdus: 0 r2t_pdus: 0 async_pdus: 0 rjt_pdus: 0 digest_err: 0 timeout_err: 0
The next example shows the agent plugin output for a system with two QLogic 8200 Series hardware iSCSI initiators:
<<<open-iscsi_sessions>>> qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7d.ipv4.0 none 10.0.3.50 LOGGED_IN Unknown Unknown qla4xxx 10.0.3.4:3260,1 iqn.2001-05.com.equallogic:8-da6616-57e572d50-80e0000001458a32-v-sto2-tst-000001 qla4xxx.f8:ca:b8:7d:c1:7e.ipv4.0 none 10.0.3.51 LOGGED_IN Unknown Unknown <<<open-iscsi_host_stats>>> [host stats f8:ca:b8:7d:c1:7d iqn.2000-04.com.qlogic:isp8214.000e1e3574ac.4] mactx_frames: 563454 mactx_bytes: 52389948 mactx_multicast_frames: 877513 mactx_broadcast_frames: 0 mactx_pause_frames: 0 mactx_control_frames: 0 mactx_deferral: 0 mactx_excess_deferral: 0 mactx_late_collision: 0 mactx_abort: 0 mactx_single_collision: 0 mactx_multiple_collision: 0 mactx_collision: 0 mactx_frames_dropped: 0 mactx_jumbo_frames: 0 macrx_frames: 1573455 macrx_bytes: 440845678 macrx_unknown_control_frames: 0 macrx_pause_frames: 0 macrx_control_frames: 0 macrx_dribble: 0 macrx_frame_length_error: 0 macrx_jabber: 0 macrx_carrier_sense_error: 0 macrx_frame_discarded: 0 macrx_frames_dropped: 1755017 mac_crc_error: 0 mac_encoding_error: 0 macrx_length_error_large: 0 macrx_length_error_small: 0 macrx_multicast_frames: 0 macrx_broadcast_frames: 0 iptx_packets: 508160 iptx_bytes: 29474232 iptx_fragments: 0 iprx_packets: 401785 iprx_bytes: 354673156 iprx_fragments: 0 ip_datagram_reassembly: 0 ip_invalid_address_error: 0 ip_error_packets: 0 ip_fragrx_overlap: 0 ip_fragrx_outoforder: 0 ip_datagram_reassembly_timeout: 0 ipv6tx_packets: 0 ipv6tx_bytes: 0 ipv6tx_fragments: 0 ipv6rx_packets: 0 ipv6rx_bytes: 0 ipv6rx_fragments: 0 ipv6_datagram_reassembly: 0 ipv6_invalid_address_error: 0 ipv6_error_packets: 0 ipv6_fragrx_overlap: 0 ipv6_fragrx_outoforder: 0 ipv6_datagram_reassembly_timeout: 0 tcptx_segments: 508160 tcptx_bytes: 19310736 tcprx_segments: 401785 tcprx_byte: 346637456 tcp_duplicate_ack_retx: 1 tcp_retx_timer_expired: 1 tcprx_duplicate_ack: 0 tcprx_pure_ackr: 0 tcptx_delayed_ack: 106449 tcptx_pure_ack: 106489 tcprx_segment_error: 0 tcprx_segment_outoforder: 0 tcprx_window_probe: 0 tcprx_window_update: 695915 tcptx_window_probe_persist: 0 ecc_error_correction: 0 iscsi_pdu_tx: 401697 iscsi_data_bytes_tx: 29225 iscsi_pdu_rx: 401697 iscsi_data_bytes_rx: 327355963 iscsi_io_completed: 101 iscsi_unexpected_io_rx: 0 iscsi_format_error: 0 iscsi_hdr_digest_error: 0 iscsi_data_digest_error: 0 iscsi_sequence_error: 0 [host stats f8:ca:b8:7d:c1:7e iqn.2000-04.com.qlogic:isp8214.000e1e3574ad.5] mactx_frames: 563608 mactx_bytes: 52411412 mactx_multicast_frames: 877517 mactx_broadcast_frames: 0 mactx_pause_frames: 0 mactx_control_frames: 0 mactx_deferral: 0 mactx_excess_deferral: 0 mactx_late_collision: 0 mactx_abort: 0 mactx_single_collision: 0 mactx_multiple_collision: 0 mactx_collision: 0 mactx_frames_dropped: 0 mactx_jumbo_frames: 0 macrx_frames: 1573572 macrx_bytes: 441630442 macrx_unknown_control_frames: 0 macrx_pause_frames: 0 macrx_control_frames: 0 macrx_dribble: 0 macrx_frame_length_error: 0 macrx_jabber: 0 macrx_carrier_sense_error: 0 macrx_frame_discarded: 0 macrx_frames_dropped: 1755017 mac_crc_error: 0 mac_encoding_error: 0 macrx_length_error_large: 0 macrx_length_error_small: 0 macrx_multicast_frames: 0 macrx_broadcast_frames: 0 iptx_packets: 508310 iptx_bytes: 29490504 iptx_fragments: 0 iprx_packets: 401925 iprx_bytes: 355436636 iprx_fragments: 0 ip_datagram_reassembly: 0 ip_invalid_address_error: 0 ip_error_packets: 0 ip_fragrx_overlap: 0 ip_fragrx_outoforder: 0 ip_datagram_reassembly_timeout: 0 
ipv6tx_packets: 0 ipv6tx_bytes: 0 ipv6tx_fragments: 0 ipv6rx_packets: 0 ipv6rx_bytes: 0 ipv6rx_fragments: 0 ipv6_datagram_reassembly: 0 ipv6_invalid_address_error: 0 ipv6_error_packets: 0 ipv6_fragrx_overlap: 0 ipv6_fragrx_outoforder: 0 ipv6_datagram_reassembly_timeout: 0 tcptx_segments: 508310 tcptx_bytes: 19323952 tcprx_segments: 401925 tcprx_byte: 347398136 tcp_duplicate_ack_retx: 2 tcp_retx_timer_expired: 4 tcprx_duplicate_ack: 0 tcprx_pure_ackr: 0 tcptx_delayed_ack: 106466 tcptx_pure_ack: 106543 tcprx_segment_error: 0 tcprx_segment_outoforder: 0 tcprx_window_probe: 0 tcprx_window_update: 696035 tcptx_window_probe_persist: 0 ecc_error_correction: 0 iscsi_pdu_tx: 401787 iscsi_data_bytes_tx: 37970 iscsi_pdu_rx: 401791 iscsi_data_bytes_rx: 328112050 iscsi_io_completed: 127 iscsi_unexpected_io_rx: 0 iscsi_format_error: 0 iscsi_hdr_digest_error: 0 iscsi_data_digest_error: 0 iscsi_sequence_error: 0
Although it is a simple Bash shell script, the agent plugin open-iscsi has several dependencies which need to be installed for it to work properly. Namely, those are the commands iscsiadm, sed, tr and egrep. On Debian-based systems, the necessary packages can be installed with the following command:
root@host:~# apt-get install coreutils grep open-iscsi sed
The second part of the Check_MK service check for Open-iSCSI provides the necessary check logic through individual inventory and check functions. This is implemented in the three Check_MK check plugins open-iscsi_sessions
, open-iscsi_host_stats
and open-iscsi_session_stats
, which will be discussed separately in the following sections.
Open-iSCSI Session Status
The check plugin open-iscsi_sessions
is responsible for the monitoring of individual iSCSI sessions and their internal session states. Upon inventory this check plugin creates a service check for each pair of iSCSI network interface name and IQN of the iSCSI target volume. Unlike the iSCSI session ID, which changes over time (e.g. after iSCSI logout and login), this pair uniquely identifies an iSCSI session on a host. During normal check execution, the list of currently active iSCSI sessions on a host is compared to the list of active iSCSI sessions gathered during inventory on that host. If a session is missing or if the session has an erroneous internal state, an alarm is raised accordingly.
For all types of initiators – software, dependent hardware and hardware – there is the state session_state
which can take on the following values:
ISCSI_STATE_FREE
ISCSI_STATE_LOGGED_IN
ISCSI_STATE_FAILED
ISCSI_STATE_TERMINATE
ISCSI_STATE_IN_RECOVERY
ISCSI_STATE_RECOVERY_FAILED
ISCSI_STATE_LOGGING_OUT
An alarm is raised if the session is in any state other than ISCSI_STATE_LOGGED_IN
. For software and dependent hardware initiators there are two additional states – connection_state
and internal_state
. The state connection_state
can take on the values:
FREE
TRANSPORT WAIT
IN LOGIN
LOGGED IN
IN LOGOUT
LOGOUT REQUESTED
CLEANUP WAIT
and internal_state
can take on the values:
NO CHANGE
CLEANUP
REOPEN
REDIRECT
In addition to the above session_state
, an alarm is raised if the connection_state
is in any other state than LOGGED IN
and internal_state
is in any other state than NO CHANGE
.
No performance data is currently reported by this check.
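For illustration, here is a strongly simplified Python sketch of the inventory and check logic just described (this is not the actual plugin source; the function names, the item format and the chosen monitoring states are my own, and the field positions follow the <<<open-iscsi_sessions>>> agent output shown above):

def inventory_open_iscsi_sessions(info):
    # One service per pair of iSCSI interface name and target IQN.
    return [("%s %s" % (line[3], line[2]), None) for line in info]

def check_open_iscsi_sessions(item, _params, info):
    for line in info:
        if "%s %s" % (line[3], line[2]) != item:
            continue
        session_state, connection_state, internal_state = line[6], line[7], line[8]
        if session_state != "LOGGED_IN":
            return (2, "session state is %s" % session_state)
        # Hardware (qla4xxx) initiators report the two additional states as
        # "Unknown", software and dependent hardware initiators report real values.
        if connection_state not in ("LOGGED_IN", "Unknown"):
            return (1, "connection state is %s" % connection_state)
        if internal_state not in ("NO_CHANGE", "Unknown"):
            return (1, "internal state is %s" % internal_state)
        return (0, "session is %s" % session_state)
    return (2, "iSCSI session not found")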
Open-iSCSI Host Statistics
The check plugin open-iscsi_host_stats
is responsible for the monitoring of the global statistics on an iSOE host. Upon inventory this check plugin creates a service check for each pair of MAC address and iSCSI network interface name. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is determined for each inventorized item. If the rate of one of the statistics values is above the configured warning and critical threshold values, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
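Stripped of the Check_MK plumbing, the per-counter rate evaluation boils down to something like the following Python sketch (my own simplified illustration; the actual plugin uses Check_MK's built-in counter handling and the threshold pairs configured via the WATO rule described below):

def check_counter_rates(params, counters, value_store, now):
    # counters:    counter name -> absolute value from the agent section
    # params:      counter name -> (warn, crit) rate levels per second
    # value_store: counter name -> (timestamp, value) kept between check runs
    status, details, perfdata = 0, [], []
    for name, value in sorted(counters.items()):
        last_time, last_value = value_store.get(name, (now, value))
        value_store[name] = (now, value)
        elapsed = now - last_time
        rate = (value - last_value) / elapsed if elapsed > 0 else 0.0
        perfdata.append((name, rate))
        if name in params:
            warn, crit = params[name]
            if rate > crit:
                status = max(status, 2)
                details.append("%s: %.2f/s (critical at %.2f/s)" % (name, rate, crit))
            elif rate > warn:
                status = max(status, 1)
                details.append("%s: %.2f/s (warning at %.2f/s)" % (name, rate, warn))
    return status, details, perfdata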
With the additional WATO plugin open-iscsi_host_stats.py
it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSOE host statistics levels can be found in the WATO WebUI under:
-> Host & Service Parameters -> Parameters for discovered services -> Storage, Filesystems and Files -> Open-iSCSI Host Statistics -> Create Rule in Folder ... -> The levels for the Open-iSCSI host statistics values [x] The levels for the number of transmitted MAC/Layer2 frames on an iSOE host. [x] The levels for the number of received MAC/Layer2 frames on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 bytes on an iSOE host. [x] The levels for the number of received MAC/Layer2 bytes on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 multicast frames on an iSOE host. [x] The levels for the number of received MAC/Layer2 multicast frames on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 broadcast frames on an iSOE host. [x] The levels for the number of received MAC/Layer2 broadcast frames on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 pause frames on an iSOE host. [x] The levels for the number of received MAC/Layer2 pause frames on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 control frames on an iSOE host. [x] The levels for the number of received MAC/Layer2 control frames on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 dropped frames on an iSOE host. [x] The levels for the number of received MAC/Layer2 dropped frames on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 deferral frames on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 abort frames on an iSOE host. [x] The levels for the number of transmitted MAC/Layer2 jumbo frames on an iSOE host. [x] The levels for the number of MAC/Layer2 late transmit collisions on an iSOE host. [x] The levels for the number of MAC/Layer2 single transmit collisions on an iSOE host. [x] The levels for the number of MAC/Layer2 multiple transmit collisions on an iSOE host. [x] The levels for the number of MAC/Layer2 collisions on an iSOE host. [x] The levels for the number of received MAC/Layer2 control frames on an iSOE host. [x] The levels for the number of received MAC/Layer2 dribble on an iSOE host. [x] The levels for the number of received MAC/Layer2 frame length errors on an iSOE host. [x] The levels for the number of discarded received MAC/Layer2 frames on an iSOE host. [x] The levels for the number of received MAC/Layer2 jabber on an iSOE host. [x] The levels for the number of received MAC/Layer2 carrier sense errors on an iSOE host. [x] The levels for the number of received MAC/Layer2 CRC errors on an iSOE host. [x] The levels for the number of received MAC/Layer2 encoding errors on an iSOE host. [x] The levels for the number of received MAC/Layer2 length too large errors on an iSOE host. [x] The levels for the number of received MAC/Layer2 length too small errors on an iSOE host. [x] The levels for the number of transmitted IP packets on an iSOE host. [x] The levels for the number of received IP packets on an iSOE host. [x] The levels for the number of transmitted IP bytes on an iSOE host. [x] The levels for the number of received IP bytes on an iSOE host. [x] The levels for the number of transmitted IP fragments on an iSOE host. [x] The levels for the number of received IP fragments on an iSOE host. [x] The levels for the number of IP datagram reassemblies on an iSOE host. [x] The levels for the number of IP invalid address errors on an iSOE host. 
[x] The levels for the number of IP packet errors on an iSOE host. [x] The levels for the number of IP fragmentation overlaps on an iSOE host. [x] The levels for the number of IP fragmentation out-of-order on an iSOE host. [x] The levels for the number of IP datagram reassembly timeouts on an iSOE host. [x] The levels for the number of transmitted IPv6 packets on an iSOE host. [x] The levels for the number of received IPv6 packets on an iSOE host. [x] The levels for the number of transmitted IPv6 bytes on an iSOE host. [x] The levels for the number of received IPv6 bytes on an iSOE host. [x] The levels for the number of transmitted IPv6 fragments on an iSOE host. [x] The levels for the number of received IPv6 fragments on an iSOE host. [x] The levels for the number of IPv6 datagram reassemblies on an iSOE host. [x] The levels for the number of IPv6 invalid address errors on an iSOE host. [x] The levels for the number of IPv6 packet errors on an iSOE host. [x] The levels for the number of IPv6 fragmentation overlaps on an iSOE host. [x] The levels for the number of IPv6 fragmentation out-of-order on an iSOE host. [x] The levels for the number of IPv6 datagram reassembly timeouts on an iSOE host. [x] The levels for the number of transmitted TCP segments on an iSOE host. [x] The levels for the number of received TCP segments on an iSOE host. [x] The levels for the number of transmitted TCP bytes on an iSOE host. [x] The levels for the number of received TCP bytes on an iSOE host. [x] The levels for the number of duplicate TCP ACK retransmits on an iSOE host. [x] The levels for the number of received TCP retransmit timer expiries on an iSOE host. [x] The levels for the number of received TCP duplicate ACKs on an iSOE host. [x] The levels for the number of received TCP pure ACKs on an iSOE host. [x] The levels for the number of transmitted TCP delayed ACKs on an iSOE host. [x] The levels for the number of transmitted TCP pure ACKs on an iSOE host. [x] The levels for the number of received TCP segment errors on an iSOE host. [x] The levels for the number of received TCP segment out-of-order on an iSOE host. [x] The levels for the number of received TCP window probe on an iSOE host. [x] The levels for the number of received TCP window update on an iSOE host. [x] The levels for the number of transmitted TCP window probe persist on an iSOE host. [x] The levels for the number of transmitted iSCSI PDUs on an iSOE host. [x] The levels for the number of received iSCSI PDUs on an iSOE host. [x] The levels for the number of transmitted iSCSI Bytes on an iSOE host. [x] The levels for the number of received iSCSI Bytes on an iSOE host. [x] The levels for the number of iSCSI I/Os completed on an iSOE host. [x] The levels for the number of iSCSI unexpected I/Os on an iSOE host. [x] The levels for the number of iSCSI format errors on an iSOE host. [x] The levels for the number of iSCSI header digest (CRC) errors on an iSOE host. [x] The levels for the number of iSCSI data digest (CRC) errors on an iSOE host. [x] The levels for the number of iSCSI sequence errors on an iSOE host. [x] The levels for the number of ECC error corrections on an iSOE host.
The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions
(iSCSI Session Status) and open-iscsi_host_stats
(iSCSI Host Stats) service checks over two QLogic 8200 Series hardware iSCSI initiators:
This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state
– in this example LOGGED_IN
– is shown. There are also two iSCSI Host Stats service check items in the example, which are pairs of MAC addresses and iSCSI network interface names. For each of those items the current throughput rate on the MAC, IP/IPv6, TCP and iSCSI protocol layer is shown. The throughput rate on the MAC protocol layer is also visualized in the Perf-O-Meter, received traffic growing from the middle to the left, transmitted traffic growing from the middle to the right.
The following three images show examples of the PNP4Nagios graphs for the open-iscsi_host_stats
(iSCSI Host Stats) service check.
The middle graph shows a combined view of the throughput rate for received and transmitted traffic on the different MAC, IP/IPv6, TCP and iSCSI protocol layers. The upper graph shows the throughput rate for various frame types on the MAC protocol layer. The lower graph shows the rate for various error frame types on the MAC protocol layer.
The upper graph shows the throughput rate for received and transmitted traffic on the IP/IPv6 protocol layer. The middle graph shows the rate for various error packet types on the IP/IPv6 protocol layer. The lower graph shows the throughput rate for received and transmitted traffic on the TCP protocol layer.
The first graph shows the rate for various protocol control and error segment types on the TCP protocol layer. The second graph shows the rate of ECC error corrections that occurred on the QLogic 8200 Series hardware iSCSI initiator. The third graph shows the throughput rate for received and transmitted traffic on the iSCSI protocol layer. The fourth and last graph shows the rate for various control and error PDUs on the iSCSI protocol layer.
Open-iSCSI Session Statistics
The check plugin open-iscsi_session_stats
is responsible for the monitoring of the statistics on individual iSCSI sessions. Upon inventory this check plugin creates a service check for each pair of MAC address of the network interface and IQN of the iSCSI target volume. During normal check execution, an extensive list of statistics – see the above example output of the Check_MK agent plugin – is collected for each inventorized item. If the rate of one of the statistics values is above the configured warning and critical threshold values, an alarm is raised accordingly. For all statistics, performance data is reported by the check.
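The per-session statistics arrive in the <<<open-iscsi_session_stats>>> section shown further above, grouped under "[session stats <MAC> <IQN>]" headers. Here is a simplified Python sketch of how such a section can be parsed into per-item counter dictionaries (again my own illustration, assuming the section is passed as a list of word-split lines, and not the actual plugin code):

def parse_session_stats(info):
    # info: the agent section as a list of word lists, e.g.
    #   ["[session", "stats", "f8:ca:b8:7d:bf:2d", "iqn.2001-05.com...]"]
    #   ["txdata_octets:", "40960"]
    stats = {}
    current = None
    for line in info:
        if line[0] == "[session" and len(line) >= 4:
            # A "[session stats <MAC> <IQN>]" header starts a new item.
            mac, iqn = line[2], line[3].rstrip("]")
            current = stats.setdefault("%s %s" % (mac, iqn), {})
        elif current is not None and len(line) == 2:
            current[line[0].rstrip(":")] = int(line[1])
    return stats

The resulting dictionaries can then be fed into the same kind of rate and threshold evaluation as sketched for the host statistics above.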
With the additional WATO plugin open-iscsi_session_stats.py
it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for all statistics are a rate of zero (0) units per second for both warning and critical thresholds. The configuration options for the iSCSI session statistics levels can be found in the WATO WebUI under:
-> Host & Service Parameters -> Parameters for discovered services -> Storage, Filesystems and Files -> Open-iSCSI Session Statistics -> Create Rule in Folder ... -> The levels for the Open-iSCSI session statistics values [x] The levels for the number of transmitted bytes in an Open-iSCSI session [x] The levels for the number of received bytes in an Open-iSCSI session [x] The levels for the number of digest (CRC) errors in an Open-iSCSI session [x] The levels for the number of timeout errors in an Open-iSCSI session [x] The levels for the number of transmitted NOP commands in an Open-iSCSI session [x] The levels for the number of received NOP commands in an Open-iSCSI session [x] The levels for the number of transmitted SCSI command requests in an Open-iSCSI session [x] The levels for the number of received SCSI command reponses in an Open-iSCSI session [x] The levels for the number of transmitted task management function commands in an Open-iSCSI session [x] The levels for the number of received task management function responses in an Open-iSCSI session [x] The levels for the number of transmitted login requests in an Open-iSCSI session [x] The levels for the number of transmitted logout requests in an Open-iSCSI session [x] The levels for the number of received logout responses in an Open-iSCSI session [x] The levels for the number of transmitted text PDUs in an Open-iSCSI session [x] The levels for the number of received text PDUs in an Open-iSCSI session [x] The levels for the number of transmitted data PDUs in an Open-iSCSI session [x] The levels for the number of received data PDUs in an Open-iSCSI session [x] The levels for the number of transmitted single negative ACKs in an Open-iSCSI session [x] The levels for the number of received ready to transfer PDUs in an Open-iSCSI session [x] The levels for the number of received reject PDUs in an Open-iSCSI session [x] The levels for the number of received asynchronous messages in an Open-iSCSI session
The following image shows a status output example from the WATO WebUI with several open-iscsi_sessions
(iSCSI Session Status) and open-iscsi_session_stats
(iSCSI Session Stats) service checks over two BCM578xx dependent hardware iSCSI initiators:
This example shows six iSCSI Session Status service check items, which are pairs of iSCSI network interface names and – here anonymized – IQNs of the iSCSI target volumes. For each item the current session_state
, connection_state
and internal_state
– in this example with the respective values LOGGED_IN
, LOGGED_IN
and NO_CHANGE
– are shown. There is also an equivalent number of iSCSI Session Stats service check items in the example, which are also pairs of MAC addresses of the network interfaces and IQNs of the iSCSI target volumes. For each of those items the current throughput rate of the individual iSCSI session is shown. As long as the rate of the digest (CRC) and timeout error counters is zero, the string no protocol errors is displayed. Otherwise the name and throughput rate of any non-zero error counter is shown. The throughput rate of the iSCSI session is also visualized in the Perf-O-Meter, received traffic growing from the middle to the left, transmitted traffic growing from the middle to the right.
The following image shows an example of the three PNP4Nagios graphs for a single open-iscsi_session_stats
(iSCSI Session Stats) service check.
The upper graph shows the throughput rate for received and transmitted traffic of the iSCSI session. The middle graph shows the rate for received and transmitted iSCSI PDUs, broken down by the different types of PDUs on the iSCSI protocol layer. The lower graph shows the rate for the digest (CRC) and timeout errors on the iSCSI protocol layer.
The described Check_MK service check to monitor the status of Open-iSCSI sessions, Open-iSCSI session metrics and iSCSI hardware initiator host metrics has been verified to work with version 2.0.874-2~bpo8+1 of the open-iscsi
package from the backports repository of Debian stable
(Jessie) on the client side and the Check_MK versions 1.2.6 and 1.2.8 on the server side.
I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.
2017-03-16 // Check_MK Monitoring - XFS Filesystem Quotas
XFS, like most other modern filesystems, offers the ability to configure and use disk quotas. Usually those quotas set limits with regard to the amount of allocatable disk space or the number of filesystem objects. In most filesystems those limits are bound to users or groups of users. XFS is one of the few filesystems which offer quota limits on a per-directory basis. In the case of XFS, directory quotas are implemented via projects. This article introduces a new Check_MK service check to monitor the use of XFS filesystem quotas with a strong focus on directory based quotas.
For the impatient and TL;DR here is the Check_MK package of the XFS filesystem quota monitoring checks:
XFS filesystem quota monitoring checks (Compatible with Check_MK versions 1.2.6 and earlier)
XFS filesystem quota monitoring checks (Compatible with Check_MK versions 1.2.8 and later)
The sources are to be found in my Check_MK repository on GitHub
The Check_MK service check to monitor XFS filesystem quotas consists of two major parts, an agent plugin and a check plugin.
The Check_MK agent plugin named xfs_quota
is a simple Bash shell script. It calls the XFS quota administration tool xfs_quota
in order to retrieve a report of the current quota usage. The exact call to xfs_quota
is:
/usr/sbin/xfs_quota -x -c 'report -p -b -i -a'
which currently reports only block and inode quotas of projects or directories on all available XFS filesystems. An example output of the above command is:
Project quota on /srv/xfs (/dev/mapper/vg00-xfs)
                               Blocks                                          Inodes
Project ID       Used       Soft       Hard  Warn/Grace       Used       Soft       Hard  Warn/ Grace
---------- -------------------------------------------------- --------------------------------------------------
test1               0          0       1024   00 [--------]          1          0          0   00 [--------]
test2               0          0       2048   00 [--------]          1          0          0   00 [--------]
test3               0          0       3072   00 [--------]          1          0          0   00 [--------]
The output of the above command is parsed and reformatted by the agent plugin for easier processing in the check plugin. The above example output would thus be transformed into the agent plugin output shown in the following example:
<<<xfs_quota>>>
/srv/xfs:/dev/mapper/vg00-xfs:test1:0:0:1024:1:0:0
/srv/xfs:/dev/mapper/vg00-xfs:test2:0:0:2048:1:0:0
/srv/xfs:/dev/mapper/vg00-xfs:test3:0:0:3072:1:0:0
Although it is a simple Bash shell script, the agent plugin xfs_quota has several dependencies which need to be installed for it to work properly. Namely, those are the commands xfs_quota, sed and egrep. On Debian-based systems, the necessary packages can be installed with the following command:
root@host:~# apt-get install xfsprogs sed grep
The second part, a Check_MK check plugin also named xfs_quota
, provides the necessary inventory and check functions. Upon inventory it creates a service check for each pair of XFS filesystem mountpoint and quota project ID. During normal check execution, the numbers of used XFS filesystem blocks and inodes are determined for each inventorized item (pair of XFS filesystem mountpoint and quota project ID). If the hard and soft, block and inode quotas for a particular item are all set to zero, no further checks are carried out and only performance data is reported by the check. If either one of the hard or soft, block or inode quotas is set to a non-zero value, the number of remaining free XFS filesystem blocks or inodes is compared to the warning and critical threshold values and an alarm is raised accordingly.
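As a rough illustration of that logic, here is a simplified Python version working on one line of the <<<xfs_quota>>> agent output shown above (not the actual check plugin; the helper name and the status mapping are my own, with the parameter keys chosen to match the blocks_hard/blocks_soft/inodes_hard/inodes_soft levels mentioned below):

def check_xfs_quota(item, params, line):
    # line: one colon-separated entry from the <<<xfs_quota>>> section, e.g.
    #   "/srv/xfs:/dev/mapper/vg00-xfs:test1:0:0:1024:1:0:0"
    (mountpoint, _device, project,
     blocks_used, blocks_soft, blocks_hard,
     inodes_used, inodes_soft, inodes_hard) = line.split(":")
    if "%s %s" % (mountpoint, project) != item:
        return None
    values = {
        "blocks": (int(blocks_used), int(blocks_soft), int(blocks_hard)),
        "inodes": (int(inodes_used), int(inodes_soft), int(inodes_hard)),
    }
    perfdata = [(kind, used) for kind, (used, _s, _h) in sorted(values.items())]

    # No quota configured at all: report usage only.
    if not any(soft or hard for (_used, soft, hard) in values.values()):
        return (0, "no quotas configured", perfdata)

    status, details = 0, []
    for kind, (used, soft, hard) in sorted(values.items()):
        for level_name, limit in (("soft", soft), ("hard", hard)):
            if limit == 0:
                continue
            free = limit - used
            warn, crit = params.get("%s_%s" % (kind, level_name), (0, 0))
            if free <= crit:
                status = max(status, 2)
            elif free <= warn:
                status = max(status, 1)
            details.append("%d %s free (%s limit %d)" % (free, kind, level_name, limit))
    return (status, ", ".join(details), perfdata)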
With the additional WATO plugin it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values. The default values for blocks_hard
and blocks_soft
are zero (0) free blocks for both warning and critical thresholds. The default values for inodes_hard
and inodes_soft
are zero (0) free inodes for both warning and critical thresholds. The configuration options for the free block or inode levels can be found under:
-> Host & Service Parameters -> Parameters for discovered services -> Storage, Filesystems and Files -> XFS Quota Utilization -> Create Rule in Folder ... -> The levels for the soft/hard block/inode quotas on XFS filesystems [x] The levels for the hard block quotas on XFS filesystems [x] The levels for the soft block quotas on XFS filesystems [x] The levels for the hard inode quotas on XFS filesystems [x] The levels for the soft inode quotas on XFS filesystems
The following image shows a status output example for several xfs_quota
service checks from the WATO WebUI:
This example shows several service check items, which again are pairs of XFS filesystem mountpoints (here: /backup
) and anonymized quota project IDs. For each item the numbers of used blocks and inodes are shown along with the appropriate hard and soft quota values. The numbers of used blocks and inodes are also visualized in the Perf-O-Meter, blocks on a logarithmic scale growing from the middle to the left, inodes on a logarithmic scale growing from the middle to the right.
The following image shows an example of the two PNP4Nagios graphs for a single service check:
The upper graph shows the number of used XFS filesystem blocks for a pair of XFS filesystem mountpoints (here: /backup
) and a - again anonymized - quota project ID. The lower graph shows the number of used inodes for the same pair. Both graphs show warning and critical threshold values, which are in this example at their default value of zero. If configured - like in the upper graph of the example - block or inode quotas are also shown as blue horizontal lines in the respective graphs.
The described Check_MK service check to monitor XFS filesystem quotas has been verified to work with version 3.2.1 of the xfsprogs
package on Debian stable
(Jessie) on the client side and the Check_MK versions 1.2.6 and 1.2.8 on the server side.
I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.
2017-01-29 // Dell iDRAC and Limitation in Certificate Signing Request Field Lengths
Current versions of the Dell iDRAC have limitations with regard to the field length for the SSL CSR. These limitations violate RFC 5280 and lead to an invalid CSR and possibly an invalid SSL certificate.
The Dell iDRAC allows the protection of browser based access via HTTPS. In order to facilitate this, SSL certificates are being used. Initially and by default there is a self-signed SSL certificate installed on the iDRAC which is signed by the Dell CA. For additional security this self-signed SSL certificate can be replaced with one provided by the systems administrator. There are two ways to create an own SSL certificate for the Dell iDRAC.
The first one is to use the iDRACs integrated facility to create a SSL CSR. The appropriate form can be found in the iDRAC WebUI under:
iDRAC Settings → Network → SSL → Generate Certificate Signing Request (CSR) → Next
After filling in the fields Common Name, Organization Name, Organization Unit, Locality, State Name, Country Code and Email with the appropriate values and clicking on Generate a CSR is generated in the background and after a short time offered for download. This CSR can now be fed into a CA – own or service provider operated – in order to generate the actual certificate. The generated certificate is then uploaded into the iDRAC via the WebUI.
So far so easy, and done countless times without any issues. Until the day one of the values of the aforementioned fields Common Name, Organization Name, Organization Unit, Locality, State Name, Country Code or Email for the CSR changes to a significantly longer string. In my case the value of Organization Name changed from 11 characters to 60 characters. It appears that the developers of the iDRAC unfortunately didn't expect values of that length, although those are absolutely in conformance with the appropriate standard RFC 5280 and well within the upper bounds mentioned there:
[...]
-- specifications of Upper Bounds MUST be regarded as mandatory
-- from Annex B of ITU-T X.411 Reference Definition of MTS Parameter
-- Upper Bounds

-- Upper Bounds
ub-name INTEGER ::= 32768
ub-common-name INTEGER ::= 64
ub-locality-name INTEGER ::= 128
ub-state-name INTEGER ::= 128
ub-organization-name INTEGER ::= 64
ub-organizational-unit-name INTEGER ::= 64
[...]
ub-emailaddress-length INTEGER ::= 255
ub-common-name-length INTEGER ::= 64
ub-country-name-alpha-length INTEGER ::= 2
[...]
-- Note - upper bounds on string types, such as TeletexString, are
-- measured in characters. Excepting PrintableString or IA5String, a
-- significantly greater number of octets will be required to hold
-- such a value. As a minimum, 16 octets, or twice the specified
-- upper bound, whichever is the larger, should be allowed for
-- TeletexString. For UTF8String or UniversalString at least four
-- times the upper bound should be allowed.
[...]
Even worse, the iDRAC doesn't issue an error or a warning if it's given an exceedingly long value. On the contrary, it silently accepts the longer value and just cuts off everything after the 48th or 51st character.
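Since the iDRAC gives no feedback at all, it can be worth checking the intended field values against the RFC 5280 upper bounds quoted above before filling in the form. A quick, self-written Python sketch for such a plausibility check (the field names, the example values and the 48-character cutoff constant are my own, the latter based on the behaviour observed here):

# Upper bounds taken from the RFC 5280 excerpt quoted above.
UPPER_BOUNDS = {
    "Common Name": 64,
    "Organization Name": 64,
    "Organization Unit": 64,
    "Locality": 128,
    "State Name": 128,
    "Country Code": 2,
    "Email": 255,
}

# Observed iDRAC behaviour: values are silently cut off after roughly
# the 48th character, regardless of the RFC 5280 limits.
IDRAC_OBSERVED_CUTOFF = 48

def check_csr_fields(fields):
    # fields: dict mapping CSR form field name -> intended value
    for name, value in sorted(fields.items()):
        limit = UPPER_BOUNDS.get(name)
        if limit is not None and len(value) > limit:
            print("%s: %d characters exceed the RFC 5280 bound of %d"
                  % (name, len(value), limit))
        elif len(value) > IDRAC_OBSERVED_CUTOFF:
            print("%s: %d characters - expect the iDRAC to truncate this value"
                  % (name, len(value)))

check_csr_fields({
    "Organization Name": "Some Very Long Organization Name GmbH & Co. KG Example Inc.",
    "Common Name": "idrac.example.org",
})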
I've checked various versions of the Dell iDRAC and encountered the issue on at least the following Dell PowerEdge hardware and iDRAC software versions:
Host | Model | iDRAC | iDRAC Version |
---|---|---|---|
host1 | R815 | 6 | 1.95 (Build 05) |
host2 | R815 | 6 | 2.85 (Build 04) |
host3 | M915 | 6 | 3.75 (Build 5) |
host4 | M620 | 7 | 2.21.21.21 |
host5 | M630 | 8 | 2.30.30.30 |
host6 | M630 | 8 | 2.41.40.40 |
Here are some examples of the Subject:
from the CSRs generated on different Dell PowerEdge hardware and iDRAC software versions:
user@host:~$ for FL in *.csr; do echo -n "${FL}: "; openssl req -text -in ${FL} -noout | grep "Subject: "; done
host1.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host2.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host3.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, FQDN/emailAddress=EMAIL
host4.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host5.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
host6.csr: Subject: C=CC, ST=SSSSSS, L=LLLLLL, O=OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO, OU=OUOUOUOUOUOUOUOU, CN=FQDN/emailAddress=EMAIL
The values in the different fields have been anonymized with the characters of the field name, but have otherwise been preserved in the length they appeared in the CSR. The observed behaviour is mostly consistent, with only one exception where the field value is strangely cut off after the 51st character.
Needless to say, cropping the field values like this results in an invalid CSR which in turn leads to issues in the certificate generation process at the CA.
After raising a support case with Dell, the problem was verified and the issue was acknowledged as a defect. A future iDRAC update is supposed to fix this issue. Unfortunately i have no information what version that will be nor when it will be released.
In the meantime, until there is a proper fix for this issue, the second way can be used. This is to manually create all three components – an own SSL private key, an own CSR and a certificate – for the Dell iDRAC. Unfortunately this involves significantly more steps, which are described below.
If necessary install some additional software packages. For Debian based systems:
root@host:~# apt-get install openssl wsmancli rpl
In the following steps and commands, replace <FQDN> and <IP> with the FQDN and IP address of the Dell iDRAC where necessary.

Create an SSL private key:
user@host:~$ openssl genrsa -out <FQDN>.key.pem 4096
Generate a CSR for the FQDN and IP address of the Dell iDRAC. Replace <C>, <ST>, <L>, <O>, <OU> and <EMAIL> with the country, state, locality, organization, organizational unit and a valid email address respectively.

user@host:~$ openssl req -new -sha256 -nodes -key <FQDN>.key.pem -out <FQDN>.csr.pem -subj '/C=<C>/ST=<ST>/L=<L>/O=<O>/OU=<OU>/CN=<FQDN>/emailAddress=<EMAIL>' -config <( cat <<-EOF
[req]
req_extensions = req_ext
distinguished_name = dn
[ dn ]
[ req_ext ]
subjectAltName = @alt_names
[alt_names]
DNS.1 = <FQDN>
IP.1 = <IP>
EOF
)
Additional DNS.X = <FQDN> or IP.X = <IP> lines can be added if necessary. Increment the value of X accordingly.

With the generated CSR, request a certificate from your own CA or the one of your service provider. The range of different CAs and organizational processes involved is just too broad to cover this step in greater detail.
Merge the received certificate and the private key which was generated in one of the previous steps:
user@host:~$ cat <FQDN>.cert.pem <FQDN>.key.pem > <FQDN>.certkey.pem
Convert the combined certificate and the private key into the PKCS #12 format:
user@host:~$ openssl pkcs12 -export -in <FQDN>.certkey.pem -out <FQDN>.certkey.pfx
This command will ask for a password at the command line, which will be needed again later on.
Convert the PKCS #12 file into a base64-encoded file:
user@host:~$ openssl base64 -in <FQDN>.certkey.pfx -out <FQDN>.certkey.p12
Prepare the base64-encoded PKCS #12 file for the upload to the Dell iDRAC over its SOAP interface:
user@host:~$ echo '<p:SetCertificateAndPrivateKey_INPUT xmlns:p="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_LCService"><p:Type>server</p:Type><p:PKCS12>' > <FQDN>.certkey.xml
user@host:~$ cat <FQDN>.certkey.p12 >> <FQDN>.certkey.xml
user@host:~$ echo '</p:PKCS12><p:PKCS12pin>PASSWORD_AT_PKCS12_EXPORT</p:PKCS12pin></p:SetCertificateAndPrivateKey_INPUT>' >> <FQDN>.certkey.xml
Replace the string PASSWORD_AT_PKCS12_EXPORT with the password that was given in the previous step where the certificate was exported into the PKCS #12 format:

user@host:~$ rpl PASSWORD_AT_PKCS12_EXPORT <PASSWORD> <FQDN>.certkey.xml
Upload the base64-encoded PKCS #12 file to the Dell iDRAC over its SOAP interface. Replace <IP> with the IP address of the Dell iDRAC and <USER> and <PASSWORD> with the appropriate credentials of a user allowed to access the Dell iDRAC:

user@host:~$ wsman invoke -a SetCertificateAndPrivateKey http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_LCService?SystemCreationClassName=DCIM_ComputerSystem,CreationClassName=DCIM_LCService,SystemName=DCIM:ComputerSystem,Name=DCIM:LCService -h <IP> -P 443 -u <USER> -p <PASSWORD> -c dummy.cert -y basic -V -v -J <FQDN>.certkey.xml -j utf-8
The last two steps are optional and only necessary if you don't want to or cannot install the Dell OpenManage DRAC Tools, which include the racadm
command, or if the remote access via racadm
is not or cannot be enabled.
This workaround can also be applied generally and independently of the mentioned bug in the Dell iDRAC in order to create a valid SSL certificate for Dell iDRAC and Dell Blade CMC systems.