2018-12-01 // Check_MK Monitoring - Brocade / Broadcom Fibre Channel Switches
This article provides patches for the standard Check_MK distribution in order to fix and enhance the support for the monitoring of Brocade / Broadcom Fibre Channel Switches.
The standard Check_MK distribution already ships with monitoring support for Brocade / Broadcom Fibre Channel Switches. Unfortunately, the check brocade_fcport, which monitors the status and several metrics of the fibre channel switch ports, has several issues. Those issues prevent the check from working as intended and are fixed in the version provided in this article. Until recently, the standard Check_MK distribution also had no support for monitoring CPU and memory metrics at the switch level or SFP metrics at the port level. This article introduces the new checks brocade_cpu, brocade_mem and brocade_sfp to cover those metrics. The most recent upstream version 1.5.x of the Check_MK distribution now includes the standard checks brocade_sys (covering CPU and memory metrics) and brocade_sfp (covering the SFP port metrics), which provide essentially the same functionality.
For the impatient (TL;DR), here is the enhanced version of the brocade_fcport check:
Enhanced version of the brocade_fcport check
And the new brocade_cpu, brocade_mem and brocade_sfp checks:
The new brocade_cpu check
The new brocade_mem check
The new brocade_sfp check
The sources to the new and enhanced versions of all the checks can be found in my Check_MK Plugins repository on GitHub.
Additional Checks
CPU Usage
The Check_MK service check brocade_cpu monitors the current CPU utilization of Brocade / Broadcom fibre channel switches. It uses the SNMP OID swCpuUsage from the fibre channel switch MIB (SW-MIB) to create one service check for each CPU found in the system. The current CPU utilization in percent is compared to either the default or the configured warning and critical threshold values, and an alarm is raised accordingly. With the standard WATO plugin for CPU utilization, the warning and critical levels can be configured through the WATO WebUI, overriding the default values (warning: 80%; critical: 90% of CPU utilization).
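The threshold comparison described above can be sketched roughly as follows. This is a minimal illustration and not the code of the actual check: the function name evaluate_cpu_util, the return shape and the use of "greater than or equal" for the comparison are assumptions made for this example. Only the default levels of 80% and 90% are taken from the text.

```python
# Hypothetical sketch of an upper-level threshold comparison.
# Only the default levels (80.0, 90.0) come from the article.

DEFAULT_CPU_LEVELS = (80.0, 90.0)  # (warning, critical) in percent


def evaluate_cpu_util(util_percent, levels=DEFAULT_CPU_LEVELS):
    """Return (state, text) with state 0 = OK, 1 = WARN, 2 = CRIT."""
    warn, crit = levels
    if util_percent >= crit:
        state = 2
    elif util_percent >= warn:
        state = 1
    else:
        state = 0
    text = "CPU utilization: %.1f%% (warn/crit at %.1f%%/%.1f%%)" % (
        util_percent, warn, crit)
    return state, text
```

Passing a different levels tuple plays the role of a rule configured through WATO, for example evaluate_cpu_util(65.0, levels=(60.0, 70.0)).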
The following image shows a status output example for the brocade_cpu service check from the WATO WebUI:
This example shows the current CPU utilization of the system in percent.
The following image shows an example of the service metrics graph for the brocade_cpu service check:
The selected example graph shows the current CPU utilization of the system in percent as well as the default warning and critical threshold values.
Memory Usage
The Check_MK service check brocade_mem monitors the current memory (RAM) usage on Brocade / Broadcom fibre channel switches. It uses the SNMP OID swMemUsage from the fibre channel switch MIB (SW-MIB) to create a service check for the overall memory usage on the system. The amount of currently used memory is compared to either the default or the configured warning and critical threshold values, and an alarm is raised accordingly. With the added WATO plugin, the warning and critical levels can be configured through the WATO WebUI, overriding the default values (warning: 80%; critical: 90% of used memory). The configuration options for the used memory levels can be found under:
-> Host & Service Parameters -> Parameters for discovered services -> Operating System Resources -> Brocade Fibre Channel Memory Usage -> Create rule in folder ... [x] Levels for memory usage
The following image shows a status output example for the brocade_mem service check from the WATO WebUI:
This example shows the current memory utilization of the system in percent.
The following image shows an example of the service metrics graph for the brocade_mem service check:
The selected example graph shows the current memory utilization of the system in percent as well as the default warning and critical threshold values.
SFP Health
The Check_MK service check brocade_sfp monitors several metrics of the SFPs in all enabled and active ports of Brocade / Broadcom fibre channel switches. It uses several SNMP OIDs from the fibre channel switch MIB (SW-MIB) and from the Fabric Alliance Extension MIB (FA-EXT-MIB) to create one service check for each SFP found in an enabled and active port on the system. The OIDs from the fibre channel switch MIB (swFCPortSpecifier, swFCPortName, swFCPortPhyState, swFCPortOpStatus, swFCPortAdmStatus) are used to determine the number, name and status of the switch port. The OIDs from the Fabric Alliance Extension MIB are used to determine the actual SFP metrics: the SFP temperature (swSfpTemperature), voltage (swSfpVoltage) and current (swSfpCurrent), as well as the optical receive and transmit power (swSfpRxPower and swSfpTxPower). Unfortunately, the SFP metrics are not available on all Brocade / Broadcom switch hardware platforms and FabricOS versions. This check was verified to work with Gen6 hardware and FabricOS v8. Another limitation is that the SFP metrics are not available in real time; the switch only gathers them from all of its SFPs at a 5 minute interval. This is most likely a precaution so as not to overload the processing capacity of the SFPs with too many status requests.
The current temperature level as well as the optical receive and transmit power levels are compared to either the default or configured warning and critical threshold values, and an alarm is raised accordingly. With the added WATO plugin it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values:
Metric | Warning Threshold | Critical Threshold |
---|---|---|
System temperature | 55°C | 65°C |
Optical receive power | -7.0 dBm | -9.0 dBm |
Optical transmit power | -2.0 dBm | -3.0 dBm |
The configuration options for the used temperature and optical receive and transmit power levels can be found under:
-> Host & Service Parameters -> Parameters for discovered services -> Networking -> Brocade Fibre Channel SFP -> Create rule in folder ... [x] Temperature levels in degrees celcius [x] Receive power levels in dBm [x] Transmit power levels in dBm
Currently, the electrical current and voltage metrics of the SFPs are only used for long-term trends via the respective service metric template and are not used to raise any alarms. This is because there is little to no information available on which precise electrical current and voltage levels would indicate an immediate or impending failure state.
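As a rough illustration of how the default levels from the table above might be applied, the following sketch treats the temperature levels as upper bounds and the optical power levels as lower bounds (the critical value is lower than the warning value, which suggests this reading). The function and dictionary names are hypothetical and not taken from the actual check. Voltage and current are left out, since they do not raise alarms.

```python
# Hypothetical sketch; only the numeric default levels come from the table above.
DEFAULT_SFP_LEVELS = {
    "temp": (55.0, 65.0),      # upper levels in degrees Celsius (warn, crit)
    "rx_power": (-7.0, -9.0),  # lower levels in dBm (warn, crit)
    "tx_power": (-2.0, -3.0),  # lower levels in dBm (warn, crit)
}


def state_upper(value, levels):
    """0/1/2 if value reaches the upper warning/critical level."""
    warn, crit = levels
    if value >= crit:
        return 2
    if value >= warn:
        return 1
    return 0


def state_lower(value, levels):
    """0/1/2 if value drops to the lower warning/critical level."""
    warn, crit = levels
    if value <= crit:
        return 2
    if value <= warn:
        return 1
    return 0


def evaluate_sfp(temp, rx_power, tx_power, levels=DEFAULT_SFP_LEVELS):
    """Return the worst state over temperature and optical power levels."""
    return max(
        state_upper(temp, levels["temp"]),
        state_lower(rx_power, levels["rx_power"]),
        state_lower(tx_power, levels["tx_power"]),
    )
```

Taking the maximum over the individual states means that a single metric out of range is enough to put the whole SFP service into a warning or critical state.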
The following image shows several status output examples for the brocade_sfp service check from the WATO WebUI:
This example shows SFP Port service check items for eight consecutive ports on a switch. For each item the current temperature, voltage and current, as well as the optical receive and transmit power of the SFP are shown. The optical receive and transmit power levels are also visualized in the Perf-O-Meter, optical receive power growing from the middle to the left, optical transmit power growing from the middle to the right.
The following image shows an example of the service metrics graph for the brocade_sfp service check:
The first graph shows the optical receive and transmit power of the SFP. The second shows the electrical current drawn by the SFP. The third graph shows the temperature of the SFP. The fourth and last graph shows the electrical voltage provided to the SFP.
Modified Check
Fibre Channel Port
The Check_MK service check brocade_fcport has several issues, which are addressed by the following set of patches. The patch is actually a monolithic one and is only broken up into individual patches here for ease of discussion.
The first patch is a simple but ugly workaround that prevents OID_END, which carries the port index information, from being treated as a BINARY SNMP value:
- brocade_fcport.patch
--- a/checks/brocade_fcport 2018-11-25 18:06:04.674930057 +0100
+++ b/checks/brocade_fcport 2018-11-25 08:43:58.715721271 +0100
@@ -154,8 +135,8 @@
     bbcredits = None
     if len(if64_info) > 0:
         fcmgmt_portstats = []
-        for oidend, dummy, tx_elements, rx_elements, bbcredits_64 in if64_info:
-            if int(index) == int(oidend.split(".")[-1]):
+        for oidend, tx_elements, rx_elements, bbcredits_64 in if64_info:
+            if index == oidend.split(".")[-1]:
                 fcmgmt_portstats = [
                     binstring_to_int(''.join(map(chr, tx_elements))) / 4,
                     binstring_to_int(''.join(map(chr, rx_elements))) / 4,
@@ -477,7 +426,6 @@
     # Not every device supports that
     (".1.3.6.1.3.94.4.5.1", [
         OID_END,
-        "1", # Dummy value, otherwise OID_END is also treated as a BINARY value
         BINARY("6"), # FCMGMT-MIB::connUnitPortStatCountTxElements
         BINARY("7"), # FCMGMT-MIB::connUnitPortStatCountRxElements
         BINARY("8"), # FCMGMT-MIB::connUnitPortStatCountBBCreditZero
This is achieved by inserting a dummy value into the list of SNMP OIDs in the snmp_info variable. I haven't had time to dig out the root cause of this behaviour, but I guess it must be somewhere in the core SNMP components of Check_MK. Note that in the diff as shown, the lines marked with - belong to the enhanced version of the check and the lines marked with + belong to the stock version. The patch also implicitly fixes an issue where two variables holding differently typed content are compared. This is achieved by changing the line:
if index == oidend.split(".")[-1]:
to:
if int(index) == int(oidend.split(".")[-1]):
and thus forcing a conversion to integer values.
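The effect of the type mismatch can be reproduced in isolation. The following sketch is only an illustration: the helper names and the sample OID_END value "1.7" are made up for this example and do not come from the check.

```python
def last_oid_component(oidend):
    """Return the last dotted component of an OID_END value as a string."""
    return oidend.split(".")[-1]


def port_index_matches(index, oidend):
    """Compare the port index and the last OID component as integers."""
    return int(index) == int(last_oid_component(oidend))


# Hypothetical sample values: an integer port index and a string OID_END.
index = 7
oidend = "1.7"

# Without conversion an int is compared with a str, which is never equal.
naive_match = (index == last_oid_component(oidend))

# With the forced conversion both sides are integers.
converted_match = port_index_matches(index, oidend)
```

Converting both sides makes the comparison independent of whether the index arrives as a string or as an integer.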
The next patch adds support for the additional encoding schemes in the FC-1 layer that are used for fibre channel speeds beyond 8 GBit. Up to and including a speed of 8 GBit, fibre channel uses an 8b/10b encoding. This means that for every 8 bits of data, 10 bits are actually sent over the fibre channel link. The encoding scheme changes for speeds higher than 8 GBit: 16 GBit fibre channel, like 10 GBit ethernet, uses a 64b/66b encoding scheme, while 32 GBit fibre channel and higher use a 256b/257b encoding scheme. Thus the wirespeed calculations of the stock brocade_fcport service check are incorrect for speeds above 8 GBit. The following patch addresses and fixes this issue:
- brocade_fcport.patch
--- a/checks/brocade_fcport 2018-11-25 18:06:04.674930057 +0100
+++ b/checks/brocade_fcport 2018-11-25 08:43:58.715721271 +0100
@@ -283,15 +267,8 @@
     output.append(speedmsg)
-    if gbit > 16:
-        # convert gbit netto link-rate to Byte/s (256/257 enc)
-        wirespeed = gbit * 1000000000.0 * ( 256 / 257 ) / 8
-    elif gbit > 8:
-        # convert gbit netto link-rate to Byte/s (64/66 enc)
-        wirespeed = gbit * 1000000000.0 * ( 64 / 66 ) / 8
-    else:
     # convert gbit netto link-rate to Byte/s (8/10 enc)
-        wirespeed = gbit * 1000000000.0 * ( 8 / 10 ) / 8
+    wirespeed = gbit * 1000000000.0 * 0.8 / 8
     in_bytes = 4 * get_rate("brocade_fcport.rxwords.%s" % index, this_time, rxwords)
     out_bytes = 4 * get_rate("brocade_fcport.txwords.%s" % index, this_time, txwords)
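The speed-dependent calculation can be tried out on its own with a small sketch like the one below. The function name wirespeed_bytes_per_sec is an assumption for this example. The sketch deliberately writes the encoding ratios with float literals: under Python 2 without true division enabled, an expression such as 256 / 257 with integer operands evaluates to 0, so float operands are the safer choice.

```python
def wirespeed_bytes_per_sec(gbit):
    """Convert a nominal link rate in GBit/s to the usable rate in Byte/s.

    The encoding efficiency depends on the link speed:
    8b/10b up to 8 GBit, 64b/66b for 16 GBit, 256b/257b above 16 GBit.
    """
    if gbit > 16:
        efficiency = 256.0 / 257.0
    elif gbit > 8:
        efficiency = 64.0 / 66.0
    else:
        efficiency = 8.0 / 10.0
    return gbit * 1000000000.0 * efficiency / 8


# Usable rates for common link speeds, in Byte/s.
rates = dict((speed, wirespeed_bytes_per_sec(speed)) for speed in (8, 16, 32))
```

For an 8 GBit link this yields about 800 MByte/s, for a 16 GBit link about 1939 MByte/s and for a 32 GBit link about 3984 MByte/s.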
The third patch simply adds the notxcredits counter and its value to the output of the service check. This information is missing from the status output of the stock check and is added as a convenience:
- brocade_fcport.patch
--- a/checks/brocade_fcport 2018-11-25 18:06:04.674930057 +0100
+++ b/checks/brocade_fcport 2018-11-25 08:43:58.715721271 +0100
@@ -408,9 +361,6 @@
                 summarystate = max(1, summarystate)
                 text += "(!)"
             output.append(text)
-        else:
-            if counter == "notxcredits":
-                output.append(text)
     # P O R T S T A T E
     for dev_state, state_key, state_info, warn_states, state_map in [
The last patch addresses and fixes a more serious issue. The details are explained in the comment of the following code snippet. In short, the metric notxcredits (“No TX buffer credits”), gathered from the SNMP OID swFCPortNoTxCredits, is calculated incorrectly. This is because the brocade_fcport service check treats this metric like all the other error metrics of a switch port (e.g. “CRC errors”, “ENC-Out”, “ENC-In” and “C3 discards”) and puts it in relation to the number of frames transmitted over a link. In reality, though, the metric behind the SNMP OID swFCPortNoTxCredits is defined relative to time. This issue leads to false positives in certain edge cases, for example when a lightly utilized switch port sees an otherwise uncritical number of swFCPortNoTxCredits due to the normal activity of the fibre channel flow control. In such a case, a relatively high number of swFCPortNoTxCredits is put in relation to the sum of a relatively low number of frames transmitted over the link and, again, the relatively high number of swFCPortNoTxCredits. See the last line of the code snippet below for this calculation. The result is a high value for the metric notxcredits (“No TX buffer credits”), although the fibre channel flow control was working perfectly fine, the configured switch.edgeHoldTime was most likely never reached and frames were never dropped.
In order to address this issue, the following patch adds a few lines of code for a special treatment of the metric notxcredits to the brocade_fcport service check. This is done in the last section of the patch. The second section of the patch is just for the purpose of clarification, as it sets the reference metric to None in case of the metric notxcredits. The first section of the patch adjusts the default warning and critical thresholds to lower levels. The previous levels were rather high due to the miscalculation explained above. With the new calculation added by the third section of the patch, the default warning and critical threshold levels need to be much more sensitive.
- brocade_fcport.patch
--- a/checks/brocade_fcport 2018-11-25 18:06:04.674930057 +0100
+++ b/checks/brocade_fcport 2018-11-25 08:43:58.715721271 +0100
@@ -100,11 +100,11 @@
 factory_settings["brocade_fcport_default_levels"] = {
     "rxcrcs": (3.0, 20.0), # allowed percentage of CRC errors
     "rxencoutframes": (3.0, 20.0), # allowed percentage of Enc-OUT Frames
     "rxencinframes": (3.0, 20.0), # allowed percentage of Enc-In Frames
-    "notxcredits": (1.0, 3.0), # allowed percentage of No Tx Credits
+    "notxcredits": (3.0, 20.0), # allowed percentage of No Tx Credits
     "c3discards": (3.0, 20.0), # allowed percentage of C3 discards
     "assumed_speed": 2.0, # used if speed not available in SNMP data
 }
@@ -349,7 +327,7 @@
         ("ENC-Out", "rxencoutframes", rxencoutframes, rxframes_rate),
         ("ENC-In", "rxencinframes", rxencinframes, rxframes_rate),
         ("C3 discards", "c3discards", c3discards, txframes_rate),
-        ("No TX buffer credits", "notxcredits", notxcredits, None),
+        ("No TX buffer credits", "notxcredits", notxcredits, txframes_rate),
     ]:
         per_sec = get_rate("brocade_fcport.%s.%s" % (counter, index), this_time, value)
         perfdata.append((counter, per_sec))
@@ -360,31 +338,6 @@
                 (counter, item), this_time, per_sec, average)
             perfdata.append( ("%s_avg" % counter, per_sec_avg ) )
-        # Calculate error rates
-        if counter == "notxcredits":
-            # Calculate the error rate for "notxcredits" (buffer credit zero). Since this value
-            # is relative to time instead of the number of transmitted frames it needs special
-            # treatment.
-            # Semantics of the buffer credit zero value on Brocade / Broadcom devices:
-            # The switch ASIC checks the buffer credit value of a switch port every 2.5us and
-            # increments the buffer credit zero counter if the buffer credit value is zero.
-            # This means if in a one second interval the buffer credit zero counter increases
-            # by 400000 the link on this switch port is not allowed to send any frames.
-            # By default the edge hold time on a Brocade / Broadcom device is about 200ms:
-            # switch.edgeHoldTime:220
-            # If a C3 frame remains in the switches queue for more than 220ms without being
-            # given any credits to be transmitted, it is subsequently dropped. Thus the buffer
-            # credit zero counter would optimally be correlated to the C3 discards counter.
-            # Unfortunately the Brocade / Broadcom devices have no egress buffering and do
-            # ingress buffering instead. Thus the C3 discards counters are increased on the
-            # ingress port, while the buffer credit zero counters are increased on the egress
-            # port. The trade-off is to correlate the buffer credit zero counters relative to
-            # the measured time interval.
-            if per_sec > 0:
-                rate = per_sec / 400000.00
-            else:
-                rate = 0
-        else:
         # compute error rate (errors in relation to number of frames) (from 0.0 to 1.0)
         if ref > 0 or per_sec > 0:
             rate = per_sec / (ref + per_sec)
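To make the difference between the two calculations more tangible, here is a small sketch that puts both side by side. The function names and the sample numbers (100 transmitted frames and 4000 buffer credit zero events per second) are invented for illustration. Only the divisor of 400000, derived from one check every 2.5 microseconds, comes from the patch.

```python
# One buffer credit check every 2.5 microseconds gives 400000 checks per second.
BB_CREDIT_ZERO_CHECKS_PER_SEC = 400000.0


def frame_relative_rate(per_sec, ref):
    """Stock calculation: events in relation to the number of frames (0.0 to 1.0)."""
    if ref > 0 or per_sec > 0:
        return per_sec / (ref + per_sec)
    return 0.0


def time_relative_rate(per_sec):
    """Patched calculation for notxcredits: events in relation to elapsed time."""
    if per_sec > 0:
        return per_sec / BB_CREDIT_ZERO_CHECKS_PER_SEC
    return 0.0


# Hypothetical, lightly utilized port.
txframes_per_sec = 100.0
notxcredits_per_sec = 4000.0

old_rate = frame_relative_rate(notxcredits_per_sec, txframes_per_sec)
new_rate = time_relative_rate(notxcredits_per_sec)
```

With these sample numbers the frame-relative calculation reports a rate of roughly 97.6%, while the time-relative calculation reports 1%, meaning the port was without transmit credits for about 1% of the measured interval.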
Conclusion
Adding the three new checks for Brocade / Broadcom fibre channel switches to your Check_MK server enables you to monitor additional CPU and memory aspects as well as the SFPs of your Brocade / Broadcom fibre channel devices. More recent versions of Check_MK should already include the same functionality through the now included standard checks brocade_sys and brocade_sfp.
New Brocade / Broadcom fibre channel devices should pick up the additional service checks immediately. Existing Brocade / Broadcom fibre channel devices might need a Check_MK inventory to be run explicitly on them in order to pick up the additional service checks.
The enhanced version of the brocade_fcport service check addresses and fixes several issues present in the standard Check_MK service check. It should provide you with a more complete and future-proof monitoring of your fibre channel network infrastructure.
I hope you find the provided new and enhanced checks useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.
2018-02-10 // Brocade Fabric OS Authentication Failure with SSH Public Key
With the update to Fabric OS v7.4.1d on Brocade fibre channel SAN switches, the CLI login via SSH public key authentication will sometimes be broken for administrative users. This blog post describes a manual workaround which can be used in order to temporarily correct this issue without the immediate need for another Fabric OS update.
During the preparation phase for migrating from our aging Brocade 5100 and 5300 Gen4 fibre channel SAN switches to the shiny new Brocade G620 Gen6 fibre channel SAN Switches, we needed to update the Fabric OS on the old switches to a v7.4.x version. Due to compatibility and support constraints with the IBM SAN Volume Controller (SVC), we decided to go with the Fabric OS v7.4.1d version.
After the successful Fabric OS update, the CLI login via SSH public key authentication was broken for some, but not all, users with admin level privileges on some, but not all, switches. Re-uploading the SSH public key for those users with the sshUtil importpubkey command didn't solve the issue. Debugging this further with strace attached to the SSH daemon process on an affected switch revealed why the SSH public key authentication was failing:
[...]
[pid 27941] connect(8, {sa_family=AF_FILE, path="/dev/log"}, 16) = -1 EPROTOTYPE (Protocol wrong type for socket)
[pid 27941] close(8) = 0
[pid 27941] socket(PF_FILE, SOCK_STREAM, 0) = 8
[pid 27941] fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
[pid 27941] connect(8, {sa_family=AF_FILE, path="/dev/log"}, 16) = 0
[pid 27941] send(8, "<39>Feb 1 22:07:00 sshd[27941]: debug1: trying public key file /fabos/users/admin/.ssh/authorized_keys.<USERNAME>\0", 117, MSG_NOSIGNAL) = 117
[pid 27941] close(8) = 0
[pid 27941] open("/fabos/users/admin/.ssh/authorized_keys.<USERNAME_2>", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = -1 EACCES (Permission denied)
[...]
This was done using the root account on the Brocade switch. The same account was also used for the following research and the temporary workaround derived from it. Beware that using the root account on Brocade switches might have serious implications for the warranty or support of the devices. Be extra careful with what you are doing as root on the Brocade switch, since it might easily affect the operational status of the device.
The failing open() call at the end of the above strace output shows that the cause of the issue with SSH public key authentication was in the permissions of the user's authorized_keys file. Looking at the permissions of the directory containing the file and the file itself showed:
switch:FID128:root> cd /fabos/users/admin/
switch:FID128:root> ls -al
total 28
drwxr-xr-x 4 root admin 4096 Jan 23 12:36 ./
drwxr-xr-x 12 root sys 4096 Jul 15 2016 ../
-rw-r--r-- 1 root admin 507 Jul 15 2016 .bash_logout
-rw-r--r-- 1 root admin 27 Jul 15 2016 .inputrc
-rw-r--r-- 1 root admin 1275 Jul 15 2016 .profile
drwxr-xr-x 2 root admin 4096 Feb 1 22:54 .ssh/
drwxrwxrwx 3 root sys 4096 Aug 11 2011 .terminfo/
switch:FID128:root> cd .ssh
switch:FID128:root> pwd
/fabos/users/admin/.ssh
switch:FID128:root> ls -al
total 44
drwxr-xr-x 2 root admin 4096 Feb 1 22:54 ./
drwxr-xr-x 4 root admin 4096 Jan 23 12:36 ../
-rw-r--r-- 1 root admin 10240 Feb 1 22:54 authorizedKeys.tar
-rw------- 1 root root 408 Jan 23 12:43 authorized_keys
-rw-r--r-- 1 root admin 755 Dec 8 11:13 authorized_keys.<USERNAME_1>
-rw------- 1 root admin 1230 Feb 1 22:54 authorized_keys.<USERNAME_2>
-rw------- 1 root root 408 Mar 22 2016 authorized_keys.<USERNAME_3>
-rw-r--r-- 1 root root 605 Sep 19 11:06 authorized_keys.<USERNAME_4>
-rw-r--r-- 1 root admin 134 Jul 15 2016 environment
switch:FID128:root> tar tvf authorizedKeys.tar
-rw-r--r-- root/admin 755 2017-12-08 11:13:06 authorized_keys.<USERNAME_1>
-rw------- root/admin 1230 2018-02-01 22:54:01 authorized_keys.<USERNAME_2>
-rw------- root/root 408 2016-03-22 10:12:37 authorized_keys.<USERNAME_3>
-rw-r--r-- root/root 606 2017-09-19 11:06:10 authorized_keys.<USERNAME_4>
switch:FID128:root> tar tvf /mnt/fabos/users/admin/.ssh/authorizedKeys.tar
-rw-r--r-- root/admin 755 2017-12-08 11:13:06 authorized_keys.<USERNAME_1>
-rw------- root/admin 1230 2018-02-01 22:54:01 authorized_keys.<USERNAME_2>
-rw------- root/root 408 2016-03-22 10:12:37 authorized_keys.<USERNAME_3>
-rw-r--r-- root/root 606 2017-09-19 11:06:10 authorized_keys.<USERNAME_4>
The permissions on some, but not all, of the authorized_keys.<USERNAME_*> files were too restrictive, since the SSH daemon was trying to read them as an effective user of the admin group. An immediate fix for this issue was to alter the permissions on the authorized_keys.<USERNAME_*> files in order to allow the admin group to read the content of the files:
switch:FID128:root> cd /fabos/users/admin/.ssh/
switch:FID128:root> chmod 640 authorized_keys.*
switch:FID128:root> chown root:admin authorized_keys.*
switch:FID128:root> tar cpf authorizedKeys.tar authorized_keys.*
switch:FID128:root> cd /mnt/fabos/users/admin/.ssh/
switch:FID128:root> chmod 640 authorized_keys.*
switch:FID128:root> chown root:admin authorized_keys.*
switch:FID128:root> tar cpf authorizedKeys.tar authorized_keys.*
Looking again at the permissions of the directory containing the authorized_keys.<USERNAME_*> files and the files themselves now showed:
switch:FID128:root> cd /fabos/users/admin/
switch:FID128:root> ls -la
total 44
drwxr-xr-x 2 root admin 4096 Feb 1 22:54 ./
drwxr-xr-x 4 root admin 4096 Jan 23 12:36 ../
-rw-r--r-- 1 root admin 10240 Feb 9 06:59 authorizedKeys.tar
-rw------- 1 root root 408 Jan 23 12:43 authorized_keys
-rw-r----- 1 root admin 755 Dec 8 11:13 authorized_keys.<USERNAME_1>
-rw-r----- 1 root admin 1230 Feb 1 22:54 authorized_keys.<USERNAME_2>
-rw-r----- 1 root admin 408 Mar 22 2016 authorized_keys.<USERNAME_3>
-rw-r----- 1 root admin 605 Sep 19 11:06 authorized_keys.<USERNAME_4>
-rw-r--r-- 1 root admin 134 Jul 15 2016 environment
switch:FID128:root> tar tvf /fabos/users/admin/.ssh/authorizedKeys.tar
-rw-r----- root/admin 755 2017-12-08 11:13:06 authorized_keys.<USERNAME_1>
-rw-r----- root/admin 1230 2018-02-01 22:54:01 authorized_keys.<USERNAME_2>
-rw-r----- root/admin 408 2016-03-22 10:12:37 authorized_keys.<USERNAME_3>
-rw-r----- root/admin 606 2017-09-19 11:06:10 authorized_keys.<USERNAME_4>
switch:FID128:root> cd /mnt/fabos/users/admin/.ssh/
switch:FID128:root> ls -la
total 44
drwxr-xr-x 2 root admin 4096 Feb 1 22:54 ./
drwxr-xr-x 4 root admin 4096 Jan 23 12:50 ../
-rw-r--r-- 1 root admin 10240 Feb 1 22:54 authorizedKeys.tar
-rw------- 1 root root 408 Jan 23 12:43 authorized_keys
-rw-r----- 1 root admin 755 Dec 8 11:13 authorized_keys.<USERNAME_1>
-rw-r----- 1 root admin 1230 Feb 1 22:54 authorized_keys.<USERNAME_2>
-rw-r----- 1 root admin 408 Mar 22 2016 authorized_keys.<USERNAME_3>
-rw-r----- 1 root admin 605 Sep 19 11:06 authorized_keys.<USERNAME_4>
-rw-r--r-- 1 root admin 134 Jan 23 12:50 environment
switch:FID128:root> tar tvf /mnt/fabos/users/admin/.ssh/authorizedKeys.tar
-rw-r----- root/admin 755 2017-12-08 11:13:06 authorized_keys.<USERNAME_1>
-rw-r----- root/admin 1230 2018-02-01 22:54:01 authorized_keys.<USERNAME_2>
-rw-r----- root/admin 408 2016-03-22 10:12:37 authorized_keys.<USERNAME_3>
-rw-r----- root/admin 606 2017-09-19 11:06:10 authorized_keys.<USERNAME_4>
With the corrected permissions on the authorized_keys.<USERNAME_*> files, CLI login via SSH public key authentication was possible again.
Unfortunately, this is only a temporary workaround, since the next upload of an SSH public key with the sshUtil importpubkey command will likely set the wrong permissions on the newly created or replaced authorized_keys.<USERNAME_*> file. This is because the root cause of the issue actually lies in the sshUtil importpubkey command. The snippet of strace output shown below was captured from a running sshUtil importpubkey command:
[...]
chdir("/fabos/users/admin/.ssh") = 0
[...]
[pid 10611] execve("/bin/cat", ["cat", "<USERNAME_2>_brocade_dsa.pub"], [/* 45 vars */]) = 0
[...]
[pid 10612] execve("/bin/chmod", ["/bin/chmod", "600", "authorized_keys.<USERNAME_2>"], [/* 45 vars */]) = 0
[...]
[pid 10612] lstat64("authorized_keys.<USERNAME_2>", {st_mode=S_IFREG|0640, st_size=1230, ...}) = 0
[pid 10612] chmod("authorized_keys.<USERNAME_2>", 0600) = 0
[...]
[pid 10613] execve("/bin/cp", ["cp", "-f", "authorized_keys.<USERNAME_2>", "/mnt/fabos/users/admin/.ssh/"], [/* 45 vars */]) = 0
[...]
[pid 10613] chmod("/mnt/fabos/users/admin/.ssh/authorized_keys.<USERNAME_2>", 0100600) = 0
[...]
[pid 10618] execve("/bin/tar", ["tar", "-cf", "authorizedKeys.tar", "authorized_keys.<USERNAME_1>", "authorized_keys.<USERNAME_2>", "authorized_keys.<USERNAME_3>"], [/* 45 vars */]) = 0
[...]
The /bin/chmod command in the above strace output shows that the file permission for the authorized_keys.<USERNAME_2> file is mistakenly set to 600 (-rw-------) instead of at least 640 (-rw-r-----). Exactly why this sometimes happens can't be analyzed further, since the source code of the sshUtil command is not available.
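The difference between the two modes comes down to the group read bit. As a small side illustration (not something that was run on the switch), the following Python sketch uses the standard stat module to show which of the two modes lets a member of the owning group read the file. The helper name group_can_read is made up for this example.

```python
import stat


def group_can_read(mode):
    """Return True if the group read bit is set in the given file mode."""
    return bool(mode & stat.S_IRGRP)


# Mode set by sshUtil importpubkey versus the mode used in the workaround.
mode_broken = 0o600
mode_fixed = 0o640

# Symbolic representation as printed by ls -l for a regular file.
symbolic_broken = stat.filemode(stat.S_IFREG | mode_broken)
symbolic_fixed = stat.filemode(stat.S_IFREG | mode_fixed)

readable_broken = group_can_read(mode_broken)
readable_fixed = group_can_read(mode_fixed)
```

With mode 600 the group read bit is cleared, so a process that only matches the file's group, as the SSH daemon does here via the admin group, gets the EACCES error seen in the first strace output.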
A permanent resolution to this issue will be to update to at least Fabric OS v7.4.1e. The Release Notes for Fabric OS v7.4.1e indicate this in the following known defect:
Defect ID: DEFECT000616486
Technical Severity: Medium
Probability: Medium
Product: Brocade Fabric OS
Technology Group: Security
Reported In Release: FOS7.4.1
Technology: SSH - Secure Shell
Symptom: Unable to authenticate an SSH session after importing public key to switch.
Condition: This is encountered by admin level users on a switch running Fabric OS v7.4.1d