bityard Blog

// Debugging Segfaults in Open-iSCSIs iscsiuio on Intel Broadwell

Open-iSCSIs tool iscsiuio, which is used to configure and manage Broadcom BCM577xx and BCM578xx iSCSI offload engines (iSOE), currently crashes with a segmentation fault upon target login on Intel Broadwell based systems. Comparable system setups, based on the older Intel Haswell chips do not show this issue.

In the past i've been using QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs which provide a full iSCSI offload engine (iSOE) implementation in the adapters firmware. Unfortunately the QLogic 8200 Series network adapters are no longer available for Dell M-Series blade servers. The alternatives offered by Dell are the Intel X520 and Intel X540 series adapters, or the Broadcom BCM57810S series adapters. Instead of using the Intel X520/X540, which provide no iSOE at all, i decided to go with the Broadcom BCM57810S, which at least provide some kind of iSOE. According to the VMware terminology the Broadcom BCM57810S are dependent hardware iSCSI Initiators. Dependent in this context means, that the iSOE does not implement all the necessary features and thus cannot perform all the tasks (e.g. TCP/IP stack, configuration and session management, authentication, etc.) necessary for target handling by itself. Rather, some of these tasks are provided by a third party on which this kind of iSOE depends on. In case of the Broadcom BCM57810S this third party is the iscsiuio daemon, which has for some time been part of the Open-iSCSI project. Simply put, the iscsiuio daemon acts as an intermediary between the iscsid on the one side and the QLogic1) NetXtreme II driver (kernel module bnx2 or bnx2x) and the QLogic2) CNIC driver (kernel module cnic) on the other side, facilitating the creation and overall management of offloaded iSCSI sessions. Very simplified, the flow of information is as follows:

iscsiadm ←→ iscsid ←→ iscsiuio ←→ bnx2/bnx2x ←→ cnic ←→ Broadcom BCM57810S adapter ←→ Network ←→ Target

In my environment the Broadcom BCM57810S adapters are installed and used on six hosts (host1 and host{5,6,7,8,9}). They all connect to the same Dell EqualLogic storage systems in the backend, using the same network dedicated to iSCSI traffic. All hosts are Dell M630 blade servers with exactly the same firmware, operating system (Debian 8) and software versions. I'm using a backported version of Open-iSCSI, which is based on Git commit 0fa43f29, but excluding the commits 76832662 and c6d1117b which just implement the externalization of the Open-iSNS code. The systems originate from the same install image, so their configuration is – to a large extent – exactly the same. The only difference between the hosts is that host1 has Intel E5 v3 (aka Haswell) CPUs, while host{5,6,7,8,9} have Intel E5 v4 (aka Broadwell) CPUs.

On host1 everything works fine, iscsiuio runs as expected and access to targets via the Broadcom BCM57810S iSOEs is working flawlessly. On host{5,6,7,8,9} on the other hand, i was getting segmentation faults like the one in the example shown below, while trying to log in to any target.

host5:~# gdb /sbin/iscsiuio
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /sbin/iscsiuio...(no debugging symbols found)...done.


(gdb) # run -d 4 -f
Starting program: /sbin/iscsiuio -d 4 -f
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
INFO  [Wed Jul 27 10:01:45 2016]Initialize logger using log file: /var/log/iscsiuio.log
INFO  [Wed Jul 27 10:01:45 2016]Started iSCSI uio stack: Ver 0.7.8.2
INFO  [Wed Jul 27 10:01:45 2016]Build date: Fri Jul 22 15:40:04 CEST 2016
INFO  [Wed Jul 27 10:01:45 2016]Debug mode enabled
INFO  [Wed Jul 27 10:01:45 2016]Running on sysname: 'Linux', release: '3.16.0-4-amd64', version '#1 SMP Debian 3.16.7-ckt
25-2+deb8u3 (2016-07-02)' machine: 'x86_64'
DBG   [Wed Jul 27 10:01:45 2016]Loaded nic library 'bnx2' Version: '0.7.8.2' build on Fri Jul 22 15:40:04 CEST 2016'
DBG   [Wed Jul 27 10:01:45 2016]Added 'bnx2' nic library
DBG   [Wed Jul 27 10:01:45 2016]Loaded nic library 'bnx2x' Version: '0.7.8.2' build on Fri Jul 22 15:40:04 CEST 2016'
DBG   [Wed Jul 27 10:01:45 2016]Added 'bnx2x' nic library
[New Thread 0x7ffff760f700 (LWP 4942)]
INFO  [Wed Jul 27 10:01:45 2016]signal handling thread ready
INFO  [Wed Jul 27 10:01:45 2016]nic_utils Found host[11]: host11
INFO  [Wed Jul 27 10:01:45 2016]Done capturing /sys/class/iscsi_host/host11/netdev
INFO  [Wed Jul 27 10:01:45 2016]Done capturing /sys/class/iscsi_host/host11/netdev
INFO  [Wed Jul 27 10:01:45 2016]nic_utils looking for uio device for eth3
WARN  [Wed Jul 27 10:01:45 2016]Could not find assoicate uio device with eth3
ERR   [Wed Jul 27 10:01:45 2016]nic_utils Could not determine UIO name for eth3
INFO  [Wed Jul 27 10:01:45 2016]nic_utils Found host[12]: host12
INFO  [Wed Jul 27 10:01:45 2016]Done capturing /sys/class/iscsi_host/host12/netdev
INFO  [Wed Jul 27 10:01:45 2016]Done capturing /sys/class/iscsi_host/host12/netdev
INFO  [Wed Jul 27 10:01:45 2016]nic_utils looking for uio device for eth2
INFO  [Wed Jul 27 10:01:45 2016]nic_utils eth2 associated with uio0
INFO  [Wed Jul 27 10:01:45 2016]nic_utils NIC not found creating an instance for host_no: 12 eth2
DBG   [Wed Jul 27 10:01:45 2016]Could not increase process priority: Success
[New Thread 0x7ffff6e0e700 (LWP 4943)]
DBG   [Wed Jul 27 10:01:45 2016]iscsi_ipc Started iscsid listening thread
DBG   [Wed Jul 27 10:01:45 2016]iscsi_ipc Waiting for iscsid command
INFO  [Wed Jul 27 10:01:45 2016]NIC_NL Netlink to CNIC on pid 4938 is ready
DBG   [Wed Jul 27 10:01:57 2016]iscsi_ipc recv iscsid request: cmd: 1, payload_len: 11720
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc Received request for 'eth2' to set IP address: '10.0.1.62' VLAN: '0'
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc Using netmask: 0.0.0.0
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc  eth2, using existing NIC
INFO  [Wed Jul 27 10:01:57 2016]nic_utils looking for uio device for eth2
INFO  [Wed Jul 27 10:01:57 2016]nic_utils eth2 associated with uio0
INFO  [Wed Jul 27 10:01:57 2016]Done capturing /sys/class/uio/uio0/name
INFO  [Wed Jul 27 10:01:57 2016]nic_utils eth2: Verified uio name bnx2x_cnic with library bnx2x
INFO  [Wed Jul 27 10:01:57 2016]eth2: found NIC with library 'bnx2x'
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2 library set using transport_name bnx2i
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: requesting configuration using static IP address
DBG   [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2 couldn't find interface with ip_type: 0x2 creating it
INFO  [Wed Jul 27 10:01:57 2016]nic eth2: Added nic interface for VLAN: 0, protocol: 2
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: created network interface
[New Thread 0x7ffff660d700 (LWP 4947)]
WARN  [Wed Jul 27 10:01:57 2016]nic_utils eth2: device already disabled: flag: 0x1088 state: 0x1
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: configuring using static IP IPv4 address :10.0.1.62
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc  netmask: 255.255.255.0
[New Thread 0x7ffff5e0c700 (LWP 4948)]
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc ISCSID_UIP_IPC_GET_IFACE: command: 1 name: bnx2i.d0:43:1e:51:98:53, netdev: eth2 ipaddr: 10.0.1.62 vlan: 0 transport_name:bnx2i
INFO  [Wed Jul 27 10:01:57 2016]nic_utils eth2: spinning up thread for nic
DBG   [Wed Jul 27 10:01:57 2016]iscsi_ipc Waiting for iscsid command
[New Thread 0x7ffff560b700 (LWP 4949)]
DBG   [Wed Jul 27 10:01:57 2016]nic eth2: Waiting to be enabled
INFO  [Wed Jul 27 10:01:57 2016]Created nic thread: eth2
INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: started NIC enable thread state: 0x1
DBG   [Wed Jul 27 10:01:57 2016]nic eth2: is now enabled
INFO  [Wed Jul 27 10:01:57 2016]bnx2x eth2: bnx2x driver using version 1.78.19
ERR   [Wed Jul 27 10:01:58 2016]bnx2x /dev/uio0: uio device has been brought up via pid: 4938 on fd: 7
INFO  [Wed Jul 27 10:01:58 2016]Done capturing /sys/class/uio/uio0/name
INFO  [Wed Jul 27 10:01:58 2016]bnx2x eth2: Verified is a cnic_uio device
DBG   [Wed Jul 27 10:01:58 2016]bnx2x eth2: using rx ring size: 15, rx buffer size: 1024
INFO  [Wed Jul 27 10:01:58 2016]Done capturing /sys/class/uio/uio0/event
DBG   [Wed Jul 27 10:01:58 2016]bnx2x Chip ID: 168e1000
INFO  [Wed Jul 27 10:01:58 2016]nic_id eth2: is found at 03:00.00
INFO  [Wed Jul 27 10:01:58 2016]bnx2x eth2: func 0x0, pfid 0x0, client_id 0x88, cid 0x1
DBG   [Wed Jul 27 10:01:58 2016]bnx2x eth2: mode = 0x100
INFO  [Wed Jul 27 10:01:58 2016]bnx2x eth2:  Using mac address: d0:43:1e:51:98:53
INFO  [Wed Jul 27 10:01:58 2016]eth2: bnx2x initialized
INFO  [Wed Jul 27 10:01:58 2016]nic eth2: Initialized ip stack: VLAN: 0
INFO  [Wed Jul 27 10:01:58 2016]nic eth2: mac: d0:43:1e:51:98:53
INFO  [Wed Jul 27 10:01:58 2016]nic eth2: Using IP address: 10.0.1.62
INFO  [Wed Jul 27 10:01:58 2016]nic eth2: Using netmask: 255.255.255.0
INFO  [Wed Jul 27 10:01:58 2016]nic eth2: enabled vlan 0 protocol: 2
INFO  [Wed Jul 27 10:01:58 2016]nic eth2: entering main nic loop
DBG   [Wed Jul 27 10:01:58 2016]nic_utils eth2: device enabled
[Thread 0x7ffff5e0c700 (LWP 4948) exited]
DBG   [Wed Jul 27 10:01:59 2016]iscsi_ipc recv iscsid request: cmd: 1, payload_len: 11720
INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc Received request for 'eth2' to set IP address: '10.0.1.62' VLAN: '0'
INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc Using netmask: 0.0.0.0
INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc  eth2, using existing NIC
INFO  [Wed Jul 27 10:01:59 2016]nic_utils looking for uio device for eth2
INFO  [Wed Jul 27 10:01:59 2016]nic_utils eth2 associated with uio0
INFO  [Wed Jul 27 10:01:59 2016]eth2: Have NIC library 'bnx2x'
INFO  [Wed Jul 27 10:01:59 2016]Done capturing /sys/class/uio/uio0/name
INFO  [Wed Jul 27 10:01:59 2016]nic_utils eth2: Verified uio name bnx2x_cnic with library bnx2x
INFO  [Wed Jul 27 10:01:59 2016]eth2: found NIC with library 'bnx2x'
INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2 library set using transport_name bnx2i
INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: requesting configuration using static IP address
INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: using existing network interface
INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: IP configuration didn't change using 0x2
INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: NIC already enabled flags: 0x1084 state: 0x4

INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc ISCSID_UIP_IPC_GET_IFACE: command: 1 name: bnx2i.d0:43:1e:51:98:53, netdev: eth2 ipaddr: 10.0.1.62 vlan: 0 transport_name:bnx2i
DBG   [Wed Jul 27 10:01:59 2016]iscsi_ipc Waiting for iscsid command
INFO  [Wed Jul 27 10:02:00 2016]NIC_NL Received path_req for host 12
INFO  [Wed Jul 27 10:02:00 2016]Done capturing /sys/class/iscsi_host/host12/netdev
DBG   [Wed Jul 27 10:02:00 2016]NIC_NL Pulled nl event
INFO  [Wed Jul 27 10:02:00 2016]NIC_NL eth2: Processing 'path_req'
DBG   [Wed Jul 27 10:02:00 2016]NIC_NL eth2: PATH_REQ with iface_num -1 VLAN 32768
DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: Netlink message with VLAN ID: 0, path MTU: 9000 minor: 0 ip_addr_len: 4
DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: src=10.0.1.62
DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: dst=10.0.1.2
DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: nm=255.255.255.0
INFO  [Wed Jul 27 10:02:00 2016]CNIC eth2: Didn't find IPv4: '10.0.1.2' in ARP table
DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: Sent cnic arp request for IP: 10.0.1.2
INFO  [Wed Jul 27 10:02:00 2016]Found 10.0.1.2 at b0:83:fe:cc:57:bb
DBG   [Wed Jul 27 10:02:00 2016]CNIC neighbor reply sent back to kernel 10.0.1.62 at b0:83:fe:cc:57:bb with vlan 0
INFO  [Wed Jul 27 10:02:00 2016]NIC_NL eth2: 'path_req' operation finished

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff660d700 (LWP 4947)]
__lll_unlock_elision (lock=0x55555577fd40, private=0) at ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
29      ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c: No such file or directory.


(gdb) # info threads
  Id   Target Id         Frame
  6    Thread 0x7ffff560b700 (LWP 4949) "iscsiuio" 0x00007ffff76f1ae3 in select () at ../sysdeps/unix/syscall-template.S:81
* 4    Thread 0x7ffff660d700 (LWP 4947) "iscsiuio" __lll_unlock_elision (lock=0x55555577fd40, private=0) at ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
  3    Thread 0x7ffff6e0e700 (LWP 4943) "iscsiuio" 0x00007ffff79c9ccd in accept () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x7ffff760f700 (LWP 4942) "iscsiuio" do_sigwait (set=<optimized out>, sig=0x7ffff760eeac) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:63
  1    Thread 0x7ffff7fea700 (LWP 4938) "iscsiuio" 0x00007ffff79c9e9d in recvmsg () at ../sysdeps/unix/syscall-template.S:81


(gdb) # thread apply all bt

Thread 6 (Thread 0x7ffff560b700 (LWP 4949)):
#0  0x00007ffff76f1ae3 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x000055555555ac06 in ?? ()
#2  0x000055555555d39e in nic_loop ()
#3  0x00007ffff79c30a4 in start_thread (arg=0x7ffff560b700) at pthread_create.c:309
#4  0x00007ffff76f887d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7ffff660d700 (LWP 4947)):
#0  __lll_unlock_elision (lock=0x55555577fd40, private=0) at ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
#1  0x00007ffff79c7007 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:94
#2  0x000055555555e803 in nl_process_handle_thread ()
#3  0x00007ffff79c30a4 in start_thread (arg=0x7ffff660d700) at pthread_create.c:309
#4  0x00007ffff76f887d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7ffff6e0e700 (LWP 4943)):
#0  0x00007ffff79c9ccd in accept () at ../sysdeps/unix/syscall-template.S:81
#1  0x00005555555641b0 in ?? ()
#2  0x00007ffff79c30a4 in start_thread (arg=0x7ffff6e0e700) at pthread_create.c:309
#3  0x00007ffff76f887d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7ffff760f700 (LWP 4942)):
#0  do_sigwait (set=<optimized out>, sig=0x7ffff760eeac) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:63
#1  0x00007ffff79ca693 in __sigwait (set=0x7ffff760eeb0, sig=0x0) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:97
#2  0x000055555555a49c in _start ()

Thread 1 (Thread 0x7ffff7fea700 (LWP 4938)):
#0  0x00007ffff79c9e9d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1  0x000055555555e5e9 in ?? ()
#2  0x000055555555eea8 in nic_nl_open ()
#3  0x000055555555a1b8 in main ()

The Open-iSCSI command which lead to this segmentation fault was a simple login at a previously defined target node:

host5:~# iscsiadm -m node -T <target iqn> -I <interface> --login

Double-checking the configuration, the firmware and software versions as well as the general hardware setup didn't yield any usable indication as to where the root cause of this issue might be. Searching the web for __lll_unlock_elision in conjunction with the pthread_* function calls, led me to the following resources:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574
https://lwn.net/Articles/534758/

Those are pointing towards a CPU (Broadwell and Skylake) specific problem when not carefully using mutexes. The general opinion from there and also other related bug reports seems to be, that the source of such issues is almost always an improper use of mutex locking, which – up to now – has either been tolerated or just by chance not lead to a failure. More recent CPUs and software implementations (e.g. the GNU libc) appear to be less forgiving in this regard. Thus the advice is to change the application behaviour towards a proper use of mutex locking, in order to address such an issue.

The article Intel's Broadwell Xeon E5-2600 v4 chips: So what's in it for you, smartie-pants coders offers a rather nice write-up of the new features introduced in the Intel Broadwell CPUs.

Tracking this issue further down in the Open-iSCSI sources, i ended up in the function nl_process_handle_thread() in iscsiuio/src/unix/nic_nl.c and specifically in the following code section:

iscsiuio/src/unix/nic_nl.c
474 /* NIC specific nl processing thread */
475 void *nl_process_handle_thread(void *arg)
476 {
[...]
483         while (!event_loop_stop) {
484                 char *data = NULL;
485
486                 rc = pthread_cond_wait(&nic->nl_process_cond,
487                                        &nic->nl_process_mutex);
488                 if (rc != 0) {
489                         LOG_ERR("Fatal error in NL processing thread "
490                                 "during wait[%s]", strerror(rc));
491                         break;
492                 }
[...]
499                 pthread_mutex_unlock(&nic->nl_process_mutex);
[...]

Debugging this revealed that the call to pthread_cond_wait() from the above GDB backtrace output of thread number 4 is the one from line 486 in the above code snippet.

Looking at the pthread_cond_wait() manpage showed the following constraint for its proper use:

[…]
The pthread_cond_timedwait() and pthread_cond_wait() functions shall
block on a condition variable. They shall be called with mutex locked
by the calling thread or undefined behavior results.
[…]

Although not shown in the above GDB output, this would on occasion – and again, probably just by chance – work on the first pass of the loop. At the end of the loop, at line 499 in the above code snippet, the mutex is then unlocked. Thus the cited constraint from the pthread_cond_wait() manpage is no longer met on the subsequent passes of the loop. On Intel E5 v3 (aka Haswell) CPUs, this seemed to be tolerated and without any impact. But on Intel E5 v4 (aka Broadwell) – and probably other CPUs implementing HLE and RTM – this causes the observed segmentation fault.

In order to verify my analysis and test this theory, i added a call to pthread_mutex_lock() right before the call to pthread_cond_wait() in line 486. The resulting change is available in the Git commit 9f770f9e of my Open-iSCSI fork on Github and also shown in the following patch:

nic_nl.c.patch
diff --git a/iscsiuio/src/unix/nic_nl.c b/iscsiuio/src/unix/nic_nl.c
index 391003f..581ddb0 100644
--- a/iscsiuio/src/unix/nic_nl.c
+++ b/iscsiuio/src/unix/nic_nl.c
@@ -483,6 +483,7 @@ void *nl_process_handle_thread(void *arg)
        while (!event_loop_stop) {
                char *data = NULL;
 
+               pthread_mutex_lock(&nic->nl_process_mutex);
                rc = pthread_cond_wait(&nic->nl_process_cond,
                                       &nic->nl_process_mutex);
                if (rc != 0) {

Posting this on the Open-iSCSI mailing list lead to this discussion. The suggestion from there was an additional change to the error handling code of nl_process_handle_thread() starting at line 488 in the above above code snippet. This adds proper handling of the locked mutex in case the loop is left due to an error returned from the call to pthread_cond_wait(). The resulting additional change is available in the Git commit 4191ca6b of my Open-iSCSI fork on Github. The following patch shows the summarised code changes:

nic_nl.c.patch
diff --git a/iscsiuio/src/unix/nic_nl.c b/iscsiuio/src/unix/nic_nl.c
index 391003f..1a920c7 100644
--- a/iscsiuio/src/unix/nic_nl.c
+++ b/iscsiuio/src/unix/nic_nl.c
@@ -483,9 +483,11 @@ void *nl_process_handle_thread(void *arg)
        while (!event_loop_stop) {
                char *data = NULL;
 
+               pthread_mutex_lock(&nic->nl_process_mutex);
                rc = pthread_cond_wait(&nic->nl_process_cond,
                                       &nic->nl_process_mutex);
                if (rc != 0) {
+                       pthread_mutex_unlock(&nic->nl_process_mutex);
                        LOG_ERR("Fatal error in NL processing thread "
                                "during wait[%s]", strerror(rc));
                        break;

With those two small changes to the sources of Open-iSCSIs iscsiuio, the iSCSI connections via the Broadcom BCM57810S iSOE do now work flawlessly even on newer Intel E5 v4 (aka Broadwell) based systems. Hopefully the original authors of the iscsiuio code at Broadcom/QLogic will also take part in the discussion and provide their feedback on the proposed code changes too.

1) , 2)
formerly Broadcom

// QLogic iSCSI HBA and Limitations in Bi-Directional Authentication

In the past the QLogic QConvergeConsole (qaucli) was used as an administration tool for the hardware initiator part of the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs. Unfortunately this tool was only supported on the so-called “enterprise Linux distributions” like RHEL and SLES. If you were running any other Linux distribution like e.g. Debian or even one of the BSD distributions you were out of luck.

Thankfully QLogic addressed this support issue indirectly, by first announcing and since then by actually moving from a IOCTL based management method towards the Open-iSCSI based management method via the iscsiadm command. The announcement QLogic iSCSI Solution for Transitioning to the Open-iSCSI Model and the User's Guide IOCTL to Open-iSCSI Interface can be found at the QLogic web site.

While trying to test and use the new management method for the hardware initiator via the Open-iSCSI iscsiadm command, i soon ran into the issue that the packaged version of Open-iSCSI, which is shipped with Debian Wheezy, is based on the last stable release v2.0.873 from Open-iSCSI and is thus hopelessly out of date. The Open-iSCSI package shipped with Debian Jessie is a bit better, since it's already based on a newer version from the projects GitHub repository. Still, the Git commit used there dates back to August 23rd of 2013, which is also fairly old. After updating my system to Debian Jessie, i soon decided to rebuild the Open-iSCSI package from a much more recent version from the projects GitHub repository. With this, the management of the QLogic hardware initiators worked very well via the Open-iSCSI iscsiadm command and its now enhanced host mode.

In the host mode there are now three sub-modes chap, flashnode, stats. See man iscsiadm and /usr/share/doc/open-iscsi/README.gz for more details on how to use them. By first calling the host mode without any sub-mode, iscsiadm prints a list of available iSCSI HBAs along with the host number – shown in the first pair of square brackets – associated with each host by the OS kernel:

root@host:~$ iscsiadm -m host
qla4xxx: [1] 10.0.0.5,[84:8f:69:35:fc:70],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2c.4
qla4xxx: [2] 10.0.0.6,[84:8f:69:35:fc:71],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2d.5

The host number – in the above example 1 and 2 – is used in the following examples showing the three sub-modes:

  • The stats sub-mode displays various statistics values, like e.g. TCP/IP and iSCSI sessions, of the given HBA port:

    root@host:~$ iscsiadm -m host -H 1 -C stats
    
    Host Statistics:
        mactx_frames: 2351750
        mactx_bytes: 233065914
        mactx_multicast_frames: 1209409
        mactx_broadcast_frames: 0
        mactx_pause_frames: 0
        mactx_control_frames: 0
        mactx_deferral: 0
        mactx_excess_deferral: 0
        mactx_late_collision: 0
        mactx_abort: 0
        mactx_single_collision: 0
        mactx_multiple_collision: 0
        mactx_collision: 0
        mactx_frames_dropped: 0
        mactx_jumbo_frames: 0
        macrx_frames: 4037613
        macrx_bytes: 1305799553
        macrx_unknown_control_frames: 0
        macrx_pause_frames: 0
        macrx_control_frames: 0
        macrx_dribble: 0
        macrx_frame_length_error: 0
        macrx_jabber: 0
        macrx_carrier_sense_error: 0
        macrx_frame_discarded: 0
        macrx_frames_dropped: 2409752
        mac_crc_error: 0
        mac_encoding_error: 0
        macrx_length_error_large: 0
        macrx_length_error_small: 0
        macrx_multicast_frames: 0
        macrx_broadcast_frames: 0
        iptx_packets: 1694187
        iptx_bytes: 112412836
        iptx_fragments: 0
        iprx_packets: 1446806
        iprx_bytes: 721191324
        iprx_fragments: 0
        ip_datagram_reassembly: 0
        ip_invalid_address_error: 0
        ip_error_packets: 0
        ip_fragrx_overlap: 0
        ip_fragrx_outoforder: 0
        ip_datagram_reassembly_timeout: 0
        ipv6tx_packets: 0
        ipv6tx_bytes: 0
        ipv6tx_fragments: 0
        ipv6rx_packets: 0
        ipv6rx_bytes: 0
        ipv6rx_fragments: 0
        ipv6_datagram_reassembly: 0
        ipv6_invalid_address_error: 0
        ipv6_error_packets: 0
        ipv6_fragrx_overlap: 0
        ipv6_fragrx_outoforder: 0
        ipv6_datagram_reassembly_timeout: 0
        tcptx_segments: 1694187
        tcptx_bytes: 69463008
        tcprx_segments: 1446806
        tcprx_byte: 692255204
        tcp_duplicate_ack_retx: 8
        tcp_retx_timer_expired: 28
        tcprx_duplicate_ack: 0
        tcprx_pure_ackr: 0
        tcptx_delayed_ack: 247594
        tcptx_pure_ack: 247710
        tcprx_segment_error: 0
        tcprx_segment_outoforder: 0
        tcprx_window_probe: 0
        tcprx_window_update: 2248673
        tcptx_window_probe_persist: 0
        ecc_error_correction: 0
        iscsi_pdu_tx: 1446486
        iscsi_data_bytes_tx: 30308
        iscsi_pdu_rx: 1446510
        iscsi_data_bytes_rx: 622721801
        iscsi_io_completed: 253632
        iscsi_unexpected_io_rx: 0
        iscsi_format_error: 0
        iscsi_hdr_digest_error: 0
        iscsi_data_digest_error: 0
        iscsi_sequence_error: 0
  • The chap sub-mode displays and alters a table containing authentication information. Calling this sub-mode with the -o show option displays the current contents of the table:

    root@host:~$ iscsiadm -m host -H 1 -C chap -o show
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 0
    host.auth.username_in = <empty>
    host.auth.password_in = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 1
    host.auth.username = <empty>
    host.auth.password = <empty>
    # END RECORD
    
    [...]

    Why show isn't the default option in the context of the chap sub-mode, like it is in many other iscsiadm modes and sub-modes is something i haven't quite understood yet. Maybe it's a security measure to not accidentially divulge sensitive information, maybe it has just been overlooked by the developers.

    Usually, there are already two initial records with the indexes 0 and 1 present on a HBA. As shown in the example above, each authentication record consists of three parameters. A record index host.auth.tbl_idx to reference it, a username host.auth.username or host.auth.username_in and a password host.auth.password or host.auth.password_in. Depending on whether the record is used for outgoing authentication of an initiator against a target or the other way around for incoming authentication of a target against an initiator, the parameter pairs username/password or username_in/password_in are used. Apparently both types of parameter pairs – incoming and outgoing – cannot be mixed together in a single record. My guess is that this isn't a limitation in Open-iSCSI, but rather a limitation in the specification and/or of the underlying hardware.

    New authentication records can be added with the -o new option:

    root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o new
    root@host:~$ iscsiadm -m host -H 1 -C chap -o show
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 0
    host.auth.username_in = <empty>
    host.auth.password_in = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 1
    host.auth.username = <empty>
    host.auth.password = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 2
    host.auth.username = <empty>
    host.auth.password = <empty>
    # END RECORD
    
    [...]

    Parameters of existing authentication records can be set or updated with the -o update option. The particular record to be set or to be updated is selected in with the -x <host.auth.tbl_idx> option, which references the records host.auth.tbl_idx value. Multiple parameters can be set or updated with a single iscsiadm command by calling it with multiple pairs of -n <parameter-name> and -v <parameter-value>:

    root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o update -n host.auth.username -v testuser -n host.auth.password -v testpassword
    root@host:~$ iscsiadm -m host -H 1 -C chap -o show
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 0
    host.auth.username_in = <empty>
    host.auth.password_in = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 1
    host.auth.username = <empty>
    host.auth.password = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 2
    host.auth.username = testuser
    host.auth.password = testpassword
    # END RECORD
    
    [...]

    Finally, existing authentication records can be deleted with the -o delete option:

    root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o delete
  • The flashnode sub-mode displays and alters a table containing information about the iSCSI targets. Calling this sub-mode without any other options displays an overview of the currently configured flash nodes (i.e. targets) on a particular HBA:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode
    qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002

    Similar to the previously mentioned host mode, each output line of the flashnode sub-mode contains an index number for each flash node entry (i.e. iSCSI target), which is shown in the first pair of square brackets. With this index number the individual flash node entries are referenced in all further operations.

    New flash nodes or target entries can be added with the -o new option. This operation also needs the information on whether the target addressed via the flash node will be reached via IPv4 or IPv6 addresses. This is accomplished with the -A ipv4 or -A ipv6 option:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -o new -A ipv4
    Create new flashnode for host 1.
    New flashnode for host 1 added at index 1.

    If the operation of adding a new flash node is successful, the index under which the new flash node is addressable is returned.

    root@host:~$ iscsiadm -m host -H 1 -C flashnode
    qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002
    qla4xxx: [1] 0.0.0.0:3260,0 <empty>

    Unlike authentication records, the flash node or target records contain a lot more parameters. They can be displayed by selecting a specific record by its index with the -x <flashnode_idx> option. The -o show option is the default and is thus optional:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1
    # BEGIN RECORD 2.0-873
    flashnode.session.auto_snd_tgt_disable = 0
    flashnode.session.discovery_session = 0
    flashnode.session.portal_type = ipv4
    flashnode.session.entry_enable = 0
    flashnode.session.immediate_data = 0
    flashnode.session.initial_r2t = 0
    flashnode.session.data_seq_in_order = 1
    flashnode.session.data_pdu_in_order = 1
    flashnode.session.chap_auth_en = 1
    flashnode.session.discovery_logout_en = 0
    flashnode.session.bidi_chap_en = 0
    flashnode.session.discovery_auth_optional = 0
    flashnode.session.erl = 0
    flashnode.session.first_burst_len = 0
    flashnode.session.def_time2wait = 0
    flashnode.session.def_time2retain = 0
    flashnode.session.max_outstanding_r2t = 0
    flashnode.session.isid = 000e1e17da2c
    flashnode.session.tsid = 0
    flashnode.session.max_burst_len = 0
    flashnode.session.def_taskmgmt_tmo = 10
    flashnode.session.targetalias = <empty>
    flashnode.session.targetname = <empty>
    flashnode.session.discovery_parent_idx = 0
    flashnode.session.discovery_parent_type = Sendtarget
    flashnode.session.tpgt = 0
    flashnode.session.chap_out_idx = 2
    flashnode.session.chap_in_idx = 65535
    flashnode.session.username = <empty>
    flashnode.session.username_in = <empty>
    flashnode.session.password = <empty>
    flashnode.session.password_in = <empty>
    flashnode.session.is_boot_target = 0
    flashnode.conn[0].is_fw_assigned_ipv6 = 0
    flashnode.conn[0].header_digest_en = 0
    flashnode.conn[0].data_digest_en = 0
    flashnode.conn[0].snack_req_en = 0
    flashnode.conn[0].tcp_timestamp_stat = 0
    flashnode.conn[0].tcp_nagle_disable = 0
    flashnode.conn[0].tcp_wsf_disable = 0
    flashnode.conn[0].tcp_timer_scale = 0
    flashnode.conn[0].tcp_timestamp_en = 0
    flashnode.conn[0].fragment_disable = 0
    flashnode.conn[0].max_xmit_dlength = 0
    flashnode.conn[0].max_recv_dlength = 65536
    flashnode.conn[0].keepalive_tmo = 0
    flashnode.conn[0].port = 3260
    flashnode.conn[0].ipaddress = 0.0.0.0
    flashnode.conn[0].redirect_ipaddr = 0.0.0.0
    flashnode.conn[0].max_segment_size = 0
    flashnode.conn[0].local_port = 0
    flashnode.conn[0].ipv4_tos = 0
    flashnode.conn[0].ipv6_traffic_class = 0
    flashnode.conn[0].ipv6_flow_label = 0
    flashnode.conn[0].link_local_ipv6 = <empty>
    flashnode.conn[0].tcp_xmit_wsf = 0
    flashnode.conn[0].tcp_recv_wsf = 0
    flashnode.conn[0].statsn = 0
    flashnode.conn[0].exp_statsn = 0
    # END RECORD

    From the various parameters of a flash node or target record, the following are the most relevant in day to day use:

    • flashnode.session.chap_auth_en: Controls whether the initiator should authenticate against the target. This is enabled by default.

    • flashnode.session.bidi_chap_en: Controls whether the target should also authenticate itself against the initiator. This is disabled by default.

    • flashnode.session.targetname: The IQN of the target to be logged into and to be accessed.

    • flashnode.session.chap_out_idx: The index number (i.e. the value of the host.auth.tbl_idx parameter) of the authentication record to be used for authentication of the initiator against the target.

    • flashnode.conn[0].port: The TCP port of the target portal. The default is port 3260.

    • flashnode.conn[0].ipaddress: The IP address of the target portal.

    The parameter pairs flashnode.session.username/flashnode.session.password and flashnode.session.username_in/flashnode.session.password_in are handled differently than all the other parameters. They are not set or updated directly, but are rather filled in automatically. This is done by setting the respective flashnode.session.chap_out_idx or flashnode.session.chap_in_idx parameter to a value which references the index (i.e. the value host.auth.tbl_idx parameter) of an appropriate authentication record.

    Parameters of existing flash nodes or target entries can be set or updated with the -o update option. The particular record of which the parameters are to be set or to be updated is selected with the -x <flashnode_idx> option. This references an index number gathered from the list of flash nodes or a index number returned at the time of creation of a particular flash node. Multiple parameters can be set or updated with a single iscsiadm command by calling it with multiple pairs of -n <parameter-name> and -v <parameter-value>:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o update -n flashnode.session.chap_out_idx -v 2 -n flashnode.session.targetname -v iqn.2001-05.com.equallogic:0-fe83b6-d63c152cc-7ce004e1102558d4-lun-000003 -n flashnode.conn[0].ipaddress -v 10.0.0.2

    The flash node or target entry updated by this command is shown below in a cut-down fashion for brevity:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1
    # BEGIN RECORD 2.0-873
    [...]
    flashnode.session.chap_auth_en = 1
    flashnode.session.discovery_logout_en = 0
    flashnode.session.bidi_chap_en = 0
    [...]
    flashnode.session.targetname = iqn.2001-05.com.equallogic:0-fe83b6-d63c152cc-7ce004e1102558d4-lun-000003
    [...]
    flashnode.session.chap_out_idx = 4
    flashnode.session.chap_in_idx = 65535
    flashnode.session.username = testuser
    flashnode.session.username_in = <empty>
    flashnode.session.password = testpassword
    flashnode.session.password_in = <empty>
    [...]
    flashnode.conn[0].port = 3260
    flashnode.conn[0].ipaddress = 10.0.0.2
    flashnode.conn[0].redirect_ipaddr = 0.0.0.0
    [...]
    # END RECORD

    Once configured, login and logout actions can be performed on the flash node (i.e. target) with the repective command options:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o login
    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o logout

    Unfortunately the iscsiadm command will return with a success status on the login and logout actions once it has passed them successfully to the HBA. This does not reflect on the status of the actual login and logout actions subsequently taken by the HBA against the target configured in the respective flash node! To my knowledge there is currently no information passed back to the command line about the result of the login and logout actions at the HBA levels.

    Newly established as well as already existing iSCSI sessions via the hardware initiator which were set up with the login action shown above, are shown along with all other Open-iSCSI session information in the output of the iscsiadm command in its session mode:

    root@host:~$ iscsiadm -m session
    qla4xxx: [1] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)
    qla4xxx: [2] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)
    
    [...]

    The fact that the session information is about a iSCSI session established via a hardware initiator is only signified by the flash label in the parentheses at the end of each line. In case of a iSCSI session established via a software initiator, the label in the parentheses reads non-flash.

    Finally, existing flash nodes can be deleted with the -o delete option:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o delete

The records from both, the chap and the flashnode table, are stored in the HBAs flash memory. For the limits on how many entries can be stored in each table, see the specification of the particular HBA.

In my opinion, the integration of management of the QLogic hardware initators into the Open-iSCSI iscsiadm command improves and simplifies the administration and configuration a lot over the previous IOCTL based management method via the QLogic QConvergeConsole (qaucli). It finally opens management access to the QLogic hardware initiators to non-“enterprise Linux distributions” like Debian. Definately a big step in the right direction! The importance of using a current version of Open-iSCSI can – in my experience – not be stressed enough. Building and maintaining a package based on a current version from the projects GitHub repository is definitely worth the effort.

One thing i couldn't get to work though was the incoming part of a bi-directional CHAP authentication. In this scenario, not only does the initiator authenticate itself at the target for a iSCSI session to be successfully established, the target also has to authenticate itself against the initiator. My initial thought was, that a setup with bi-directional CHAP authentication should be easily accomplished by just performing the following three steps:

  1. creating an incoming authentication record with the value of the parameters host.auth.username_in and host.auth.password_in set to the respective values configured at the target storage system.

  2. setting the value of the flash node parameter flashnode.session.bidi_chap_en to 1.

  3. setting the value of the flash node parameter flashnode.session.chap_in_idx to the value of the parameter host.auth.tbl_idx, gathered from the newly created incoming authentication record in step 1.

The first two of the above tasks were indeed easily accomplished. The third one seemed easy too, but turned out to be more of a challenge. Setting the flash node parameter flashnode.session.chap_in_idx to the value of the host.auth.tbl_idx parameter from the previously created incoming authentication record just didn't work. Any attempt to change the default value 65535 failed. Neither was the flashnode.session.username_in/flashnode.session.password_in parameter pair automatically updated with the values from the parameters host.auth.username_in and host.auth.password_in. Oddly enough bi-directional CHAP authentication worked as long as there only was one storage system with one set of incoming authentication credentials! Adding another set of flash nodes for a second storage system with its own set of incoming authentication credentials would cause the bi-directional CHAP authentication to fail for all targets on this second storage system.

Being unable to debug this weird behaviour any further on my own, i turned to the Open-iSCSI mailing list for help. See the thread on the Open-iSCSI mailing list for the details. Don't be confused by the fact that the thread at the mailing list was initially about getting to work the network communication with the use of jumbo frames. This initial issue was resolved for me by switching to a more recent Open-iSCSI version as already mentioned above. My last question there was not answered publicly on the mailing list, but Adheer Chandravanshi from development at QLogic got in touch via email. Here's his explanation of the observed behaviour:

[…]

I see that you have added multiple incoming CHAP entries for both hosts 1 and 2.
But looks like even in case of multiple incoming CHAP entries in flash only the first entry takes effect for that particular host for all
the target node entries in flash.
This seems to be a limitation with the flash target node entries.
So you cannot map different incoming CHAP entry for different target nodes in HBA flash.

In your case, only following incoming CHAP entry will be effective for Host 1 as it's the first incoming chap entry. Same goes for Host
2.
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 2
host.auth.username_in = <username-from-target1>
host.auth.password_in = <password-from-target1>
# END RECORD


Try using only one incoming CHAP entry per host, you can have different outgoing CHAP entries though for each flash target node.

[…]

To sum this up, the QLogic HBAs basically use a first match approach when it comes to the incoming part of a bi-directional CHAP authentication. After finding the first incoming authentication record that is configured, it uses the credentials stored there. Any other – and possibly more suitable – records for incoming authentication are ignored. There's also no way to override this behaviour on a case by case basis via the flash node entries (i.e. iSCSI targets).

In my humble opinion this is a rather serious limitation of the security features in the QLogic hardware initiators. No security policy i have ever encountered in any organisation would allow for the reuse of authentication credentials over different systems. Unfortunately i have no further information as to why the implementation turned out this way. Maybe there was no feature request for this yet, maybe it was just an oversight or maybe there is a limitation in the hardware, preventing a more flexible implementation. Unfortunately my reply to the above email with an inquiry whether such a feature would possibly be implemented in future firmware versions has – up to now – not been answered.

This website uses cookies. By using the website, you agree with storing cookies on your computer. Also you acknowledge that you have read and understand our Privacy Policy. If you do not agree leave the website. More information about cookies