bityard Blog

// QLogic iSCSI HBA and Limitations in Bi-Directional Authentication

In the past, the QLogic QConvergeConsole (qaucli) was used as the administration tool for the hardware initiator part of the QLogic 4000 and QLogic 8200 Series network adapters and iSCSI HBAs. Unfortunately, this tool was only supported on the so-called “enterprise Linux distributions” like RHEL and SLES. If you were running any other Linux distribution, e.g. Debian, or even one of the BSDs, you were out of luck.

Thankfully, QLogic addressed this support issue indirectly: first by announcing, and since then by actually performing, the move from an IOCTL-based management method towards the Open-iSCSI-based management method via the iscsiadm command. The announcement QLogic iSCSI Solution for Transitioning to the Open-iSCSI Model and the User's Guide IOCTL to Open-iSCSI Interface can be found at the QLogic web site.

While trying to test and use the new management method for the hardware initiator via the Open-iSCSI iscsiadm command, I soon ran into the issue that the Open-iSCSI package shipped with Debian Wheezy is based on the last stable release v2.0.873 of Open-iSCSI and is thus hopelessly out of date. The Open-iSCSI package shipped with Debian Jessie is a bit better, since it is already based on a newer version from the project's GitHub repository. Still, the Git commit used there dates back to August 23rd, 2013, which is also fairly old. After updating my system to Debian Jessie, I soon decided to rebuild the Open-iSCSI package from a much more recent version from the project's GitHub repository. With this, the management of the QLogic hardware initiators worked very well via the Open-iSCSI iscsiadm command and its now enhanced host mode.
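
For reference, the following is a rough sketch of such a build from the current sources. It assumes a plain, make-based build and install – rather than a proper rebuild of the Debian package – and the usual build dependencies to be already installed:

root@host:~$ git clone https://github.com/open-iscsi/open-iscsi.git
root@host:~$ cd open-iscsi
root@host:~/open-iscsi$ make
root@host:~/open-iscsi$ make install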

In the host mode there are now three sub-modes: chap, flashnode and stats. See man iscsiadm and /usr/share/doc/open-iscsi/README.gz for more details on how to use them. Called in the host mode without any sub-mode, iscsiadm prints a list of available iSCSI HBAs along with the host number – shown in the first pair of square brackets – associated with each host by the OS kernel:

root@host:~$ iscsiadm -m host
qla4xxx: [1] 10.0.0.5,[84:8f:69:35:fc:70],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2c.4
qla4xxx: [2] 10.0.0.6,[84:8f:69:35:fc:71],<empty> iqn.2000-04.com.qlogic:isp8214.000e1e37da2d.5

The host number – in the above example 1 and 2 – is used in the following examples showing the three sub-modes:

  • The stats sub-mode displays various statistics of the given HBA port, e.g. TCP/IP and iSCSI counters:

    root@host:~$ iscsiadm -m host -H 1 -C stats
    
    Host Statistics:
        mactx_frames: 2351750
        mactx_bytes: 233065914
        mactx_multicast_frames: 1209409
        mactx_broadcast_frames: 0
        mactx_pause_frames: 0
        mactx_control_frames: 0
        mactx_deferral: 0
        mactx_excess_deferral: 0
        mactx_late_collision: 0
        mactx_abort: 0
        mactx_single_collision: 0
        mactx_multiple_collision: 0
        mactx_collision: 0
        mactx_frames_dropped: 0
        mactx_jumbo_frames: 0
        macrx_frames: 4037613
        macrx_bytes: 1305799553
        macrx_unknown_control_frames: 0
        macrx_pause_frames: 0
        macrx_control_frames: 0
        macrx_dribble: 0
        macrx_frame_length_error: 0
        macrx_jabber: 0
        macrx_carrier_sense_error: 0
        macrx_frame_discarded: 0
        macrx_frames_dropped: 2409752
        mac_crc_error: 0
        mac_encoding_error: 0
        macrx_length_error_large: 0
        macrx_length_error_small: 0
        macrx_multicast_frames: 0
        macrx_broadcast_frames: 0
        iptx_packets: 1694187
        iptx_bytes: 112412836
        iptx_fragments: 0
        iprx_packets: 1446806
        iprx_bytes: 721191324
        iprx_fragments: 0
        ip_datagram_reassembly: 0
        ip_invalid_address_error: 0
        ip_error_packets: 0
        ip_fragrx_overlap: 0
        ip_fragrx_outoforder: 0
        ip_datagram_reassembly_timeout: 0
        ipv6tx_packets: 0
        ipv6tx_bytes: 0
        ipv6tx_fragments: 0
        ipv6rx_packets: 0
        ipv6rx_bytes: 0
        ipv6rx_fragments: 0
        ipv6_datagram_reassembly: 0
        ipv6_invalid_address_error: 0
        ipv6_error_packets: 0
        ipv6_fragrx_overlap: 0
        ipv6_fragrx_outoforder: 0
        ipv6_datagram_reassembly_timeout: 0
        tcptx_segments: 1694187
        tcptx_bytes: 69463008
        tcprx_segments: 1446806
        tcprx_byte: 692255204
        tcp_duplicate_ack_retx: 8
        tcp_retx_timer_expired: 28
        tcprx_duplicate_ack: 0
        tcprx_pure_ackr: 0
        tcptx_delayed_ack: 247594
        tcptx_pure_ack: 247710
        tcprx_segment_error: 0
        tcprx_segment_outoforder: 0
        tcprx_window_probe: 0
        tcprx_window_update: 2248673
        tcptx_window_probe_persist: 0
        ecc_error_correction: 0
        iscsi_pdu_tx: 1446486
        iscsi_data_bytes_tx: 30308
        iscsi_pdu_rx: 1446510
        iscsi_data_bytes_rx: 622721801
        iscsi_io_completed: 253632
        iscsi_unexpected_io_rx: 0
        iscsi_format_error: 0
        iscsi_hdr_digest_error: 0
        iscsi_data_digest_error: 0
        iscsi_sequence_error: 0
    
  • The chap sub-mode displays and alters a table containing authentication information. Calling this sub-mode with the -o show option displays the current contents of the table:

    root@host:~$ iscsiadm -m host -H 1 -C chap -o show
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 0
    host.auth.username_in = <empty>
    host.auth.password_in = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 1
    host.auth.username = <empty>
    host.auth.password = <empty>
    # END RECORD
    
    [...]
    

    Why show isn't the default option in the context of the chap sub-mode, like it is in many other iscsiadm modes and sub-modes, is something I haven't quite understood yet. Maybe it's a security measure to not accidentally divulge sensitive information, maybe it has just been overlooked by the developers.

    Usually, there are already two initial records with the indexes 0 and 1 present on an HBA. As shown in the example above, each authentication record consists of three parameters: a record index host.auth.tbl_idx to reference it, a username host.auth.username or host.auth.username_in, and a password host.auth.password or host.auth.password_in. Depending on whether the record is used for outgoing authentication of an initiator against a target or, the other way around, for incoming authentication of a target against an initiator, the parameter pair username/password or username_in/password_in is used. Apparently both types of parameter pairs – incoming and outgoing – cannot be mixed in a single record. My guess is that this isn't a limitation in Open-iSCSI, but rather a limitation in the specification and/or the underlying hardware.

    New authentication records can be added with the -o new option:

    root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o new
    
    root@host:~$ iscsiadm -m host -H 1 -C chap -o show
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 0
    host.auth.username_in = <empty>
    host.auth.password_in = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 1
    host.auth.username = <empty>
    host.auth.password = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 2
    host.auth.username = <empty>
    host.auth.password = <empty>
    # END RECORD
    
    [...]
    

    Parameters of existing authentication records can be set or updated with the -o update option. The particular record to be set or updated is selected with the -x <host.auth.tbl_idx> option, which references the record's host.auth.tbl_idx value. Multiple parameters can be set or updated with a single iscsiadm command by calling it with multiple pairs of -n <parameter-name> and -v <parameter-value>:

    root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o update -n host.auth.username -v testuser -n host.auth.password -v testpassword
    
    root@host:~$ iscsiadm -m host -H 1 -C chap -o show
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 0
    host.auth.username_in = <empty>
    host.auth.password_in = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 1
    host.auth.username = <empty>
    host.auth.password = <empty>
    # END RECORD
    # BEGIN RECORD 2.0-873
    host.auth.tbl_idx = 2
    host.auth.username = testuser
    host.auth.password = testpassword
    # END RECORD
    
    [...]
    

    Finally, existing authentication records can be deleted with the -o delete option:

    root@host:~$ iscsiadm -m host -H 1 -C chap -x 2 -o delete
    
  • The flashnode sub-mode displays and alters a table containing information about the iSCSI targets. Calling this sub-mode without any other options displays an overview of the currently configured flash nodes (i.e. targets) on a particular HBA:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode
    qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002
    

    Similar to the previously mentioned host mode, each output line of the flashnode sub-mode contains an index number for each flash node entry (i.e. iSCSI target), which is shown in the first pair of square brackets. With this index number the individual flash node entries are referenced in all further operations.

    New flash nodes or target entries can be added with the -o new option. This operation also needs to know whether the target addressed via the flash node will be reached via an IPv4 or an IPv6 address. This is specified with the -A ipv4 or -A ipv6 option:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -o new -A ipv4
    Create new flashnode for host 1.
    New flashnode for host 1 added at index 1.
    

    If the operation of adding a new flash node is successful, the index under which the new flash node is addressable is returned.

    root@host:~$ iscsiadm -m host -H 1 -C flashnode
    qla4xxx: [0] 10.0.0.2:3260,0 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002
    qla4xxx: [1] 0.0.0.0:3260,0 <empty>
    

    Unlike authentication records, the flash node or target records contain a lot more parameters. They can be displayed by selecting a specific record by its index with the -x <flashnode_idx> option. The -o show option is the default and is thus optional:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1
    # BEGIN RECORD 2.0-873
    flashnode.session.auto_snd_tgt_disable = 0
    flashnode.session.discovery_session = 0
    flashnode.session.portal_type = ipv4
    flashnode.session.entry_enable = 0
    flashnode.session.immediate_data = 0
    flashnode.session.initial_r2t = 0
    flashnode.session.data_seq_in_order = 1
    flashnode.session.data_pdu_in_order = 1
    flashnode.session.chap_auth_en = 1
    flashnode.session.discovery_logout_en = 0
    flashnode.session.bidi_chap_en = 0
    flashnode.session.discovery_auth_optional = 0
    flashnode.session.erl = 0
    flashnode.session.first_burst_len = 0
    flashnode.session.def_time2wait = 0
    flashnode.session.def_time2retain = 0
    flashnode.session.max_outstanding_r2t = 0
    flashnode.session.isid = 000e1e17da2c
    flashnode.session.tsid = 0
    flashnode.session.max_burst_len = 0
    flashnode.session.def_taskmgmt_tmo = 10
    flashnode.session.targetalias = <empty>
    flashnode.session.targetname = <empty>
    flashnode.session.discovery_parent_idx = 0
    flashnode.session.discovery_parent_type = Sendtarget
    flashnode.session.tpgt = 0
    flashnode.session.chap_out_idx = 2
    flashnode.session.chap_in_idx = 65535
    flashnode.session.username = <empty>
    flashnode.session.username_in = <empty>
    flashnode.session.password = <empty>
    flashnode.session.password_in = <empty>
    flashnode.session.is_boot_target = 0
    flashnode.conn[0].is_fw_assigned_ipv6 = 0
    flashnode.conn[0].header_digest_en = 0
    flashnode.conn[0].data_digest_en = 0
    flashnode.conn[0].snack_req_en = 0
    flashnode.conn[0].tcp_timestamp_stat = 0
    flashnode.conn[0].tcp_nagle_disable = 0
    flashnode.conn[0].tcp_wsf_disable = 0
    flashnode.conn[0].tcp_timer_scale = 0
    flashnode.conn[0].tcp_timestamp_en = 0
    flashnode.conn[0].fragment_disable = 0
    flashnode.conn[0].max_xmit_dlength = 0
    flashnode.conn[0].max_recv_dlength = 65536
    flashnode.conn[0].keepalive_tmo = 0
    flashnode.conn[0].port = 3260
    flashnode.conn[0].ipaddress = 0.0.0.0
    flashnode.conn[0].redirect_ipaddr = 0.0.0.0
    flashnode.conn[0].max_segment_size = 0
    flashnode.conn[0].local_port = 0
    flashnode.conn[0].ipv4_tos = 0
    flashnode.conn[0].ipv6_traffic_class = 0
    flashnode.conn[0].ipv6_flow_label = 0
    flashnode.conn[0].link_local_ipv6 = <empty>
    flashnode.conn[0].tcp_xmit_wsf = 0
    flashnode.conn[0].tcp_recv_wsf = 0
    flashnode.conn[0].statsn = 0
    flashnode.conn[0].exp_statsn = 0
    # END RECORD
    

    From the various parameters of a flash node or target record, the following are the most relevant in day-to-day use:

    • flashnode.session.chap_auth_en: Controls whether the initiator should authenticate against the target. This is enabled by default.

    • flashnode.session.bidi_chap_en: Controls whether the target should also authenticate itself against the initiator. This is disabled by default.

    • flashnode.session.targetname: The IQN of the target to be logged into and to be accessed.

    • flashnode.session.chap_out_idx: The index number (i.e. the value of the host.auth.tbl_idx parameter) of the authentication record to be used for authentication of the initiator against the target.

    • flashnode.conn[0].port: The TCP port of the target portal. The default is port 3260.

    • flashnode.conn[0].ipaddress: The IP address of the target portal.

    The parameter pairs flashnode.session.username/flashnode.session.password and flashnode.session.username_in/flashnode.session.password_in are handled differently than all the other parameters. They are not set or updated directly, but are rather filled in automatically. This is done by setting the respective flashnode.session.chap_out_idx or flashnode.session.chap_in_idx parameter to a value which references the index (i.e. the value of the host.auth.tbl_idx parameter) of an appropriate authentication record.

    Parameters of existing flash nodes or target entries can be set or updated with the -o update option. The particular record whose parameters are to be set or updated is selected with the -x <flashnode_idx> option. This references an index number gathered from the list of flash nodes or an index number returned at the time of creation of a particular flash node. Multiple parameters can be set or updated with a single iscsiadm command by calling it with multiple pairs of -n <parameter-name> and -v <parameter-value>:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o update -n flashnode.session.chap_out_idx -v 2 -n flashnode.session.targetname -v iqn.2001-05.com.equallogic:0-fe83b6-d63c152cc-7ce004e1102558d4-lun-000003 -n flashnode.conn[0].ipaddress -v 10.0.0.2
    

    The flash node or target entry updated by this command is shown below in a cut-down fashion for brevity:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1
    # BEGIN RECORD 2.0-873
    [...]
    flashnode.session.chap_auth_en = 1
    flashnode.session.discovery_logout_en = 0
    flashnode.session.bidi_chap_en = 0
    [...]
    flashnode.session.targetname = iqn.2001-05.com.equallogic:0-fe83b6-d63c152cc-7ce004e1102558d4-lun-000003
    [...]
    flashnode.session.chap_out_idx = 4
    flashnode.session.chap_in_idx = 65535
    flashnode.session.username = testuser
    flashnode.session.username_in = <empty>
    flashnode.session.password = testpassword
    flashnode.session.password_in = <empty>
    [...]
    flashnode.conn[0].port = 3260
    flashnode.conn[0].ipaddress = 10.0.0.2
    flashnode.conn[0].redirect_ipaddr = 0.0.0.0
    [...]
    # END RECORD
    

    Once configured, login and logout actions can be performed on the flash node (i.e. target) with the respective command options:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o login
    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o logout
    

    Unfortunately the iscsiadm command returns a success status for the login and logout actions as soon as it has successfully passed them to the HBA. This does not reflect the status of the actual login and logout actions subsequently taken by the HBA against the target configured in the respective flash node! To my knowledge there is currently no information passed back to the command line about the result of the login and logout actions at the HBA level.

    Newly established as well as already existing iSCSI sessions via the hardware initiator, which were set up with the login action shown above, appear along with all other Open-iSCSI session information in the output of the iscsiadm command in its session mode:

    root@host:~$ iscsiadm -m session
    qla4xxx: [1] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)
    qla4xxx: [2] 10.0.0.2:3260,1 iqn.2001-05.com.equallogic:0-fe83b6-a35c152cc-c72004e10ff558d4-lun-000002 (flash)
    
    [...]
    

    The fact that the session information refers to an iSCSI session established via a hardware initiator is only signified by the flash label in the parentheses at the end of each line. In case of an iSCSI session established via a software initiator, the label in the parentheses reads non-flash.

    Finally, existing flash nodes can be deleted with the -o delete option:

    root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o delete
    

The records from both the chap and the flashnode table are stored in the HBA's flash memory. For the limits on how many entries can be stored in each table, see the specification of the particular HBA.

In my opinion, the integration of the management of the QLogic hardware initiators into the Open-iSCSI iscsiadm command improves and simplifies administration and configuration a lot over the previous IOCTL-based management method via the QLogic QConvergeConsole (qaucli). It finally opens management access to the QLogic hardware initiators to non-“enterprise Linux distributions” like Debian. Definitely a big step in the right direction! The importance of using a current version of Open-iSCSI can – in my experience – not be stressed enough. Building and maintaining a package based on a current version from the project's GitHub repository is definitely worth the effort.

One thing I couldn't get to work though was the incoming part of a bi-directional CHAP authentication. In this scenario, not only does the initiator authenticate itself at the target for an iSCSI session to be successfully established, the target also has to authenticate itself against the initiator. My initial thought was that a setup with bi-directional CHAP authentication should be easily accomplished by just performing the following three steps (sketched as commands after the list):

  1. creating an incoming authentication record with the value of the parameters host.auth.username_in and host.auth.password_in set to the respective values configured at the target storage system.

  2. setting the value of the flash node parameter flashnode.session.bidi_chap_en to 1.

  3. setting the value of the flash node parameter flashnode.session.chap_in_idx to the value of the parameter host.auth.tbl_idx, gathered from the newly created incoming authentication record in step 1.
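
Expressed in terms of the iscsiadm commands introduced above, the three steps would look roughly like this. The host number 1, the flash node index 1 and the authentication record index 3 are hypothetical values; the placeholders need to be replaced with the credentials configured at the target storage system:

root@host:~$ iscsiadm -m host -H 1 -C chap -x 3 -o new
root@host:~$ iscsiadm -m host -H 1 -C chap -x 3 -o update -n host.auth.username_in -v <username-from-target> -n host.auth.password_in -v <password-from-target>
root@host:~$ iscsiadm -m host -H 1 -C flashnode -x 1 -o update -n flashnode.session.bidi_chap_en -v 1 -n flashnode.session.chap_in_idx -v 3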

The first two of the above tasks were indeed easily accomplished. The third one seemed easy too, but turned out to be more of a challenge. Setting the flash node parameter flashnode.session.chap_in_idx to the value of the host.auth.tbl_idx parameter from the previously created incoming authentication record just didn't work. Any attempt to change the default value 65535 failed. Neither was the flashnode.session.username_in/flashnode.session.password_in parameter pair automatically updated with the values from the parameters host.auth.username_in and host.auth.password_in. Oddly enough, bi-directional CHAP authentication worked as long as there was only one storage system with one set of incoming authentication credentials! Adding another set of flash nodes for a second storage system with its own set of incoming authentication credentials would cause the bi-directional CHAP authentication to fail for all targets on this second storage system.

Being unable to debug this weird behaviour any further on my own, I turned to the Open-iSCSI mailing list for help. See the thread on the Open-iSCSI mailing list for the details. Don't be confused by the fact that the thread on the mailing list was initially about getting the network communication to work with jumbo frames. This initial issue was resolved for me by switching to a more recent Open-iSCSI version, as already mentioned above. My last question there was not answered publicly on the mailing list, but Adheer Chandravanshi from development at QLogic got in touch via email. Here's his explanation of the observed behaviour:

[…]

I see that you have added multiple incoming CHAP entries for both hosts 1 and 2.
But looks like even in case of multiple incoming CHAP entries in flash only the first entry takes effect for that particular host for all
the target node entries in flash.
This seems to be a limitation with the flash target node entries.
So you cannot map different incoming CHAP entry for different target nodes in HBA flash.

In your case, only following incoming CHAP entry will be effective for Host 1 as it's the first incoming chap entry. Same goes for Host
2.
# BEGIN RECORD 2.0-873
host.auth.tbl_idx = 2
host.auth.username_in = <username-from-target1>
host.auth.password_in = <password-from-target1>
# END RECORD


Try using only one incoming CHAP entry per host, you can have different outgoing CHAP entries though for each flash target node.

[…]

To sum this up, the QLogic HBAs basically use a first-match approach when it comes to the incoming part of a bi-directional CHAP authentication. After finding the first configured incoming authentication record, they use the credentials stored there. Any other – and possibly more suitable – records for incoming authentication are ignored. There's also no way to override this behaviour on a case-by-case basis via the flash node entries (i.e. iSCSI targets).

In my humble opinion this is a rather serious limitation of the security features in the QLogic hardware initiators. No security policy I have ever encountered in any organisation would allow the reuse of authentication credentials across different systems. Unfortunately I have no further information as to why the implementation turned out this way. Maybe there was no feature request for this yet, maybe it was just an oversight, or maybe there is a limitation in the hardware preventing a more flexible implementation. Unfortunately my reply to the above email, with an inquiry whether such a feature might be implemented in future firmware versions, has – up to now – not been answered.

// Integration of Dell PowerConnect M-Series Switches with RANCID

An – almost – out-of-the-box integration of Dell PowerConnect M-Series switches (specifically M6348 and M8024-k) with version 3.2 of the popular, open source switch and router configuration management tool RANCID.

RANCID, in its previous version 2.3, was already able to integrate Dell PowerConnect switches through the use of the custom ''dlogin'' and ''drancid'' script addons. There are several howtos on the net on how to use those addons, as well as slightly tweaked versions of them, here and here.

With version 3.2 of RANCID, either built from source or installed pre-packaged e.g. from Debian testing (stretch), everything has become much more straightforward. The only modification needed now is a small patch to the script ''/usr/lib/rancid/bin/srancid'', adapting its output postprocessing part to those particular switch models. This modification is necessary in order to remove constantly changing output – in this case uptime and temperature information – from the commands show version and show system, which are issued by srancid for this device type. For the purpose of clarification, here are output samples of the show version:

switch1# show version

System Description................ Dell Ethernet Switch
System Up Time.................... 90 days, 04h:48m:41s
System Contact.................... <contact email>
System Name....................... <system name>
System Location................... <location>
Burned In MAC Address............. F8B1.566E.4AFB
System Object ID.................. 1.3.6.1.4.1.674.10895.3041
System Model ID................... PCM8024-k
Machine Type...................... PowerConnect M8024-k

unit image1      image2      current-active next-active
---- ----------- ----------- -------------- --------------
1    5.1.3.7     5.1.8.2     image2         image2
2    5.1.3.7     5.1.8.2     image2         image2

and show system:

switch1# show system

System Description: Dell Ethernet Switch
System Up Time: 90 days, 04h:48m:19s
System Contact: <contact email>
System Name: <system name>
System Location: <location>
Burned In MAC Address: F8B1.566E.4AFB
System Object ID: 1.3.6.1.4.1.674.10895.3041
System Model ID: PCM8024-k
Machine Type: PowerConnect M8024-k
Temperature Sensors:

Unit     Description       Temperature    Status
                            (Celsius)
----     -----------       -----------    ------
1        System            39             Good
2        System            39             Good

Power Supplies:

Unit  Description    Status

----  -----------  -----------
NA        NA            NA
NA        NA            NA

commands issued on a stacked pair of Dell PowerConnect M8024-k switches. Without the patch to srancid, the lines starting with System Up Time as well as the lines for each temperature sensor unit would always trigger a configuration change in RANCID, even if there was no real change in configuration. The provided patch adds handling and the subsequent removal of those ever-changing values from the configuration which is stored in the RCS used by RANCID.

Aside from this, the only pitfall with Dell PowerConnect switches is to follow your intuition and use the RANCID device type dell (see man router.db). This device type is intended to be used with D-Link switches OEMed by Dell and will not work with the PowerConnect models. The correct device type for Dell PowerConnect switches is smc, though.

For the sake of completeness, here a full step-by-step configuration example for Dell PowerConnect M6348 and M8024-k switches:

  • Add the Debian testing (stretch) package sources to the APT configuration:

    root@host:~$ echo 'deb http://ftp.debian.org:80/debian testing main contrib non-free' >> /etc/apt/sources.list.d/testing.list
    root@host:~$ apt-get update
    
  • Install RANCID v3.2.x from Debian testing (stretch):

    root@host:~$ apt-get -y install rancid/testing
    

    Optional: In case Subversion should be used as a revision control system (RCS) to store the switch configuration, install it:

    root@host:~$ apt-get -y install subversion
    
  • Download and apply the patch to the script ''/usr/lib/rancid/bin/srancid'' for proper handling of the system information and configuration output of Dell PowerConnect M6348 and M8024-k switches:

    srancid.patch
    --- /usr/lib/rancid/bin/srancid.orig	2015-05-07 00:00:19.000000000 +0200
    +++ /usr/lib/rancid/bin/srancid	2015-09-24 16:02:23.379156524 +0200
    @@ -49,6 +49,8 @@
     #
     # Code tested and working fine on these models:
     #
    +#	DELL PowerConnect M8024 / M8024-k
    +#	DELL PowerConnect M6348
     #	DELL PowerConnect 62xx
     #	DELL 34xx (partially; configuration is incomplete)
     #
    @@ -174,6 +176,7 @@
     sub Dir {
         print STDERR "    In Dir: $_" if ($debug);
     
    +	ProcessHistory("COMMENTS","keysort","D1","!Directory contents:\n");
         while (<INPUT>) {
     	s/^\s+\015//g;
     	tr/\015//d;
    @@ -184,6 +187,7 @@
     
     	ProcessHistory("COMMENTS","keysort","D1","! $_");
         }
    +	ProcessHistory("COMMENTS","keysort","D1","! \n");
         return(0);
     }
     
    @@ -198,6 +202,9 @@
     	# pager remnants like: ^H^H^H    ^H^H^H content
     	s/[\b]+\s*[\b]*//g;
     
    +	# Remove Uptime
    +	/ up time/i && next;
    +
     	ProcessHistory("COMMENTS","keysort","B1","! $_");
         }
         return(0);
    @@ -218,9 +225,12 @@
     	/ up time/i && next;
     
     	# filter temperature sensor info for Dell 6428 stacks
    -	/Temperature Sensors:/ && next;
    +	/Temperature Sensors:/ &&
    +        ProcessHistory("COMMENTS","keysort","C1","\n! $_") &&
    +        next;
     	if (/Temperature \(Celsius\)/ &&
    -	    ProcessHistory("COMMENTS","keysort","C1","! Unit\tStatus\n")) {
    +	    ProcessHistory("COMMENTS","keysort","C1","! Unit\tStatus\n") &&
    +	    ProcessHistory("COMMENTS","keysort","C1","! ----\t------\n")) {
     	    while (<INPUT>) {
     		s/^\s+\015//g;
     		tr/\015//d;
    @@ -228,6 +238,26 @@
     		ProcessHistory("COMMENTS","keysort","C1","! $1\t$2\n");
     		/^\s*$/ && last;
     	    }
    +    } 
    +    # Filter temperature sensor info for Dell M6348 and M8024 blade switches
    +    #
    +    # M6348 and M8024 sample lines:
    +    #   Unit     Description       Temperature    Status
    +    #                               (Celsius)
    +    #   ----     -----------       -----------    ------
    +    #   1        System            39             Good
    +    #   2        System            39             Good
    +    elsif (/Temperature/ &&
    +	    ProcessHistory("COMMENTS","keysort","C1","! Unit\tDescription\tStatus\n") &&
    +	    ProcessHistory("COMMENTS","keysort","C1","! ----\t-----------\t------\n")) {
    +	    while (<INPUT>) {
    +		/\(celsius\)/i && next;
    +		s/^\s+\015//g;
    +		tr/\015//d;
    +		/(\d+)\s+(\w+)\s+\d+\s+(.*)$/ &&
    +		ProcessHistory("COMMENTS","keysort","C1","! $1\t$2\t\t$3\n");
    +		/^\s*$/ && last;
    +	    }
     	}
     
     	/system description: (.*)/i &&
    @@ -242,6 +272,7 @@
     sub ShowVlan {
         print STDERR "    In ShowVlan: $_" if ($debug);
     
    +	ProcessHistory("COMMENTS","keysort","D1","!VLAN definitions:\n");
         while (<INPUT>) {
     	s/^\s+\015//g;
     	tr/\015//d;
    @@ -254,6 +285,7 @@
     	/ up time/i && next;
     	ProcessHistory("COMMENTS","keysort","D1","! $_");
         }
    +	ProcessHistory("COMMENTS","keysort","D1","! \n");
         return(0);
     }
     
    root@host:~$ patch < srancid.patch
    
  • Edit the global RANCID configuration:

    root@host:~$ vi /etc/rancid/rancid.conf
    

    Select the RCS (CVS, SVN or Git) of your choice. In this example SVN is used:

    RCSSYS=svn; export RCSSYS

    Define a name for your Dell PowerConnect device group in the LIST_OF_GROUPS configuration variable. In this example we'll use the name dell-sw:

    LIST_OF_GROUPS="dell-sw"; export LIST_OF_GROUPS
  • Create the cloginrc configuration file, which will contain the login information for your Dell PowerConnect devices and some default values:

    root@host:~$ touch /etc/rancid/cloginrc
    root@host:~$ chmod 660 /etc/rancid/cloginrc
    root@host:~$ chown root:rancid /etc/rancid/cloginrc
    root@host:~$ vi /etc/rancid/cloginrc
    

    Example:

    add user        dell-switch-1           dell-user
    add password    dell-switch-1           <login-password> <enable-password>
    add noenable    dell-switch-1           0
    
    [...]
    
    add user        *                       <default-user>
    add password    *                       <default-login-password>
    add noenable    *                       1
    add method      *                       ssh

    For the device named dell-switch-1, log in as user dell-user with the password <login-password>. Only for this system, change into the enable mode of the switch (add noenable dell-switch-1 0) and use the password <enable-password> to do so.

    For all other systems, log in as user <default-user> with the password <default-login-password>. Deactivate the use of the enable mode on all systems (noenable * 1). The login method for all systems is SSH.

    Since the cloginrc configuration file is parsed in a first-match fashion, the default values must always be at the bottom of the file.

  • Determine the name of the device type. See man router.db and /etc/rancid/rancid.types.base. In this example and in the general case of Dell PowerConnect M6348 and M8024-k switches the name of the device type is smc:

    root@host:~$ grep smc /etc/rancid/rancid.types.base
    smc;script;srancid
    smc;login;hlogin
    

    From this we can also determine the login script hlogin and the postprocessing script srancid, which will be used for this device type.

  • Change to the user rancid:

    root@host:~$ su - rancid
    
    • Create a symbolic link to the login configuration previously created in /etc/rancid/:

      rancid@host:~$ ln -s /etc/rancid/cloginrc /var/lib/rancid/.cloginrc
      
    • Initialize the directory structure for the RCS (CVS, SVN or Git) selected above. This will automatically be done for each device group configured in the LIST_OF_GROUPS configuration variable. The example shown here only creates the directory structure for the device group dell-sw defined above:

      rancid@host:~$ /usr/lib/rancid/bin/rancid-cvs
      Committed revision 1.
      Checked out revision 1.
      Updating '.':
      At revision 1.
      A         configs
      Adding         configs
      
      Committed revision 2.
      A         router.db
      Adding         router.db
      Transmitting file data .
      Committed revision 3.
      
      rancid@host:~$ find /var/lib/rancid/dell-sw/
      /var/lib/rancid/dell-sw
      /var/lib/rancid/dell-sw/configs
      /var/lib/rancid/dell-sw/router.db
      /var/lib/rancid/dell-sw/routers.all
      /var/lib/rancid/dell-sw/routers.down
      /var/lib/rancid/dell-sw/routers.up
      /var/lib/rancid/dell-sw/.svn
      /var/lib/rancid/dell-sw/.svn/entries
      /var/lib/rancid/dell-sw/.svn/format
      /var/lib/rancid/dell-sw/.svn/pristine
      /var/lib/rancid/dell-sw/.svn/pristine/da
      /var/lib/rancid/dell-sw/.svn/pristine/da/da39a3ee5e6b4b0d3255bfef95601890afd80709.svn-base
      /var/lib/rancid/dell-sw/.svn/tmp
      /var/lib/rancid/dell-sw/.svn/wc.db
      
    • Add switch devices by their hostname to the configuration file router.db of the corresponding device group:

      rancid@host:~$ vi /var/lib/rancid/dell-sw/router.db
      

      In this example the device group dell-sw, the device type smc and the system dell-switch-1:

      dell-switch-1;smc;up;A comment describing the system dell-switch-1
    • Perform a login test with the previously determined login script hlogin on the newly defined system dell-switch-1. The following example output shows the steps that should automatically be performed by the hlogin expect script. No manual intervention should be necessary.

      rancid@host:~$ /usr/lib/rancid/bin/hlogin dell-switch-1
      dell-switch-1
      spawn ssh -c 3des -x -l root dell-switch-1
      The authenticity of host 'dell-switch-1 (<ip address>)' can't be established.
      RSA key fingerprint is <rsa key fingerprint>
      Are you sure you want to continue connecting (yes/no)?
      Host dell-switch-1 added to the list of known hosts.
      yes
      Warning: Permanently added 'dell-switch-1,<ip address>' (RSA) to the list of known hosts.
      root@dell-switch-1's password:
      
      dell-switch-1>enable
      Password:**********
      
      dell-switch-1#
      
    • Finish the login test by manually logging out of the system:

      dell-switch-1#exit
      dell-switch-1>exit
      rancid@host:~$ 
      
    • Manually perform an initial RANCID run to make sure everything works as expected:

      rancid@host:~$ rancid-run
      

      If everything ran successfully, there should now be a file /var/lib/rancid/dell-sw/configs/dell-switch-1 containing the output of the commands show version, show system and show running-config for the system dell-switch-1.
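
      For a quick check, the collected configuration can simply be viewed:

      rancid@host:~$ less /var/lib/rancid/dell-sw/configs/dell-switch-1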

  • Create the email aliases necessary for the proper delivery of the emails generated by RANCID. Again in this example for the device group dell-sw:

    root@host:~$ vi /etc/aliases
    
    rancid-dell-sw:       <email>@<domain>
    rancid-admin-dell-sw: <email>@<domain>

    Recreate your aliases DB. In case postfix is used as an MTA:

    root@host:~$ postalias /etc/aliases
    
  • Enable the RANCID cron jobs. Adjust the execution times and intervals according to your needs:

    root@host:~$ vi /etc/cron.d/rancid
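
    As an illustration, the entries in this file might look roughly like the following – the actual file shipped with the Debian package may differ slightly:

    # Run RANCID for all device groups once every hour
    0 * * * *   rancid  /usr/lib/rancid/bin/rancid-run
    # Clean out RANCID logfiles older than two days once a day
    50 23 * * * rancid  /usr/bin/find /var/log/rancid -type f -mtime +2 -exec rm {} \;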
    

Some final words: The contents of the directories /var/lib/rancid/<device group>/ and /var/lib/rancid/<device group>/configs/ are maintained in the RCS – CVS, SVN or Git – of your choice. You can operate on those directories with the usual commands of the selected RCS. There are also some really nice and intuitive web frontends for each RCS. For me, the combination of SVN as the RCS and WebSVN as a web frontend worked out very well.

// Dell EqualLogic PS Series - Security

An initial setup is always a good opportunity to take a more in-depth look at systems while they are not yet in production. In this case I'm looking at one of the advertised security measures of the Dell EqualLogic PS Series systems.

We're currently in the process of implementing iSCSI-based Dell EqualLogic PS Series systems. Specifically, we're using PS-M4110E and PS-M4110X models in several Dell M1000e blade chassis. The EqualLogic storage arrays and the hosts they provide storage space for are connected via PowerConnect M8024-k switches. The switches are located in slots B1 and B2 of the blade chassis and are dedicated to iSCSI traffic at 10 Gbit Ethernet.

Although the separation of SAN and LAN could be considered enough security, I was really pleased to read in the Dell EqualLogic Group Manager Administrator's Manual - PS Series Firmware Version 7.0 (110-6152-EN-R1) that the folks at EqualLogic thought otherwise and added further security measures in order to protect the EqualLogic PS Series systems from unauthorized access to their management functions over the iSCSI network. A quote from page 76 of said document reads:

[…]

About Dedicated Management Networks

For increased security, or if your environment requires the separation of management traffic and iSCSI traffic, you can configure a dedicated management network (DMN) that is used only for administrative access to the group. The management network is separate from the network that handles iSCSI traffic to the group.

- Without a dedicated management network (the default configuration), administrators connect to the group IP address for both administrative access to the group and iSCSI initiator access to iSCSI targets (volumes and snapshots).
- With a dedicated management network, administrators do not use the group IP address for administrative access to the group. Instead, administrators connect to the management network address. All iSCSI traffic, including traffic by replication partners, and access to Dell EqualLogic Auto-Snapshot Manager/Linux Edition (ASM/LE), Dell EqualLogic Auto-Snapshot Manager/Microsoft Edition (ASM/ME), and Dell EqualLogic Virtual Storage Manager for VMware (formerly ASM/VE), continues to use the group IP address. SAN Headquarters can connect to the group using either the management network address or the iSCSI address.

[…]

And further on page 78 of the same document:

[…]

When you complete the management network configuration, administrators cannot log in to the group using the group IP address. Instead, administrators must use the new management IP address. Any open GUI or CLI sessions using the group IP address eventually time out and close.
After configuring a dedicated management network, you might need to:
- Inform administrators of the new management network IP address.
- If you run the Group manager GUI as a standalone application and have a shortcut on the computer's desktop, the group address in the shortcut is not updated with the new management address. You must uninstall and then reinstall the GUI application.
- If you are running SAN Headquarters, you must update the group IP address in the application to the dedicated management address. For more information, see the SAN Headquarters documentation.

[…]

Judging from those two sections, it would appear that the EqualLogic PS Series systems have no, or at least a very small, attack surface on the – potentially untrustworthy – host-facing network dedicated to iSCSI traffic. In open systems this is usually achieved by binding the services which are necessary for the management functions to a specific network interface, instead of letting them listen on all available interfaces.
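
As a simple illustration from the world of open systems: an OpenSSH daemon, for example, can be restricted to a dedicated management address with the ListenAddress directive in its sshd_config, instead of the default of listening on all available interfaces. The address used here is just the management IP from the configuration example below:

# /etc/ssh/sshd_config: bind the SSH daemon to the dedicated
# management interface only
ListenAddress 123.123.123.123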

Using a DMN and thus separating the management traffic from the iSCSI traffic resulted, in our case, in the following configuration:

group1-grp(member_group1)> eth show
Name ifType          ifSpeed    Mtu  Ipaddress                     Status Errors DCB
---- --------------- ---------- ---- ----------------------------- ------ ------ ------
eth0 ethernet-csmacd 10 Gbps    9000 10.0.0.1                      up     0      off
eth1 ethernet-csmacd 100 Mbps   1500 123.123.123.123               up     0      off
group1-grp(member_group1)> eth select 0 show
_______________________________ Eth Information _______________________________
Name: eth0
Status: up
Changed: Mon Jul 20 12:23:15 2015
Type: ethernet-csmacd
DesiredStatus: up
Mtu: 9000
Speed: 10 Gbps
HardwareAddress: B0:83:FE:CC:52:C1
IPAddress: 10.0.0.1
NetMask: 255.255.0.0
IPv6Address:
Description: group1 iSCSI Interface
SupportsManagement: no
ManagementStatus: normal
DCB: off
group1-grp(member_group1)> eth select 1 show
_______________________________ Eth Information _______________________________
Name: eth1
Status: up
Changed: Wed Jul 29 08:33:35 2015
Type: ethernet-csmacd
DesiredStatus: up
Mtu: 1500
Speed: 100 Mbps
HardwareAddress: B0:83:FE:CC:52:C2
IPAddress: 123.123.123.123
NetMask: 255.255.255.0
IPv6Address:
Description: group1 Management Interface
SupportsManagement: only
ManagementStatus: mgmt
DCB: off

Here, the lines with the SupportsManagement option could be construed to mean that management access is not possible over the eth0 interface, which in this configuration is connected to the host-facing iSCSI network.

Having learned from previous lessons not to be overly trusting of lofty vendor promises, I decided to double-check this with the simple use of the nmap network and port scanner. The results of the TCP and UDP port scans against both the member and the group IP address are shown below.

  • Member IP - TCP scan:

    root@host:~$ nmap -sS -p 0-65535 10.0.0.1
    
    Starting Nmap 6.00 ( http://nmap.org ) at 2015-07-15 11:14 CEST
    Nmap scan report for ******** (10.0.0.1)
    Host is up (0.000097s latency).
    Not shown: 9991 closed ports
    PORT     STATE SERVICE
    21/tcp   open  ftp
    22/tcp   open  ssh
    80/tcp   open  http
    443/tcp  open  https
    2606/tcp open  netmon
    3002/tcp open  exlm-agent
    3003/tcp open  cgms
    3260/tcp open  iscsi
    9876/tcp open  sd
    20002/tcp open  commtact-http
    20003/tcp open  unknown
    25555/tcp open  unknown
    MAC Address: B0:83:FE:CC:52:C1 (Unknown)
    
    Nmap done: 1 IP address (1 host up) scanned in 107.66 seconds
    
  • Member IP - UDP scan:

    root@host:~$ nmap -sU -p 0-65535 10.0.0.1
    
    Starting Nmap 6.00 ( http://nmap.org ) at 2015-07-20 09:27 CEST
    Nmap scan report for ******** (10.0.0.1)
    Host is up (0.000086s latency).
    Not shown: 65532 closed ports
    PORT      STATE         SERVICE
    0/udp     open|filtered unknown
    123/udp   open          ntp
    161/udp   open          snmp
    65519/udp open|filtered unknown
    MAC Address: B0:83:FE:CC:52:C1 (Unknown)
    
    Nmap done: 1 IP address (1 host up) scanned in 2.55 seconds
    
  • Group IP - TCP scan:

    root@host:~$ nmap -sS -p 0-65535 10.0.0.2
    
    Starting Nmap 6.00 ( http://nmap.org ) at 2015-07-15 11:17 CEST
    Nmap scan report for ******** (10.0.0.2)
    Host is up (0.00010s latency).
    Not shown: 9991 closed ports
    PORT     STATE SERVICE
    21/tcp   open  ftp
    22/tcp   open  ssh
    80/tcp   open  http
    443/tcp  open  https
    2606/tcp open  netmon
    3002/tcp open  exlm-agent
    3003/tcp open  cgms
    3260/tcp open  iscsi
    9876/tcp open  sd
    20002/tcp open  commtact-http
    20003/tcp open  unknown
    25555/tcp open  unknown
    MAC Address: B0:83:FE:CC:52:C1 (Unknown)
    
    Nmap done: 1 IP address (1 host up) scanned in 103.71 seconds
    
  • Group IP - UDP scan:

    root@host:~$ nmap -sU -p 0-65535 10.0.0.2
    
    Starting Nmap 6.00 ( http://nmap.org ) at 2015-07-20 09:28 CEST
    Nmap scan report for ******** (10.0.0.2)
    Host is up (0.00013s latency).
    Not shown: 65532 closed ports
    PORT      STATE         SERVICE
    0/udp     open|filtered unknown
    123/udp   open          ntp
    161/udp   open          snmp
    65519/udp open|filtered unknown
    MAC Address: B0:83:FE:CC:52:C1 (Unknown)
    
    Nmap done: 1 IP address (1 host up) scanned in 3.07 seconds
    

These scan results are pretty disappointing with regard to the expected additional security measures mentioned above.

An actual login test via SSH and FTP was successful, and I guess a login via Telnet would have been successful too, if the protocol hadn't already been disabled in our configuration. This means there is also no application-layer filter preventing access to the management functions, something that wouldn't have been recognized by the simple port scan above.
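
For reference, the login tests boiled down to nothing more than the following two commands, with grpadmin being the default administrative account of the PS Series systems:

root@host:~$ ssh grpadmin@10.0.0.1
root@host:~$ ftp 10.0.0.1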

As described earlier, it shouldn't be too hard to bind the services which are necessary for the management functions to a specific network interface. I can't help but wonder why EqualLogic didn't follow through with this, especially since it's already described in the product manual. Maybe there were too many feature requests from the customer side, or even technical requirements to better integrate the EqualLogic PS Series systems with certain host systems. Even then, I wonder why EqualLogic didn't at least provide some means of selectively enabling or disabling access to the management functions from the iSCSI network with some kind of configuration parameter. In any case, the situation at hand – describing one thing and implementing another – seems to be the least favorable one to me.

// IBM SVC and experiences with Real-time Compression - VMware and MS SQL Server

Following up on the previous article (IBM SVC and experiences with Real-time Compression) about our experiences with Real-time Compression (RTC) in an IBM SVC environment, we did some additional and further testing with regard to RTC being used in a VMware environment hosting solely database instances of Microsoft SQL Server.

With the separation of all of our Microsoft SQL Server instances into a dedicated VMware ESX cluster – which had to be done due to licensing constraints – an opportunity presented itself to take a more detailed look into how well the SVC's RTC feature would perform on this particular type of workload.

Usually, the contents of a VMware datastore are by their very nature opaque to the lower levels of the storage hierarchy, like the SVC providing the block storage for the datastores. While an advantage on the one hand, this lack of transparency on the other hand also prohibits the SVC from providing a fine-grained, per-VM set of its usual statistics (e.g. the actual amount of storage used with thin-provisioning or RTC enabled, the amount of storage in the individual storage tier classes, etc.) and performance metrics. By not only moving the compute part of the VMs over to the new VMware ESX hosts in a separate VMware ESX cluster, but also individually moving the storage part of one VM after the other onto new and empty datastores – which in turn were based on newly created, empty VDisks with RTC enabled at the SVC – we were able to measure the impact of RTC on a per-VM level. It was a bit of tedious work, and the precision of the numbers has to be taken with a grain of salt, because I/O was concurrently happening at the database and operating system level, probably causing additional extents to be allocated and thus some inaccuracy to a small extent.

The environment consists of the following four datastores, four VDisks and a total number of 43 VMs running various versions of Microsoft SQL Server:

VM Datastore           SVC VDisk      Datastore Size (GB)  Number of VMs
---------------------  -------------  -------------------  -------------
san-ssb-sql-oben-35C   sas-4096G-35C  4096                 16
san-ssb-sql-unten-360  sas-4096G-360  4096                 9
san-ssb-sql-unten-361  sas-2048G-361  2048                 8
san-ssb-sql-oben-362   sas-2048G-362  2048                 10
Sum                                   12288                43

Usually the VMs are configured with one or two disks or drives for the OS and one or more disks or drives dedicated to the database contents. In some legacy cases, specifically VM #7, #10 and #12 on datastore “san-ssb-sql-oben-35C”, VM #6 on datastore “san-ssb-sql-unten-360” and VM #8 on datastore “san-ssb-sql-unten-361”, there is also a local application instance – in our case different versions of SAP – installed on the system. The operating systems used are Windows Server 2003, 2008R2 and 2012R2.

For each of the four datastores or VDisks, the results of the tests are shown in the following four sections. Each section consists of a table representation of the datastore as it was being populated with VMs in chronological order. The columns of the tables are pretty self-explanatory, with the most interesting columns being “Rel. Compressed Size”, “Rel. Compression Saving” and “Compression Ratio”. If JavaScript is enabled in your browser, click on a table's column headers to sort by that particular column. Each section also contains a graph of the absolute allocation values (table columns “VM Provisioned”, “VM Used” and “SVC Used”) per VM and a graph of the relative allocation values (values from the table column “SVC Used” divided by values from the table column “VM Used”, in %) per VM. The latter also feature the “Compression Ratio” metric, which is graphed against a second y-axis on the right-hand side of the graphs.

  • Datastore: san-ssb-sql-oben-35C
    VDisk: sas-4096G-35C

    VM               VM Provisioned  VM Used  SVC Used  SVC Delta  Rel. Compressed  Rel. Compression  Compression  SQL Server     Comment
                     (GB)            (GB)     (GB)      Used (GB)  Size (%)         Saving (%)        Ratio        Version
    ---------------  --------------  -------  --------  ---------  ---------------  ----------------  -----------  -------------  -------
    Empty Datastore                           6.81 MB   6.81 MB
    Number 1         134.11          44.00    20.78     20.10      45.68            54.32             2.19         10.50.4000.0
    Number 2         144.11          55.04    48.45     28.35      51.51            48.49             1.94         11.0.5058.0
    Number 3         94.18           49.79    75.88     27.43      55.09            44.91             1.82         11.0.5058.0
                                              76.30     0.42
    Number 4         454.11          52.22    92.47     16.17      30.97            69.03             3.23         10.50.4000.0
    Number 5         456.11          391.14   337.74    245.27     62.71            37.29             1.59         10.50.4000.0   SharePoint with file uploads
    Number 6         238.14          156.29   450.89    113.15     72.40            27.60             1.38         10.50.6000.34  HR Applicant Management
    Number 7         488.11          304.32   660.05    209.16     68.73            31.27             1.45         10.50.4000.0   SQL Page Compression
    Number 8         348.09          259.32   703.47    43.42      16.74            83.26             5.97         9.00.4035.00
    Number 9         185.24          149.33   744.68    41.21      27.60            72.40             3.62         10.50.4000.0
    Number 10        292.12          104.31   804.14    59.46      57.00            43.00             1.75         11.0.3349.0    SQL Page Compression
    Number 11        214.11          148.73   882.44    78.30      52.65            47.35             1.90         10.50.2500.0
    Number 12        496.09          463.15   993.53    111.09     23.99            76.01             4.17         9.00.3228.00
    Number 13        202.12          176.96   1058.76   65.23      36.86            63.14             2.71         9.00.4035.00
    Number 14        442.41          363.83   1200.11   141.35     38.85            61.15             2.57         11.0.3412.0    SQL Page Compression
    Number 15        395.38          374.78   1300.00   97.96      26.14            73.86             3.83         11.0.3412.0    SQL Page Compression
    Number 16        739.31          464.30   1520.00   220.00     47.38            52.62             2.11         11.0.3412.0    SQL Page Compression
    Sum or Average   5189.63         3513.51  1497.55              42.10            57.90             2.38

    SVC RTC absolute allocation - Datastore san-ssb-sql-oben-35C; VDisk sas-4096G-35C

    SVC RTC relative allocation - Datastore san-ssb-sql-oben-35C; VDisk sas-4096G-35C

  • Datastore: san-ssb-sql-unten-360
    VDisk: sas-4096G-360

    VM               VM Provisioned  VM Used  SVC Used  SVC Delta  Rel. Compressed  Rel. Compression  Compression  SQL Server     Comment
                     (GB)            (GB)     (GB)      Used (GB)  Size (%)         Saving (%)        Ratio        Version
    ---------------  --------------  -------  --------  ---------  ---------------  ----------------  -----------  -------------  -------
    Empty Datastore                           5.47 MB   5.47 MB
    Number 1         377.68          241.77   98.13     97.58      40.36            59.64             2.48         11.0.3412.0    SQL Page Compression
    Number 2         439.29          256.54   198.76    100.63     39.23            60.77             2.55         11.0.3412.0    SQL Page Compression
    Number 3         776.76          507.24   448.04    249.28     49.14            50.86             2.03         11.0.3412.0    SQL Page Compression
    Number 4         195.21          154.10   533.52    85.48      55.47            44.53             1.80         11.0.5058.0
    Number 5         34.11           23.91    546.66    13.14      54.96            45.04             1.82         9.00.4035.00
    Number 6         497.11          394.46   718.35    171.69     43.53            56.47             2.30         10.50.1702.0   SQL Page Compression
    Number 7         219.91          53.87    740.20    21.85      40.56            59.44             2.47         11.0.3000.0
    Number 8         276.11          238.67   850.77    110.57     46.33            53.67             2.16         10.50.2500.0
    Number 9         228.11          188.26   986.25    135.48     71.96            28.04             1.39         10.50.2500.0   HR Applicant Management
    Sum or Average   3044.29         2058.82  986.25               47.90            52.10             2.09

    SVC RTC absolute allocation - Datastore san-ssb-sql-unten-360; VDisk sas-4096G-360

    SVC RTC relative allocation - Datastore san-ssb-sql-unten-360; VDisk sas-4096G-360

  • Datastore: san-ssb-sql-unten-361
    VDisk: sas-2048G-361

    VM               VM Provisioned  VM Used  SVC Used  SVC Delta  Rel. Compressed  Rel. Compression  Compression  SQL Server     Comment
                     (GB)            (GB)     (GB)      Used (GB)  Size (%)         Saving (%)        Ratio        Version
    ---------------  --------------  -------  --------  ---------  ---------------  ----------------  -----------  -------------  -------
    Empty Datastore                           6.66 MB   6.66 MB
    Number 1         188.11          162.98   51.46     51.45      31.57            68.43             3.17         10.50.4000.0
    Number 2         180.31          130.82   90.79     39.33      30.06            69.94             3.33         10.50.4000.0
    Number 3         178.11          142.63   131.17    40.38      28.31            71.69             3.53         10.50.4000.0
    Number 4         156.09          127.26   177.30    46.13      36.25            63.75             2.76         9.00.4035.00
    Number 5         100.86          90.40    202.76    25.46      28.16            71.84             3.55         9.00.4035.00
                                              205.91    3.15
    Number 6         139.09          79.17    233.85    27.94      35.29            64.71             2.83         11.0.5058.0
    Number 7         34.11           24.16    246.37    12.52      51.82            48.18             1.93         9.00.4035.00
    Number 8         696.32          289.09   402.07    155.07     53.86            46.14             1.86         11.0.5058.0    SQL Page Compression
    Sum or Average   1673.00         1046.51  402.07               38.42            61.58             2.60

    SVC RTC absolute allocation - Datastore san-ssb-sql-unten-361; VDisk sas-2048G-361

    SVC RTC relative allocation - Datastore san-ssb-sql-unten-361; VDisk sas-2048G-361

  • Datastore: san-ssb-sql-oben-362
    VDisk: sas-2048G-362

    VM               VM Provisioned  VM Used  SVC Used  SVC Delta  Rel. Compressed  Rel. Compression  Compression  SQL Server     Comment
                     (GB)            (GB)     (GB)      Used (GB)  Size (%)         Saving (%)        Ratio        Version
    ---------------  --------------  -------  --------  ---------  ---------------  ----------------  -----------  -------------  -------
    Empty Datastore
    Number 1         218.09          37.19    19.93     19.93      53.59            46.41             1.87         9.00.4035.00
    Number 2         122.12          112.53   62.32     42.39      37.67            62.33             2.65         10.50.1600.1
    Number 3         273.62          77.70    97.04     34.72      44.68            55.32             2.24         10.50.4000.0
    Number 4         143.09          37.98    116.47    19.43      51.16            48.84             1.95         11.0.5058.0
    Number 5         137.43          115.29   157.45    40.98      35.55            64.45             2.81         10.50.4000.0
    Number 6         80.09           76.04    178.76    21.31      28.02            71.98             3.57         9.00.4035.00
                                              182.30    3.54
    Number 7         136.09          117.59   229.95    47.65      40.52            59.48             2.47         9.00.4035.00
    Number 8         194.11          53.55    253.40    23.45      43.79            56.21             2.28         10.50.2500.0
    Number 9         78.11           52.60    273.07    19.67      37.40            62.60             2.67
    Number 10        53.11           31.42    287.33    14.26      45.39            54.61             2.20         10.50.4000.0
    Sum or Average   1435.86         711.89   283.82               40.36            59.64             2.48

    SVC RTC absolute allocation - Datastore san-ssb-sql-oben-362; VDisk sas-2048G-362

    SVC RTC relative allocation - Datastore san-ssb-sql-oben-362; VDisk sas-2048G-362

From the detailed tabular representation of each datastore on a per-VM basis shown above, an aggregated and summarized view, shown in the table below, was created. The columns of the table are again pretty self-explanatory. Some of the less interesting columns from the detailed tables above have been dropped in the aggregated result table. The most interesting columns are again “Rel. Compressed Size”, “Rel. Compression Saving” and “Compression Ratio”. Again, if JavaScript is enabled in your browser, click on a table's column headers to sort by that particular column.

VM Datastore           SVC VDisk      VM Provisioned  VM Used  SVC Used  Rel. Compressed  Rel. Compression  Compression
                                      (GB)            (GB)     (GB)      Size (%)         Saving (%)        Ratio
---------------------  -------------  --------------  -------  --------  ---------------  ----------------  -----------
san-ssb-sql-oben-35C   sas-4096G-35C  5189.63         3513.51  1497.55   42.10            57.90             2.38
san-ssb-sql-unten-360  sas-4096G-360  3044.29         2058.82  986.25    47.90            52.10             2.09
san-ssb-sql-unten-361  sas-2048G-361  976.68          757.42   243.21    32.53            67.47             3.07
san-ssb-sql-oben-362   sas-2048G-362  1435.86         711.89   283.82    40.36            59.64             2.48
Sum or Average                        10646.46        7041.64  3010.83   42.76            57.24             2.34

On the whole, the compression results achieved with RTC at the SVC level are quite good. Considering the fact that VMware already achieves a sizeable amount of storage reduction through its own thin-provisioning algorithms, the average RTC compression ratio of 2.34 – with a minimum of 1.38 and a maximum of 5.97 – at the SVC level is even more impressive. This means that on average only 42.76% of the storage space allocated by VMware is actually used at the SVC level to store the data. Taking VMware thin-provisioning into account, the relative amount of actual storage space needed drops even further, to 28.28% – or in other words, a reduction by a factor of 3.54.
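
Purely as an illustration of how these derived values are calculated – assuming nothing more than a POSIX shell with bc – the averages cited above can be recalculated from the “Sum or Average” row of the aggregate table (note that bc truncates rather than rounds at the given scale):

root@host:~$ # compression ratio = VM Used / SVC Used
root@host:~$ echo "scale=4; 7041.64 / 3010.83" | bc
2.3387
root@host:~$ # rel. storage needed incl. thin-provisioning = SVC Used / VM Provisioned
root@host:~$ echo "scale=4; 3010.83 / 10646.46" | bc
.2828
root@host:~$ # overall reduction factor = VM Provisioned / SVC Used
root@host:~$ echo "scale=4; 10646.46 / 3010.83" | bc
3.5360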

Looking at the individual graphs offers several other interesting insights, which can be indicators for further investigations on the database layer. In the graphs showing the relative allocation values and titled “SVC Real-Time Compression Relative Storage Allocation and Compression Ratio on VMware with MS SQL Server”, those data samples with a high “Rel. Compressed Size” value (purple bars) or a low “Compression Ratio” value (red line) are of particular interest in this case. Selecting those systems from the above results with an – arbitrarily chosen – “Compression Ratio” value of well below 2.00 gives us the following list of systems to take a closer look at:

Sample Datastore / VDisk VM # VM Provisioned (GB) VM Used (GB) SVC Delta Used (GB) Compression Ratio Comment
1 san-ssb-sql-oben-35C sas-4096G-35C 3 94.18 49.79 27.43 1.82 SQLIO benchmark file
2 5 456.11 391.14 245.27 1.59 SharePoint with file uploads
3 6 238.14 156.29 113.15 1.38 HR Applicant Management
4 7 488.11 304.32 209.16 1.45 SAP and MSSQL (with SQL Page Compression) on the same VM
5 10 292.12 104.31 59.46 1.75 SAP and MSSQL (with SQL Page Compression) on the same VM
6 san-ssb-sql-unten-360 sas-4096G-360 4 195.21 154.10 85.48 1.80 SQLIO benchmark file
7 5 34.11 23.91 13.14 1.82
8 9 228.11 188.26 135.48 1.39 HR Applicant Management
9 san-ssb-sql-unten-361 sas-2048G-361 7 34.11 24.16 12.52 1.93
10 8 696.32 289.09 155.07 1.86 SAP and MSSQL (with SQL Page Compression) on the same VM
11 san-ssb-sql-oben-362 sas-2048G-362 1 218.09 37.19 19.93 1.87
12 4 143.09 37.98 19.43 1.95

The reasons why those particular systems exhibit a below-average compression ratio can be manifold. Together with our DBAs we went over the list and came up with the following – neither exhaustive nor exclusive – list of explanations, which are already hinted at in the “Comment” column of the above table:

  • Samples 1 and 6: Those systems were recently used to test the influence of 4k vs. 64k NTFS cluster sizes on database I/O performance. Over the course of these tests, a number of dummy files were written with the SQLIO benchmark tool. Although the dummy files were deleted after the tests were concluded, the storage blocks remained allocated with apparently less compressible contents.

  • Samples 2, 3 and 8: Those systems host the databases for SharePoint (sample #2) and an HR/HCM applicant management and employee training management system (samples #3 and #8). Both applications seem to be designed or configured to store files uploaded to the application as binary blobs in the database. Both use cases suggest that the uploaded file data is, to some extent, already in some kind of compressed format (e.g. xlsx, pptx, jpeg, png, mpeg, etc.), thus limiting the efficiency of any further compression attempt like RTC. Storing large amounts of unstructured data together with structured data in a database is the subject of frequent, ongoing and probably never-ending controversial discussions. There are several points of view on this question and good arguments are being made on either side. Personally, and from a purely operational point of view, I'd favour not storing the unstructured binary data in a structured database.

  • Sample 4: This system hosts the legacy design described above, where a local application instance of SAP is installed alongside Microsoft SQL Server on the same system. The database tables of this SAP release already use the SQL Server page compression feature. Although this could very well have been the cause of the below-average compression ratio, a look inside the OS offered a different plausible explanation. Inside the OS, the sum of the space allocated by the various filesystems is only approximately 202 GB, while at the same time the amount of space allocated by VMware is a little more than 304 GB. It seems that at one point in time there was a lot more data of unknown compressibility present on the system. The data has since been deleted, but the previously allocated storage blocks have not been properly reclaimed.

    In order to put this theory to the test, the procedure described in VMware KB 2004155 for reclaiming unused, but not yet de-allocated storage blocks was applied. The following table shows the storage allocation for the system in sample #4 before and after the reclaim procedure:

    Sample Datastore / VDisk VM # VM Provisioned (GB) VM Used (GB) SVC Delta Used (GB) Compression Ratio Comment
    4 san-rtc-ssb-sql-unten-366 sas-1024G-366 7 488.11 304.32 209.16 1.45 SAP and MSSQL (with SQL Page Compression) on the same VM - before reclaiming unused storage space
    4 7 488.11 172.53 81.29 2.12 SAP and MSSQL (with SQL Page Compression) on the same VM - after reclaiming unused storage space

    The numbers show that the above assumption was correct. After zeroing the unused disk space within the VM and reclaiming the unused storage blocks by re-thin-provisioning the virtual disks in VMware, the RTC compression ratio rose from a meager 1.45 to a near-average 2.12. Now only 47.11% – instead of the previous 68.73% – of the storage space allocated by VMware is actually used at the SVC level. Again, taking VMware thin-provisioning into account, the relative amount of actual storage space needed drops even further, to 16.65% – or in other words, a reduction by a factor of 6.00.
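
    As a rough sketch of how such a reclaim can be implemented – this is one of the variants described in KB 2004155, with the datastore and VM names below being mere placeholders – the free space inside the Windows guest is first zeroed with the Sysinternals sdelete tool, and the zeroed blocks of the thin-provisioned VMDK are then punched out with vmkfstools on the ESXi host, which requires the VM to be powered off:

    root@esxi:~$ du -h /vmfs/volumes/<datastore>/<vm>/<vm>-flat.vmdk

    C:\> sdelete.exe -z c:

    root@esxi:~$ vmkfstools -K /vmfs/volumes/<datastore>/<vm>/<vm>.vmdk
    root@esxi:~$ du -h /vmfs/volumes/<datastore>/<vm>/<vm>-flat.vmdk

    Running du on the flat file of the virtual disk before and after the procedure shows the number of blocks actually allocated on the VMFS datastore, and is thus a quick way to verify the effect from the VMware side.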

  • Samples 5 and 10: These systems basically share the same circumstances as the previously examined system in sample #4. The only difference is that there is still some old, probably unused data sitting in the filesystems. In the case of sample #5 it's about 20 GB of compressed SAP installation media, and in the case of sample #10 it's about 84.3 GB of partially compressed SAP installation media and SAP SUM update files. Of course we'd like to reclaim this storage space as well and – hopefully – along the way also some deleted, but up to now not de-allocated storage blocks. By this we hope to see equally good end results as in the case of sample #4. We're currently waiting for clearance from our SAP team to dispose of some of the old and probably unused data.

  • Samples 7, 9, 11 and 12: Systems in this sample category are probably best described as just being “small”. To better illustrate what is meant by this, the following table shows a compiled view of these four samples. The data was taken from the detailed per-VM tabular representation of each datastore shown above:

    Sample Datastore / VDisk VM # VM Provisioned (GB) VM Used (GB) SVC Delta Used (GB) Rel. Compressed Size (%) Rel. Compression Saving (%) Compression Ratio Windows Version
    7 san-ssb-sql-unten-360 sas-4096G-360 5 34.11 23.91 13.14 54.96 45.04 1.82 2003
    9 san-ssb-sql-unten-361 sas-2048G-361 7 34.11 24.16 12.52 51.82 48.18 1.93 2003
    11 san-ssb-sql-oben-362 sas-2048G-362 1 218.09 37.19 19.93 53.59 46.41 1.87 2012 R2
    12 4 143.09 37.98 19.43 51.16 48.84 1.95 2012 R2

    Inside the systems there is very little user or installation data besides the base Windows operating system, the SQL Server binaries and some standard management tools used in our environment. VMware thin-provisioning already achieves a significant amount of space reduction: it reduces the storage space that would otherwise be used at the SVC level to approximately 24 GB for Windows 2003 and approximately 37.5 GB for Windows 2012 R2 systems. Although the number of samples is rather low, the almost consistent values within each Windows release category suggest that these figures represent – in our environment – the minimal amount of storage space needed for each Windows release. The compressed size of approximately 12.8 GB for Windows 2003 and approximately 19.7 GB for Windows 2012 R2 systems, as well as the average compression ratio of approximately 1.89, support this theory in that they show very similar values within each Windows release category. It appears that – in our environment – these values are the bottom line with regard to the minimal allocated storage capacity. There simply does not seem to be enough other, well compressible data in those systems to achieve better results with regard to the overall compression ratio.
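
    Purely as an illustration of where the averages cited above come from – again assuming bc is available – the per-release values can be recalculated from the table like this:

    root@host:~$ echo "scale=2; (23.91 + 24.16) / 2" | bc              # avg. VM Used, Windows 2003
    24.03
    root@host:~$ echo "scale=2; (37.19 + 37.98) / 2" | bc              # avg. VM Used, Windows 2012 R2
    37.58
    root@host:~$ echo "scale=2; (13.14 + 12.52) / 2" | bc              # avg. SVC Delta Used, Windows 2003
    12.83
    root@host:~$ echo "scale=2; (19.93 + 19.43) / 2" | bc              # avg. SVC Delta Used, Windows 2012 R2
    19.68
    root@host:~$ echo "scale=2; (1.82 + 1.93 + 1.87 + 1.95) / 4" | bc  # avg. compression ratio
    1.89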

By and large, we're quite pleased with the results we're seeing from the SVC's real-time compression feature in our VMware and MS SQL Server environment. The amount of storage space saved is – especially with VMware thin-provisioning taken into account – significant. Even in those cases where SQL Server's page compression feature is heavily used, we're still seeing quite good compression results with the additional real-time compression at the SVC level. In part, this is very likely due to the fact that real-time compression at the SVC level also covers the VM's data that is outside the scope of SQL Server's page compression. On the other hand, this does not entirely suffice to explain the amount of saved storage space – between 50.86% and 73.86%, if we disregard the special cases for which the reasons for a low compression ratio were discussed above – that we're seeing in cases where SQL Server page compression is used. From the data collected and shown above, it would appear that the algorithms used in SQL Server page compression still leave enough redundant, low-entropy data for the RACE algorithms at the SVC level to perform rather well.

In general, and with regard to performance in particular, we have – up to now – not noticed any negative side effects, like e.g. noticeable increases in I/O latency, from using RTC for the various SQL Server and SAP systems shown above.

The use of RTC not only promises, but quite matter-of-factly delivers a significant reduction of the needed storage space, and it deals very well even with seemingly already compressed workloads, like e.g. SQL Server page compressed databases. It thus enables a delayed procurement of additional storage resources, lower operational costs (administration, rack space, power, cooling, etc.), a reduced amount of I/O to the backend storage systems and a more efficient use of tier-1 storage, like e.g. flash-based storage systems.

On the other hand, it is no out-of-the-box, fire-and-forget solution when it is to be used sensibly. The selected examples of “interesting” systems shown and discussed above illustrate very well that there are always use cases which require special attention. They also point out the increased need for good overall, interdisciplinary technical knowledge, or at least a very good and open communication between the organisational units responsible for application, database, operating system, virtualization and storage administration.

Due to the rather old release level of our VMware environment we unfortunately weren't able to cover the interaction of TRIM, UNMAP, VAAI and RTC. It'd be very interesting to see if and how well those storage block reclamation technologies work together with the SVC's real-time compression feature.
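
For the record, and as a pointer for anyone on a more recent release level: starting with ESXi 5.5, a manual reclaim run for dead storage blocks on a VMFS datastore can be triggered with the esxcli command shown below, where the datastore name is a placeholder. How the blocks freed this way would be handled by RTC at the SVC level is exactly the kind of interaction we'd like to evaluate:

root@esxi:~$ esxcli storage vmfs unmap -l <datastore>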

Comments and own experiences are – as always – very welcome!

// Post-it art: Office window decorations - Pac-Man vs. Space Invaders vs. Spider-Pig

Awesome office window decorations made from colored Post-it stickers, featuring Pac-Man, Space Invaders, Spider-Pig and others:

Post-it office window decorations - Pac-Man, Space Invaders, Spider-Pig

Click on the image for an original-sized version. Sorry for the bad quality! It was taken with my crappy phone camera on 2015/01/23 on my way to work, approximately from this location.
