bityard Blog

// AIX and VIOS Performance with 10 Gigabit Ethernet (Update)

In October last year (10/2013) a colleague and I were given the opportunity to speak at the “IBM AIX User Group South-West” at IBM's German headquarters in Ehningen. My part of the talk was about our experiences with the move from 1GE to 10GE on our IBM Power systems. It was largely based on my previous post AIX and VIOS Performance with 10 Gigabit Ethernet. During the preparation for the talk, while reviewing and compiling the previously collected material on the subject, I suddenly realized a disastrous mistake in my methodology. Specifically, the netperf program used for the performance tests has a rather inadequate heuristic for determining the TCP buffer sizes it will use during a test run. For example:

lpar1:/$ netperf -H 192.168.244.137
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to ...
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
16384  16384   16384    10.00    3713.83

with the values “16384” from the last line being the relevant part here.

It turned out that on AIX the netperf utility only looks at the global, system-wide values of the tcp_recvspace and tcp_sendspace tunables set with the no command. In my example this was:

lpar1:/$ no -o tcp_recvspace -o tcp_sendspace
tcp_recvspace = 16384
tcp_sendspace = 16384

The interface-specific values of “262144” or “524288”, e.g.:

lpar1:/$ ifconfig en0
en0: flags=1e080863,4c0<UP,...,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 192.168.244.239 netmask 0xffffff00 broadcast 192.168.244.255
         tcp_sendspace 262144 tcp_recvspace 262144 tcp_nodelay 1 rfc1323 1

which override the system-wide default values for a specific network interface, were never picked up by netperf. The configuration of low global default values and higher interface-specific values for the tunables was deliberately chosen in order to allow the coexistence of low and high bandwidth adapters in the same system without the risk of interference. Anyway, I guess this is a very good example of the famous saying:

“A fool with a tool is still a fool.” (Grady Booch)
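
As an aside, the heuristic can also be sidestepped: depending on the netperf version, the socket buffer sizes can be requested explicitly via test-specific options, which removes the dependency on the global tunables altogether. A hedged example (the exact option syntax may vary between netperf builds):

lpar1:/$ netperf -H 192.168.244.137 -t TCP_STREAM -- -s 262144 -S 262144 -m 65536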

For another round of performance tests I temporarily set the global default values for the tunables tcp_recvspace and tcp_sendspace first to “262144” and then to “524288”. The number of tests was reduced to “largesend” and “largesend with jumbo frames (MTU 9000)”, since those – as elaborated before – seemed to be the most feasible options.
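
The global defaults were raised with the no command; a minimal sketch of the commands involved for the two runs (set this way, the changes are runtime-only and do not survive a reboot unless -p is added):

lpar1:/$ no -o tcp_recvspace=262144 -o tcp_sendspace=262144
lpar1:/$ no -o tcp_recvspace=524288 -o tcp_sendspace=524288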

The following table shows the results of the test runs in MBit/sec. The upper value in each cell is from the run with the tunables tcp_recvspace and tcp_sendspace set to “262144”, the lower value from the run with both set to “524288”.

Options Legend
  upper value   tcp_sendspace 262144, tcp_recvspace 262144, tcp_nodelay 1, rfc1323 1
  lower value   tcp_sendspace 524288, tcp_recvspace 524288, tcp_nodelay 1, rfc1323 1
  LS            largesend
  LS,9000       largesend, MTU 9000

Color Legend
  ≤ 1 Gbps
  > 1 Gbps and ≤ 3 Gbps
  > 3 Gbps and ≤ 5 Gbps
  > 5 Gbps

Managed System                P770 DC 1-2                             P770 DC 2-2
IP Destination                .243                .239                .137
Source       IP    Option     LS        LS,9000   LS        LS,9000   LS        LS,9000
-----------------------------------------------------------------------------------------
P770 DC 1-2  .243  LS         -         -         9624.47   9475.76   4114.34   3966.08
                              -         -         10494.85  10896.52  4810.47   4537.50
             .243  LS,9000    -         -         9313.00   7549.75   4069.94   3699.03
                              -         -         10133.70  7221.16   4538.29   4302.74
             .239  LS         8484.16   8534.12   -         -         4000.52   3834.55
                              10324.04  10237.57  -         -         4640.57   4356.51
             .239  LS,9000    8181.51   7267.75   -         -         4010.89   4056.08
                              11539.93  6505.37   -         -         4512.89   4410.07
P770 DC 2-2  .137  LS         3937.75   3945.29   3892.89   3906.86   -         -
                              4073.65   4386.60   4194.19   4303.14   -         -
             .137  LS,9000    3849.58   3693.16   3423.19   3502.64   -         -
                              4742.04   4360.35   4561.73   4317.46   -         -

A “-” marks combinations for which no value was given (source and destination IP identical).

The results show a significant increase in throughput for intra-managed-system network traffic, which now reaches almost wire speed. For inter-managed-system traffic there is also a noticeable, but far less pronounced increase in throughput. To check whether the two selected TCP buffer sizes were still too low, I did a quick check of the network latency in our environment and compared it to the reference values from IBM (see the redbook link below):

Network        IBM RTT Reference   RTT Measured
1GE Phys       0.144 ms            0.188 ms
10GE Phys      0.062 ms            0.074 ms
10GE Hyp       0.038 ms            0.058 ms
10GE SEA       0.274 ms            0.243 ms
10GE VMXNET3   -                   0.149 ms

Every value besides the measurement within a managed system (“10GE Hyp”) represents the worst-case RTT between two separate managed systems, one in each of the two datacenters. For reference purposes I added a test (“10GE VMXNET3”) between two Linux VMs running on two different VMware ESX hosts, one in each of the two datacenters, using the same network hardware as the IBM Power systems. The measured values themselves are well within the range of the reference values given by IBM, so the network setup in general should be fine. A quick calculation of the bandwidth-delay product for the value of the test case “10GE SEA”:

B x D = 10^10 bit/s x 0.243 ms = 2.43 Mbit ≈ 304 kB

confirmed that a value of “524288” for the tunables tcp_recvspace and tcp_sendspace should be sufficient. Rather alarming, on the other hand, is the fact that processing of simple ICMP packets takes almost twice as long going through the SEA on the IBM Power systems as through the network virtualization layer of VMware ESX. Part of the rather sub-par throughput measured on the IBM Power systems is most likely caused by inefficient processing of network traffic within the SEA and/or VIOS. Rumor has it that, with the advent of 40GE making things even worse, the IBM AIX lab has finally acknowledged this as an issue and is working on a streamlined, more efficient network stack. Other sources say SR-IOV – and with it passing direct access to shared hardware resources to the LPARs on a larger scale – is considered the way forward. Unfortunately this currently conflicts with LPM, so the two will probably remain mutually exclusive for most environments.

In any case IBM still seems to have some homework to do on the network front. It'll be interesting to see what the developers come up with in the future.

// Recover from a broken VIO Server Update

If you're – like me – running Ganglia to monitor the performance metrics of your AIX and VIOS systems, it can make sense to disable the now somewhat redundant default collection method for performance metrics, in order not to waste resources and to prevent unnecessary cluttering of the local VIOS filesystem. This can be done via the “xmdaily” entry in /etc/inittab:

/etc/inittab
:xmdaily:2:once:/usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /home/ios/perf/topas/ -ypersistent=1 2>&1 >/dev/null
$ lsitab xmdaily; echo $?
1
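
The entry can either be commented out by prefixing the line with a colon, as shown above, or removed altogether with the standard AIX inittab tooling, for example:

$ rmitab xmdaily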

Unfortunately this breaks the VIOS upgrade process, since the good folks at IBM insist on the “xmdaily” inittab entry being vital for a properly functioning VIOS system. Specifically, this is caused by the /usr/lpp/ios.cli/ios.cli.rte/<VERSION>/inst_root/ios.cli.rte.post_u script in the ios.cli.rte package. E.g.:

/usr/lpp/ios.cli/ios.cli.rte/6.1.9.1/inst_root/ios.cli.rte.post_u
507 rt=`/usr/sbin/lsitab xmdaily`
508 if [[ $rt != "" ]]
[...]
528 else
529    echo "Warning: lsitab failed..."
530    exit 1
531 fi

I beg to differ on the whole subject of “xmdaily” being necessary at all, but one could file this under the category of “philosophical differences”. Failing the whole package update with the “exit 1” in line 530 of the above code sample seems a bit too harsh, though.

So normally I would just put the “xmdaily” entry back into the inittab right before a VIOS update. Unfortunately, on the update to 2.2.3.1-FP27 and subsequently to 2.2.3.2-FP27-SP02, I forgot to do that on a few VIOS systems. The result of this negligence were failure messages during the update process (RBAC output omitted for better readability):

[...]
installp:  APPLYING software for:
        ios.cli.rte 6.1.9.2

. . . . . << Copyright notice for ios.cli >> . . . . . . .
 Licensed Materials - Property of IBM

 5765G3400
   Copyright International Business Machines Corp. 2004, 2014.

 All rights reserved.
 US Government Users Restricted Rights - Use, duplication or disclosure
 restricted by GSA ADP Schedule Contract with IBM Corp.
. . . . . << End of copyright notice for ios.cli >>. . . .

sysck: 3001-022 The file
        /usr/ios/utils/part
        was not found.

Start Creating VIOS Authorizations
Completed Creating VIOS Authorizations

Warning: lsitab failed...
update:  Failed while executing the ios.cli.rte.post_u script.

0503-464 installp:  The installation has FAILED for the "root" part
        ios.cli.rte 6.1.9.2

installp:  Cleaning up software for:
        ios.cli.rte 6.1.9.2

[...]

as well as at the end of the update process:

[...]
Installation Summary
--------------------
Name                        Level           Part        Event       Result
-------------------------------------------------------------------------------
ios.cli.rte                 6.1.9.2         USR         APPLY       SUCCESS
ios.cli.rte                 6.1.9.2         ROOT        APPLY       FAILED
ios.cli.rte                 6.1.9.2         ROOT        CLEANUP     SUCCESS
devices.vtdev.scsi.rte      6.1.9.2         USR         APPLY       SUCCESS
devices.vtdev.scsi.rte      6.1.9.2         ROOT        APPLY       SUCCESS
[...]

This manifested itself in the fact that no IOS command would work properly anymore:

IBM Virtual I/O Server

login: padmin
padmin's Password:
Last login: Sun Apr 27 11:40:41 DFT 2014 on /dev/vty0

Access to run command is not valid.

[vios1-p550-b1] /home/padmin $ license -accept
Access to run command is not valid.

[vios2-p550-b1] /home/padmin $ lsmap -all
Access to run command is not valid.

[vios2-p550-b1] /home/padmin $ 

A quick check with a login to the VIOS as root via SSH confirmed that the “root” part of the package ios.cli.rte had been rolled back entirely:

root@vios2-p550-b1:/$ lppchk -v
lppchk:  The following filesets need to be installed or corrected to bring
         the system to a consistent state:

  ios.cli.rte 6.1.9.2                     (usr: APPLIED, root: not installed)

To fix this issue, it worked for me to just run the “ios.cli.rte.pre_u” script of the ios.cli.rte package manually to redefine the now rolled back RBAC authorizations:

root@vios2-p550-b1:/$ SAVEDIR=/tmp/ /usr/lpp/ios.cli/ios.cli.rte/6.1.9.2/inst_root/ios.cli.rte.pre_u
root@vios2-p550-b1:/$ rm /tmp/org_*
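
Afterwards the IOS commands should be usable again. A quick way to confirm this is to log in as padmin once more and re-run one of the commands that previously failed with “Access to run command is not valid.”, e.g.:

[vios2-p550-b1] /home/padmin $ lsmap -all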

For future reference, make sure you have a valid “xmdaily” entry in /etc/inittab before attempting to update any VIOS system:

$ lsitab xmdaily; echo $?
xmdaily:2:once:/usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /home/ios/perf/topas/ -ypersistent=1 2>&1 >/dev/null
0
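
If the entry was removed rather than commented out, it can be recreated beforehand, for example with mkitab (assuming the stock topasrec command line shown above):

$ mkitab "xmdaily:2:once:/usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /home/ios/perf/topas/ -ypersistent=1 2>&1 >/dev/null"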

// Debian Wheezy on IBM Power LPAR with Multipath Support

Earlier, in the post Debian Wheezy on IBM Power LPAR, I wrote about installing Debian on an IBM Power LPAR. Admittedly the previous post was a bit sparse, especially about the hoops one has to jump through when installing Debian in a multipathed setup like this:

Debian Storage Setup in Dual VIOS Configuration on IBM Power Hardware

To address this, here's a more complete, step-by-step guide on how to successfully install Debian Wheezy with multipathing enabled. The environment is basically the same as described before in Debian Wheezy on IBM Power LPAR:

  1. Prepare an LPAR. In this example the partition ID is “4”. The VSCSI server adapters on the two VIO servers have the adapter ID “2”.

  2. Prepare the VSCSI disk mappings on the two VIO servers. The hdisks are backed by a SAN / IBM SVC setup. In this example the disk used is hdisk230 on both VIO servers; it has the UUID 332136005076801918127980000000000032904214503IBMfcp.

    Disk mapping on VIOS #1:

    root@vios1-p730-342:/$ lscfg -vl hdisk230
    
      hdisk230         U5877.001.0080249-P1-C9-T1-W5005076801205A2F-L7A000000000000  MPIO FC 2145
    
            Manufacturer................IBM
            Machine Type and Model......2145
            ROS Level and ID............0000
            Device Specific.(Z0)........0000063268181002
            Device Specific.(Z1)........0200646
            Serial Number...............60050768019181279800000000000329
    
    root@vios1-p730-342:/$ /usr/ios/cli/ioscli mkvdev -vdev hdisk230 -vadapter vhost0
    root@vios1-p730-342:/$ /usr/ios/cli/ioscli lsmap -all | less
    
    SVSA            Physloc                                      Client Partition ID
    --------------- -------------------------------------------- ------------------
    vhost0          U8231.E2D.06AB34T-V1-C2                      0x00000004
    
    VTD                   vtscsi0
    Status                Available
    LUN                   0x8100000000000000
    Backing device        hdisk230
    Physloc               U5877.001.0080249-P1-C9-T1-W5005076801205A2F-L7A000000000000
    Mirrored              false
    

    Disk mapping on VIOS #2:

    root@vios2-p730-342:/$ lscfg -vl hdisk230
    
      hdisk230         U5877.001.0080249-P1-C6-T1-W5005076801305A2F-L7A000000000000  MPIO FC 2145
    
            Manufacturer................IBM
            Machine Type and Model......2145
            ROS Level and ID............0000
            Device Specific.(Z0)........0000063268181002
            Device Specific.(Z1)........0200646
            Serial Number...............60050768019181279800000000000329
    
    root@vios2-p730-342:/$ /usr/ios/cli/ioscli mkvdev -vdev hdisk230 -vadapter vhost0
    root@vios2-p730-342:/$ /usr/ios/cli/ioscli lsmap -all | less
    
    SVSA            Physloc                                      Client Partition ID
    --------------- -------------------------------------------- ------------------
    vhost0          U8231.E2D.06AB34T-V2-C2                      0x00000004
    
    VTD                   vtscsi0
    Status                Available
    LUN                   0x8100000000000000
    Backing device        hdisk230
    Physloc               U5877.001.0080249-P1-C6-T1-W5005076801305A2F-L7A000000000000
    Mirrored              false
    
  3. Prepare the installation media, as a virtual target optical (VTOPT) device on one of the VIO servers:

    root@vios1-p730-342:/$ /usr/ios/cli/ioscli mkrep -sp rootvg -size 2G
    root@vios1-p730-342:/$ df -g /var/vio/VMLibrary
    
    Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
    /dev/VMLibrary      2.00      1.24   39%        7     1% /var/vio/VMLibrary
    
    source-system$ scp debian-testing-powerpc-netinst_20131214.iso root@vios1-p730-342:/var/vio/VMLibrary/
    root@vios1-p730-342:/$ /usr/ios/cli/ioscli lsrep
    
    Size(mb) Free(mb) Parent Pool         Parent Size      Parent Free
        2041     1269 rootvg                    32256             5120
    
    Name                                                  File Size Optical         Access
    debian-7.2.0-powerpc-netinst.iso                            258 None            rw
    
    root@vios1-p730-342:/$ /usr/ios/cli/ioscli mkvdev -vadapter vhost0 -fbo
    root@vios1-p730-342:/$ /usr/ios/cli/ioscli loadopt -vtd vtopt0 -disk debian-7.2.0-powerpc-netinst.iso
    root@vios1-p730-342:/$ /usr/ios/cli/ioscli lsmap -all | less
    
    SVSA            Physloc                                      Client Partition ID
    --------------- -------------------------------------------- ------------------
    vhost0          U8231.E2D.06AB34T-V1-C2                      0x00000004
    
    VTD                   vtopt0
    Status                Available
    LUN                   0x8200000000000000
    Backing device        /var/vio/VMLibrary/debian-7.2.0-powerpc-netinst.iso
    Physloc
    Mirrored              N/A
    
    VTD                   vtscsi0
    Status                Available
    LUN                   0x8100000000000000
    Backing device        hdisk230
    Physloc               U5877.001.0080249-P1-C9-T1-W5005076801205A2F-L7A000000000000
    Mirrored
    
  4. Boot the LPAR and enter the SMS menu. Select the “SCSI CD-ROM” device as a boot device:

    Select “5. Select Boot Options”:

     PowerPC Firmware
     Version AL770_052
     SMS 1.7 (c) Copyright IBM Corp. 2000,2008 All rights reserved.
    -------------------------------------------------------------------------------
     Main Menu
     1.   Select Language
     2.   Setup Remote IPL (Initial Program Load)
     3.   Change SCSI Settings
     4.   Select Console
     5.   Select Boot Options
    
     -------------------------------------------------------------------------------
     Navigation Keys:
    
                                                 X = eXit System Management Services
     -------------------------------------------------------------------------------
     Type menu item number and press Enter or select Navigation key: 5
    

    Select “1. Select Install/Boot Device”:

     PowerPC Firmware
     Version AL770_052
     SMS 1.7 (c) Copyright IBM Corp. 2000,2008 All rights reserved.
    -------------------------------------------------------------------------------
     Multiboot
     1.   Select Install/Boot Device
     2.   Configure Boot Device Order
     3.   Multiboot Startup <OFF>
     4.   SAN Zoning Support
    
     -------------------------------------------------------------------------------
     Navigation keys:
     M = return to Main Menu
     ESC key = return to previous screen         X = eXit System Management Services
     -------------------------------------------------------------------------------
     Type menu item number and press Enter or select Navigation key: 1
    

    Select “7. List all Devices”:

     PowerPC Firmware
     Version AL770_052
     SMS 1.7 (c) Copyright IBM Corp. 2000,2008 All rights reserved.
    -------------------------------------------------------------------------------
     Select Device Type
     1.   Diskette
     2.   Tape
     3.   CD/DVD
     4.   IDE
     5.   Hard Drive
     6.   Network
     7.   List all Devices
    
     -------------------------------------------------------------------------------
     Navigation keys:
     M = return to Main Menu
     ESC key = return to previous screen         X = eXit System Management Services
     -------------------------------------------------------------------------------
     Type menu item number and press Enter or select Navigation key: 7
    

    Select “2. SCSI CD-ROM”:

     PowerPC Firmware
     Version AL770_052
     SMS 1.7 (c) Copyright IBM Corp. 2000,2008 All rights reserved.
    -------------------------------------------------------------------------------
     Select Device
     Device  Current  Device
     Number  Position  Name
     1.        -      Interpartition Logical LAN
            ( loc=U8231.E2D.06AB34T-V4-C4-T1 )
     2.        3      SCSI CD-ROM
            ( loc=U8231.E2D.06AB34T-V4-C2-T1-L8200000000000000 )
    
     -------------------------------------------------------------------------------
     Navigation keys:
     M = return to Main Menu
     ESC key = return to previous screen         X = eXit System Management Services
     -------------------------------------------------------------------------------
     Type menu item number and press Enter or select Navigation key: 2
    

    Select “2. Normal Mode Boot”:

     PowerPC Firmware
     Version AL770_052
     SMS 1.7 (c) Copyright IBM Corp. 2000,2008 All rights reserved.
    -------------------------------------------------------------------------------
     Select Task
    
    SCSI CD-ROM
        ( loc=U8231.E2D.06AB34T-V4-C2-T1-L8200000000000000 )
    
     1.   Information
     2.   Normal Mode Boot
     3.   Service Mode Boot
    
     -------------------------------------------------------------------------------
     Navigation keys:
     M = return to Main Menu
     ESC key = return to previous screen         X = eXit System Management Services
     -------------------------------------------------------------------------------
     Type menu item number and press Enter or select Navigation key: 2
    

    Select “1. Yes”:

     PowerPC Firmware
     Version AL770_052
     SMS 1.7 (c) Copyright IBM Corp. 2000,2008 All rights reserved.
    -------------------------------------------------------------------------------
     Are you sure you want to exit System Management Services?
     1.   Yes
     2.   No
    
     -------------------------------------------------------------------------------
     Navigation Keys:
    
                                                 X = eXit System Management Services
     -------------------------------------------------------------------------------
     Type menu item number and press Enter or select Navigation key: 1
    
  5. At the Debian installer yaboot boot prompt enter “expert disk-detect/multipath/enable=true” to boot the installer image with multipath support enabled:

    IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
    IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
    IBM IBM IBM IBM IBM IBM                             IBM IBM IBM IBM IBM IBM
    IBM IBM IBM IBM IBM IBM     STARTING SOFTWARE       IBM IBM IBM IBM IBM IBM
    IBM IBM IBM IBM IBM IBM        PLEASE WAIT...       IBM IBM IBM IBM IBM IBM
    IBM IBM IBM IBM IBM IBM                             IBM IBM IBM IBM IBM IBM
    IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
    IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
    -
    Elapsed time since release of system processors: 150814 mins 23 secs
    
    Config file read, 1337 bytes
    
    Welcome to Debian GNU/Linux wheezy!
    
    This is a Debian installation CDROM,
    built on 20131012-15:01.
    
    [...]
    
    Welcome to yaboot version 1.3.16
    Enter "help" to get some basic usage information
    
    boot: expert disk-detect/multipath/enable=true
    
  6. Go through the following installer dialogs:

    Choose language
    Configure the keyboard
    Detect and mount CD-ROM

    and select the configuration items according to your needs and environment.

  7. In the installer dialog:

    Load installer components from CD

    select:

    +----------------+ [?] Load installer components from CD +----------------+
    |                                                                         |
    | All components of the installer needed to complete the install will     |
    | be loaded automatically and are not listed here. Some other             |
    | (optional) installer components are shown below. They are probably      |
    | not necessary, but may be interesting to some users.                    |
    |                                                                         |
    | Note that if you select a component that requires others, those         |
    | components will also be loaded.                                         |
    |                                                                         |
    | Installer components to load:                                           |
    |                                                                         |
    |  [*] multipath-modules-3.2.0-4-powerpc64-di: Multipath support          |
    |  [*] network-console: Continue installation remotely using SSH          |
    |  [*] openssh-client-udeb: secure shell client for the Debian installer  |
    |                                                                         |
    |     <Go Back>                                            <Continue>     |
    |                                                                         |
    +-------------------------------------------------------------------------+
    
  8. Go through the following installer dialogs:

    Detect network hardware
    Configure the network
    Continue installation remotely using SSH
    Set up users and passwords
    Configure the clock

    and configure the Debian installer network and user settings according to your environment.

  9. Log into the system via SSH and replace the find-partitions binary in the installer environment, since the original one aborts with a parted assertion error:

    ~ $  /usr/lib/partconf/find-partitions  --flag prep
    Warning: The driver descriptor says the physical block size is 512 bytes, but
    Linux says it is 2048 bytes. A bug has been detected in GNU Parted. Refer to
    the web site of parted http://www.gnu.org/software/parted/parted.html for more
    information of what could be useful for bug submitting! Please email a bug
    report to bug-parted@gnu.org containing at least the version (2.3) and the
    following message: Assertion (disk != NULL) at ../../libparted/disk.c:1548 in
    function ped_disk_next_partition() failed. Aborted
    
    ~ $ mv -i /usr/lib/partconf/find-partitions /usr/lib/partconf/find-partitions.orig
    ~ $ scp user@source-system:~/find-partitions /usr/lib/partconf/
    ~ $ chmod 755 /usr/lib/partconf/find-partitions
    ~ $ /usr/lib/partconf/find-partitions --flag prep
    
  10. While still in the SSH login to the debian installer, manually load the multipath kernel module:

    ~ $ cd /lib/modules/3.2.0-4-powerpc64/kernel/drivers/md
    ~ $ modprobe multipath
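
    To verify that the module was actually loaded before returning to the installer, a quick check like the following can be run in the same SSH session; a line containing the multipath module should show up (otherwise /var/log/syslog may contain the reason):

    ~ $ lsmod | grep -i multipath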
    
  11. Return to the virtual terminal and select the installer dialog:

    Detect disks

    and acknowledge the question regarding parameters for the dm-emc kernel module.

  12. Select the installer dialog:

    Partition disks

    and select the:

    Manual

    partitioning method. Then select the entry:

    LVM VG mpatha, LV mpatha - 38.7 GB Linux device-mapper (multipath)

    Do NOT select one of the entries starting with “SCSI”! If there is no entry offering a “mpath” device path, the most likely cause is that the above step no. 10, loading the multipath kernel module, failed. In this case check the file /var/log/syslog for error messages.

    Select “Yes” to create a new partition table:

    +-----------------------+ [!!] Partition disks +------------------------+
    |                                                                       |
    | You have selected an entire device to partition. If you proceed with  |
    | creating a new partition table on the device, then all current        |
    | partitions will be removed.                                           |
    |                                                                       |
    | Note that you will be able to undo this operation later if you wish.  |
    |                                                                       |
    | Create new empty partition table on this device?                      |
    |                                                                       |
    |     <Go Back>                                       <Yes>    <No>     |
    |                                                                       |
    +-----------------------------------------------------------------------+
    

    Select “msdos” as a partition table type:

    +-------------+ [.] Partition disks +--------------+
    |                                                  |
    | Select the type of partition table to use.       |
    |                                                  |
    | Partition table type:                            |
    |                                                  |
    |                     aix                          |
    |                     amiga                        |
    |                     bsd                          |
    |                     dvh                          |
    |                     gpt                          |
    |                     mac                          |
    |                     msdos                        |
    |                     sun                          |
    |                     loop                         |
    |                                                  |
    |     <Go Back>                                    |
    |                                                  |
    +--------------------------------------------------+
    

    Select the “FREE SPACE” entry on the mpath device to create new partitions:

    +------------------------+ [!!] Partition disks +-------------------------+
    |                                                                         |
    | This is an overview of your currently configured partitions and mount   |
    | points. Select a partition to modify its settings (file system, mount   |
    | point, etc.), a free space to create partitions, or a device to         |
    | initialize its partition table.                                         |
    |                                                                         |
    |  Guided partitioning                                                    |
    |  Configure software RAID                                                |
    |  Configure the Logical Volume Manager                                   |
    |  Configure encrypted volumes                                            |
    |                                                                         |
    |  LVM VG mpatha, LV mpatha - 38.7 GB Linux device-mapper (multipath      |
    |          pri/log  38.7 GB      FREE SPACE                               |
    |  SCSI1 (0,1,0) (sda) - 38.7 GB AIX VDASD                                |
    |  SCSI2 (0,1,0) (sdb) - 38.7 GB AIX VDASD                                |
    |                                                                         |
    |                                                                         |
    |     <Go Back>                                                           |
    |                                                                         |
    +-------------------------------------------------------------------------+
    

    Select “Create a new partition”:

    +-----------+ [!!] Partition disks +-----------+
    |                                              |
    | How to use this free space:                  |
    |                                              |
    |  Create a new partition                      |
    |  Automatically partition the free space      |
    |  Show Cylinder/Head/Sector information       |
    |                                              |
    |     <Go Back>                                |
    |                                              |
    +----------------------------------------------+
    

    Enter “8” to create a partition of about 7 MB in size:

    +-----------------------+ [!!] Partition disks +------------------------+
    |                                                                       |
    | The maximum size for this partition is 38.7 GB.                       |
    |                                                                       |
    | Hint: "max" can be used as a shortcut to specify the maximum size, or |
    | enter a percentage (e.g. "20%") to use that percentage of the maximum |
    | size.                                                                 |
    |                                                                       |
    | New partition size:                                                   |
    |                                                                       |
    | 8____________________________________________________________________ |
    |                                                                       |
    |     <Go Back>                                          <Continue>     |
    |                                                                       |
    +-----------------------------------------------------------------------+
    

    Select “Primary” as a partition type:

    +-----+ [!!] Partition disks +------+
    |                                   |
    | Type for the new partition:       |
    |                                   |
    |            Primary                |
    |            Logical                |
    |                                   |
    |     <Go Back>                     |
    |                                   |
    +-----------------------------------+
    

    Select “Beginning” as the location for the new partition:

    +------------------------+ [!!] Partition disks +-------------------------+
    |                                                                         |
    | Please choose whether you want the new partition to be created at the   |
    | beginning or at the end of the available space.                         |
    |                                                                         |
    | Location for the new partition:                                         |
    |                                                                         |
    |                              Beginning                                  |
    |                              End                                        |
    |                                                                         |
    |     <Go Back>                                                           |
    |                                                                         |
    +-------------------------------------------------------------------------+
    

    Select “PowerPC PReP boot partition” as the partition usage and set the “Bootable flag” to “on”:

    +------------------------+ [!!] Partition disks +-------------------------+
    |                                                                         |
    | You are editing partition #1 of LVM VG mpatha, LV mpatha. No existing   |
    | file system was detected in this partition.                             |
    |                                                                         |
    | Partition settings:                                                     |
    |                                                                         |
    |             Use as:         PowerPC PReP boot partition                 |
    |                                                                         |
    |             Bootable flag:  on                                          |
    |                                                                         |
    |             Copy data from another partition                            |
    |             Delete the partition                                        |
    |             Done setting up the partition                               |
    |                                                                         |
    |     <Go Back>                                                           |
    |                                                                         |
    +-------------------------------------------------------------------------+
    

    Repeat the above steps two more times and create the following partition layout. Then select “Configure the Logical Volume Manager” and acknowledge that the partition scheme will be written to disk:

    +------------------------+ [!!] Partition disks +-------------------------+
    |                                                                         |
    | This is an overview of your currently configured partitions and mount   |
    | points. Select a partition to modify its settings (file system, mount   |
    | point, etc.), a free space to create partitions, or a device to         |
    | initialize its partition table.                                         |
    |                                                                         |
    |  Guided partitioning                                                    |
    |  Configure software RAID                                                |
    |  Configure the Logical Volume Manager                                   |
    |  Configure encrypted volumes                                            |
    |                                                                         |
    |  LVM VG mpatha, LV mpatha - 38.7 GB Linux device-mapper (multipath      |
    |  >     #1  primary    7.3 MB  B  K                                      |
    |  >     #2  primary  511.7 MB     f  ext4    /boot                       |
    |  >     #5  logical   38.1 GB     K  lvm                                 |
    |  SCSI1 (0,1,0) (sda) - 38.7 GB AIX VDASD                                |
    |                                                                         |
    |     <Go Back>                                                           |
    |                                                                         |
    +-------------------------------------------------------------------------+
    

    Select “Create volume group”:

    +----------+ [!!] Partition disks +-----------+
    |                                             |
    | Summary of current LVM configuration:       |
    |                                             |
    |  Free Physical Volumes:  1                  |
    |  Used Physical Volumes:  0                  |
    |  Volume Groups:          0                  |
    |  Logical Volumes:        0                  |
    |                                             |
    | LVM configuration action:                   |
    |                                             |
    |      Display configuration details          |
    |      Create volume group                    |
    |      Finish                                 |
    |                                             |
    |     <Go Back>                               |
    |                                             |
    +---------------------------------------------+
    

    Enter a volume group name, here “vg00”:

    +-----------------------+ [!!] Partition disks +------------------------+
    |                                                                       |
    | Please enter the name you would like to use for the new volume group. |
    |                                                                       |
    | Volume group name:                                                    |
    |                                                                       |
    | vg00_________________________________________________________________ |
    |                                                                       |
    |     <Go Back>                                          <Continue>     |
    |                                                                       |
    +-----------------------------------------------------------------------+
    

    Select the device “/dev/mapper/mpathap5” created above as a physical volume for the volume group:

    +-----------------+ [!!] Partition disks +------------------+
    |                                                           |
    | Please select the devices for the new volume group.       |
    |                                                           |
    | You can select one or more devices.                       |
    |                                                           |
    | Devices for the new volume group:                         |
    |                                                           |
    |      [ ] /dev/mapper/mpathap1           (7MB)             |
    |      [ ] /dev/mapper/mpathap2           (511MB; ext4)     |
    |      [*] /dev/mapper/mpathap5           (38133MB)         |
    |      [ ] (38133MB; lvm)                                   |
    |                                                           |
    |     <Go Back>                              <Continue>     |
    |                                                           |
    +-----------------------------------------------------------+
    

    Create logical volumes according to your needs and environment. See the following list as an example. Note that the logical volume for the root filesystem resides inside the volume group:

    +----------------------+ [!!] Partition disks +----------------------+
    |                                                                    |
    |                     Current LVM configuration:                     |
    | Unallocated physical volumes:                                      |
    |   * none                                                           |
    |                                                                    |
    | Volume groups:                                                     |
    |   * vg00                                                 (38130MB) |
    |     - Uses physical volume:         /dev/mapper/mpathap5 (38130MB) |
    |     - Provides logical volume:      home                 (1023MB)  |
    |     - Provides logical volume:      root                 (1023MB)  |
    |     - Provides logical volume:      srv                  (28408MB) |
    |     - Provides logical volume:      swap                 (511MB)   |
    |     - Provides logical volume:      tmp                  (1023MB)  |
    |     - Provides logical volume:      usr                  (4093MB)  |
    |     - Provides logical volume:      var                  (2046MB)  |
    |                                                                    |
    |                             <Continue>                             |
    |                                                                    |
    +--------------------------------------------------------------------+
    

    Assign filesystems to each of the logical volumes created above. See the following list as an example. Note that only the “PowerPC PReP boot partition” and the partition for the “/boot” filesystem reside outside the volume group:

    +------------------------+ [!!] Partition disks +-------------------------+
    |                                                                         |
    | This is an overview of your currently configured partitions and mount   |
    | points. Select a partition to modify its settings (file system, mount   |
    | point, etc.), a free space to create partitions, or a device to         |
    | initialize its partition table.                                         |
    |                                                                         |
    |  Guided partitioning                                                    |
    |  Configure software RAID                                                |
    |  Configure the Logical Volume Manager                                   |
    |  Configure encrypted volumes                                            |
    |                                                                         |
    |  LVM VG mpatha, LV mpatha - 38.7 GB Linux device-mapper (multipath      |
    |  >     #1  primary    7.3 MB  B  K                                      |
    |  >     #2  primary  511.7 MB        ext4                                |
    |  >     #5  logical   38.1 GB     K  lvm                                 |
    |  LVM VG mpathap1, LV mpathap1 - 7.3 MB Linux device-mapper (linear      |
    |  LVM VG mpathap2, LV mpathap2 - 511.7 MB Linux device-mapper (line      |
    |  >     #1           511.7 MB     K  ext4    /boot                       |
    |  LVM VG mpathap5, LV mpathap5 - 38.1 GB Linux device-mapper (linea      |
    |  LVM VG vg00, LV home - 1.0 GB Linux device-mapper (linear)             |
    |  >     #1             1.0 GB     f  ext4    /home                       |
    |  LVM VG vg00, LV root - 1.0 GB Linux device-mapper (linear)             |
    |  >     #1             1.0 GB     f  ext4    /                           |
    |  LVM VG vg00, LV srv - 28.4 GB Linux device-mapper (linear)             |
    |  >     #1            28.4 GB     f  ext4    /srv                        |
    |  LVM VG vg00, LV swap - 511.7 MB Linux device-mapper (linear)           |
    |  >     #1           511.7 MB     f  swap    swap                        |
    |  LVM VG vg00, LV tmp - 1.0 GB Linux device-mapper (linear)              |
    |  >     #1             1.0 GB     f  ext4    /tmp                        |
    |  LVM VG vg00, LV usr - 4.1 GB Linux device-mapper (linear)              |
    |  >     #1             4.1 GB     f  ext4    /usr                        |
    |  LVM VG vg00, LV var - 2.0 GB Linux device-mapper (linear)              |
    |  >     #1             2.0 GB     f  ext4    /var                        |
    |  SCSI1 (0,1,0) (sda) - 38.7 GB AIX VDASD                                |
    |  SCSI2 (0,1,0) (sdb) - 38.7 GB AIX VDASD                                |
    |                                                                         |
    |  Undo changes to partitions                                             |
    |  Finish partitioning and write changes to disk                          |
    |                                                                         |
    |     <Go Back>                                                           |
    |                                                                         |
    +-------------------------------------------------------------------------+
    

    Select “Finish partitioning and write changes to disk”.

  13. Select the installer dialog:

    Install the base system

    Choose a Linux kernel to be installed and select a “targeted” initrd to be created.

  14. Log into the system via SSH again and manually install the necessary multipath packages in the target system:

    ~ $ chroot /target /bin/bash
    root@ststdebian01:/$ mount /sys
    root@ststdebian01:/$ mount /proc
    root@ststdebian01:/$ apt-get install multipath-tools multipath-tools-boot
    
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following extra packages will be installed:
      kpartx libaio1
    The following NEW packages will be installed:
      kpartx libaio1 multipath-tools multipath-tools-boot
    0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
    Need to get 0 B/258 kB of archives.
    After this operation, 853 kB of additional disk space will be used.
    Do you want to continue [Y/n]? y
    Preconfiguring packages ...
    Can not write log, openpty() failed (/dev/pts not mounted?)
    Selecting previously unselected package libaio1:powerpc.
    (Reading database ... 14537 files and directories currently installed.)
    Unpacking libaio1:powerpc (from .../libaio1_0.3.109-3_powerpc.deb) ...
    Selecting previously unselected package kpartx.
    Unpacking kpartx (from .../kpartx_0.4.9+git0.4dfdaf2b-7~deb7u1_powerpc.deb) ...
    Selecting previously unselected package multipath-tools.
    Unpacking multipath-tools (from .../multipath-tools_0.4.9+git0.4dfdaf2b-7~deb7u1_powerpc.deb) ...
    Selecting previously unselected package multipath-tools-boot.
    Unpacking multipath-tools-boot (from .../multipath-tools-boot_0.4.9+git0.4dfdaf2b-7~deb7u1_all.deb) ...
    Processing triggers for man-db ...
    Can not write log, openpty() failed (/dev/pts not mounted?)
    Setting up libaio1:powerpc (0.3.109-3) ...
    Setting up kpartx (0.4.9+git0.4dfdaf2b-7~deb7u1) ...
    Setting up multipath-tools (0.4.9+git0.4dfdaf2b-7~deb7u1) ...
    [ ok ] Starting multipath daemon: multipathd.
    Setting up multipath-tools-boot (0.4.9+git0.4dfdaf2b-7~deb7u1) ...
    update-initramfs: deferring update (trigger activated)
    Processing triggers for initramfs-tools ...
    update-initramfs: Generating /boot/initrd.img-3.2.0-4-powerpc64
    cat: /sys/devices/vio/modalias: No such device
    
    root@ststdebian01:/$ vi /etc/multipath.conf
    
    defaults {
        getuid_callout  "/lib/udev/scsi_id -g -u -d /dev/%n"
        no_path_retry  10
        user_friendly_names yes
    }
    blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z][[0-9]*]"
        devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
    }
    multipaths {
        multipath {
            wwid    360050768019181279800000000000329
        }
    }
    

    Adjust the configuration items in /etc/multipath.conf according to your needs and environment, especially the disk UUID value in the wwid field.
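
    The WWID to be used in the multipaths section can be determined with the same getuid_callout command that is configured in the defaults section above, run against one of the underlying SCSI disks. A hedged example (the device name sda is taken from the setup shown above):

    root@ststdebian01:/$ /lib/udev/scsi_id -g -u -d /dev/sda
    360050768019181279800000000000329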

  15. Return to the virtual terminal and go through the following installer dialogs:

    Configure the package manager
    Select and install software

    and select the configuration items according to your needs and environment.

  16. Select the installer dialog:

    Install yaboot on a hard disk

    and select /dev/mapper/mpathap1 as the device where the yaboot boot loader should be installed:

    +-----------------+ [!!] Install yaboot on a hard disk +------------------+
    |                                                                         |
    | Yaboot (the Linux boot loader) needs to be installed on a hard disk     |
    | partition in order for your system to be bootable.  Please choose the   |
    | destination partition from among these partitions that have the         |
    | bootable flag set.                                                      |
    |                                                                         |
    | Warning: this will erase all data on the selected partition!            |
    |                                                                         |
    | Device for boot loader installation:                                    |
    |                                                                         |
    |                         /dev/mapper/mpathap1                            |
    |                                                                         |
    |     <Go Back>                                                           |
    |                                                                         |
    +-------------------------------------------------------------------------+
    
  17. After the yaboot boot loader has been successfully installed, log into the system via SSH again. Change the device path for the boot partition to reflect the naming scheme that will be present on the target system once it is running:

    ~ $ chroot /target /bin/bash
    root@ststdebian01:/$ vi /etc/fstab
    
    #/dev/mapper/mpathap2 /boot           ext4    defaults        0       2
    /dev/mapper/mpatha-part2 /boot           ext4    defaults        0       2
    

    Check the initrd for the inclusion of the multipath tools installed in the previous step:

    root@ststdebian01:/$ gzip -dc /boot/initrd.img-3.2.0-4-powerpc64 | cpio -itvf - | grep -i multipath
    
    -rwxr-xr-x   1 root     root        18212 Oct 15 05:16 sbin/multipath
    -rw-r--r--   1 root     root          392 Dec 16 18:03 etc/multipath.conf
    -rw-r--r--   1 root     root          328 Oct 15 05:16 lib/udev/rules.d/60-multipath.rules
    drwxr-xr-x   2 root     root            0 Dec 16 18:03 lib/multipath
    -rw-r--r--   1 root     root         5516 Oct 15 05:16 lib/multipath/libpriohp_sw.so
    -rw-r--r--   1 root     root         9620 Oct 15 05:16 lib/multipath/libcheckrdac.so
    -rw-r--r--   1 root     root         9632 Oct 15 05:16 lib/multipath/libpriodatacore.so
    -rw-r--r--   1 root     root        13816 Oct 15 05:16 lib/multipath/libchecktur.so
    -rw-r--r--   1 root     root         9664 Oct 15 05:16 lib/multipath/libcheckdirectio.so
    -rw-r--r--   1 root     root         5516 Oct 15 05:16 lib/multipath/libprioemc.so
    -rw-r--r--   1 root     root         9576 Oct 15 05:16 lib/multipath/libcheckhp_sw.so
    -rw-r--r--   1 root     root         9616 Oct 15 05:16 lib/multipath/libpriohds.so
    -rw-r--r--   1 root     root         9672 Oct 15 05:16 lib/multipath/libprioiet.so
    -rw-r--r--   1 root     root         5404 Oct 15 05:16 lib/multipath/libprioconst.so
    -rw-r--r--   1 root     root         9608 Oct 15 05:16 lib/multipath/libcheckemc_clariion.so
    -rw-r--r--   1 root     root         5516 Oct 15 05:16 lib/multipath/libpriordac.so
    -rw-r--r--   1 root     root         5468 Oct 15 05:16 lib/multipath/libpriorandom.so
    -rw-r--r--   1 root     root         9620 Oct 15 05:16 lib/multipath/libprioontap.so
    -rw-r--r--   1 root     root         5488 Oct 15 05:16 lib/multipath/libcheckreadsector0.so
    -rw-r--r--   1 root     root         9656 Oct 15 05:16 lib/multipath/libprioweightedpath.so
    -rw-r--r--   1 root     root         9692 Oct 15 05:16 lib/multipath/libprioalua.so
    -rw-r--r--   1 root     root         9592 Oct 15 05:16 lib/multipath/libcheckcciss_tur.so
    -rw-r--r--   1 root     root        43552 Sep 20 06:13 lib/modules/3.2.0-4-powerpc64/kernel/drivers/md/dm-multipath.ko
    -rw-r--r--   1 root     root       267620 Oct 15 05:16 lib/libmultipath.so.0
    -rwxr-xr-x   1 root     root          984 Oct 14 22:25 scripts/local-top/multipath
    -rwxr-xr-x   1 root     root          624 Oct 14 22:25 scripts/init-top/multipath
    

    If the multipath tools, which were installed in the previous step, are missing, update the initrd to include them:

    root@ststdebian01:/$ update-initramfs -u -k all
    

    Check the validity of the yaboot configuration, especially the configuration items marked with “<== !!!”:

    root@ststdebian01:/$ vi /etc/yaboot.conf
    
    boot="/dev/mapper/mpathap1"         <== !!!
    partition=2
    root="/dev/mapper/vg00-root"        <== !!!
    timeout=50
    install=/usr/lib/yaboot/yaboot
    enablecdboot
    
    image=/vmlinux                      <== !!!
            label=Linux
            read-only
            initrd=/initrd.img          <== !!!
    
    image=/vmlinux.old                  <== !!!
            label=old
            read-only
            initrd=/initrd.img.old      <== !!!
    

    If changes were made to the yaboot configuration, install the boot loader again:

    root@ststdebian01:/$ ybin -v
    

    In order to have the root filesystem on LVM and yaboot still be able to access its necessary configuration, the yaboot.conf has to be placed in a subdirectory “/etc/” on a non-LVM device. See HOWTO-Booting with Yaboot on PowerPC for a detailed explanation, which is also true for the yaboot.conf:

    It's worth noting that yaboot locates the kernel image within a partition's filesystem without regard to where that partition will eventually be mounted within the Linux root filesystem. So, for example, if you've placed a kernel image or symlink at /boot/vmlinux, but /boot is actually a separate partition on your system, then the image path for yaboot will just be image=/vmlinux.

    In this case the kernel and initrd images reside on the “/boot” filesystem, so the path “/etc/yaboot.conf” has to be placed below “/boot”, i.e. at “/boot/etc/yaboot.conf”. A symlink pointing from “/etc/yaboot.conf” to “/boot/etc/yaboot.conf” is added for the convenience of tools that expect it still to be in “/etc”:

    root@ststdebian01:/$ mkdir /boot/etc
    root@ststdebian01:/$ mv -i /etc/yaboot.conf /boot/etc/
    root@ststdebian01:/$ ln -s /boot/etc/yaboot.conf /etc/yaboot.conf
    root@ststdebian01:/$ ls -al /boot/etc/yaboot.conf /etc/yaboot.conf
    
    -rw-r--r-- 1 root root 547 Dec 16 18:20 /boot/etc/yaboot.conf
    lrwxrwxrwx 1 root root  21 Dec 16 18:48 /etc/yaboot.conf -> /boot/etc/yaboot.conf
    
  18. Select the installer dialog:

    Finish the installation
  19. After a successful reboot, change the yaboot “boot=” configuration directive (marked with “<== !!!”) to use the device path name present in the now running system:

    root@ststdebian01:/$ vi /etc/yaboot.conf
    
    boot="/dev/mapper/mpatha-part1"     <== !!!
    partition=2
    root="/dev/mapper/vg00-root"
    timeout=50
    install=/usr/lib/yaboot/yaboot
    enablecdboot
    
    image=/vmlinux
            label=Linux
            read-only
            initrd=/initrd.img
    
    image=/vmlinux.old
            label=old
            read-only
            initrd=/initrd.img.old
    
    root@ststdebian01:/$ ybin -v
    
  20. Check the multipath status of the newly installed, now running system:

    root@ststdebian01:~$ multipath -ll
    
    mpatha (360050768019181279800000000000329) dm-0 AIX,VDASD
    size=36G features='1 queue_if_no_path' hwhandler='0' wp=rw
    `-+- policy='round-robin 0' prio=0 status=active
      |- 0:0:1:0 sda 8:0  active ready running
      `- 1:0:1:0 sdb 8:16 active ready running
    
    root@ststdebian01:~$ df -h
    
    Filesystem                Size  Used Avail Use% Mounted on
    rootfs                    961M  168M  745M  19% /
    udev                       10M     0   10M   0% /dev
    tmpfs                     401M  188K  401M   1% /run
    /dev/mapper/vg00-root     961M  168M  745M  19% /
    tmpfs                     5.0M     0  5.0M   0% /run/lock
    tmpfs                     801M     0  801M   0% /run/shm
    /dev/mapper/mpatha-part2  473M   28M  421M   7% /boot
    /dev/mapper/vg00-home     961M   18M  895M   2% /home
    /dev/mapper/vg00-srv       27G  172M   25G   1% /srv
    /dev/mapper/vg00-tmp      961M   18M  895M   2% /tmp
    /dev/mapper/vg00-usr      3.8G  380M  3.2G  11% /usr
    /dev/mapper/vg00-var      1.9G  191M  1.6G  11% /var
    
    root@ststdebian01:~$ mount | grep mapper
    
    /dev/mapper/vg00-root on / type ext4 (rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered)
    /dev/mapper/mpatha-part2 on /boot type ext4 (rw,relatime,user_xattr,barrier=1,data=ordered)
    /dev/mapper/vg00-home on /home type ext4 (rw,relatime,user_xattr,barrier=1,data=ordered)
    /dev/mapper/vg00-srv on /srv type ext4 (rw,relatime,user_xattr,barrier=1,data=ordered)
    /dev/mapper/vg00-tmp on /tmp type ext4 (rw,relatime,user_xattr,barrier=1,data=ordered)
    /dev/mapper/vg00-usr on /usr type ext4 (rw,relatime,user_xattr,barrier=1,data=ordered)
    /dev/mapper/vg00-var on /var type ext4 (rw,relatime,user_xattr,barrier=1,data=ordered)
    
    root@ststdebian01:~$ pvdisplay
    
      --- Physical volume ---
      PV Name               /dev/mapper/mpatha-part5
      VG Name               vg00
      PV Size               35.51 GiB / not usable 3.00 MiB
      Allocatable           yes (but full)
      PE Size               4.00 MiB
      Total PE              9091
      Free PE               0
      Allocated PE          9091
      PV UUID               gO0PTK-xi9b-H4om-BRgj-rEy2-NSzC-UokjEw
    

Although it is a rather manual and error-prone process, it is nonetheless possible to install Debian Wheezy with multipathing enabled from the start. Another and probably easier method would be to map the disk through only one VIO server during the installation phase, and to add the second mapping as well as the necessary multipath configuration after the installed system has been successfully brought up.
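
A minimal sketch of this second approach, assuming the installation was done with only one VIO server mapping and that host1 is the SCSI host representing the path added through the second VIO server afterwards (the host number and the exact package handling are assumptions, not part of the installation log above):

root@ststdebian01:/$ apt-get install multipath-tools multipath-tools-boot
root@ststdebian01:/$ update-initramfs -u -k all
root@ststdebian01:/$ echo "- - -" > /sys/class/scsi_host/host1/scan
root@ststdebian01:/$ multipath -ll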

Some of the manual steps above should IMHO be handled automatically by the Debian installer. For example, it eludes me why the packages multipath-tools and multipath-tools-boot aren't installed automatically. So there seems to be at least some room for improvement.

First tests with the Debian installer of the upcoming “Jessie” release were promising, but completing an install currently fails due to unresolved package dependencies. That'll be the subject of future posts … ;-)

// Ganglia Fibre Channel Power/Attenuation Monitoring on AIX and VIO Servers

Although usually only available upon request via IBM support, efc_power is quite the handy tool when it comes to debugging or narrowing down fibre channel link issues. It provides the transmit and receive power and attenuation values for a given FC port on an AIX or VIO server. Fortunately the output of efc_power:

$ /opt/freeware/bin/efc_power /dev/fscsi2
TX: 1232 -> 0.4658 mW, -3.32 dBm
RX: 10a9 -> 0.4265 mW, -3.70 dBm

is very parser-friendly, so it can easily be read by a script for further processing. In this case further processing means continuous Ganglia monitoring of the fibre channel transmit and receive power and attenuation values for each FC port on an AIX or VIO server. This is accomplished by the two RPM packages ganglia-addons-aix and ganglia-addons-aix-scripts:

RPM packages

Source RPM packages

Filename                                   Filesize  Last modified
ganglia-addons-aix-0.1-1.src.rpm           6.6 KiB   2013/07/30 09:41
ganglia-addons-aix-0.1-1.src.rpm.sha1sum   75.0 B    2013/07/30 09:41

The package ganglia-addons-aix-scripts is to be installed on the AIX or VIO server which has the FC adapter installed. It depends on the aaa_base package for the efc_power binary and on the ganglia-addons-base package, specifically on the cronjob (/opt/freeware/etc/run_parts/conf.d/ganglia-addons.sh) defined by this package. In the context of this cronjob all available scripts in the directory /opt/freeware/libexec/ganglia-addons/ are executed. For this specific Ganglia addon, an iteration over all fscsi devices in the system is done and efc_power is called for each fscsi device. Devices can be excluded by assigning a regex pattern to the BLACKLIST variable in the configuration file /opt/freeware/etc/ganglia-addons/ganglia-addons-efc_power.cfg. The output of each efc_power call is parsed and fed via the gmetric command into a Ganglia monitoring system that has to be set up already.
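
For illustration, here is a minimal sketch of how such a script might iterate over the fscsi devices and feed the values to gmetric – this is not the actual addon script; the metric names, the device enumeration via lsdev and the example BLACKLIST pattern are assumptions:

#!/bin/ksh
# Sketch: call efc_power for every fscsi device and feed the values to gmetric.
BLACKLIST="fscsi[45]"                          # example exclude pattern

lsdev -C -F name | grep '^fscsi' | while read DEV; do
    echo "${DEV}" | egrep -q "${BLACKLIST}" && continue
    # efc_power prints e.g.: "TX: 1232 -> 0.4658 mW, -3.32 dBm"
    /opt/freeware/bin/efc_power /dev/${DEV} | while read DIR RAW ARROW MW MWUNIT DBM DBUNIT; do
        DIR=${DIR%:}                           # "TX" or "RX"
        gmetric --name="fc_${DEV}_${DIR}_power" --value="${MW}"  --type=float --units="mW"
        gmetric --name="fc_${DEV}_${DIR}_dbm"   --value="${DBM}" --type=float --units="dBm"
    done
done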

The package ganglia-addons-aix is to be installed on the host running the Ganglia web interface. It contains templates for the customization of the FC power and attenuation metrics within the Ganglia Web 2 interface. See the README.templates file for further installation instructions. Here are samples of the two graphs created with those Ganglia monitoring templates:

Example of FC power and attenuation with a bad cable

In section “1” of the graphs, the receive attenuation on FC port fscsi2 was about -7.7 dBm, which means that of the 476.6 uW sent from the Brocade switchport:

$ sfpshow 1/10

Identifier:  3    SFP
Connector:   7    LC
Transceiver: 540c404000000000 200,400,800_MB/s M5,M6 sw Short_dist
Encoding:    1    8B10B
Baud Rate:   85   (units 100 megabaud)
Length 9u:   0    (units km)
Length 9u:   0    (units 100 meters)
Length 50u:  5    (units 10 meters)
Length 62.5u:2    (units 10 meters)
Length Cu:   0    (units 1 meter)
Vendor Name: BROCADE
Vendor OUI:  00:05:1e
Vendor PN:   57-1000012-01
Vendor Rev:  A
Wavelength:  850  (units nm)
Options:     003a Loss_of_Sig,Tx_Fault,Tx_Disable
BR Max:      0
BR Min:      0
Serial No:   UAF1112600001JW
Date Code:   110619
DD Type:     0x68
Enh Options: 0xfa
Status/Ctrl: 0x82
Alarm flags[0,1] = 0x5, 0x40
Warn Flags[0,1] = 0x5, 0x40
                                          Alarm                  Warn
                                      low        high       low         high
Temperature: 41      Centigrade     -10         90         -5          85
Current:     7.392   mAmps          1.000       17.000     2.000       14.000
Voltage:     3264.9  mVolts         2900.0      3700.0     3000.0      3600.0
RX Power:    -4.0    dBm (400.1 uW) 10.0   uW   1258.9 uW  15.8   uW   1000.0 uW
TX Power:    -3.2    dBm (476.6 uW) 125.9  uW   631.0  uW  158.5  uW   562.3  uW

only about 170 uW actually made it to the FC port fscsi2 on the VIO server. Section “2” shows even worse values during the time the FC connections and cables were checked, which basically means that the FC link was down during that time period. Section “3” shows the values after the bad cable was found and replaced. Receive attenuation on FC port fscsi2 went down to about -3.7 dBm, which means that of the now 473.6 uW sent from the Brocade switchport, 427.3 uW actually make it to the FC port fscsi2 on the VIO server.
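
As a side note, the dBm figures and the mW/uW figures above are related by P(mW) = 10^(dBm/10). A quick way to do the conversion on the command line (awk is only used as a calculator here):

$ echo "-3.7 -7.7" | awk '{ for (i=1; i<=NF; i++) printf "%s dBm = %.1f uW\n", $i, 10^($i/10)*1000 }'
-3.7 dBm = 426.6 uW
-7.7 dBm = 169.8 uW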

The goal with the continuous monitoring of the fibre channel transmit and receive power and attenuation values is to catch slowly deteriorating links early on, before they become a real issue or even a service interruption. As shown above, this can be accomplished with Ganglia and the two RPM packages ganglia-addons-aix and ganglia-addons-aix-scripts. For ad hoc checks, e.g. during the debugging of the components in a suspicious FC link, efc_power is still best called directly from the AIX or VIO server command line.

// Ganglia Performance Monitoring on IBM Power with AIX and LPM

Ganglia is a great open source performance monitoring tool. Like many others (e.g. Cacti, Munin, etc.) it uses RRDtool for consolidated data storage. Unlike others, the main focus of Ganglia is on efficiently monitoring large scale distributed environments. Ganglia can very easily be used to do performance monitoring in IBM Power environments running AIX, Linux/PPC and VIO servers. This is thanks to the great effort of Michael Perzl, who maintains pre-built Ganglia RPM packages for AIX which also cover some of the AIX specific metrics. Ganglia can also be utilized for application specific performance monitoring; since this is a very extensive subject, it'll be discussed in upcoming articles.

Since Ganglia is usually used in environments where there is a “grid” of one or more “clusters” containing a larger number of individual “hosts” to be monitored, it applies the same semantics to the way it builds its hierarchical views. Unfortunately there is no exact equivalent to those classification terms in the IBM Power world. I found the following mapping of terms to be the most useful:

Ganglia Terminology   IBM Power Terminology   Comment
Grid                  Grid                    Container entity of 1+ clusters or managed systems
Cluster               Managed System          Container entity of 1+ hosts or LPARs
Host                  Host, LPAR              Individual host running an OS

On the one hand this mapping makes sense, since you are usually interested either in the performance metrics of an individual host or in the relation between individual host metrics within the context of the managed system that is running those hosts. E.g. how much CPU time is an individual host using versus how much CPU time are all hosts on a managed system using, and how is it distributed among them.

On the other hand this mapping turns out to be a bit problematic, since Ganglia expects a rather static assignment of hosts to clusters. In a traditional HPC environment a host is seldom moved from one cluster to another, so the necessary configuration work remains a manageable amount of administrative overhead. In an IBM Power environment with the above mapping of terms applied, an LPAR can – even more so with the introduction of LPM – easily be moved between different clusters. To reduce the administrative overhead, the necessary configuration changes should be done automatically. Historical performance data of a host should be preserved, even when it moves between different clusters.

Also, by default Ganglia uses IP multicast for the necessary communication between the hosts and the Ganglia server. While this may be a viable method for large cluster setups in flat, unsegmented networks, it does not do so well in heavily segmented or firewalled network environments. Ganglia can be configured to use IP unicast for the communication between the hosts and the Ganglia server instead, but this also has some effect on the cluster-to-host mapping described above.
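
For reference, the multicast vs. unicast behaviour is controlled by the udp_send_channel and udp_recv_channel sections of gmond.conf. A minimal sketch of a unicast client-side send channel – the host name and port are just examples following the conventions introduced below – looks like this:

udp_send_channel {
  host = ganglia
  port = 8302
  ttl = 1
}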

The Ganglia setup described below will meet the following design criteria:

  • Use of IP unicast communication with predefined UDP ports.

  • A “firewall-friendly” behaviour with regard to the Ganglia network communication.

  • Automatic reconfiguration of Ganglia in case a LPAR is moved from one managed system to another.

  • Preservation of the historical performance data in case a LPAR is moved from one managed system to another.

Prerequisites

For a Ganglia setup to fulfill the above design goals, some prerequisites have to be met:

  1. A working basic Ganglia server, with the following Ganglia RPM packages installed:

    ganglia-gmetad-3.4.0-1
    ganglia-gmond-3.4.0-1
    ganglia-gweb-3.5.2-1
    ganglia-lib-3.4.0-1

    If the OS of the Ganglia server is also to be monitored as a Ganglia client, install whichever Ganglia modules you see fit. In my case e.g.:

    ganglia-mod_ibmame-3.4.0-1
    ganglia-p6-mod_ibmpower-3.4.0-1
  2. A working Ganglia client setup on the hosts to be monitored, with the following minimal Ganglia RPM packages installed:

    ganglia-gmond-3.4.0-1
    ganglia-lib-3.4.0-1

    In addition to the minimal Ganglia RPM packages, you can install any number of additional Ganglia modules you might need. In my case e.g. for all the fully virtualized LPARs:

    ganglia-mod_ibmame-3.4.0-1
    ganglia-p6-mod_ibmpower-3.4.0-1

    and for all the VIO servers and LPARs that have network and/or fibre channel hardware resources assigned:

    ganglia-gmond-3.4.0-1
    ganglia-lib-3.4.0-1
    ganglia-mod_ibmfc-3.4.0-1
    ganglia-mod_ibmnet-3.4.0-1
    ganglia-p6-mod_ibmpower-3.4.0-1
  3. A convention of which UDP ports will be used for the communication between the Ganglia clients and the Ganglia server. Each managed system in your IBM Power environment will get its own gmond process on the Ganglia server. For the Ganglia clients on a single managed system to be able to communicate with the Ganglia server, an individual and unique UDP port will be used.

  4. Model name, serial number and name of all managed systems in your IBM Power environment.

  5. If necessary, a set of firewall rules that allow communication between the Ganglia clients and the Ganglia server along the lines of the previously defined UDP port convention.

With the information from the above items 3 and 4, create a three-way mapping table like this:

Managed System Name   Model Name_Serial Number   Ganglia Unicast UDP Port
P550-DC1-R01          8204-E8A_0000001           8301
P770-DC1-R05          9117-MMB_0000002           8302
P770-DC2-R35          9117-MMB_0004711           8303

The information in this mapping table will be used in the following examples, so here's a more detailed explanation:

  • Three columns, “Managed System Name”, “Model Name_Serial Number” and “Ganglia Unicast UDP Port”.

  • Each row contains the information for one IBM Power managed system.

  • The field “Managed System Name” contains the name of the managed system, which is later on displayed in Ganglia. For ease of administration it should ideally be consistent with the name used in the HMC. In this example the naming convention is “Systemtype-Datacenter-Racknumber”.

  • The field “Model Name_Serial Number” contains the model and the S/N of the managed system, which are concatenated with “_” into a single string containing no whitespace. Model and S/N are substrings of the values that the command:

    $ lsattr  -El sys0 -a systemid -a modelname
    
    systemid  IBM,020000001 Hardware system identifier False
    modelname IBM,8204-E8A  Machine name               False
    

    reports. A short shell sketch showing how these values can be extracted and combined is given after this list.

  • The field “Ganglia Unicast UDP Port” contains the UDP port which was assigned to a specific IBM Power managed system by the convention mentioned above. In this example the UDP ports were simply allocated in an incremental, sequential order, starting at port 8301. It can be any UDP port that is available in your environment; you just have to be consistent about its use and careful to avoid duplicate assignments.
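
A minimal sketch of how the “Model Name_Serial Number” string can be derived on an LPAR – the exact substring handling (stripping the “IBM,” prefix from the model name and the leading “IBM,02” from the system identifier) is an assumption and should be verified against your own lsattr output:

    $ MODEL=$(lsattr -El sys0 -a modelname -F value | sed 's/^IBM,//')
    $ SERIAL=$(lsattr -El sys0 -a systemid -F value | sed 's/^IBM,02//')
    $ echo "${MODEL}_${SERIAL}"
    8204-E8A_0000001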

Configuration

  1. Create a filesystem and a directory hierarchy for the Ganglia RRD database files to be stored in. In this example the already existing /ganglia filesystem is used to create the following directories:

    /ganglia
    /ganglia/rrds
    /ganglia/rrds/LPARS
    /ganglia/rrds/P550-DC1-R01
    /ganglia/rrds/P770-DC1-R05
    /ganglia/rrds/P770-DC2-R35
    ...
    /ganglia/rrds/<Managed System Name>

    Make sure all the directories are owned by the user under which the gmond processes will be running. In my case this is the user nobody.

    Later on, the directory /ganglia/rrds/LPARS will contain subdirectories for each LPAR, which in turn contain the RRD files storing the performance data for each LPAR. The directories /ganglia/rrds/<Managed System Name> will only contain a symbolic link for each LPAR, pointing to the actual LPAR directory within /ganglia/rrds/LPARS, e.g.:

    $ ls -ald /ganglia/rrds/*/saixtest.mydomain
    
    drwxr-xr-x 2 nobody [...] /ganglia/rrds/LPARS/saixtest.mydomain
    lrwxrwxrwx 1 nobody [...] /ganglia/rrds/P550-DC1-R01/saixtest.mydomain -> ../LPARS/saixtest.mydomain
    lrwxrwxrwx 1 nobody [...] /ganglia/rrds/P770-DC1-R05/saixtest.mydomain -> ../LPARS/saixtest.mydomain
    lrwxrwxrwx 1 nobody [...] /ganglia/rrds/P770-DC2-R35/saixtest.mydomain -> ../LPARS/saixtest.mydomain
    

    In addition – and this is the actual reason for the whole symbolic link orgy – Ganglia will automatically create a subdirectory /ganglia/rrds/<Managed System Name>/__SummaryInfo__, which contains the RRD files used to generate the aggregated summary performance metrics of each managed system. If the above setup were simplified by placing the symbolic links one level higher in the directory hierarchy, each managed system would use the same __SummaryInfo__ subdirectory, rendering the RRD files in it useless for summary purposes.

    This directory and symbolic link setup might seem strange or even unnecessary at first, but it plays a major role in getting Ganglia to play along with LPM. So it's important to set this up accordingly.
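
    A minimal sketch of the initial directory setup, using the managed system names from the mapping table above and the user nobody:

    $ mkdir -p /ganglia/rrds/LPARS
    $ for MSYS in P550-DC1-R01 P770-DC1-R05 P770-DC2-R35; do
      mkdir -p /ganglia/rrds/${MSYS}
    done
    $ chown -R nobody /ganglia/rrds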

  2. Set up a gmond server process for each managed system in your IBM Power environment. In this example we need three processes for the three managed systems mentioned above. On the Ganglia server, create a gmond configuration file for each managed system whose LPARs will send performance data to the Ganglia server. Note that only the essential parts of the gmond configuration are shown in the following code snippets:

    /etc/ganglia/gmond-p550-dc1-r01.conf
    globals {
      ...
      mute = yes
      ...
    }
     
    cluster {
      name = "P550-DC1-R01"
      ...
    }
     
    udp_recv_channel {
      port = 8301
    }
     
    tcp_accept_channel {
      port = 8301
    }
    /etc/ganglia/gmond-p770-dc1-r05.conf
    globals {
      ...
      mute = yes
      ...
    }
     
    cluster {
      name = "P770-DC1-R05"
      ...
    }
     
    udp_recv_channel {
      port = 8302
    }
     
    tcp_accept_channel {
      port = 8302
    }
    /etc/ganglia/gmond-p770-dc2-r35.conf
    globals {
      ...
      mute = yes
      ...
    }
     
    cluster {
      name = "P770-DC2-R35"
      ...
    }
     
    udp_recv_channel {
      port = 8303
    }
     
    tcp_accept_channel {
      port = 8303
    }

    Copy the stock gmond init script for each gmond server process:

    $ for MSYS in p550-dc1-r01 p770-dc1-r05 p770-dc2-r35; do
      cp -pi /etc/rc.d/init.d/gmond /etc/rc.d/init.d/gmond-${MSYS}
    done
    

    and edit the following lines:

    /etc/rc.d/init.d/gmond-p550-dc1-r01
    ...
    PIDFILE=/var/run/gmond-p550-dc1-r01.pid
    ...
    GMOND_CONFIG=/etc/ganglia/gmond-p550-dc1-r01.conf
    ...
    /etc/rc.d/init.d/gmond-p770-dc1-r05
    ...
    PIDFILE=/var/run/gmond-p770-dc1-r05.pid
    ...
    GMOND_CONFIG=/etc/ganglia/gmond-p770-dc1-r05.conf
    ...
    /etc/rc.d/init.d/gmond-p770-dc2-r35
    ...
    PIDFILE=/var/run/gmond-p770-dc2-r35.pid
    ...
    GMOND_CONFIG=/etc/ganglia/gmond-p770-dc2-r35.conf
    ...

    and start the gmond server processes. Check for running processes and UDP ports being listened on:

    $ for MSYS in p550-dc1-r01 p770-dc1-r05 p770-dc2-r35; do
      /etc/rc.d/init.d/gmond-${MSYS} start
    done
    
    $ ps -ef | grep gmond-
    nobody  4260072 [...] /opt/freeware/sbin/gmond -c /etc/ganglia/gmond-p770-dc2-r35.conf -p /var/run/gmond-p770-dc2-r35.pid
    nobody  6094872 [...] /opt/freeware/sbin/gmond -c /etc/ganglia/gmond-p770-dc1-r05.conf -p /var/run/gmond-p770-dc1-r05.pid
    nobody  6291514 [...] /opt/freeware/sbin/gmond -c /etc/ganglia/gmond-p550-dc1-r01.conf -p /var/run/gmond-p550-dc1-r01.pid
    
    $ netstat -na | grep 830
    tcp4       0      0  *.8301   *.*   LISTEN
    tcp4       0      0  *.8302   *.*   LISTEN
    tcp4       0      0  *.8303   *.*   LISTEN
    udp4       0      0  *.8301   *.*
    udp4       0      0  *.8302   *.*
    udp4       0      0  *.8303   *.*
    

    If you want the gmond server processes to be started automatically on startup of the Ganglia server system, create the appropriate “S” and “K” symbolic links in /etc/rc.d/rc2.d/. E.g.:

    $ for MSYS in p550-dc1-r01 p770-dc1-r05 p770-dc2-r35; do
      ln -s /etc/rc.d/init.d/gmond-${MSYS} /etc/rc.d/rc2.d/Sgmond-${MSYS}
      ln -s /etc/rc.d/init.d/gmond-${MSYS} /etc/rc.d/rc2.d/Kgmond-${MSYS}
    done
    
  3. Set up the gmetad server process to query each gmond server process. On the Ganglia server, edit the gmetad configuration file:

    /etc/ganglia/gmetad.conf
    ...
    data_source "P550-DC1-R01" 60 localhost:8301
    data_source "P770-DC1-R05" 60 localhost:8302
    data_source "P770-DC2-R35" 60 localhost:8303
    ...
    rrd_rootdir "/ganglia/rrds"
    ...

    and restart the gmetad process.

  4. Create a DNS or /etc/hosts entry named ganglia which points to the IP address of your Ganglia server. Make sure this name is resolvable on all the Ganglia client LPARs in your environment.

  5. Install the aaa_base and ganglia-addons-base RPM packages on all the Ganglia client LPARs in your environment.

    RPM packages

    Filename                                           Filesize  Last modified
    ganglia-addons-base-0.1-5.ppc-aix5.1.rpm           10.4 KiB  2013/06/16 16:08
    ganglia-addons-base-0.1-5.ppc-aix5.1.rpm.sha1sum   83.0 B    2013/06/16 16:08

    Source RPM packages

    Filename                                    Filesize  Last modified
    ganglia-addons-base-0.1-5.src.rpm           10.2 KiB  2013/06/16 16:08
    ganglia-addons-base-0.1-5.src.rpm.sha1sum   76.0 B    2013/06/16 16:08

    The ganglia-addons-base package installs several extensions to the stock ganglia-gmond package, which are:

    • a modified gmond client init script /opt/freeware/libexec/ganglia-addons-base/gmond.

    • a configuration file /etc/ganglia/gmond.init.conf for the modified gmond client init script.

    • a configuration template file /etc/ganglia/gmond.tpl.

    • a run_parts script 800_ganglia_check_links.sh for the Ganglia server to perform a sanity check on the symbolic links in the /ganglia/rrds/<managed system>/ directories.

    Edit or create /etc/ganglia/gmond.init.conf to reflect your environment and roll it out to all the Ganglia client LPARs. E.g. with the example data from above:

    /etc/ganglia/gmond.init.conf
    # MSYS_ARRAY:
    MSYS_ARRAY["8204-E8A_0000001"]="P550-DC1-R01"
    MSYS_ARRAY["9117-MMB_0000002"]="P770-DC1-R05"
    MSYS_ARRAY["9117-MMB_0004711"]="P770-DC2-R35"
    #
    # GPORT_ARRAY:
    GPORT_ARRAY["P550-DC1-R01"]="8301"
    GPORT_ARRAY["P770-DC1-R05"]="8302"
    GPORT_ARRAY["P770-DC2-R35"]="8303"
    #
    # Exception for the ganglia host:
    G_HOST="ganglia"
    G_PORT_R="8399"

    Stop the currently running gmond client process, switch the gmond client init script to the modified version from ganglia-addons-base and start a new gmond client process:

    $ /etc/rc.d/init.d/gmond stop
    $ ln -sf /opt/freeware/libexec/ganglia-addons-base/gmond /etc/rc.d/init.d/gmond
    $ /etc/rc.d/init.d/gmond start
    

    The modified gmond client init script will now determine the model and S/N of the managed system the LPAR is running on. It will also evaluate the configuration from /etc/ganglia/gmond.init.conf and, in conjunction with the model and S/N of the managed system, determine values for the Ganglia configuration entries “cluster → name”, “host → location” and “udp_send_channel → port”. The new configuration values will be merged with the configuration template file /etc/ganglia/gmond.tpl and be written to a temporary gmond configuration file. If the temporary and current gmond configuration files differ, the current gmond configuration file will be overwritten by the temporary one and the gmond client process will be restarted. If the temporary and current gmond configuration files match, the temporary one will be deleted and no further action is taken.
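
    The following is only a rough sketch of that sequence, not the actual init script from ganglia-addons-base; the template placeholder names and the temporary file handling are assumptions:

    #!/usr/bin/ksh93
    # Rough sketch of the configuration logic in the modified gmond init script.
    typeset -A MSYS_ARRAY GPORT_ARRAY
    . /etc/ganglia/gmond.init.conf
    
    # Determine the managed system the LPAR currently runs on:
    MODEL=$(lsattr -El sys0 -a modelname -F value | sed 's/^IBM,//')
    SERIAL=$(lsattr -El sys0 -a systemid -F value | sed 's/^IBM,02//')
    CLUSTER=${MSYS_ARRAY[${MODEL}_${SERIAL}]}            # e.g. "P770-DC1-R05"
    PORT=${GPORT_ARRAY[${CLUSTER}]}                      # e.g. "8302"
    
    # Merge the determined values into the template and compare the result
    # with the currently active configuration:
    sed -e "s/__CLUSTER__/${CLUSTER}/" -e "s/__PORT__/${PORT}/" \
        /etc/ganglia/gmond.tpl > /tmp/gmond.conf.$$
    if cmp -s /tmp/gmond.conf.$$ /etc/ganglia/gmond.conf; then
        rm /tmp/gmond.conf.$$                            # nothing changed
    else
        mv /tmp/gmond.conf.$$ /etc/ganglia/gmond.conf    # new config, gmond gets restarted
    fi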

    The aaa_base package will add an errnotify ODM entry, which will call the shell script /opt/freeware/libexec/aaa_base/post_lpm.sh. This shell script will trigger actions upon the generation of errpt messages with the label CLIENT_PMIG_DONE, which signify the end of a successful LPM operation. Currently there are two actions performed in the shell script, one of which is to restart the gmond client process on the LPAR. Since this restart is done by calling the gmond client init script with the restart parameter, a new gmond configuration file will be created along the lines of the sequence explained in the previous paragraph. With the new configuration file, the freshly started gmond will now send its performance data to the gmond server process that is responsible for the managed system the LPAR is now running on.
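
    For reference, the resulting errnotify entry is in essence a stanza along the following lines – en_name and en_persistenceflg are assumptions here, the actual entry is created by the aaa_base package and can be inspected with odmget -q "en_label=CLIENT_PMIG_DONE" errnotify:

    errnotify:
            en_name = "post_lpm"
            en_persistenceflg = 1
            en_label = "CLIENT_PMIG_DONE"
            en_method = "/opt/freeware/libexec/aaa_base/post_lpm.sh"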

  6. After some time – usually a few minutes – new directories should start to appear under the /ganglia/rrds/<managed system>/ directories on the Ganglia server. To complete the setup those directories have to be moved to /ganglia/rrds/LPARS/ and be replaced by appropriate symbolic links. E.g.:

    $ cd /ganglia/rrds/
    $ HST="saixtest.mydomain"; for DR in P550-DC1-R01 P770-DC1-R05 P770-DC2-R35; do
      pushd ${DR}
      [[ -d ${HST} ]] && mv -i ${HST} ../LPARS/
      ln -s ../LPARS/${HST} ${HST}
      popd
    done
    

    There is a helper script 800_ganglia_check_links.sh in the ganglia-addons-base package which should be run daily to surface possible inconsistencies, but it can also be run interactively from a shell on the Ganglia server.

  7. If everything was set up correctly, the LPARs should now report to the Ganglia server based on which managed system they currently run on. This should be reflected in the Ganglia web interface by every LPAR being inserted under the appropriate managed system (aka cluster). After a successful LPM operation, the LPAR should – after a short delay – automatically be sorted under the new managed system (aka cluster) in the Ganglia web interface.

Debugging

If some problem arises or things generally do not go according to plan, here are some hints where to look for issues:

  • Check if a gmetad and all gmond server processes are running on the Ganglia server. Also check with netstat or lsof if the gmond server processes are listening on the correct UDP ports.

  • Check if the gmond configuration file on the Ganglia client LPARs is successfully written. Check if the content of /etc/ganglia/gmond.conf is accurate with regard to your environment. Especially check if the TCP/UDP ports on the Ganglia client LPAR match the ones on the Ganglia server.

  • Check if the Ganglia client LPARs can reach the Ganglia server (e.g. with ping). On the Ganglia server check if UDP packets are coming in from the LPAR in question (e.g. with tcpdump). On the Ganglia client LPAR, check if UDP packets are sent out to the correct Ganglia server IP address (e.g. with tcpdump; see the example commands after this list).

  • Check if the directory or symbolic link /ganglia/rrds/<managed system>/<hostname>/ exists. Check if there are RRD files in the directory. Check if they've recently been written.

  • Enable logging and increase the debug level on the Ganglia client and server processes, restart them and check for error messages.
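
Some example commands for those checks – the interface name, the UDP port and the debug level are placeholders and need to be adapted to the LPAR and managed system in question:

$ netstat -na | grep -E '830[123]'
$ tcpdump -n -i en0 udp port 8302
$ /opt/freeware/sbin/gmond -c /etc/ganglia/gmond.conf -d 5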

Conclusions

With the described configuration changes and procedures, the value of Ganglia as a performance monitoring tool for IBM Power environments can be increased even further. A more “firewall-friendly” Ganglia configuration is possible and has been used for several years in a heavily segmented and firewalled network environment. As new features like LPM are added to the IBM Power environment, increasing its overall flexibility and productivity, measures have to be taken to ensure that surrounding tools and infrastructure like Ganglia are up to the task as well.
