bityard Blog

// Live Partition Mobility (LPM) with Debian on an IBM Power LPAR - Part 1

Live partition mobility (LPM) in the IBM Power environment is roughly the equivalent of vMotion in the VMware ESX world. It allows an LPAR to be moved from one IBM Power hardware system to another without (major) interruption to the system running within the LPAR. We use this feature on a regular basis for our AIX LPARs, and it has simplified our work to a great extent, since we no longer need downtimes for a lot of our regular administrative work. LPM also works for LPARs running Linux, but since IBM only supports the SuSE and Red Hat enterprise distributions, the service and productivity tools needed to perform LPM (and also DLPAR) operations are not readily available to users of the Debian distribution. I still wanted to be able to do DLPAR and LPM operations on our Debian LPARs, so I converted the necessary RPM packages of the service and productivity tools to the DEB package format. Most of the conversion work was done by the alien tool provided by the Debian distribution. In addition, some manual patches were necessary to adjust the components to the specifics of the Debian environment. A current version of the Linux kernel and a rebuild of the kernel package with some PPC-specific options enabled were also necessary. Here are the individual steps:

  1. Install the prerequisite Debian packages:

    libstdc++5
    ksh
    uuid
    libsgutils1
    libsqlite3-0
    $ apt-get install libstdc++5 ksh uuid libsgutils1 libsqlite3-0

    The libsqlite3-0 package is currently only necessary for the libservicelog-1-1-1-32bit package. The packages libservicelog and servicelog, which also depend on libsqlite3-0, contain binaries and libraries that are built for a 64-bit userland (ppc64), which is currently not available for Debian. Using the binaries or libraries from libservicelog or servicelog will therefore result in an error about unresolved symbols.

  2. Convert or download the IBM service and productivity tools:

    Option 1: Download the already converted IBM service and productivity tools:

    Option 2: Convert the IBM service and productivity tools from RPM to DEB:

    1. Install the prerequisite Debian packages:

      alien
      $ apt-get install alien
    2. Download the necessary patch files:

    3. Convert librtas:

      $ alien -gc librtas-32bit-1.3.6-4.ppc64.rpm
      $ patch -p 0 < librtas-32bit.patch
      $ cd librtas-32bit-1.3.6
      $ ./debian/rules binary
      
      $ alien -gc librtas-1.3.6-3.ppc64.rpm
      $ patch -p 0 < librtas.patch
      $ cd librtas-1.3.6
      $ ./debian/rules binary
    4. Convert src:

      $ alien -gc src-1.3.1.1-11277.ppc.rpm
      $ patch -p 0 < src.patch
      $ rm src-1.3.1.1/debian/postrm
      $ cd src-1.3.1.1
      $ perl -i -p -e 's/ppc64/powerpc/' ./debian/control
      $ ./debian/rules binary
    5. Convert RSCT core and utils:

      $ alien -gc rsct.core.utils-3.1.0.7-11277.ppc.rpm
      $ patch -p 0 < rsct.core.utils.patch
      $ cd rsct.core.utils-3.1.0.7
      $ ./debian/rules binary
      
      $ alien -gc rsct.core-3.1.0.7-11277.ppc.rpm
      $ patch -p 0 < rsct.core.patch
      $ cd rsct.core-3.1.0.7
      $ ./debian/rules binary
    6. Convert ServiceRM:

      $ alien -gc devices.chrp.base.ServiceRM-2.3.0.0-11231.ppc.rpm
      $ patch -p 0 < devices.chrp.base.ServiceRM.patch
      $ cd devices.chrp.base.ServiceRM-2.3.0.0
      $ ./debian/rules binary
    7. Convert DynamicRM:

      $ alien -gc DynamicRM-1.3.9-7.ppc64.rpm
      $ patch -p 0 < DynamicRM.patch
      $ cd DynamicRM-1.3.9
      $ ./debian/rules binary
    8. Convert lsvpd and libvpd:

      $ alien -gc libvpd2-2.1.3-3.ppc64.rpm
      $ patch -p 0 < libvpd.patch
      $ cd libvpd2-2.1.3
      $ ./debian/rules binary
      
      $ alien -gc lsvpd-1.6.11-4.ppc64.rpm
      $ patch -p 0 < lsvpd.patch
      $ cd lsvpd-1.6.11
      $ ./debian/rules binary
    9. Build, package and/or install PowerPC Utils:

      Install from sources at sourceforge.net or use a custom-built DEB package from:

    10. Convert libservicelog and servicelog:

      $ alien -gc libservicelog-1_1-1-1.1.11-9.ppc64.rpm
      $ patch -p 0 < libservicelog-1_1-1.patch
      $ cd libservicelog-1_1-1-1.1.11
      $ ./debian/rules binary
      
      $ alien -gc libservicelog-1.1.11-9.ppc64.rpm
      $ patch -p 0 < libservicelog.patch
      $ cd libservicelog-1.1.11
      $ ./debian/rules binary
      
      $ alien -gc libservicelog-1_1-1-32bit-1.1.11-9.ppc.rpm
      $ patch -p 0 < libservicelog-1_1-1-32bit.patch
      $ cd libservicelog-1_1-1-32bit-1.1.11
      $ ./debian/rules binary
      
      $ alien -gc servicelog-1.1.9-8.ppc64.rpm
      $ patch -p 0 < servicelog.patch
      $ cd servicelog-1.1.9
      $ ./debian/rules binary
  3. Install the IBM service and productivity tools DEB packages:

    $ dpkg -i librtas_1.3.6-4_powerpc.deb librtas-32bit_1.3.6-5_powerpc.deb \
        src_1.3.1.1-11278_powerpc.deb rsct.core.utils_3.1.0.7-11278_powerpc.deb \
        rsct.core_3.1.0.7-11278_powerpc.deb devices.chrp.base.servicerm_2.3.0.0-11232_powerpc.deb \
        dynamicrm_1.3.9-8_powerpc.deb libvpd2_2.1.3-4_powerpc.deb lsvpd_1.6.11-5_powerpc.deb \
        powerpc-ibm-utils_1.2.12-1_powerpc.deb libservicelog-1-1-1-32bit_1.1.11-10_powerpc.deb \
        libservicelog_1.1.11-10_powerpc.deb servicelog_1.1.9-9_powerpc.deb
  4. Rebuild the stock Debian kernel package as described in HowTo Rebuild An Official Debian Kernel Package. I've confirmed that DLPAR and LPM work with at least the Debian kernel package versions 2.6.39-3~bpo60+1 and 3.2.46-1~bpo60+1. In the make menuconfig step, make sure the following kernel configuration options are selected:

    CONFIG_MIGRATION=y
    CONFIG_PPC_PSERIES=y
    CONFIG_PPC_SPLPAR=y
    CONFIG_LPARCFG=y
    CONFIG_PPC_SMLPAR=y
    CONFIG_PPC_RTAS=y
    CONFIG_RTAS_PROC=y
    CONFIG_NUMA=y
    # CONFIG_SPARSEMEM_VMEMMAP is not set
    CONFIG_MEMORY_HOTPLUG=y
    CONFIG_MEMORY_HOTPLUG_SPARSE=y
    CONFIG_MEMORY_HOTREMOVE=y
    CONFIG_ARCH_MEMORY_PROBE=y
    CONFIG_HOTPLUG_PCI=y
    CONFIG_HOTPLUG_PCI_RPA=y
    CONFIG_HOTPLUG_PCI_RPA_DLPAR=y

    Or use one of the following trimmed down kernel configuration files:

    Install the newly built kernel package, reboot and select the new kernel to be loaded.

  5. A few minutes after the system has been started, DLPAR and LPM operations on the LPAR should be possible from the HMC. A good indication in the HMC GUI is a properly filled field in the “OS Version” column. From the HMC CLI you can check with:

    $ lspartition -dlpar
    ...
    <#108> Partition:<45*8231-E2D*06AB35T, ststnagios02.lan.ssbag, 10.8.32.46>
           Active:<1>, OS:<Linux/Debian, 3.2.0-0.bpo.4.ssb.1-powerUnknown, Unknown>, DCaps:<0x2c7f>, CmdCaps:<0x19, 0x19>, PinnedMem:<0>
    ...

    The LPAR should show up in the output and the value of DCaps should be different from 0x0.

    After a successful LPM operation the output of dmesg from within the OS should look like this:

    ...
    [539043.613297] calling ibm,suspend-me on cpu 4
    [539043.960651] EPOW <0x6240040000000b8 0x0 0x0>
    [539043.960665] ibmvscsi 30000003: Re-enabling adapter!
    [539043.961606] RTAS: event: 21, Type: EPOW, Severity: 1
    [539043.962920] ibmvscsi 30000002: Re-enabling adapter!
    [539044.175848] property parse failed in parse_next_property at line 230
    [539044.485745] ibmvscsi 30000002: partner initialization complete
    [539044.485815] ibmvscsi 30000002: host srp version: 16.a, host partition vios1-p730-222 (1), OS 3, max io 262144
    [539044.485892] ibmvscsi 30000002: Client reserve enabled
    [539044.485907] ibmvscsi 30000002: sent SRP login
    [539044.485964] ibmvscsi 30000002: SRP_LOGIN succeeded
    [539044.525723] ibmvscsi 30000003: partner initialization complete
    [539044.525779] ibmvscsi 30000003: host srp version: 16.a, host partition vios2-p730-222 (2), OS 3, max io 262144
    [539044.525884] ibmvscsi 30000003: Client reserve enabled
    [539044.525897] ibmvscsi 30000003: sent SRP login
    [539044.525943] ibmvscsi 30000003: SRP_LOGIN succeeded
    [539044.884514] property parse failed in parse_next_property at line 230
    ...
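
With many partitions, the DCaps check from step 5 gets tedious to do by eye. The following is a minimal sketch of a filter that reads lspartition -dlpar style text on standard input and prints one line per partition with its DCaps value. The function name check_dcaps and the "NOT READY" label are made up for this example, and the parsing assumes the two-line record layout shown in the sample output above.

```shell
#!/bin/sh
# Sketch: summarize `lspartition -dlpar` style output read from stdin.
# Assumes each record is a "Partition:<id, name, ip>" line followed by a
# line containing "DCaps:<value>", as in the sample output above.
check_dcaps() {
    awk '
        /Partition:</ {
            # The second comma-separated field inside <...> is the name.
            line = $0
            sub(/^.*Partition:</, "", line)
            sub(/>.*$/, "", line)
            split(line, f, /, */)
            name = f[2]
        }
        /DCaps:</ {
            caps = $0
            sub(/^.*DCaps:</, "", caps)
            sub(/>.*$/, "", caps)
            # A value of 0x0 means the HMC sees no DLPAR capabilities.
            state = (caps == "0x0") ? "NOT READY" : "ok"
            printf "%s %s %s\n", name, caps, state
        }
    '
}
```

The filter only needs the text, so it can run on any machine that has a copy of the output, for example by saving the lspartition -dlpar output to a file and running check_dcaps < file.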

Although this was done some time ago and several new versions of the service and productivity tools packages have been released since, DLPAR and LPM still work with this setup. A future installment of this post will cover updated package versions. Another item on my ToDo list is to provide those components of the service and productivity tools that are available as source code as native Debian packages.
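
Since the package versions keep changing, the convert, patch and build sequence that repeats throughout step 2 can be wrapped in a small helper. This is only a sketch: the function name convert_rpm and its argument order are my own choice, not part of alien or the IBM tools. The build runs in a subshell, so the working directory is unchanged afterwards and the next conversion finds its RPM and patch files.

```shell
#!/bin/sh
# Sketch: wrap the alien / patch / build sequence from step 2.
# Arguments: <rpm file> <patch file> <directory generated by alien>.
# The directory is passed explicitly because its name differs from the
# RPM file name (the release and architecture suffixes are dropped).
convert_rpm() {
    rpm=$1 patchfile=$2 builddir=$3
    # Generate the build tree and convert the package scripts.
    alien -gc "$rpm" || return 1
    # Apply the Debian-specific adjustments from the top-level directory.
    patch -p 0 < "$patchfile" || return 1
    # Build in a subshell so the caller stays in the top-level directory.
    ( cd "$builddir" && ./debian/rules binary ) || return 1
}
```

A call for the 64-bit librtas package from step 2 would look like convert_rpm librtas-1.3.6-3.ppc64.rpm librtas.patch librtas-1.3.6. The src package still needs its extra rm and perl steps before the build, so it is not covered by this helper as written.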

// AIX and VIOS DLPAR Operation fails on LHEA Adapters

After upgrading our VIOS from v2.2.1.4 (aka FixPack 25, ServicePack 2) to v2.2.2.1 (aka FixPack 26), I noticed that add/remove DLPAR operations on LHEA adapters would fail with the following error message being displayed at the HMC:

The dynamic logical partitioning operation failed. - <name of VIOS>

The dynamic logical partitioning requested could not be completed.
Logical Port 1 belonging to Port Group 1 of HEA 23000000 failed to be added.
Vary on of the LHEA failed. Please run the command rsthwres to recover the HEA configuration.

..build_tree
ROOT DIR=/usr/lib/dr/scripts
Syslog ch=DRMGR
cal_dr_scriptinfo_file_checksum : Checksum : 0x94b71f75720f0132
File read: string table
s_script:
file_name: 0x200bf758(/usr/lib/dr/scripts/all/IBM.CSMAgentRM_dr.sh)
script_version: 0x200bf7a8(2)
script_vendor_info: 0x200bf7b3(IBM)
script_creation_date: 0x200bf7aa(05252010)
script_info: 0x200bf785(DR script to manage IBM.CSMAgentRM)
s_resource(Before - offsets):
resource_name: 0x5f
resource_use_description: 0x69
s_resource:(After - ptrs)
resource_name: 0x200bf7b7(pmig)
resource_use_description: 0x200bf7c1(Partition migration for IBM.CSMAgentRM)
s_resource(Before - offsets):
resource_name: 0x64
resource_use_description: 0x90
s_resource:(After - ptrs)
resource_name: 0x200bf7bc(phib)
resource_use_description: 0x200bf7e8(Partition hibernation for IBM.CSMAgentRM)
s_script:
file_name: 0x200bf811(/usr/lib/dr/scripts/all/aud_acct_dr)
script_version: 0x200bf860(1)
script_vendor_info: 0x200bf86b(IBM)
script_creation_date: 0x200bf862(03232007)
script_info: 0x200bf835(WPAR DR Script for Auditing and Accounting)
s_resource(Before - offsets):
resource_name: 0x117
resource_use_description: 0x134
s_resource:(After - ptrs)
resource_name: 0x200bf86f(wmig-checkpoint)
resource_use_description: 0x200bf88c(Checkpoint of Auditing and Accounting within a WPAR)
s_resource(Before - offsets):
resource_name: 0x127
resource_use_description: 0x168
s_resource:(After - ptrs)
resource_name: 0x200bf87f(wmig-restart)
resource_use_description: 0x200bf8c0(Restart of Auditing and Accounting within a WPAR)
s_script:
file_name: 0x200bf8f1(/usr/lib/dr/scripts/all/ctrmc_MDdr)
script_version: 0x200bf949(2)
script_vendor_info: 0x200bf954(IBM)
script_creation_date: 0x200bf94b(05252010)
script_info: 0x200bf914(DR script to refresh Management Domain configuration)
s_resource(Before - offsets):
resource_name: 0x200
resource_use_description: 0x20a
s_resource:(After - ptrs)
resource_name: 0x200bf958(pmig)
resource_use_description: 0x200bf962(Partition migration for RSCT Management Domain)
s_resource(Before - offsets):
resource_name: 0x205
resource_use_description: 0x239
s_resource:(After - ptrs)
resource_name: 0x200bf95d(phib)
resource_use_description: 0x200bf991(Partition hibernation for RSCT Management Domain)
s_script:
file_name: 0x200bf9c2(/usr/lib/dr/scripts/all/viosdr)
script_version: 0x200bf9f1(1)
script_vendor_info: 0x200bf9fc(IBM Corp.)
script_creation_date: 0x200bf9f3(11192012)
script_info: 0x200bf9e1(VIOS DR Handler)
s_resource(Before - offsets):
resource_name: 0x2ae
resource_use_description: 0x2b3
s_resource:(After - ptrs)
resource_name: 0x200bfa06(slot)
resource_use_description: 0x200bfa0b(Virtual I/O Slot Handler)
s_script:
file_name: 0x200bfa24(/usr/lib/dr/scripts/all/wpar_drs)
script_version: 0x200bfa62(1)
script_vendor_info: 0x200bfa6d(IBM)
script_creation_date: 0x200bfa64(05312007)
script_info: 0x200bfa45(WPAR DR script for DR Events)
s_resource(Before - offsets):
resource_name: 0x319
resource_use_description: 0x335
s_resource:(After - ptrs)
resource_name: 0x200bfa71(cpu)
resource_use_description: 0x200bfa8d(Propagate CPU add/remove)
s_resource(Before - offsets):
resource_name: 0x31d
resource_use_description: 0x34e
s_resource:(After - ptrs)
resource_name: 0x200bfa75(mem)
resource_use_description: 0x200bfaa6(Propagate Memory add/remove)
s_resource(Before - offsets):
resource_name: 0x321
resource_use_description: 0x36a
s_resource:(After - ptrs)
resource_name: 0x200bfa79(capacity)
resource_use_description: 0x200bfac2(Propagate Capacity changes)
s_resource(Before - offsets):
resource_name: 0x32a
resource_use_description: 0x385
s_resource:(After - ptrs)
resource_name: 0x200bfa82(var_weight)
resource_use_description: 0x200bfadd(Propagate Var Weight changes)
..add_a_slot
do_scripts_if_DR_pre_setup: bp_resource:slot
invoke_DR_scripts: command:7, bp_resource:slot
Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (checkacquire)
invoke_one_DR_script: entry->command:7, timeout:20, script idx:3, resource name: slot
set_env_vars: entry: cmd : checkacquire
set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME=HEA 1
Parent: Waiting on the pipe for any info:out_line:0x2000f5f8, pipeinp: 0xf11674e8
Script returned string:(Execing:(DR_FORCE=FALSE DR_DRC_NAME=HEA 1 /usr/lib/dr/scripts/all/viosdr) with cmd:checkacquire, resource_name: slot)
out of fgets while loop
waiting for status
child returned status: 127
Script returned status: 127
Check acquire phase failed, rc:0x7f

HSCL297B

After wading through a lot of not really interesting output, the actual culprit for the failure can be found almost at the end of the output. The 7th line from the bottom, starting with Script returned string: …, shows that the value which is supposed to be assigned to the environment variable DR_DRC_NAME contains a space and is probably not quoted correctly. In this example the shell therefore tries to execute the command “1”, which of course cannot be found, hence the exit status 127. So the correct command line should probably look something like this:

DR_FORCE="FALSE" DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr ...
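
The effect is easy to reproduce in any POSIX shell. The sketch below uses true as a harmless stand-in for the viosdr script. That the DR framework hands the string to a shell in exactly this way is an assumption based on the log output, not something I have seen in its source.

```shell
#!/bin/sh
# Sketch: reproduce the word-splitting problem with a harmless stand-in.
# "true" takes the place of /usr/lib/dr/scripts/all/viosdr.

# Unquoted: the shell treats DR_DRC_NAME=HEA as an assignment and then
# tries to run a command named "1".
unquoted_status=0
sh -c 'DR_FORCE=FALSE DR_DRC_NAME=HEA 1 true' 2>/dev/null || unquoted_status=$?

# Quoted: both assignments end up in the environment of the real command.
quoted_status=0
sh -c 'DR_FORCE="FALSE" DR_DRC_NAME="HEA 1" true' || quoted_status=$?

echo "unquoted: $unquoted_status, quoted: $quoted_status"
```

On a system without a command called 1 in the PATH, the unquoted variant exits with status 127, the same value the HMC output reports for the checkacquire phase, while the quoted variant exits with 0.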

I opened a PMR with IBM support and was given a modified drslot_chrp_slot binary, which contains a temporary fix for this issue. APAR IV36448 was opened to deal with this issue in future AIX and VIOS versions.

Although the DLPAR operation now works with the modified drslot_chrp_slot binary, the cfglog may still contain errors which look like this:

...
C4 4325452 15:04:29 drscript_main.c 544 Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (postacquire)
C4 4325452 15:04:29 drscript_main.c 546 invoke_one_DR_script: entry->command:9, timeout:20, script idx:3, resource name: slot
C4 4325452 15:04:29 drscript_main.c 1673 set_env_vars: entry: cmd : postacquire
C4 4325452 15:04:29 drscript_main.c 1719 set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 577 Environment var list: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 615 Parent: Waiting on the pipe for any info:out_line:0x20010398, pipeinp: 0xf06324e8
C4 4325452 15:04:29 drscript_main.c 782 Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1"  /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot
C4 4325452 15:04:29 drscript_main.c 630 Script returned string: (Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot )
C4 4325452 15:04:29 drscript_main.c 690 out of fgets while loop
C4 4325452 15:04:29 drscript_main.c 697 waiting for status
C4 4325452 15:04:29 drscript_main.c 710 child returned status: 1
C4 4325452 15:04:29 drscript_main.c 711 Script returned status: 1
...

The problem here is with the /usr/lib/dr/scripts/all/viosdr script. Another PMR with IBM support was opened and according to the developer:

“Its (N.B.: viosdr) purpose is to identify virtual adapters that are added to the system with DLPAR and automatically run the correct cfgmgr method to configure them.
LHEA adapters are not currently supported by this method so it should have ignored them and just returned VIOSDR_SUCCESS (zero).”

Unfortunately there is currently no fix available for this issue.
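
Until a fix is available, it can help to pick the failing script invocations out of the log noise. The following sketch is a filter that reads log text on standard input and prints every DR script call that returned a non-zero status. The function name failed_dr_scripts is made up for this example, and the parsing assumes the "Invoking script (...) with command (...)" and "Script returned status: N" lines appear as in the excerpts above.

```shell
#!/bin/sh
# Sketch: list DR script invocations with a non-zero exit status.
# Reads cfglog or HMC error text from stdin; the line formats are assumed
# to match the excerpts shown in this post.
failed_dr_scripts() {
    awk '
        /Invoking script \(/ {
            # Remember which script and command the next status belongs to.
            script = $0
            sub(/^.*Invoking script \(/, "", script)
            sub(/\).*$/, "", script)
            cmd = $0
            sub(/^.*with command \(/, "", cmd)
            sub(/\).*$/, "", cmd)
        }
        /Script returned status: / {
            status = $0
            sub(/^.*Script returned status: */, "", status)
            if (status + 0 != 0)
                printf "%s %s status=%d\n", script, cmd, status
        }
    '
}
```

On AIX and VIOS the cfglog can usually be dumped with alog -t cfg -o, so something like alog -t cfg -o | failed_dr_scripts should work from a root shell. The same filter also accepts the error text displayed at the HMC, since the relevant lines there differ only in the missing prefix columns.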
