2013-10-06 // Live Partition Mobility (LPM) with Debian on an IBM Power LPAR - Part 1
Live partition mobility (LPM) in the IBM Power environment is roughly the same as vMotion in the VMware ESX world. It allows the movement of an LPAR from one IBM Power hardware system to another without (major) interruption to the system running within the LPAR. We use this feature on a regular basis for our AIX LPARs and it has simplified our work to a great extent, since we no longer need downtimes for a lot of our regular administrative work. LPM also works for LPARs running Linux as an OS, but since IBM only supports the SuSE and Red Hat enterprise distributions, the necessary service and productivity tools to successfully perform LPM (and also DLPAR) operations are not readily available to users of the Debian distribution. I still wanted to be able to do DLPAR and LPM operations on our Debian LPARs as well. To that effect, I converted the necessary RPM packages from the service and productivity tools mentioned before to the DEB package format. Most of the conversion work was done by the alien tool provided by the Debian distribution. Besides that, some manual patches were still necessary to adjust the components to the specifics of the Debian environment. A current version of the Linux kernel and a rebuild of the kernel package with some PPC specific options enabled were also necessary. Here are the individual steps:
Install the prerequisite Debian packages:
libstdc++5 ksh uuid libsgutils1 libsqlite3-0
$ apt-get install libstdc++5 ksh uuid libsgutils1 libsqlite3-0
The libsqlite3-0 package is currently only necessary for the libservicelog-1-1-1-32bit package. The packages libservicelog and servicelog, which also depend on libsqlite3-0, contain binaries and libraries that are built for a 64bit userland (ppc64), which is currently not available for Debian. Using the binaries or libraries from libservicelog or servicelog will therefore result in an error about unresolved symbols.
Convert or download the IBM service and productivity tools:
Option 1: Download the already converted IBM service and productivity tools:
Option 2: Convert the IBM service and productivity tools from RPM to DEB:
Install the prerequisite Debian packages:
alien
$ apt-get install alien
Download the necessary patch files:
Convert librtas:
$ alien -gc librtas-32bit-1.3.6-4.ppc64.rpm
$ patch -p 0 < librtas-32bit.patch
$ cd librtas-32bit-1.3.6
$ ./debian/rules binary
$ cd ..
$ alien -gc librtas-1.3.6-3.ppc64.rpm
$ patch -p 0 < librtas.patch
$ cd librtas-1.3.6
$ ./debian/rules binary
$ cd ..
Convert src:
$ alien -gc src-1.3.1.1-11277.ppc.rpm
$ patch -p 0 < src.patch
$ rm src-1.3.1.1/debian/postrm
$ cd src-1.3.1.1
$ perl -i -p -e 's/ppc64/powerpc/' ./debian/control
$ ./debian/rules binary
$ cd ..
Convert RSCT core and utils:
$ alien -gc rsct.core.utils-3.1.0.7-11277.ppc.rpm
$ patch -p 0 < rsct.core.utils.patch
$ cd rsct.core.utils-3.1.0.7
$ ./debian/rules binary
$ cd ..
$ alien -gc rsct.core-3.1.0.7-11277.ppc.rpm
$ patch -p 0 < rsct.core.patch
$ cd rsct.core-3.1.0.7
$ ./debian/rules binary
$ cd ..
Convert ServiceRM:
$ alien -gc devices.chrp.base.ServiceRM-2.3.0.0-11231.ppc.rpm
$ patch -p 0 < devices.chrp.base.ServiceRM.patch
$ cd devices.chrp.base.ServiceRM-2.3.0.0
$ ./debian/rules binary
$ cd ..
Convert DynamicRM:
$ alien -gc DynamicRM-1.3.9-7.ppc64.rpm
$ patch -p 0 < DynamicRM.patch
$ cd DynamicRM-1.3.9
$ ./debian/rules binary
$ cd ..
Convert lsvpd and libvpd:
$ alien -gc libvpd2-2.1.3-3.ppc64.rpm
$ patch -p 0 < libvpd.patch
$ cd libvpd2-2.1.3
$ ./debian/rules binary
$ cd ..
$ alien -gc lsvpd-1.6.11-4.ppc64.rpm
$ patch -p 0 < lsvpd.patch
$ cd lsvpd-1.6.11
$ ./debian/rules binary
$ cd ..
Build, package and/or install PowerPC Utils:
Install from sources at sourceforge.net or use the custom-built DEB package from:
Convert libservicelog and servicelog:
$ alien -gc libservicelog-1_1-1-1.1.11-9.ppc64.rpm
$ patch -p 0 < libservicelog-1_1-1.patch
$ cd libservicelog-1_1-1-1.1.11
$ ./debian/rules binary
$ cd ..
$ alien -gc libservicelog-1.1.11-9.ppc64.rpm
$ patch -p 0 < libservicelog.patch
$ cd libservicelog-1.1.11
$ ./debian/rules binary
$ cd ..
$ alien -gc libservicelog-1_1-1-32bit-1.1.11-9.ppc.rpm
$ patch -p 0 < libservicelog-1_1-1-32bit.patch
$ cd libservicelog-1_1-1-32bit-1.1.11
$ ./debian/rules binary
$ cd ..
$ alien -gc servicelog-1.1.9-8.ppc64.rpm
$ patch -p 0 < servicelog.patch
$ cd servicelog-1.1.9
$ ./debian/rules binary
$ cd ..
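Most of the conversion steps above follow the same four-command pattern, so they could be wrapped in a small helper. The following is only a sketch: the function names rpm_build_dir and convert_rpm and the DRY_RUN variable are made up for this example, and it assumes that alien names the generated directory after the RPM name and version, as it did for the packages listed here. The src package needs its extra manual steps and is not covered.

```shell
#!/bin/sh
# rpm_build_dir RPM: derive the directory name that "alien -gc" produced for
# the packages in this post, e.g. librtas-1.3.6-3.ppc64.rpm -> librtas-1.3.6.
# (Assumption: directory = RPM name and version, without release and arch.)
rpm_build_dir() {
    name=${1##*/}        # drop any leading path
    name=${name%.rpm}    # drop the .rpm suffix
    name=${name%.*}      # drop the architecture (.ppc or .ppc64)
    name=${name%-*}      # drop the RPM release number
    printf '%s\n' "$name"
}

# convert_rpm RPM PATCHFILE: run the four conversion steps for one package.
# With DRY_RUN=1 (a variable invented for this sketch) the commands are only
# printed, not executed.
convert_rpm() {
    rpm=$1
    patchfile=$2
    dir=$(rpm_build_dir "$rpm")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "alien -gc $rpm"
        echo "patch -p 0 < $patchfile"
        echo "(cd $dir && ./debian/rules binary)"
        return 0
    fi
    alien -gc "$rpm" &&
    patch -p 0 < "$patchfile" &&
    (cd "$dir" && ./debian/rules binary)
}

# Example calls (not executed here):
#   convert_rpm librtas-1.3.6-3.ppc64.rpm librtas.patch
#   convert_rpm libvpd2-2.1.3-3.ppc64.rpm libvpd.patch
```

The patch file is passed explicitly because its name cannot always be derived from the RPM name (libvpd2 uses libvpd.patch). The build runs in a subshell, so the working directory stays where the RPM and patch files are.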
Install the IBM service and productivity tools DEB packages:
$ dpkg -i librtas_1.3.6-4_powerpc.deb librtas-32bit_1.3.6-5_powerpc.deb \
    src_1.3.1.1-11278_powerpc.deb rsct.core.utils_3.1.0.7-11278_powerpc.deb \
    rsct.core_3.1.0.7-11278_powerpc.deb devices.chrp.base.servicerm_2.3.0.0-11232_powerpc.deb \
    dynamicrm_1.3.9-8_powerpc.deb libvpd2_2.1.3-4_powerpc.deb lsvpd_1.6.11-5_powerpc.deb \
    powerpc-ibm-utils_1.2.12-1_powerpc.deb libservicelog-1-1-1-32bit_1.1.11-10_powerpc.deb \
    libservicelog_1.1.11-10_powerpc.deb servicelog_1.1.9-9_powerpc.deb
Rebuild the stock Debian kernel package as described in HowTo Rebuild An Official Debian Kernel Package. I've confirmed that DLPAR and LPM work with at least the Debian kernel package versions 2.6.39-3~bpo60+1 and 3.2.46-1~bpo60+1. In the make menuconfig step, make sure the following kernel configuration options are selected:
CONFIG_MIGRATION=y
CONFIG_PPC_PSERIES=y
CONFIG_PPC_SPLPAR=y
CONFIG_LPARCFG=y
CONFIG_PPC_SMLPAR=y
CONFIG_PPC_RTAS=y
CONFIG_RTAS_PROC=y
CONFIG_NUMA=y
# CONFIG_SPARSEMEM_VMEMMAP is not set
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_RPA=y
CONFIG_HOTPLUG_PCI_RPA_DLPAR=y
Or use one of the following trimmed down kernel configuration files:
Install the newly built kernel package, reboot and select the new kernel to be loaded.
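After the reboot it can be worth confirming that the running kernel was really built with the options listed above. The following sketch greps a kernel configuration file for them. The function name check_kernel_config is made up for this example, and it assumes the configuration of the running kernel is available as /boot/config-$(uname -r), which is where Debian kernel packages normally put it. It only accepts options set to "y", as in the list above.

```shell
#!/bin/sh
# Options that must be set to "y", taken from the list in this post.
REQUIRED_OPTS="CONFIG_MIGRATION CONFIG_PPC_PSERIES CONFIG_PPC_SPLPAR
CONFIG_LPARCFG CONFIG_PPC_SMLPAR CONFIG_PPC_RTAS CONFIG_RTAS_PROC CONFIG_NUMA
CONFIG_MEMORY_HOTPLUG CONFIG_MEMORY_HOTPLUG_SPARSE CONFIG_MEMORY_HOTREMOVE
CONFIG_ARCH_MEMORY_PROBE CONFIG_HOTPLUG_PCI CONFIG_HOTPLUG_PCI_RPA
CONFIG_HOTPLUG_PCI_RPA_DLPAR"

# check_kernel_config FILE: print one line per problem found in FILE and
# return non-zero if there is any.
check_kernel_config() {
    cfg=$1
    rc=0
    for opt in $REQUIRED_OPTS; do
        if ! grep -q "^${opt}=y\$" "$cfg"; then
            echo "missing: ${opt}=y"
            rc=1
        fi
    done
    # CONFIG_SPARSEMEM_VMEMMAP has to stay disabled.
    if grep -q '^CONFIG_SPARSEMEM_VMEMMAP=y$' "$cfg"; then
        echo "unexpected: CONFIG_SPARSEMEM_VMEMMAP=y"
        rc=1
    fi
    return $rc
}

# Example call (not executed here):
#   check_kernel_config "/boot/config-$(uname -r)"
```

No output and a zero exit status mean that all listed options were found.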
A few minutes after the system has been started, DLPAR and LPM operations on the LPAR should be possible from the HMC. A good indication from the HMC GUI is a properly filled field in the “OS Version” column. From the HMC CLI you can check with:
$ lspartition -dlpar
...
<#108> Partition:<45*8231-E2D*06AB35T, ststnagios02.lan.ssbag, 10.8.32.46>
       Active:<1>, OS:<Linux/Debian, 3.2.0-0.bpo.4.ssb.1-powerUnknown, Unknown>, DCaps:<0x2c7f>, CmdCaps:<0x19, 0x19>, PinnedMem:<0>
...
The LPAR should show up in the output and the value of DCaps should be different from 0x0.
After a successful LPM operation the output of dmesg from within the OS should look like this:
...
[539043.613297] calling ibm,suspend-me on cpu 4
[539043.960651] EPOW <0x6240040000000b8 0x0 0x0>
[539043.960665] ibmvscsi 30000003: Re-enabling adapter!
[539043.961606] RTAS: event: 21, Type: EPOW, Severity: 1
[539043.962920] ibmvscsi 30000002: Re-enabling adapter!
[539044.175848] property parse failed in parse_next_property at line 230
[539044.485745] ibmvscsi 30000002: partner initialization complete
[539044.485815] ibmvscsi 30000002: host srp version: 16.a, host partition vios1-p730-222 (1), OS 3, max io 262144
[539044.485892] ibmvscsi 30000002: Client reserve enabled
[539044.485907] ibmvscsi 30000002: sent SRP login
[539044.485964] ibmvscsi 30000002: SRP_LOGIN succeeded
[539044.525723] ibmvscsi 30000003: partner initialization complete
[539044.525779] ibmvscsi 30000003: host srp version: 16.a, host partition vios2-p730-222 (2), OS 3, max io 262144
[539044.525884] ibmvscsi 30000003: Client reserve enabled
[539044.525897] ibmvscsi 30000003: sent SRP login
[539044.525943] ibmvscsi 30000003: SRP_LOGIN succeeded
[539044.884514] property parse failed in parse_next_property at line 230
...
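The DCaps check from the HMC CLI described above can also be scripted, for example on an admin host that captures the lspartition output via SSH. The following is a sketch only: the function names dcaps_report and dlpar_capable are made up, the field layout is assumed to match the sample output in this post, and the HMC user and host in the example call are placeholders.

```shell
#!/bin/sh
# dcaps_report: read "lspartition -dlpar" output on stdin and print
# "<hostname> <DCaps>" for every partition entry found.
dcaps_report() {
    awk '
        /Partition:</ {
            # The second comma-separated field inside Partition:<...>
            # is the host name of the LPAR.
            s = $0
            sub(/.*Partition:</, "", s)
            sub(/>.*/, "", s)
            split(s, f, /, */)
            host = f[2]
        }
        /DCaps:</ {
            d = $0
            sub(/.*DCaps:</, "", d)
            sub(/>.*/, "", d)
            print host, d
        }
    '
}

# dlpar_capable HOST: succeed if HOST is listed with a DCaps value other
# than 0x0 in the lspartition output read from stdin.
dlpar_capable() {
    dcaps_report |
        awk -v h="$1" '$1 == h && $2 != "0x0" { ok = 1 } END { exit ok ? 0 : 1 }'
}

# Example call (not executed here; user and HMC name are placeholders):
#   ssh hscroot@hmc.example.com lspartition -dlpar | dlpar_capable ststnagios02.lan.ssbag
```

The parser keeps the host name from the last Partition entry it has seen, so it does not matter whether the DCaps field is printed on the same line or on a following one.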
Although this was done some time ago and several new versions of the packages from the service and productivity tools have been released since then, DLPAR and LPM still work with this setup. There will be another installment of this post in the future with updated package versions. Another item on my to-do list is to provide those components of the service and productivity tools which are available as source code as native Debian packages.
2013-03-10 // AIX and VIOS DLPAR Operation fails on LHEA Adapters
After upgrading our VIOS from v2.2.1.4 (aka FixPack 25, ServicePack 2) to v2.2.2.1 (aka FixPack 26), I noticed that add/remove DLPAR operations on LHEA adapters would fail with the following error message being displayed at the HMC:
The dynamic logical partitioning operation failed. - <name of VIOS>
The dynamic logical partitioning requested could not be completed.
Logical Port 1 belonging to Port Group 1 of HEA 23000000 failed to be added.
Vary on of the LHEA failed. Please run the command rsthwres to recover the HEA configuration.
..build_tree
ROOT DIR=/usr/lib/dr/scripts
Syslog ch=DRMGR
cal_dr_scriptinfo_file_checksum : Checksum : 0x94b71f75720f0132
File read: string table
s_script:
file_name: 0x200bf758(/usr/lib/dr/scripts/all/IBM.CSMAgentRM_dr.sh)
script_version: 0x200bf7a8(2)
script_vendor_info: 0x200bf7b3(IBM)
script_creation_date: 0x200bf7aa(05252010)
script_info: 0x200bf785(DR script to manage IBM.CSMAgentRM)
s_resource(Before - offsets):
resource_name: 0x5f
resource_use_description: 0x69
s_resource:(After - ptrs)
resource_name: 0x200bf7b7(pmig)
resource_use_description: 0x200bf7c1(Partition migration for IBM.CSMAgentRM)
s_resource(Before - offsets):
resource_name: 0x64
resource_use_description: 0x90
s_resource:(After - ptrs)
resource_name: 0x200bf7bc(phib)
resource_use_description: 0x200bf7e8(Partition hibernation for IBM.CSMAgentRM)
s_script:
file_name: 0x200bf811(/usr/lib/dr/scripts/all/aud_acct_dr)
script_version: 0x200bf860(1)
script_vendor_info: 0x200bf86b(IBM)
script_creation_date: 0x200bf862(03232007)
script_info: 0x200bf835(WPAR DR Script for Auditing and Accounting)
s_resource(Before - offsets):
resource_name: 0x117
resource_use_description: 0x134
s_resource:(After - ptrs)
resource_name: 0x200bf86f(wmig-checkpoint)
resource_use_description: 0x200bf88c(Checkpoint of Auditing and Accounting within a WPAR)
s_resource(Before - offsets):
resource_name: 0x127
resource_use_description: 0x168
s_resource:(After - ptrs)
resource_name: 0x200bf87f(wmig-restart)
resource_use_description: 0x200bf8c0(Restart of Auditing and Accounting within a WPAR)
s_script:
file_name: 0x200bf8f1(/usr/lib/dr/scripts/all/ctrmc_MDdr)
script_version: 0x200bf949(2)
script_vendor_info: 0x200bf954(IBM)
script_creation_date: 0x200bf94b(05252010)
script_info: 0x200bf914(DR script to refresh Management Domain configuration)
s_resource(Before - offsets):
resource_name: 0x200
resource_use_description: 0x20a
s_resource:(After - ptrs)
resource_name: 0x200bf958(pmig)
resource_use_description: 0x200bf962(Partition migration for RSCT Management Domain)
s_resource(Before - offsets):
resource_name: 0x205
resource_use_description: 0x239
s_resource:(After - ptrs)
resource_name: 0x200bf95d(phib)
resource_use_description: 0x200bf991(Partition hibernation for RSCT Management Domain)
s_script:
file_name: 0x200bf9c2(/usr/lib/dr/scripts/all/viosdr)
script_version: 0x200bf9f1(1)
script_vendor_info: 0x200bf9fc(IBM Corp.)
script_creation_date: 0x200bf9f3(11192012)
script_info: 0x200bf9e1(VIOS DR Handler)
s_resource(Before - offsets):
resource_name: 0x2ae
resource_use_description: 0x2b3
s_resource:(After - ptrs)
resource_name: 0x200bfa06(slot)
resource_use_description: 0x200bfa0b(Virtual I/O Slot Handler)
s_script:
file_name: 0x200bfa24(/usr/lib/dr/scripts/all/wpar_drs)
script_version: 0x200bfa62(1)
script_vendor_info: 0x200bfa6d(IBM)
script_creation_date: 0x200bfa64(05312007)
script_info: 0x200bfa45(WPAR DR script for DR Events)
s_resource(Before - offsets):
resource_name: 0x319
resource_use_description: 0x335
s_resource:(After - ptrs)
resource_name: 0x200bfa71(cpu)
resource_use_description: 0x200bfa8d(Propagate CPU add/remove)
s_resource(Before - offsets):
resource_name: 0x31d
resource_use_description: 0x34e
s_resource:(After - ptrs)
resource_name: 0x200bfa75(mem)
resource_use_description: 0x200bfaa6(Propagate Memory add/remove)
s_resource(Before - offsets):
resource_name: 0x321
resource_use_description: 0x36a
s_resource:(After - ptrs)
resource_name: 0x200bfa79(capacity)
resource_use_description: 0x200bfac2(Propagate Capacity changes)
s_resource(Before - offsets):
resource_name: 0x32a
resource_use_description: 0x385
s_resource:(After - ptrs)
resource_name: 0x200bfa82(var_weight)
resource_use_description: 0x200bfadd(Propagate Var Weight changes)
..add_a_slot
do_scripts_if_DR_pre_setup: bp_resource:slot
invoke_DR_scripts: command:7, bp_resource:slot
Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (checkacquire)
invoke_one_DR_script: entry->command:7, timeout:20, script idx:3, resource name: slot
set_env_vars: entry: cmd : checkacquire
set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME=HEA 1
Parent: Waiting on the pipe for any info:out_line:0x2000f5f8, pipeinp: 0xf11674e8
Script returned string:(Execing:(DR_FORCE=FALSE DR_DRC_NAME=HEA 1 /usr/lib/dr/scripts/all/viosdr) with cmd:checkacquire, resource_name: slot)
out of fgets while loop
waiting for status
child returned status: 127
Script returned status: 127
Check acquire phase failed, rc:0x7f
HSCL297B
After wading through a lot of not really interesting output, the actual culprit for the failure can be found almost at the end of the output. The 7th line from the bottom, starting with “Script returned string: …”, shows that the value which is supposed to be assigned to the environment variable DR_DRC_NAME contains a space and is probably not correctly quoted. The shell tries, in this example, to execute the command “1”, which of course cannot be found, hence the exit status 127. So the correct command line should probably look something like this:
DR_FORCE="FALSE" DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr ...
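The effect of the missing quotes can be reproduced in any POSIX shell. The sketch below is an illustration only and not the actual code path in drslot_chrp_slot: the function names are made up, and env stands in for the viosdr script so that it runs anywhere.

```shell
#!/bin/sh
# "env" stands in for /usr/lib/dr/scripts/all/viosdr in this sketch.

# Unquoted, as in the failing log line: the shell ends the assignment at the
# space, sets DR_DRC_NAME=HEA and then looks for a command named "1".
run_unquoted() {
    sh -c 'DR_FORCE=FALSE DR_DRC_NAME=HEA 1 env' 2>/dev/null
}

# Quoted: the whole string "HEA 1" is assigned and "env" is the command.
run_quoted() {
    sh -c 'DR_FORCE=FALSE DR_DRC_NAME="HEA 1" env' | grep '^DR_DRC_NAME='
}

rc=0
run_unquoted || rc=$?
echo "unquoted: exit status $rc"
run_quoted
```

On a system without a command called 1 in the PATH, the unquoted variant ends with exit status 127, the same status the DR script invocation returned, while the quoted variant prints DR_DRC_NAME=HEA 1.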
I opened a PMR with IBM support and was given a modified drslot_chrp_slot binary, which contains a temporary fix for this issue. APAR IV36448 was opened to deal with this issue in future AIX and VIOS versions.
Although the DLPAR operation now works with the modified drslot_chrp_slot binary, the cfglog may still contain errors which look like this:
...
C4 4325452 15:04:29 drscript_main.c 544 Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (postacquire)
C4 4325452 15:04:29 drscript_main.c 546 invoke_one_DR_script: entry->command:9, timeout:20, script idx:3, resource name: slot
C4 4325452 15:04:29 drscript_main.c 1673 set_env_vars: entry: cmd : postacquire
C4 4325452 15:04:29 drscript_main.c 1719 set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 577 Environment var list: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 615 Parent: Waiting on the pipe for any info:out_line:0x20010398, pipeinp: 0xf06324e8
C4 4325452 15:04:29 drscript_main.c 782 Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot
C4 4325452 15:04:29 drscript_main.c 630 Script returned string: (Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot )
C4 4325452 15:04:29 drscript_main.c 690 out of fgets while loop
C4 4325452 15:04:29 drscript_main.c 697 waiting for status
C4 4325452 15:04:29 drscript_main.c 710 child returned status: 1
C4 4325452 15:04:29 drscript_main.c 711 Script returned status: 1
...
The problem here is with the /usr/lib/dr/scripts/all/viosdr script. Another PMR with IBM support was opened and according to the developer:
“Its (N.B.: viosdr) purpose is to identify virtual adapters that are added to the system with DLPAR and automatically run the correct cfgmgr method to configure them.
LHEA adapters are not currently supported by this method so it should have ignored them and just returned VIOSDR_SUCCESS (zero).”
Unfortunately there is currently no fix available for this issue.
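Since the DLPAR operation itself succeeds, this failure only shows up in the cfglog. A small filter can help to spot DR script invocations that ended with a non-zero status. This is a sketch under assumptions: the function name failed_dr_scripts is made up, the log format is assumed to match the excerpt above, and the example call assumes the cfglog is dumped with alog -t cfg -o.

```shell
#!/bin/sh
# failed_dr_scripts: read cfglog text on stdin and print every DR script
# invocation that ended with a non-zero "Script returned status".
failed_dr_scripts() {
    awk '
        /Invoking script \(/ {
            # Remember the most recent script path and command.
            inv = $0
            sub(/.*Invoking script /, "", inv)
        }
        /Script returned status: / {
            st = $0
            sub(/.*Script returned status: /, "", st)
            sub(/[ \t\r]+$/, "", st)
            if (st + 0 != 0) print inv " -> status " st
        }
    '
}

# Example call (not executed here):
#   alog -t cfg -o | failed_dr_scripts
```

For the excerpt above this would report the viosdr postacquire call with status 1.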