After upgrading our VIOS from v2.2.1.4 (aka FixPack 25, ServicePack 2) to v2.2.2.1 (aka FixPack 26), I noticed that add/remove DLPAR operations on LHEA adapters would fail, with the following error message displayed on the HMC:

The dynamic logical partitioning operation failed. - <name of VIOS>

The dynamic logical partitioning requested could not be completed.
Logical Port 1 belonging to Port Group 1 of HEA 23000000 failed to be added.
Vary on of the LHEA failed. Please run the command rsthwres to recover the HEA configuration.

do_scripts_if_DR_pre_setup: bp_resource:slot
invoke_DR_scripts: command:7, bp_resource:slot
Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (checkacquire)
invoke_one_DR_script: entry->command:7, timeout:20, script idx:3, resource name: slot
set_env_vars: entry: cmd : checkacquire
set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME=HEA 1
Parent: Waiting on the pipe for any info:out_line:0x2000f5f8, pipeinp: 0xf11674e8
Script returned string:(Execing:(DR_FORCE=FALSE DR_DRC_NAME=HEA 1 /usr/lib/dr/scripts/all/viosdr) with cmd:checkacquire, resource_name: slot)
out of fgets while loop
waiting for status
child returned status: 127
Script returned status: 127
Check acquire phase failed, rc:0x7f


After wading through a lot of not particularly interesting output, the actual culprit for the failure can be found near the end. The line starting with Script returned string: …, a few lines from the bottom, shows that the value assigned to the environment variable DR_DRC_NAME contains a space and is apparently not quoted. In this example the shell therefore takes DR_DRC_NAME=HEA as the assignment and tries to execute the command “1”, which of course cannot be found. This matches the returned status 127 (rc:0x7f), the shell's exit code for a command that was not found. So the correct command line should probably look something like this:

DR_FORCE="FALSE" DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr ...
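The effect of the missing quotes can be reproduced on any POSIX shell. The following is only a sketch: the temporary script is a hypothetical stand-in for viosdr that merely prints the variable it receives, and the command lines are modelled on the log output above rather than taken from the drslot_chrp_slot code.

```shell
#!/bin/sh
# Hypothetical stand-in for the DR script: it only echoes DR_DRC_NAME.
script=$(mktemp)
printf 'echo "DR_DRC_NAME=$DR_DRC_NAME"\n' > "$script"

# Unquoted value: "DR_DRC_NAME=HEA" is the assignment and "1" becomes
# the command name, so the shell fails with "command not found".
sh -c "DR_FORCE=FALSE DR_DRC_NAME=HEA 1 sh $script checkacquire" 2>/dev/null
unquoted_rc=$?

# Quoted value: the whole string "HEA 1" ends up in DR_DRC_NAME and
# the stand-in script is actually executed.
quoted_out=$(sh -c "DR_FORCE=\"FALSE\" DR_DRC_NAME=\"HEA 1\" sh $script checkacquire")
quoted_rc=$?

echo "unquoted: rc=$unquoted_rc"
echo "quoted:   rc=$quoted_rc output=$quoted_out"
rm -f "$script"
```

The unquoted variant returns 127, the same status that shows up in the DLPAR error output, while the quoted variant runs the script and passes the complete DRC name.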

I opened a PMR with IBM support and was given a modified drslot_chrp_slot binary, which contains a temporary fix for this issue. APAR IV36448 was opened to address the issue in future AIX and VIOS versions.

Although the DLPAR operation now works with the modified drslot_chrp_slot binary, the cfglog may still contain errors which look like this:

C4 4325452 15:04:29 drscript_main.c 544 Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (postacquire)
C4 4325452 15:04:29 drscript_main.c 546 invoke_one_DR_script: entry->command:9, timeout:20, script idx:3, resource name: slot
C4 4325452 15:04:29 drscript_main.c 1673 set_env_vars: entry: cmd : postacquire
C4 4325452 15:04:29 drscript_main.c 1719 set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 577 Environment var list: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 615 Parent: Waiting on the pipe for any info:out_line:0x20010398, pipeinp: 0xf06324e8
C4 4325452 15:04:29 drscript_main.c 782 Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1"  /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot
C4 4325452 15:04:29 drscript_main.c 630 Script returned string: (Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot )
C4 4325452 15:04:29 drscript_main.c 690 out of fgets while loop
C4 4325452 15:04:29 drscript_main.c 697 waiting for status
C4 4325452 15:04:29 drscript_main.c 710 child returned status: 1
C4 4325452 15:04:29 drscript_main.c 711 Script returned status: 1
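To spot such entries without reading the whole log, a small awk filter can pair each "Invoking script" line with the status that follows it. This is a minimal sketch that assumes the line layout shown in the excerpt above; the sample input is a shortened copy of those lines, not output from a live system.

```shell
#!/bin/sh
# Sample lines copied from the cfglog excerpt above (shortened).
log='C4 4325452 15:04:29 drscript_main.c 544 Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (postacquire)
C4 4325452 15:04:29 drscript_main.c 710 child returned status: 1
C4 4325452 15:04:29 drscript_main.c 711 Script returned status: 1'

# Remember the most recent "Invoking script" line and print it
# whenever the following "Script returned status" is non-zero.
failures=$(printf '%s\n' "$log" | awk '
  /Invoking script/        { last = $0; sub(/^.*Invoking script /, "", last) }
  /Script returned status:/ { if ($NF != 0) print last " -> status " $NF }
')
echo "$failures"
```

On a real system the sample string would be replaced by the actual cfglog contents piped into the same awk program.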

The problem here is with the /usr/lib/dr/scripts/all/viosdr script. Another PMR with IBM support was opened and according to the developer:

“Its (N.B.: viosdr) purpose is to identify virtual adapters that are added to the system with DLPAR and automatically run the correct cfgmgr method to configure them.
LHEA adapters are not currently supported by this method so it should have ignored them and just returned VIOSDR_SUCCESS (zero).”

Unfortunately there is currently no fix available for this issue.


No. 1 @ 2013/04/01 12:54

Nice entry on VIOS upgrade surprises, along with detailed analysis and outlook!

We recently upgraded to FP26 (v2.2.2.1) and found the lsmap command dying:

Command did not complete.

“Determine if backing device is a PV or LV or optical.” was last subcommand run. Determining device type failed.

This seems to be IV36334 - LSMAP THROWS “COMMAND NOT COMPLETE ERROR” MSGS, but no fix is available, and FP26-SP1 (v2.2.2.2) has the same problem. We were able to work around this for now by mounting (dummy) media in the file-backed optical VTD. Alternatively, you could delete the virtual DVD device if the client LPAR does not need it.

