2013-03-10 // AIX and VIOS DLPAR Operation fails on LHEA Adapters
After upgrading our VIOS from v2.2.1.4 (aka FixPack 25, ServicePack 2) to v2.2.2.1 (aka FixPack 26), I noticed that add/remove DLPAR operations on LHEA adapters would fail with the following error message displayed on the HMC:
The dynamic logical partitioning operation failed. - <name of VIOS> The dynamic logical partitioning requested could not be completed. Logical Port 1 belonging to Port Group 1 of HEA 23000000 failed to be added. Vary on of the LHEA failed. Please run the command rsthwres to recover the HEA configuration. ..build_tree ROOT DIR=/usr/lib/dr/scripts Syslog ch=DRMGR cal_dr_scriptinfo_file_checksum : Checksum : 0x94b71f75720f0132 File read: string table s_script: file_name: 0x200bf758(/usr/lib/dr/scripts/all/IBM.CSMAgentRM_dr.sh) script_version: 0x200bf7a8(2) script_vendor_info: 0x200bf7b3(IBM) script_creation_date: 0x200bf7aa(05252010) script_info: 0x200bf785(DR script to manage IBM.CSMAgentRM) s_resource(Before - offsets): resource_name: 0x5f resource_use_description: 0x69 s_resource:(After - ptrs) resource_name: 0x200bf7b7(pmig) resource_use_description: 0x200bf7c1(Partition migration for IBM.CSMAgentRM) s_resource(Before - offsets): resource_name: 0x64 resource_use_description: 0x90 s_resource:(After - ptrs) resource_name: 0x200bf7bc(phib) resource_use_description: 0x200bf7e8(Partition hibernation for IBM.CSMAgentRM) s_script: file_name: 0x200bf811(/usr/lib/dr/scripts/all/aud_acct_dr) script_version: 0x200bf860(1) script_vendor_info: 0x200bf86b(IBM) script_creation_date: 0x200bf862(03232007) script_info: 0x200bf835(WPAR DR Script for Auditing and Accounting) s_resource(Before - offsets): resource_name: 0x117 resource_use_description: 0x134 s_resource:(After - ptrs) resource_name: 0x200bf86f(wmig-checkpoint) resource_use_description: 0x200bf88c(Checkpoint of Auditing and Accounting within a WPAR) s_resource(Before - offsets): resource_name: 0x127 resource_use_description: 0x168 s_resource:(After - ptrs) resource_name: 0x200bf87f(wmig-restart) resource_use_description: 0x200bf8c0(Restart of Auditing and Accounting within a WPAR) s_script: file_name: 0x200bf8f1(/usr/lib/dr/scripts/all/ctrmc_MDdr) script_version: 0x200bf949(2) script_vendor_info: 0x200bf954(IBM) 
script_creation_date: 0x200bf94b(05252010) script_info: 0x200bf914(DR script to refresh Management Domain configuration) s_resource(Before - offsets): resource_name: 0x200 resource_use_description: 0x20a s_resource:(After - ptrs) resource_name: 0x200bf958(pmig) resource_use_description: 0x200bf962(Partition migration for RSCT Management Domain) s_resource(Before - offsets): resource_name: 0x205 resource_use_description: 0x239 s_resource:(After - ptrs) resource_name: 0x200bf95d(phib) resource_use_description: 0x200bf991(Partition hibernation for RSCT Management Domain) s_script: file_name: 0x200bf9c2(/usr/lib/dr/scripts/all/viosdr) script_version: 0x200bf9f1(1) script_vendor_info: 0x200bf9fc(IBM Corp.) script_creation_date: 0x200bf9f3(11192012) script_info: 0x200bf9e1(VIOS DR Handler) s_resource(Before - offsets): resource_name: 0x2ae resource_use_description: 0x2b3 s_resource:(After - ptrs) resource_name: 0x200bfa06(slot) resource_use_description: 0x200bfa0b(Virtual I/O Slot Handler) s_script: file_name: 0x200bfa24(/usr/lib/dr/scripts/all/wpar_drs) script_version: 0x200bfa62(1) script_vendor_info: 0x200bfa6d(IBM) script_creation_date: 0x200bfa64(05312007) script_info: 0x200bfa45(WPAR DR script for DR Events) s_resource(Before - offsets): resource_name: 0x319 resource_use_description: 0x335 s_resource:(After - ptrs) resource_name: 0x200bfa71(cpu) resource_use_description: 0x200bfa8d(Propagate CPU add/remove) s_resource(Before - offsets): resource_name: 0x31d resource_use_description: 0x34e s_resource:(After - ptrs) resource_name: 0x200bfa75(mem) resource_use_description: 0x200bfaa6(Propagate Memory add/remove) s_resource(Before - offsets): resource_name: 0x321 resource_use_description: 0x36a s_resource:(After - ptrs) resource_name: 0x200bfa79(capacity) resource_use_description: 0x200bfac2(Propagate Capacity changes) s_resource(Before - offsets): resource_name: 0x32a resource_use_description: 0x385 s_resource:(After - ptrs) resource_name: 0x200bfa82(var_weight) 
resource_use_description: 0x200bfadd(Propagate Var Weight changes)
..add_a_slot
do_scripts_if_DR_pre_setup: bp_resource:slot
invoke_DR_scripts: command:7, bp_resource:slot
Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (checkacquire)
invoke_one_DR_script: entry->command:7, timeout:20, script idx:3, resource name: slot
set_env_vars: entry: cmd : checkacquire
set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME=HEA 1
Parent: Waiting on the pipe for any info:out_line:0x2000f5f8, pipeinp: 0xf11674e8
Script returned string:(Execing:(DR_FORCE=FALSE DR_DRC_NAME=HEA 1 /usr/lib/dr/scripts/all/viosdr) with cmd:checkacquire, resource_name: slot)
out of fgets while loop
waiting for status
child returned status: 127
Script returned status: 127
Check acquire phase failed, rc:0x7f
HSCL297B
After wading through a lot of not particularly interesting output, the actual culprit for the failure can be found almost at the end. The 7th line from the bottom, starting with Script returned string: …, shows that the value which is supposed to be assigned to the environment variable DR_DRC_NAME contains a space and is apparently not quoted. As a result the shell assigns only HEA to DR_DRC_NAME and, in this example, tries to execute the command “1”, which of course cannot be found. This matches the returned status of 127 (rc:0x7f), which is the shell's exit status for a command that was not found. So the correct command line should probably look something like this:
DR_FORCE="FALSE" DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr ...
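The quoting problem can be reproduced outside of DLPAR with a few lines of shell. This is only a minimal sketch: viosdr_stub is a hypothetical stand-in for /usr/lib/dr/scripts/all/viosdr that just prints what it receives, and eval is used here to mimic a command line being handed to the shell as a single string, which is an assumption about how drslot_chrp_slot invokes the script.

```shell
#!/bin/sh
# Hypothetical stand-in for the real viosdr script: it only prints the
# command and the DR_DRC_NAME value it was called with.
viosdr_stub() {
    echo "cmd=$1 DR_DRC_NAME=$DR_DRC_NAME"
}

# Command line as seen in the failing log: the value "HEA 1" is not quoted.
unquoted='DR_FORCE=FALSE DR_DRC_NAME=HEA 1 viosdr_stub checkacquire'
# Command line with the value quoted, as suggested above.
quoted='DR_FORCE=FALSE DR_DRC_NAME="HEA 1" viosdr_stub checkacquire'

# Unquoted: the shell sets DR_DRC_NAME=HEA and then looks for a command
# named "1". Run in a subshell so the failure does not affect this script.
if ( eval "$unquoted" ) 2>/dev/null; then
    unquoted_rc=0
else
    unquoted_rc=$?
fi

# Quoted: the whole string "HEA 1" ends up in DR_DRC_NAME and the stub runs.
quoted_out=$( eval "$quoted" )
quoted_rc=$?

echo "unquoted: status $unquoted_rc"
echo "quoted:   status $quoted_rc, output: $quoted_out"
```

Assuming no command named “1” exists in the PATH, the unquoted variant should end with the same status 127 that shows up in the HMC output, while the quoted variant reaches the stub with the complete adapter name.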
I opened a PMR with IBM support and was given a modified drslot_chrp_slot binary, which contains a temporary fix for this issue. APAR IV36448 was opened to address the issue in future AIX and VIOS versions.
Although the DLPAR operation now works with the modified drslot_chrp_slot binary, the cfglog may still contain errors which look like this:
...
C4 4325452 15:04:29 drscript_main.c 544 Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (postacquire)
C4 4325452 15:04:29 drscript_main.c 546 invoke_one_DR_script: entry->command:9, timeout:20, script idx:3, resource name: slot
C4 4325452 15:04:29 drscript_main.c 1673 set_env_vars: entry: cmd : postacquire
C4 4325452 15:04:29 drscript_main.c 1719 set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 577 Environment var list: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 615 Parent: Waiting on the pipe for any info:out_line:0x20010398, pipeinp: 0xf06324e8
C4 4325452 15:04:29 drscript_main.c 782 Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot
C4 4325452 15:04:29 drscript_main.c 630 Script returned string: (Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot )
C4 4325452 15:04:29 drscript_main.c 690 out of fgets while loop
C4 4325452 15:04:29 drscript_main.c 697 waiting for status
C4 4325452 15:04:29 drscript_main.c 710 child returned status: 1
C4 4325452 15:04:29 drscript_main.c 711 Script returned status: 1
...
The problem here lies with the /usr/lib/dr/scripts/all/viosdr script, which now runs but returns status 1 for the LHEA adapter. Another PMR was opened with IBM support, and according to the developer:
“Its (N.B.: viosdr) purpose is to identify virtual adapters that are added to the system with DLPAR and automatically run the correct cfgmgr method to configure them. LHEA adapters are not currently supported by this method so it should have ignored them and just returned VIOSDR_SUCCESS (zero).”
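To illustrate what “ignore them and return zero” would mean for a DR script, here is a purely hypothetical sketch. It is not the real viosdr script and not a fix: the function name handle_postacquire and the idea of recognising LHEA adapters by a DR_DRC_NAME starting with “HEA” are assumptions made for the example. Only the name VIOSDR_SUCCESS and its value of zero come from the developer's statement.

```shell
#!/bin/sh
# Hypothetical illustration only; this is NOT the real viosdr script.
VIOSDR_SUCCESS=0

# handle_postacquire is an assumed name. It decides, based on DR_DRC_NAME,
# whether the added adapter is one this handler knows how to configure.
handle_postacquire() {
    case "$DR_DRC_NAME" in
        HEA*)
            # Assumed naming pattern for LHEA adapters: not supported by
            # the handler, so ignore the adapter and report success.
            return "$VIOSDR_SUCCESS"
            ;;
        *)
            # A real handler would run the matching cfgmgr method here.
            echo "would configure: $DR_DRC_NAME"
            return "$VIOSDR_SUCCESS"
            ;;
    esac
}

# Simulate the environment from the cfglog excerpt above.
DR_DRC_NAME="HEA 1"
export DR_DRC_NAME
handle_postacquire
lhea_rc=$?
echo "status for LHEA adapter: $lhea_rc"
```

With a handler behaving like this, the postacquire phase for an LHEA adapter would report status 0 instead of the status 1 seen in the cfglog.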
Unfortunately there is currently no fix available for this issue.
Comments
Nice entry on VIOS upgrade surprises, along with detailed analysis and outlook!
We recently upgraded to FP26 (v2.2.2.1) and found the lsmap command dying:
“Determine if backing device is a PV or LV or optical.” was last subcommand run. Determining device type failed.
Seems this is IV36334 - LSMAP THROWS “COMMAND NOT COMPLETE ERROR” MSGS but no fix is available, and FP26-SP1 (v2.2.2.2) has the same problem. We were able to work around this for now by mounting a (dummy) media in the file-backed optical VTD. Or you could delete the virtual DVD device, if not needed for the client LPAR.