bityard Blog

// AIX and VIOS DLPAR Operation fails on LHEA Adapters

After upgrading our VIOS from v2.2.1.4 (aka FixPack 25, ServicePack 2) to v2.2.2.1 (aka FixPack 26), I noticed that DLPAR add/remove operations on LHEA adapters failed with the following error message displayed on the HMC:

The dynamic logical partitioning operation failed. - <name of VIOS>

The dynamic logical partitioning requested could not be completed.
Logical Port 1 belonging to Port Group 1 of HEA 23000000 failed to be added.
Vary on of the LHEA failed. Please run the command rsthwres to recover the HEA configuration.

..build_tree
ROOT DIR=/usr/lib/dr/scripts
Syslog ch=DRMGR
cal_dr_scriptinfo_file_checksum : Checksum : 0x94b71f75720f0132
File read: string table
s_script:
file_name: 0x200bf758(/usr/lib/dr/scripts/all/IBM.CSMAgentRM_dr.sh)
script_version: 0x200bf7a8(2)
script_vendor_info: 0x200bf7b3(IBM)
script_creation_date: 0x200bf7aa(05252010)
script_info: 0x200bf785(DR script to manage IBM.CSMAgentRM)
s_resource(Before - offsets):
resource_name: 0x5f
resource_use_description: 0x69
s_resource:(After - ptrs)
resource_name: 0x200bf7b7(pmig)
resource_use_description: 0x200bf7c1(Partition migration for IBM.CSMAgentRM)
s_resource(Before - offsets):
resource_name: 0x64
resource_use_description: 0x90
s_resource:(After - ptrs)
resource_name: 0x200bf7bc(phib)
resource_use_description: 0x200bf7e8(Partition hibernation for IBM.CSMAgentRM)
s_script:
file_name: 0x200bf811(/usr/lib/dr/scripts/all/aud_acct_dr)
script_version: 0x200bf860(1)
script_vendor_info: 0x200bf86b(IBM)
script_creation_date: 0x200bf862(03232007)
script_info: 0x200bf835(WPAR DR Script for Auditing and Accounting)
s_resource(Before - offsets):
resource_name: 0x117
resource_use_description: 0x134
s_resource:(After - ptrs)
resource_name: 0x200bf86f(wmig-checkpoint)
resource_use_description: 0x200bf88c(Checkpoint of Auditing and Accounting within a WPAR)
s_resource(Before - offsets):
resource_name: 0x127
resource_use_description: 0x168
s_resource:(After - ptrs)
resource_name: 0x200bf87f(wmig-restart)
resource_use_description: 0x200bf8c0(Restart of Auditing and Accounting within a WPAR)
s_script:
file_name: 0x200bf8f1(/usr/lib/dr/scripts/all/ctrmc_MDdr)
script_version: 0x200bf949(2)
script_vendor_info: 0x200bf954(IBM)
script_creation_date: 0x200bf94b(05252010)
script_info: 0x200bf914(DR script to refresh Management Domain configuration)
s_resource(Before - offsets):
resource_name: 0x200
resource_use_description: 0x20a
s_resource:(After - ptrs)
resource_name: 0x200bf958(pmig)
resource_use_description: 0x200bf962(Partition migration for RSCT Management Domain)
s_resource(Before - offsets):
resource_name: 0x205
resource_use_description: 0x239
s_resource:(After - ptrs)
resource_name: 0x200bf95d(phib)
resource_use_description: 0x200bf991(Partition hibernation for RSCT Management Domain)
s_script:
file_name: 0x200bf9c2(/usr/lib/dr/scripts/all/viosdr)
script_version: 0x200bf9f1(1)
script_vendor_info: 0x200bf9fc(IBM Corp.)
script_creation_date: 0x200bf9f3(11192012)
script_info: 0x200bf9e1(VIOS DR Handler)
s_resource(Before - offsets):
resource_name: 0x2ae
resource_use_description: 0x2b3
s_resource:(After - ptrs)
resource_name: 0x200bfa06(slot)
resource_use_description: 0x200bfa0b(Virtual I/O Slot Handler)
s_script:
file_name: 0x200bfa24(/usr/lib/dr/scripts/all/wpar_drs)
script_version: 0x200bfa62(1)
script_vendor_info: 0x200bfa6d(IBM)
script_creation_date: 0x200bfa64(05312007)
script_info: 0x200bfa45(WPAR DR script for DR Events)
s_resource(Before - offsets):
resource_name: 0x319
resource_use_description: 0x335
s_resource:(After - ptrs)
resource_name: 0x200bfa71(cpu)
resource_use_description: 0x200bfa8d(Propagate CPU add/remove)
s_resource(Before - offsets):
resource_name: 0x31d
resource_use_description: 0x34e
s_resource:(After - ptrs)
resource_name: 0x200bfa75(mem)
resource_use_description: 0x200bfaa6(Propagate Memory add/remove)
s_resource(Before - offsets):
resource_name: 0x321
resource_use_description: 0x36a
s_resource:(After - ptrs)
resource_name: 0x200bfa79(capacity)
resource_use_description: 0x200bfac2(Propagate Capacity changes)
s_resource(Before - offsets):
resource_name: 0x32a
resource_use_description: 0x385
s_resource:(After - ptrs)
resource_name: 0x200bfa82(var_weight)
resource_use_description: 0x200bfadd(Propagate Var Weight changes)
..add_a_slot
do_scripts_if_DR_pre_setup: bp_resource:slot
invoke_DR_scripts: command:7, bp_resource:slot
Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (checkacquire)
invoke_one_DR_script: entry->command:7, timeout:20, script idx:3, resource name: slot
set_env_vars: entry: cmd : checkacquire
set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME=HEA 1
Parent: Waiting on the pipe for any info:out_line:0x2000f5f8, pipeinp: 0xf11674e8
Script returned string:(Execing:(DR_FORCE=FALSE DR_DRC_NAME=HEA 1 /usr/lib/dr/scripts/all/viosdr) with cmd:checkacquire, resource_name: slot)
out of fgets while loop
waiting for status
child returned status: 127
Script returned status: 127
Check acquire phase failed, rc:0x7f

HSCL297B

Most of this output is not particularly interesting; the actual culprit shows up almost at the end. The 7th line from the bottom, starting with Script returned string: …, shows that the value assigned to the environment variable DR_DRC_NAME contains a space and is apparently not quoted. The shell therefore assigns only “HEA” to the variable and then, in this example, tries to execute the command “1”, which of course cannot be found (hence the exit status 127). The correct command line should probably look something like this:

DR_FORCE="FALSE" DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr ...
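The effect of the missing quotes can be reproduced in any POSIX shell. The following is a minimal sketch, not the actual drmgr invocation: printenv stands in for the viosdr script, and it assumes that no command named 1 exists in the PATH.

```shell
# Unquoted value: the shell assigns DR_DRC_NAME=HEA and then tries to run "1"
# as the command, with "printenv DR_DRC_NAME" as its arguments.
rc_unquoted=0
out_unquoted=$(sh -c 'DR_FORCE=FALSE DR_DRC_NAME=HEA 1 printenv DR_DRC_NAME' 2>/dev/null) || rc_unquoted=$?

# Quoted value: the whole string "HEA 1" is assigned and printenv runs normally.
rc_quoted=0
out_quoted=$(sh -c 'DR_FORCE=FALSE DR_DRC_NAME="HEA 1" printenv DR_DRC_NAME') || rc_quoted=$?

echo "unquoted: rc=$rc_unquoted output='$out_unquoted'"
echo "quoted:   rc=$rc_quoted output='$out_quoted'"
```

Under the stated assumption, the unquoted variant should report the same "command not found" status 127 that appears in the output above, while the quoted variant passes the complete value through.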

I opened a PMR with IBM support and was given a modified drslot_chrp_slot binary, which contains a temporary fix for this issue. APAR IV36448 was opened to address the issue in future AIX and VIOS versions.

Although the DLPAR operation now works with the modified drslot_chrp_slot binary, the cfglog may still contain errors which look like this:

...
C4 4325452 15:04:29 drscript_main.c 544 Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (postacquire)
C4 4325452 15:04:29 drscript_main.c 546 invoke_one_DR_script: entry->command:9, timeout:20, script idx:3, resource name: slot
C4 4325452 15:04:29 drscript_main.c 1673 set_env_vars: entry: cmd : postacquire
C4 4325452 15:04:29 drscript_main.c 1719 set_env_vars: exit: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 577 Environment var list: DR_FORCE=FALSE DR_DRC_NAME="HEA 1"
C4 4325452 15:04:29 drscript_main.c 615 Parent: Waiting on the pipe for any info:out_line:0x20010398, pipeinp: 0xf06324e8
C4 4325452 15:04:29 drscript_main.c 782 Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1"  /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot
C4 4325452 15:04:29 drscript_main.c 630 Script returned string: (Execing:(DR_FORCE=FALSE DR_DRC_NAME="HEA 1" /usr/lib/dr/scripts/all/viosdr) with cmd:postacquire, resource_name: slot )
C4 4325452 15:04:29 drscript_main.c 690 out of fgets while loop
C4 4325452 15:04:29 drscript_main.c 697 waiting for status
C4 4325452 15:04:29 drscript_main.c 710 child returned status: 1
C4 4325452 15:04:29 drscript_main.c 711 Script returned status: 1
...
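Entries like these are easy to miss in a long cfglog. As a rough sketch, the following awk filter reports every DR script that returned a non-zero status. The sample input is copied from the excerpt above; on a real system the input would presumably come from something like alog -t cfg -o, which is an assumption about how the log is read and is not shown in the excerpt.

```shell
# Two lines copied from the cfglog excerpt; replace with the real log on a live system.
sample='C4 4325452 15:04:29 drscript_main.c 544 Invoking script (/usr/lib/dr/scripts/all/viosdr) with command (postacquire)
C4 4325452 15:04:29 drscript_main.c 711 Script returned status: 1'

failures=$(printf '%s\n' "$sample" | awk '
    # Remember the most recent script invocation (script path and command).
    /Invoking script/ { inv = $0; sub(/^.*Invoking script /, "", inv) }
    # Report it when the matching status line is non-zero.
    /Script returned status:/ { if ($NF != 0) print "status " $NF ": " inv }
')

echo "$failures"
```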

The problem here is with the /usr/lib/dr/scripts/all/viosdr script. Another PMR with IBM support was opened and according to the developer:

“Its (N.B.: viosdr) purpose is to identify virtual adapters that are added to the system with DLPAR and automatically run the correct cfgmgr method to configure them.
LHEA adapters are not currently supported by this method so it should have ignored them and just returned VIOSDR_SUCCESS (zero).”
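The behaviour the developer describes could look roughly like the sketch below. This is purely hypothetical and not the real viosdr code: the function name handle_postacquire and the "HEA" name match are assumptions made for illustration, and only the constant VIOSDR_SUCCESS (zero) is taken from the quote.

```shell
# Hypothetical sketch, NOT the real viosdr script.
VIOSDR_SUCCESS=0   # name and value (zero) taken from the developer's statement

# handle_postacquire is an invented name; $1 is the value of DR_DRC_NAME.
handle_postacquire() {
    case "$1" in
        HEA*)
            # LHEA adapters are not supported by this handler:
            # ignore them and report success instead of failing.
            return "$VIOSDR_SUCCESS"
            ;;
        *)
            # Placeholder for the real work (running the matching cfgmgr method),
            # which is not shown in the post.
            echo "would configure: $1"
            return "$VIOSDR_SUCCESS"
            ;;
    esac
}

rc_hea=0
handle_postacquire "HEA 1" || rc_hea=$?
echo "postacquire for 'HEA 1' returned $rc_hea"
```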

Unfortunately there is currently no fix available for this issue.

Comments

MarkD
No. 1 @ 2013/04/01 12:54

Nice entry on VIOS upgrade surprises, along with detailed analysis and outlook!

We recently upgraded to FP26 (v2.2.2.1) and found the lsmap command dying:

lsmap
 ...
Command did not complete.

“Determine if backing device is a PV or LV or optical.” was last subcommand run. Determining device type failed.

Seems this is IV36334 - LSMAP THROWS “COMMAND NOT COMPLETE ERROR” MSGS but no fix is available, and FP26-SP1 (v2.2.2.2) has the same problem. We were able to work around this for now by mounting a (dummy) media in the file-backed optical VTD. Or you could delete the virtual DVD device, if not needed for the client LPAR.
