2013-10-20 // Display the AIX devices in a tree format
In AIX there is the proctree (or ps -fT 0) command to display the currently running processes in a tree format. This is very helpful when one is primarily interested in the parent and child relationship between the individual processes. Unfortunately, a similar command for the parent and child relationship of devices is still missing from stock AIX. There are already several script implementations out on the net to fill that particular gap. As a scripting exercise I wanted to do my own version of a devtree command. It was included in the aaa_base RPM package. The output for e.g. an LPAR with virtual ethernet and virtual SCSI devices looks like this:
|
|-- inet0
|   |-- en1
|   |-- et1
|   |-- lo0
|-- iocp0
|-- lvdd
|-- pty0
|-- rootvg
|   |-- hd1
|   |-- hd2
|   |-- hd3
|   |-- hd4
|   |-- hd5
|   |-- hd6
|   |-- hd8
|   |-- hd10opt
|   |-- hd9var
|   |-- lv_srv
|-- sfw0
|-- sys0
|   |-- sysplanar0
|   |   |-- L2cache0
|   |   |-- mem0
|   |   |-- pci0
|   |   |   |-- pci4
|   |   |   |-- pci5
|   |   |   |-- pci6
|   |   |-- pci1
|   |   |   |-- pci7
|   |   |   |-- pci8
|   |   |-- pci2
|   |   |   |-- pci9
|   |   |   |-- pci10
|   |   |   |-- pci11
|   |   |-- pci3
|   |   |   |-- pci12
|   |   |   |-- pci13
|   |   |-- pci14
|   |   |-- proc0
|   |   |-- proc4
|   |   |-- vio0
|   |   |   |-- ent1
|   |   |   |-- vsa0
|   |   |   |   |-- vty0
|   |   |   |-- vscsi0
|   |   |   |   |-- hdisk0
|   |   |   |-- vscsi1
|
`-- End of the device tree
in the regular mode, or like this:
|
|-- inet0 Available Internet Network Extension
|   |-- en1 Available Standard Ethernet Network Interface
|   |-- et1 Defined IEEE 802.3 Ethernet Network Interface
|   |-- lo0 Available Loopback Network Interface
|-- iocp0 Defined I/O Completion Ports
|-- lvdd Available LVM Device Driver
|-- pty0 Available Asynchronous Pseudo-Terminal
|-- rootvg Defined Volume group
|   |-- hd1 Defined Logical volume
|   |-- hd2 Defined Logical volume
|   |-- hd3 Defined Logical volume
|   |-- hd4 Defined Logical volume
|   |-- hd5 Defined Logical volume
|   |-- hd6 Defined Logical volume
|   |-- hd8 Defined Logical volume
|   |-- hd10opt Defined Logical volume
|   |-- hd9var Defined Logical volume
|   |-- lv_srv Defined Logical volume
|-- sfw0 Available Storage Framework Module
|-- sys0 Available System Object
|   |-- sysplanar0 Available System Planar
|   |   |-- L2cache0 Available L2 Cache
|   |   |-- mem0 Available Memory
|   |   |-- pci0 Defined PCI Bus
|   |   |   |-- pci4 Defined 00-10 PCI Bus
|   |   |   |-- pci5 Defined 00-12 PCI Bus
|   |   |   |-- pci6 Defined 00-16 PCI Bus
|   |   |-- pci1 Defined PCI Bus
|   |   |   |-- pci7 Defined 02-10 PCI Bus
|   |   |   |-- pci8 Defined 02-12 PCI Bus
|   |   |-- pci2 Defined PCI Bus
|   |   |   |-- pci9 Defined 03-10 PCI Bus
|   |   |   |-- pci10 Defined 03-12 PCI Bus
|   |   |   |-- pci11 Defined 03-16 PCI Bus
|   |   |-- pci3 Defined PCI Bus
|   |   |   |-- pci12 Defined 01-10 PCI Bus
|   |   |   |-- pci13 Defined 01-12 PCI Bus
|   |   |-- pci14 Defined PCI Bus
|   |   |-- proc0 Available 00-00 Processor
|   |   |-- proc4 Available 00-04 Processor
|   |   |-- vio0 Available Virtual I/O Bus
|   |   |   |-- ent1 Available Virtual I/O Ethernet Adapter (l-lan)
|   |   |   |-- vsa0 Available LPAR Virtual Serial Adapter
|   |   |   |   |-- vty0 Available Asynchronous Terminal
|   |   |   |-- vscsi0 Available Virtual SCSI Client Adapter
|   |   |   |   |-- hdisk0 Available Virtual SCSI Disk Drive
|   |   |   |-- vscsi1 Available Virtual SCSI Client Adapter
|
`-- End of the device tree
in the detailed output mode.
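The devtree script itself is not reproduced in this post, so the following is only a minimal Python sketch of the underlying idea: turning a flat list of (device, parent) pairs into an indented tree. Where the pairs come from is an assumption; on AIX they could presumably be gathered with something like lsdev -C -F "name parent". The function name render_tree and the sample data are made up for illustration.

```python
from collections import defaultdict


def render_tree(pairs):
    """Render (name, parent) pairs as tree lines.

    An empty parent string marks a top-level device. Devices whose
    parent never appears as a name are not reachable and are skipped.
    """
    children = defaultdict(list)
    for name, parent in pairs:
        children[parent].append(name)

    lines = ["|"]

    def walk(parent, depth):
        # Children are printed in the order in which they were supplied.
        for name in children[parent]:
            lines.append("|   " * depth + "|-- " + name)
            walk(name, depth + 1)

    walk("", 0)
    lines.append("|")
    lines.append("`-- End of the device tree")
    return lines


# Hypothetical sample input using a few device names from the output above.
sample = [
    ("inet0", ""), ("en1", "inet0"), ("lo0", "inet0"),
    ("sys0", ""), ("sysplanar0", "sys0"), ("vio0", "sysplanar0"),
    ("vscsi0", "vio0"), ("hdisk0", "vscsi0"),
]
print("\n".join(render_tree(sample)))
```

A detailed mode would only need the state and description carried along with each name and appended to the line.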
2013-10-06 // Live Partition Mobility (LPM) with Debian on a IBM Power LPAR - Part 1
Live partition mobility (LPM) in the IBM Power environment is roughly the same as vMotion in the VMware ESX world. It allows the movement of an LPAR from one IBM Power hardware system to another without (major) interruption to the system running within the LPAR. We use this feature on a regular basis for our AIX LPARs and it has simplified our work to a great extent, since we no longer need downtimes for a lot of our regular administrative work. LPM also works for LPARs running Linux as an OS, but since IBM only supports the SuSE and Red Hat enterprise distributions, the necessary service and productivity tools to successfully perform LPM – and also DLPAR – operations are not readily available to users of the Debian distribution. I still wanted to be able to do DLPAR and LPM operations on our Debian LPARs as well. To that effect, I converted the necessary RPM packages from the service and productivity tools mentioned before to the DEB package format. Most of the conversion work was done by the alien tool provided by the Debian distribution. Besides that, some manual patches were still necessary to adjust the components to the specifics of the Debian environment. A current version of the Linux kernel and a rebuild of the kernel package with some PPC specific options enabled were also necessary. Here are the individual steps:
Install the prerequisite Debian packages:
libstdc++5 ksh uuid libsgutils1 libsqlite3-0
$ apt-get install libstdc++5 ksh uuid libsgutils1 libsqlite3-0
The libsqlite3-0 package is currently only necessary for the libservicelog-1-1-1-32bit package. The packages libservicelog and servicelog, which also depend on libsqlite3-0, contain binaries and libraries that are built for a 64-bit userland (ppc64), which is currently not available for Debian. Using the binaries or libraries from libservicelog or servicelog will therefore result in an error about unresolved symbols.
Convert or download the IBM service and productivity tools:
Option 1: Download the already converted IBM service and productivity tools:
Option 2: Convert the IBM service and productivity tools from RPM to DEB:
Install the prerequisite Debian packages:
alien
$ apt-get install alien
Download the necessary patch files:
Convert librtas:
$ alien -gc librtas-32bit-1.3.6-4.ppc64.rpm
$ patch -p 0 < librtas-32bit.patch
$ cd librtas-32bit-1.3.6
$ ./debian/rules binary
$ alien -gc librtas-1.3.6-3.ppc64.rpm
$ patch -p 0 < librtas.patch
$ cd librtas-1.3.6
$ ./debian/rules binary
Convert src:
$ alien -gc src-1.3.1.1-11277.ppc.rpm
$ patch -p 0 < src.patch
$ rm src-1.3.1.1/debian/postrm
$ cd src-1.3.1.1
$ perl -i -p -e 's/ppc64/powerpc/' ./debian/control
$ ./debian/rules binary
Convert RSCT core and utils:
$ alien -gc rsct.core.utils-3.1.0.7-11277.ppc.rpm
$ patch -p 0 < rsct.core.utils.patch
$ cd rsct.core.utils-3.1.0.7
$ ./debian/rules binary
$ alien -gc rsct.core-3.1.0.7-11277.ppc.rpm
$ patch -p 0 < rsct.core.patch
$ cd rsct.core-3.1.0.7
$ ./debian/rules binary
Convert ServiceRM:
$ alien -gc devices.chrp.base.ServiceRM-2.3.0.0-11231.ppc.rpm
$ patch -p 0 < devices.chrp.base.ServiceRM.patch
$ cd devices.chrp.base.ServiceRM-2.3.0.0
$ ./debian/rules binary
Convert DynamicRM:
$ alien -gc DynamicRM-1.3.9-7.ppc64.rpm
$ patch -p 0 < DynamicRM.patch
$ cd DynamicRM-1.3.9
$ ./debian/rules binary
Convert lsvpd and libvpd:
$ alien -gc libvpd2-2.1.3-3.ppc64.rpm
$ patch -p 0 < libvpd.patch
$ cd libvpd2-2.1.3
$ ./debian/rules binary
$ alien -gc lsvpd-1.6.11-4.ppc64.rpm
$ patch -p 0 < lsvpd.patch
$ cd lsvpd-1.6.11
$ ./debian/rules binary
Build, package and/or install PowerPC Utils:
Install from sources at sourceforge.net or use the custom-built DEB package from:
Convert libservicelog and servicelog:
$ alien -gc libservicelog-1_1-1-1.1.11-9.ppc64.rpm
$ patch -p 0 < libservicelog-1_1-1.patch
$ cd libservicelog-1_1-1-1.1.11
$ ./debian/rules binary
$ alien -gc libservicelog-1.1.11-9.ppc64.rpm
$ patch -p 0 < libservicelog.patch
$ cd libservicelog-1.1.11
$ ./debian/rules binary
$ alien -gc libservicelog-1_1-1-32bit-1.1.11-9.ppc.rpm
$ patch -p 0 < libservicelog-1_1-1-32bit.patch
$ cd libservicelog-1_1-1-32bit-1.1.11
$ ./debian/rules binary
$ alien -gc servicelog-1.1.9-8.ppc64.rpm
$ patch -p 0 < servicelog.patch
$ cd servicelog-1.1.9
$ ./debian/rules binary
Install the IBM service and productivity tools DEB packages:
$ dpkg -i librtas_1.3.6-4_powerpc.deb librtas-32bit_1.3.6-5_powerpc.deb \
    src_1.3.1.1-11278_powerpc.deb rsct.core.utils_3.1.0.7-11278_powerpc.deb \
    rsct.core_3.1.0.7-11278_powerpc.deb devices.chrp.base.servicerm_2.3.0.0-11232_powerpc.deb \
    dynamicrm_1.3.9-8_powerpc.deb libvpd2_2.1.3-4_powerpc.deb lsvpd_1.6.11-5_powerpc.deb \
    powerpc-ibm-utils_1.2.12-1_powerpc.deb libservicelog-1-1-1-32bit_1.1.11-10_powerpc.deb \
    libservicelog_1.1.11-10_powerpc.deb servicelog_1.1.9-9_powerpc.deb
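The conversions in option 2 mostly follow the same alien, patch, debian/rules pattern, so the per-package steps can be generated rather than typed. The following Python sketch is one possible way to do that and is not part of the original procedure. The helper names build_dir and conversion_steps are made up; the rule for deriving the build directory (package file name without release, architecture and .rpm suffix) is inferred from the commands above. Each step is run from a fixed working directory instead of relying on cd.

```python
import subprocess


def build_dir(rpm_name):
    """Derive the directory 'alien -g' unpacks into (name-version)."""
    stem = rpm_name[:-len(".rpm")] if rpm_name.endswith(".rpm") else rpm_name
    stem = stem.rsplit(".", 1)[0]   # drop the architecture, e.g. ".ppc64"
    return stem.rsplit("-", 1)[0]   # drop the release, e.g. "-4"


def conversion_steps(rpm_name, patch_name):
    """Return (argv, working directory) tuples for one package."""
    return [
        (["alien", "-gc", rpm_name], "."),
        # "patch -i FILE" reads the patch from FILE instead of stdin.
        (["patch", "-p", "0", "-i", patch_name], "."),
        (["./debian/rules", "binary"], build_dir(rpm_name)),
    ]


def run_steps(steps):
    """Execute the steps in order and stop at the first failure."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Only print the plan; call run_steps() to actually execute it.
    for argv, cwd in conversion_steps("librtas-1.3.6-3.ppc64.rpm", "librtas.patch"):
        print(cwd, " ".join(argv))
```

Packages that need extra handling, such as the removal of debian/postrm and the architecture substitution for src, would still have to be treated separately.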
Rebuild the stock Debian kernel package as described in HowTo Rebuild An Official Debian Kernel Package. I've confirmed DLPAR and LPM to work successfully with at least the Debian kernel package versions 2.6.39-3~bpo60+1 and 3.2.46-1~bpo60+1. In the make menuconfig step, make sure the following kernel configuration options are selected:
CONFIG_MIGRATION=y
CONFIG_PPC_PSERIES=y
CONFIG_PPC_SPLPAR=y
CONFIG_LPARCFG=y
CONFIG_PPC_SMLPAR=y
CONFIG_PPC_RTAS=y
CONFIG_RTAS_PROC=y
CONFIG_NUMA=y
# CONFIG_SPARSEMEM_VMEMMAP is not set
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_RPA=y
CONFIG_HOTPLUG_PCI_RPA_DLPAR=y
Or use one of the following trimmed down kernel configuration files:
Install the newly built kernel package, reboot and select the new kernel to be loaded.
A few minutes after the system has been started, DLPAR and LPM operations on the LPAR should be possible from the HMC. A good indication in the HMC GUI is a properly filled field in the “OS Version” column. From the HMC CLI you can check with:
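Whether a given kernel configuration actually carries the options listed in the rebuild step can be checked mechanically. The following Python sketch compares a configuration file's content against that list; the function names are made up, and the path used in the example (/boot/config-<version>, where Debian kernel packages normally place the configuration) is an assumption about the system at hand.

```python
import os

# Options that must be "=y", taken from the list in the rebuild step.
REQUIRED = [
    "CONFIG_MIGRATION", "CONFIG_PPC_PSERIES", "CONFIG_PPC_SPLPAR",
    "CONFIG_LPARCFG", "CONFIG_PPC_SMLPAR", "CONFIG_PPC_RTAS",
    "CONFIG_RTAS_PROC", "CONFIG_NUMA", "CONFIG_MEMORY_HOTPLUG",
    "CONFIG_MEMORY_HOTPLUG_SPARSE", "CONFIG_MEMORY_HOTREMOVE",
    "CONFIG_ARCH_MEMORY_PROBE", "CONFIG_HOTPLUG_PCI",
    "CONFIG_HOTPLUG_PCI_RPA", "CONFIG_HOTPLUG_PCI_RPA_DLPAR",
]
# Options that must stay unset.
MUST_BE_UNSET = ["CONFIG_SPARSEMEM_VMEMMAP"]


def parse_config(text):
    """Return {option: value} for all 'OPTION=value' lines; comments are ignored."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def check_config(text):
    """Return a list of human-readable problems; an empty list means OK."""
    options = parse_config(text)
    problems = [
        f"{name} should be y, found {options.get(name, 'not set')}"
        for name in REQUIRED if options.get(name) != "y"
    ]
    problems += [f"{name} should not be set" for name in MUST_BE_UNSET if name in options]
    return problems


if __name__ == "__main__":
    path = "/boot/config-" + os.uname().release  # assumed location
    with open(path) as handle:
        for problem in check_config(handle.read()):
            print(problem)
```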
$ lspartition -dlpar
...
<#108> Partition:<45*8231-E2D*06AB35T, ststnagios02.lan.ssbag, 10.8.32.46>
       Active:<1>, OS:<Linux/Debian, 3.2.0-0.bpo.4.ssb.1-powerUnknown, Unknown>, DCaps:<0x2c7f>, CmdCaps:<0x19, 0x19>, PinnedMem:<0>
...
The LPAR should show up in the output and the value of DCaps should be different from 0x0.
After a successful LPM operation the output of dmesg from within the OS should look like this:
...
[539043.613297] calling ibm,suspend-me on cpu 4
[539043.960651] EPOW <0x6240040000000b8 0x0 0x0>
[539043.960665] ibmvscsi 30000003: Re-enabling adapter!
[539043.961606] RTAS: event: 21, Type: EPOW, Severity: 1
[539043.962920] ibmvscsi 30000002: Re-enabling adapter!
[539044.175848] property parse failed in parse_next_property at line 230
[539044.485745] ibmvscsi 30000002: partner initialization complete
[539044.485815] ibmvscsi 30000002: host srp version: 16.a, host partition vios1-p730-222 (1), OS 3, max io 262144
[539044.485892] ibmvscsi 30000002: Client reserve enabled
[539044.485907] ibmvscsi 30000002: sent SRP login
[539044.485964] ibmvscsi 30000002: SRP_LOGIN succeeded
[539044.525723] ibmvscsi 30000003: partner initialization complete
[539044.525779] ibmvscsi 30000003: host srp version: 16.a, host partition vios2-p730-222 (2), OS 3, max io 262144
[539044.525884] ibmvscsi 30000003: Client reserve enabled
[539044.525897] ibmvscsi 30000003: sent SRP login
[539044.525943] ibmvscsi 30000003: SRP_LOGIN succeeded
[539044.884514] property parse failed in parse_next_property at line 230
...
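The DCaps check on the lspartition -dlpar output mentioned earlier lends itself to scripting, for example when many LPARs have to be verified at once. The following Python sketch extracts the DCaps value per partition from saved output. The function name is made up, and the parsing relies only on the field layout visible in the sample above; other HMC versions may format the output differently.

```python
import re


def dcaps_by_partition(text):
    """Map the partition host name to its DCaps value (int), or None if missing."""
    result = {}
    # Every record starts with "Partition:<id, hostname, ip>".
    for chunk in text.split("Partition:<")[1:]:
        fields = [field.strip() for field in chunk.split(">", 1)[0].split(",")]
        name = fields[1] if len(fields) > 1 else fields[0]
        match = re.search(r"DCaps:<(0x[0-9a-fA-F]+)>", chunk)
        result[name] = int(match.group(1), 16) if match else None
    return result


sample = (
    "<#108> Partition:<45*8231-E2D*06AB35T, ststnagios02.lan.ssbag, 10.8.32.46>\n"
    "       Active:<1>, OS:<Linux/Debian, 3.2.0-0.bpo.4.ssb.1-powerUnknown, Unknown>, "
    "DCaps:<0x2c7f>, CmdCaps:<0x19, 0x19>, PinnedMem:<0>\n"
)

for name, dcaps in dcaps_by_partition(sample).items():
    # A value other than 0x0 indicates a working DLPAR connection.
    print(name, "OK" if dcaps else "no DLPAR capabilities reported")
```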
Although this was done some time ago and several new versions of the packages from the service and productivity tools have been released since, DLPAR and LPM still work with this setup. There will be another installment of this post in the future with updated package versions. Another item on my ToDo list is to provide those components from the service and productivity tools which are available in source code as native Debian packages.
2013-10-04 // SSB puts reliable public transport on the fast track with IBM and SAP
Well, I guess here they are, what could very well be construed as my 15 minutes of fame. See the IBM case study “SSB puts reliable public transport on the fast track with IBM and SAP” (PDF) about our IBM Power, AIX, storage, SAN, backup and SAP environment or directly download the PDF here. It was an amazing experience, especially getting to know how to write well and precisely without losing the reader's interest or compromising on technical accuracy. Many thanks to everyone involved!
2013-09-03 // HMC Update to 7.7.7.0 SP2
Updating the HMC from v7.7.7.0 SP1 to v7.7.7.0 SP2 was, once again and like the previous HMC Update to 7.7.6.0 SP1, very painless. The service pack MH01354 was easily installable from the ISO images via the HMC GUI. The previous restriction of having to disconnect one of the HMCs in a redundant setup seems to have been dropped; at least it wasn't mentioned in the release notes anymore. The readme of the eFixes (MH01367 and MH01373) to be installed on top of the service pack was very clear this time on MH01373 superseding MH01367.
This package includes a fix for HMC Version 7 Release 7.7.0 Service Pack 2. You can reference this package by APAR# MB03714. This fix must be installed on top of HMC Version 7 Release 7.7.0 Service Pack 2 with or without PTF MH01367 installed. PTF MH01373 supersedes PTF MH01367.
The service pack and the additional efixes showed the following output during the update process:
MH01354:
Mangement console corrective service installation in progress. Please wait...
Corrective service file offload from remote server in progress...
The corrective service file offload was successful. Continuing with HMC service installation...
Verifying Certificate Information
Authenticating Install Packages
Installing Packages
--- Installing ptf-req ....
--- Installing RSCT ....
src-3.1.4.4-13032
rsct.core.utils-3.1.4.4-13032
rsct.core-3.1.4.4-13032
rsct.service-3.5.0.0-1
rsct.basic-3.1.4.4-13032
--- Installing CSM ....
csm.core-1.7.1.20-1
csm.deploy-1.7.1.20-1
csm_hmc.server-1.7.1.20-1
csm_hmc.hdwr_svr-7.0-3.4.0
csm_hmc.client-1.7.1.20-1
csm.server.hsc-1.7.1.20-1
--- Installing LPARCMD ....
hsc.lparcmd-3.0.0.1-1
ln: creating symbolic link `/usr/hmcrbin/lsnodeid' : File exists
ln: creating symbolic link `/usr/hmcrbin/lsrsrc-api' : File exists
ln: creating symbolic link `/usr/hmcrbin/mkrsrc-api' : File exists
ln: creating symbolic link `/usr/hmcrbin/rmrsrc-api' : File exists
--- Installing InventoryScout ....
--- Installing Pegasus ....
--- Installing service documentation ....
--- Updating baseOS ....
Corrective service installation was successful.
MH01367: Skipped, because - as mentioned in the release notes of MH01373 - it is superseded by MH01373.
MH01373:
Mangement console corrective service installation in progress. Please wait...
Corrective service file offload from remote server in progress...
The corrective service file offload was successful. Continuing with HMC service installation...
Verifying Certificate Information
Authenticating Install Packages
Installing Packages
--- Installing ptf-req ....
Corrective service installation was successful.
The MH01354 update still shows error messages with regard to symlink creation during the update process. This – admittedly minor – issue and the associated PMR I opened with IBM support are virtually racing towards my all time top ten of downright ridiculous experiences. After a lengthy back and forth with L2 support/development, things concluded in their position:
Most of our customers want to see the verbose output during upgrades/updates. There is no defect here.
If the customer wishes to do so he can open a DCR, but there is no defect.
Well, verbose output would indeed be nice, I suppose. An option to choose whether to show or to hide said verbose output would be even nicer. What I really don't care for are constant visual reminders of your lack of knowledge in proper shell scripting, not to mention your awful coding style! So here we are again: a simple, non-critical issue which could be fixed by a trivial change in a matter of minutes, and still we're spending a disproportionate amount of time arguing about whether it's really an issue or not. But alright, let's waste the time of even more people and open a DCR (MR0809134336) on this issue. And while we're at it, have another one (MR0618131954) for botching the mksysplan command in case a failover SEA is used on the VIO server.
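To illustrate how small such a change can be: the "File exists" messages go away as soon as link creation is made idempotent. The HMC update scripts are not available to me, so the following Python sketch only shows the general technique and not IBM's code; the function name ensure_symlink and the example paths are made up.

```python
import os
import tempfile


def ensure_symlink(target, link_path):
    """Make link_path a symlink to target without failing on repeated runs.

    Existing symlinks are left alone or re-pointed; anything else that
    occupies link_path is treated as an error rather than overwritten.
    """
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return "unchanged"
        os.unlink(link_path)
        os.symlink(target, link_path)
        return "replaced"
    if os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)
    return "created"


if __name__ == "__main__":
    # Demonstration in a scratch directory; running it twice is harmless.
    scratch = tempfile.mkdtemp()
    link = os.path.join(scratch, "lsnodeid")
    print(ensure_symlink("/usr/bin/true", link))
    print(ensure_symlink("/usr/bin/true", link))
```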
Aside from that, up to now no issues with the new HMC version.
2013-07-30 // Ganglia Fibre Channel Power/Attenuation Monitoring on AIX and VIO Servers
Although usually only available upon request via IBM support, efc_power is quite the handy tool when it comes to debugging or narrowing down fibre channel link issues. It provides information about the transmit and receive power and attenuation values for a given FC port on an AIX or VIO server. Fortunately the output of efc_power:
$ /opt/freeware/bin/efc_power /dev/fscsi2
TX: 1232 -> 0.4658 mW, -3.32 dBm
RX: 10a9 -> 0.4265 mW, -3.70 dBm
is very parser-friendly, so it can very easily be read by a script for further processing. In this case further processing means continuous Ganglia monitoring of the fibre channel transmit and receive power and attenuation values for each FC port on an AIX or VIO server. This is accomplished by the two RPM packages ganglia-addons-aix and ganglia-addons-aix-scripts:
The package ganglia-addons-aix-scripts is to be installed on the AIX or VIO server which has the FC adapter installed. It depends on the aaa_base package for the efc_power binary and on the ganglia-addons-base package, specifically on the cronjob (/opt/freeware/etc/run_parts/conf.d/ganglia-addons.sh) defined by this package. In the context of this cronjob all available scripts in the directory /opt/freeware/libexec/ganglia-addons/ are executed. For this specific Ganglia addon, all fscsi devices in the system are iterated over and efc_power is called for each fscsi device. Devices can be excluded by assigning a regex pattern to the BLACKLIST variable in the configuration file /opt/freeware/etc/ganglia-addons/ganglia-addons-efc_power.cfg. The output of each efc_power call is parsed and fed via the gmetric command into a Ganglia monitoring system that has to be already set up.
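The parsing and feeding step can be pictured roughly as follows. This is not the script shipped in the package, only a Python sketch of the same idea: filter the devices against a blacklist pattern, parse the two efc_power output lines and build one gmetric call per value. The metric naming scheme (fc_<device>_<direction>_<unit>) and all function names are assumptions; the gmetric calls are only assembled here, not executed.

```python
import re

LINE_RE = re.compile(
    r"^(TX|RX):\s+([0-9a-fA-F]+)\s+->\s+([0-9.]+)\s+mW,\s+(-?[0-9.]+)\s+dBm$"
)


def filter_devices(devices, blacklist=None):
    """Drop every device whose name matches the blacklist regex."""
    return [d for d in devices if not (blacklist and re.search(blacklist, d))]


def parse_efc_power(output):
    """Return {'tx': {...}, 'rx': {...}} with raw, mw and dbm values."""
    values = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            direction, raw, mw, dbm = match.groups()
            values[direction.lower()] = {
                "raw": int(raw, 16), "mw": float(mw), "dbm": float(dbm),
            }
    return values


def gmetric_args(device, values):
    """Build one gmetric argument list per parsed value (hypothetical metric names)."""
    calls = []
    for direction in ("tx", "rx"):
        for key, unit in (("mw", "mW"), ("dbm", "dBm")):
            if direction in values:
                calls.append([
                    "gmetric",
                    "--name", f"fc_{device}_{direction}_{key}",
                    "--value", str(values[direction][key]),
                    "--type", "float",
                    "--units", unit,
                ])
    return calls


sample = "TX: 1232 -> 0.4658 mW, -3.32 dBm\nRX: 10a9 -> 0.4265 mW, -3.70 dBm\n"
for device in filter_devices(["fscsi0", "fscsi2"], blacklist=r"^fscsi0$"):
    for call in gmetric_args(device, parse_efc_power(sample)):
        print(" ".join(call))
```

In the sample output the hexadecimal value in front of the arrow matches the power in units of 0.1 uW (0x1232 is 4658, i.e. 0.4658 mW), so the raw field could serve as a consistency check.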
The package ganglia-addons-aix is to be installed on the host running the Ganglia webinterface. It contains templates for the customization of the FC power and attenuation metrics within the Ganglia Web 2 interface. See the README.templates file for further installation instructions. Here are samples of the two graphs created with those Ganglia monitoring templates:
In section “1” of the graphs, the receive attenuation on FC port fscsi2 was about -7.7 dBm, which means that of the 476.6 uW sent from the Brocade switchport:
$ sfpshow 1/10
Identifier:  3    SFP
Connector:   7    LC
Transceiver: 540c404000000000 200,400,800_MB/s M5,M6 sw Short_dist
Encoding:    1    8B10B
Baud Rate:   85   (units 100 megabaud)
Length 9u:   0    (units km)
Length 9u:   0    (units 100 meters)
Length 50u:  5    (units 10 meters)
Length 62.5u:2    (units 10 meters)
Length Cu:   0    (units 1 meter)
Vendor Name: BROCADE
Vendor OUI:  00:05:1e
Vendor PN:   57-1000012-01
Vendor Rev:  A
Wavelength:  850  (units nm)
Options:     003a Loss_of_Sig,Tx_Fault,Tx_Disable
BR Max:      0
BR Min:      0
Serial No:   UAF1112600001JW
Date Code:   110619
DD Type:     0x68
Enh Options: 0xfa
Status/Ctrl: 0x82
Alarm flags[0,1] = 0x5, 0x40
Warn Flags[0,1] = 0x5, 0x40
                                          Alarm                   Warn
                                      low         high        low         high
Temperature: 41      Centigrade      -10          90          -5          85
Current:     7.392   mAmps           1.000        17.000      2.000       14.000
Voltage:     3264.9  mVolts          2900.0       3700.0      3000.0      3600.0
RX Power:    -4.0    dBm (400.1 uW)  10.0   uW    1258.9 uW   15.8   uW   1000.0 uW
TX Power:    -3.2    dBm (476.6 uW)  125.9  uW    631.0  uW   158.5  uW   562.3  uW
only about 200 uW actually made it to the FC port fscsi2 on the VIO server. Section “2” shows even worse values during the time the FC connections and cables were checked, which basically means that the FC link was down during that time period. Section “3” shows the values after the bad cable was found and replaced. Receive attenuation on FC port fscsi2 improved to about -3.7 dBm, which means that of the now 473.6 uW sent from the Brocade switchport, 427.3 uW actually make it to the FC port fscsi2 on the VIO server.
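The relation between the dBm and the mW/uW figures used above is the standard one: a power level in dBm is ten times the base-10 logarithm of the power in milliwatts. The following Python sketch shows the conversion in both directions, plus the loss over a link as the difference between the transmit level on one end and the receive level on the other. The function names are made up for illustration.

```python
import math


def dbm_to_uw(dbm):
    """Convert a power level in dBm to microwatts (0 dBm = 1 mW = 1000 uW)."""
    return 1000.0 * 10 ** (dbm / 10.0)


def uw_to_dbm(microwatts):
    """Convert a power in microwatts to dBm."""
    return 10.0 * math.log10(microwatts / 1000.0)


def link_loss_db(tx_dbm, rx_dbm):
    """Loss over the link in dB: transmit level minus receive level."""
    return tx_dbm - rx_dbm


# Values taken from the efc_power and sfpshow outputs in this post.
print(round(dbm_to_uw(-3.70), 1))         # RX level reported by efc_power, in uW
print(round(uw_to_dbm(476.6), 1))         # TX power of the switchport, in dBm
print(round(link_loss_db(-3.2, -3.7), 1)) # loss with the replaced cable, in dB
print(round(link_loss_db(-3.2, -7.7), 1)) # loss with the bad cable, in dB
```

With the switchport transmitting at -3.2 dBm, the good cable loses about half a dB, while the bad cable lost several dB, which is the kind of difference the graphs make visible at a glance.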
The goal of continuously monitoring the fibre channel transmit and receive power and attenuation values is to catch slowly deteriorating links early on, before they become a real issue or even a service interruption. As shown above, this can be accomplished with Ganglia and the two RPM packages ganglia-addons-aix and ganglia-addons-aix-scripts. For ad hoc checks, e.g. during the debugging of the components in a suspicious FC link, efc_power is still best called directly from the AIX or VIO server command line.