bityard Blog

// HMC Update to 7.7.5.0

Today I updated our two IBM HMC appliances from v7.7.4.0 SP2 to v7.7.5.0 (MH01311 and MH01312). As in other cases with HMC, AIX, TSM, SVC, storage and Power systems microcode over the last year or so, IBM failed to impress with the product it released. I guess this is what happens when, as a vendor, you skip quality control altogether and start doing your beta testing in the field.

First off was the update process, which was a huge pain in the rear. Who in this day and age dreams up an update process that is described as purely media-based? I guess I didn't read the system requirement that states you have to be located geographically near your HMC equipment. After digging around for a while, I found the nice TechDoc 635220142 over at the i5 folks, which describes in detail how to do a network-based update. The question remains: why didn't this really nice description make it into the release notes?

Next up was the backup of the managed system profile and the HMC upgrade data, which could only be saved locally or on removable media (DVD or USB). Again, network (NFS, SCP, FTP, etc.) anyone?
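
For the record, the two backup steps look roughly like this on the HMC CLI (command details from memory, the managed system name is a placeholder):

hscroot@hmc:~> bkprofdata -m <managed system> -f profileBackup
hscroot@hmc:~> saveupgdata -r disk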

The update itself went fine, except for the usual error output on the KVM console, which makes you wonder if anyone is ever going to fix the messages that have been popping up there for ages. Apparently it is a considerable challenge to check for the existence of a symlink before trying to create one!
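
Just for illustration, a generic guard along these lines would do (a sketch, obviously not the actual HMC code):

# create the symlink only if it doesn't already exist
if [ ! -L "${LINK}" ]; then
   /bin/ln -s "${TARGET}" "${LINK}"
fi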

The first things I noticed after the update were that the “OS Version” column now actually displays the ioslevel instead of the oslevel for VIOS partitions (nice!), and that our Nagios monitoring for the HMC CPU check wasn't working anymore (WTF?). Manually checking on the HMC CLI showed this:

hscroot@hmc:~> monhmc -n 0 -r proc
/opt/hsc/bin/MONHmc: line 80: awk: command not found

That's a nice one, isn't it? Could probably be easily fixed like this:

diff -u MONHmc.orig MONHmc 
--- /opt/hsc/bin/MONHmc.orig 2012-06-19 19:34:15.000000000 +0200
+++ /opt/hsc/bin/MONHmc      2012-06-19 19:13:41.000000000 +0200
@@ -77,7 +77,7 @@
    if [ ! -f ${HOME}/.toprc ];then
       /bin/cp /opt/hsc/data/toprc ${HOME}/.toprc
    fi
-   /usr/bin/top -b -n 2 -p $PPID | awk '/^top/{i++}i==2' | /usr/bin/grep -i cpu[0-9,\(s]
+   /usr/bin/top -b -n 2 -p $PPID | /usr/bin/awk '/^top/{i++}i==2' | /usr/bin/grep -i cpu[0-9,\(s]
 }
 
 showMem()

if one only had unrestricted access to the Linux system running within the appliance.

I'm pretty sure those were only the first roadblocks to be hit. Stay tuned …

// Webserver - Windows vs. Unix

Recently at work, I was given the task of evaluating alternatives to the current OS platform running the company homepage. Sounds trivial enough, doesn't it? But every subject in a moderately complex corporate environment comes with some history, lots of pitfalls and a considerable amount of politics attached to it, so why should this particular one be an exception?

The current environment was running a WAMP (Windows, Apache, MySQL, PHP) stack with a PHP-based CMS and was not performing well at all. The systems would cave in under even minimal connection load, not to mention user rushes during campaign launches. The situation dragged on for over a year and a half, while expert consultants were brought in, measurements were made, fingers were pointed and even new hardware was purchased. Nothing helped; the new hardware brought the system down even faster, because it could accept more initial user requests and thus effectively overran the system. IT management drew a lot of fire for the situation, but nonetheless stuck with the “Microsoft, our strategic platform” mantra. I guess at some point the pressure got too high even for those guys.

This is where I, the Unix guy with almost no M$ knowledge, got the task of evaluating whether or not an “alternative OS platform” could do the job. Hot potato, anyone?

So I went on and set up four different environments that were at least somewhere within the scope of our IT department's supported systems (so no *BSD, no Solaris, etc.):

  1. Linux on the newly purchased x86 hardware mentioned above

  2. Linux on our VMware ESX cluster

  3. Linux as an LPAR on our IBM Power systems

  4. AIX as an LPAR on our IBM Power systems

Apache, MySQL and PHP were all the same versions as in the Windows environment. The CMS and the content were direct copies from the Windows production systems. Without any further special tweaking, I ran some load tests with siege:
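
The tests were roughly along the lines of the following siege invocation (the URL and the parameter values shown here are placeholders, not the exact test setup):

$ siege -b -c 1000 -t 10M http://www.example.com/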

Webserver performance comparison - Transactions per second
Webserver performance comparison - Response time

Compared to the Windows environment (gray line), scenario 1 (dark blue line) delivered about 5 times the performance on the exact same hardware. The virtualized scenarios 2, 3 and 4 did not perform as well in absolute values. But since their CPU resources were only about half of those available in scenario 1, their relative performance isn't too bad after all. Also notable is the fact that all scenarios served requests up to the test limit of a thousand parallel clients. Windows started dropping requests after about 300 parallel clients.

Presented with those numbers, management decided the company webserver environment should be migrated to an “alternative OS platform”. AIX on Power systems was chosen for operational reasons, even though it didn't deliver the highest performance of the tested scenarios. The go-live of the new webserver environment was Wednesday last week at noon, with the switchover of the load-balancing groups. Take a look at what happened to the response time measurements around that time:

Webserver performance - Daily after migration

Also very interesting is the weekly graph a few days after the migration:

Webserver performance - Weekly after migration

Note the greatly reduced jitter in the response time!

// Cacti Monitoring Templates and Nagios Plugin for TMS RamSan-630

Some time ago we got two TMS RamSan-630 SAN-based flash storage arrays at work. They are integrated into our overall SAN storage architecture and thus provide their LUNs to the storage virtualization layer based on a four-node IBM SVC cluster. The TMS LUNs are used in two different ways. Some are used as dedicated flash-backed MDiskGroups for applications with moderate space, but very high I/O and very low latency requirements. Some are used in existing disk-based MDiskGroups as an additional SSD tier, using the SVC's “Easy Tier” feature to do a dynamic relocation of “hot” extents to the flash and of “cold” extents away from it. With these two different use cases we try to get optimal use out of the TMS arrays, while simultaneously reducing the I/O load on the existing disk-based storage arrays.
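
As a sketch of the second use case, assigning a TMS MDisk to the SSD tier of an existing MDiskGroup would look something like this on the SVC CLI (MDisk and MDiskGroup names are placeholders, and the exact tier handling depends on the SVC code level):

$ svctask addmdisk -mdisk tms_mdisk_01 DiskGroup01
$ svctask chmdisk -tier generic_ssd tms_mdisk_01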

So far the TMS boxes work very well, and the documentation is nothing short of excellent. Unlike other classic storage arrays (e.g. IBM DS/DCS, EMC Clariion, HDS AMS, etc.), the TMS arrays are conveniently self-contained. All management operations are available via a telnet/SSH interface or an embedded WebGUI; no OS-dependent management software is necessary. All functionality is available out of the box; no additional licenses for this and that are necessary. Monitoring could be improved a bit, especially the long-term storage of performance metrics. Unfortunately only the most important performance metrics are exposed via SNMP, so you can't really fill that particular gap yourself with a third-party monitoring application.

With the metrics that are available via SNMP, I created a Nagios plugin for availability and health monitoring and a Cacti template for performance trends. The Nagios configuration for the TMS arrays monitors the following generic services:

  • ICMP ping.

  • Check for the availability of the SNMP daemon.

  • Check for SNMP traps submitted to snmptrapd and processed by SNMPTT.

In addition to those, the Nagios plugin for the TMS arrays monitors the following more specific services (a manual query example follows the list):

  • Check for the overall status (OID: .1.3.6.1.4.1.8378.10.1.3.0).

  • Check for the fan status (OID: .1.3.6.1.4.1.8378.10.1.6.0.1.6).

  • Check for the temperature status (OID: .1.3.6.1.4.1.8378.10.1.6.1.1.6).

  • Check for the power status (OID: .1.3.6.1.4.1.8378.10.1.6.2.1.6).

  • Check for the FC connectivity status (OID: .1.3.6.1.4.1.8378.10.2.1.5).
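
For a quick manual check of those OIDs, the net-snmp tools can be used (hostname and community string are placeholders):

$ snmpget -v 2c -c public ramsan01 .1.3.6.1.4.1.8378.10.1.3.0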

The Cacti templates graph the following metrics:

  • FC port bandwidth usage.

    Example Read/Write Bandwidth on Port fc-1a

    Example Read/Write Bandwidth on Port fc-1b

  • FC port cache values (although they seem to remain at zero all the time).

  • FC port error values.

  • FC port received and transmitted frames.

  • FC port I/O operations.

    Example Read/Write IOPS on Port fc-1a

    Example Read/Write IOPS on Port fc-1b

  • Fan speed values.

  • Voltage and current values.

  • Temperature values.

The Nagios plugin and the Cacti templates can be downloaded here: Nagios Plugin and Cacti Templates. Beware that they should be considered quick'n'dirty hacks, which should generally work but don't come with any warranty of any kind ;-)

// SAP Gateway, Firewalls and TCP Keepalives

If you're maintaining moderately complex SAP landscapes with network connections traversing firewalls or other devices with access lists, you're bound to experience network connection issues sooner or later. Usually this is due to the dynamic nature of modern firewall ACLs, which are built up and torn down again on demand, based on the actual packet flow. SAP RFC and other network connections, on the other hand, are set up once and can remain inactive for an indefinite amount of time until the next attempt at data exchange becomes necessary. The order of events where both kinds of behaviour become a problem is roughly something like this:

  1. The SAP processes initiate a network connection to a remote system. With TCP-based connections this causes the OS to send a SYN packet to the remote system.

  2. A firewall along the way recognises the SYN packet as an attempt to build up a connection. It compares the parameters (usually source and destination IP addresses and port numbers) of the connection request to a set of defined rules, finds an entry allowing the connection to be made and inserts a temporary rule into its state table. Along with the original connection request, a rule is also added to allow the corresponding traffic in the reverse direction. Both state table entries are associated with predefined timeout values.

  3. At some point the SAP processes finish their data exchange, but the connection is not torn down. It's usually kept up for future communication and to avoid the overhead introduced by the connection buildup.

  4. As soon as the SAP processes stop exchanging data, the state table entry timeout counters of the firewall start ticking down. Once the time of communication inactivity has reached the predefined timeout values, the state table entries are removed. The firewall will now block future communication attempts, unless it's a connection initiation containing a SYN packet.

  5. At some point the SAP processes want to start exchanging data over the connection again. From their perspective the connection is still established, so there seems to be no need to initiate it again with a SYN packet. The non-SYN packets arrive at the firewall, which either drops them silently or sends a RST packet back. Either way, this causes a connection breakdown within the SAP system.

The actual issue here is that the firewall or network device has no knowledge of how the SAP system intends to use the network connection. This is a design implication of the independent layers of the OSI stack and not actually an SAP-specific problem. The issue described above is usually addressed by sending empty keepalive packets at regular intervals once the actual data transfer has ceased for a configurable amount of time. This simulates ongoing network traffic over the connection and in effect keeps the state table entries from timing out. The transmission of keepalive packets is handled by the network stack of the OS, and the application has to request them via a socket option (SO_KEEPALIVE on Unix systems) when setting up the network connection. SAP has an instance profile configuration parameter to request keepalives at the start of the SAP system:

gw/so_keepalive = 1

(see SAP Note 743888). The parameter can also be queried and dynamically changed via the SAP Transaction SMGW → Goto → Parameters → Display/Change. A change performed this way is only valid until the next start of the Gateway process and only for connections established after the parameter has been changed.

Another problem arises if the timeout values for sending keepalives (e.g. 2 hours) and the timeout values for state table entries (e.g. 10 min.) are not properly matched. Obviously the timeout value for sending keepalives needs to be lower than or equal to the timeout value for state table entries. Otherwise the system might start sending keepalives well after the state table entries have already been removed. Since the handling of keepalive packets is done by the OS, the timeout values need to be set there (see SAP Note 1410736). They are OS-specific and they apply globally to the OS network stack. For example, a 10 min. timeout for keepalives and a resend interval of 10 min. on IBM AIX are set by:

no -p -o tcp_keepidle=600
no -p -o tcp_keepintvl=600

Since those are global values, you need to choose the lowest value required by any of the applications running on the system and by any of the network devices involved.
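
The currently active values on AIX can be checked like this (the grep also catches the related tcp_keepcnt and tcp_keepinit tunables; the output shown assumes the values from the example above and otherwise AIX defaults):

$ no -a | grep tcp_keep
              tcp_keepcnt = 8
             tcp_keepidle = 600
             tcp_keepinit = 150
            tcp_keepintvl = 600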

In order to determine whether an existing network connection has keepalives enabled, you can use tcpdump to sniff the actual network traffic. On busy systems this produces copious amounts of data, and the keepalives might be difficult to catch because you have to wait for the keepalive mechanism to kick in.
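
A keepalive probe is simply an empty ACK packet repeated at the keepalive interval, so narrowing tcpdump down to the connection in question is usually enough to spot them (addresses and port taken from the netstat example below; the interface name is a placeholder):

$ tcpdump -n -i en0 host 4.3.2.1 and port 53487

With AIX there's another way to determine the socket options of existing network connections: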

  1. Run the netstat -Aan command and find the network connection in question. Gather the PCB/ADDR value for that connection from the left-most column:

    $ netstat -Aan | egrep "PCB|tcp"
    PCB/ADDR         Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
    ...
    f1000e0003a7d3b8 tcp4       0      0  1.2.3.4.3311       4.3.2.1.53487      ESTABLISHED
    ...
    
  2. Start the kernel debugger kdb with root privileges. Within kdb, at the (0)> prompt, run the sockinfo subcommand on the PCB/ADDR value gathered in the previous step. Since this prints all the socket option values, filter the output with the grep command for the KEEP keyword:

    $ kdb
    (0)> sockinfo f1000e0003a7d3b8 tcpcb | grep KEEP
        t_timer....... 0000021B (TCPT_KEEP)
        opts........ 000C (REUSEADDR|KEEPALIVE)
    (0)>
    

    The filtered output shows:

    • a KEEPALIVE flag in the opts line, indicating that the socket option SO_KEEPALIVE is set for this connection and keepalive packets will be sent.

    • a hex value of 0000021B in the t_timer line, representing the time in half-seconds that is left before the next keepalive packet is sent. In this example the next keepalive packet will be sent in 269.5 seconds (0x21B half-seconds == 539 decimal half-seconds == 269.5 seconds; see the quick conversion below).
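
The conversion itself can be done quickly with standard shell tools:

$ printf '%d\n' 0x21B
539
$ echo 'scale=1; 539 / 2' | bc
269.5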

// Cleanup leftover SAPOSCOL shared memory segment

The SAP saposcol process sometimes refuses to start, claiming there's already a saposcol process running even if there really isn't. Most of the time I found this to be due to a leftover shared memory segment. It usually happens when saposcol was previously not shut down properly or was otherwise killed. Normally a:

$ saposcol -c pf=<path to profile>

takes care of this, but sometimes even that doesn't seem to do the trick. In this case SAP Note 548699, especially item 7, comes in handy. Basically, you look for the leftover shared memory segment identified by saposcol's key and remove it manually with the OS tools:

$ ipcs -ma | grep '4dbe'
T        ID     KEY        MODE       OWNER    GROUP  CREATOR   CGROUP NATTCH     SEGSZ  CPID  LPID     ...
m   1048576 0x00004dbe --rw-rw-rw-     root   system     root   system      1   1839766 4915206 5046426 ...

$ ipcrm -m 1048576

Once the shared memory segment has been removed, saposcol can be started again. Be careful though: in recent versions, or if you have installed the SMD agent, saposcol is started from the SAP host agent!
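
Assuming a standalone saposcol setup, the whole stop/clean/start sequence therefore looks roughly like this (in a host agent setup the collector has to be controlled through the host agent instead):

$ saposcol -k                         # stop the (possibly hung) collector
$ saposcol -c pf=<path to profile>    # clean up leftover resources
$ ipcrm -m 1048576                    # remove the segment found via ipcs, if still present
$ saposcol -l                         # launch the collector again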

Related or otherwise interesting SAP Notes: 710975.
