2012:06:08:sap_gateway_firewalls_and_tcp_keepalives (created 2012/06/09 00:23 by Frank Fegert; current revision 2012/06/10 13:13 by Frank Fegert)
====== SAP Gateway, Firewalls and TCP Keepalives ======
If you're maintaining moderately complex SAP landscapes with network connections traversing firewalls or other devices with access lists, you're bound to experience network connection issues sooner or later. Usually this is due to the dynamic nature of modern firewall ACLs, which are built up on demand and torn down again based on the actual packet flow. SAP RFC and other network connections, on the other hand, are set up once and can remain inactive for an indefinite amount of time until the next data exchange becomes necessary. The sequence of events in which these two behaviours collide roughly looks like this:
- The SAP processes initiate a network connection to a remote system. With TCP-based connections, this causes the OS to send a SYN packet to the remote system.
- A firewall along the way recognises the SYN packet as an attempt to build up a connection. It compares the parameters (usually source and destination
- At some point, the SAP processes finish their data exchange, but the connection is not torn down. It is usually kept open for future communication and to avoid the overhead of setting up a new connection.
- As soon as the SAP processes stop exchanging data, the firewall's timeout counters for the corresponding state table entries start ticking down. Once the period of inactivity reaches the predefined timeout value, the state table entries are removed. From then on, the firewall blocks any further communication attempts on this connection unless they are a new connection initiation with a SYN packet.
- At some point, the SAP processes want to exchange data over the connection again. From their perspective, the connection is still established,
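The sequence of events above can be replayed with a toy model of a stateful firewall. The following Python sketch is purely illustrative: the class ''ToyStatefulFirewall'', the connection tuple and the timeout values are hypothetical, and real firewalls track far more state (TCP flags, per-protocol timeouts, FIN/RST handling). It only models the idle timeout of a state table entry and the rule that packets without an entry are dropped unless they carry a SYN.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical connection key: (src_ip, src_port, dst_ip, dst_port)
Conn = Tuple[str, int, str, int]


@dataclass
class StateEntry:
    last_seen: float  # time of the last forwarded packet, in seconds


class ToyStatefulFirewall:
    """Illustrative model only: state entries expire after idle_timeout seconds."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.table: Dict[Conn, StateEntry] = {}

    def packet(self, conn: Conn, now: float, syn: bool = False) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        entry = self.table.get(conn)
        if entry is not None and now - entry.last_seen > self.idle_timeout:
            del self.table[conn]  # idle for too long: the state entry is removed
            entry = None
        if entry is None:
            if not syn:
                return False  # no state and no connection setup: blocked
            self.table[conn] = StateEntry(last_seen=now)  # new entry on SYN
            return True
        entry.last_seen = now  # any forwarded packet, keepalives included, resets the timer
        return True


if __name__ == "__main__":
    conn: Conn = ("10.0.0.1", 40000, "10.0.0.2", 3300)  # hypothetical addresses
    fw = ToyStatefulFirewall(idle_timeout=600)
    print(fw.packet(conn, 0, syn=True))  # connection setup
    print(fw.packet(conn, 100))          # data exchange
    print(fw.packet(conn, 100 + 3600))   # data after 1 h of silence: dropped
```

If a packet crosses the firewall more often than ''idle_timeout'', which is exactly what periodic keepalives do, the entry never expires in this model.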
The actual issue is that the firewall or network device has no knowledge of how the SAP systems intend to use the network connection. This is a consequence of the independent layers of the OSI stack and not an SAP-specific problem. The issue is usually addressed by sending empty keepalive packets at regular intervals once the actual data transfer has ceased for a configurable amount of time. This simulates ongoing traffic over the connection and thereby keeps the state table entries from timing out. The keepalive packets are sent by the network stack of the OS, but the application has to request them by setting the ''SO_KEEPALIVE'' socket option on the network connection. SAP has an instance profile configuration parameter to request keepalives at the start of the SAP system:

<

(see [[https://

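At the socket API level, requesting keepalives is a single option on the connection's socket. The following Python sketch shows what such a request looks like; it is a generic illustration using Python's standard ''socket'' module, not SAP's implementation, and the peer in the commented-out ''connect()'' call is hypothetical. ''SO_KEEPALIVE'' only switches keepalives on; when and how often they are sent is governed by OS tunables.

```python
import socket


def open_keepalive_socket() -> socket.socket:
    """Create a TCP socket and request keepalives via SO_KEEPALIVE."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the OS network stack to send keepalive probes once the connection
    # has been idle; the probe timing itself comes from OS-wide settings.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def keepalive_enabled(sock: socket.socket) -> bool:
    """Read the option back; any non-zero value means it is enabled."""
    # The exact non-zero value returned is platform-dependent.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    with open_keepalive_socket() as s:
        print("SO_KEEPALIVE enabled:", keepalive_enabled(s))
        # s.connect(("remote-host", 3300))  # hypothetical peer
```

Sockets created without this call have keepalives disabled by default, which is why the application has to ask for them explicitly.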
Another problem arises if the timeout value for sending keepalives (e.g. 2 hours) and the timeout value for state table entries (e.g. 10 minutes) are not properly matched. Obviously, the idle time after which keepalives are sent needs to be lower than or equal to the timeout value for state table entries. Otherwise, the system might only start sending keepalives well after the state table entries have already been removed. Since keepalive packets are handled by the OS, the timeout values need to be set there (see
[[https://

<cli>
no -p -o tcp_keepidle=600
no -p -o tcp_keepintvl=600
</cli>

Since these are global values, you need to choose the lowest value required by any application running on the system and by any network device involved. On AIX, both tunables are specified in half-seconds, so a value of 600 corresponds to 300 seconds (5 minutes).

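The matching rule comes down to a little arithmetic. The Python sketch that follows uses hypothetical helpers, not an AIX tool; it assumes the AIX convention that ''tcp_keepidle'' is given in half-seconds (default 14400, i.e. 2 hours) and compares the resulting idle time with the lowest state table timeout of all devices involved.

```python
from typing import Iterable

# Assumption: the AIX 'no' tunables tcp_keepidle/tcp_keepintvl use half-second units.
HALF_SECONDS_PER_SECOND = 2


def aix_ticks_to_seconds(ticks: int) -> float:
    """Convert an AIX keepalive tunable value (half-seconds) to seconds."""
    return ticks / HALF_SECONDS_PER_SECOND


def keepidle_fits(keepidle_ticks: int, state_timeouts_s: Iterable[float]) -> bool:
    """True if keepalives start no later than the shortest state table timeout.

    Because tcp_keepidle is global, it has to satisfy the most restrictive
    device on any path the system talks over. Raises ValueError if
    state_timeouts_s is empty.
    """
    return aix_ticks_to_seconds(keepidle_ticks) <= min(state_timeouts_s)


if __name__ == "__main__":
    firewall_timeouts = [600.0, 3600.0]  # hypothetical: a 10 min and a 1 h device
    for ticks in (14400, 600):  # AIX default vs. the value set above
        print(ticks, aix_ticks_to_seconds(ticks), keepidle_fits(ticks, firewall_timeouts))
```

With these hypothetical timeouts, the default of 14400 (7200 seconds) fails the check, while 600 (300 seconds) passes.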
To determine whether an existing network connection has keepalives enabled, you can use tcpdump to sniff the actual network traffic. On busy systems this produces copious amounts of data, and the keepalive packets can be hard to catch because you have to wait for the keepalive mechanism to kick in. With AIX there'
- Run the ''
<cli>
$ netstat -Aan | egrep "
PCB/
...
f1000e0003a7d3b8 tcp4
...
</cli>
- Start the kernel debugger ''
<cli>
$ kdb
(0)> sockinfo f1000e0003a7d3b8 tcpcb | grep KEEP
t_timer....... 0000021B (TCPT_KEEP)
opts........ 000C (REUSEADDR|KEEPALIVE)
(0)>
</cli>
* a ''
* a hex value of 0000021B in the ''
<
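When many sockets have to be checked, reading the ''opts'' word by hand gets tedious. The following Python sketch is a hypothetical helper, not part of AIX: it extracts the hex value from a kdb ''sockinfo'' output line and decodes it into option names. It assumes BSD-style ''SO_*'' bit values (0x0004 = REUSEADDR, 0x0008 = KEEPALIVE), which agree with the ''000C (REUSEADDR|KEEPALIVE)'' decoding kdb prints; check your system's ''<sys/socket.h>'' before relying on the other entries.

```python
import re
from typing import List, Optional

# Assumption: BSD-style SO_* bit values, consistent with kdb's
# "opts........ 000C (REUSEADDR|KEEPALIVE)" output.
SO_FLAGS = {
    0x0001: "DEBUG",
    0x0002: "ACCEPTCONN",
    0x0004: "REUSEADDR",
    0x0008: "KEEPALIVE",
    0x0010: "DONTROUTE",
    0x0020: "BROADCAST",
    0x0040: "USELOOPBACK",
    0x0080: "LINGER",
    0x0100: "OOBINLINE",
}

# Matches "opts", the dot leader, whitespace, then the hex option word.
OPTS_RE = re.compile(r"opts\.*\s+([0-9A-Fa-f]+)")


def opts_from_kdb_line(line: str) -> Optional[int]:
    """Extract the hex 'opts' word from a kdb sockinfo output line, if present."""
    match = OPTS_RE.search(line)
    return int(match.group(1), 16) if match else None


def decode_opts(opts: int) -> List[str]:
    """Names of all known option bits set in the opts word, lowest bit first."""
    return [name for bit, name in sorted(SO_FLAGS.items()) if opts & bit]


if __name__ == "__main__":
    line = "opts........ 000C (REUSEADDR|KEEPALIVE)"
    opts = opts_from_kdb_line(line)
    names = decode_opts(opts) if opts is not None else []
    print(names, "KEEPALIVE" in names)
```

Lines without an ''opts'' field, such as the ''t_timer'' line, simply yield ''None''.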