bityard Blog

// Apache Logfile Analysis with AWStats and X-Forwarded-For Headers

When running a webserver behind a (reverse) proxy or load-balancer, it's often necessary to log the original client's IP address by looking at a possible X-Forwarded-For header in the HTTP request. Otherwise one will only see the IP address of the proxy or load-balancer in the webserver's access logs. With the Apache webserver this is usually done by changing the LogFormat directive in httpd.conf from:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

to something like this:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

Unfortunately this solution fails if you're so “lucky” to have a setup where clients can access the webserver both directly and through the proxy/load-balancer. In this particular case the HTTP request of the direct access to the webserver has no X-Forwarded-For header, and the client field will be logged as a “-” (dash). There are some solutions to this out on the net, utilizing the Apache SetEnvIf configuration directive.

Since this wasn't really satisfying for me, because it prevents you from knowing which requests were made directly and which were made through the proxy/load-balancer, I just inserted the X-Forwarded-For field right after the regular remote hostname field (%h):

LogFormat "%h %{X-Forwarded-For}i %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

An example of two access log entries looks like this:

- - - [21/Jul/2012:18:32:45 +0200] "GET / HTTP/1.1" 200 22043 "-" "User-Agent: Lynx/2.8.4 ..."
- - [21/Jul/2012:18:32:45 +0200] "GET / HTTP/1.1" 200 22860 "-" "User-Agent: Lynx/2.8.4 ..."

The first one shows a direct access from the client. The second one shows a proxied/load-balanced access from the same client, logged together with the proxy's/load-balancer's IP address.
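
To make the difference between the two kinds of entries concrete, here is a minimal Python sketch that reads the first two fields of a line in the extended format (%h followed by %{X-Forwarded-For}i) and reports the client address and, if there is one, the proxy address. The function name classify, the regular expression and the 192.0.2.x addresses (taken from the documentation address range) are assumptions made purely for illustration. The sketch also assumes that the X-Forwarded-For header carries a single address without spaces, which need not hold if several proxies are chained.

```python
import re

# First two fields of the extended format: %h %{X-Forwarded-For}i
# Assumption: the X-Forwarded-For value contains no spaces.
LINE_RE = re.compile(r'^(?P<host>\S+) (?P<xff>\S+) ')


def classify(line):
    """Return (client, proxy) for one access log line.

    proxy is None when the request reached the webserver directly.
    """
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("unexpected log line: %r" % line)
    host, xff = match.group("host"), match.group("xff")
    if xff == "-":
        # No X-Forwarded-For header: %h is the client itself.
        return host, None
    # Header present: %h is the proxy, the header names the client.
    return xff, host


# Hypothetical example lines; the addresses are placeholders.
direct = ('192.0.2.10 - - - [21/Jul/2012:18:32:45 +0200] '
          '"GET / HTTP/1.1" 200 22043 "-" "Lynx/2.8.4"')
proxied = ('192.0.2.1 192.0.2.10 - - [21/Jul/2012:18:32:45 +0200] '
           '"GET / HTTP/1.1" 200 22860 "-" "Lynx/2.8.4"')

print(classify(direct))   # ('192.0.2.10', None)
print(classify(proxied))  # ('192.0.2.10', '192.0.2.1')
```

Keeping both fields in the log is what makes this distinction possible at all: with the plain X-Forwarded-For format the proxy address is lost, and with the plain %h format the client address is.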

Now if you're also using AWStats to analyse the webserver's access logs, things get a bit tricky. AWStats has support for X-Forwarded-For entries in access logs, but only for the second LogFormat example above (X-Forwarded-For in place of %h) and for the mentioned SetEnvIf solution. This is due to the fact that the responsible AWStats LogFormat strings %host and %host_proxy are mutually exclusive. An AWStats configuration based on the Type 1 LogFormat like:

LogFormat = "%host %host_proxy %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot"

will not work, since %host_proxy will overwrite %host entries even if the %host_proxy field does not contain an IP address, like in the above direct access example. The result is a lot of hosts named “-” (dash) in your AWStats statistics. This can be fixed by the following simple, quick'n'dirty patch to the AWStats sources:
diff -u wwwroot/cgi-bin/ wwwroot/cgi-bin/
--- wwwroot/cgi-bin/     2012-07-18 19:28:59.000000000 +0200
+++ wwwroot/cgi-bin/  2012-07-18 19:30:52.000000000 +0200
@@ -17685,6 +17685,18 @@
+               my $pos_proxy = $pos_host - 1;
+               if ( $field[$pos_host] =~ /^-$/ ) {
+                       if ($Debug) {
+                               debug(
+                                       " Empty field host_proxy, using value "
+                                       . $field[$pos_proxy] . " from first host field instead",
+                                       4
+                               );
+                       }
+                       $field[$pos_host] = $field[$pos_proxy];
+               }
                if ($Debug) {
                        my $string = '';
                        foreach ( 0 .. @field - 1 ) {
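
For readers who don't speak Perl, the fallback the patch performs can be sketched in a few lines of Python. This is only an illustration of the logic, not AWStats code: the function name fix_host_field and the sample field lists are assumptions, and the sketch relies on the same assumption as the patch itself, namely that the %host field sits directly in front of the %host_proxy field in the LogFormat.

```python
def fix_host_field(fields, pos_host):
    """Mimic the patch: if the host field is '-', reuse the field before it.

    fields   -- list of values split from one log line (hypothetical input)
    pos_host -- index of the field AWStats treats as the host; with both
                %host and %host_proxy configured this is assumed to be the
                %host_proxy position
    """
    pos_proxy = pos_host - 1  # variable name as in the patch: the first host field
    if fields[pos_host] == "-":
        fields[pos_host] = fields[pos_proxy]
    return fields


# Direct access: the X-Forwarded-For field is '-', so %h is copied over.
print(fix_host_field(["192.0.2.10", "-", "-", "-"], 1))
# Proxied access: the X-Forwarded-For field is kept as it is.
print(fix_host_field(["192.0.2.1", "192.0.2.10", "-", "-"], 1))
```

Note that the fallback breaks as soon as the two fields are no longer adjacent in the LogFormat, since the index of the first host field is derived by simple subtraction.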

Please bear in mind that this is a “works for me”™ kind of solution, which might break AWStats in all kinds of other ways.
