CounterGuide.com Article

How log analyzers work

Log analysis is the most reliable way to track hits to your site, though it may be difficult for the average user to set up.

Every major web server records each request that it receives in a "log file". This is just a text file: for every request, the server appends one line, in a standard format, with some information about that request.

Here are some lines from my log file:


83.216.4.207 - - [05/Mar/2006:06:20:31 -0500] "GET / HTTP/1.1" 200 864 "http://www.adeveloper.com/resource.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; hu-HU; rv:1.7.12) Gecko/20050919 Firefox/1.0.7"
83.216.4.207 - - [05/Mar/2006:06:20:32 -0500] "GET /counterguide.css HTTP/1.1" 200 1172 "http://www.counterguide.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; hu-HU; rv:1.7.12) Gecko/20050919 Firefox/1.0.7"
83.216.4.207 - - [05/Mar/2006:06:20:33 -0500] "GET /layout_images/background_black.gif HTTP/1.1" 200 107 "http://www.counterguide.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; hu-HU; rv:1.7.12) Gecko/20050919 Firefox/1.0.7"
83.216.4.207 - - [05/Mar/2006:06:20:33 -0500] "GET /layout_images/cguide_main3.gif HTTP/1.1" 200 4331 "http://www.counterguide.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; hu-HU; rv:1.7.12) Gecko/20050919 Firefox/1.0.7"
83.216.4.207 - - [05/Mar/2006:06:20:34 -0500] "GET /favicon.ico HTTP/1.1" 404 229 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; hu-HU; rv:1.7.12) Gecko/20050919 Firefox/1.0.7"
65.214.84.41 - - [05/Mar/2006:06:28:48 -0500] "GET /listing/pay/32.html HTTP/1.0" 200 634 "-" "Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)"
88.110.201.216 - - [05/Mar/2006:06:30:11 -0500] "GET /rev/rapid_axcess.shtml HTTP/1.1" 302 26 "http://www.euronet.nl/users/w_solarz/count.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; (R1 1.5); .NET CLR 1.1.4322; .NET CLR 2.0.50727; Alexa Toolbar)"


This log file shows three different entities accessing this site.

The first several lines are all from the same user -- their browser has made multiple requests for different page elements in addition to the page itself, such as image files, CSS files and the like. This illustrates how one visit can produce many, many hits.
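To make the difference between hits and page views concrete, here is a minimal Python sketch that parses lines in the format shown above and counts both. The regular expression, the function names, and the list of file extensions treated as "page elements" are assumptions made for illustration, not part of any particular log analyzer.

```python
import re

# Pattern for log lines in the format of the sample above
# (host, two unused fields, timestamp, request, status, size, referer, agent).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical rule: requests ending in these extensions are page elements.
ASSET_EXTENSIONS = (".css", ".gif", ".jpg", ".png", ".ico", ".js")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_hits_and_pages(lines):
    """Count all requests (hits) and the subset that look like pages."""
    hits = 0
    pages = 0
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in the expected format
        hits += 1
        if not entry["path"].lower().endswith(ASSET_EXTENSIONS):
            pages += 1
    return hits, pages
```

Fed the first five sample lines, `count_hits_and_pages` returns `(5, 1)`: five hits, but only one actual page.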

You can also see a spider in these results -- the Ask Jeeves spider. Robot traffic from Googlebot, Yahoo! Slurp, or others may be very high -- often higher than the traffic from real users.
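One simple way to separate robots from real users is to look at the User-Agent field, the last quoted string on each line. The sketch below assumes a short, hand-picked list of marker substrings; that list and the function names are hypothetical, and real analyzers generally rely on much longer, maintained lists.

```python
# Hypothetical substrings that suggest a crawler; extend as needed.
ROBOT_MARKERS = ("ask jeeves", "googlebot", "slurp", "bot", "crawler", "spider")


def user_agent_of(line):
    """Return the last quoted field of a log line (the User-Agent)."""
    parts = line.rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def is_robot(user_agent):
    """Guess whether a User-Agent string belongs to a robot."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def split_traffic(lines):
    """Return (robot_hits, other_hits) for an iterable of log lines."""
    robots = 0
    others = 0
    for line in lines:
        if is_robot(user_agent_of(line)):
            robots += 1
        else:
            others += 1
    return robots, others
```

Substring matching like this is only a heuristic: a robot that sends a browser-like User-Agent will be counted as a real user.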

You can also configure your web server to record more or less information. In Apache or IIS, it's quite straightforward -- assuming, naturally, you have access to your web server configuration. For instance, you can set your server to record the Referer header (the value scripts see as HTTP_REFERER) with every hit (see the article About tracking referers).

Log analysis programs turn these log files into reports. These programs range from the very, very simple to the very, very complex. For instance, some may try to identify unique users and track their paths through your site. Others may simply compile each piece of information into a table and make a pretty graph.
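The "very simple" end of that range can be sketched in a few lines of Python: tally each requested path and each status code, then print the totals as a table. The function names are made up for this example, and the quote-splitting shortcut assumes that no field contains an escaped double quote.

```python
from collections import Counter


def summarize(lines):
    """Tally requested paths and status codes from log lines."""
    paths = Counter()
    statuses = Counter()
    for line in lines:
        # Splitting on double quotes gives: prefix, request, " status size ", ...
        parts = line.split('"')
        if len(parts) < 3:
            continue
        request = parts[1].split()  # e.g. ['GET', '/', 'HTTP/1.1']
        after = parts[2].split()    # e.g. ['200', '864']
        if len(request) < 2 or not after:
            continue
        paths[request[1]] += 1
        statuses[after[0]] += 1
    return paths, statuses


def print_table(title, counter, limit=10):
    """Print the most common values with their counts, largest first."""
    print(title)
    for value, count in counter.most_common(limit):
        print(f"{count:>6}  {value}")
```

A typical use would be to call `summarize` on an open log file and pass each of the two counters to `print_table`.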

What are the advantages to this method of hit tracking?

It's reliable. Hosted counters sometimes get overloaded and fail, and there are certain kinds of requests that they cannot log at all (see the free hosted counter directory or the how hosted counters work article). Server logs are simply the best record there is.

It's already there. Your server logs are already sitting there even if you don't use them, recording every request that comes in. Why not analyze them?

Why wouldn't you want to use web server log analysis?

For one thing, if you host your site at a large, shared host, you may not have access to the logs in the first place! This is obviously a non-starter. However, many hosts do provide users with their own access logs. If you use a shared host, ask your hosting provider where your log files can be accessed.

Second, setting up a log analyzer can be a bit of a hassle, even for experienced users.

Ideally, the program should be set to analyze the log files on a regular schedule, so that the analysis stays up to date and hit counts are retained indefinitely.

Logs do not last forever -- they are "rotated" on a schedule. On my server, the log files last a week, after which the existing log file is moved from "access_log" to "access_log.1" and "access_log" is reset. Older weekly log files are kept around for 4 weeks.

Log analysis programs must deal with log rotation correctly. They must run in a way that neither misses hits nor counts them twice, and that is compatible with your particular schedule for rotating logs, especially if the analysis is set to run automatically. Some tools are intelligent enough to skip log file entries that they have already seen. (For an example of how to set this up using Apache and AWStats, see here.)
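As a rough illustration of skipping entries that have already been seen, the Python sketch below remembers how far it read on the previous run and notices when "access_log" has been rotated to "access_log.1". It is not how AWStats or any specific tool works. The state file, the function names, and the use of the file's inode number to detect rotation are all assumptions, and the inode check is only dependable on Unix-like systems.

```python
import json
import os


def _read_from(path, offset):
    """Read complete lines from path starting at offset; return (lines, new_offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial last line waits for the next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def read_new_lines(log_path, rotated_path, state_path):
    """Return log lines added since the previous run, without repeats."""
    try:
        with open(state_path) as f:
            state = json.load(f)
    except FileNotFoundError:
        state = {"inode": None, "offset": 0}  # first run: start from the top

    lines = []
    offset = state["offset"]
    inode = os.stat(log_path).st_ino

    if state["inode"] is not None and state["inode"] != inode:
        # The log was rotated: finish the old file first, if it is still there.
        if os.path.exists(rotated_path) and os.stat(rotated_path).st_ino == state["inode"]:
            old_lines, _ = _read_from(rotated_path, offset)
            lines.extend(old_lines)
        offset = 0
    elif os.path.getsize(log_path) < offset:
        offset = 0  # same file, but it was truncated in place

    new_lines, offset = _read_from(log_path, offset)
    lines.extend(new_lines)

    # Remember where this run stopped.
    with open(state_path, "w") as f:
        json.dump({"inode": inode, "offset": offset}, f)
    return lines
```

This sketch only looks back one rotation. If the logs rotate twice between runs, the entries in between are missed, which is why the analysis schedule has to be at least as frequent as the rotation schedule.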

If you can't manage that, then your log analysis will be inaccurate or may not even work at all.

To find a log analyzer program, see the Log analysis tool list. For an alternative, see the how hosted counters work article or the free web counter list.

