
I moved a web service to a new server. I had found that, on the old server, the following pipeline gives approximately the same number of hits as awstats (e.g., for a given day it gives 5537, whereas awstats reports 5557 hits):

grep -v bot myaccess.log |          # file contains given vhost for given date range
grep -v rss2email |
grep -v Slurp |
grep -v pider |                     # ignore spiders
grep -E 'HTTP/.... (200|304) ' |    # catch only 200 and 304 responses
grep -v Wget |
grep -v Bot |
grep -v rawler |                    # ignore crawlers
grep -v favicon.ico |
grep -v robots.txt |
grep -v HTTrack |
grep -v simplepie |
grep -v BingPreview |
wc -l
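The chained greps above can also be collapsed into a single grep -v with several -e patterns, because grep -v drops a line that matches any of them; the result is the same set of lines, and the order of the filters does not matter. A minimal sketch, assuming the log uses the combined format; the function name count_hits is hypothetical:

```shell
#!/usr/bin/env bash
# count_hits (hypothetical helper): approximate awstats "hits" for one vhost log.
# Same effect as the chained greps: a line is dropped if it matches ANY -e pattern.
count_hits() {
    local log="$1"
    grep -v \
        -e bot -e Bot -e rss2email -e Slurp -e pider -e rawler \
        -e Wget -e HTTrack -e simplepie -e BingPreview \
        -e favicon.ico -e robots.txt \
        "$log" |
    grep -E 'HTTP/.... (200|304) ' |   # keep only 200 and 304 responses
    wc -l
}
```

Usage: count_hits myaccess.log prints the same count as the original pipeline.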

Inserting the following just before the final wc -l gives approximately the same number of pages as awstats (e.g. 2916 vs. 3042 for a given day):

grep -E -v '(css)|(js)|(class)|(gif)|(jpg)|(jpeg)|(png)|(bmp)|(ico)|(swf) HTTP'
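As written, this pattern probably excludes more than intended: in extended regular expressions alternation has the lowest precedence, so " HTTP" attaches only to (swf), while css, js, gif and the rest match anywhere on the line, including in the referrer and the user agent. A tighter sketch, assuming the request line looks like "GET /path HTTP/1.1" and that a query string may follow the extension; NOT_PAGE_RE and drop_non_pages are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical filter: drop requests whose path ends in one of the
# NotPageList-style extensions, optionally followed by a query string.
# Grouping the alternatives makes "\." and " HTTP/" apply to every extension.
NOT_PAGE_RE='\.(css|js|class|gif|jpg|jpeg|png|bmp|ico|swf)(\?[^ ]*)? HTTP/'

drop_non_pages() {
    grep -E -v "$NOT_PAGE_RE"
}
```

With this version, a page whose referrer ends in .js, or whose path merely contains "js", is still counted as a page. Note that this makes the grep page count larger, not smaller.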

Then I moved to the new server, and lots of things changed: Apache became nginx, the log format changed, the awstats configuration was rewritten, Debian squeeze became wheezy, and awstats 6.9.5 became 7.0.

The large pipeline above still approximates awstats's hits well (e.g. 5521 vs. 5541), but adding the filter that excludes the NotPageList extensions no longer approximates its pages: for a given day I get 2948, whereas awstats gives 1580. (Whether the exclusion list contains rss and xml makes no significant difference.) Indeed, since the day the service was moved, the hits have remained approximately the same, whereas the pages and the visits have roughly halved. I can't figure out why.

Antonis Christofides
Try analyzing both the old and the new server logs with some other tool too; good examples are Webalizer, Analog, and Visitors. That way you can see whether something is wrong with awstats. – Janne Pikkarainen Aug 08 '14 at 10:25

1 Answer


The main difference appears to be a new feature in awstats 7.0: downloads. It treats requests for certain file extensions (pdf, zip, txt, mp3, doc, ppt, and more) as "downloads", whereas older awstats versions counted them as "pages".
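One way to test this explanation against the logs is to count, among the lines that survive the hits filter, those whose path ends in one of the extensions listed above. If that count is close to the gap between the grep page count and the awstats page count (2948 minus 1580, i.e. 1368, on the day quoted in the question), downloads account for most of the difference. A sketch; the extension list contains only the examples named above, not awstats 7.0's full list, and DOWNLOAD_RE and count_download_like are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count request lines (read from stdin) whose path
# ends in a "download" extension, optionally followed by a query string.
# Only the extensions named in the answer are listed; awstats 7.0
# recognises more, so treat the result as a lower bound.
DOWNLOAD_RE='\.(pdf|zip|txt|mp3|doc|ppt)(\?[^ ]*)? HTTP/'

count_download_like() {
    grep -E -c "$DOWNLOAD_RE"
}
```

For example, replace the final wc -l of the hits pipeline with count_download_like.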

I also understand that each new version of awstats has a more complete exclusion list (e.g. a more complete list of robots), so each new version reports fewer pages (some overestimation always remains and is probably impossible to eliminate entirely); but this should have a smaller effect.
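To see which robots a hand-written grep list might miss compared with awstats's list, one option is to tally the user agents of the lines that survive the filters and look for crawlers near the top. A sketch, assuming the nginx log still uses the combined format (the question notes the format changed, so the field number may need adjusting); top_user_agents is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the most frequent user agents in a
# combined-format log read from stdin. Splitting on double quotes makes
# the user agent the 6th field; adjust if the log_format differs.
top_user_agents() {
    local n="${1:-20}"   # how many entries to show (default 20)
    awk -F'"' '{ print $6 }' | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, replace the final wc -l of the hits pipeline with top_user_agents 30.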

Antonis Christofides