
I have really large nginx log files, as much as 250 MB.

When I have run about 10 days of the month's log files, the next daily log causes my awstats to die. Like so:

/usr/lib/cgi-bin/awstats.pl -config=mydomain.com -update
....
Flush history file on disk (unique hosts reach flush limit of 20000)
Flush history file on disk (unique hosts reach flush limit of 20000)
Killed

I know it has something to do with the amount of data, because when I delete the awstats-generated database file, any day's log file will run through awstats.pl just fine.


2 Answers


It looks to me like you have hit a hard resource limit while processing the logs. There's a good SU page on ulimit you can take a look at. TL;DR: check your current limits with `ulimit -a`, then watch your awstats process with something like `top` on its next run. You will most likely see it hitting the memory or stack size limits.
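A minimal sketch of that check, assuming a Linux box with a bash-like shell:

```shell
# Show all current per-process resource limits for this shell
ulimit -a

# The virtual memory limit is the usual culprit here
# (prints a size in kbytes, or "unlimited")
ulimit -v
```

While awstats.pl is running you can then watch its memory with something like `top -p "$(pgrep -f awstats.pl)"` and see the resident size climb toward that limit.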

  • It’s indeed hitting resource limits. AWStats is not great at DNS lookups, which is what the `unique hosts reach flush limit` is connected to. Best option is to disable DNS lookups in AWStats & use GeoIP lookups instead. – Giacomo1968 Nov 17 '13 at 05:02
  • yup, it's a resource limit, thanks guys. i moved the log file onto my computer, which has 8GB of memory, and awstats parsed through more than 1GB worth of logs just fine. – David Nov 20 '13 at 21:46
  • Glad I could help! – bizzyunderscore Nov 20 '13 at 23:00

The problem is that DNS lookups in AWStats are inherently inefficient & not really that great. Read up on that here. A much better strategy, which I use on all servers I set up, is to use GeoIP lookups instead of domain name lookups. More details in this tutorial here. Basically, you will go into your AWStats config file for mydomain.com (which should be awstats.mydomain.com.conf, if I recall the naming scheme correctly) and set DNSLookup=0. But then, to resolve IP addresses to something more than just numbers, you need GeoIP set up. If you don't care about attaching more data to IPs, you don't have to do anything else besides disabling DNS lookups. But I'm sharing my method of setting up GeoIP lookups just in case.

Remember, the process can be complex if you are not comfortable compiling packages on your own, but this is how I do it under Ubuntu 12.04 LTS.

First, get the GeoIP tool from MaxMind:

wget http://www.maxmind.com/download/geoip/api/c/GeoIP-latest.tar.gz

Extract the archive:

tar -xvzf GeoIP-latest.tar.gz
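If you want to see what the archive will expand to before (or after) extracting — the top-level directory name carries the version — you can list its contents; this assumes GNU tar:

```shell
# Print just the first entry in the archive: the top-level directory name
tar -tzf GeoIP-latest.tar.gz | head -n 1
```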

It should be `GeoIP-1.5.1`, but check what it expands to in case the version has changed. Assuming it is still version 1.5.1, go into the directory:

cd ./GeoIP-1.5.1

In a few cases I have had to run libtoolize to get the configuration to work:

libtoolize -f

Then do “the usual” config & make routine:

./configure
make
make check
sudo make install
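One extra step worth checking (an assumption on my part, not part of the original build instructions): `make install` typically puts libGeoIP under /usr/local/lib, and on some systems the dynamic linker will not see it until its cache is refreshed:

```shell
# Refresh the dynamic linker's cache so the freshly installed
# libGeoIP in /usr/local/lib can be found (needs root)
sudo ldconfig
```

You can confirm with `ldconfig -p | grep -i geoip` afterwards.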

In some cases, I have had to use dh-autoreconf, as described here:

sudo aptitude install dh-autoreconf

Then do the following in your GeoIP-1.5.1 directory:

autoreconf --force --install
./configure
make

Okay, if that is done & works, move on to the CPAN component that bridges the Perl AWStats code with the GeoIP functions:

sudo cpan Geo::IP::PurePerl Geo::IP

Okay, that went well? Now get the databases from MaxMind:

wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
wget http://geolite.maxmind.com/download/geoip/database/asnum/GeoIPASNum.dat.gz

Move them into the local /usr/share/GeoIP/ directory:

sudo mv ~/GeoIP.dat.gz /usr/share/GeoIP/
sudo mv ~/GeoLiteCity.dat.gz /usr/share/GeoIP/
sudo mv ~/GeoIPASNum.dat.gz /usr/share/GeoIP/

And decompress them:

sudo gzip -d /usr/share/GeoIP/GeoIP.dat.gz
sudo gzip -d /usr/share/GeoIP/GeoLiteCity.dat.gz
sudo gzip -d /usr/share/GeoIP/GeoIPASNum.dat.gz

Now, in awstats.mydomain.com.conf, do the following. First, find the DNSLookup line and disable it:

DNSLookup=0

Now find the line that has GEOIP_STANDARD and add these lines, or edit that line, to account for the three new databases:

LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
LoadPlugin="geoip_city_maxmind GEOIP_STANDARD /usr/share/GeoIP/GeoLiteCity.dat"
LoadPlugin="geoip_org_maxmind GEOIP_STANDARD /usr/share/GeoIP/GeoIPASNum.dat"

Make sure those .dat filenames match the files that are actually in /usr/share/GeoIP/. All good? Great! Now rerun your AWStats command. To confirm the GeoIP plugins are in place, check the very bottom of the AWStats page after you run the AWStats scripts with the GeoIP settings in place. It should read something like:

Created by awstats (plugins: geoip_org_maxmind, geoip_city_maxmind, geoip)
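If you'd rather check from the shell than eyeball the page, you can grep the generated HTML for that plugin list. The path below is a placeholder; point it at wherever your AWStats output pages actually live:

```shell
# Pull the "plugins: ..." list out of the generated page
# (/path/to/awstats.mydomain.com.html is a hypothetical path)
grep -o 'plugins: [^)]*' /path/to/awstats.mydomain.com.html
```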

This is probably more info than you need, but hope this helps!

  • hi Jake, I appreciate the very informative reply on GeoIP, but unfortunately I already have DNS lookups turned off, so this isn't the case for me. – David Nov 19 '13 at 17:15
  • Hey David, well, if you say DNS lookups are turned off, then how are you getting the errors that read: `Flush history file on disk (unique hosts reach flush limit of 20000)`? From my experience, `unique hosts reach flush limit` refers to DNS lookups in AWStats. – Giacomo1968 Nov 19 '13 at 17:48
  • 1
    Hey Jake, it just means that awstats already parsed 20,000 hosts and its going to probably flush the data into its database file and then continuing on to parsing the log file. bizzyunderscore was actually correct it was a memory issue. i moved the log files to a 8GB memory machine and it work just fine. my remote server had 512 MB only. – David Nov 20 '13 at 21:45
  • Well hey, all good. – Giacomo1968 Nov 20 '13 at 21:47