I don't want to give an external company like s3stat access to my logs. I know that AWS writes S3 and CloudFront logs in a format that AWStats can read. Has anyone used AWStats to analyze them?

S3stat used to offer a hosted version of their software that was in beta but I believe it has been discontinued.

I am not tied to AWStats; I will consider other self-hosted web log analysis software.

ckliborn
  • I struggled for a long time until I came upon this on a forum: LogFormat="%date %time2 %x-edge-location %sc-bytes %c-ip %cs-method %cs(Host) %cs-uri-stem %sc-status %cs(Referer) %cs(User-Agent) %cs-uri-query %cs(Cookie) %x-edge-result-type %x-edge-request-id". Now awstats doesn't complain at all. The format given in the answer didn't work for me (records were dropped or corrupted, depending on how I tweaked things). Use this to update awstats: perl /usr/share/awstats/tools/logresolvemerge.pl S3/bucket/* > awstats/input.txt && perl /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -update -config=XXX – Aki Nov 27 '13 at 13:43

1 Answer

I don't use AWStats with S3, but I would suggest there are three problems to solve when processing the logs:

  1. You need to obtain the data - it is stored on S3

    With CloudFront, AWS lets you choose which bucket receives the logs; it does not have to be the source (origin) bucket. You can easily set up a dedicated bucket for your logs and mount it via s3fs. This should provide the simplest access to the files, and it retains the timestamps and other metadata that are often needed for incremental processing of logs. Alternatively, if you don't wish to mount a bucket as a local file system, you could use s3cmd, aws, or one of the SDKs to download the files. (There is also a Python script for this purpose that uses boto, although I can't vouch for its effectiveness.)

  2. You need to decompress and combine the logs

    CloudFront logs are compressed (gzipped) and stored as multiple files. The filenames contain the date and hour (e.g. XXXXXXXXXXXXX.YYYY-MM-DD-HH.XXXXXXXXX), although there can be multiple files per hour. The files can be decompressed with gunzip and combined with logresolvemerge.pl, a tool provided with AWStats.

  3. You need to provide a custom log format to AWStats

    The file format is tab separated and resembles:

    date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query
    2011-06-27  08:31:10    JFK1    587 xxx.xxx.xxx.xxx GET xxxxxxxxxxxxxx.cloudfront.net   /path/to/your/file  304 http://www.mydomain.com/page/requesting/file    User-agent-string - 

    You would, therefore, set up AWStats with:

    LogType=W
    LogSeparator="\t"
    LogFormat="%time2 %cluster %bytesd %host %method %virtualname %url %code %referer %ua %query"
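
If you want to inspect the logs before handing them to AWStats, steps 2 and 3 can be sketched in Python with only the standard library. This is a rough illustration, not a replacement for gunzip and logresolvemerge.pl: the function name `iter_cloudfront_records` is hypothetical, and it assumes the gzipped files are already in a local directory and that each file carries a `#Fields:` header line naming its tab-separated columns.

```python
import gzip
from pathlib import Path


def iter_cloudfront_records(log_dir):
    """Yield one dict per request from every *.gz log file in log_dir.

    Hypothetical helper; assumes each file has a '#Fields:' header line.
    """
    # File names embed YYYY-MM-DD-HH, so sorting by name gives rough time order.
    for path in sorted(Path(log_dir).glob("*.gz")):
        fields = None
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\r\n")
                if line.startswith("#Fields:"):
                    # Column names in the header are separated by spaces.
                    fields = line[len("#Fields:"):].split()
                elif line and not line.startswith("#") and fields:
                    # Data rows are tab separated, in the header's order.
                    yield dict(zip(fields, line.split("\t")))
```

Each yielded record is keyed by the names from the header, so `record["sc-status"]` or `record["cs-uri-stem"]` can be used to check that the columns line up with the LogFormat you give AWStats.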
cyberx86