5

I'm trying to find a way to parse our Amazon S3 access logs to get some webstats.

I've been trying to use AWStats 7, but once it gets past day 9 of a given month it can't process any more logs because it runs out of memory. The server has 4 GB of memory.

Our S3 logs are rather big (~1 GB/day), and soon our CloudFront logs could reach 10-20 GB/day.

Is there any software that can generate webstats from S3 (and soon CloudFront) logs?

I know about s3stat.com, but I want something I can run on my own.

Mxx
  • I process them using Webalizer with a little Python script to moosh the data around a bit so it works properly. Run each log through webalizer individually to update the database rather than combining the logs into one big file and trying to process that. – Smudge Sep 26 '11 at 14:44
  • Do you have code that converts S3 logs into a format that Webalizer can understand? Or, preferably, a patch for Webalizer so it understands S3/CloudFront logs? For now my logs are split by day. – Mxx Sep 26 '11 at 15:16
  • Not offhand, I'll see if I can grab it when I get home tonight and stick it on gist – Smudge Sep 26 '11 at 15:30
  • Hey @Sam did you have a chance to find that script? – Mxx Oct 05 '11 at 03:31
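
The conversion discussed in the comments above might look roughly like the sketch below. This is not Smudge's script, which was never posted in this thread. It assumes the space-delimited S3 server access log layout documented by AWS, and it assumes the target is the Apache Combined Log Format, which Webalizer can read. The names `S3_LINE`, `s3_to_combined`, and `convert` are hypothetical, and any line that does not fit the pattern (for example, an entry with an unquoted `-` in place of the request) is skipped.

```python
import re

# Assumed field order of an S3 server access log line: bucket owner, bucket,
# [time], remote IP, requester, request ID, operation, key, "request-URI",
# status, error code, bytes sent, object size, total time, turn-around time,
# "referrer", "user-agent". Any trailing fields are ignored.
S3_LINE = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request>[^"]*)" (?P<status>\S+) (?P<error>\S+) (?P<bytes>\S+) '
    r'(?P<size>\S+) (?P<total_time>\S+) (?P<turnaround>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

COMBINED = '{ip} - - [{time}] "{request}" {status} {bytes} "{referrer}" "{agent}"'


def s3_to_combined(line):
    """Return one S3 log line rewritten as Combined Log Format, or None."""
    match = S3_LINE.match(line)
    if match is None:
        return None
    # str.format ignores the named groups that the template does not use.
    return COMBINED.format(**match.groupdict())


def convert(lines):
    """Lazily convert an iterable of S3 log lines, dropping unparsable ones."""
    for line in lines:
        converted = s3_to_combined(line)
        if converted is not None:
            yield converted
```

Because `convert` is a generator, it can be fed an open file object and written out line by line, so a day's log never has to fit in memory. Each converted daily file could then be run through Webalizer individually, as suggested above.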

2 Answers

3

I'd suggest GoAccess. We are parsing about 120 million hits in about 35 minutes, which is way faster than AWStats. It doesn't seem to consume a lot of RAM (< 1 GB); it's running on an 8 GB RAM system.

You should give it a try though.

Mike
  • Note: I'm not familiar with Amazon S3 access logs. So feel free to delete this answer if it doesn't apply to it. – Mike Sep 27 '11 at 02:23
  • GoAccess looks very interesting and I'll be trying it on our servers. Thank you for sharing. However, I don't think it's a good match for this task. GoAccess is designed more for snapshot-like realtime/near-realtime information. We don't need that kind of immediacy. Day-old stats are sufficient. Even week-old stats are fine, since we need them more for historical/analytical information. It also looks like GoAccess can't natively parse S3 logs or save parsed reports. Going through 30+ GB/month of logs each time will be slow. We'd also have to keep all the logs, and I really don't want that. – Mxx Sep 27 '11 at 03:00
  • This actually looks pretty interesting for Amazon S3 or Cloudfront: https://pypi.python.org/pypi/s3stat – Kayla Jul 31 '14 at 13:04
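
One way around the concern raised above, re-reading a month of logs on every run and having to keep all of them, is to fold each day's log into a small persisted summary and then discard the raw file. The sketch below is not a GoAccess feature and not any poster's method. It assumes the S3 server access log layout documented by AWS (a bracketed timestamp, then later a quoted request followed by status, error code, and bytes sent). The names `FIELDS`, `update_summary`, `load_summary`, and `save_summary` are hypothetical, as is the choice of a JSON file for storage.

```python
import json
import re
from pathlib import Path

# Pulls the day out of "[06/Feb/2019:00:00:38 +0000]" and, after the first
# quoted field (assumed to be the request), the status and bytes sent.
FIELDS = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\].*?"[^"]*" '
    r'(?P<status>\d{3}) \S+ (?P<bytes>\d+|-) '
)


def update_summary(summary, lines):
    """Add hits and bytes from an iterable of log lines to a per-day summary."""
    for line in lines:
        match = FIELDS.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        day = summary.setdefault(match.group("day"), {"hits": 0, "bytes": 0})
        day["hits"] += 1
        if match.group("bytes") != "-":
            day["bytes"] += int(match.group("bytes"))
    return summary


def load_summary(path):
    """Load a previously saved summary, or start empty if none exists."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_summary(path, summary):
    """Write the summary back to disk as JSON."""
    Path(path).write_text(json.dumps(summary, indent=2, sort_keys=True))
```

A daily job could call `load_summary`, pass the newest log file object to `update_summary`, and then call `save_summary`. Only the summary has to be kept, and memory use stays flat because lines are read one at a time.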
0

I'd consider running Karmasphere Analyst on EMR to run SQL queries against your CloudFront log directory. (KSA knows how to query from bucket -> folder -> gzip -> .log.)

http://aws.amazon.com/elasticmapreduce/karmasphere/

Gil
  • Are there any ready-to-use templates/presets for Karmasphere for web stats? I think it'd take me forever to think of and write all the standard things I'd expect to see from a webstats package. (Plus, as of right now, I have never worked with EMR.) – Mxx Feb 07 '12 at 04:40