20

I have a bunch of Apache log files that I would like to analyze. I'm looking for a tool that doesn't require much setup; something that I can run a log through the command line, without messing around on our live web servers.

Any recommendations?

mmattax

7 Answers

8

While the tools above are all cool, I think I know what the questioner was asking. It often pains me that I can't pull information out of an access log the way I can with other files.

It's because of the dumb access log format:

127.0.0.1 - - [16/Aug/2014:20:47:29 +0100] "GET /manual/elisp/index.html HTTP/1.1" 200 37230 "http://testlocalhost/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"

Why did they use [] for the date and "" for other things? Did they think we wouldn't know a date was in field 4? It's incredibly frustrating.

The best tool right now for this is gawk:

gawk 'BEGIN { FPAT="([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" } { print $5 }'

on the data above this would give you:

"GET /manual/elisp/index.html HTTP/1.1"

In other words, FPAT lets you pull out the fields of the Apache log as if they were actual fields instead of just space-separated entities. This is always what I want. I can then parse that a bit more with a pipeline.

How FPAT works is documented here: http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html

You can therefore set up an alias for a gawk that can parse Apache logs:

alias apacheawk="gawk -vFPAT='([^ ]+)|(\"[^\"]+\")|(\\\\[[^\\\\]]+\\\\])'"

apacheawk '$6 ~ /200/ { print $5 }' | sort | uniq

which produced this for me:

"GET / HTTP/1.1"
"GET /manual/elisp/index.html HTTP/1.1"
"GET /manual/elisp/Index.html HTTP/1.1"
"GET /scripts/app.js HTTP/1.1"
"GET /style.css HTTP/1.1"

and of course almost anything else is now possible.

Enjoy!

nic ferrier
  • 2 remarks: The date is not really in field 4 but in fields 4 + 5 (without the shift from GMT, the date has little value). And an access log usually has 12 fields (there can actually be more than 12, since the 12th is the HTTP agent, which can contain spaces in its name; the first 11 fields are easy to parse, and everything from field 12 on is the agent). So you can just use `awk '($9 == 200) {print $6,$7,$8}'` to display the same thing as in your example. No need for FPAT there (even though this method can be useful in other cases). – Olivier Dulac Feb 10 '15 at 13:50
  • I think you're over-criticizing. The date is in field 4 if you consider the field to be bounded by []. Most of the time a log file is in one time zone, so the zone isn't necessary. The point of the example was not to show that something was exclusively possible this way, but to show the general trick. – nic ferrier Feb 10 '15 at 18:21
  • I'm very surprised... I didn't "criticize" at all, just made 2 remarks (and said that the method you used can indeed be useful in other cases, but here it just isn't needed)... – Olivier Dulac Feb 12 '15 at 06:38
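The plain-awk alternative from the first comment can be sketched like this; with default whitespace splitting the quoted request spans fields 6–8 and the status code is field 9 (sample line inline, user agent shortened for brevity):

```shell
# Sample line; default awk field splitting is on whitespace.
log='127.0.0.1 - - [16/Aug/2014:20:47:29 +0100] "GET /manual/elisp/index.html HTTP/1.1" 200 37230 "http://testlocalhost/" "Mozilla/5.0"'

# $9 is the status; $6 $7 $8 reassemble the quoted request string.
printf '%s\n' "$log" | awk '($9 == 200) { print $6, $7, $8 }'
```

This prints `"GET /manual/elisp/index.html HTTP/1.1"`, matching the FPAT example, though it breaks down for fields (like the user agent) that contain spaces.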
7

wtop is cool. There are other utilities as well. Often, I'll parse logs using bash, sed, and awk.
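A minimal sketch of that bash/awk route, with made-up sample data inline in place of a real log path: count hits per client IP, which is field 1 under default field splitting:

```shell
# Made-up sample lines standing in for a real access log.
printf '%s\n' \
  '10.0.0.1 - - [16/Aug/2014:20:47:29 +0100] "GET / HTTP/1.1" 200 512 "-" "curl"' \
  '10.0.0.2 - - [16/Aug/2014:20:47:30 +0100] "GET /a HTTP/1.1" 404 0 "-" "curl"' \
  '10.0.0.1 - - [16/Aug/2014:20:47:31 +0100] "GET /b HTTP/1.1" 200 99 "-" "curl"' |
  awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
  sort -rn
```

On this data the busiest client, 10.0.0.1 with 2 hits, sorts to the top.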

Warner
  • wtop, and especially its log analyzer logrep, are great; once you adapt the .conf to your log format it provides a fast way to get what you need (top URLs, traffic, etc.) – aseques Aug 26 '13 at 11:05
6

apachetop is pretty cool; it prints live statistics. You run it with:

apachetop -f /var/log/apache2/www.mysite.com.access.log

To install it in Debian/Ubuntu:

apt-get install apachetop

or from source: https://github.com/JeremyJones/Apachetop

kjones
Oriettaxx
1

What sort of output do you want?

If you are just looking to count things, then grep something logfile.txt | wc -l works great. If you want pretty graphs... not so much.
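For the counting case, grep's -c flag collapses the pipeline into a single process. A quick sketch with made-up sample data (the file name is just a placeholder):

```shell
# Made-up sample data standing in for a log file.
printf '%s\n' \
  'GET /index.html 200' \
  'GET /missing 404' \
  'GET /also-missing 404' > hits.txt

grep ' 404' hits.txt | wc -l   # the pipeline from the answer
grep -c ' 404' hits.txt        # same count, one process
```

Both report a count of 2 here.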

Chris Nava
1

Instead of using a command line tool, I would suggest trying Apache Logs Viewer. It's a free tool that can monitor and analyze the Apache log file. It can generate some pretty cool graphs and reports on the fly.

More info at http://www.apacheviewer.com

0

If you have a Windows workstation that you can use, then logparser is the tool of choice!

tony roth
0

analog works well out of the box and doesn't require a lot of setup. logwrangler is a package that works with analog to generate nicer output and also requires little setup.

BillThor