Get some meaning from apache logs with awk and grep

Question

I am trying to get some meaning out of my Apache log files: I want to parse my access log and get statistics on the requests that returned a 200 status code (how many times each path was hit).

So I tried to learn some awk magic, and here is what I have so far:

grep "HTTP/1.1\" 200" access.log | awk '{print $7 } ' | sort | uniq -c | sort -n

This does most of what I want: it selects all entries in access.log that ended with a 200 status, extracts the field that holds the requested path, sorts the paths, counts each unique path, and then sorts by the number of times each path was hit.

The result looks like this:

  1 /public/img/upload/image_3.jpg
  2 /public/img/upload/image_2.jpg
  8 /public/img/upload/image_1.jpg
 18 /public/js/main.js
 33 /
236 /index.html

I am trying to push it a little bit further:

1. Because I use logrotate, I have many other files like access.log.1, ..., access.log.N, and I want these statistics for all of them together. The only solution I have found is grep "my 200 expression" -R /pathToDirWithLogs, which greps every file in the directory. That is clearly not nice, because it will grep more than just the access logs. Listing the files by hand is also not an option, because I do not know the number N.

2. I do not really care about each individual file in /public/img/upload/, I only care how many hits they received in total. Here I am totally lost and have no idea how to start. This is an example of the output I want:

 11 /public/img/upload/*
 18 /public/js/main.js
 33 /
236 /index.html

Here /public/img/upload/* represents all hits generated under that directory: 8 from image_1, 2 from image_2 and 1 from image_3.

Are there any awk, grep magicians to show me the way?

@MirceaVutcovici it does not look like a duplicate to me, but it has a lot of cool things there. — Salvador Dali, Mar 25 '14 at 22:39
@DennisWilliamson can you please show me how to do it? And most importantly, what benefit will it give? Will it be faster (assuming that the log file is big enough to make the speed difference visible)? — Salvador Dali, Mar 26 '14 at 02:05

score 2 · Accepted Answer · answered Mar 25 '14 at 22:34


The grep command (like many utilities) will accept multiple files supplied by shell globbing so

grep -h "HTTP/1.1\" 200" /path/to/log/dir/access.log*

(or similar) should do what you want in that respect.
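
As a rough sketch of how that glob slots into the pipeline from the question: the log lines, the temporary directory and the error.log file below are made up so that the example runs on its own, and the field position ($7) assumes the common/combined Apache log format.

```shell
# Hypothetical sample data so the sketch can run on its own;
# in practice log_dir would be your real log directory.
log_dir=$(mktemp -d)
printf '%s\n' \
  '127.0.0.1 - - [25/Mar/2014:22:00:00 +0000] "GET /index.html HTTP/1.1" 200 512' \
  '127.0.0.1 - - [25/Mar/2014:22:00:01 +0000] "GET /missing HTTP/1.1" 404 128' \
  > "$log_dir/access.log"
printf '%s\n' \
  '127.0.0.1 - - [24/Mar/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512' \
  '127.0.0.1 - - [24/Mar/2014:10:00:01 +0000] "GET / HTTP/1.1" 200 256' \
  > "$log_dir/access.log.1"
# An unrelated file that "grep -R" would have searched but the glob skips.
printf '%s\n' '[error] "GET /oops HTTP/1.1" 200' > "$log_dir/error.log"

# The shell expands access.log* to access.log, access.log.1, ...
# -h drops the "filename:" prefix, so awk still sees the path in field 7.
hits=$(grep -h "HTTP/1.1\" 200" "$log_dir"/access.log* \
  | awk '{print $7}' | sort | uniq -c | sort -n)
printf '%s\n' "$hits"

rm -rf "$log_dir"
```

With this sample data the counts are 1 for / and 2 for /index.html, and the line in error.log is never read. If logrotate is configured to compress old logs (access.log.2.gz and so on), plain grep will not read them usefully; zgrep is one option in that case.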

The second part of your question is unclear: how are we to know that you're not interested in individual files in /public/img/upload, but you are interested in the individual files in /public/js and /, but apparently not /index.html?

Clarifying that requirement will probably lead to a solution but you should spend some time deciding what you want and then have a go yourself.
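
If the rule turns out to be "collapse everything under one known directory and leave every other path alone", then a minimal awk sketch could look like the following. That rule is an assumption on my part, not something the question states. The prefix variable, the line helper and the sample entries are hypothetical, and the field numbers ($7 for the path, $9 for the status) assume the common/combined log format. Note that testing $9 also matches 200 responses that are not HTTP/1.1, unlike the grep pattern above.

```shell
log=$(mktemp)
# line PATH STATUS: emit one made-up log entry (hypothetical helper).
line() {
  printf '127.0.0.1 - - [25/Mar/2014:22:00:00 +0000] "GET %s HTTP/1.1" %s 512\n' "$1" "$2"
}
{
  line /public/img/upload/image_1.jpg 200
  line /public/img/upload/image_1.jpg 200
  line /public/img/upload/image_2.jpg 200
  line /public/img/upload/image_3.jpg 404
  line /public/js/main.js 200
  line /public/js/main.js 200
  line / 200
} > "$log"

grouped=$(awk -v prefix="/public/img/upload/" '
  $9 == 200 {                        # keep only successful requests
    path = $7
    # index() returns 1 when the path starts with the prefix
    if (index(path, prefix) == 1) path = prefix "*"
    count[path]++                    # tally per (possibly collapsed) path
  }
  END { for (p in count) print count[p], p }
' "$log" | sort -n)
printf '%s\n' "$grouped"

rm -f "$log"
```

For the sample entries this prints:

```
1 /
2 /public/js/main.js
3 /public/img/upload/*
```

The 404 for image_3.jpg is not counted. To run it over all rotated logs, pass the glob from above (access.log*) to awk instead of "$log".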
