
I am trying to get some meaning out of my Apache log files: I want to parse my access log and get statistics about requests with a 200 status code (how many times each path was hit).

So I tried to learn some awk magic, and here is what I have so far:

grep "HTTP/1.1\" 200" access.log | awk '{print $7 } ' | sort | uniq -c | sort -n

This does most of what I want: it selects all entries in access.log that ended with a 200 status, extracts the field that corresponds to the requested path, sorts the paths, counts each unique path, and sorts the result by the number of times each path was hit.
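For comparison, the grep | awk | sort | uniq steps described above can be folded into one awk program that filters on the status field and tallies paths in an associative array. This is a sketch assuming Apache's Common or Combined Log Format (request path in field 7, status code in field 9); `count_200_paths` is a hypothetical helper name. Unlike the grep pattern, it also counts HTTP/1.0 requests, and it prints counts without the padding that `uniq -c` adds.

```shell
# Hypothetical helper: count 200 responses per request path in one awk pass.
# Assumes Common/Combined Log Format: $7 = request path, $9 = status code.
# With no file arguments, awk reads standard input.
count_200_paths() {
  awk '$9 == 200 { hits[$7]++ }
       END { for (p in hits) print hits[p], p }' "$@" \
    | sort -n   # least-hit paths first, as in the original pipeline
}

# Usage (file name as in the question):
# count_200_paths access.log
```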

The result looks like this:

  1 /public/img/upload/image_3.jpg
  2 /public/img/upload/image_2.jpg
  8 /public/img/upload/image_1.jpg
 18 /public/js/main.js
 33 /
236 /index.html

I am trying to push it a little bit further:

  • Because I use logrotate, I also have files like access.log.1, ..., access.log.N, and I want these statistics for all of them together. The only solution I have found is grep "my 200 expression" -R /pathToDirWithLogs, which searches every file in the directory; that is clearly not nice, because it also searches files that are not logs. Listing the files explicitly is not an option either, because I do not know N in advance.
  • I do not really care about each individual file in /public/img/upload/; I only care how many times files in that directory were hit in total. Here I am totally lost and have no idea how to start. For my simple example, this is the output I want:

     11 /public/img/upload/*
     18 /public/js/main.js
     33 /
    236 /index.html

Here /public/img/upload/* represents all hits under that directory: 8 from image_1, 2 from image_2 and 1 from image_3.

Are there any awk, grep magicians to show me the way?

Salvador Dali

1 Answer


The grep command (like many utilities) accepts multiple files, which the shell can supply through globbing, so

grep -h "HTTP/1.1\" 200" /path/to/log/dir/access.log*

(or similar) should do what you want in that respect.
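One caveat: if logrotate is configured with its compress option, older files (typically named like access.log.2.gz) are gzipped, and plain grep will not match inside them. A sketch that reads plain and gzipped rotated logs alike, assuming GNU gzip, whose `-c -d -f` combination copies non-gzip input through unchanged (the same behaviour as `zcat -f`). `count_200_paths_all` is a hypothetical name and its argument is a placeholder for your log directory.

```shell
# Hypothetical helper: run the question's pipeline over every rotated log.
# $1 = log directory (placeholder). The glob access.log* matches access.log
# itself plus access.log.1, access.log.2.gz, and so on.
# gzip -cdf (GNU gzip) decompresses .gz files to stdout and passes plain
# files through unchanged, so both kinds can be mixed.
count_200_paths_all() {
  gzip -cdf "$1"/access.log* \
    | grep 'HTTP/1.1" 200' \
    | awk '{ print $7 }' \
    | sort | uniq -c | sort -n
}

# Usage:
# count_200_paths_all /path/to/log/dir
```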

The second part of your question is unclear: how are we to know that you're not interested in the individual files in /public/img/upload, but you are interested in the individual files in /public/js and / (but apparently not /index.html)?

Clarifying that requirement will probably lead to a solution, but you should spend some time deciding what you want and then have a go yourself.

You may also be interested in awstats.

user9517
  • Thanks for helping me a second time. Regarding the second problem, I was thinking there might be a way to sum up the counts for all paths that start with `/public/img/upload`, no matter what follows that prefix. Since `/public/js` does not start with that prefix, it should just be output normally. – Salvador Dali Mar 25 '14 at 22:44
  • @SalvadorDali: You will need to produce a list of paths and then aggregate them. You may find dirname helpful. – user9517 Mar 26 '14 at 07:35
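Following the clarification in the comments (sum every path that starts with `/public/img/upload`, leave the rest alone), one possible sketch adds a single sed substitution to the original pipeline that rewrites every path under that prefix to the bucket `/public/img/upload/*` before counting. This swaps in sed for the suggested dirname: dirname would collapse every path to its directory, and calling it per line is slow, whereas the substitution only touches the one prefix. `count_200_paths_grouped` is a hypothetical name; the prefix is the one from the question.

```shell
# Hypothetical helper: like the question's pipeline, but all paths under
# /public/img/upload/ are counted together as "/public/img/upload/*".
# grep -h suppresses file-name prefixes when several logs are given.
count_200_paths_grouped() {
  grep -h 'HTTP/1.1" 200' "$@" \
    | awk '{ print $7 }' \
    | sed 's|^/public/img/upload/.*|/public/img/upload/*|' \
    | sort | uniq -c | sort -n
}

# Usage:
# count_200_paths_grouped /path/to/log/dir/access.log*
```

With the sample data from the question, the three image paths (8 + 2 + 1 hits) would be reported as a single line with a count of 11.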