I am trying to extract some meaning from my Apache log files: I want to parse my access log and get some statistics about 200 status code
hits (how many times each path was hit).
So I tried to learn some awk magic, and here is what I have right now:
grep "HTTP/1.1\" 200" access.log | awk '{print $7 } ' | sort | uniq -c | sort -n
This does most of what I want: it selects all entries in access.log that ended with a 200 status, extracts the field that holds the requested path, sorts the paths, counts each unique one, and sorts by the number of times each path was hit.
So the result looks like this:
1 /public/img/upload/image_3.jpg
2 /public/img/upload/image_2.jpg
8 /public/img/upload/image_1.jpg
18 /public/js/main.js
33 /
236 /index.html
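As a side note, I suspect the grep step could be folded into awk itself, assuming the standard common/combined log format where the status code is the ninth whitespace-separated field and the path is the seventh, but I am not sure this is the idiomatic way:

# status is $9 and the requested path is $7 in the common/combined log format
awk '$9 == 200 { print $7 }' access.log | sort | uniq -c | sort -n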
I am trying to push it a little bit further:
- Because I have logrotate, I have many other files like access.log.1, ..., access.log.N, and I want these statistics for all of them together. The only solution I have found is to use grep "my 200 expression" -R /pathToDirWithLogs, which greps over every file in the directory. That is clearly not nice, because it greps more than just the logs, and listing the files explicitly is not an option either, because I do not know the number N.
- I do not really care about each individual file in /public/img/upload/; I just care how many of them were hit in total. Here I am totally lost and have no idea how to start. Below is an example of what I want for my simple output (a rough sketch of what I have in mind follows the example):
11 /public/img/upload/*
18 /public/js/main.js
33 /
236 /index.html
Here /public/img/upload/* represents all hits that were generated there: 8 from image_1.jpg, 2 from image_2.jpg, and 1 from image_3.jpg.
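To make it concrete, here is a rough sketch of the direction I imagine, assuming the status code is field $9, that all rotated files match the access.log* glob in /pathToDirWithLogs, and that zcat -f can pass any gzipped rotations through unchanged; I do not know whether this is the right awk approach:

# count 200 hits per path across the current and all rotated logs,
# collapsing everything under /public/img/upload/ into a single bucket
zcat -f /pathToDirWithLogs/access.log* | awk '
    $9 == 200 {
        path = $7
        if (path ~ /^\/public\/img\/upload\//) path = "/public/img/upload/*"
        count[path]++
    }
    END { for (p in count) print count[p], p }
' | sort -n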
Are there any awk or grep magicians who can show me the way?