8

I have an nginx log file, and I want to find the market share of each major browser version. I am not interested in minor versions or operating systems. I would like to get something like this:

100 IE6
 99 IE7
 20 IE8
200 FF2
300 FF3

I know how to get the list of user agents from the file, but I want to aggregate the list to see only the major versions of the browsers. Is there a tool that does it?

Željko Filipin

7 Answers

23
awk -F'"' '/GET/ {print $6}' /var/log/nginx-access.log | cut -d' ' -f1 | sort | uniq -c | sort -rn
  • awk(1) - selects the full User-Agent string of GET requests
  • cut(1) - keeps only the first word of it
  • sort(1) - sorts, so that identical entries end up next to each other
  • uniq(1) - counts identical entries
  • sort(1) - sorts by count, in reverse order

P.S. Of course, this can be replaced by a single awk/sed/perl/python/etc. script. I just wanted to show how rich the Unix way is.
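For illustration, the pipeline above could be folded into a single awk script along these lines. This is only a sketch under a few assumptions: the log is in the combined format, the function name browser_major_versions and the IE/FF labels are made up for this example, and only the MSIE and Firefox tokens are recognised (everything else is counted as "other"). It looks inside the User-Agent string rather than at its first word, because the first word is often a generic token such as Mozilla/5.0.

```shell
# Hypothetical helper: reads combined-format log lines on stdin and prints
# "count label" pairs, highest count first (for example "2 IE6").
browser_major_versions() {
  awk -F'"' '
    /GET/ {
      ua = $6                                   # User-Agent is the 6th quote-delimited field
      if (match(ua, /MSIE [0-9]+/))             # e.g. "MSIE 6"
        label = "IE" substr(ua, RSTART + 5, RLENGTH - 5)
      else if (match(ua, /Firefox\/[0-9]+/))    # e.g. "Firefox/3"
        label = "FF" substr(ua, RSTART + 8, RLENGTH - 8)
      else
        label = "other"
      count[label]++
    }
    END { for (l in count) print count[l], l }
  ' | sort -k1,1nr -k2,2
}
```

It would be called as browser_major_versions < /var/log/nginx-access.log. More browsers could be added as further match() branches.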

SaveTheRbtz
7

While the one-liner by SaveTheRbtz does the job, it took several hours to parse my nginx access log.

Here is a faster version based on his, which takes less than 1 minute per 100MB of log file (corresponding to about 1 million lines):

sed -n 's!.* "GET.* "\([[:alnum:].]\+/*[[:digit:].]*\)[^"]*"$!\1!p' /var/log/nginx/access.log | sort | uniq -c | sort -rfg

It works with the default access log format of nginx, which is the same as the combined format of Apache's httpd and has the User-Agent as the last field, delimited by ".
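To make the field layout concrete, here is a small sketch that splits one line on the double quote. The sample line is made up for illustration (it is not taken from a real log), and show_quoted_fields is a hypothetical helper name.

```shell
# A made-up access log line in the combined format (all values are illustrative).
sample_line='203.0.113.7 - - [02/Dec/2009:11:07:00 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5"'

# Splitting on the double quote puts the request in field 2, the referer in
# field 4 and the User-Agent in field 6.
show_quoted_fields() {
  awk -F'"' '{ printf "request: %s\nreferer: %s\nagent:   %s\n", $2, $4, $6 }'
}

printf '%s\n' "$sample_line" | show_quoted_fields
```

If the log_format has been customised in nginx.conf, the field numbers may differ.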

3

To list the User-Agent strings that appear more than once, with their counts:

sudo awk -F"\"" '{print $6}' /var/log/nginx/access.log | sort | uniq -dc
Arun
3

This is a slight variation of the accepted answer, using fgrep and cut.

cat your_file.log | fgrep '"GET ' | cut -d'"' -f6 | cut -d' ' -f1 | sort | uniq -c | sort -rn

There is something appealing about using "weaker" commands when it is possible.

Aakash Jain
solidsnack
2

AWStats should do the trick, but it will supply far more information than you asked for. I hope this helps...

Kevin Worthington
2

Webalizer can do it.

Example:

webalizer -o reports_folder -M 5 log_file
  • -o reports_folder specifies folder where report is generated
  • -M 5 displays only the browser name and the major version number
  • log_file specifies log file name
  • source: ftp://ftp.mrunix.net/pub/webalizer/README
Željko Filipin
0

I'd use a shell script for that: cat piped through awk, sort and uniq will do the job.

alexus
  • Thanks. I know how to parse logs. I was looking for a tool that already knows how to do it, so I do not have to write a script. – Željko Filipin Dec 02 '09 at 11:07