
I'm looking for a command-line tool that works on a stream of lines (typically from tail -f), counts them, and displays a top-like view. For example, tail -f /var/log/apache2/access.log | cut -d' ' -f1 | SOME_COMMAND would display:

52 xxx.xxx.xxx.xxx
12 xxx.xxx.xxx.xxx
6 xxx.xxx.xxx.xxx
2 xxx.xxx.xxx.xxx

It would be very handy combined with, for example, this sh script:

#!/bin/sh
# NCSA structure :
#IP - - [DATE] "METHOD URL HTTP/VERSION" STATUS LENGTH "REFERER" "USER AGENT"
QUERY=""
while [ "$1" ] ; do
  case "$1" in
      ip) QUERY="$QUERY"'\1' ;;
      date) QUERY="$QUERY"'\4' ;;
      method) QUERY="$QUERY"'\5' ;;
      url) QUERY="$QUERY"'\6' ;;
      version) QUERY="$QUERY"'\7' ;;
      status) QUERY="$QUERY"'\8' ;;
      length) QUERY="$QUERY"'\9' ;;
      referer) QUERY="$QUERY"'\10' ;; # Does not work...
      useragent) QUERY="$QUERY"'\11' ;; # Does not work
      *) QUERY="$QUERY""$1" ;;
  esac
  shift
done
sed -r 's/^([^ ]+) ([^ ]+) ([^ ]+) \[([^]]+)] "([^ ]+) ([^"]+) HTTP\/([^"]+)" ([^ ]+) ([^ ]+) "([^"]+)" "([^"]+)"$/'"$QUERY"'/g'

With the command I'm looking for and this script, you could run cat somelog | ncsa.sh url | SOME_COMMAND and get a top of your most-viewed URLs, referers, or whatever field you want.
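A minimal sketch of such a SOME_COMMAND, written as a POSIX sh function wrapping awk. The name some_top and its two arguments (print a snapshot every N lines, show the top K) are hypothetical and do not refer to an existing tool. Instead of redrawing the screen like top, it prints each snapshot followed by a -- separator, plus one final snapshot when the input ends.

```sh
# some_top [N] [K]: hypothetical stand-in for SOME_COMMAND.
# Counts identical input lines and, every N lines (default 100) and once more
# at end of input, prints the K most frequent lines (default 10) as
# "count line", highest first, followed by a "--" separator.
some_top() {
    awk -v every="${1:-100}" -v top="${2:-10}" '
    function show(   line, cmd) {
        cmd = "sort -nr | head -n " top
        for (line in count)
            printf "%d %s\n", count[line], line | cmd
        close(cmd)   # wait for sort/head so this snapshot is complete
        print "--"   # snapshot separator
        fflush()     # push our own output before the next snapshot
    }
    { count[$0]++ }
    NR % every == 0 { show() }
    END { show() }
    '
}
```

For example: tail -f /var/log/apache2/access.log | cut -d' ' -f1 | some_top 50 10. The counts are cumulative since start; sorting is delegated to sort -nr | head because POSIX awk has no built-in sort.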

(And if someone can fix the bug where \10 is interpreted as \1 followed by a 0... :p)

Have a great day!

Mandark
  • You mean something like [apachetop](http://freshmeat.net/projects/apache-top/)? – Zoredache Jun 23 '10 at 22:20
  • @Zoredache : I think the Apache logfile is just an example. It sounds like @Mandark wants to do this for logfiles in general. But maybe I'm misreading his request. – Stefan Lasiewski Jun 24 '10 at 00:22
  • @Stefan Lasiewski, you are probably right, but I thought I would post that just in case he was looking for a tool that addressed the specific case described in his example. – Zoredache Jun 24 '10 at 04:31
  • @Zoredache: Yes, Apache is just an example. I'm looking for a tool that works in all situations, a solid one that does its job well. In my current situation I want to show a top of my "MISS", "PASS", "HIT" statuses on my cache server, from a tail -f grepped for specific parts of my websites: ... | some_top. PS: Apachetop sucks big time, as it's not maintained, and it sometimes reports funny values... – Mandark Jun 24 '10 at 07:57
  • I think I'll start writing it in Python tomorrow... :p – Mandark Jun 24 '10 at 16:43
  • @Mandark : When you're done, post it here! Post it here! I would love to see what you've done. – Stefan Lasiewski Jun 24 '10 at 20:48
  • @Stefan: Yeah, don't worry, I'll post it everywhere :-) but I'll write a first version in Python and rewrite it in C when I have more time. – Mandark Jun 25 '10 at 10:17
  • @Mandark : Just my 2-cents, but Python will be easier for most of the rest of us to understand, maintain and modify ;) – Stefan Lasiewski Jun 25 '10 at 16:51
  • Got the first working version, in C, using AVL trees. I'll test it all day and release it soon. – Mandark Jun 29 '10 at 07:38
  • First working version in C committed to GitHub: http://github.com/JulienPalard/logtop – Mandark Jul 01 '10 at 17:10
  • `sed` only supports back references `\1` through `\9`. I believe Perl supports `$1` through `$99`. – Dennis Williamson Jul 07 '10 at 04:52
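Since sed stops at \9, one workaround is to stop capturing the two fields the script never uses: the ident and user fields right after the IP. That leaves nine groups, so referer and user agent become \8 and \9. A sketch as a sh function (the renumbering below is an assumption based on the field list in the script's own comment, checked only against a sample NCSA combined-format line):

```sh
# ncsa FIELD...: variant of the ncsa.sh script above.
# The ident and user fields (2nd and 3rd) are matched but no longer
# captured, so only nine groups remain and every field fits in \1..\9.
ncsa() {
    query=""
    while [ "$1" ]; do
        case "$1" in
            ip)        query="$query"'\1' ;;
            date)      query="$query"'\2' ;;
            method)    query="$query"'\3' ;;
            url)       query="$query"'\4' ;;
            version)   query="$query"'\5' ;;
            status)    query="$query"'\6' ;;
            length)    query="$query"'\7' ;;
            referer)   query="$query"'\8' ;;
            useragent) query="$query"'\9' ;;
            *)         query="$query$1" ;;   # literal text, e.g. a space
        esac
        shift
    done
    # -E selects extended regexps (same as GNU sed -r)
    sed -E 's/^([^ ]+) [^ ]+ [^ ]+ \[([^]]+)] "([^ ]+) ([^"]+) HTTP\/([^"]+)" ([^ ]+) ([^ ]+) "([^"]+)" "([^"]+)"$/'"$query"'/'
}
```

Usage stays the same, e.g. cat somelog | ncsa referer, or ncsa ip ' ' status for two fields separated by a space.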

2 Answers


The first version of a program solving this problem is committed here:

http://github.com/JulienPalard/logtop

Mandark

Are you looking for uniq -c and tr?

cat /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | tr -s "\n " " "

From the uniq man page:

Filter  adjacent  matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).

       -c, --count
              prefix lines by the number of occurrences
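The word "adjacent" in that man page matters: uniq -c only merges consecutive identical lines, so the input has to be sorted first if you want one total per value. A small demonstration with made-up input:

```sh
# "a" appears twice, but the two occurrences are not adjacent.
input='a
b
a'
# uniq -c alone counts runs of identical adjacent lines: 1 a, 1 b, 1 a.
unsorted=$(printf '%s\n' "$input" | uniq -c)
# Sorting first puts identical lines next to each other: 2 a, 1 b.
sorted=$(printf '%s\n' "$input" | sort | uniq -c)
printf 'without sort:\n%s\nwith sort:\n%s\n' "$unsorted" "$sorted"
```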

From the tr man page:

Translate, squeeze, and/or delete characters from standard input, writing to standard output.

   -s, --squeeze-repeats
          replace  each  input  sequence  of  a repeated character that is listed in SET1 with a single occurrence of that character
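Applied to uniq -c output, tr -s "\n " " " turns every newline into a space and then squeezes each run of spaces into one, so all counts and values end up on a single line. This relies on GNU tr extending the shorter second set by repeating its last character. A quick check on simulated uniq -c output (the IPs are the obfuscated ones from the example below):

```sh
# Simulated uniq -c output: counts are right-aligned with leading spaces.
counts=$(printf '     87 71.255.255.11\n     54 95.255.222.255\n')
# Newlines become spaces, then each run of spaces is squeezed to one.
joined=$(printf '%s\n' "$counts" | tr -s "\n " " ")
printf '%s\n' "$joined"
```

The result keeps a single leading and trailing space, which matches the example output further down.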

To sort it by count in descending order:

cat /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr | tr -s "\n " " "

Example output (I obfuscated the IPs):

 87 71.255.255.11 54 95.255.222.255 50 84.255.255.120 50 178.255.255.14 49 92.255.255.240 49 91.255.36.215 49 255.52.126.184 49 217.255.110.23 49 216.255.45.4 49 255.8.27.5

Note: my examples use cat because I don't think tail -f would work here: it never reaches end of file, so sort never gets to print anything. Instead, you could use something like tail -n 100 and rerun the pipeline periodically.
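The periodic approach can be wrapped in a small helper. A sketch, where recent_top is a hypothetical name: it runs the pipeline above (with sort before uniq -c) on only the last N lines of a file and keeps the K highest counts with head.

```sh
# recent_top FILE [N] [K]  (hypothetical helper)
# Counts the first space-separated field of the last N lines of FILE
# (default 100) and prints the K most frequent values (default 10).
recent_top() {
    tail -n "${2:-100}" "$1" \
        | cut -d' ' -f1 \
        | sort | uniq -c \
        | sort -nr \
        | head -n "${3:-10}"
}

# Refresh every 5 seconds for a rough top-like view (Ctrl-C to stop):
#   while :; do clear; recent_top /var/log/apache2/access.log 1000 10; sleep 5; done
```

watch -n 5 can replace the loop, but either way each refresh rereads the last N lines, which is the cost the comments below point out.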

Weboide
  • I agree. The performance with tail is going to be bad. For the top-like behavior described above, you'd want to wrap 'watch' around that whole shebang, making it that much worse. A dedicated application (maybe in Python?) would be a decent approach. – Slartibartfast Jun 24 '10 at 02:42
  • I'm not looking for uniq -c; I use {cat/cut/grep/sed} | sort | uniq -c | sort -gr daily, and that line is really valuable. But I want it to work with data that isn't coming from a file, so I really need ... | cut -d' ' -f1 | some_top_cmd. – Mandark Jun 24 '10 at 08:02
  • Yes, I could write it in minutes in Python, or in C, but I can't imagine that it's not already packaged! If I need this, someone has needed it and written it before me. – Mandark Jun 24 '10 at 08:03
  • Mandark, please check out my new solution. It gives exactly what you are looking for. – Weboide Jun 24 '10 at 11:23
  • Just read your 6-minute-old edit, thanks for it. I agree, your line works: I'm actually watching my stats with a very similar line (watching cache hits and misses for a particular subset of pages from my cache log), grepping with: watch './analyse.sh | grep --line-buffered -v "PASS$" | head -n 100 | grep -o "[^ ]*$" | sort | uniq -c | sort -gr' and the result is what I expect. But it only updates every 100 lines; a dedicated program would update the statistics on every new line and could show multiple columns such as "last 10", "per minute" and "per hour" counts, etc... – Mandark Jun 24 '10 at 11:35
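The "last 10" column can be approximated with a sliding window over the most recent lines, updated on every new line. A sketch, assuming a hypothetical window_top function: awk keeps a ring buffer of the last WINDOW lines, increments the count of each incoming line, decrements the count of the line falling out of the window, and prints a fresh snapshot after every line. Time-based columns (per minute, per hour) would need timestamps and are not covered here.

```sh
# window_top [WINDOW] [K]  (hypothetical helper)
# After every input line, prints the K most frequent lines (default 10)
# among only the last WINDOW lines (default 10), followed by "--".
window_top() {
    awk -v window="${1:-10}" -v top="${2:-10}" '
    {
        slot = NR % window
        if (NR > window) {
            # The line stored in this slot is now leaving the window.
            old = ring[slot]
            if (--count[old] == 0)
                delete count[old]
        }
        ring[slot] = $0
        count[$0]++
        # Print a fresh snapshot after every input line.
        cmd = "sort -nr | head -n " top
        for (line in count)
            printf "%d %s\n", count[line], line | cmd
        close(cmd)   # wait for sort/head so the snapshot is complete
        print "--"
        fflush()
    }'
}
```

Spawning sort for every line is slow on a busy log; the sketch only shows the windowed counting, which a dedicated program such as logtop can do far more cheaply.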