Bash: number of bytes used in a log file, grouped by token


Assume a large log file of several GBs and several million lines where each line contains a token identifying the user account that generated the line.

All tokens have the same length and can be found at the same position within each log line.

The goal is to figure out the number of bytes logged by each account.

One way of doing this is in multiple steps, like this:

awk -F "|" '{ print $5 }' trace.log | sort | uniq | xargs -l sh -c 'echo -n $0 && grep "$0" trace.log | wc -c'

where awk extracts the token (the 5th field, splitting on '|'), sort | uniq builds the list of unique tokens appearing in the file, and finally xargs greps for each token and counts the matching bytes.

Now this works, but it is terribly inefficient because the same (huge) file gets grepped once per unique token.

Is there a smarter way of achieving the same via shell commands? (where by smarter I mean faster and without consuming tons of RAM or temporary storage, like sorting the whole file in RAM or sorting it to a tmp file).

Sergio

Posted 2016-07-11T18:15:37.907

Reputation: 265

Why not write a more sophisticated script (bash/perl/python/ruby/etc)? One that does a single pass through the file, reading line by line, and maintaining a map of accumulated byte counts for each user. – jehad – 2016-07-11T18:21:36.843

@jehad: because 1) that would remove all the fun :) 2) I'm lazy and 3) I explicitly asked for a way of doing that via shell commands – Sergio – 2016-07-11T18:25:24.710

You're right, and I've +1'd the answer from john1024. My awk skills are not great, but he has shown me the light! – jehad – 2016-07-11T18:27:18.640
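For reference, the single-pass script jehad suggests can be sketched in plain bash. This is a sketch only, assuming bash 4+ (for associative arrays) and the '|'-delimited format from the question; the three-line sample trace.log from the answer below is recreated inline:

```shell
#!/usr/bin/env bash
# Recreate the sample file used in the answer below (hypothetical data).
printf '1|2|3|4|jerry|6\na|b|c|d|phil|f\n1|2|3|4|jerry|6\n' > trace.log

declare -A bytes                      # token -> accumulated byte count
while IFS= read -r line; do
    # The token is the 5th '|'-delimited field.
    IFS='|' read -r _ _ _ _ token _ <<< "$line"
    bytes[$token]=$(( ${bytes[$token]:-0} + ${#line} + 1 ))  # +1 for the \n
done < trace.log

for token in "${!bytes[@]}"; do
    printf '%s %d\n' "$token" "${bytes[$token]}"
done
```

Note that a bash read loop is far slower than awk on a multi-GB file; this only illustrates the single-pass idea.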

Answers


Try:

awk -F "|" '{ a[$5]+=1+length($0) } END{for (name in a) print name,a[name]}' trace.log

Example

Let's consider this test file:

$ cat trace.log
1|2|3|4|jerry|6
a|b|c|d|phil|f
1|2|3|4|jerry|6

The original command produces this output:

$ awk -F "|" '{ print $5 }' trace.log | sort | uniq | xargs -l sh -c 'echo -n $0 && grep "$0" trace.log | wc -c'
jerry32
phil15

The suggested command, which loops through the file just once, produces this output:

$ awk -F "|" '{ a[$5]+=1+length($0) } END{for (name in a) print name,a[name]}' trace.log
jerry 32
phil 15

How it works

  • -F "|"

    This sets the field separator for input.

  • a[$5]+=1+length($0)

    For each line, we add the length of the line to the count stored in associative array a under this line's user name.

    The quantity length($0) does not include the newline that ends the line. Consequently, we add one to this to account for the \n.

  • END{for (name in a) print name,a[name]}

    After we have read through the file once, we print out the sums.
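The newline accounting in the second bullet can be verified directly. This small demo (not from the original answer) compares awk's length($0) with wc -c on a single 5-character line:

```shell
# length($0) excludes the trailing newline, so a 5-character line measures
# 5 in awk but 6 bytes to wc -c; the "+1" in the answer bridges the gap.
awk_len=$(printf 'abcde\n' | awk '{ print length($0) }')
wc_len=$(printf 'abcde\n' | wc -c)
echo "$awk_len $wc_len"
```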

John1024

Posted 2016-07-11T18:15:37.907

Reputation: 13 893

awk for the win. Well done and in about 20 seconds, impressive – Sergio – 2016-07-11T18:32:59.697