How to Extract Multiple Substrings from a Log File

I am trying to extract the timestamp and the number string from the requested URL in an Apache log file whose lines look like this:

123.456.78.90 - - [16/Dec/2014:06:27:30 +0100] "GET /servlet/something.something=%2B2341231231234&subappid=hello&pass=hello&from=somebody&dlrreq=true&intflag=TRUE HTTP/1.1" 200 31 "-" "python-requests/2.5.0 CPython/2.7.3 Linux/2.6.32-431.el6.x86_64"

So far I'm able to use awk to extract the timestamp and the entire URL.

awk '{print $4,$5} {print $7}' /var/log/httpd/access_log
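Each separate {print ...} action in awk ends its output with a newline, which is why the timestamp and the URL come out on different lines. Below is a minimal sketch that prints both from a single print statement. It assumes awk's default whitespace splitting on the sample line, where $4 and $5 hold the bracketed timestamp, $6 is "GET and $7 is the URL. The function name ts_and_url is illustrative only.

```shell
# ts_and_url: hypothetical helper name, for illustration only.
# With default whitespace splitting on the sample line:
#   $4 = [16/Dec/2014:06:27:30   $5 = +0100]   $6 = "GET   $7 = the URL
# A single print with commas joins the fields with a space on one line.
ts_and_url() {
  awk '{ print $4, $5, $7 }' "$@"
}
```

Usage: ts_and_url /var/log/httpd/access_log (with no argument it reads standard input).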

How can I extract just the number string 2341231231234 so that only the timestamp and this string appear on the same line?

SinaOwolabi

Posted 2014-12-16T08:10:13.923

Reputation: 133

Answers

Assuming all your lines use the same URL format, you can get the timestamp and the number string with a sed command like this one:

$ sed -r 's|.*\[(.*)\].*=%2B([0-9]+)&sub.*|\1 \2|' /var/log/httpd/access_log
16/Dec/2014:06:27:30 +0100 2341231231234

That expression captures whatever exists between [ and ] (which should be the timestamp) and the digits between =%2B and &sub (which should be the number string). %2B is the URL-encoded + sign, so matching it literally keeps it out of the captured number.
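A minimal sketch of the same sed approach wrapped in a reusable shell function (the name ts_and_number_sed is hypothetical). It assumes every line of interest contains [timestamp] and =%2B<digits>&sub. It matches %2B literally so only the digits are captured, and it uses -n together with the p flag so that lines which don't match, such as requests for other URLs, are skipped instead of being printed unchanged.

```shell
# ts_and_number_sed: hypothetical helper name, for illustration only.
# Assumes each wanted line looks like ...[timestamp]...=%2B<digits>&sub...
#   -E     extended regular expressions (GNU sed also spells this -r)
#   -n, p  print only the lines where the substitution succeeded
ts_and_number_sed() {
  sed -nE 's|.*\[(.*)\].*=%2B([0-9]+)&sub.*|\1 \2|p' "$@"
}
```

Usage: ts_and_number_sed /var/log/httpd/access_log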

jherran

Posted 2014-12-16T08:10:13.923

Reputation: 1 693

This may work, but I'd caution against making assumptions about the order of URL parameters. Since HTTP attaches no meaning to the order of query parameters, you should make your command work regardless of order. – krowe2 – 2014-12-16T22:05:36.477
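Following that suggestion, here is a sketch in awk that does not depend on which parameter comes before or after the number. It uses match() to find =%2B followed by digits anywhere on the line, then substr() with RSTART/RLENGTH to keep only the digits. It assumes (hypothetically) that the wanted number is the only value beginning with %2B, and that the timestamp sits in $4 and $5 as in the sample line. The function name ts_and_number_awk is illustrative only.

```shell
# ts_and_number_awk: hypothetical helper name, for illustration only.
# Assumption: the wanted number is the only value starting with "%2B"
# (a URL-encoded "+"), so parameter order does not matter.
ts_and_number_awk() {
  awk '
    match($0, /=%2B[0-9]+/) {
      # RSTART/RLENGTH describe the match; skip the 4 characters "=%2B".
      num = substr($0, RSTART + 4, RLENGTH - 4)
      # $4 is "[16/Dec/2014:06:27:30" and $5 is "+0100]": drop the brackets.
      ts = substr($4, 2) " " substr($5, 1, length($5) - 1)
      print ts, num
    }
  ' "$@"
}
```

Usage: ts_and_number_awk /var/log/httpd/access_log. Lines without a =%2B<digits> value are skipped, because the match() pattern is false for them.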