How to parse a multi-line log file with awk and output one line per user with the last known IP address


I am stuck and looking for assistance. I want to trigger an event that I'd like to process further in a bash script. The data comes from a log file. Before I explain, here are some lines from that log file for better understanding.

What it looks like


24/04/2017 20:14:29 [ 7910] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
24/04/2017 20:14:34 [10355] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
24/04/2017 20:14:38 [10355] [INFO] [bob] Processed '1' incoming changes
24/04/2017 20:14:47 [22518] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
24/04/2017 20:14:50 [ 7910] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
24/04/2017 20:14:53 [ 7910] [INFO] [bob] Processed '1' incoming changes
24/04/2017 20:15:08 [10355] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
24/04/2017 20:15:14 [22518] [INFO] [bob] method='POST' from='' cmd='Search' getUser='bob' some other colums
24/04/2017 20:15:15 [ 7910] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
24/04/2017 20:15:16 [10355] [INFO] [bob] method='POST' from='' cmd='Search' getUser='bob' some other colums
24/04/2017 20:15:49 [32637] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
24/04/2017 20:15:53 [22518] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
24/04/2017 20:15:56 [22518] [INFO] [bob] Processed '1' incoming changes
24/04/2017 20:16:05 [10355] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
24/04/2017 20:16:09 [32637] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
01/05/2017 03:27:45 [ 4985] [INFO] [alice] method='POST' from='' getUser='alice' some other colums
01/05/2017 03:27:49 [13971] [INFO] [alice] method='POST' from='' getUser='alice' some other colums
01/05/2017 03:28:05 [13970] [INFO] [alice] method='POST' from='' getUser='alice' some other colums
01/05/2017 03:28:10 [ 4985] [INFO] [alice] method='POST' from='' getUser='alice' some other colums
01/05/2017 03:28:25 [13971] [INFO] [alice] method='POST' from='' getUser='alice' some other colums
01/05/2017 03:28:31 [13970] [INFO] [alice] method='POST' from='' getUser='alice' some other colums
15/03/2018 14:49:19 [12918] [INFO] [alice] method='POST' from='' getUser='alice' some other colums
15/03/2018 14:49:21 [12834] [INFO] [alice] method='POST' from='' getUser='alice' some other colums
15/03/2018 14:49:22 [12834] [INFO] [alice] SyncCollections->CheckForChanges(): Waiting for store changes... (lifetime 470 seconds)
15/03/2018 14:55:26 [12843] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
15/03/2018 14:55:26 [12918] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
15/03/2018 14:55:26 [12882] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
15/03/2018 14:55:27 [12970] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
15/03/2018 14:55:28 [12882] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
15/03/2018 14:55:28 [12918] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
15/03/2018 14:55:32 [12970] [INFO] [bob] method='POST' from='' getUser='bob' some other colums
15/03/2018 14:55:32 [12970] [INFO] [bob] SyncCollections->CheckForChanges(): Waiting for store changes... (lifetime 470 seconds)


I'm interested in retrieving the user name (in this example "alice" or "bob"), which appears in the 5th column of the log file, and the corresponding IP address, which is listed in the 7th column. If the IP address differs from the last known one, an email notification should be sent through a small bash script.

The condition should be:

  • if the line contains "alice" OR "bob" AND the line contains "from=", then output the user name and the corresponding IP address.

Final output should look like


Note: Only the last known IP address is wanted, so the correct output should contain only 2 lines in this example, as shown above (one for each user).

What I tried so far

I started with awk but quickly hit a hurdle because awk uses whitespace as the field separator by default. My intention was to start with a '{ print $4,$6 }' statement. I realized that the third column sometimes breaks this filtering because of a leading space in the process id, e.g.

24/04/2017 20:14:50 [ 7910] ...
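
For illustration, here is a small sketch of that field shift with the default separator. The address 192.0.2.10 is made up, since the addresses in the log above are blank; the variable names are only for this example.

```shell
# Two sample lines: one PID padded with a space, one without.
# The IP address is a made-up documentation address (192.0.2.10).
padded="24/04/2017 20:14:50 [ 7910] [INFO] [bob] method='POST' from='192.0.2.10'"
unpadded="24/04/2017 20:14:34 [10355] [INFO] [bob] method='POST' from='192.0.2.10'"

# With the default separator, "[ 7910]" is split into two fields,
# so every later field moves one position to the right.
user_field_padded=$(printf '%s\n' "$padded" |
    awk '{ for (i = 1; i <= NF; i++) if ($i == "[bob]") print i }')
user_field_unpadded=$(printf '%s\n' "$unpadded" |
    awk '{ for (i = 1; i <= NF; i++) if ($i == "[bob]") print i }')

echo "padded PID:   user name is field $user_field_padded"
echo "unpadded PID: user name is field $user_field_unpadded"
```

The user name lands in field 6 for the padded line and in field 5 for the unpadded one, which is why a fixed field number is unreliable with whitespace splitting.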

What my awk command currently looks like

With the following command I search for the string "alice" OR "bob" AND the string "from=" and then output two unformatted columns

awk 'BEGIN { FS = "[?!([ )]+" } /alice|bob/ && /from=/ { print $5,$7 }' test.log

Output -->

bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
alice] from=''
alice] from=''
alice] from=''
alice] from=''
alice] from=''
alice] from=''
alice] from=''
alice] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''
bob] from=''

I am stuck here. I tried storing the last matching line in a variable with "{a=$0}" and printing it, but I am obviously doing something wrong because I either get errors or the output is wrong. My next idea was to use "tac" to read the log file from its end and exit after the first match. Something like this:

tac test.log | awk 'BEGIN { FS = "[?!([ )]+" } /alice|bob/ && /from=/ { print $5,$7; exit }'

but this stops immediately after the first match, and the output is:

bob] from=''

I also need to format the output by stripping the right bracket ']', the string 'from=' and the single quotes around the IP address.

Any help is really appreciated. Thanks in advance.
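
As a rough sketch of that clean-up step, the unwanted characters could be removed with awk's sub() and gsub() while keeping the field separator from the command above. The sample line and the address 192.0.2.10 are made up for illustration.

```shell
# Sample line with a made-up address; the real log has the same layout.
line="24/04/2017 20:14:50 [ 7910] [INFO] [bob] method='POST' from='192.0.2.10' getUser='bob'"

cleaned=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[?!([ )]+" }
/alice|bob/ && /from=/ {
    user = $5; ip = $7
    sub(/\]$/, "", user)           # drop the trailing right bracket
    gsub(/from=|'\''/, "", ip)     # drop from= and the single quotes
    print user, ip
}')

echo "$cleaned"
```

This only tidies the two fields; it still prints one line per matching log line.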


Posted 2018-03-15T16:11:08.740

Reputation: 1



You can extend your regex field separator to include ] and ' so that the name and IP address end up cleanly in fields 5 and 9. You can then store them in an associative array that is indexed by the name and holds the last IP address seen for it. At the end of the file you print this array.

awk 'BEGIN { FS = "[?!([ )\\]'\'']+" }
/alice|bob/ && /from=/ {
    # later lines overwrite earlier ones, so the last IP per user wins
    user = $5; ip = $9
    userip[user] = ip
}
END { for (user in userip) print user, userip[user] }' test.log
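
A quick way to check this is to run it on a few lines with known addresses. The sketch below uses a temporary file and made-up addresses from the documentation ranges; both are assumptions for the example. Note that with an empty from='' as in the sample log, the quotes merge into one separator run and field 9 would be the next token instead of an address.

```shell
# Temporary sample log with made-up addresses (192.0.2.x, 198.51.100.x).
log=$(mktemp)
cat > "$log" <<'EOF'
24/04/2017 20:14:29 [ 7910] [INFO] [bob] method='POST' from='192.0.2.10' getUser='bob'
24/04/2017 20:14:38 [10355] [INFO] [bob] Processed '1' incoming changes
01/05/2017 03:27:45 [ 4985] [INFO] [alice] method='POST' from='198.51.100.7' getUser='alice'
15/03/2018 14:55:26 [12843] [INFO] [bob] method='POST' from='192.0.2.99' getUser='bob'
EOF

# Same separator as in the answer; each later line overwrites the stored IP.
# "for (user in userip)" has no defined order, so sort makes the output stable.
result=$(awk 'BEGIN { FS = "[?!([ )\\]'\'']+" }
/alice|bob/ && /from=/ { userip[$5] = $9 }
END { for (user in userip) print user, userip[user] }' "$log" | sort)

echo "$result"
rm -f "$log"
```

For this sample it prints one line for alice and one for bob, with bob showing the address from his last line.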


Posted 2018-03-15T16:11:08.740

Reputation: 4 273


Hello meuh, and thanks a lot for your suggestion and the example. That works fine. But I'm still wondering whether it would be better to reverse the processing and start reading from the end of the file. If the log file has thousands of lines, reading all of it consumes a lot of processing power. I guess it would be more efficient to start reading from the tail and stop after the first match for each user.
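
One possible sketch of that idea: read the file backwards with tac (GNU coreutils) and let awk exit once every tracked user has been seen. The variable "wanted" (the number of tracked users, 2 here), the temporary file and the addresses are assumptions for this example. Whether this is noticeably faster for a given log size is something to measure rather than assume.

```shell
# Temporary sample log with made-up addresses.
log=$(mktemp)
cat > "$log" <<'EOF'
24/04/2017 20:14:29 [ 7910] [INFO] [bob] method='POST' from='192.0.2.10' getUser='bob'
01/05/2017 03:27:45 [ 4985] [INFO] [alice] method='POST' from='198.51.100.7' getUser='alice'
15/03/2018 14:49:19 [12918] [INFO] [alice] method='POST' from='198.51.100.8' getUser='alice'
15/03/2018 14:55:26 [12843] [INFO] [bob] method='POST' from='192.0.2.99' getUser='bob'
EOF

# Reading backwards, the first line seen for a user is that user's newest entry.
# awk stops as soon as "wanted" distinct users have been printed.
latest=$(tac "$log" | awk -v wanted=2 'BEGIN { FS = "[?!([ )\\]'\'']+" }
/alice|bob/ && /from=/ && !($5 in seen) {
    seen[$5] = 1
    print $5, $9
    if (++found == wanted) exit
}')

echo "$latest"
rm -f "$log"
```

The output is ordered newest first, so here bob's line comes before alice's.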

On the other hand I'm wondering if it's possible to include my whole project into awk as a one-liner.

The goal is to run a cron job every minute that reads the log file. If the IP address has changed, is newer than the last known one, and does not lie within Subnet-C (LAN), then the email notification should be sent.


*/1 * * * * root nice -n5 /usr/bin/awk 'BEGIN { FS = "[?!([ )\\]'\'']+" } /alice|bob/ && /from=/ { user = $5; ip = $9; userip[user] = ip } END{ for(user in userip)print user,userip[user] }' | ...

I don't know how to accomplish that. Do I need to create a flag file that stores the current IP address of each user and then query it later somehow? Is it possible to do everything in awk?
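
A minimal sketch of the state-file idea, done in shell rather than purely in awk. It keeps one small file per user holding the last reported address. The directory, the LAN prefix 192.168.1., and the function names notify and check_changes are all hypothetical, and the echo stands in for the real mail command.

```shell
# Hypothetical settings; in real use state_dir would be a persistent directory.
state_dir=$(mktemp -d)
lan_prefix="192.168.1."        # assumed LAN prefix, adjust to your network

notify() {                     # placeholder: replace echo with your mail command
    echo "IP changed for $1: $2"
}

# Reads "user ip" pairs on stdin, as printed by the awk command,
# and reports addresses that differ from the one stored for that user.
check_changes() {
    while read -r user ip; do
        case $ip in "$lan_prefix"*) continue ;; esac    # ignore LAN addresses
        last=""
        [ -f "$state_dir/$user" ] && last=$(cat "$state_dir/$user")
        if [ "$ip" != "$last" ]; then
            notify "$user" "$ip"
            printf '%s\n' "$ip" > "$state_dir/$user"
        fi
    done
}

# Two simulated cron runs with made-up addresses.
first=$(printf 'alice 198.51.100.7\nbob 192.168.1.20\n' | check_changes)
second=$(printf 'alice 198.51.100.7\nbob 192.0.2.99\n' | check_changes)
echo "$first"
echo "$second"
rm -rf "$state_dir"
```

In the first run only alice is reported, because bob's address is inside the assumed LAN prefix. In the second run alice is unchanged and bob is reported with his new external address.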


Posted 2018-03-15T16:11:08.740

Reputation: 1