Bash script to get specific prior context

I am searching through some log files that contain groups of actions. At the start of each group there is a line with information about the group, followed by lots of verbose information about each action, with a PASS/FAIL status printed at the end of each individual test.

What I want to do is find any actions that failed and print their group's header line, followed by some amount of context before the FAILED line.

For example:

Start test group ID 12345
verbose info
verbose info
Test 1 PASSED
verbose info
verbose info
Test 2 PASSED
Start test group ID 238284
verbose info
verbose info
Test 1 PASSED
verbose info
verbose info
Test 2 FAILED

The above would be condensed into

Start test group ID 238284
verbose info
verbose info
Test 2 FAILED

The number of lines before the FAILED line is not constant from test to test, and different tests have different average lengths too, but printing a constant number of lines is fine with me. I generally only care about the last few lines anyway.

I feel like this might be a bit complicated for grep, but I've never really used awk for anything and don't know where to start with it.

ben

Posted 2012-06-14T16:17:54.770

Reputation: 103

Answers

Here's an awk solution that simplifies things by processing the output in reverse (requires the tac command, which is part of the GNU coreutils):

First, the awk script (put in a file such as 'process.awk'). It's just a little too long for a bash one-liner.

BEGIN                           { output=0; any=0; }
/^Test .* FAILED/               { output=1; any=1; }
/^Test .* PASSED/               { output=0; }
/^Start test group/ && any == 1 { output=1; any=0; }
output == 1                     { print; }

Then, run that script on the reversed log file, and reverse the output:

tac logfile | awk -f process.awk | tac
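To try the whole pipeline without creating process.awk, here is a minimal sketch that inlines the same awk program into a shell function. The function name `show_failures` is made up for illustration and is not part of the answer; it assumes GNU `tac` (coreutils) is installed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the tac | awk | tac pipeline above.
# Assumes GNU coreutils tac. Reads the file named in $1, or standard
# input when no file is given ("-" means stdin to tac).
show_failures() {
    tac "${1:--}" | awk '
        BEGIN                           { output = 0; any = 0 }
        /^Test .* FAILED/               { output = 1; any = 1 }
        /^Test .* PASSED/               { output = 0 }
        /^Start test group/ && any == 1 { output = 1; any = 0 }
        output == 1                     { print }
    ' | tac
}

# Sample log from the question
show_failures <<'EOF'
Start test group ID 12345
verbose info
verbose info
Test 1 PASSED
verbose info
verbose info
Test 2 PASSED
Start test group ID 238284
verbose info
verbose info
Test 1 PASSED
verbose info
verbose info
Test 2 FAILED
EOF
```

On the question's sample this prints the four lines of the expected output: the header of group 238284, its two verbose lines and "Test 2 FAILED".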

How does it work?

First, we pass our input through tac in order to reverse the order of the lines (so we can determine if the "following" lines belong to a FAILED or PASSED test before reading them).

The script works as follows. Each action consists of a condition that must be matched, followed by a block of code to execute if the current line matches the condition.

The first action is a BEGIN action, which is always executed once before we start looking at the input. It initializes two boolean flags that control what gets printed. output will be set to 1 if we want to print the current line, 0 otherwise. any will be set to 1 any time we encounter a FAILED test, and reset to 0 after we've finished processing a test group. Both values start at 0.

The next action tests the current line to see if it is the beginning of a failed test (remember, we're processing the output in reverse). If so, set both output and any.

The next action tests the current line to see if it is the beginning of a passed test. If so, clear the output flag, but leave any alone: if a failed test was already seen in this group (that is, later in the original file), we still need to print the group header.

The next action tests the current line to see if it is the test group header and if the any flag is set. If it is, we want to print the header (we had at least one failed test), so set output and clear any (to prepare for the next test group). Otherwise, we don't need to do anything; any is already 0, and output can't have been set to 1 if any never was.

Finally, we have an action that doesn't look at the current line, but just checks if any of the previous actions have set output. If they have, we print the current line (which may be a "Test FAILED" line, some verbose info that "precedes" the FAILED line, or a test group header).

Once all the actions are exhausted, we move to the next input line and try to apply each action again. After all the input is exhausted, we'll have printed each of the output lines we want, but in reverse order. Piping the output through tac fixes that.
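To watch the two flags change, the sketch below runs the same rules over the reversed log, but instead of the final `output == 1 { print; }` action it prints the values of output and any next to every line. The function name `trace_flags` is hypothetical, and it assumes GNU `tac`.

```shell
#!/usr/bin/env bash
# Instrumented copy of the awk program (for illustration only): the last
# rule prints both flags next to every reversed line instead of printing
# only the selected lines. Assumes GNU coreutils tac.
trace_flags() {
    tac "${1:--}" | awk '
        BEGIN                           { output = 0; any = 0 }
        /^Test .* FAILED/               { output = 1; any = 1 }
        /^Test .* PASSED/               { output = 0 }
        /^Start test group/ && any == 1 { output = 1; any = 0 }
        { printf "output=%d any=%d | %s\n", output, any, $0 }
    '
}
```

Run on the question's sample log, the rows showing output=1 are exactly the lines the real script prints, still in reverse order: "Test 2 FAILED", the two verbose lines above it, and "Start test group ID 238284". The first group never sets any=1, so its header is skipped.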

Note that the script could be made a bit more efficient at the cost of making it more complex, but it should be fast enough.
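If avoiding the two `tac` passes matters, one possible single-pass alternative (sketched here as an assumption, not as the optimization the answer had in mind) reads the log forwards, buffers the lines of the current test, and prints the group header only the first time a test in that group fails. The function name `show_failures_forward` is hypothetical; the awk program uses only POSIX features.

```shell
#!/usr/bin/env bash
# Hypothetical forward, single-pass variant (not from the answer above).
# Reads the file named in $1, or standard input when none is given.
show_failures_forward() {
    awk '
        # New group: remember its header, nothing printed for it yet
        /^Start test group/ { header = $0; shown = 0; n = 0; next }
        # Buffer every line of the current test, result line included
        { lines[++n] = $0 }
        # Passed: throw the buffered test away
        /^Test .* PASSED/ { n = 0 }
        # Failed: header once per group, then the buffered test
        /^Test .* FAILED/ {
            if (!shown) { if (header != "") print header; shown = 1 }
            for (i = 1; i <= n; i++) print lines[i]
            n = 0
        }
    ' "${1:--}"
}
```

On the question's sample it prints the same four lines as the `tac` version. Because a test's lines are held only until its PASSED or FAILED line arrives, memory use is bounded by the longest single test rather than by the whole file.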

chepner

Posted 2012-06-14T16:17:54.770

Reputation: 5 645

+1, compact AWK with a long description, and a nice trick with tac for easier parsing. – nik – 2012-06-14T17:57:02.560

A bash script derived from Bgs's script:

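buffer=""
while IFS= read -r line
do
    # Group header: remember it
    [[ $line == Start* ]] && start=$line && continue
    # Failure: print the header, the buffered context and the FAILED line
    [[ $line == *FAILED* ]] && printf '%s%s\n%s\n' "$start" "$buffer" "$line" \
            && buffer="" && continue
    # A pass clears the context; any other line is appended to it
    [[ $line == *PASSED* ]] && buffer="" || buffer="$buffer"$'\n'"$line"
done < /your/file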

For each "FAILED" line, the script prints the current "Start" line, then all lines since the previous "FAILED" or "PASSED" line (exclusive), then the "FAILED" line itself.

Example input file:

Start test group ID 12345
verbose info #1
verbose info #2
Test 1 PASSED
verbose info #3
verbose info #4
Test 2 PASSED
verbose info #5
verbose info #6
Test 3 PASSED
verbose info #7
verbose info #8
verbose info #9
Test 4 FAILED
Start test group ID 98765
verbose info #10
verbose info #11
verbose info #12
Test 5 FAILED
verbose info #13
verbose info #14
Test 6 PASSED
verbose info #15
verbose info #16
verbose info #17
Test 7 FAILED
verbose info #18
verbose info #19
verbose info #20
Test 8 PASSED

Script output:

Start test group ID 12345
verbose info #7
verbose info #8
verbose info #9
Test 4 FAILED
Start test group ID 98765
verbose info #10
verbose info #11
verbose info #12
Test 5 FAILED
Start test group ID 98765
verbose info #15
verbose info #16
verbose info #17
Test 7 FAILED
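Since the question only cares about the last few lines before each failure, a variant of this script could cap the buffer. The sketch below is hypothetical: `last_context` and its MAX argument are illustrative names, it reads the log on standard input, and it swaps the per-line `grep` calls for bash's own `[[ ... ]]` pattern matching.

```shell
#!/usr/bin/env bash
# Hypothetical variant: keep at most MAX lines of context before each
# FAILED line. Usage: last_context MAX < logfile
last_context() {
    local max=$1 line start=""
    local -a buf=()
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == Start* ]]; then
            start=$line                 # remember the current group header
        elif [[ $line == *FAILED* ]]; then
            printf '%s\n' "$start"
            if (( ${#buf[@]} > 0 )); then
                printf '%s\n' "${buf[@]}"
            fi
            printf '%s\n' "$line"
            buf=()
        elif [[ $line == *PASSED* ]]; then
            buf=()                      # a passing test resets the context
        else
            buf+=("$line")
            if (( ${#buf[@]} > max )); then
                buf=("${buf[@]:1}")     # drop the oldest buffered line
            fi
        fi
    done
}
```

For example, `last_context 3 < /your/file` would print each header, at most three context lines and the FAILED line. On the example input above it gives the same output as the original script, because no failing test there has more than three verbose lines.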

speakr

Posted 2012-06-14T16:17:54.770

Reputation: 3 379

This code golf shows why the tools of the Unix environment matter. Nothing against your answer, speakr. Another answer in Perl/Python/etc. would complement this one best :-) – nik – 2012-06-14T18:05:12.707

@nik Perl: cat /your/file | perl -e 'while(<>){if($_=~m/^Start/){$s=$_;next;}if($_=~m/FAILED/){print$s.$b.$_;$b="";next;}$b=($_=~m/PASSED/)?"":$b.$_;}' gives the same output. ;-) – speakr – 2012-06-14T18:26:42.820

@speakr's Perl solution: Beautiful :-D – Daniel Andersson – 2012-06-14T19:48:29.063

Thinking about it, there are a couple of ways you can get there:

Use AWK to remember each Start line, then collect the lines after the last PASSED or FAILED line until you hit a FAILED line, at which point you dump the Start line and the collected lines leading up to the FAILED line.
Or, filter the Start lines and the FAILED context separately with grep and merge them. For that, you will need to retain line numbers.
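The second idea (filter with grep, then merge by line number) might look something like this. It is only a sketch under a few assumptions: GNU grep (for `--no-group-separator`), a fixed number N of context lines, and a hypothetical function name `grep_merge`. The first grep emits headers as `LINE:text`; the second emits each FAILED line plus N preceding lines as `LINE:text` or `LINE-text`. `sort -n -u` then merges both streams in file order and drops a line that both greps printed, and a final awk pass keeps a header only if some context follows it.

```shell
#!/usr/bin/env bash
# Sketch of "grep separately, then merge by line number".
# Assumes GNU grep. Usage: grep_merge N FILE
grep_merge() {
    local n=$1 file=$2
    {
        grep -n '^Start' "$file"                               # headers: "LINE:text"
        grep -n -B "$n" --no-group-separator 'FAILED' "$file"  # "LINE-context" / "LINE:match"
    } |
        sort -n -u |                  # merge by line number, one copy per line
        sed -E 's/^[0-9]+[-:]//' |    # strip the line-number prefixes
        awk '/^Start/ { hdr = $0; next }          # hold each header
             hdr != "" { print hdr; hdr = "" }    # emit it only before context
             { print }'
}
```

For example, `grep_merge 2 /your/file` on the question's sample prints the four expected lines. As with any fixed-N approach, a failing test shorter than N lines lets the context reach back into the previous test or group.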

Try this crude AWK script as a starter:

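# script.awk
BEGIN { buffer1 = ""; buffer2 = "" }
{
  if ($1 == "Start")
  {
    buffer1 = $0        # remember the current group header
    buffer2 = ""        # context never spans two groups
  }
  else
  {
    if ($3 == "PASSED")
    {
      buffer2 = ""      # a passing test discards its context
    }
    else
    {
      buffer2 = buffer2 "\n" $0
      if ($3 == "FAILED")
      {
        printf "%s%s\n", buffer1, buffer2
        buffer2 = ""    # avoid repeating this failure before the next one
      }
    }
  }
}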

Run with awk -f script.awk file.txt

Notes:

This will need adjustments if your Start, PASSED or FAILED lines are different
- quite straightforward if they are as consistent as in your example
This might also misfire if your verbose parts contain one of the above three keywords in the 'right' place.
- if so, you'll need to add more context to catch the right keywords
This will get you all the FAILED section lines
- you can play around with the buffers a bit to get less data
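One way to add more context to the keyword checks, assuming your header and result lines look exactly like the ones in the question, is to match their whole shape with anchored regular expressions instead of testing single fields. In the sketch below (the function name `strict_failures` is hypothetical), a verbose line such as "Disk probe FAILED", which the field test `$3 == "FAILED"` would catch, is treated as ordinary context.

```shell
#!/usr/bin/env bash
# Hypothetical tightening of the awk script above: match header and
# result lines by their whole shape, not by one field. Assumes the exact
# formats "Start test group ID <n>" and "Test <n> PASSED|FAILED".
strict_failures() {
    awk '
        /^Start test group ID [0-9]+$/ { header = $0; ctx = ""; next }
        /^Test [0-9]+ PASSED$/         { ctx = ""; next }
        /^Test [0-9]+ FAILED$/         { printf "%s%s\n%s\n", header, ctx, $0; ctx = ""; next }
                                       { ctx = ctx "\n" $0 }
    ' "${1:--}"
}
```

Trailing spaces or Windows line endings would defeat the `$` anchors, so you may need to strip those first.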

nik

Posted 2012-06-14T16:17:54.770

Reputation: 50 788

A simple bash solution (I saved your example into foo.txt):

buff="" ; while IFS= read -r line; do echo "$line" | grep -q "^Start" && buff="" ; buff="$buff\n$line" ; echo "$line" | grep -q FAILED && echo -e "$buff"; done < foo.txt

Bgs

Posted 2012-06-14T16:17:54.770

Reputation: 268

Welcome, Bgs. Did you get a match to the expected output with that? – nik – 2012-06-14T17:15:01.410

Yes: `# buff="" ; cat foo.txt| while read line; do echo $line| grep -q "^Start" && buff="" ; buff="$buff\n$line" ; echo $line | grep -q FAILED && echo -e $buff; done

Start test group ID 238284 verbose info verbose info Test 1 PASSED verbose info verbose info Test 2 FAILED` The formatting doesn't get through in comment but the output is multi-line as it should be :) – Bgs – 2012-06-14T17:20:07.267

@Bgs Your script buffers each line of the input file and prints the whole buffer each time a line contains FAILED. I don't think this is the desired result. – speakr – 2012-06-14T17:31:45.960

As Ben said he didn't mind having only the last few lines of the output, I went for fewer outside calls, which can speed up the log filtering. We actually do not know how big the input is: if he has to filter gigabytes of logs, fewer calls are better; if only a few hundred kilobytes, nicer output is better IMHO :) (For example, echo is a bash builtin, whereas grep is an outside call.) – Bgs – 2012-06-15T08:06:18.637