Unix command to list the portion of the end of a log file from a line containing only hyphens to end of file

0

I have a long log file where each entry begins with a line containing only hyphens.

suresh

Posted 2010-07-13T12:02:24.180


Answers

3

You can do it with a shell script thus:

#!/bin/bash
if [[ -z "$1" ]] ; then
    echo "Usage: $0 <inputFile>"
    exit 1
fi
line=$(grep -n '^--*$' "$1" | tail -1 | sed 's/:.*//')
if [[ -z "${line}" ]] ; then
    cat "$1"
else
    sed "1,${line}d" "$1"
fi

Given the input file:

this is line 1
-------
this is line 3
-------
this is line 5
this is line 6

it produces:

this is line 5
this is line 6

By way of explanation, the grep -n produces a series of lines like:

2:-------
4:-------

where the 2 and 4 are the line numbers. The tail -1 then filters out all but the last, and the sed strips everything from the colon to the end of the line, leaving just the line number.

Then, if there were no lines with the desired pattern, it just outputs the entire file. Otherwise it deletes all the lines from line 1 through the last hyphen line.
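For reference, the whole pipeline can be exercised end to end against the sample input from this answer (the file name sample.txt is just illustrative):

```shell
# Recreate the sample input from the answer, then run the pipeline on it.
cat > sample.txt <<'EOF'
this is line 1
-------
this is line 3
-------
this is line 5
this is line 6
EOF

line=$(grep -n '^--*$' sample.txt | tail -1 | sed 's/:.*//')
echo "$line"                 # the last hyphen-only line is line 4
sed "1,${line}d" sample.txt  # prints lines 5 and 6
```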


As an aside, my original answer included this awk snippet which will process the file only once:

awk '/^--*$/{s="";next}{s=s$0"\n"}END{printf "%s",s}'

However, keep in mind that it works by accumulating lines into a string and clearing the string out whenever it finds a hyphen line. Then, at the end, it simply outputs the string (all the lines after the last hyphen line).

While at first glance this may appear more efficient, it doesn't seem to be in practice. In (admittedly non-exhaustive) tests on my system, it actually ran quite a bit slower, which I think is due to the many string appends going on. The script solution seems to be faster despite making multiple passes over the data (possibly because each pass is very limited in what it does).
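Exercised on the same six-line sample, the awk approach produces matching output. This sketch adds a `next` so the hyphen line itself is skipped rather than appended, and uses `printf` to avoid an extra trailing newline:

```shell
# Accumulate lines into s, reset s at every hyphen-only line,
# print whatever is left at end of input.
printf '%s\n' 'this is line 1' '-------' 'this is line 3' '-------' \
  'this is line 5' 'this is line 6' |
  awk '/^--*$/{s="";next}{s=s$0"\n"}END{printf "%s",s}'
```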

user53528

Posted 2010-07-13T12:02:24.180


This finds the first instead of the last hyphen-only line. If I understand correctly, the OP wants only line 5 in this example. – Philipp – 2010-07-13T12:17:54.387

1@Philipp, the second awk command and the bash script both deliver the final section of the input file. – None – 2010-07-13T12:24:17.377

And, since the bash script is a lot more efficient, I've ditched the awk answer anyway. – None – 2010-07-13T12:39:50.003

But your awk version was surely better, in that it only read the file once, and was only a single program invocation. In this shell version, grep has to read the entire file, then tail, sed and cat or sed have to be invoked (on top of the shell invocation). – None – 2010-07-13T13:29:10.503

The awk script that read the file once and used no extra storage was the one that started at the first hyphen line. The one that started on the last hyphen line did so by storing every line in-process so potentially used a lot of memory. – None – 2010-07-13T13:42:38.540

If I recall correctly, your second awk script only stored the lines from the preceding all-hyphens line: the {s=""} clause discarded those whenever it came to the next such line, and it printed out the current block of accumulated lines at the end. Simple, neat, and similar to the sed solution. – None – 2010-07-13T13:58:34.583

Okay, @Norman, I've put it back in but initial tests don't seem to indicate it's as fast. I think the continual string appending may be slowing it down. – None – 2010-07-13T14:26:29.513

It's slow because the script iterates over the file multiple times: once in the grep/tail/sed statement, and again in the cat/sed portion (the final if/else). – user31894 – 2010-07-14T01:23:01.650

@ghostdog, you misunderstand. The awk is slower, not the script. My comment was that the continual appending of strings in awk is most likely slower than processing the file more than once in the shell script. In fact, the script only iterates the file twice as the stream that gets passed out of grep will probably be substantially smaller since it consists of only the hyphen lines. – None – 2010-07-14T01:58:38.327

@paxdioblo: OK, I'll take your word for it -- that's interesting, and unexpected. One can obsess too much about speed, though, and on that ground, I still think this is the neater solution. It's shorter, which equals clearer, and (relevant in this present context) more educational! – Norman Gray – 2010-07-14T23:45:03.863

3

awk -vRS="-+" 'END{print}' ORS="" file

user31894

Posted 2010-07-13T12:02:24.180

Reputation: 2 245

Minor improvement: BEGIN{ORS=""} – Paused until further notice. – 2010-07-13T13:56:58.763

That is fast (and sneaky, which I particularly like in solutions) but I think it will match every line that has one or more hyphens in it, not "a line containing only hyphens". In other words, the line "abc-xyz" will match as well. Is there any way to put start- and end-line markers in the record separator? – None – 2010-07-13T14:30:47.987

awk's RS regex doesn't "recognise" start of line. A more appropriate RS regex to use would be RS="\n-+". – user31894 – 2010-07-13T14:35:52.300

Wouldn't it be RS="\n-+\n" to ensure it's the whole line containing nothing but hyphens? – None – 2010-07-14T02:00:42.583
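Putting the comments above together, a stricter sketch would be the following (GNU awk behaviour assumed: a multi-character RS is treated as a regex there, and the \n anchors stand in for start and end of line, so a mid-line hyphen no longer splits a record):

```shell
# Records are separated by whole lines of hyphens; print only the last record.
printf 'abc-xyz\n-------\nlast line 1\nlast line 2\n' |
  awk -v RS='\n-+\n' 'END{printf "%s", $0}'
```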

2

You can also do it with sed:

% cat t.txt
this is line 1
this is line 2
-------
this is line 3
----
this is line 4
-------
this is line 5
this is line 6
% sed -n -e '/^---*/{h;d;}' -e H -e '${g;p;}' t.txt
-------
this is line 5
this is line 6
% 

(with some seds, those semicolons would have to be newlines).

Norman Gray

Posted 2010-07-13T12:02:24.180

Reputation: 951

If you don't want to print the dashes, make the last -e section like this: '${g;s/^-\+\n//;p}'. Also, your pattern would be better if it was /^-\+$/ or /^--*$/. – Paused until further notice. – 2010-07-13T14:06:05.913

1

I think this can be easily done using sed. You want a command to find the final (i.e. last) line of only-hyphens, and you want to print from that point to the end of file.

Unfortunately, I'm not very good with sed. Hoping someone else can elaborate.


EDIT

OK, sed is not ideal. Here's how to do it with ex, the text-only twin of vi:

ex filename
$
?----------
.,$p
q
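The same keystrokes can also be driven non-interactively, as a sketch: ex -s suppresses the prompts, a here-document supplies the commands, and the file name is illustrative. Note that, like the interactive session, this prints starting from the hyphen line itself:

```shell
# Build a small sample, then replay the ex session on it.
printf '%s\n' 'this is line 1' '----------' 'this is line 2' > exdemo.txt
ex -s exdemo.txt <<'EOF'
$
?^--*$
.,$p
q
EOF
```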

Carl Smotricz

Posted 2010-07-13T12:02:24.180

Reputation: 629

I think it's very hard. If you read a file line-by-line, how would you know whether a specific line is the last of its kind before reading the remaining lines? I think there is no other way than to read the whole file. – Philipp – 2010-07-13T12:10:11.627

1@Philipp: You're right, this is not a good job for a sequential editor. It does seem to be easy for any "real" editor, though. See my update. – Carl Smotricz – 2010-07-13T12:17:21.090

1

tac file | grep -B 10000 -m 1 -- '------' | tac

Sjoerd

Posted 2010-07-13T12:02:24.180

Reputation: 1 131

2Very ingenious solution, but I think you can replace the grep with sed '/^-\+$/Q' – Philipp – 2010-07-13T12:25:57.940
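Spelled out, the comment's suggestion looks like this (GNU sed assumed for the Q command, which quits without printing the matching line; tac is from GNU coreutils). It also removes the arbitrary -B 10000 line-count cap:

```shell
# Reverse, drop everything from the first (i.e. originally last)
# hyphen-only line onward, then reverse back.
printf '%s\n' 'old entry' '-----' 'new entry 1' 'new entry 2' |
  tac | sed '/^-\+$/Q' | tac
```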

0

This is probably not the most efficient solution:

#!/bin/bash

file=$1
pattern='^-+$'
declare -i count=0
declare -i index=0

while read -r line
do
    count+=1
    [[ $line =~ $pattern ]] && index=$count
done < "$file"

tail -n "$((count - index))" "$file"
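A self-contained run of the same loop on a three-line sample (bash is required for the [[ =~ ]] regex test; the file name is illustrative):

```shell
#!/bin/bash
# Three-line sample; file name is illustrative.
printf '%s\n' 'entry one' '---' 'entry two' > loopdemo.txt

pattern='^-+$'
count=0
index=0
while read -r line; do
    count=$((count+1))
    if [[ $line =~ $pattern ]]; then
        index=$count          # remember the most recent hyphen-only line
    fi
done < loopdemo.txt

tail -n "$((count - index))" loopdemo.txt   # everything after that line
```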

Philipp

Posted 2010-07-13T12:02:24.180

Reputation: 261

0

Use tac and sed:

$ cat log-file 
---
first
------
second
---
last

$ tac log-file | sed -e '/^-\+$/,$d' | tac
last

Greg Bacon

Posted 2010-07-13T12:02:24.180

Reputation: 813

0

echo "`sed -n '/^--*$/=' <file> | tail -1`,\$p" <file>  | xargs sed -n

But I like Norman Gray's solution much better. I might like it even more if he explained it :-)

sureshvv

Posted 2010-07-13T12:02:24.180

Reputation: 153

Thanks! It uses the 'hold space', which is the one bit of state that sed can use. At every '---' line, the 'h' replaces the hold space with the current line, thus discarding anything else that was there; every other line is appended to the hold space; then on the last line, the pattern space is replaced by the current hold space, and printed. – Norman Gray – 2010-07-14T23:50:49.450