
What's the best way of getting only the final match of a regular expression in a file using grep?

Also, is it possible to begin grepping from the end of the file instead of the beginning and stop when it finds the first match?

– Acorn

5 Answers


You could try

grep pattern file | tail -1

or

tac file | grep pattern | head -1

or

tac file | grep -m1 pattern
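As a quick sanity check, all three pipelines return the same line; a minimal demo on a throwaway file (`demo.log` is a hypothetical name):

```shell
# Build a small sample file where ERROR appears on several lines
printf 'ERROR first\nok\nERROR middle\nok\nERROR last\n' > demo.log

# Scan forward, keep only the last matching line
grep ERROR demo.log | tail -1

# Reverse the file and stop at the first match (avoids scanning past it)
tac demo.log | grep -m1 ERROR
```

Both commands print `ERROR last`.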
– Cakemox

  • `tac file | grep -m 1 pattern` – Dennis Williamson Nov 02 '10 at 00:54
  • With the added constraint that I wanted to get the line number (`grep -n`) in the actual file, I think `tac` pretty much had to be avoided, unless I wanted to do some subtraction with `wc -l`. Otherwise `tac` with `grep -m1` makes a lot of sense. – Nick Merrill Jul 04 '14 at 18:48
  • I'd love to see a more performant version than this, since I am trying to search a 20 GB file. – Jeff Sep 04 '15 at 17:56
  • @DennisWilliamson's answer is much better, because `grep` will stop searching after the first match. Without `-m 1`, `grep` will first **find all matching lines in the file**, then `head` will show only the first – much less efficient. Dennis, please consider posting this as a separate answer! – gilad905 May 18 '17 at 16:33
  • To keep grep colors when piping, use `--color=always`. `tail` works great when grepping multiple files (e.g. `grep pattern -r path`), but the `tac` option is not recommended for multiple files (it probably has high memory consumption). – Noam Manos Feb 25 '20 at 11:35
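The line-number constraint mentioned in the comments can still be combined with `tac` by a little arithmetic; a sketch assuming a throwaway file `f.txt`:

```shell
# Sample file: the last 'pat' match is on line 4
printf 'a\npat\nb\npat\nc\n' > f.txt

total=$(wc -l < f.txt)
# Line number of the first match, counted from the end of the file
rev=$(tac f.txt | grep -n -m1 pat | cut -d: -f1)
# Convert back to the line number in the original file
echo $((total - rev + 1))
```

This prints `4`, matching the line number reported by `grep -n pat f.txt | tail -1`.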

I always use cat (though it makes the pipeline a little longer): `cat file | grep pattern | tail -1`

I would blame my Linux admin course teacher at college, who loves cats :))))

-- You don't have to `cat` a file before grepping it. `grep pattern file | tail -1` works and is more efficient, too.

– wjandrea

  • This is just the first part of Cakemox's answer, except worse. – augurar Sep 15 '17 at 23:57
  • It works, but it does unnecessary steps. For light usage this solution works fine, but it does not perform well. The reason is that you don't need to `cat` the file and pipe it to `grep`: you can have `grep` search the file directly via `grep pattern file` (and then use `tail` to return the last result), as in Cakemox's answer. – jvriesem Jul 26 '19 at 17:54
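As the comments note, the `cat` stage can simply be dropped; both pipelines print the same line (`notes.txt` is a hypothetical file):

```shell
printf 'one\ntwo pattern\nthree pattern\n' > notes.txt

# With cat: works, but spawns an extra process and an extra pipe
cat notes.txt | grep pattern | tail -1

# Without cat: grep reads the file directly
grep pattern notes.txt | tail -1
```

Both print `three pattern`.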

This is for anyone working with huge text files on Unix/Linux/Mac/Cygwin. If you use Windows, check out this question about Linux tools on Windows: https://stackoverflow.com/questions/3519738/what-is-the-best-way-to-use-linux-utilities-under-windows.

You can follow this workflow to get good performance:

  1. Compress the file with gzip
  2. Use zindex (on GitHub: https://github.com/mattgodbolt/zindex) to index the file with an appropriate key
  3. Query the indexed file with zq from the same package.

Quote from its GitHub README:

Creating an index

zindex needs to be told what part of each line constitutes the index. This can be done by a regular expression, by field, or by piping each line through an external program.

By default zindex creates an index of file.gz.zindex when asked to index file.gz.

Example: create an index on lines matching a numeric regular expression. The capture group indicates the part that's to be indexed, and the options show each line has a unique, numeric index.

$ zindex file.gz --regex 'id:([0-9]+)' --numeric --unique

Example: create an index on the second field of a CSV file:

$ zindex file.gz --delimiter , --field 2 

Example: create an index on a JSON field orderId.id in any of the items in the document root's actions array (requires jq). The jq query creates an array of all the orderId.ids, then joins them with a space to ensure each individual line piped to jq creates a single line of output, with multiple matches separated by spaces (which is the default separator).

$ zindex file.gz --pipe "jq --raw-output --unbuffered '[.actions[].orderId.id] | join(\" \")'" 

Querying the index

The zq program is used to query an index. It's given the name of the compressed file and a list of queries. For example:

$ zq file.gz 1023 4443 554 

It's also possible to output by line number, so to print lines 1 and 1000 from a file:

$ zq file.gz --line 1 1000
– biocyberman

The above solutions only work for a single file. To print the last occurrence for many files (say, with suffix .txt), use the following bash script:

#!/bin/bash
# Iterate over the glob directly; parsing `ls` output breaks on unusual filenames
for fn in *.txt
do
    result=$(grep 'pattern' "$fn" | tail -n 1)
    echo "$result"
done

where 'pattern' is what you would like to grep.

– zyy

If you have several files, use an inline for loop:

for a in *.txt; do grep "pattern" "$a" /dev/null | tail -n 1; done

The /dev/null provides a second file, so grep will list the filename where the pattern is found.
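With GNU grep, the `-H` flag is an alternative to the `/dev/null` trick: it forces the filename prefix even when only one file is searched. A sketch using hypothetical files `a.txt` and `b.txt` in a scratch directory:

```shell
# Set up two sample files in a fresh directory
mkdir -p lastmatch_demo && cd lastmatch_demo
printf 'x\npattern here\npattern again\n' > a.txt
printf 'pattern only\n' > b.txt

# /dev/null as a second "file" makes grep print filenames
for f in *.txt; do grep "pattern" "$f" /dev/null | tail -n 1; done

# GNU grep's -H does the same without the extra argument
for f in *.txt; do grep -H "pattern" "$f" | tail -n 1; done
```

Each loop prints `a.txt:pattern again` and `b.txt:pattern only`.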