Matching a sentence with grep


I'm trying to grep for the full sentence containing a search term. I've tried

grep (^.|\.\s).*searchterm.*(\.\s|\n)

but it's not working and I'm not sure why.

To clarify: I want stdout to print the full sentence of the search term. I am using grep to search through a single text file.

As an example, if my file has

"Foo blah. Blah blah searchterm blah blah. Foo bar."

I want stdout to print Blah blah searchterm blah blah


Posted 2015-07-02T01:19:39.057

Reputation: 77

This one should be possible, but we might need to make some assumptions about your input. Does it have newlines? Might the sentences have abbreviations (i.e. containing periods) in them? – bertieb – 2015-07-02T10:07:11.487


If you seriously mean "the full sentence containing a search term", see How to put sentences on separate lines to get a clue as to how open-ended this challenge is.

– Scott – 2015-07-02T12:01:52.070



Tried this in my terminal (bash; the <<< here-string is not available in every sh-compatible shell):

$ grep --only-matching --perl-regexp "[^.]*searchterm[^.]*" \
       <<< "Foo blah. Blah blah searchterm blah blah. Foo bar."
Blah blah searchterm blah blah

Can be abbreviated to grep -oP.

I think the problem with the regex you provided is that .* does not say how greedy you want the match to be (as stated by bertieb), so it runs past the end of the sentence. What I did was reformulate your request from "anything, as long as it ends with a dot" to "anything that's not a dot".
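To see the two patterns side by side, here is a small sketch using the sample line from the question. It assumes the text sits on a single line and that no sentence contains an extra period. Note that it uses plain grep -o rather than -oP, since these particular patterns need no PCRE features, and it pipes the text in with printf instead of a here-string; the variable names are only for illustration.

```shell
# Sample line from the question (assumed to be on a single line).
text='Foo blah. Blah blah searchterm blah blah. Foo bar.'

# Greedy .* runs straight across the periods, so the whole line is matched.
greedy=$(printf '%s\n' "$text" | grep -o '.*searchterm.*')

# [^.]* cannot cross a period, so the match stays inside one sentence.
# The match still starts with the space that follows the previous period.
sentence=$(printf '%s\n' "$text" | grep -o '[^.]*searchterm[^.]*')

# Optional clean-up: strip that leading whitespace with sed.
trimmed=$(printf '%s\n' "$text" | grep -o '[^.]*searchterm[^.]*' | sed 's/^[[:space:]]*//')

# Brackets make the leading space visible in the output.
printf '[%s]\n' "$greedy" "$sentence" "$trimmed"
```

The bracketed output should make it easy to spot that the second result carries a leading space, which the sed step removes.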

Felipe Lema

Posted 2015-07-02T01:19:39.057

Reputation: 401

slaps forehead How did I miss that one? +1 – bertieb – 2015-07-02T11:48:00.347


This is an interesting question as it seems relatively straightforward at first glance: "Oh, just add -P to get PCRE parsing... no, wait. Add some lookahead and lookbehind... Negative lookahead and lookbehind... Replace those greedy matches... Why am I hitting the PCRE backtracking limit? Hmmm..." Suddenly it's much later and my pot of tea is nearly gone.


Assume there are no abbreviations or other extraneous periods in the input. Use sed to replace periods with newlines, then a simple grep for searchterm:

$ sed 's/\./\n/g' input.txt | grep searchterm
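As a rough illustration of this first approach, the sketch below writes the sample text from the question to a temporary file and splits it on periods. It swaps sed for tr in the splitting step, on the assumption that \n in a sed replacement may not be understood by every non-GNU sed; the temporary file and variable names are made up for the example.

```shell
# Temporary input file holding the sample text from the question.
input=$(mktemp)
printf '%s\n' 'Foo blah. Blah blah searchterm blah blah. Foo bar.' > "$input"

# Turn every period into a newline, keep the lines containing the search term,
# then strip the leading space left over from the sentence boundary.
result=$(tr '.' '\n' < "$input" | grep searchterm | sed 's/^[[:space:]]*//')

printf '%s\n' "$result"
rm -f "$input"
```

As with the sed version, a sentence that is wrapped over several input lines would still come out in pieces, so this only suits input where each sentence sits on one line.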

Assume nothing except a perl installation (and newlines in the input). Use Lingua::EN::Sentence to extract sentences, whilst dealing with abbreviations and such.

$ perl -MLingua::EN::Sentence=get_sentences -ne 'print "$_\n" for grep { /searchterm/ } @{get_sentences($_)}' <(tr '\n' ' ' < input.txt)

(many thanks to Tom Fenech in this answer over on SO)

One other potential advantage of this approach, beyond matching where there are extraneous periods, is that it also includes the final full stop. This isn't specified in your original question, but depending on what you are using the output for, it may save you appending one.

Note that for this you might have to install Lingua::EN::Sentence; if you have perl you might well have cpan and can (sudo) cpan install Lingua::EN::Sentence.

Both of these make assumptions and use tools other than plain grep, and neither actually fixes your regex. But they get the job done as described, at least in my testing on lorem ipsum text.

Edit: Felipe Lema's answer is much more straightforward, and I'm not sure how I skipped over it in testing. I'm leaving these solutions here for anyone interested, particularly the second for those dealing with more complex input.


Posted 2015-07-02T01:19:39.057

Reputation: 6 181