Grep: count number of matches per line

26

10

I'm trying to get the number of matches (in this case occurrences of { or }) in each line of a .tex file.

I know that the -o flag returns only the match, but it prints each match on a new line, even combined with the -n flag. I don't know of anything I could pipe this through to count the repeats. The -c flag only returns the number of matching lines in the entire file - maybe I could pipe one line at a time to grep?

Chris H

Posted 2014-06-16T10:03:59.093

Reputation: 1 279

Answers

29

grep -o -n '[{}]' <filename> | cut -d : -f 1 | uniq -c

The output will be something like:

3 1
1 2

Meaning 3 occurrences in the first line and 1 in the second.

Taken from https://stackoverflow.com/a/15366097/3378354 .
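To see what each stage contributes, here is a small sketch that runs the same pipeline on a throwaway file. The file contents and the variable names (sample, counts) are made up for illustration. Lines with no braces do not appear in the output at all, and the width of the padding that uniq -c puts in front of each count differs between implementations.

```shell
# Hypothetical three-line sample; only lines 1 and 3 contain braces.
sample=$(mktemp)
printf '%s\n' '\begin{tabular}{ll}' 'no braces here' '\textbf{a} & b' > "$sample"

# grep -o prints every match on its own line; -n prefixes each with "LINE:".
# cut keeps only the line number, and uniq -c counts the adjacent repeats.
counts=$(grep -o -n '[{}]' "$sample" | cut -d : -f 1 | uniq -c)
printf '%s\n' "$counts"

rm -f "$sample"
```

With this sample the pipeline should report 4 matches for line 1 and 2 matches for line 3, with line 2 omitted.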

Moebius

Posted 2014-06-16T10:03:59.093

Reputation: 428

I thought it was a neat command so I tried it on an existing 5k file that I found on my ubuntu vm. If I am not mistaken, the order of the commands is wrong. It should be grep -o -n '[{}]' file | cut -d : -f 1 | uniq -c. -w1 isn't needed either. The difference is 12 lines vs 118! – Dude named Ben – 2014-06-20T19:25:39.877

@Vic, thanks for noticing it! The -w1 option of uniq was blocking the comparison to the first character, therefore only working on line numbers of 1 digit. Changing the order, as you suggested, makes it unnecessary. – Moebius – 2014-06-22T07:48:34.430

Thanks - google found lots of regex hits on SU, but not that one on SO, which doesn't even seem to have a regex tag. The sort isn't strictly necessary as grep's output is sorted by line number, but I guess it's good practice before uniq. – Chris H – 2014-06-16T10:45:03.040

Probably not tagged regex because the regex is the easy part. – Tom Zych – 2014-06-16T10:51:51.710

Is it actually necessary to sort -n? Doesn't it come out in line number order anyway? – Tom Zych – 2014-06-16T10:52:31.167

You are right, sort -n is not necessary. Thanks. – Moebius – 2014-06-16T10:58:23.360

@TomZych, it turned out you were right, but had I known that I might not have asked. The mental jump from grep to tag:regex was perhaps a bit too much though. – Chris H – 2014-06-16T12:54:59.077

3

After reading various solutions, I think this is the easiest approach to the problem:

while IFS= read -r i; do printf '%s\n' "$i" | grep -o "matchingString" | wc -l; done < input.txt

alfredocambera

Posted 2014-06-16T10:03:59.093

Reputation: 131

Best solution, in my opinion. Could be simplified further by dropping one pipe: grep -o "matchingString" <<< "$i" | wc -l. – Benjamin W. – 2015-12-20T05:57:14.770

This will be orders of magnitude slower than the other options, though. – Rahul – 2018-06-21T16:45:17.067
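On the speed concern: the loop starts a grep and a wc process for every input line. A possible single-process alternative, which nobody in this thread proposed, is to let awk do the counting, since gsub returns the number of replacements it made. The function name count_per_line and the sample file below are hypothetical. Unlike the loop, this prints the line number first and the count second.

```shell
# Print "LINE COUNT" for every line, including lines with zero matches.
count_per_line() {
    # gsub returns how many substitutions it performed on the current line.
    awk '{ print NR, gsub(/[{}]/, "") }' "$1"
}

# Hypothetical sample input.
sample=$(mktemp)
printf '%s\n' '\begin{tabular}{ll}' 'no braces here' '\textbf{a} & b' > "$sample"

per_line=$(count_per_line "$sample")
printf '%s\n' "$per_line"

rm -f "$sample"
```

For a multi-character search string, the bracket expression would be replaced by the corresponding regular expression.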

1

Is using grep a requirement?  Here’s an alternative:

sed 's/[^{}]//g' your_file | awk '{print NR, length }'

The sed strips out all characters other than { and } (i.e., leaving only { and } characters), and then the awk counts the characters on each line (which are just the { and } characters).  To suppress lines with no matches,

sed 's/[^{}]//g' your_file | awk '/./ {print NR, length }'

Note that my solution assumes (requires) that the strings you are looking for are single characters.  Moebius’s answer is more easily adapted to multi-character strings.  Also, neither of our answers excludes quoted or escaped occurrences of the characters/strings of interest; e.g.,

{ "nullfunc() {}" }

would be considered to contain four brace characters.
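That caveat is easy to check by feeding the example line straight into the pipeline. This is only a sketch; the variable names line and brace_count are illustrative.

```shell
# The example line: two real braces plus two inside a quoted string.
line='{ "nullfunc() {}" }'

# sed deletes everything except braces; awk prints the line number and what is left.
brace_count=$(printf '%s\n' "$line" | sed 's/[^{}]//g' | awk '{ print NR, length }')
printf '%s\n' "$brace_count"   # prints "1 4"
```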

Scott

Posted 2014-06-16T10:03:59.093

Reputation: 17 653

grep wasn't really a requirement, it was just where I started looking for a solution, because it gave me something close. I've never had a need for awk, so had I not used the answer above I'd have used this as a chance to experiment -- I may still. What I failed to make clear (but it doesn't affect either answer) is that I wanted to run the script once per bracket, to help me track down a mismatch (in LaTeX source, here for a table) where most pairs occur in a single line. – Chris H – 2014-06-16T15:46:30.030

I’m not quite sure what you mean by “run the script once per bracket,” but if you want to track down a brace mismatch, you might want to try something like sed 's/{[^{}]*}//g' your_file | grep -n '[{}]', where the sed strips out (matched) pairs. If you have nested pairs, use sed 's/{[^{}]*}//g;s/{[^{}]*}//g;s/{[^{}]*}//g;…' …, repeating the s/{[^{}]*}//g as many times as your deepest nesting. – Scott – 2014-06-16T16:41:22.203
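If working out the deepest nesting level is a chore, one possible variation (not what the comment itself proposes) is to let sed repeat the substitution on its own with a label and the t command, which branches back whenever the last s command changed something. The function name strip_matched_braces and the sample lines are hypothetical. Like the original, this only pairs braces within a single line.

```shell
# Remove matched {...} pairs repeatedly, then list lines that still contain a brace.
strip_matched_braces() {
    # ':again' is a label; 't again' jumps back to it if the s command substituted anything.
    sed -e ':again' -e 's/{[^{}]*}//g' -e 't again' "$1" | grep -n '[{}]'
}

# Hypothetical sample: line 1 is balanced with nesting, line 2 has an unclosed brace.
sample=$(mktemp)
printf '%s\n' '\frac{a}{\sqrt{b{c}}}' '\textbf{unclosed & \emph{x}' > "$sample"

leftover=$(strip_matched_braces "$sample")
printf '%s\n' "$leftover"

rm -f "$sample"
```

Only line 2 should be reported, with the matched pair {x} already stripped and the stray opening brace left in place.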

I meant execute sed 's/[^}]//g' your_file | awk '{print NR, length }' and sed 's/[^{]//g' your_file | awk '{print NR, length }'. I do indeed have nesting, and working out the deepest level seemed like a chore. Turning many lines into a handful (there are a few cases where the braces only match over multiple lines for valid reasons) worked well (I use jedit which highlights the matching bracket -- for any type of bracket it understands -- so I really did just need to narrow it down). – Chris H – 2014-06-16T17:53:24.797
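The two separate runs described here could also be folded into one pass that counts opening and closing braces per line and prints only the lines where the two differ. This is a sketch under that assumption, not something taken from the thread; the function name unbalanced_lines and the sample file are hypothetical. Lines whose braces legitimately close on a later line will still show up.

```shell
# Print "LINE OPENS CLOSES" for every line where the brace counts differ.
unbalanced_lines() {
    awk '{
        line = $0                        # work on a copy so $0 stays intact
        opens  = gsub(/[{]/, "", line)   # number of opening braces
        closes = gsub(/[}]/, "", line)   # number of closing braces
        if (opens != closes) print NR, opens, closes
    }' "$1"
}

# Hypothetical sample: only line 2 is unbalanced.
sample=$(mktemp)
printf '%s\n' '\begin{tabular}{ll}' '\multicolumn{2}{c}{Header \\' '\end{tabular}' > "$sample"

report=$(unbalanced_lines "$sample")
printf '%s\n' "$report"

rm -f "$sample"
```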