Why don't the line counts of grep and grep -v with the same pattern add up to the number of input lines?


How is this possible? The two regexes are identical:

tmp$ grep    "^[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*$" 2018.csv > 2018a.csv
tmp$ grep -v "^[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*$" 2018.csv > 2018-wrong.csv
tmp$ wc -l 2018*
  289211 2018a.csv
  292005 2018.csv
       1 2018-wrong.csv

I want to split the file 2018.csv into two sets: 2018a.csv containing the lines that match the pattern, and 2018-wrong.csv containing the lines that don't. Since this is either/or, each line goes into exactly one of the two files, so the line counts of both files should add up to the line count of the input file. Why doesn't the sum of the line counts of 2018a.csv and 2018-wrong.csv equal the line count of 2018.csv? Why are 2793 lines missing?

Any ideas why the sum of lines does not match?
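One way to make the either/or split hold by construction is to read the file once and send every line to exactly one of the two outputs. This is a minimal sketch, assuming a POSIX awk is available. The function name `split_by_pattern` is hypothetical, and running awk under `LC_ALL=C` (so it compares raw bytes rather than characters) is an assumption added here, not part of the original commands:

```sh
# Hypothetical helper: split INPUT into MATCH_OUT and NOMATCH_OUT in one pass.
# Every input line is printed to exactly one of the two files, so their line
# counts always add up to the input's line count.
# LC_ALL=C (an assumption) makes awk treat each byte as one character, so
# non-UTF-8 bytes such as Latin-1 umlauts cannot derail the matching.
split_by_pattern() {
    pattern=$1 input=$2 match_out=$3 nomatch_out=$4
    : > "$match_out"      # create/empty both outputs up front so that each
    : > "$nomatch_out"    # file exists even if no line ends up in it
    LC_ALL=C awk -v re="$pattern" -v matchfile="$match_out" -v otherfile="$nomatch_out" '
        $0 ~ re { print > matchfile; next }   # matching line: first file only
        { print > otherfile }                 # every other line: second file
    ' "$input"
}
```

Usage would be `split_by_pattern "$PATTERN" 2018.csv 2018a.csv 2018-wrong.csv`, with `PATTERN` holding the regex from the question; it uses no BRE-only syntax, so awk's ERE reads it the same way. Because the pattern is passed with `awk -v`, which interprets backslash escapes, a pattern containing backslashes would need extra escaping.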

user1022110

Posted 2019-04-15T19:31:02.813

Reputation: 31

Answers


The answer was in 2018-wrong.csv, which contains just one line:

Binary file 2018.csv matches

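The file wasn't a pure text file because of some umlauts, presumably stored in an encoding that isn't valid in the current locale (for example Latin-1 bytes under a UTF-8 locale). GNU grep therefore treated 2018.csv as binary: by default it suppresses output lines that contain improperly encoded data and, when it suppresses anything, prints the one-line "Binary file 2018.csv matches" message instead. That message is the single line in 2018-wrong.csv, and the suppressed lines are the ones missing from the totals.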
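To see which lines are likely responsible, one rough approach is to list every line containing a byte outside printable ASCII. This is a sketch under the assumption that such bytes are the culprits; it is an approximation, not grep's own encoding check, since it also flags valid UTF-8 umlauts, tabs, and the carriage returns of CRLF line endings. The function name `list_non_ascii_lines` is hypothetical:

```sh
# Hypothetical helper: print every line of FILE that contains a byte outside
# printable ASCII, prefixed with its line number (grep -n).
# In the C locale each byte is a single character and [:print:] covers only
# ASCII 0x20-0x7E, so any umlaut byte, tab or carriage return matches.
# -a keeps grep in text mode even if the file also contains NUL bytes.
list_non_ascii_lines() {
    LC_ALL=C grep -an '[^[:print:]]' "$1"
}
```

For example, `list_non_ascii_lines 2018.csv | head` would show the first few such lines together with their line numbers.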

When I run both commands with grep -a and grep -av instead (-a tells grep to process a binary file as if it were text), the line counts add up.
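The fix can be wrapped in a small check that reruns both greps with `-a` and confirms that no lines are lost. This is a minimal sketch; the function name `split_and_check` and its messages are hypothetical:

```sh
# Hypothetical helper: split INPUT with grep -a / grep -av (the fix from the
# answer) and verify that the two outputs together hold as many lines as INPUT.
split_and_check() {
    pattern=$1 input=$2 match_out=$3 nomatch_out=$4
    grep -a  -- "$pattern" "$input" > "$match_out"     # lines that match
    grep -av -- "$pattern" "$input" > "$nomatch_out"   # lines that don't
    # $(( )) strips the padding some wc implementations add
    total=$(( $(wc -l < "$input") ))
    sum=$(( $(wc -l < "$match_out") + $(wc -l < "$nomatch_out") ))
    if [ "$sum" -eq "$total" ]; then
        echo "ok: $sum of $total lines accounted for"
    else
        echo "mismatch: $sum of $total lines" >&2
        return 1
    fi
}
```

Since `wc -l` counts newline characters, an input whose last line has no trailing newline would be reported as off by one, because grep adds that newline on output. Also, `-a` only stops the suppression: depending on the locale, lines with invalid bytes may still fail to match `[^;]` and land in the second file, so prefixing the greps with `LC_ALL=C` may be worth considering.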

user1022110

Posted 2019-04-15T19:31:02.813

Reputation: 31