Shell command to find files containing one word but not the second word

5

I have the two files below on my Linux machine, and I want to find the file that contains "word1" but doesn't contain "word99".

file1.txt
  word1
  word2
  word3
  word4
  word5

file2.txt
  word1
  word2
  word3
  word99

I have been using the command below to find files that contain "word1", but I couldn't find any information on how to modify it to get the names of files that contain "word1" but not "word99":

find . -name '*.*' -exec grep -r 'word1' {} \; -print > output.txt

Any pointers would be helpful.

Thanks Sandy

Sandeep K Gujje

Posted 2016-08-07T17:26:05.073

Reputation: 51

Answers

5

    $ grep -lr 'word1' * | xargs grep -L 'word99'
    file1.txt

where:

    -l, --files-with-matches
         Only the names of files containing selected lines are written
         to standard output.
    -R, -r, --recursive
         Recursively search subdirectories listed.
    -L, --files-without-match
         Only the names of files not containing selected lines are written
         to standard output.

In the first part of the command before the pipe, we get:

    $ grep -lr 'word1' * 
    file1.txt
    file2.txt

The above command recursively searches the files in the current directory and its subdirectories and lists those that contain word1, i.e. file1.txt and file2.txt.

In the second part, | xargs grep -L 'word99', the pipe sends the names file1.txt and file2.txt to xargs, which passes them to grep as arguments. With the -L option, grep then lists the files that do not contain word99, i.e. file1.txt.

We need xargs here because the first part of the command writes the names file1.txt and file2.txt to stdout. Without xargs, the second grep would search that text itself, i.e. the strings file1.txt and file2.txt, rather than the contents of those files.

The following command also gives the same result (reversing the order in which we search for and exclude the strings):

      $ grep -Lr 'word99' * | xargs grep -l 'word1'
      file1.txt
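
One caveat with the pipelines above: xargs splits its input on whitespace, so a file name containing a space would reach grep as several separate arguments. Here is a minimal sketch of a more robust variant. It assumes GNU grep and GNU xargs (the -Z, -0 and -r options are not in POSIX), and it uses a hypothetical scratch directory holding the question's two files plus one made-up name that contains a space:

```shell
#!/bin/sh
# Hypothetical scratch directory; file1.txt and file2.txt mirror the question.
demo=$(mktemp -d)
printf 'word1\nword2\nword3\nword4\nword5\n' > "$demo/file1.txt"
printf 'word1\nword2\nword3\nword99\n' > "$demo/file2.txt"
# A made-up name with a space, to show why NUL separators matter.
printf 'word1\n' > "$demo/my notes.txt"

cd "$demo" || exit 1
# -Z makes grep end each file name with a NUL byte; xargs -0 splits on NUL.
# -r (GNU xargs) skips the second grep entirely when nothing matched word1.
result=$(grep -lrZ 'word1' . | xargs -0 -r grep -L 'word99' | sort)
printf '%s\n' "$result"
```

This also searches . rather than *, so hidden files are included and the shell never has to expand a long argument list.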

Santios

Posted 2016-08-07T17:26:05.073

Reputation: 51

1  "grep -r … *" is almost always better written as "grep -r … ." (with a dot in place of the asterisk). The asterisk version gets ugly if there are too many files in the current directory, etc. – Eric – 2016-08-25T01:44:58.713

0

This finds files that contain word1:

$ find . -name '*.*' -type f -exec grep -q 'word1' {} \; -print
./file1.txt
./file2.txt

This finds files that contain word1 but not word99:

$ find . -name '*.*' -type f -exec grep -q 'word1' {} \; '!' -exec grep -q 'word99' {} \; -print 
./file1.txt

To save the output in a file:

find . -name '*.*' -type f -exec grep -q 'word1' {} \; '!' -exec grep -q 'word99' {} \; -print >output.txt

The test -exec grep -q word99 {} \; returns True for files with word99. We put ! in front of it to negate the return value. Thus, ! -exec grep -q word99 {} \; returns True for files that do not have word99. The ! is in single-quotes because, if history expansion is turned on, ! can be a shell-active character.

Notes:

  1. The -q option was added to grep to make it quiet. With -q, grep will set the correct exit code but it does not display matching lines on stdout.

  2. The -type f test was added to find so that it only returns names of regular files.
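
The two -exec tests combine as a logical "and" of grep exit codes. As a rough illustration only (it is not a replacement for the find command, since this loop does not recurse into subdirectories), here is the same logic written as a plain shell loop over a hypothetical scratch directory holding the question's two files:

```shell
#!/bin/sh
# Hypothetical scratch directory with the two files from the question.
demo=$(mktemp -d)
printf 'word1\nword2\nword3\nword4\nword5\n' > "$demo/file1.txt"
printf 'word1\nword2\nword3\nword99\n' > "$demo/file2.txt"

matches=$(
    for f in "$demo"/*.txt; do
        # grep -q exits 0 when the pattern is found and 1 when it is not.
        # Keep the file only if word1 is found AND word99 is not.
        if grep -q 'word1' "$f" && ! grep -q 'word99' "$f"; then
            printf '%s\n' "${f##*/}"   # print the name without the directory
        fi
    done
)
printf '%s\n' "$matches"
```

In a script the ! needs no quoting, because history expansion only applies to interactive shells.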

John1024

Posted 2016-08-07T17:26:05.073

Reputation: 13 893

Thanks John for the answer, but what if I have to search all folders (recursively)? Would just adding "-r" work? – Sandeep K Gujje – 2016-08-08T05:35:04.470

@SandeepKGujje find, itself, does a recursive search on all folders. – John1024 – 2016-08-08T06:19:03.243

0

Your question title says "files containing" a word. However, in your question, you do mention "get the filenames containing" a word. These are different things. Fortunately, they are both rather simple, so I will simply show you both.

To find files containing a word:

grep -iR "word1" .

The -i says to ignore case. The -R is recursive (meaning sub-directories are searched). (Capital letter is documented by OpenBSD and more similar to ls, so I prefer that over -r.) The period specifies where to start looking.

To find filenames containing a word:

find . -iname "*word1*"

The -iname is a case-insensitive version of "name".

The period specifies where to start looking. The current directory is often a good choice.
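
Note that -iname compares the pattern against the whole file name, so wildcards are needed to match names that merely contain the word. Here is a small sketch of the difference, and of name matching versus content matching. It uses a hypothetical scratch directory, and both file names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical scratch directory; both file names are invented for this demo.
demo=$(mktemp -d)
printf 'word1\n' > "$demo/file1.txt"            # contents contain word1, name does not
printf 'nothing here\n' > "$demo/WORD1-notes.txt"  # name contains word1, contents do not

cd "$demo" || exit 1
# Without wildcards the pattern must match the entire name, so nothing is found.
exact=$(find . -iname 'word1')
# With wildcards, any name containing word1 (in any letter case) is found.
partial=$(find . -iname '*word1*')
printf 'exact:   %s\n' "$exact"
printf 'partial: %s\n' "$partial"
```

file1.txt is not listed in either case, because find looks only at names and never at file contents.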

Note: You referenced "*.*" in one of your examples. That was great for DOS, and is typically good in Microsoft Windows, but it is a really bad habit in a Unix environment. Seeing that makes me think you're familiar with Windows. Well, understand that in Windows, "FIND" (or "find") locates text in files. Unix is different: "grep" locates text in files, and "find" locates filenames.

Now, to exclude word99 and place the results in a text file, append the following to the command:

| grep -v word99 >> output.txt

The | is the pipe character, almost always typed as Shift-Backslash.

So, as an example, if you wanted to do both, use:

grep -iR "word1" . | grep -v word99 >> output.txt
find . -iname "*word1*" | grep -v word99 >> output.txt

The part before the pipe character runs a command and sends its output into a Unix-style pipe. The content then goes from the pipe into the next command's standard input. grep -v looks at the standard input it receives, drops every line that contains the pattern, and sends the remaining lines to its standard output. The >> redirects that standard output to the end of the specified text file, appending rather than overwriting. Keep in mind that grep -v filters lines of text, not files: after grep -iR it removes matching lines that contain word99, and after find it removes file names that contain word99.
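
As a small sketch of that data flow, the example below feeds three hypothetical file names (invented for illustration, not taken from the question) through grep -v and appends the survivors to a temporary file twice, to show that >> accumulates output:

```shell
#!/bin/sh
# Hypothetical helper that stands in for the command before the pipe.
list_names() {
    printf '%s\n' './word1.txt' './word1-word99.txt' './notes/word1.log'
}

out=$(mktemp)
# grep -v drops every input line that contains word99 and passes the rest on.
list_names | grep -v 'word99' >> "$out"
# Running it again appends to the file instead of overwriting it.
list_names | grep -v 'word99' >> "$out"

kept=$(list_names | grep -v 'word99')
lines=$(wc -l < "$out")
printf '%s\n' "$kept"
printf 'lines in output file: %s\n' "$lines"
```

Only the line that contains word99 is removed; the filter never opens the files that the names refer to.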

The reason you don't see documented options in the "find" command for excluding text is that Unix was very heavily designed around the idea of making simpler programs and using piping to achieve elaborate effects. In the Microsoft environments, old Microsoft code was considerably more cumbersome with pipe handling, so programs tended to incorporate more functionality into each program. On one hand, that seems simpler for the end user (having everything built in), but that approach lacks consistency. When you're using Unix, don't be afraid of piping: once you get used to it, you may find it simplifies things greatly, because you can use your simple tools in many situations, and so you don't need to re-learn simple techniques over and over (for each different program).

TOOGAM

Posted 2016-08-07T17:26:05.073

Reputation: 12 651