Find and move PDF files containing a string

1

How can I grep and move PDF files that contain the string "RELAÇÃO DOS TRABALHADORES"? I am using Ubuntu 14.04.

I have already tried:

grep -i -Z -r -l 'RELAÇÃO DOS TRABALHADORES' . | xargs -I{} mv {} ./destination

grep -lir 'RELAÇÃO DOS TRABALHADORES' target/* | xargs mv -t destination/

mv `grep -lir 'RELAÇÃO DOS TRABALHADORES' target` destination/

But none of these work. No errors, no output, nothing.

Thanks.

FXux

Posted 2016-03-18T12:54:25.690

Reputation: 47

I cannot check it at present, but if you can generate the list correctly (pay attention to the spaces), you can pipe it to awk and build the commands to move the files. Mind the double quotes before and after the full path and filename. – Hastur – 2016-03-18T13:15:44.900

Your main problem is that text in PDF files is encoded, so grep will never find it. @techraf's answer suggests using pdfgrep, but other filters are possible, such as lesspipe or pdftotext. – AFH – 2016-03-18T13:58:25.747
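The pdftotext route mentioned in the comment above could look roughly like this. It is a minimal sketch, assuming pdftotext from the poppler-utils package is installed; the helper name move_matching_pdfs and the directory names in the usage comment are illustrative and do not come from the thread.

```shell
# Sketch: move every PDF in a source directory whose extracted text contains
# a phrase. Assumes pdftotext (poppler-utils) is installed;
# move_matching_pdfs is a hypothetical helper name.
move_matching_pdfs() {
    local phrase="$1" src="$2" dest="$3" f
    mkdir -p -- "$dest"
    for f in "$src"/*.pdf; do
        [ -e "$f" ] || continue          # the glob matched nothing
        # "-" makes pdftotext write the extracted text to stdout
        if pdftotext "$f" - 2>/dev/null | grep -qF -- "$phrase"; then
            mv -i -- "$f" "$dest"/       # -i asks before overwriting
        fi
    done
}

# Usage (directory names are placeholders):
# move_matching_pdfs 'RELAÇÃO DOS TRABALHADORES' target destination
```

The match here is a case-sensitive fixed string. The loop runs one file at a time, so filenames with spaces need no special quoting tricks.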

Answers

3

You should install the pdfgrep package with:

sudo apt-get install pdfgrep

and run:

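pdfgrep -Hc 'RELAÇÃO DOS TRABALHADORES' target/* | grep -v ':0$' | cut -d : -f1 | xargs -I{} mv -i '{}' ./destination

The grep -v ':0$' step drops files with a count of 0: like grep -c, pdfgrep -c can print a line for every file it scans, not only for the files that match.

Test first!

Replace mv with echo mv:

pdfgrep -Hc 'RELAÇÃO DOS TRABALHADORES' target/* | grep -v ':0$' | cut -d : -f1 | xargs -I{} echo mv -i '{}' ./destination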

and see if you get correct mv commands with arguments.

Just for safety, I suggest explicitly adding the -i argument to mv so that it asks for confirmation before overwriting an existing file.
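The pipeline above takes the filename from each file:count line with cut -d : -f1, which truncates names that contain a colon. A possible variant is sketched below. It assumes pdfgrep -Hc emits one file:count line per file, as grep -c does; files_with_matches is a hypothetical helper name, and the input is simulated so it can be tried without any PDFs.

```shell
# Sketch: turn "file:count" lines into the list of files whose count is
# non-zero. files_with_matches is a hypothetical helper name.
files_with_matches() {
    # Strip only the trailing ":<count>" and print the line only when the
    # count starts with 1-9, so names that contain ':' stay intact and
    # zero-count files are dropped.
    sed -n 's/:[1-9][0-9]*$//p'
}

# Simulated input in the assumed "file:count" shape:
printf '%s\n' 'target/a.pdf:2' 'target/b.pdf:0' 'target/c: d.pdf:1' | files_with_matches

# Possible real use, still as a dry run with echo:
# pdfgrep -Hc 'RELAÇÃO DOS TRABALHADORES' target/* | files_with_matches | xargs -I{} echo mv -i '{}' ./destination
```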

techraf

Posted 2016-03-18T12:54:25.690

Reputation: 4 428

I installed pdfgrep. How do I use it? – FXux – 2016-03-18T13:19:34.357

Done, just please always test and check the output before executing the real mv command. – techraf – 2016-03-18T13:26:27.030

grep will never find text content in PDF files, though it will find some of the PDF control strings, such as obj/endobj, etc. – AFH – 2016-03-18T13:56:53.840

IUPI! It works!!!!! – FXux – 2016-03-18T14:06:15.720

Hey techraf, I was wrong, it is not working :( I have 100 PDFs with this string inside and it is moving 90% of the files. – FXux – 2016-03-18T14:46:19.863

I'm afraid that if pdfgrep does not find the string, there is not much you can do. Are all the PDFs searchable, i.e. can you find the string using a reader's search function (like Ctrl+F)? – techraf – 2016-03-18T14:48:23.743
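One way to check searchability from the command line, rather than opening each file in a reader, is to test whether any text can be extracted at all. This is a minimal sketch, assuming pdftotext from poppler-utils is installed; has_text_layer is a hypothetical helper name and the directory in the usage comment is a placeholder.

```shell
# Sketch: detect PDFs from which no text can be extracted (for example,
# scanned pages stored as images). Assumes pdftotext (poppler-utils);
# has_text_layer is a hypothetical helper name.
has_text_layer() {
    # Succeeds when the extracted text has at least one non-whitespace character.
    pdftotext "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}

# Usage (directory name is a placeholder):
# for f in target/*.pdf; do has_text_layer "$f" || echo "no text layer: $f"; done
```

A file reported here would need OCR before any text search could find the phrase.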

That's not the case, I think, @techraf, because when I run "pdfgrep -in 'RELAÇÃO DOS TRABALHADORES' PDF/*.pdf" it prints the files with the string. – FXux – 2016-03-18T14:50:46.380

No solution for that, I'm guessing, @techraf? – FXux – 2016-03-18T15:11:32.647

You didn't say whether you can find the string in a PDF reader. There could be extra whitespace or newlines between the words. If you shorten the search phrase, is there any chance of false positives? – Lenne – 2016-03-24T00:14:44.603
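If extra whitespace or line breaks between the words are the cause, one option is to normalize whitespace in the extracted text before searching, instead of shortening the phrase. The following is a sketch under that assumption. It relies on pdftotext from poppler-utils, and pdf_contains_phrase is a hypothetical helper name.

```shell
# Sketch: match a phrase even when the PDF text splits it across lines or
# pads it with extra spaces. Assumes pdftotext (poppler-utils);
# pdf_contains_phrase is a hypothetical helper name.
pdf_contains_phrase() {
    local phrase="$1" file="$2"
    # Turn every whitespace character into a space and squeeze runs of
    # spaces into one, then search for the phrase as a fixed string.
    pdftotext "$file" - 2>/dev/null | tr -s '[:space:]' ' ' | grep -qF -- "$phrase"
}

# Usage (directory name is a placeholder):
# for f in target/*.pdf; do pdf_contains_phrase 'RELAÇÃO DOS TRABALHADORES' "$f" && echo "$f"; done
```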