
Hey, I am on an HP-UX server here. When recursively grepping a directory tree, I have problems when the tree also contains binary files: grep treats them as text files and prints very long lines full of non-printable characters. This not only makes the output hard to scan, it also often renders my terminal unusable (and writes funny strings into its title).

GNU grep has a --binary-files= option which would help (and it does not print the matching line for binary files anyway), but I do not have the GNU tools available.
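For reference, the GNU invocation I have in mind would be something like this (with "expression" and dir standing in for the real pattern and directory):

  grep -r --binary-files=without-match "expression" dir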

Is there a way to simulate the behavior of GNU grep, or to ignore files that look like they are binary?

Btw., if there is an easy way to do this in Perl, that would be fine, too.
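One Perl idea I have seen uses its -T heuristic text-file test to filter the file list, something like the following, though I have not tried it on this box:

  # perl reads filenames from stdin; -T applies a heuristic
  # text/binary check to each one, so only text files are printed
  find dir -type f | perl -nle 'print if -T' | xargs grep "expression"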

0x89

2 Answers


Building on the previous answer, you can use the "file" command to identify text files, and then limit your grep to only those files. For example:

  find dir -type f -print |
    xargs file |
    grep text |
    cut -f1 -d: |
    xargs grep "expression"

That's:

  • Find all files in directory "dir"
  • Pass these as arguments to "file"
  • Look for output from "file" containing the word "text"
  • Chop out the first colon-delimited field and use it as a filename
  • Search these files using grep.

This will fail in the case of filenames containing whitespace or colons, but will otherwise do what you want.
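If whitespace is a concern, one rough workaround is to drop xargs and cut and test each file individually. A sketch (again with dir and "expression" as placeholders):

  # Test one file at a time; tolerates spaces (though not
  # newlines) in filenames, and avoids parsing file's output with cut
  find dir -type f | while IFS= read -r f; do
      case $(file "$f") in
          *text*) grep "expression" "$f" /dev/null ;;
      esac
  done

The extra /dev/null argument makes grep print the filename in front of each match, since grep then always sees more than one file argument.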

larsks
  • That one works a bit better, but the file command on HP-UX seems to be as bad as the grep command: 'file | grep text' is not enough to weed out the binary files. – 0x89 Dec 21 '09 at 14:05

There might be a better way, but you could pass all the files to a shell loop and do something like the following with the file command:

if file "$i" | grep text; then
  ...
fi

...?
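Fleshed out, that might look something like this (a sketch; dir and "expression" are placeholders, and for a whole tree you would feed the loop from find instead of a glob):

  # Grep only the files that file(1) classifies as some kind of text
  for i in dir/*; do
      if file "$i" | grep -q text; then
          grep "expression" "$i" /dev/null   # /dev/null forces filename prefixes
      fi
  done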

Kyle Brandt