
I am importing data from a large text file into a database and am getting an error on line X of the file. When I look at that line in the less viewer I do not see anything strange, most probably because the offending characters are non-printable. I then tried to extract the line with sed and check it with a hexdump:

sed -n 2540283p 30gb_large_file.fzp | hexdump -C

Again nothing, most probably because sed filtered out all non-printable characters.

Any suggestions on how I could see, in hex, what's happening on a specific line of a large file?

arthur

1 Answer

sed should not be "[filtering] out all non-printable characters" - you're not telling it to do so. In fact a simple test on a convenient binary file (the FreeBSD kernel) demonstrates that this is not the case - sed happily passes non-printable characters.
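If you don't have a kernel image lying around, here is a minimal sketch of the same kind of test. The file name sample.txt and its contents are made up for illustration, and `od -A x -t x1` is used as a portable stand-in for `hexdump -C`:

```shell
# Build a three-line sample whose second line contains control bytes
# (0x01, 0x07 BEL, 0x1b ESC). sample.txt is a hypothetical file name.
printf 'first\nab\001\007\033cd\nthird\n' > sample.txt

# Print only line 2 and dump it byte by byte in hex.
# If sed stripped non-printables, 01 07 1b would be missing from the dump.
sed -n '2p' sample.txt | od -A x -t x1
```

The control bytes should show up between the bytes for "ab" and "cd", which is what you would expect from a tool that only selects lines and does not rewrite them.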

Shame on you for publicly accusing poor innocent sed of doing something heinously wrong without giving it the benefit of a proper test first -- I'll leave it to your guilty conscience to come up with an appropriate act of contrition!

If sed is not giving you any output it's because there's nothing to give: either that line doesn't exist (maybe the file ends abruptly. Didjya check with wc -l? Maybe there's an EOF in there somewhere it shouldn't be and your program is aborting when it sees it?), or the line in question consists of just a newline or a NUL character (which sed should dutifully return, but which wouldn't be of much use to you in a hexdump)...
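A rough sketch of those checks, again on a made-up sample.txt (here line 2 is empty and line 3 hides a tab and a carriage return). It assumes nothing beyond POSIX `wc` and `sed`:

```shell
# Hypothetical sample: line 2 is empty, line 3 contains a tab and a CR.
printf 'one\n\nthree\tx\r\n' > sample.txt

# 1. Does the line exist at all? Compare the line count with the line number.
wc -l < sample.txt

# 2. Show lines 1-3 in sed's unambiguous form: the 'l' command writes
#    non-printable characters as escapes and marks each line end with '$'.
sed -n '1,3l' sample.txt
```

An empty line comes out as a lone `$`, so you can tell "line is empty" apart from "line does not exist". On the real file the same idea would be something like `sed -n '2540282,2540284l;2540284q' 30gb_large_file.fzp`, where the `q` just stops sed from reading the rest of the 30 GB.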

voretaq7
  • An appropriate act being feeding `sed` a steady diet of 1s and 0s. – Nathan C Dec 10 '13 at 18:31
  • After 10 hours of crunching the issue I found the problem. Since the file was huge, the input was compressed. I was decompressing it on the fly, piping it to my program, and importing the data from stdin. Unfortunately, the data was compressed with rar. Even more unfortunately, `unrar p` (decompress to stdout) buggily decompressed my data, causing the error. Most unfortunately of all, `unrar x` decompressed everything without any errors whatsoever (not showing the symbols `unrar p` was generating). What a day. – arthur Dec 10 '13 at 18:50
  • @arthur that is a delightfully disgusting situation. Have a free shot of the debugging tequila. (And add one more reason to the "Why I despise `rar` archives" list) – voretaq7 Dec 10 '13 at 18:54
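
For anyone who lands in the same situation, one way to catch a stream that differs from the extracted file is to compare the two byte for byte. This is only a sketch: `stream_matches_file`, extracted.txt and archive.rar are hypothetical names, and `printf` stands in for the decompressor so the example runs without an archive.

```shell
# Succeed only if the bytes arriving on stdin are identical to the given file.
# stream_matches_file is a hypothetical helper name.
stream_matches_file() {
    cmp -s - "$1"
}

# Stand-in for the file extracted to disk (name and contents are made up).
printf 'id;value\n1;ok\n' > extracted.txt

# A faithful stream matches the extracted copy...
if printf 'id;value\n1;ok\n' | stream_matches_file extracted.txt; then
    echo "stream matches"
fi

# ...while a stream carrying one stray control byte does not.
if ! printf 'id;value\n1;\001ok\n' | stream_matches_file extracted.txt; then
    echo "stream differs"
fi

# With a real archive the producer would be something like
#   unrar p -inul archive.rar | stream_matches_file extracted.txt
# (archive.rar and extracted.txt are placeholders).
```

Dropping the `-s` makes `cmp` report the byte offset and line number of the first difference, which points straight at the line worth hexdumping.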