How do I find corrupted PDF files?

3

3

I have over 100,000 .pdf files, and I need to find out which of them are corrupted.

Is there a way to get the files which are corrupted – or vice versa, get those that are working (in an automated way rather than manually examining the files one at a time)?

I searched a lot but could not find anything. All the results pointed me to software for fixing broken PDFs.

user1917830

Posted 2011-08-22T07:44:04.010

Reputation: 133

Question was closed 2017-07-12T00:06:32.923

Maybe also loosely related: How do I find and remove corrupt images from directory? and Automating the scanning of graphics files for corruption.

– Scott – 2017-07-12T21:19:05.653

What's your definition of corrupted? Unreadable by Adobe Reader? Zero pages long? ... – None – 2011-08-22T07:47:14.177

Yes, files that can't be opened with Adobe Reader. – user1917830 – 2011-08-22T07:49:01.277

Answers

0

You could use something like Ghostscript to read them all and convert them to bitmap images which are not written to a file (e.g. on Linux redirect output to /dev/null). A script could check for return codes and error messages.
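A rough sketch of such a script in Python, under a few assumptions: Ghostscript is installed and on the PATH as gs (on Windows the console executable is usually named differently), and the nullpage device is used in place of a real bitmap device, so each page is still interpreted but nothing is written. The function names, the 120-second timeout, and the "error" substring check are illustrative choices, not anything Ghostscript prescribes. Ghostscript often repairs damaged files and still exits with 0, which is why the captured output is inspected as well as the return code.

```python
import os
import subprocess
from pathlib import Path

GS = "gs"  # assumption: Ghostscript executable name on this system


def looks_corrupt(returncode, output):
    """Heuristic: a non-zero exit status or any 'error' text counts as corrupt."""
    return returncode != 0 or "error" in output.lower()


def check_pdf(path, run=subprocess.run):
    """Interpret one PDF with Ghostscript, discard the result, report problems."""
    cmd = [
        GS, "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=nullpage",              # process pages, produce no output
        f"-sOutputFile={os.devnull}",
        str(path),
    ]
    try:
        proc = run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,     # merge messages into one stream
            text=True,
            errors="replace",             # tolerate odd bytes in messages
            timeout=120,                  # assumed limit per file
        )
    except subprocess.TimeoutExpired:
        return True                       # treat a hang as suspicious
    return looks_corrupt(proc.returncode, proc.stdout)


def find_corrupt_pdfs(root, run=subprocess.run):
    """Return every *.pdf under root that Ghostscript complains about."""
    return [p for p in sorted(Path(root).rglob("*.pdf")) if check_pdf(p, run)]


# Example use (the path is a placeholder):
#     for bad in find_corrupt_pdfs("/path/to/pdfs"):
#         print(bad)
```

Files that Ghostscript handles without complaint may still fail in Adobe Reader, and the reverse, so it is worth spot-checking a few results from each group before trusting the list.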

RedGrittyBrick

Posted 2011-08-22T07:44:04.010

Reputation: 70 632