Is it possible to check a PDF for data corruption?

4

I have some PDF documents and I'd like to check them for possible data corruption, even if I'm able to display them without problems. I don't know whether PDF documents store an embedded checksum for this purpose. My operating system of choice is GNU/Linux. Thanks.

Francesco Turco

Posted 2010-04-11T08:42:50.530

Reputation: 471

If they display OK why do you suspect they're corrupt? – Hugh Allen – 2010-04-11T11:06:11.167

I don't think they are corrupted. I just have to archive them and protect them from future corruption. So I should choose between computing an MD5/SHA1/SHA2 checksum myself or relying on an embedded checksum. – Francesco Turco – 2010-04-11T18:10:15.533

Just use a free tool that generates a checksum, and keep the checksum together with the PDF (in a zip package, for example). – Apache – 2010-05-15T18:11:04.427
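
The "compute the checksum yourself" approach from the comments above could look roughly like the sketch below. It is only an illustration: the function names and the manifest file name `SHA256SUMS` are made up for this example, and SHA-256 is picked as one of the SHA2 options mentioned. It records a digest for every `*.pdf` in a directory and later reports files whose digest no longer matches.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory, manifest_name="SHA256SUMS"):
    """Write one '<digest>  <name>' line per PDF; return the number of files."""
    directory = Path(directory)
    lines = [f"{sha256_of(p)}  {p.name}" for p in sorted(directory.glob("*.pdf"))]
    (directory / manifest_name).write_text("\n".join(lines) + "\n")
    return len(lines)


def verify_manifest(directory, manifest_name="SHA256SUMS"):
    """Return names of files that are missing or whose digest has changed."""
    directory = Path(directory)
    bad = []
    for line in (directory / manifest_name).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines (e.g. an empty manifest)
        expected, name = line.split("  ", 1)
        target = directory / name
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(name)
    return bad
```

Running `write_manifest("archive/")` once at archiving time and `verify_manifest("archive/")` periodically afterwards would flag changed files. A checksum like this only detects corruption; it cannot repair it, so a second copy is still needed.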

Answers

2

Browsing through PDF Reference sixth edition (2006), it appears that PDF files do not have an overall checksum, though embedded files within the PDF (similar to attachments in an email message) may optionally have an MD5 hash.

You should therefore archive your PDFs in a container which supports error detection / correction, for example a zip file or optical media (CD-R, etc.).
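
As a rough sketch of the zip-container idea, the following uses Python's standard `zipfile` module to pack PDFs into an archive and later re-check the stored CRC-32 values. The helper names are invented for this example. Note that the zip CRC only detects corruption; it does not correct it.

```python
import zipfile
from pathlib import Path


def archive_pdfs(zip_path, pdf_paths, compression=zipfile.ZIP_DEFLATED):
    """Store the given PDFs in a zip file; zip records a CRC-32 per member."""
    with zipfile.ZipFile(zip_path, "w", compression=compression) as zf:
        for pdf in pdf_paths:
            zf.write(pdf, arcname=Path(pdf).name)


def first_corrupt_member(zip_path):
    """Return the name of the first member failing its CRC check, or None."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() reads every member and compares it with the stored CRC-32.
        return zf.testzip()
```

A return value of `None` from `first_corrupt_member("pdfs.zip")` means every member passed its CRC check. On GNU/Linux, `unzip -t pdfs.zip` performs a comparable test from the command line.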

Hugh Allen

Posted 2010-04-11T08:42:50.530

Reputation: 8 620

File compression formats (zip, rar) are also known to use CRC checks, as optical media do. – Algific – 2010-04-12T12:35:23.873