How can I fix/repair a corrupted PDF file?

85

53

Does anyone have any recommendations or procedures for repairing a corrupt PDF? When I open the file I get "There was an error opening this document. the file is damaged and cannot be repaired." There seem to be a myriad of tools out there, but none that I could describe as reputable. Are there any open-source, Linux-based solutions for this?

Tim Alexander

Posted 2011-05-03T14:35:04.287

Reputation: 1 798

Opensource PDF tools tend to be pretty crappy, I'm afraid. What are you using? – Satanicpuppy – 2011-05-03T14:38:49.960

I didn't like the look of any of the tools, as they resembled the myriad of useless "Registry Cleaners" out there. I have been trying Adobe Pro and have just started checking whether Ghostscript or PDFForge have any repair switches. – Tim Alexander – 2011-05-03T14:48:26.580

Ghostscript is okay, but it's certainly not better than Acrobat. It's completely bare bones. – Satanicpuppy – 2011-05-03T18:41:07.867

6@Satanicpuppy I disagree: I use Ghostscript to rebuild damaged or low-quality PDFs quite often, and it performs very well. – Eddie B – 2013-02-05T20:16:33.227

I've used qpdf to repair forms that pdftk couldn't open. – Jon Hulka – 2013-09-08T17:26:24.320

Answers

104

Ghostscript will repair your corrupted PDF automatically... if it can open it in the first place (that is, if it is not damaged beyond repair). But afterwards you'll still need to double-check the result...

On Linux, try this command:

 gs \
  -o repaired.pdf \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
   corrupted.pdf

On Windows, try this one (64-bit builds of Ghostscript ship the executable as gswin64c.exe instead):

 gswin32c.exe ^
  -o repaired.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/prepress ^
   corrupted.pdf

Kurt Pfeifle

Posted 2011-05-03T14:35:04.287

Reputation: 10 024

1The /prepress setting makes the quality really good compared to /screen. Thanks. – Dolanor – 2015-09-13T22:17:44.523

I get "An error occurred while reading an XREF table." What does that mean? – Geremia – 2019-06-18T15:26:35.347

It means the internal table of contents (the XREF table that PDFs have to contain) had an error: it pointed to a wrong byte offset for a PDF object. Ghostscript very likely repaired that error and inserted a correct XREF table into the output. You can check this by running the output through Ghostscript one more time and seeing whether the message still appears. – Kurt Pfeifle – 2019-06-18T18:13:17.663
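
To make the "wrong byte offset" idea concrete, here is a minimal Python sketch that reads the offset stored after the last `startxref` keyword and checks what is actually at that position. The function name `check_startxref` and the wording of its return strings are my own for illustration; this is a rough diagnostic under those assumptions, not how Ghostscript performs its check.

```python
import re


def check_startxref(data: bytes) -> str:
    """Report whether the last startxref offset points at a cross-reference section."""
    pos = data.rfind(b"startxref")
    if pos == -1:
        return "no startxref keyword found"

    # The byte offset of the cross-reference section follows the keyword.
    match = re.match(rb"startxref\s+(\d+)", data[pos:])
    if match is None:
        return "startxref has no offset"

    offset = int(match.group(1))
    target = data[offset:offset + 32]

    if target.startswith(b"xref"):
        return "ok: classic xref table"
    # Newer PDFs may store the cross-reference data in a stream object instead.
    if re.match(rb"\d+\s+\d+\s+obj", target):
        return "ok: object at offset (possibly an xref stream)"
    return "bad: offset %d does not point at an xref section" % offset
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them in. A "bad" result is consistent with the XREF error message above, but it does not tell you whether the rest of the file is intact.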

2Ghostscript does a fantastic job of rendering pdfs ... I regularly use gs to rebuild pdfs to improve font quality. – Eddie B – 2013-02-05T20:14:15.853

40

I had a corrupted PDF file, print.pdf, that Ghostscript couldn't open, but that the usual graphical Linux PDF viewers (Okular, Evince) opened fine. (In my case, a hex editor showed garbage at the start of the file instead of a PDF header.)

These PDF viewers use Poppler as a back-end PDF renderer. So you can repair the PDF using Poppler's command-line tools. In Ubuntu these are in the poppler-utils package. I used:

pdftocairo -pdf print.pdf print_repaired.pdf

which generated a PDF file with correct headers, which tools like Ghostscript now accepted.
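
If you don't have a hex editor handy, a small Python sketch can do the same header check. The helper name `find_pdf_header` and the 1024-byte search window are my own assumptions (some readers are said to tolerate a header that is not at byte 0, but how lenient each one is varies); it only diagnoses the problem and does not repair anything.

```python
def find_pdf_header(data: bytes, window: int = 1024) -> int:
    """Return the offset of the '%PDF-' marker within the first `window` bytes, or -1."""
    return data[:window].find(b"%PDF-")


def describe_header(data: bytes) -> str:
    """Summarise where (or whether) the PDF header was found."""
    offset = find_pdf_header(data)
    if offset == 0:
        return "header is at the start of the file"
    if offset > 0:
        return "header found after %d bytes of leading garbage" % offset
    return "no PDF header found in the first 1024 bytes"
```

Usage would be along the lines of `describe_header(open("print.pdf", "rb").read())`. Note that simply cutting off leading garbage shifts every byte offset in the file, so the result would still need to go through a repair tool such as pdftocairo afterwards.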

Mechanical snail

Posted 2011-05-03T14:35:04.287

Reputation: 6 625

This didn't work for at least one weird PDF I came across, but it seems like a good start. – Brian Peterson – 2014-11-11T20:00:40.597

1Works perfectly on a PDF on which Ghostscript wanted to remove some arbitrary elements on pages. – Andrea Lazzarotto – 2014-11-22T16:14:37.643

Ghostscript failed to read the document but this worked like a charm. BTW I did this on Windows using the new linux subsystem, so cool! – HyLian – 2016-06-05T17:44:01.317

3+1 this read my Quartz generated PDF without complaints, and immediately started generating output. Ghostscript, Adobe Acrobat Pro and others insisted on rebuilding my 120GB pdf first. – Orwellophile – 2013-12-14T14:17:46.880

26

mutool (project page, manpage) will repair broken PDFs without printing them.

  • Installation e.g. on Ubuntu: sudo apt-get install mupdf-tools
  • Run it like this: mutool clean input.pdf output.pdf

mutool clean [options] input.pdf [output.pdf] [pages]

  The clean command pretty prints and rewrites the syntax of a PDF file.
   It can be used to repair broken files, expand compressed streams,
   filter out a range of pages, etc.
  If no output file is specified, it will write the cleaned PDF to
   "out.pdf" in the current directory.

Alternatively, there are a few tools and frameworks that can decompose/decompile PDFs into their components without rendering them. These could be useful for extracting text, scripts, and images. See this answer for a list of such tools: https://reverseengineering.stackexchange.com/q/1526/8210. For example, you can try Origami, the current top answer there; it has a GTK-based viewer.
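
Since different tools succeed on different files, it can be convenient to try the commands from this thread one after another. Below is a rough Python sketch of that idea. The function names `candidate_commands` and `try_repair` and the ordering of the tools are my own choices; the command lines themselves are the ones quoted in the answers here. Treat a "success" only as a hint, because a zero exit code does not guarantee a correct PDF.

```python
import os
import shutil
import subprocess


def candidate_commands(src, dst):
    """Repair commands quoted in this thread, in the order they will be tried."""
    return [
        # mutool rewrites the file without rendering it, so it goes first.
        ["mutool", "clean", src, dst],
        ["gs", "-o", dst, "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/prepress", src],
        ["pdftocairo", "-pdf", src, dst],
    ]


def try_repair(src, dst):
    """Run each installed tool until one exits cleanly and writes a non-empty file.

    Returns the name of the tool that worked, or None if none did.
    """
    for cmd in candidate_commands(src, dst):
        if shutil.which(cmd[0]) is None:
            continue  # tool is not installed, skip it
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode == 0 and os.path.exists(dst) and os.path.getsize(dst) > 0:
            return cmd[0]
    return None
```

Calling `try_repair("corrupted.pdf", "repaired.pdf")` returns the name of the first tool that produced output. You still need to open the result and check it by eye, as the Ghostscript answer points out.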

jmiserez

Posted 2011-05-03T14:35:04.287

Reputation: 977

3This solution works "better" than the higher-ranked solutions above, as it does not "print" the PDF file and keeps links, clickable items, etc. active. To me, it seems a more elegant solution than using Ghostscript or Cairo. – Speredenn – 2015-06-05T15:21:11.097

1Unfortunately, mutool clean doesn't fix all possible errors. I have a file that has various errors in the font and content streams, and mutool will keep those errors. – Dominik Honnef – 2016-06-09T20:52:50.160

1@DominikHonnef You can always try tools/frameworks that decompose the PDF and allow you to view all the parts without rendering them. That should enable you to get text, scripts, images, etc. directly. See this answer for a list of tools: http://reverseengineering.stackexchange.com/q/1526/8210 – jmiserez – 2016-06-24T10:29:55.963

Only thing that worked for me! – jamadagni – 2017-08-22T14:37:14.463

This worked better, since it does not render the PDF; it examines the document. – riccs_0x – 2017-10-04T00:28:44.807

Worked for me, where gs and pdftocairo failed – antonio – 2019-03-22T23:00:18.930

10

I had a corrupted PDF file because the PHP script used to download it echoed some errors (in HTML) and NUL characters at the end.

The solution was to open the pdf with Notepad++ and remove all text after the line

%%EOF
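
The same cleanup can be scripted instead of done by hand in an editor. This is a minimal Python sketch under my own assumptions: the function names `trim_after_eof` and `trim_file` are made up for illustration, and it cuts after the last `%%EOF` marker, because a PDF that has been saved incrementally can legitimately contain several of them. It will not help if the trailing junk itself happens to contain `%%EOF`.

```python
def trim_after_eof(data: bytes) -> bytes:
    """Drop everything that follows the last %%EOF marker."""
    marker = b"%%EOF"
    pos = data.rfind(marker)
    if pos == -1:
        raise ValueError("no %%EOF marker found")
    # Keep the marker itself and end the file with a single newline.
    return data[:pos + len(marker)] + b"\n"


def trim_file(src: str, dst: str) -> None:
    """Read src in binary mode and write the trimmed copy to dst."""
    with open(src, "rb") as handle:
        data = handle.read()
    with open(dst, "wb") as handle:
        handle.write(trim_after_eof(data))
```

Always write to a new file, for example `trim_file("broken.pdf", "trimmed.pdf")`, and keep the original until you have confirmed the trimmed copy opens.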

Oriol

Posted 2011-05-03T14:35:04.287

Reputation: 1 199

I had a PDF with two %%EOF. I deleted everything after the first %%EOF using a hex editor. Now everything works fine. – Adrian – 2017-06-17T08:21:17.503

Had the same problem: Adobe Reader wouldn't open the file, but the native Mac viewer and the Chrome and Firefox PDF plugins displayed it fine. The cause was also an extra "NUL" on the last line, added during the upload. – Tilo – 2014-04-08T19:23:42.673