Can text be extracted from a PDF with an “Invalid XRef entry” error?

4

I have a PDF that I’m trying to read, but it won’t open in Adobe Reader. When I ran pdftotext on it, it reported “Invalid XRef entry.” PDFtk and Ghostscript haven’t been able to parse the file either. I tried to repair it manually, but quickly realized that it was way over my head.

Is there any way to recover text from the file? I can see a lot of the image resources, but none of the text is visibly there. Does anyone know if it can be recovered?

KnightOfNi

Posted 2015-10-01T01:44:14.977

Reputation: 177

Can we see the PDF file? – Edi – 2015-10-01T07:55:47.910

One of the most lenient readers in terms of handling broken PDFs is IMO the Chrome browser's default PDF reader (based on PDFium). You could give that a try and see if it renders your file – Edi – 2015-10-01T07:57:45.527

@Edi It just says "failed to load pdf document." That was a good thought, though. – KnightOfNi – 2015-10-01T21:39:28.147

Answers

0

Manually noodling around in a PDF is guaranteed to fail (unless you really know what you are doing, and how to do it).

If the current version of Acrobat (Reader) is not able to fix the problem, you might try to get your hands on Acrobat/Reader 7 or even older; older Acrobat/Reader versions made more attempts to repair messed-up documents than newer ones do.

Otherwise… chances are pretty small that you can fix it.
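
As a last resort, text can sometimes be salvaged without repairing the file at all, because the cross-reference table is only an index: the content streams themselves may still be intact. The following is a minimal sketch of that idea, not a proper PDF parser. It assumes the text lives in Flate-compressed (or uncompressed) content streams and is written as literal strings between `BT` and `ET` operators. The function names (`inflate`, `unescape`, `salvage_text`) are made up for illustration. It will recover nothing useful if the file is encrypted, if the pages are scanned images, if the text uses hex strings or fonts with custom encodings, or if the streams themselves are damaged.

```python
import re
import zlib
from typing import List, Optional

# Everything between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text objects in a content stream are delimited by BT ... ET.
BT_ET_RE = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)
# Literal strings "( ... )" with backslash escapes.
# Simplification: unescaped nested parentheses are not handled.
STRING_RE = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)

ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t",
           b"(": b"(", b")": b")", b"\\": b"\\"}


def inflate(data: bytes) -> Optional[bytes]:
    """Try to inflate Flate data; return None if it is not valid zlib."""
    try:
        # decompressobj tolerates trailing bytes and truncated input.
        return zlib.decompressobj().decompress(data)
    except zlib.error:
        return None


def unescape(raw: bytes) -> bytes:
    """Resolve backslash escapes (including octal) in a PDF literal string."""
    def repl(m):
        c = m.group(1)
        if c in ESCAPES:
            return ESCAPES[c]
        if c[:1] in b"01234567":
            return bytes([int(c, 8) & 0xFF])
        return c  # unknown escape: keep the character, drop the backslash
    return re.sub(rb"\\([0-7]{1,3}|.)", repl, raw, flags=re.DOTALL)


def salvage_text(pdf_bytes: bytes) -> List[str]:
    """Scan raw bytes for streams and collect literal strings from text objects."""
    found = []
    for m in STREAM_RE.finditer(pdf_bytes):
        body = m.group(1)
        # Fall back to the raw bytes for streams that are not compressed.
        data = inflate(body) or body
        for block in BT_ET_RE.finditer(data):
            for s in STRING_RE.finditer(block.group(1)):
                # Latin-1 is an assumption; real fonts may use other encodings.
                text = unescape(s.group(0)[1:-1]).decode("latin-1")
                if text.strip():
                    found.append(text)
    return found
```

To try it, read the damaged file in binary mode and pass the bytes to `salvage_text`, for example `salvage_text(open("broken.pdf", "rb").read())`, where `broken.pdf` is a placeholder name. The strings come back in file order, which is not necessarily reading order, and individual words may be split across several strings.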

Max Wyss

Posted 2015-10-01T01:44:14.977

Reputation: 1 481