Editing PDF via text editor

I'm trying to add page labels to a PDF file by modifying the file directly with a text editor.

When I open the PDF in a text editor and save it, without making any changes, the file becomes corrupted and can't be opened by Adobe Reader.

Why does this happen?

The only solution that came to mind is to use a hex editor, but that doesn't seem like a comfortable way to work with files. Is there any other way?

As a text editor, I use Sublime Text.

Draex_

Posted 2016-12-01T13:19:00.257

Reputation: 185

The problem is probably related to the text encoding. You should check which encoding the text editor defaults to and change it if necessary. – James P – 2016-12-01T13:34:04.630

I've tried using several encodings with no success. Which encoding should I use? The file is mostly binary. However, since I'm not changing the file, I don't understand why encoding matters. – Draex_ – 2016-12-01T13:40:28.527

Well, PDFs aren't designed to be edited this way anyway, but if your text editor attempts to change the encoding, that just makes matters worse. Have you tried using Notepad++ instead? If I open a PDF and save it there, it still seems to work. – James P – 2016-12-01T13:51:35.730

The question is not "which encoding should I use". The point is that your text editor probably assumes the PDF binary data is text in some particular encoding and makes changes that are valid for that encoding (like adding a BOM) but invalid for the PDF binary data. So your text editor does make changes just by opening and saving the file. Fix the problem by using a text editor that doesn't do that. The next problem is that by editing the file, you'll make the xref table invalid, so you need to recompute it. – dirkt – 2016-12-01T13:55:01.903
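To illustrate the point about encodings, here is a minimal Python sketch. It assumes an editor that decodes the file as text on open and encodes it again on save, which is only a guess at what happens here; the `roundtrip` helper and the sample data are made up for illustration. Bytes that are not valid UTF-8 get replaced during decoding, so the saved bytes differ, while a single-byte encoding such as Latin-1 maps every byte value to a character and back unchanged.

```python
def roundtrip(data: bytes, encoding: str) -> bytes:
    """Decode bytes as text, then encode back (hypothetical editor behavior)."""
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")


# Stand-in for binary PDF stream data: every possible byte value once.
sample = bytes(range(256))

utf8_result = roundtrip(sample, "utf-8")
latin1_result = roundtrip(sample, "latin-1")

# Bytes 0x80-0xFF are not valid UTF-8 on their own, so they are replaced.
print("utf-8 round trip preserved the bytes:  ", utf8_result == sample)
# Latin-1 assigns a character to each of the 256 byte values.
print("latin-1 round trip preserved the bytes:", latin1_result == sample)
```

This would also explain why the file breaks without any visible edit: the damage happens in the decode/encode step, not in anything typed.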

Thanks guys, using Notepad++ solves the issue. @dirkt Even though I didn't touch the xref table, the document opens okay. Any idea why? xref table should contain byte offsets of several objects in the file, right? Positions of objects are now changed. – Draex_ – 2016-12-01T14:25:05.487

Some viewers repair the xref table automatically if they detect that it's corrupt, some don't. I'm on Linux and use mainly xpdf and mupdf, so I can't tell you what Windows viewers do. But if the position of the objects changed, the xref table is corrupt and should be regenerated if you want to have a standard-conforming file. – dirkt – 2016-12-01T14:59:01.317
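A rough way to see whether the offsets went stale is sketched below in Python. It is a simplified check, not a validator: it assumes a single classic `xref` table (no cross-reference streams, no incremental updates), and `build_sample_pdf` and `check_xref` are hypothetical helpers written for this example. The sample file is a tiny stand-in built in memory, and the "edit" simulated here is three extra bytes at the start of the file.

```python
import re


def build_sample_pdf() -> bytes:
    """Assemble a tiny PDF-like file with a correct classic xref table."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for obj in objects:
        offsets.append(len(out))  # byte offset where this object starts
        out += obj
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for object 0 (always free)
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def check_xref(data: bytes) -> list[str]:
    """List offsets in the last classic xref table that no longer match."""
    tables = list(re.finditer(rb"^xref[ \t\r]*$", data, re.MULTILINE))
    start = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if not tables or start is None:
        return ["no classic xref table or startxref found"]
    table = tables[-1]
    problems = []
    if int(start.group(1)) != table.start():
        problems.append("startxref does not point at the xref table")
    obj_num = None
    for raw in data[table.end():].splitlines():
        line = raw.strip()
        if line.startswith(b"trailer"):
            break
        header = re.fullmatch(rb"(\d+) (\d+)", line)  # "first count"
        entry = re.fullmatch(rb"(\d{10}) (\d{5}) ([nf])", line)
        if header:
            obj_num = int(header.group(1))
        elif entry and obj_num is not None:
            offset, gen = int(entry.group(1)), int(entry.group(2))
            expected = b"%d %d obj" % (obj_num, gen)
            # Only in-use ("n") entries carry a byte offset to verify.
            if entry.group(3) == b"n" and not data.startswith(expected, offset):
                problems.append("object %d is not at offset %d" % (obj_num, offset))
            obj_num += 1
    return problems


good = build_sample_pdf()
edited = b"\xef\xbb\xbf" + good  # simulate three bytes inserted at the start

print(check_xref(good))
print(check_xref(edited))
```

Every offset after the inserted bytes is off by three, so both in-use objects and the `startxref` pointer get reported. A viewer that rebuilds the table by scanning for objects would still open such a file, which fits the behavior described above.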

One thing that can happen: your editor may strip trailing spaces when you save the file, for instance, which can make the PDF no longer valid. (Happened to me just now.) – ShreevatsaR – 2017-09-18T22:24:36.683

Answers

Using Notepad++ instead of Sublime Text solved the issue.

Apparently, Sublime Text changed the file on save even though I hadn't edited anything.
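To find out what an editor actually changed, the original and the re-saved copy can be compared byte by byte. The Python sketch below is one way to do that; `first_difference`, `strip_trailing` and `describe_change` are hypothetical helpers, the sample bytes are invented, and the three causes it looks for (an added BOM, converted line endings, stripped trailing whitespace) are just the ones mentioned in the comments, not a confirmed diagnosis of what Sublime Text did.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the index of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one input is a prefix of the other
    return None


def strip_trailing(data: bytes) -> bytes:
    """Remove spaces and tabs at the end of every LF-terminated line."""
    return b"\n".join(line.rstrip(b" \t") for line in data.split(b"\n"))


def describe_change(original: bytes, saved: bytes) -> str:
    """Guess which kind of change turned 'original' into 'saved'."""
    pos = first_difference(original, saved)
    if pos is None:
        return "identical"
    if saved.startswith(UTF8_BOM) and not original.startswith(UTF8_BOM):
        return "a UTF-8 BOM was added at the start"
    if original.replace(b"\r\n", b"\n") == saved.replace(b"\r\n", b"\n"):
        return "only line endings changed"
    if strip_trailing(original) == saved:
        return "trailing whitespace was stripped"
    return "first difference at byte offset %d" % pos


# Invented stand-in for a file; note the trailing space after "obj".
original = b"%PDF-1.4\n1 0 obj \nendobj\n"

print(describe_change(original, original))
print(describe_change(original, UTF8_BOM + original))
print(describe_change(original, original.replace(b"\n", b"\r\n")))
print(describe_change(original, original.replace(b" \n", b"\n")))
```

For real files, the two byte strings would come from reading the original and the saved copy in binary mode, for example with `pathlib.Path(...).read_bytes()`.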

Draex_

Posted 2016-12-01T13:19:00.257

Reputation: 185