How to import, export and edit bookmarks of a pdf file?

I heard that bookmarks of a pdf file are stored in plain text somewhere in the file. I was wondering if it is possible to import and export bookmarks of a pdf file into and from a text file, for batch processing?

If yes, is there any description on the syntax for editing the text file containing bookmarks of a pdf file?

I was hoping for free software solutions for Ubuntu 10.10 and for Windows 7.

Thanks and regards!

Tim

Posted 2011-04-28T06:17:22.943

Reputation: 12 647

Answers

There's quite a variety of tools that can extract bookmarks from a PDF to a plain text file, and vice versa. Some of them are:

pdftk
iText toolbox (older versions only, get itext-2.0.1.jar)
pdfWritebookmarks tool that I use
JPdfBookmarks which even has a GUI.

Also, I have a script that can convert between the formats of many of these tools: bmconverter.py.

Another very nice way is to add bookmarks to a pdf via pdflatex.

Michael Goerz

Posted 2011-04-28T06:17:22.943

Reputation: 560

You can use pdftk for this. More info: How to Export and Import PDF Bookmarks.

Export PDF bookmarks on the command-line like this:

pdftk C:\Users\Sid\Desktop\doc.pdf dump_data output C:\Users\Sid\Desktop\doc_data.txt

Import PDF bookmarks from a data file like this:

pdftk C:\Users\Sid\Desktop\doc.pdf update_info C:\Users\Sid\Desktop\doc_data.txt output C:\Users\Sid\Desktop\updated.pdf
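For orientation, the bookmark part of the dump_data file is a flat list of records, one per bookmark, each made of a BookmarkBegin line followed by BookmarkTitle, BookmarkLevel and BookmarkPageNumber lines. Here is a minimal Python sketch that writes such records from a list of tuples. The function name and the sample outline are made up for illustration; only the record layout comes from pdftk.

```python
def format_pdftk_bookmarks(bookmarks):
    """Render (title, level, page) tuples as pdftk bookmark records.

    Hypothetical helper, not part of pdftk itself.
    """
    lines = []
    for title, level, page in bookmarks:
        lines.append("BookmarkBegin")
        lines.append(f"BookmarkTitle: {title}")
        lines.append(f"BookmarkLevel: {level}")        # 1 = top level
        lines.append(f"BookmarkPageNumber: {page}")    # 1-based page number
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Sample outline, invented for the example.
    outline = [("Introduction", 1, 1), ("Background", 2, 3)]
    print(format_pdftk_bookmarks(outline), end="")
```

The resulting text can be pasted into the data file that is passed to update_info.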

pdftk bookmark format is a little bit tedious to write. Instead I created my own script using bash, sed, pdftk and python3. Check it out at this repo: https://github.com/SiddharthPant/booky

So now I can create a text file (bkmrks.txt) like this, which takes just 5 minutes to write even for a 1000-page PDF.

{
  Title1, 1
  Title2, 2
  {
    Subtitle1, 3
    Subtitle2, 4
    {
      SubSubtitle1, 5
      ...
    }
  }
}

and then use my script

./booky.sh pdf_file.pdf bkmrks.txt

This automatically creates a PDF (pdf_file_new.pdf) that has my bookmarks in it.

This works on *nix systems. If you are on a Windows machine instead, first install python3 and pdftk, then use the booky.py file in the repo to convert bkmrks.txt to a pdftk-compatible format:

python3 booky.py < bkmrks.txt > output.txt

Then use the export command to generate a dumped data file. Remove the previous bookmarks from that file and paste in the content of output.txt in their place. Finally, import that data back.
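The manual steps above (convert the nested list, then swap the bookmark records in the dump file) could be sketched in Python roughly as follows. This is not the code from the booky repo, just an illustration under some assumptions: each entry is written as "Title, page" on its own line, braces sit on their own lines, and the outermost pair of braces corresponds to bookmark level 1. Both function names are hypothetical.

```python
def parse_nested_bookmarks(text):
    """Turn the brace-nested 'Title, page' format into (title, level, page) tuples."""
    bookmarks = []
    level = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "{":
            level += 1
        elif line == "}":
            level -= 1
            if level < 0:
                raise ValueError("unbalanced '}'")
        else:
            # Split on the last comma so titles may contain commas themselves.
            title, sep, page = line.rpartition(",")
            if not sep or not page.strip().isdigit():
                raise ValueError(f"expected 'Title, page', got: {line!r}")
            # Assumption: entries outside any braces are treated as level 1.
            bookmarks.append((title.strip(), max(level, 1), int(page)))
    if level != 0:
        raise ValueError("unbalanced '{'")
    return bookmarks


def replace_bookmarks(dump_text, bookmarks):
    """Swap the Bookmark* lines of a pdftk dump for freshly generated records."""
    records = []
    for title, level, page in bookmarks:
        records += [
            "BookmarkBegin",
            f"BookmarkTitle: {title}",
            f"BookmarkLevel: {level}",
            f"BookmarkPageNumber: {page}",
        ]
    out, inserted = [], False
    for line in dump_text.splitlines():
        if line.startswith("Bookmark"):
            if not inserted:          # put new records where the old ones began
                out.extend(records)
                inserted = True
            continue                  # drop every old bookmark line
        out.append(line)
    if not inserted:                  # the dump had no bookmarks at all
        out.extend(records)
    return "\n".join(out) + "\n"
```

The text returned by replace_bookmarks would be saved to a file and handed to pdftk's update_info, as in the import command shown earlier.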

Siddharth Pant

Posted 2011-04-28T06:17:22.943

Reputation: 141

If you have a version of a document that has bookmarks and want to copy them over, a much simpler way is to use PDF-XChange Viewer (I used v2.5.211). Open the PDF that has the bookmarks (the source PDF), select all the bookmarks in the bookmarks pane, copy them using Ctrl+C, open the PDF that doesn't have the bookmarks (the target PDF), and paste them (Ctrl+V) in that PDF's bookmarks pane. PDF-XChange Viewer preserves bookmark properties as they were in the source PDF (including any bold / italic formatting on the bookmark text). If some sections of the target PDF sit lower or higher on the page because of revisions made to the document, you can click the bookmark needing correction, scroll to where on the page you'd like the bookmark to open, right-click the bookmark again and click "Set Destination". Repeat this last part as needed for any offending bookmark. Save the target PDF when finished.

This worked great for me, was quite intuitive, and I was done in a few minutes. In my particular scenario, a co-worker had produced a very long document using Word for Mac, which didn't have bookmarks. Due to the length of the document, I wanted bookmarks corresponding to the document's outline. I could get Word for Windows to save the document as a PDF with bookmarks, but some formatting differences between Word for Windows and Word for Mac threw the page count quite off (in particular, there were differences in white space around footers, and differences in the spacing between figures and their captions). I was able to play around with the headers & footers and figure sizes to get the pagination correct in Word for Windows, then saved to PDF with bookmarks. Unfortunately, there still were some differences in the formatting, so I wished to just apply the bookmarks to the original PDF, and that's when I figured out the solution above.

Jason

Posted 2011-04-28T06:17:22.943

Reputation: 41

+1 for PDF-XChange. The fewer tools the merrier – Ooker – 2017-10-26T17:47:39.333

To export bookmarks, I follow a different approach that requires the use of Microsoft OneNote:

I open the PDF reader (I use the free version of Foxit) with the bookmark structure visible and then, in OneNote, I ask to take a snapshot, and select the Foxit bookmark structure.

Back to OneNote, I select the "Copy text from image" option (in the menu that appears after right-clicking the snapshot image), and I paste it on the side, to correct the indentation (usually with bullets).

C.Delgado

Posted 2011-04-28T06:17:22.943

Reputation: 11

HandyOutliner. 1 drag, 1 click, done. https://sourceforge.net/projects/handyoutlinerfo/. Free. Indents sub-bookmarks. Doesn't require any PDF reader/editor. It can also edit, export all details to text (copy into Word and write a macro to tidy it into a fully functional Word document) or XML, repaginate, and import to PDF. The dev deserves donations.

PDF-XChange Editor (which replaced PDF-XChange Viewer) randomly duplicated/missed bookmarks when exporting to text

JPdfBookmarks required Java and exported formatting garbage; I couldn't clean it to get the names only

PDFtk gave me a headache just looking at the instructions

:-)

Piecevcake

Posted 2011-04-28T06:17:22.943

Reputation: 171

Love that this one exports to XML, instead of a more idiosyncratic format. The drag and drop interface for exports couldn't be simpler also.

I only wish it could do multiple at once. – Evan Donovan – 2019-04-18T14:02:06.350

The specification for PDF files is available as a freely downloadable PDF from Adobe - or at least it was last time I checked. However, most PDF files have most compressible data in them compressed. There probably was a basically plaintext version of PDF once upon a time, and if so it will still be valid now, but actually getting a file in that form may be a problem.

Although I haven't done it, one very likely possibility (if you're willing to pay) is to buy Acrobat Pro, and to use the JavaScript scripting abilities built into that application. To get you started...

http://acrobatusers.com/tutorials/2008/10/auto_bookmark_creation

This tutorial shows how to create bookmarks automatically using JavaScript in Acrobat 7.0 Pro (the version included in Creative Suite CS2). Although that's getting a bit old, the same technique should work fine for newer versions.

Adobe applications do include a library for reading/writing text files using JavaScript (something that JavaScript doesn't have as standard), so it's possible to write your own import/export scripts, though it is non-trivial to make those scripts robust.

Steve314

Posted 2011-04-28T06:17:22.943

Reputation: 1 569

Thanks! Is there a Linux version of Acrobat Pro ? – Tim – 2011-04-28T06:58:46.700

Sorry - I very much doubt it. AFAIK it's a Mac or Windows thing, and Adobe are unlikely to support Linux unless a huge number of creative professionals (1) start using that platform, and (2) show that they're willing to pay lots for proprietary software rather than use FOSS alternatives. Seems unlikely. For a free solution, you might try a library such as http://blog.rubypdf.com/2007/12/12/podofo-an-open-source-library-that-parse-pdf-files-and-modify-their-contents-into-memory/ (for Ruby). I know even less about this - I just found it on Google.

– Steve314 – 2011-04-28T07:10:19.987

I found another rather "stupid" solution to copy all of the bookmarks in a PDF as text for use elsewhere. In Acrobat Pro (for Mac OS) there is no way to select all the bookmarks and copy/paste them into a word processor. You can, however, export the whole PDF as an HTML file with the option "one single HTML-page + add navigationframe based on bookmarks". Then open the HTML in a browser, select all text in the navigation frame and copy/paste it into a word processor...

Johan Morris

Posted 2011-04-28T06:17:22.943

Reputation: 1

To read all bookmarks from a PDF to a text file, you can use this command with pdftk:

pdftk input.pdf dump_data output output.txt

I then used regex in Notepad++ to remove the extra parts. I replaced each of the following patterns, in order, with an empty string, and ended up with a list of bookmarks (don't forget to switch your text editor's replace dialog to regular-expression mode):

BookmarkLevel.*
BookmarkPageNumber.*
BookmarkBegin.*
\n\s+\n

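If you want to remove the numbers as well, replace this expression with an empty string (it matches the section numbering used in my document, so adapt it to yours):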

BookmarkTitle: A8.\d.\d+\s
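If you prefer a script to editor regexes, the same cleanup can be sketched in Python. This is only an illustration: the function name is made up, and it assumes that in each record the BookmarkTitle line comes before the BookmarkLevel line, which is the order pdftk's dump_data writes them in. It keeps the titles and indents them by level rather than deleting the other lines.

```python
def outline_from_dump(dump_text, indent="  "):
    """Return one indented title per bookmark found in pdftk dump_data output."""
    title_prefix = "BookmarkTitle: "
    level_prefix = "BookmarkLevel: "
    lines = []
    title = None
    for line in dump_text.splitlines():
        if line.startswith(title_prefix):
            title = line[len(title_prefix):]
        elif line.startswith(level_prefix) and title is not None:
            level = int(line[len(level_prefix):])
            # Level 1 is not indented; deeper levels get one indent step each.
            lines.append(indent * (level - 1) + title)
            title = None
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample dump, invented for the example.
    sample = (
        "NumberOfPages: 10\n"
        "BookmarkBegin\nBookmarkTitle: Chapter 1\nBookmarkLevel: 1\nBookmarkPageNumber: 1\n"
        "BookmarkBegin\nBookmarkTitle: Section 1.1\nBookmarkLevel: 2\nBookmarkPageNumber: 2\n"
    )
    print(outline_from_dump(sample))
```

In practice you would read the text of output.txt from the command above and pass it to the function.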

The Quantum Physicist

Posted 2011-04-28T06:17:22.943

Reputation: 648