Converting djvu to pdf AND preserving table of contents, how is it possible?

update: user3124688 has coded up this process in the script dpsprep.

I don't know of any tools that will do the conversion for you. You certainly should be able to do it, but it might take a little work. I'll outline the basic process. You'll need the open source command line utilities pdftk and djvused (part of DjVuLibre). These are available from your package manager (GNU/Linux) or their websites (Windows, OS X).

step 1: convert the file text

First, use any tool to convert the DJVU file to a PDF (without bookmarks).

Suppose the files are called filename.djvu and filename.pdf.

step 2: extract DJVU outline

Next, output the DJVU outline data to a file, like this:

djvused "filename.djvu" -e 'print-outline' > bmarks.out

This file lists the DJVU document's bookmarks in a serialized tree format. In fact, it's just an S-expression (SEXPR) and can be easily parsed. The format is as follows:

file ::= (bookmarks
           <bookmark>*)
bookmark ::= (name
               page
               <bookmark>*)
name ::= "<character>*"
page ::= "#<digit>+"

For example:

(bookmarks
  ("bmark1"
    "#1")
  ("bmark2"
    "#5"
    ("bmark2subbmark1"
      "#6")
    ("bmark2subbmark2"
      "#7"))
  ("bmark3"
    "#9"
    ...))
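
To show how little parsing the outline needs, here is a minimal sketch in Python. It assumes the input is well formed (balanced parentheses) and leaves any backslash escapes inside titles undecoded. The names TOKEN and parse_outline are made up for this illustration and are not part of djvused or any library.

```python
import re

# One token is "(", ")", a double-quoted string (backslash escapes allowed),
# or a bare word such as the leading `bookmarks` symbol.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')

def parse_outline(text):
    """Parse print-outline text into nested Python lists (hypothetical helper)."""
    stack = [[]]                      # stack[0] collects top-level expressions
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])          # open a new nested list
        elif tok == ")":
            done = stack.pop()        # close it and attach it to its parent
            stack[-1].append(done)
        elif tok.startswith('"'):
            stack[-1].append(tok[1:-1])  # strip quotes; escapes stay as written
        else:
            stack[-1].append(tok)     # bare symbol
    return stack[0][0] if stack[0] else []

sample = '''(bookmarks
  ("bmark1"
    "#1")
  ("bmark2"
    "#5"
    ("bmark2subbmark1"
      "#6")))'''

tree = parse_outline(sample)
print(tree)
```

Each bookmark becomes a list of the form [name, page, child, child, ...], which mirrors the grammar above.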

step 3: convert DJVU outline to PDF metadata format

Now, we need to convert these bookmarks into the format that pdftk uses for PDF metadata. This file has the following format:

file ::= <entry>*
entry ::= BookmarkBegin
          BookmarkTitle: <title>
          BookmarkLevel: <number>
          BookmarkPageNumber: <number>
title ::= <character>*

So our example would become:

 BookmarkBegin
 BookmarkTitle: bmark1
 BookmarkLevel: 1
 BookmarkPageNumber: 1
 BookmarkBegin
 BookmarkTitle: bmark2
 BookmarkLevel: 1
 BookmarkPageNumber: 5
 BookmarkBegin
 BookmarkTitle: bmark2subbmark1
 BookmarkLevel: 2
 BookmarkPageNumber: 6
 BookmarkBegin
 BookmarkTitle: bmark2subbmark2
 BookmarkLevel: 2
 BookmarkPageNumber: 7
 BookmarkBegin
 BookmarkTitle: bmark3
 BookmarkLevel: 1
 BookmarkPageNumber: 9

Basically, you just need to write a script to walk the SEXPR tree, keeping track of the level, and output the name, page number and level of each entry it comes to, in the correct format.
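
A rough sketch of such a walker in Python follows. It assumes the outline has already been parsed into nested lists of the form [name, page, children...], with the bookmarks symbol as the first element of the root, and that every page field is "#" followed by digits as in the grammar above. The function name to_pdftk_bookmarks is invented for this example.

```python
def to_pdftk_bookmarks(outline):
    """Flatten a parsed outline into bookmark records (hypothetical helper)."""
    lines = []

    def walk(node, level):
        title, page, *children = node
        lines.extend([
            "BookmarkBegin",
            f"BookmarkTitle: {title}",
            f"BookmarkLevel: {level}",
            # Assumes a purely numeric page reference such as "#5".
            f"BookmarkPageNumber: {int(page.lstrip('#'))}",
        ])
        for child in children:
            walk(child, level + 1)    # children sit one level deeper

    for top in outline[1:]:           # outline[0] is the `bookmarks` symbol
        walk(top, 1)
    return "\n".join(lines) + "\n"

# The example outline from step 2, written out as nested lists.
outline = [
    "bookmarks",
    ["bmark1", "#1"],
    ["bmark2", "#5",
        ["bmark2subbmark1", "#6"],
        ["bmark2subbmark2", "#7"]],
    ["bmark3", "#9"],
]

converted = to_pdftk_bookmarks(outline)
print(converted, end="")
```

A page field that is not purely numeric makes int() raise a ValueError here, so real files may need extra handling for that case.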

step 4: extract PDF metadata and splice in converted bookmarks

Once you've got the converted list, output the PDF metadata from your converted PDF file:
```
pdftk "filename.pdf" dump_data > pdfmetadata.out
```
Now, open the file, find the line that begins with NumberOfPages: and insert the converted bookmarks after it. Save the new file as pdfmetadata.in

step 5: create PDF with bookmarks

Now we can create a new PDF file incorporating this metadata:
```
pdftk "filename.pdf" update_info "pdfmetadata.in" output out.pdf
```
The file out.pdf should be a copy of your PDF with the bookmarks imported from the DJVU file.
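
The manual splice in step 4 can also be scripted. The sketch below works on plain strings; splice_bookmarks is a hypothetical helper, and the sample metadata is a made-up stand-in for real dump_data output, kept only to show where the NumberOfPages: line falls.

```python
def splice_bookmarks(metadata, bookmarks):
    """Insert bookmark records right after the NumberOfPages: line."""
    out = []
    inserted = False
    for line in metadata.splitlines(keepends=True):
        out.append(line)
        if not inserted and line.startswith("NumberOfPages:"):
            if not line.endswith("\n"):
                out.append("\n")      # the marker was the last, unterminated line
            out.append(bookmarks if bookmarks.endswith("\n") else bookmarks + "\n")
            inserted = True
    if not inserted:
        raise ValueError("no NumberOfPages: line found in metadata")
    return "".join(out)

# Illustrative input only, not copied from a real PDF.
metadata = (
    "InfoBegin\n"
    "InfoKey: Producer\n"
    "InfoValue: example\n"
    "NumberOfPages: 9\n"
    "PageMediaBegin\n"
)
bookmarks = (
    "BookmarkBegin\n"
    "BookmarkTitle: bmark1\n"
    "BookmarkLevel: 1\n"
    "BookmarkPageNumber: 1\n"
)

spliced = splice_bookmarks(metadata, bookmarks)
print(spliced, end="")
```

To use it on the real files, read pdfmetadata.out and the converted bookmark list into strings, write the result to pdfmetadata.in, and then run the update_info command from step 5.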

pyrocrasty

Posted 2014-08-23T06:36:47.883

I had a DJVU file with non-numeric text in the bookmark page number fields, so the parser didn't read them. I replaced j.split('#')[1] with (int(re.findall(r'\d+', j.split('#')[1])[0])+1) and it worked great. Debian Jessie needed: sudo apt-get install pdftk djvulibre-bin python-pip ruby ruby-dev libmagickwand-dev; sudo pip install sexpdata; sudo gem install iconv pdfbeads – None – 2016-12-17T23:17:49.977