DjVu to PDF including text layer

1

1

I'm trying to convert a DjVu file with a text layer to a PDF with a text layer. I've tried all the methods in this post and none of them preserves the text layer.

What options do I have?

nullUser

Posted 2016-01-22T23:29:23.040

Reputation: 593

Answers

1

As far as I know, you have two options:

  1. Use ocrodjvu and pdfbeads as described here.

    The relevant commands, assuming that your DjVu file is called sample.djvu and you want to convert page 10 to a PDF that includes the text layer, are:

    djvu2hocr -p 10 sample.djvu | sed 's/ocrx/ocr/g' > pg10.html

    ddjvu -format=tiff -page=10 sample.djvu pg10.tif

    pdfbeads -o pg10.pdf

  2. Use DjView4 to convert the DjVu file to PDF and then use PDF-XChange Viewer to perform OCR. It takes time, but the result is damn good (even on two-column documents).
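The commands in option 1 handle a single page. A minimal sketch of how they might be looped over a whole document follows. It assumes that djvused and ddjvu (DjVuLibre), djvu2hocr (ocrodjvu) and pdfbeads are all on the PATH, and that it runs in an otherwise empty working directory. The function names `page_stem` and `convert_djvu` and the zero-padded `pgNNNN` naming scheme are made up for illustration; they are not part of any of these tools.

```shell
#!/bin/sh
# Sketch only: repeat the per-page commands from option 1 for every page.

# Hypothetical helper: zero-padded base name, so that pg0002 sorts
# before pg0010 when the shell expands pg*.tif.
page_stem() {
    printf 'pg%04d' "$1"
}

# Hypothetical wrapper: convert_djvu INPUT.djvu OUTPUT.pdf
convert_djvu() {
    in=$1
    out=$2
    # "djvused -e n" prints the number of pages in the document.
    pages=$(djvused -e n "$in") || return 1
    i=1
    while [ "$i" -le "$pages" ]; do
        stem=$(page_stem "$i")
        # Export the existing text layer of page i as hOCR.
        djvu2hocr -p "$i" "$in" | sed 's/ocrx/ocr/g' > "$stem.html" || return 1
        # Render page i as a TIFF with the same base name as the hOCR file.
        ddjvu -format=tiff -page="$i" "$in" "$stem.tif" || return 1
        i=$((i + 1))
    done
    # As in the single-page example, each image is expected to sit next to
    # an .html file with the same base name.
    pdfbeads -o "$out" pg*.tif
}
```

Called as `convert_djvu sample.djvu sample.pdf`, this would leave one .tif and one .html file per page in the current directory next to the resulting PDF. Given the timings reported for a single page, a long document could take a very long time.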

In principle, both options should work on Mac, Windows and Linux. For option 2 you will need Wine on Mac and Linux.

I tried option 1 with a single page and it had not finished after 10 minutes on a recent laptop with a quad-core processor and 8 GB of RAM. YMMV.

Option 2 took two hours on a 50-page document on a recent desktop computer with a quad-core processor and 16 GB of RAM, but the results are impressive.

Marduk

Posted 2016-01-22T23:29:23.040

Reputation: 131