I know how to use ImageMagick's convert to render a PDF and generate new images from each page, with both the bitmaps and the vector graphics rendered at the desired resolution.

The problem with that approach is that the bitmap images are resampled to the new resolution. What I'd like to do is extract the bitmap images exactly as they are stored in the PDF.

I want this to improve contrast on scanned PDFs, where the PDFs are nothing more than an archive for the bitmap images. E.g. http://www.datamath.net/Manuals/TI-66_Manual_US.pdf

I'd want the very first step to be just to extract the as-original-as-possible bitmaps from the PDF.

Note: I am limiting this to ImageMagick so that the solution is portable. But if you know the same can be done with Unix tools as common as ImageMagick, please do share!

gcb

1 Answer

(Feel free to add answers if there is a way to do this with ImageMagick.)

I found a solution [0] using pdfimages from poppler, which I think is about as widely available as ImageMagick:

pdfimages -all -p TI-66_Manual_US.pdf ./

The above extracts every image from the PDF into the current directory, keeping each one in its stored format where possible, and adds the page number to each filename. For some reason the filenames end up starting with ".", so just run...

for f in .*jpg; do mv -- "$f" "a$f"; done

...to add an "a" in front of the dot so the files are easier to work with.
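The odd leading character most likely comes from how pdfimages reads its last argument: it is a filename prefix (the "image root"), not an output directory, and files are named prefix-page-index.ext. Here is a rough sketch of both routes, assuming a POSIX shell. The function names, the default prefix "page", and the list of extensions are my own choices for illustration and not part of poppler.

```shell
#!/bin/sh

# Extract the embedded images with a real filename prefix, so that no
# rename is needed afterwards.
# Usage: extract_images FILE.pdf [PREFIX]
# PREFIX defaults to "page" (a hypothetical name chosen for this sketch).
extract_images() {
    pdf=$1
    prefix=${2:-page}
    # -all writes images in their stored format where possible;
    # -p adds the page number to each output filename.
    pdfimages -all -p "$pdf" "$prefix"
}

# Fallback for images that were already extracted with a leading ".":
# put an "a" in front of each such name.
# Usage: add_prefix_to_dotfiles [DIR]
add_prefix_to_dotfiles() {
    dir=${1:-.}
    for f in "$dir"/.*; do
        # Skips ".", ".." and an unmatched glob, none of which are regular files.
        [ -f "$f" ] || continue
        name=${f##*/}
        case $name in
            # Only touch image files; the extension list is an assumption.
            *.jpg|*.png|*.tif) mv -- "$f" "$dir/a$name" ;;
        esac
    done
}
```

For example, extract_images TI-66_Manual_US.pdf manual should produce names that start with "manual-", and add_prefix_to_dotfiles . cleans up a directory that already holds dot-prefixed images while leaving other hidden files alone.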

[0] source: https://www.cyberciti.biz/faq/easily-extract-images-from-pdf-file/

gcb