Finding image resolution in a PDF file?

I have a problem with some users creating very large PDFs. On the other hand, I have PDFs sent from our fax machines that are really small in size and totally printable. My questions are:

1. Is there any way I can find the resolution (DPI) of a PDF? I searched the internet and could not find an answer. I checked the properties of the file, but this information was not stored there, at least in my case.
2. What is the optimum resolution for converting a text file into an image PDF: 96 dpi, 300 dpi, or more?
3. Fun question: can I resize a PDF that was scanned at a high dpi to a lower dpi?

I know some answers might not be available as I have already searched the internet and could not find answers.

Note: My PDFs consist entirely of images (text converted to images). I am also familiar with primoPDF (free), which is something you can experiment with.

hk_

Posted 2011-11-21T16:22:30.593

Reputation: 1 878

Answers

slhck's answer and scruss's comment deserve an update: pdfimages now (at least since version 0.26.5) explicitly lists x-ppi and y-ppi. Here is a sample output:

$ pdfimages -list example.pdf 
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2244  2244  cmyk    4   8  image  no       215  0   301   301  418K 2.1%
   2     1 image     900   600  rgb     3   8  image  no       324  0  1524  1525 35.5K 2.2%

On Debian (Wheezy) and Fedora (23), pdfimages is part of the poppler-utils package.
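If you would rather post-process that listing than read it by eye, a minimal Python sketch along these lines could pull out the x-ppi and y-ppi columns. It assumes the 16-column layout of the sample output above (with "object ID" counting as two fields); other poppler versions may print different columns, and the function name parse_pdfimages_list is made up for illustration.

```python
def parse_pdfimages_list(listing):
    """Parse the text printed by `pdfimages -list` into a list of dicts.

    Assumes the 16-column layout of the sample output above; this helper
    is illustrative and not part of poppler.
    """
    images = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have 16 fields and start with a page number; this
        # skips the header row and the dashed separator.
        if len(fields) != 16 or not fields[0].isdigit():
            continue
        images.append({
            "page": int(fields[0]),
            "width": int(fields[3]),
            "height": int(fields[4]),
            "x_ppi": int(fields[12]),
            "y_ppi": int(fields[13]),
        })
    return images


sample = """\
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2244  2244  cmyk    4   8  image  no       215  0   301   301  418K 2.1%
   2     1 image     900   600  rgb     3   8  image  no       324  0  1524  1525 35.5K 2.2%
"""

for img in parse_pdfimages_list(sample):
    print(img["page"], img["x_ppi"], img["y_ppi"])
```

To run it on a real file, the listing could be captured with something like subprocess.run(["pdfimages", "-list", "example.pdf"], capture_output=True, text=True).stdout, assuming pdfimages is on the PATH.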

Skippy le Grand Gourou

Reputation: 1 349

Mine are all empty – theonlygusti – 2018-09-01T22:02:33.920

I know that you don't want to extract the image data, but this is probably the only way to find out the original resolution.

On *nix, if you have ImageMagick's identify and Xpdf installed¹:

pdfimages -j test.pdf test && for file in $(find . -name "test*.jpg"); do identify "$file"; done

Where test.pdf is your input PDF. The output files are written to test-000.jpg, test-001.jpg, et cetera. This would give you the original size of all the contained images of that PDF².

Example output for a PDF file that only contains one big image:

./test-000.jpg JPEG 2500x1961 2500x1961+0+0 8-bit DirectClass 1.022MB 0.000u 0:00.000

¹ Windows has these too, but the script would be different of course.
² Note that images don't really carry DPI information. Simply speaking: That's just something used for printing and images don't need an inherent measure of DPI.

What is the optimum resolution for converting a text file into an image PDF: 96 dpi, 300 dpi, or more?

Generally, anything you want to print should be 300dpi or more. Most printers will handle a higher resolution too.
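To get a feel for what those numbers mean for file size, here is a rough Python sketch that converts a page size and dpi into pixel dimensions and an uncompressed byte count. The US Letter page size and the 8-bit and 1-bit depths are assumptions chosen for illustration; real PDFs compress their images, so actual files will be smaller.

```python
def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page rasterised at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed size in bytes; real PDFs compress this, often heavily."""
    return width_px * height_px * bits_per_pixel // 8


# US Letter (8.5 x 11 in) at the resolutions mentioned in the question.
for dpi in (96, 300, 600):
    w, h = page_pixels(8.5, 11, dpi)
    # Columns: dpi, width, height, bytes at 8-bit grey, bytes at 1-bit.
    print(dpi, w, h, raw_bytes(w, h, 8), raw_bytes(w, h, 1))
```

The pixel count grows with the square of the dpi, so doubling the resolution roughly quadruples the raw data. A 1-bit black-and-white page is also eight times smaller than an 8-bit greyscale one before compression, which may be part of why the fax PDFs are so small if they are stored that way.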

slhck

Reputation: 182 472

@scruss As of version 0.34.0, pdfimages -list explicitly provides x-ppi and y-ppi, as well as a lot of other information. – Skippy le Grand Gourou – 2016-10-16T16:55:45.453

Indeed it now does, @SkippyleGrandGourou : about five years after the question was asked. pdfimages still doesn't apply that resolution/size to images it extracts, though. – scruss – 2016-10-16T17:21:20.877

@scruss Actually, it seems that the resolution given by pdfimages can be quite off (e.g. when the image is larger than its visible area, in a PDF produced by scribus). (Unfortunately I really don't have time to file a bug report now.) – Skippy le Grand Gourou – 2016-10-16T19:39:57.203

A version of pdfimages (perhaps more recent than the original question) from the poppler project adds the -list option: pdfimages -list test.pdf. Rather than outputting files, this lists size and image type. Still doesn't explicitly give you resolution, but avoids creating output files. – scruss – 2013-10-30T12:42:49.910

For some reason, the latest version of pdfimages that I can upgrade to on my CentOS system is version 3.04.

So, I don't have the -list option as stated by previous answers. However, the test image created from pdfimages based on slhck's answer contains the desired answer!

identify -verbose test-0000.jpg | more

Image: test-0000.jpg  
Format: JPEG (Joint Photographic Experts Group JFIF format)  
Mime type: image/jpeg  
Class: DirectClass  
Geometry: 6600x5100+0+0  
Resolution: 600x600  
Print size: 11x8.5

So the dpi is explicitly shown on the 6th line using the -verbose option in the identify command.

So, slhck's answer can be modified to the following.

pdfimages -j test.pdf test && for file in $(find . -name "test*.jpg"); do identify -verbose "$file" | grep "Resolution:"; done

On another note, I tried running

identify -verbose test.pdf

Format: PDF (Portable Document Format)  
Mime type: application/pdf  
Class: DirectClass  
Geometry: 792x612+0+0  
Resolution: 72x72  
Print size: 11x8.5

It seems that ImageMagick always assumes 72 dpi for a PDF, so the resolution printed here appears to be incorrect.
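Both reports above are consistent with print size = pixels / resolution. A PDF page is measured in points (1/72 inch), so 792x612 at 72 dpi simply restates the 11x8.5 in page size rather than the scan resolution. A small Python sketch of that arithmetic, assuming the Geometry and Resolution lines look like the ones shown above and that the resolution unit is pixels per inch (the function name is made up for illustration):

```python
import re


def print_size_inches(identify_text):
    """Return (width_in, height_in) from `identify -verbose` text.

    Reads the 'Geometry:' and 'Resolution:' lines and divides pixels by
    pixels-per-inch. Assumes the resolution unit is pixels per inch.
    """
    geom = re.search(r"Geometry:\s*(\d+)x(\d+)", identify_text)
    res = re.search(r"Resolution:\s*([\d.]+)x([\d.]+)", identify_text)
    if geom is None or res is None:
        raise ValueError("Geometry or Resolution line not found")
    width_px, height_px = int(geom.group(1)), int(geom.group(2))
    x_res, y_res = float(res.group(1)), float(res.group(2))
    return width_px / x_res, height_px / y_res


# The two reports quoted above, reduced to the lines that matter.
jpeg_report = "Geometry: 6600x5100+0+0\nResolution: 600x600\n"
pdf_report = "Geometry: 792x612+0+0\nResolution: 72x72\n"

print(print_size_inches(jpeg_report))
print(print_size_inches(pdf_report))
```

Both calls give the same page size, which suggests the extracted JPEG carries the real scan resolution while the PDF report only reflects the page dimensions.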

kykong

Reputation: 41

This worked with a PDF generated from a Kyocera MFP. It is probably only valid for full-page images like scans.

1. Open the PDF with Reader.
2. Go to File > Properties > Description tab > Page size. My example said 8.5 x 11.0 in.
3. Open the PDF with a text editor (Notepad) and look for /Width and /Height.
4. Divide the width and height by the page width and height (in inches).

Example:

5100/8.5=600
6600/11.0=600

My PDF was scanned at a 600x600 resolution.

You can skip the first 2 steps if you know the document size (A4, for example, is 8.27 x 11.69 in).
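The same steps can be sketched in Python. This is only a rough illustration: it assumes the image dictionary is stored uncompressed in the file with literal /Width and /Height numbers (as in the Kyocera scan described above), which is not true of every PDF. The function names and the sample fragment are made up.

```python
import re


def find_image_sizes(pdf_bytes):
    """Return (width, height) pairs from /Width and /Height entries.

    Naive: pairs the n-th /Width with the n-th /Height found in the raw
    bytes, and misses anything stored inside compressed object streams.
    """
    widths = [int(m) for m in re.findall(rb"/Width\s+(\d+)", pdf_bytes)]
    heights = [int(m) for m in re.findall(rb"/Height\s+(\d+)", pdf_bytes)]
    return list(zip(widths, heights))


def scan_dpi(pixels, inches):
    """Resolution implied by a pixel count spread over a length in inches."""
    return pixels / inches


# Hypothetical fragment of an image dictionary, for illustration only.
sample = b"<< /Type /XObject /Subtype /Image /Width 5100 /Height 6600 /BitsPerComponent 1 >>"

for width_px, height_px in find_image_sizes(sample):
    # Page size taken from the example above: 8.5 x 11.0 in.
    print(scan_dpi(width_px, 8.5), scan_dpi(height_px, 11.0))
```

For a real file, the bytes could be read with open("scan.pdf", "rb").read(), where scan.pdf is a placeholder name.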

Jeff21050

Reputation: 11

A PDF file doesn't have an inherent resolution; each raster image within it (if any) will have its own resolution. I don't know of a simple way to extract a single number for median/modal resolution of embedded image XObjects.
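If per-image resolutions can be obtained from some tool, reducing them to a single figure is the easy part. A minimal Python sketch, where the list of values is made up for illustration and multimode needs Python 3.8 or later:

```python
from statistics import median, multimode


def summarize_ppi(values):
    """Return (median, list of most common values) for per-image ppi."""
    return median(values), multimode(values)


# Hypothetical per-image resolutions (ppi) for one document.
image_ppi = [300, 300, 600, 150, 300]

print(summarize_ppi(image_ppi))
```

Getting the per-image numbers out of the PDF is the hard part, and this sketch does not address it.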

RedGrittyBrick

Reputation: 70 632

By the way, I am not interested in extracting image data from the PDF. I just want to know what the scan resolution was and, if it is unnecessarily high, avoid that in future. – hk_ – 2011-11-21T16:39:28.733

@Dave: Actually I meant extracting information about the embedded images, not extracting the images themselves. But slhck's answer may solve your problem. – RedGrittyBrick – 2011-11-21T16:55:39.417

To answer your second point, in addition to @slhck's mention of printer dpi, 300 dpi is also the typical recommended minimum for OCR with font sizes of 10 pt or more.

Further, a modern 15" 4K laptop screen only has about 280 ppi, so if you want to view an entire A4 page on the screen (landscape) there is no point scanning at higher than ~320 dpi, because anything higher will be scaled down. Of course, this doesn't matter if you plan to zoom in; then you might need a higher dpi.
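The figures in the previous paragraph can be roughly reproduced with a few lines of Python. The 3840x2160 panel and the 15.6" diagonal are assumptions (the answer only says 15" 4K), and A4 landscape is taken as 11.69 in wide.

```python
import math


def screen_ppi(width_px, height_px, diagonal_in):
    """Pixels per inch of a display, from its pixel size and diagonal."""
    return math.hypot(width_px, height_px) / diagonal_in


def max_useful_dpi(screen_px, page_in):
    """Scan dpi at which the page exactly fills the given screen pixels."""
    return screen_px / page_in


# Assumed panel: 3840x2160 pixels with a 15.6 inch diagonal.
print(round(screen_ppi(3840, 2160, 15.6)))

# A4 landscape (11.69 in wide) across the full 3840-pixel width.
print(round(max_useful_dpi(3840, 11.69)))
```

With these assumptions the results land close to the ~280 ppi and ~320 dpi quoted above; a different panel size or resolution would shift them.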

To answer your other two points, nowadays at least you can use Acrobat Pro to check image DPI and resolution, and you can edit it too.

jiggunjer

Reputation: 831