How can I deskew and crop PDFs made from scanned pages *automatically*?



Possible Duplicate:
Which free software can I use to deskew scanned images

I have several PDFs made up of scanned book pages. Each scan covers two pages at a time, and some of the scans are skewed, so the text appears slightly tilted.

I'm looking for a tool that can optimize these files automatically by deskewing the scans without losing readability. I've found the GPL-licensed Briss to crop the scans so that each page has a 1:1 page ratio instead of 2:1, but I don't have any tool to deskew the pages.

I stumbled upon unpaper, another open-source tool that seems perfect for what I want to do, but it is Linux-only and doesn't work on PDF files directly.

Any hint is appreciated.

Pietro M.

Posted 2012-07-04T15:53:23.997

Reputation: 229

Question was closed 2012-07-08T03:23:13.277

@random: Why has this question been closed?? Why should this topic solicit 'debate, arguments, polling or extended discussion'?!? – Kurt Pfeifle – 2012-07-08T00:40:47.397

"looking for a tool" is pretty much polling for services, which leads to the "not constructive" close reason. @kur – random – 2012-07-08T00:43:08.343

@random: This question led me to do some research on the topic, and I found some interesting options to pursue. The most interesting one is using ImageMagick for this, and it seems surprisingly simple. Unfortunately, closing the question does not allow me to post my answer. – Kurt Pfeifle – 2012-07-08T00:58:25.800

@random: I've now edited the question a bit. Hopefully it now better matches your sense of 'constructiveness'. – Kurt Pfeifle – 2012-07-08T01:50:41.193

@random: OK, 'closing as duplicate' is more acceptable to me in this case. – Kurt Pfeifle – 2012-07-08T10:00:15.087

@random: I don't agree that this is a duplicate. I was looking for something that operates directly on PDF files. If I have to convert PDF -> image and then image -> PDF, that adds two more steps, each of which can cost quality. – Pietro M. – 2012-07-09T11:52:36.177



Have a look at deskew. It's a commandline tool. The download .zip seems to include binaries for Windows, Mac OS X and Linux.

The license is MPL (Mozilla) or LGPL (GNU), whichever you prefer.

The only drawback for you seems to be that it doesn't accept PDFs, only PNG and TIFF images (AFAICS). That means you'll have to set up a workflow like:

 PDF.orig -> PNG.orig -> PNG.deskewed -> PDF.deskewed
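The four-step workflow above could be driven by a small script. Below is a minimal Python sketch, not a tested tool: it assumes Ghostscript (gs), the deskew binary and ImageMagick's convert are on the PATH. The 300 dpi resolution, the page-%03d.png file names and the function names are my own hypothetical choices, not anything deskew prescribes; run deskew without arguments first to confirm its option list on your version.

```python
import subprocess
from pathlib import Path

DPI = 300  # assumption: rendering resolution; ideally match the scan resolution


def rasterize_cmd(pdf: Path, workdir: Path) -> list[str]:
    """PDF.orig -> PNG.orig: Ghostscript renders one 24-bit PNG per page."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=png16m",
            f"-r{DPI}", f"-sOutputFile={workdir / 'page-%03d.png'}", str(pdf)]


def deskewed_name(src: Path) -> Path:
    """Hypothetical naming scheme: page-001.png -> page-001-deskewed.png."""
    return src.with_name(src.stem + "-deskewed.png")


def deskew_cmd(src: Path) -> list[str]:
    """PNG.orig -> PNG.deskewed: deskew writes its result to the -o file."""
    return ["deskew", "-o", str(deskewed_name(src)), str(src)]


def assemble_cmd(pngs: list[Path], out_pdf: Path) -> list[str]:
    """PNG.deskewed -> PDF.deskewed: ImageMagick writes one page per image."""
    return ["convert", *map(str, pngs), str(out_pdf)]


def main(pdf: Path, out_pdf: Path, workdir: Path) -> None:
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, workdir), check=True)
    # Match only the rendered originals; zero-padded names sort in page order.
    originals = sorted(workdir.glob("page-[0-9][0-9][0-9].png"))
    for png in originals:
        subprocess.run(deskew_cmd(png), check=True)
    subprocess.run(assemble_cmd([deskewed_name(p) for p in originals], out_pdf),
                   check=True)
```

It would be called as main(Path("book.pdf"), Path("book-deskewed.pdf"), Path("work")). Using PNG for the intermediate images keeps that step lossless; the rendering resolution is the main quality knob.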

I haven't tested it myself (yet), I just came across the website recently and bookmarked it.

Kurt Pfeifle

Posted 2012-07-04T15:53:23.997

Reputation: 10 024

deskew did manage to correct rotation-related distortion in my test run but unfortunately it introduced a thin gray line at the position of the original image border. To get rid of the gray border I cropped the images with the -extent option of mogrify. I only tested on OS X, maybe this misbehaviour is platform-specific. – Stefan Schmidt – 2015-05-25T19:45:36.960

deskew works really well. My workflow is like this: pdfimages -all <pdf> my_images; jbig2 -s -p -v my_images* > output > deskewed.pdf. If the black borders (a result of the deskewing operation) bother you, some processing with ImageMagick might be necessary, as suggested by @StefanSchmidt – Mr. Tao – 2018-08-09T15:16:29.067
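A variation on the workflow in the comment above, relevant to the concern about extra conversion steps: pdfimages copies the embedded scan images out of the PDF instead of re-rendering the pages, so nothing gets resampled. This is a minimal Python sketch assuming poppler's pdfimages and deskew are installed. I've swapped the comment's -all for -png so every extracted image arrives as PNG, which the answer above says deskew accepts. Reassembly into a PDF (jbig2 in the comment, or ImageMagick) and the border cleanup Stefan describes are left out. The function names and the "scan" prefix are hypothetical.

```python
import subprocess
from pathlib import Path


def extract_cmd(pdf: Path, prefix: Path) -> list[str]:
    """pdfimages -png writes prefix-000.png, prefix-001.png, ... (no re-rendering)."""
    return ["pdfimages", "-png", str(pdf), str(prefix)]


def deskew_cmd(src: Path, outdir: Path) -> list[str]:
    """Write the deskewed copy of one extracted image into outdir, same name."""
    return ["deskew", "-o", str(outdir / src.name), str(src)]


def extract_and_deskew(pdf: Path, workdir: Path) -> list[Path]:
    raw, fixed = workdir / "raw", workdir / "deskewed"
    raw.mkdir(parents=True, exist_ok=True)
    fixed.mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_cmd(pdf, raw / "scan"), check=True)
    outputs = []
    # Sorted so the deskewed images keep the order they had in the PDF.
    for img in sorted(raw.glob("scan-*.png")):
        subprocess.run(deskew_cmd(img, fixed), check=True)
        outputs.append(fixed / img.name)
    return outputs
```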


Oh, let me add another answer. I just remembered netpbm. Haven't used it in years, but I think I should take a fresh look...

netpbm is a very powerful commandline toolkit for manipulating graphic images. It ships nearly 300 separate tools, including converters for about 100 graphics formats.

And it also has a commandline tool that can rotate images, pnmrotate.

And it has another tool that tries to detect the angle of rotated images, pamtilt.

pamtilt outputs a floating-point number, its estimate of the image's rotation angle. So automatic deskewing of images should be within reach. A shell script could do it in four steps:

  1. Convert each PDF page to a netpbm-suitable image format with the help of Ghostscript.
  2. Use pamtilt to auto-discover the skew angle of the image.
  3. Use pnmrotate to de-skew the image.
  4. Re-convert the image to PDF.
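The four steps above could be chained roughly like this. It is a minimal Python sketch, not a tested script, assuming Ghostscript (its pgmraw device for step 1 and pdfwrite for the final merge) and the netpbm tools pamtilt, pnmrotate and pnmtops are installed. Two points are assumptions to check against the netpbm manual pages: that pamtilt prints the angle as the first token on stdout, and the sign convention. The script passes pamtilt's value to pnmrotate unchanged; set negate=True if a test page comes out more tilted than before. Step 4 goes through PostScript (pnmtops), which Ghostscript then merges into one PDF. All function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Optional


def rasterize_cmd(pdf: Path, workdir: Path, dpi: int = 300) -> list[str]:
    """Step 1: Ghostscript renders each page as a grayscale PGM file."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pgmraw", f"-r{dpi}",
            f"-sOutputFile={workdir / 'page-%03d.pgm'}", str(pdf)]


def parse_tilt(pamtilt_stdout: str) -> float:
    """Step 2: read pamtilt's floating-point estimate of the skew angle."""
    return float(pamtilt_stdout.split()[0])


def rotate_cmd(angle: float, negate: bool = False) -> list[str]:
    """Step 3: pnmrotate reads the image on stdin; '--' ends option parsing
    so that a negative angle is not mistaken for an option."""
    a = -angle if negate else angle
    return ["pnmrotate", "--", f"{a:.4f}"]


def merge_cmd(ps_files: list[Path], out_pdf: Path) -> list[str]:
    """Step 4b: Ghostscript's pdfwrite device merges the pages into one PDF."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            f"-sOutputFile={out_pdf}", *map(str, ps_files)]


def run(cmd: list[str], data: Optional[bytes] = None) -> bytes:
    """Run one tool, feeding data on stdin, and return its stdout."""
    return subprocess.run(cmd, input=data, capture_output=True, check=True).stdout


def deskew_pdf(pdf: Path, out_pdf: Path, workdir: Path, negate: bool = False) -> None:
    workdir.mkdir(parents=True, exist_ok=True)
    run(rasterize_cmd(pdf, workdir))
    ps_files = []
    for pgm in sorted(workdir.glob("page-*.pgm")):
        image = pgm.read_bytes()
        angle = parse_tilt(run(["pamtilt"], image).decode())
        straight = run(rotate_cmd(angle, negate), image)
        ps = pgm.with_suffix(".ps")
        # Step 4a: -noturn keeps portrait pages from being turned to landscape.
        ps.write_bytes(run(["pnmtops", "-noturn"], straight))
        ps_files.append(ps)
    run(merge_cmd(ps_files, out_pdf))
```

Keeping the command builders separate from deskew_pdf makes it easy to print the commands and try them by hand on a single page before processing a whole book.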

If you provide me access to a small sample of your PDF files I could try and come up with a shell script to accomplish the feat.

(I'm quite surprised that [netpbm] doesn't seem to have a tag here on Super User or Stack Overflow.)

Kurt Pfeifle

Posted 2012-07-04T15:53:23.997

Reputation: 10 024