In my opinion the best way to do this job is not a graphical user interface program but a collection of bash scripts (as in a Unix/Linux environment). If you have some basic knowledge of programming, you will be able to do much more than a GUI program can offer you.
First install a minimal Unix-like command-line environment.
My preference is Cygwin, as it provides a huge number of software packages.
If you want to extract images from a PDF, also install pdfimages.
pdfimages is an open source command-line utility for extracting images from PDF files. It is freely available as part of poppler-utils and xpdf-utils, and included by default with many Linux distributions.
$ pdfimages file.pdf foo
This usage produces a series of numbered images with "foo" as the prefix.
In practice, use
$ mkdir temp
$ mkdir temp/jpg
to create a temporary folder named jpg inside a temp directory, then extract the images into it:
$ pdfimages -j file.pdf temp/jpg/foo
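Let's say that you now have several fooXXXX.jpg images in the temp/jpg folder. (With -j, only images stored as JPEG in the PDF are written as .jpg; the others are saved as .ppm or .pbm.)
In your case, you already had the fooXXXX.jpg pictures.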
You can now generate one PDF using convert (a command-line tool from ImageMagick).
So download ImageMagick from http://www.imagemagick.org/ or install it using the Cygwin package manager.
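Before going further, it can help to check that both tools are really on your PATH. This is only a minimal sketch, assuming bash; the function name missing_tools is my own and not part of either package:

```shell
#!/bin/bash
# Print the name of every command given as argument that is not found on PATH.
# (missing_tools is a hypothetical helper name, chosen for this example.)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The two commands used in this answer.
missing=$(missing_tools pdfimages convert)
if [ -n "$missing" ]; then
    echo "Please install:" $missing
fi
```

If nothing is printed, both commands were found.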
Have a look at the convert documentation (type "ImageMagick convert" in your favourite search engine).
You will see that to convert your pictures to one PDF file you have to write
$ convert -compress jpeg temp/jpg/*.jpg my_output_file.pdf
That's all... ;-)
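One thing to keep in mind: the shell expands *.jpg in alphabetical order, so the pages only come out in the right order if the numbers in the file names are zero-padded. If your scanner produced names like page1.jpg, page2.jpg, page10.jpg, something like the following sketch may help. It assumes GNU sort (for the -V option) and file names without newlines; list_pages is a name I made up:

```shell
#!/bin/bash
# Print the .jpg files of a directory, one per line, in natural numeric order
# (page2.jpg before page10.jpg). Assumes GNU sort, which provides -V.
# list_pages is a hypothetical helper name.
list_pages() {
    local dir=$1
    find "$dir" -maxdepth 1 -name '*.jpg' | sort -V
}
```

You could then collect the list with mapfile -t pages < <(list_pages temp/jpg) and pass "${pages[@]}" to convert instead of the glob.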
But this solution can be extended...
Let's imagine that the scanned pictures came from a book...
One file is in fact 2 pages of your book...
So if you have 10 files, your book had 20 pages, and you would like your PDF to also have 20 pages.
So you need to split the image contained in each file into 2 files, one per page.
Let's say that your file is temp/jpg/foo0001.jpg;
you will get 2 files, temp2/jpg/foo0001a.jpg (left page) and temp2/jpg/foo0001b.jpg (right page).
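The naming relies on bash parameter expansion to strip the extension. Here is a small sketch of just that part, assuming the output goes to temp2/jpg (the folder the script below writes to); split_names is a name I invented for the example:

```shell
#!/bin/bash
# Print the left-page and right-page output names for one scanned file.
# The output folder defaults to temp2/jpg. split_names is a hypothetical helper.
split_names() {
    local src=$1 out_dir=${2:-temp2/jpg}
    local name stem
    name=$(basename "$src")   # foo0001.jpg
    stem=${name%%.*}          # foo0001 (everything from the first dot is removed)
    printf '%s\n' "$out_dir/${stem}a.jpg" "$out_dir/${stem}b.jpg"
}

split_names temp/jpg/foo0001.jpg
# temp2/jpg/foo0001a.jpg
# temp2/jpg/foo0001b.jpg
```

Note that %%.* cuts at the first dot, so a name like scan.v2.jpg would become scana.jpg and scanb.jpg.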
Create the temp2 directory (where your split files will go)
$ mkdir temp2
$ mkdir temp2/jpg
Create a file named split_jpg_minw.sh using a text editor (Emacs, vi or, if you prefer a Windows application, Notepad or Notepad++; in that case make sure the file is saved with Unix line endings)
#!/bin/bash
# Split double-page scans from temp/jpg into single pages in temp2/jpg.
minimal_width=1500        # images wider than this are split in two
minimal_width_ignore=10   # images this narrow (or narrower) are skipped
rm -f temp2/jpg/*.jpg     # remove the results of a previous run, if any
for f in temp/jpg/*.jpg
do
    f2=$(basename "$f")
    # ask ImageMagick for the image size in pixels
    read -r width height <<< "$(convert "$f" -format "%w %h" info:)"
    width2=$(( width / 2 ))
    height2=$height
    if [ "$width" -gt "$minimal_width" ]; then
        echo "split $f ${width}x${height} to 2 files ${width2}x${height2}"
        # left half, then right half
        convert "$f" -crop "${width2}x${height2}+0+0" +repage "temp2/jpg/${f2%%.*}a.jpg"
        convert "$f" -crop "${width2}x${height2}+${width2}+0" +repage "temp2/jpg/${f2%%.*}b.jpg"
    else
        if [ "$width" -gt "$minimal_width_ignore" ]; then # ignore if width <= 10px
            echo "copy $f ${width}x${height} (don't split because width<=$minimal_width)"
            cp "$f" "temp2/jpg/$f2"
        else
            echo "ignore $f ${width}x${height} width=$width<=minimal_width_ignore=$minimal_width_ignore"
        fi
    fi
done
width=1500px is the limit that decides whether a file is split or not
- a file with a width over 1500px will be split
- a file with a width of 1500px or less will not be split (it is copied as is)
- a file with a width of 10px or less is ignored
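To make the thresholds concrete, here is the same decision pulled out into two small functions that you can try without any image. This is only a sketch of the logic; page_action and crop_geometries are names I made up, and the two variables are the ones from the script:

```shell
#!/bin/bash
minimal_width=1500
minimal_width_ignore=10

# Print "split", "copy" or "ignore" for a given image width in pixels.
# (page_action is a hypothetical helper name.)
page_action() {
    local width=$1
    if [ "$width" -gt "$minimal_width" ]; then
        echo split
    elif [ "$width" -gt "$minimal_width_ignore" ]; then
        echo copy
    else
        echo ignore
    fi
}

# Print the two -crop geometries (left half, then right half)
# for an image of the given width and height.
crop_geometries() {
    local width=$1 height=$2
    local half=$(( width / 2 ))
    echo "${half}x${height}+0+0"
    echo "${half}x${height}+${half}+0"
}

page_action 3000           # split
page_action 1500           # copy (1500 is not greater than 1500)
page_action 8              # ignore
crop_geometries 3000 2000  # 1500x2000+0+0 and 1500x2000+1500+0
```

With an odd width the integer division drops one pixel column on the right edge, which should not matter for scanned pages.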
Make this script executable
$ chmod +x split_jpg_minw.sh
(you can use tab key to autocomplete the name of the file)
Run the script
$ ./split_jpg_minw.sh
The split files will be in the temp2/jpg folder.
Generate the new "split" PDF file.
$ convert -compress jpeg temp2/jpg/*.jpg my_output_file_splitted.pdf
You can add many more steps to your chain to produce the PDF file using bash scripting.
There is no limit... you just have to learn scripting (and a few code samples are sometimes much more useful than books).
For example, you can apply a filter to your pictures before generating the PDF file (for example to remove a Moiré pattern or to reduce noise) using command-line tools such as G'MIC.
Thanks, iCopy worked great where many others failed - and it's free and open source, too. – EMP – 2012-06-22T02:44:31.210
iCopy is a cool program, moreover, since version 1.6 it can create PDF documents without the need of an external program – Pincopallino – 2013-01-20T11:23:12.043
I just used this to scan roughly 30 double-sided pages, and I didn't have a single problem. When I accidentally started scanning a wrong page, it let me cancel the page and then repeat the page without losing prior work. – Sam – 2013-05-24T04:14:04.177