Automating the choice between JPEG and PNG with a script

Choosing the right format to save your images in is crucial for preserving image quality and reducing artifacts. Different formats follow different compression methods and come with their own set of advantages and disadvantages.

JPG, for instance, is suited to real-life photographs that are rich in color gradients. The lossless PNG, on the other hand, is far superior when it comes to schematic figures.

Picking the right format can be a chore when working with a large number of files. That's why I would love to find a way to automate it.


A little bit of background on my particular use case:

I am working on a number of handouts for a series of lectures at my university. The handouts are rich in figures, which I have to extract from PDF-formatted slides. Extracting these images gives me lossless PNGs, which are needlessly large at times.

Converting these particular files to JPEG can shrink them to less than 20% of their original file size while maintaining the same quality. This is important because working with hundreds of large images in word processors is pretty crash-prone.

Batch converting all extracted PNGs to JPEGs is not an option I am willing to follow, as many if not most images are better suited to be formatted as PNGs. Converting these would result in insignificant size reductions and sometimes even increases in filesize - that's at least what my test runs showed.


What we can take from this is that file size after compression can serve as an indicator of which format is best suited to a particular image. It's not a particularly accurate predictor, but it works well enough. So why not put it to use in a script?

I included inotifywait because I would prefer the script to be executed automatically as soon as I drag an extracted image into a folder.

This is a simpler version of the script that I've been using for the last couple of weeks:

#!/bin/bash
inotifywait -m --format "%w%f" --exclude '\.jpg$' -r -e create -e moved_to --fromfile '/home/MHC/.scripts/Workflow/Conversion/include_inotifywait' | while IFS= read -r file; do
    mogrify -format jpg -quality 92 "$file"
done

The advanced version of the script would have to

  • be able to handle spaces in file names and directory names
  • preserve the original file names
  • flatten PNG images if an alpha value is set
  • compare the file size between the temporary converted image and its original
  • determine if the difference is greater than a given percentage
  • act accordingly

The actual conversion could be done with imagemagick tools:

convert file.png -background white -flatten -quality 92 file.jpg
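The bookkeeping steps from the list above (deriving the target file name, comparing the two sizes against a percentage) can be sketched in plain shell. This is only a rough sketch under assumptions: the function names `jpg_name_for` and `jpeg_is_smaller_enough` are made up for illustration, the input is assumed to have a file extension, and the 50% used in the usage line is an arbitrary example value rather than a tested recommendation.

```bash
#!/bin/bash
# Hypothetical helpers; names and thresholds are illustrative assumptions.

# Print the JPEG path for a given image path, keeping directory and base name.
# Spaces are safe because the expansion is quoted.
# Assumes the file name actually has an extension to strip.
jpg_name_for() {
    printf '%s\n' "${1%.*}.jpg"
}

# Succeed (exit status 0) if the converted file is at most MAX_PERCENT
# of the original file's size.
# Usage: jpeg_is_smaller_enough ORIG_BYTES CONVERTED_BYTES MAX_PERCENT
jpeg_is_smaller_enough() {
    [ "$1" -gt 0 ] || return 1          # avoid dividing by zero
    ratio=$(( $2 * 100 / $1 ))          # integer percentage, rounded down
    [ "$ratio" -le "$3" ]
}
```

With `stat -c%s` supplying the two byte counts, a call such as `jpeg_is_smaller_enough "$orig_size" "$jpg_size" 50` would then decide whether the JPEG is kept or thrown away.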

Unfortunately, my bash skills aren't even close to advanced enough to convert the scheme above into an actual script, but I am sure many of you can.

My reputation points on here are pretty low, but I will gladly award the most helpful answer with the highest bounty I can set.

References: http://www.formortals.com/introducing-cnb-imageguide/, http://www.turnkeylinux.org/blog/png-vs-jpg

Edit: Also see my comments below for some more information on why I think this script would be the best solution to the problem I am facing.

Glutanimate

Posted 2012-11-05T02:31:52.350

Reputation: 314

Not an expert, so take what I say with a healthy dose of sodium chloride: I think you may be underestimating the ability of PNG to handle photographs. The gradient issues you illustrate look to be a software artifact, not an intrinsic feature of the format. You might also consider using JPEG 2000, a format which addresses some of the shortcomings of JPEG.

– Isaac Rabinovitch – 2012-11-05T02:42:33.693

@IsaacRabinovitch You're absolutely right, the image comparison might be a bit misleading. The quality issues are restricted to JPG files. Photographs formatted in PNG render perfectly fine and without any artifacts but are significantly larger than their JPG counterparts. It's that unnecessary additional size I am trying to eliminate. As to why I didn't consider JPEG2000, it's not supported by LibreOffice, which makes it a no-go for me (even though it seems to be a pretty good format, indeed). – Glutanimate – 2012-11-05T02:56:32.340

Answers

2

Edit: Fixed some issues with the original script. Added an alternative one based on Marcks Thomas' proposition.

Edit 2: Updated cutoff values based on a number of test runs. I am still not sure how to estimate file sizes for greyscale images. If you are working with a large number of images outside of RGB colour schemes you might want to implement the first script as a fallback mode to the second one.

Edit 3: Added optipng integration. This optimizes PNG file sizes without any quality loss. See here for more information. Some smaller improvements.


Version 0.1

Important note: This script is deprecated. Newer versions are far more efficient.

Alright, my question might have been slightly too localized, so I put some time into it and compiled the script myself:

#!/bin/bash

# AUTHOR:   (c) MHC (http://askubuntu.com/users/81372/mhc)
# NAME:     Intelliconvert 0.1
# DESCRIPTION:  A script to automate and optimize the choice between different image formats.
# LICENSE:  GNU GPL v3 (http://www.gnu.org/licenses/gpl.html)
# REQUIREMENTS:  Imagemagick

ORIGINAL="$1"

###Filetype check###

MIME=$(file -ib "$ORIGINAL")

if [ "$MIME" = "image/png; charset=binary" ]
  then
    echo "PNG Mode"

###Variables###

      ##Original Image##
    FILENAME=$(basename "$ORIGINAL")
    PARENTDIR=$(dirname "$ORIGINAL")
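    SUBFOLDER=$(echo "$PARENTDIR" | cut -d"/" -f10-)   # everything from the 10th "/"-separated field on; adjust to your own directory depth
    ORIGARCHIVE="$HOME/ORIG"   # "~" is not expanded inside quotes, so use $HOME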

      ##Converted Image##
    TEMPDIR="/tmp/imgcomp"
    CONVERTED="$TEMPDIR/$FILENAME.jpg"

      ##Image comparison##
    DIFFLO="50"
    DIFFHI="75"
    CUTOFF="1000000"

      ##DEBUG
    echo "#### SETTINGS ####"
    echo "Filepath to original = $ORIGINAL"
    echo "Filename= $FILENAME"
    echo "Parent directory = $PARENTDIR"
    echo "Archive directory = $ORIGARCHIVE"
    echo "Temporary directory = $TEMPDIR"
    echo "Filepath to converted image = $CONVERTED"
    echo "Low cut-off = $DIFFLO"
    echo "High cut-off = $DIFFHI"

###Conversion###

    mkdir -p "$TEMPDIR"   # the temporary directory may not exist yet
    convert "$ORIGINAL" -background white -flatten -quality 92 "$CONVERTED"

###Comparison###

    F1=$(stat -c%s "$ORIGINAL" )
    F2=$(stat -c%s "$CONVERTED" )
    FQ=$(echo "($F2*100/$F1)" | bc)

      #Depending on filesize we use a different Cut-off#
    if [ "$F1" -ge "$CUTOFF" ]
      then
        DIFF="$DIFFHI"
      else  
        DIFF="$DIFFLO"
    fi

      ##DEBUG
    echo "### COMPARISON ###"
    echo "Filesize original = $F1 Bytes"
    echo "Filesize converted = $F2 Bytes"
    echo "Chosen cut-off = $DIFF %"
    echo "Actual Ratio = $FQ %"


    if [ "$FQ" -le "$DIFF" ]
      then
           echo "JPEG is more efficient, converting..."
           mv -v "$CONVERTED" "$PARENTDIR"
           mkdir -p "$ORIGARCHIVE/$SUBFOLDER"
           mv -v "$ORIGINAL" "$ORIGARCHIVE/$SUBFOLDER"
      else
           echo "PNG is fine, exiting."
           rm -v "$CONVERTED"
    fi


  else
    echo "File does not exist or unknown MIME type, exiting."

fi

The script works great in combination with Watcher.
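For anyone who would rather stay with inotifywait than install Watcher, the hookup might look roughly like the sketch below. The function name `watch_and_convert` and the paths in the usage line are placeholders, not something taken from my setup.

```bash
#!/bin/bash
# Hypothetical wrapper: call a handler script for every PNG that shows up
# in a watched folder (recursively).
# $1 = directory to watch, $2 = script to call with the new file's path
watch_and_convert() {
    inotifywait -m -r -q -e create -e moved_to --format '%w%f' "$1" |
    while IFS= read -r path; do        # IFS= and -r keep spaces and backslashes intact
        case $path in
            *.png|*.PNG) "$2" "$path" ;;   # ignore the JPEGs the script itself creates
        esac
    done
}
```

It would be started with something like `watch_and_convert "$HOME/Handouts" /path/to/intelliconvert.sh`. If files sometimes get picked up before they are completely written, watching `close_write` instead of `create` may be the safer choice.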

This is my first proper script, so there might be some unresolved bugs and issues I just didn't see. Feel free to use it for yourself and improve it. If you do so, I'd appreciate it if you could leave a comment here, so that I can learn from it.


Version 0.2.1

A more efficient way of finding the right format can be achieved by comparing the original's file size to its estimated size as an uncompressed image:

#!/bin/bash

# AUTHOR:   (c) MHC (http://askubuntu.com/users/81372/mhc)
# NAME:     Intelliconvert 0.2.1
# DESCRIPTION:  A script to automate and optimize the choice between different image formats.
# LICENSE:  GNU GPL v3 (http://www.gnu.org/licenses/gpl.html)
# REQUIREMENTS:  Imagemagick, Optipng

################ Filetype Check#################

MIME=$(file -ib "$1")

if [ "$MIME" = "image/png; charset=binary" ]
  then
    echo "###PNG Mode###"

####################Settings####################

##Folders##
ORIGARCHIVE="$HOME/ORIG"   # "~" is not expanded inside quotes, so use $HOME

##Comparison##
DIFFLO="25"
DIFFHI="20"
CUTOFF="1000000"

################################################

###Variables###

ORIGINAL="$1"
FILENAME=$(basename "$ORIGINAL")
PARENTDIR=$(dirname "$ORIGINAL")
SUBFOLDER=$(echo "$PARENTDIR" | cut -d"/" -f10-)
CONVERTED="$PARENTDIR/$FILENAME.jpg"

#DEBUG#
    echo "###SETTINGS###"
    echo "Filepath to original = $ORIGINAL"
    echo "Filename= $FILENAME"
    echo "Parent directory = $PARENTDIR"
    echo "Archive directory = $ORIGARCHIVE"
    echo "Filepath to converted image = $CONVERTED"
    echo "Low cut-off = $DIFFLO"
    echo "High cut-off = $DIFFHI"


###Image data###

        WIDTH=$(identify -format "%w" "$ORIGINAL")
        HEIGHT=$(identify -format "%h" "$ORIGINAL")
        ZBIT=$(identify -format "%z" "$ORIGINAL")
        COL=$(identify -format "%[colorspace]" "$ORIGINAL")
        F1=$(stat -c%s "$ORIGINAL")

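        # Newer ImageMagick releases report "sRGB" instead of "RGB" for ordinary colour images
        if [ "$COL" = "RGB" ] || [ "$COL" = "sRGB" ]
          then
              CHANN="3"
          else
              CHANN="1"
        fi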


###Cutoff setting###

    if [ "$F1" -ge "$CUTOFF" ]
      then
        DIFF="$DIFFHI"
      else  
        DIFF="$DIFFLO"
    fi


###Calculations on uncompressed image###

        BMPSIZE=$(echo "($WIDTH*$HEIGHT*$ZBIT*$CHANN/8)" | bc)
        FR=$(echo "($F1*100/$BMPSIZE)" | bc)

#DEBUG#

        echo "###IMAGE DATA###"
        echo "Image Dimensions = $WIDTH x $HEIGHT"
        echo "Colour Depth = $ZBIT"
        echo "Colour Profile = $COL"
        echo "Channels = $CHANN"
        echo "Estimated uncompressed size = $BMPSIZE"
        echo "Actual file size = $F1"
        echo "Estimated size ratio = $FR %"
        echo "Cutoff at $DIFF %"

###Backup###

        echo "###BACKUP###"
        mkdir -p "$ORIGARCHIVE/$SUBFOLDER"  #keep the original folder structure
        cp -v "$ORIGINAL" "$ORIGARCHIVE/$SUBFOLDER"
        echo ""

###Comparison###

    if [ "$FR" -ge "$DIFF" ]
      then
          echo "JPEG is more efficient, converting..."
          convert "$ORIGINAL" -background white -flatten -quality 92 "$CONVERTED"
          echo "Done."
          echo "Cleaning up..."
          rm -v "$ORIGINAL"
      else
          echo "PNG is fine, passing over to optipng."
              echo "Optimizing..."
              optipng "$ORIGINAL"
              echo "Done."
    fi

################ Filetype Check#################

  else
    echo "File does not exist or unknown MIME type, exiting."

fi

Props to @Marcks Thomas for the great idea.
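To make the arithmetic behind version 0.2.1 easier to follow, here is the size estimate pulled out into small functions. Treat it as a sketch: the function names are invented for this example, an alpha channel is not counted, and the colorspace-to-channel mapping only covers the cases I am reasonably sure about (anything that is not RGB or sRGB is treated as single-channel).

```bash
#!/bin/bash
# Hypothetical helpers that mirror the BMPSIZE / FR calculation of the script.

# Map an ImageMagick colorspace name to a channel count (alpha ignored).
channels_for_colorspace() {
    case $1 in
        sRGB|RGB) echo 3 ;;
        *)        echo 1 ;;   # Gray and anything unrecognised
    esac
}

# Estimated uncompressed size in bytes.
# Usage: raw_size WIDTH HEIGHT BITS_PER_CHANNEL CHANNELS
raw_size() {
    echo $(( $1 * $2 * $3 * $4 / 8 ))
}

# Actual file size as an integer percentage of the estimated raw size.
# Usage: size_ratio_percent FILE_BYTES RAW_BYTES
size_ratio_percent() {
    echo $(( $1 * 100 / $2 ))
}
```

As a worked example: a 1024 x 768 RGB image at 8 bits per channel comes out at 1024 * 768 * 8 * 3 / 8 = 2359296 bytes uncompressed. A PNG of 200000 bytes is then at 8 % of that, well below the 25 % cut-off, so it stays a PNG.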

Glutanimate

Posted 2012-11-05T02:31:52.350

Reputation: 314

Here's a thought that might make the script significantly more efficient. Compare the PNG's filesize to its estimated uncompressed size based on the dimensions and color depth. Photographic images might score a compression ratio of about 50%, but even complicated schematics should easily get below 10%. If the ratio is under a lower limit, you can assume PNG to be the better choice and skip converting to JPG altogether. – Marcks Thomas – 2012-11-10T11:19:03.877

@MarcksThomas: Great idea, thank you! I added an alternative script based on your proposition. The ratios still need a bit of tweaking, but it works fine otherwise. There's just one problem with grayscale images. BMP sizes aren't calculated correctly for them and I just can't figure out why. It's a minor issue, really, as almost all images I work with are in colour, but it's still annoying. – Glutanimate – 2012-11-11T06:28:17.890

1

PDFs support both lossless encodings and JPEG encoding. (JPEG 2000 as well.) You might want a tool which extracts the image data directly from the PDF and leaves it as is. Pdfimages, from the XPDF package, will do this.

ImageMagick will actually render the entire page and extract it. That discards the choice of encoding that's already stored in the PDF, and isn't the best way to do this task.
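For reference, pulling the embedded images out of a single page with pdfimages could look something like this. `-j` writes JPEG-encoded images as .jpg files, while `-f` and `-l` set the first and last page to process. The wrapper name `extract_page_images` and the file names in the usage line are only illustrative.

```bash
#!/bin/bash
# Hypothetical wrapper around pdfimages for one page of a PDF.
# $1 = PDF file, $2 = page number, $3 = output prefix for the image files
extract_page_images() {
    pdfimages -j -f "$2" -l "$2" "$1" "$3"
}
```

A call such as `extract_page_images "lecture 03.pdf" 12 figures/lecture03` would write the images of page 12 to files starting with `figures/lecture03`.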

Alan Shutko

Posted 2012-11-05T02:31:52.350

Reputation: 3 698

I should have clarified how I extract the images. Because I tend to use only a few figures of the ones provided in each set of slides, it would be impractical to extract all at once. So I am using evince (the default PDF viewer on Ubuntu) to drag and drop the files I want into their respective folders. From there I embed them into the documents I am editing. Evince uses PNG only to save images, which is a disadvantage, of course. I just tried XPDF with a few documents, but it seems to be broken on Precise.

– Glutanimate – 2012-11-05T03:15:06.183

However, even if I did find a PDF viewer that extracts all embedded images in their original format there would still be no guarantee that the embedded format is the right one. If I wanted to minimize document size I would still have to go through each file and determine whether it's in the right format or not. – Glutanimate – 2012-11-05T03:25:50.110

Are you sure that it's not practical to extract all the images, then sort through them in another program? Even if xpdf isn't working for you, pdfimages might (which is in the xpdf-utils package). It's likely that the compression in the PDF is a good one, depending on what software created the PDF. But there might be a way to determine the best format based on number of colors. (Or, write a script to compress it both ways and figure out which is larger, which was your suggestion.) – Alan Shutko – 2012-11-05T03:29:44.443

On average I would say that I use about 10% of the images in each set of slides. So sorting through all the extracted images afterwards isn't very practical. PDF image extraction utilities also tend to interpret non-image elements such as backgrounds and frames as images, which makes this task even more difficult. – Glutanimate – 2012-11-05T03:55:05.667

I also did some test runs with pdfimages (with the -j parameter set). Out of ~300 images only 2 were recognized as jpeg. Everything else was saved as ppm, which is akin to the bmp format (lossless, uncompressed). Another advantage of the script I proposed is that it would be compatible with all image sources, be it extracted PDF figures or downloaded files. But thank you anyway!

– Glutanimate – 2012-11-05T03:59:37.847

Darn, it was worth a try! – Alan Shutko – 2012-11-05T12:33:20.177

Even though your answer didn't turn out to be the solution I was searching for, you still helped me and for that I am grateful. Have this bounty as a simple token of gratitude :). – Glutanimate – 2012-11-13T13:11:04.353