How to extract an embedded image from an SVG file?

26

4

I have an SVG file that contains at least one embedded JPG/PNG image. I want to extract the JPG/PNG images from that SVG file and save them to disk.

I'm adding the inkscape tag as it is the program I use to edit SVG files, but I also accept solutions using other tools.

Denilson Sá Maia

Posted 2011-06-21T07:09:24.813

Reputation: 9 603

If nothing else, Python could probably do it with some custom glue code using lxml and PIL (or equivalent). – Keith – 2011-06-21T07:13:11.250

@Keith, indeed, I've just written a Python script to solve this question. It uses the built-in xml.etree library.

– Denilson Sá Maia – 2013-12-03T23:35:00.190

Answers

10

Finally, years later, I've written a script to correctly extract all images from an SVG file, using a proper XML library to parse the SVG code.

http://bitbucket.org/denilsonsa/small_scripts/src/tip/extract_embedded_images_from_svg.py

This script is written for Python 2.7 but should be quite easy to convert to Python 3. Even better, about 50 lines can be deleted after conversion to Python 3.4, due to the new features introduced in that version.
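Here is a minimal Python 3 sketch of the same approach: parse the SVG with a real XML library, then decode every base64 data: URI found on an `<image>` element. It is not the linked script. `extract_images`, the `EXTENSIONS` table and the output naming are assumptions for illustration.

```python
import base64
import re
import xml.etree.ElementTree as ET

SVG_IMAGE = "{http://www.w3.org/2000/svg}image"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Assumed MIME-type -> extension table; anything else is saved as "bin".
EXTENSIONS = {"image/png": "png", "image/jpeg": "jpg", "image/gif": "gif"}

# "data:<mime>;base64,<payload>"; DOTALL lets the payload span line breaks.
DATA_URI = re.compile(r"data:([^;,]+);base64,(.*)", re.DOTALL)


def extract_images(svg_source):
    """Return (extension, bytes) for every <image> holding a base64 data: URI.

    svg_source is the SVG document as str or bytes.
    """
    root = ET.fromstring(svg_source)
    images = []
    for elem in root.iter(SVG_IMAGE):
        # Inkscape writes xlink:href; SVG 2 also allows a plain href.
        href = (elem.get(XLINK_HREF) or elem.get("href") or "").strip()
        match = DATA_URI.match(href)
        if not match:
            continue  # linked (external) file, nothing embedded
        mime, payload = match.groups()
        # Drop the whitespace Inkscape uses to wrap long base64 lines.
        data = base64.b64decode("".join(payload.split()))
        images.append((EXTENSIONS.get(mime.strip().lower(), "bin"), data))
    return images
```

To save the images, loop over `enumerate(extract_images(open("drawing.svg", "rb").read()))` and write each `data` to a name such as `"drawing.svg.%d.%s" % (i, ext)`. The file name here is only an example.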

Denilson Sá Maia

Reputation: 9 603

Thanks, it works. But it's much slower than the PDF workaround. Have you thought about parallel processing? Right now, the script only uses a single CPU core/thread. – DanMan – 2018-08-30T10:11:11.867

@DanMan Unfortunately, making it parallel is not a magic solution to speed up anything. I'd need to profile the code in order to identify the bottleneck. If the bottleneck is XML parsing, I'm sorry, that part can't be done in parallel. Can you please send me by e-mail the exact SVG files that are too slow? Whenever I have some time, I may investigate the performance. – Denilson Sá Maia – 2018-09-01T07:57:17.473

Yeah, I tried it myself, and it turned out that XML parsing is the slow part, not decoding the images. That said, cElementTree is supposed to be faster. But maybe something like SAX would work better, too. – DanMan – 2018-09-01T12:37:34.253

@DanMan cElementTree is likely faster. However, on Python 3.3 and later, both are the same. At some point I'll likely update that script to Python 3.

– Denilson Sá Maia – 2018-09-18T03:38:29.937
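Since Python 3.3, `xml.etree.ElementTree` uses its C accelerator automatically, so importing `cElementTree` no longer gains anything (that module was removed in Python 3.9). A middle ground between a full tree and SAX is `ElementTree.iterparse`. It streams parse events, so each `<image>` can be decoded and discarded as soon as it is read. This mainly lowers peak memory on big files. Whether it is faster on a given file is untested here. `iter_embedded_images` is a hypothetical name.

```python
import base64
import xml.etree.ElementTree as ET

SVG_IMAGE = "{http://www.w3.org/2000/svg}image"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def iter_embedded_images(source):
    """Yield (mime_type, bytes) for each embedded image while parsing.

    source is a file name or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == SVG_IMAGE:
            href = (elem.get(XLINK_HREF) or elem.get("href") or "").strip()
            if href.startswith("data:") and ";base64," in href:
                mime, payload = href[len("data:"):].split(";base64,", 1)
                yield mime, base64.b64decode("".join(payload.split()))
        # Release the element's attributes (including the big base64 string)
        # and children now that it has been handled.
        elem.clear()
```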

30

My own solution (or... workaround):

  1. Select the image in Inkscape
  2. Open the built-in XML Editor (Shift+Ctrl+X)
  3. Select the xlink:href attribute, which will contain the image as a data: URI
  4. Copy the entire data: URI
  5. Paste that data: URI into a browser, and save it from there.

Alternatively, I can open the SVG file in any text editor, locate the data: URI and copy it from there.
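You can also skip the browser step, because a copied data: URI is only a MIME type followed by base64 text. Here is a small sketch that decodes a pasted URI. The name `decode_data_uri` and the extension table are assumptions.

```python
import base64

# Assumed MIME-type -> extension table; extend it for other formats.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/gif": ".gif"}


def decode_data_uri(uri):
    """Return (extension, bytes) for a base64 data: URI, e.g. one copied
    from Inkscape's XML Editor or from a text editor."""
    uri = "".join(uri.split())  # pasted text may contain line breaks
    if not uri.startswith("data:") or ";base64," not in uri:
        raise ValueError("not a base64 data: URI")
    mime, payload = uri[len("data:"):].split(";base64,", 1)
    # validate=True rejects anything outside the base64 alphabet.
    return EXTENSIONS.get(mime.lower(), ".bin"), base64.b64decode(payload, validate=True)
```

To save the result, write the returned bytes to a file named with the returned extension, e.g. `extracted.png`.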

Although this solution works, it's kinda cumbersome and I'd love to learn a better one.

Denilson Sá Maia

Reputation: 9 603

+1 - I exported a 3.5 MB image using this method, which took a while but worked. Somehow the "Extract Image" function did not work for me. – Martin – 2012-02-15T09:31:26.907

Please see also a command-line Python script for this purpose.

– Denilson Sá Maia – 2014-03-31T20:37:57.697

17

There's a better solution instead:

Go to Extensions -> Images -> Extract Image..., where you can save the selected raster image as a file. However, this extension behaves oddly and is rather slow (though it works perfectly well).

Another note: this extension is cumbersome and dies silently on very large images. Also, with a large number of raster images it can spike Inkscape's memory usage to horrendous levels (like 3 GB after only a handful of images extracted).

Because I've got about 20 SVG files with about 70 raster images in each, each image at least 1 MB in size, I needed a different solution. After a short check using Denilson Sá's tip, I devised the following PHP script, which extracts images from SVG files:

#!/usr/bin/env php
<?php

$svgs = glob('*.svg');

// MD5 hashes of every image already written, used to skip duplicates
$existing = array();

foreach ($svgs as $svg) {
    // One output directory per SVG file, e.g. "drawing.svg.images/"
    if (!is_dir("./{$svg}.images")) {
        mkdir("./{$svg}.images");
    }
    $lines = file($svg);
    $img = 0;
    foreach ($lines as $line) {
        // Finds at most one data: URI per line, and only if it is not wrapped
        if (preg_match('%xlink:href="data:([a-z0-9/+.-]+);base64,([^"]+)"%i', $line, $regs)) {
            $type = $regs[1];
            $data = $regs[2];
            $md5 = md5($data);
            if (!in_array($md5, $existing)) {
                $data = str_replace(' ', "\r\n", $data);
                $data = base64_decode($data);
                // "image/png" -> "png", "image/jpeg" -> "jpeg"
                $type = explode('/', $type);
                $save = "./{$svg}.images/{$img}.{$type[1]}";
                file_put_contents($save, $data);
                $img++;
                $existing[] = $md5;
            }
        }
    }
}

echo count($existing) . "\n";

This way I can get all the images I want, and md5 saves me from getting repeated images.
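The PHP script hashes the base64 text, so the same picture wrapped at different line lengths would get two different hashes. Here is a Python variant that hashes the decoded bytes instead and keeps the hashes in a set. `dedupe` is a hypothetical helper that takes (extension, bytes) pairs.

```python
import hashlib


def dedupe(images):
    """Keep only the first occurrence of each distinct image.

    images is an iterable of (extension, decoded_bytes) pairs.
    """
    seen = set()
    unique = []
    for ext, data in images:
        # Hash the decoded bytes, not the base64 text, so identical images
        # still match when their base64 was wrapped differently.
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((ext, data))
    return unique
```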

I bet there must be another way that is a lot simpler, but it's up to the Inkscape devs to do it better.

Johnny_Bit

Reputation: 283

@Johnny_Bit +1 for using an md5 sum to prevent duplicate files. I improved your script in an answer below.

– Ivan Z – 2017-02-25T14:59:19.413

Good: March 2019 and it worked grand with a reasonably big image, on a pretty old laptop/Ubuntu/Inkscape 0.48.4. Thanks! – gaoithe – 2019-03-03T17:09:34.350

Note: your script only supports a single data: URL per line, and does not support newlines inside the href attribute (Inkscape adds them to data URLs, and the MIME base64 spec even mandates that lines be no longer than 76 characters). Nice script for a quick hack, but it does not work with all kinds of SVG.

– Denilson Sá Maia – 2013-12-01T02:16:33.030
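Both limitations go away if the regular expression runs over the whole file instead of one line at a time, and if whitespace is stripped from the payload before decoding. This sketch still assumes double-quoted attributes and no character entities inside the URI, so a real XML parser remains the safer choice. `find_data_uris` is a hypothetical name.

```python
import base64
import re

# Matches xlink:href or href values holding a base64 data: URI. [^"]* also
# spans newlines, so wrapped base64 and several images per line both work.
HREF_RE = re.compile(r'(?:xlink:)?href="data:([^;"]+);base64,([^"]*)"')


def find_data_uris(svg_text):
    """Return (mime_type, bytes) for every base64 data: URI in svg_text."""
    return [(mime, base64.b64decode("".join(payload.split())))
            for mime, payload in HREF_RE.findall(svg_text)]
```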

5

As yet another workaround, you can save as PDF, then open that document with Inkscape.

Uncheck "embed images", and bingo, all the PNGs/JPEGs will be spewed out into your home directory.

Messy, but quicker than goofing about with the data: URL.

Nicholas Wilson

Reputation: 224

Where did you find that "embed images" option? – mik01aj – 2016-06-30T06:42:07.387

When you open the PDF document in Inkscape, it's on the next dialog. – Nicholas Wilson – 2016-06-30T13:32:45.430

I had a PDF from which I tried to extract an image by importing it in Inkscape. In that case, being able to do this on import rather than after import comes in even more handy. – user149408 – 2016-11-25T12:49:26.063

I'm not sure but this way any embedded ICC profiles seem to get lost in the process. The images I extracted straight from the SVG via that Python script have ICC profiles embedded. – DanMan – 2018-08-30T10:26:15.960

1

I improved @Johnny_Bit's PHP script. This new version handles SVG files with newlines inside the data: URIs. It extracts multiple images from each SVG file, saves them as external image files, and writes a rewritten SVG (named with the SVG_PREFIX prefix) that links to them. The SVG and image files live in the 'svg' directory, but you can change that with the constant SVG_DIR.

<?php

// Directory holding the input SVGs; images and rewritten SVGs go here too
define('SVG_DIR', 'svg/');
// Prefix for the rewritten SVG files that reference external images
define('SVG_PREFIX', 'new-');

$svgs = glob(SVG_DIR . '*.svg');
$external = array();  // md5 of base64 data => saved image file name
$img = 1;

foreach ($svgs as $svg) {
    echo '<p>';
    $svg_data = file_get_contents($svg);
    // Join wrapped lines so each data: URI becomes one unbroken string
    $svg_data = str_replace(array("\r\n", "\n", "\r"), "", $svg_data);
    $svg_file = substr($svg, strlen(SVG_DIR));
    echo $svg_file . ': ' . strlen($svg_data) . ' bytes';

    if (preg_match_all('|<image[^>]+>|', $svg_data, $images, PREG_SET_ORDER)) {
        foreach ($images as $image_tag) {
            if (preg_match('%xlink:href="data:([a-z0-9/+.-]+);base64,([^"]+)"%i', $image_tag[0], $regs)) {
                echo '<br/>Embedded image has been saved to file: ';

                $type = $old_type = $regs[1];
                $data = $old_data = $regs[2];
                $md5 = md5($data);
                if (array_key_exists($md5, $external)) {
                    $image_file = $external[$md5];
                } else {
                    $data = str_replace(" ", "\r\n", $data);
                    $data = base64_decode($data);
                    // Use the MIME subtype as the extension: "image/jpeg" -> "jpeg"
                    $type = explode('/', $type);
                    $image_file = substr($svg_file, 0, strlen($svg_file) - 4) . '-' . ($img++) . '.' . $type[1];
                    file_put_contents(SVG_DIR . $image_file, $data);
                    $external[$md5] = $image_file;
                }
                echo $image_file;
                // Point the <image> at the external file instead of the data: URI
                $svg_data = str_replace('xlink:href="data:' . $old_type . ';base64,' . $old_data . '"', 'xlink:href="' . $image_file . '"', $svg_data);
            }
        }
        // One rewritten copy per input file, e.g. svg/new-drawing.svg
        file_put_contents(SVG_DIR . SVG_PREFIX . $svg_file, $svg_data);
    }

    echo '</p>';
}

?>
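The same "externalize" idea can be sketched in Python with ElementTree, which parses the XML properly instead of relying on regular expressions over the raw text. The name `externalize`, the `<name>-<n>.<ext>` and `new-<name>.svg` naming, and the extension table are all assumptions that mirror the PHP version. Note that ElementTree drops XML comments and renames prefixes it doesn't know (for example Inkscape's own) to ns0, ns1, and so on. The result is still valid XML but noisier. Only `xlink:href` is handled here.

```python
import base64
import hashlib
import os
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Assumed MIME-type -> extension table.
EXTENSIONS = {"image/png": "png", "image/jpeg": "jpg", "image/gif": "gif"}

# Keep the usual prefixes when the tree is serialized again.
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)


def externalize(svg_path, out_dir):
    """Write embedded images to out_dir and point each <image> at its file.

    Returns the path of the rewritten SVG, out_dir/new-<name>.svg.
    """
    tree = ET.parse(svg_path)
    stem = os.path.splitext(os.path.basename(svg_path))[0]
    href_attr = "{%s}href" % XLINK_NS
    written = {}  # md5 of decoded bytes -> file name, so duplicates are reused
    for elem in tree.iter("{%s}image" % SVG_NS):
        href = (elem.get(href_attr) or "").strip()
        if not href.startswith("data:") or ";base64," not in href:
            continue
        mime, payload = href[len("data:"):].split(";base64,", 1)
        data = base64.b64decode("".join(payload.split()))
        key = hashlib.md5(data).hexdigest()
        if key not in written:
            ext = EXTENSIONS.get(mime.strip().lower(), "bin")
            name = "%s-%d.%s" % (stem, len(written) + 1, ext)
            with open(os.path.join(out_dir, name), "wb") as f:
                f.write(data)
            written[key] = name
        elem.set(href_attr, written[key])
    out_path = os.path.join(out_dir, "new-" + stem + ".svg")
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return out_path
```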

Ivan Z

Reputation: 131

0

Open your file in Inkscape and select the bitmap that you wish to export. Click File->Export Bitmap (Ctrl+Shift+E) and it should export only the selected bitmap.

Chris

Reputation: 813

I don't like this solution because it will re-encode the image. I would prefer a solution that extracts the image in its original format. – Denilson Sá Maia – 2013-12-01T00:21:31.893

Yes, it seems that Inkscape re-encodes the image, but it saves PNG images by default, so I am assuming that the re-encoding is at least lossless. – Chris – 2013-12-02T17:10:20.290

Well, not really. The embedded image might have had transformations (scaling, rotation…), might have been clipped, or might have something else applied that I'm not aware of. Inkscape will certainly export the selected object after applying all of these, which means this solution is not exactly lossless. – Denilson Sá Maia – 2013-12-03T23:39:20.720
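One rough way to see which embedded images Export Bitmap would alter is to look for rendering attributes on each `<image>` and on its ancestors. This sketch checks only the `transform`, `clip-path`, `mask` and `filter` attributes, which is an assumed, non-exhaustive list. It ignores CSS `style` properties, scaling through `width`/`height` and the root `viewBox`, so an empty result does not prove the export is lossless. `images_altered_on_export` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SVG_IMAGE = "{http://www.w3.org/2000/svg}image"
# Attributes that change how a bitmap is rendered (assumed, non-exhaustive).
RENDER_ATTRS = ("transform", "clip-path", "mask", "filter")


def images_altered_on_export(svg_text):
    """Return ids of <image> elements that are transformed, clipped, masked
    or filtered, either directly or through an ancestor element."""
    root = ET.fromstring(svg_text)
    flagged = []

    def walk(elem, inherited):
        affected = inherited or any(elem.get(a) for a in RENDER_ATTRS)
        if elem.tag == SVG_IMAGE and affected:
            flagged.append(elem.get("id"))
        for child in elem:
            walk(child, affected)

    walk(root, False)
    return flagged
```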