How to extract vectors from a PDF file?

54

9

I have a PDF file with vector images inside it. (I downloaded it from the internet, so I do not have any originals.)

I wish to extract the vectors so that I can overlay them on other images, use them in print media, or put them on a website.

How do I extract the specific vectors from the PDF file?

Is there, perhaps, any software which can extract the vectors from a PDF file? (Preferably free.)

Devdatta Tengshe

Posted 2011-06-25T11:16:43.340

Reputation: 1 656

@slhck: I meant vector graphics. I did use Inkscape, and it works as expected. Can you please post your comment as an answer, so that I can mark it as accepted?

– Devdatta Tengshe – 2011-06-25T12:04:34.580

Answers

50

You can use Inkscape, which is a free, open-source, cross-platform vector graphics application. It lets you import PDF files and select the embedded vectors. You can then edit and process them as you like.

Detailed documentation is available on the Inkscape website.

Note that on Linux it likely requires X11. There is also a native Windows version.

Alternatively, you may want to give Adobe Illustrator a go (paid software).

slhck

Posted 2011-06-25T11:16:43.340

Reputation: 182 472

2 On Linux it likely requires X11 - there is also a native Windows version (which I just used nicely for extracting a vector drawing from a PDF). – Mark Leighton Fisher – 2016-03-29T21:38:54.367

25

While Inkscape is an awesome way to do it, for those lacking X11, you can also extract individual pages of a PDF into SVG format using the poppler-utils at the command line. For example, to extract just page 30:

$ pdftocairo -f 30 -l 30 -svg  somehugemanual.pdf  myextractedpage.svg

You can then use your favorite vector editor (mine is Inkscape) to isolate the image from the text.

Alternatively, if you're a hardcore command-line user, you can extract to EPS (Encapsulated PostScript) and use sed to delete all the text (which, in pdftocairo's output, happens to sit between BT and ET lines). Here's how:

$ pdftocairo -f 30 -l 30 -eps  manual.pdf  - | sed '/^BT$/,/^ET$/ d' > myimage.eps
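To see what the sed range expression does on its own, here is a minimal sketch that runs the same filter over a hand-written sample. The function name strip_text_blocks and the sample lines are made up for illustration; real pdftocairo output is much longer and may not look exactly like this.

```shell
# Delete every block that starts with a line containing only "BT"
# and ends with a line containing only "ET" (both lines included).
strip_text_blocks() {
    sed '/^BT$/,/^ET$/d'
}

# Hypothetical sample: a path, then a text block, then another path.
printf '%s\n' \
    '0 0 m 10 10 l S' \
    'BT' \
    '/f-0-0 1 Tf' \
    '(Hello) Tj' \
    'ET' \
    '20 20 m 30 30 l S' | strip_text_blocks
```

Only the two path lines should survive; everything from BT through ET is dropped.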

And, if you're really determined to avoid X11, you can even shrink the bounding box of the image from the command line using Ghostscript's eps2eps command:

$ eps2eps myimage.eps myimage-bb.eps

I've tested this and it works great. However, personally, I find it easier to just use Inkscape.

hackerb9

Posted 2011-06-25T11:16:43.340

Reputation: 579

+1 your command line strips all text... but do you know how to also strip all images? I'm looking for a solution where only the vector graphic is left :) – clarkk – 2018-03-14T22:01:58.723

Does this work for you? cat foo.eps | sed '/^8 dict dup begin$/,/^Q$/ c Q' > nobitmaps.eps – hackerb9 – 2018-03-15T02:02:02.893

1 But is it then possible to check whether the EPS file even contains any vector graphics? :) – clarkk – 2018-03-15T08:36:58.660

1 I suppose you could use Ghostscript's eps2eps to distill it down to its smallest bounding box and see if it's completely empty. But this is beginning to become a new question. Feel free to ask and I'm sure if I don't answer, someone will. – hackerb9 – 2018-03-15T23:56:52.657
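The bounding-box idea could be sketched roughly like this. It only reads the %%BoundingBox comment from an EPS file that eps2eps has already rewritten. The function name eps_bbox_is_empty is made up, and the sketch assumes an empty page is reported with a zero-width or zero-height box (for example "0 0 0 0"), which is worth verifying against your Ghostscript version.

```shell
# Exit status 0 if the first concrete %%BoundingBox in the EPS file
# has zero (or negative) width or height, 1 otherwise.
# Assumption: an empty drawing ends up as e.g. "%%BoundingBox: 0 0 0 0".
eps_bbox_is_empty() {
    awk '/^%%BoundingBox:/ && $2 != "(atend)" {
             found = 1
             # Fields: $2=llx $3=lly $4=urx $5=ury
             empty = ($4 - $2 <= 0 || $5 - $3 <= 0)
             exit
         }
         END { exit !(found && empty) }' "$1"
}

# Intended use (needs Ghostscript, so left commented out here):
#   eps2eps myimage.eps myimage-bb.eps
#   if eps_bbox_is_empty myimage-bb.eps; then echo "nothing drawn"; fi
```

A file with no %%BoundingBox line at all is treated as "not empty", since nothing can be concluded from it.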

thanks for your help so far :) have created a question if you have time to answer https://stackoverflow.com/questions/49383387/how-to-get-bounding-boxes-of-elements-in-eps-files

– clarkk – 2018-03-20T11:49:16.287

On Arch Linux (and possibly other distros) epstoeps is named eps2eps – rien333 – 2018-08-30T13:03:12.577

The command that converts to EPS doesn't work for me. It seems the resulting EPS file is invalid – rtrtrt – 2019-05-23T13:54:27.387

@raffamaiden: I've updated the answer so the sed step is more robust. Try again. If it doesn't work, post a link to the PDF file. – hackerb9 – 2019-05-24T17:08:36.463

1

@hackerb9 thanks, now the EPS is readable, but the image has a really low resolution and some text still remains around it. The PDF is here, and the image is on page 7

– rtrtrt – 2019-05-25T07:43:39.570

Seems to be a misfeature in pdftocairo: it is rasterizing the image to 150dpi when converting to eps. You can change the resolution using -r 300. It doesn't have this problem when creating svg, so I'm looking into a workaround. – hackerb9 – 2019-05-27T06:57:20.880

I'm not sure why that PDF is troublesome, but I've written a little shell script, extractsurface.sh, to handle such files. It uses xml_grep to extract the image as SVG. Your image from page 7 is here.

– hackerb9 – 2019-05-28T12:07:20.837