Can't copy text from a pdf file

40

19

I am using Foxit PDF Reader to view my textbook. I would like to copy text from the PDF file into a Word document, but it won't let me. I can select the text fine, but the option to copy it is not available. I can copy text from some documents but not others. Is there a way to get around this protection in Windows?

Jonno_FTW

Posted 2009-09-27T11:06:58.840

Reputation: 1 280

I see my answer doesn't work for you, so you have posted a bounty. If you post somewhere an example of such a pdf, I will have a look at it. – harrymc – 2012-07-15T10:04:22.850

@harrymc: Specifically, I was looking to copy the values from table 6.15 of http://acousticslab.org/papers/VassilakisP2001Dissertation.pdf

– endolith – 2012-07-15T19:48:07.560

@endolith: See my new answer. – harrymc – 2012-07-15T21:11:09.570

Answers

29

The PDF file has probably been locked against copying text. Below are two ways to unlock it:

  1. If the PDF has not been locked against printing, you can print it to a virtual PDF printer to create an unlocked file. See:
    "Remove Password and Unlock Protected PDF Which Allowed To Be Printed Without Knowing Secret".
  2. If the print function has been locked out, see:
    "Remove Restrictions and Decrypt Password Protected PDF Files With PDF Unlocker".

harrymc

Posted 2009-09-27T11:06:58.840

Reputation: 306 093

You can see if the PDF is locked for copying. From the File menu choose Properties, and the Security tab says whether Content Copying is allowed. – Rob Sedgwick – 2015-09-09T19:40:39.227
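For anyone curious what that Security tab is reading: in a password-protected PDF the allowed operations are stored as a bit field (the /P entry of the encryption dictionary). Below is a rough sketch of decoding it, assuming the bit layout from the PDF specification's user access permissions table (bit 3 = print, bit 4 = modify, bit 5 = copy, bit 6 = annotate, counting from 1). The function name and the sample value -1852 are made up for illustration; they are not taken from any file discussed here.

```python
from typing import Dict

# Bit positions (1-based) from the PDF specification's permissions table.
# "Bit 3" therefore has the numeric value 1 << 2, and so on.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
}


def decode_permissions(p_value: int) -> Dict[str, bool]:
    """Translate a /P permissions integer into a dict of allowed operations."""
    # /P is written as a signed 32-bit integer, so mask it to 32 bits first.
    unsigned = p_value & 0xFFFFFFFF
    return {
        name: bool(unsigned & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }
```

With the hypothetical value -1852, this reports printing as allowed and copying as not allowed, which is the kind of combination that makes the print-to-PDF workaround possible.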

Tried printing the PDF. The printed file does not allow selecting text; it seems the text was converted to an image. – queezz – 2019-01-17T03:10:41.630

@queezz: The PDF must have contained the images to start with. – harrymc – 2019-01-17T07:35:37.100

@harrymc Yes, there are images. But text is also converted into images. Google Chrome option works well on the same document. – queezz – 2019-01-17T08:48:53.853

Your first link points to http://www.primopdf.com/installers/4.0.1/FreewarePrimo64Setup.exe, which no longer works, and it looks like it was never archived on archive.org either. Your second link is OK, but it leads to a file-sharing site, https://dfiles.eu/files/7kiqyvswk. The file itself is fine (checked with VirusTotal), but it is not easy to find, as there are various links on that mydigitallife page. It's where it says "PDF Unlocker is a free yet user-friendly tool which can be downloaded via the link here (current version 1.0.4)."

– barlop – 2019-04-15T08:13:11.697

25

  1. Open the PDF in Google Chrome (drag and drop the PDF file into Chrome).
  2. Print the particular page as PDF, or just open the print preview.
  3. Now you can copy the text from the print preview or the output PDF. But I don't think you can copy the table directly.

Khaleel

Posted 2009-09-27T11:06:58.840

Reputation: 822

Neither of those methods worked for me in Chrome 53. Has the loophole possibly been closed? – Simon East – 2016-08-25T02:42:16.437

https://docs.google.com/open?id=0B0U0hneaP_FcYWprOFpEbTVqdkk See my result. – Khaleel – 2012-07-16T09:58:33.123

This works for me, too. This is the easiest method I see here. – endolith – 2012-07-16T14:38:03.793

Absolutely brilliant. Oh, you can drag files to Chrome's tab bar to quickly open them, by the way. – iono – 2013-02-19T06:42:39.143

12

I was able to create a DRM-free version of your PDF file using Ghostscript (which is available for Windows).

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=stripped.pdf VassilakisP2001Dissertation.pdf

The resulting file stripped.pdf can be loaded in Adobe Reader, and Reader will happily allow you to copy any part of it you wish. It also preserves most of the formatting of the table.
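If you need to do this for more than one file, the command can be wrapped in a small script. This is only a sketch: the function names are invented here, and it assumes the Ghostscript console executable is on your PATH as gs (Linux/macOS) or gswin64c / gswin32c (the usual names of the Windows builds). The switches are the same ones as in the command above, plus the optional -sPDFPassword switch for files that need a password to open.

```python
import shutil
import subprocess
from typing import List, Optional

# Executable names to try; which one exists depends on the platform and install.
GS_CANDIDATES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript() -> Optional[str]:
    """Return the path of the first Ghostscript executable found on PATH."""
    for name in GS_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_gs_command(gs: str, src: str, dst: str,
                     password: Optional[str] = None) -> List[str]:
    """Build the argument list that rewrites src into dst with pdfwrite."""
    cmd = [gs, "-q", "-dNOPAUSE", "-dBATCH",
           "-sDEVICE=pdfwrite", f"-sOutputFile={dst}"]
    if password is not None:
        # Only needed when the PDF requires a password.
        cmd.append(f"-sPDFPassword={password}")
    cmd.append(src)
    return cmd


def rewrite_pdf(src: str, dst: str, password: Optional[str] = None) -> None:
    """Run Ghostscript on src; raises if Ghostscript is missing or fails."""
    gs = find_ghostscript()
    if gs is None:
        raise FileNotFoundError("Ghostscript executable not found on PATH")
    subprocess.run(build_gs_command(gs, src, dst, password), check=True)
```

Calling rewrite_pdf("VassilakisP2001Dissertation.pdf", "stripped.pdf") should be equivalent to the one-line command above.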

Michael Hampton

Posted 2009-09-27T11:06:58.840

Reputation: 11 744

If the PDF has a password, make sure to include the -sPDFPassword switch (-sPDFPassword=password). – palswim – 2017-08-16T23:02:34.817

This is brilliant. My tax accountant refuses to give me non-DRM PDFs, nor the password to remove DRM. This solves my problem. Excellent work! – kevinarpe – 2013-04-28T03:52:12.943

2

I was able to copy the table from your PDF file successfully using Okular (for Linux; part of KDE). To do this, I had to go into Okular's settings and uncheck "Obey DRM restrictions."

I'm aware that this doesn't really help you much since you're running Windows, but it is a possibility if you have a Linux machine handy or are willing to install it.

Unfortunately it was plain text with no formatting, but it looks like it shouldn't be too hard to recreate the table. You can see the results of my copy and paste adventure here.

Michael Hampton

Posted 2009-09-27T11:06:58.840

Reputation: 11 744

That's what VirtualBox is for. :D I can also copy the plain text without formatting, but by selecting one column at a time it is pretty easy to export. – endolith – 2012-07-15T23:19:14.487

Looks like this is best for tables of numbers, since Okular lets you do rectangular selection of text and extract a single column in order. – endolith – 2012-07-16T14:42:28.360

For single columns, probably so. For the whole table, see my other answer.

– Michael Hampton – 2012-07-16T14:44:32.490

Note that Okular can run on Windows. In fact, a lot of KDE software can run on Windows.

– Bakuriu – 2013-12-04T19:53:06.757
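For the column-at-a-time approach mentioned in the comments above, the pasted columns still have to be stitched back into rows. A minimal sketch of that step, assuming each column was pasted as plain text with one cell per line (the function name and the sample values in the note below are invented, not taken from table 6.15):

```python
import csv
import io
from typing import List


def columns_to_csv(columns: List[str]) -> str:
    """Combine separately pasted columns (one cell per line) into CSV text."""
    # Split each pasted column into its cells, ignoring surrounding blank lines.
    cells = [[c.strip() for c in col.strip().splitlines()] for col in columns]
    # Every column must have the same number of cells, or rows would misalign.
    if len({len(col) for col in cells}) > 1:
        raise ValueError("columns have different numbers of cells")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # zip(*cells) turns the list of columns into a sequence of rows.
    writer.writerows(zip(*cells))
    return buf.getvalue()
```

For example, columns_to_csv(["1\n2\n3", "0.5\n0.7\n0.9"]) gives three rows of two cells each, which can be saved as a .csv file and opened in a spreadsheet or pasted into Word as a table.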

1

If Copy is greyed out, as it no doubt is for you, then the PDF is 'locked': it can be read, but it does stop you from copying and pasting anything from it.

This website will unlock a PDF:

https://smallpdf.com/unlock-pdf

barlop

Posted 2009-09-27T11:06:58.840

Reputation: 18 677

1

You can use GT Text, a program that converts images (including snapshots of PDF pages) to text. You select an area and it copies the recognized text to the clipboard. It is free.

The official home page is http://gttext.googlecode.com

David

Posted 2009-09-27T11:06:58.840

Reputation: 11

0

If you're just looking for short snippets, you can often type a few words into Google inside quote marks and find the exact quote already scanned in some other format or typed by someone else.

Another option is "Document from Photo" in the Google Docs Android app, which will put the text through OCR. This is error-prone, of course.

I wish PDF locking functionality never existed. :(

endolith

Posted 2009-09-27T11:06:58.840

Reputation: 6 626

0

Answer to endolith:

Your PDF is protected against copying, but is not protected against printing.

So I have printed the one page containing table 6.15 into another PDF that is not protected against copying, selected and copied the table, then pasted it into Word. To my great surprise the result of the paste was utter rubbish.

I have now taken a further look at this table and found a very surprising result: this is not a table!

It is actually a montage of small pieces of text, positioned on the page so as to look like a table. But this is not a real table.

The best you can do is either rewrite the whole thing as a table, or just use in your work a screenshot of this table-like assembled text.

Here is my screenshot of the table, as taken from my generated one-page PDF document:

image
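To illustrate why a "montage of small pieces of text" pastes as rubbish, and what rebuilding it as a table involves: each piece only has a position on the page, so rows have to be inferred by grouping pieces at roughly the same height. The sketch below assumes you already have the pieces as (x, y, text) tuples from some extraction tool, with y measured from the top of the page. The function name, the tolerance, and the coordinates used in the note below are all hypothetical.

```python
from typing import List, Tuple

# One positioned piece of text: (x, y, text). Hypothetical layout data.
Fragment = Tuple[float, float, str]


def fragments_to_rows(fragments: List[Fragment],
                      y_tolerance: float = 2.0) -> List[List[str]]:
    """Group fragments into rows by y position, then order each row by x."""
    rows: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: f[1]):
        # A fragment joins the current row if it is close in height to the
        # first fragment of that row; otherwise it starts a new row.
        if rows and abs(frag[1] - rows[-1][0][1]) <= y_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    # Sorting the tuples orders each row left to right by x.
    return [[text for _, _, text in sorted(row)] for row in rows]
```

Given pieces such as (40.0, 50.0, "A"), (80.0, 50.2, "1.0") and (40.0, 70.0, "B"), the first two end up in one row and the third starts a second row. Real pages are messier (wrapped cells, uneven baselines), so treat this as a starting point only.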

harrymc

Posted 2009-09-27T11:06:58.840

Reputation: 306 093

I tried to print it with 2 programs but all I got was a blank page. – endolith – 2012-07-15T22:56:12.587

Using Foxit Reader, I went to the page, then printed the current page to a PDF printer (I used CutePDF Writer). I will try to analyze the problem with copying the table this evening.

– harrymc – 2012-07-16T05:48:33.430

I tried PrimoPDF and qvPDF (which uses GhostScript) – endolith – 2012-07-16T14:33:07.353

See my above addition. – harrymc – 2012-07-17T08:07:14.730

...I also uploaded my one-page PDF here (60-second wait time).

– harrymc – 2012-07-17T08:12:14.927

0

Another possibility is Evince.

In Windows, it seems to support copying by default.

In Linux, copying can be enabled by checking the override_restrictions setting if it isn't already, following these directions (dconf-editor > org > gnome > evince > override_restrictions).

endolith

Posted 2009-09-27T11:06:58.840

Reputation: 6 626

0

This managed to convert basic text. It struggled with tables, though.

http://www.onlineocr.net/documents

Rob Sedgwick

Posted 2009-09-27T11:06:58.840

Reputation: 444