PDF has garbled text when copy pasting



I'm trying to copy and paste text from a PDF file.

However, whenever I paste the text, it comes out as a huge mess of garbled characters. The text looks like the following (this is just one small extract):

4$/)5=$13! ,4&1*%-! )5'$! 1$2$)&,$40! 65))! .*5)1! -#$! )/'8*/8$03! 
0/+$!6/9! -#/-! &,$4/-5'8! 090-$+! 1$2$)&,$40! .*5)1!1$25%$! 1452$40! 
/'1! &-#$4! 090-$+! 0&(-6/4$! %&+,&'$'-0! *0$1! .9! /,,)5%/-5&'! 
65))! .$!+*%#!+&4$! $2$')9! ./)/'%$13! #&6$2$43! -#/'! -#$!+5M! &(! 
)*+*+, C<88,?>8513AG<5A14, 

I've tried it in both Adobe and Foxit PDF readers. I did a 'Save as text' in Adobe Reader and the resultant text file is the same garbled text.

Any ideas how I can get this text out non-garbled? (Other than manual typing... there's a lot of text to extract.)


Posted 2010-05-05T13:53:18.337

Reputation: 1 593

Similar question: http://superuser.com/questions/119393/search-pdfs-with-non-standard-character-encodings

– Hugh Allen – 2011-03-01T05:46:09.567

I can also confirm this problem with OS X, at least as of 10.8.2. I've spent a bit of time going through the PDF file structure, but unfortunately I can't see any way to repair the damage. Acrobat Pro's "PreFlight" does report issues with the file when checking it against the PDF/A standard, and the Inventory report shows the glyphs being mapped against plainly wrong Unicode characters. I've raised a bug report with Apple - ID 12655651. I'll report back here if/when I get any updates. – KenD – 2012-11-08T09:48:22.813

Might be helpful: http://superuser.com/a/481510/153937

– Ankit – 2012-12-10T10:27:13.937

Try some screen OCR utilities (they work on images: take a screenshot and there you go), or here is a different way. (Just a guess, don't bite me for it. I used the first way back then. I hope there are more convenient ways.)

– Apache – 2010-05-05T13:56:42.170



The simplest way to get around this is to open the file in a recent version of Google Chrome, which has a built-in PDF viewer. Then you can use Chrome's search feature to find text, and copy-paste works correctly.

I would like to vote up pipitas's comment on Shiki's answer, but I don't have the creds :( The problem may be custom font encoding, not encryption. In Acrobat, click File -> Properties, then click the Fonts tab to see encoding, and the Security tab to see whether it's encrypted.
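If you don't have Acrobat, a rough stand-in for its Fonts tab can be scripted. This is a minimal sketch, not a real PDF parser: `list_fonts` is a hypothetical helper that scans the raw bytes for font dictionaries and reports their encoding and whether a `/ToUnicode` map is present (a missing one is a common reason copied text comes out garbled). It assumes the font objects are stored uncompressed; fonts packed into compressed object streams (common since PDF 1.5) are not seen.

```python
import re
import sys

# Matches "<num> <gen> obj ... endobj" blocks. Objects packed into compressed
# object streams (common since PDF 1.5) are NOT visible to this scan.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.DOTALL)


def list_fonts(data: bytes) -> list[dict]:
    """Return basic encoding info for each uncompressed font dictionary."""
    fonts = []
    for num, _gen, body in OBJ_RE.findall(data):
        # \b keeps /Type /FontDescriptor from matching.
        if not re.search(rb"/Type\s*/Font\b", body):
            continue
        base = re.search(rb"/BaseFont\s*/([^\s/\[\]<>()]+)", body)
        enc = re.search(rb"/Encoding\s*(/[A-Za-z]+|\d+\s+\d+\s+R|<<)", body)
        fonts.append({
            "object": int(num),
            "base_font": base.group(1).decode("latin-1") if base else None,
            # A standard name such as /WinAnsiEncoding, a reference to an
            # encoding dictionary (often a custom /Differences table), an
            # inline "<<" dictionary, or None (the font's built-in encoding).
            "encoding": enc.group(1).decode("latin-1") if enc else None,
            # Without /ToUnicode, viewers have to guess which character
            # each code stands for.
            "has_to_unicode": b"/ToUnicode" in body,
        })
    return fonts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for font in list_fonts(fh.read()):
            print(font)
```

A subset font (name prefixed like `ABCDEF+`) with a referenced custom encoding and `has_to_unicode` false is the pattern that matches the "custom font encoding" explanation.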



Reputation: 596

Indeed, custom font encoding was the culprit for me. However, Chrome wasn't the solution. I solved the problem partially with Ghostscript regenerating a PDF from the PS (I was lucky to have the PS source). Any character groups to which LaTeX applies ligatures (e.g. ff, c, fi, etc.) don't show up in the copied text of the PDF, which requires some editing when you copy/paste. – Fuhrmanator – 2015-01-28T19:43:35.557
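Regenerating the PDF from the PostScript source with Ghostscript, as in the comment above, can be scripted roughly like this. A minimal sketch, assuming Ghostscript is installed and on the PATH as `gs` (on Windows the console binary is usually `gswin64c`); `ps_to_pdf_cmd` is a hypothetical helper name. It only applies if you actually have the PS file, and per the comment it may not fix ligatures.

```python
import subprocess
import sys


def ps_to_pdf_cmd(ps_path: str, pdf_path: str, gs: str = "gs") -> list[str]:
    """Build a Ghostscript command line that renders PostScript into a new PDF."""
    return [
        gs,                          # e.g. "gswin64c" on Windows
        "-dSAFER",                   # restrict file operations from the input
        "-dBATCH",                   # exit when done instead of waiting for input
        "-dNOPAUSE",                 # don't stop between pages
        "-sDEVICE=pdfwrite",         # Ghostscript's PDF-writing output device
        f"-sOutputFile={pdf_path}",
        ps_path,
    ]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python ps2pdf_sketch.py input.ps output.pdf
    subprocess.run(ps_to_pdf_cmd(sys.argv[1], sys.argv[2]), check=True)
```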

Same problem with Chrome. – JinSnow – 2015-12-05T16:55:50.493


I discovered this problem with PDFs I created, and I believe I tracked down the source of the problem: using Mac OS X's Preview to reduce the PDF file size.

I had created some Quartz filters using ColorSync Utility to compress the images in PDFs and so reduce the overall file size, as described here: http://www.macosxhints.com/article.php?story=20031106133852693

I found that I can easily copy and paste text from the original (uncompressed) PDF file, but after running that PDF through a Reduce File Size filter I created, text copied from the resulting compressed PDF comes out garbled (looking like the strings you posted).

However running that same original PDF through Adobe Acrobat Pro's Document > Reduce File Size function, the resulting compressed PDF can successfully copy and paste text.

So, this is not totally helpful in your case, presuming that your PDF file was received from elsewhere and you can't get to the original version, if it was indeed compressed in some way. But that might be the explanation - that the file was mangled somehow in an effort to reduce the file size.

This might be useful for content creators running into similar problems copying and pasting text from PDFs - be careful using OS X Quartz filters to shrink your PDFs!

--edit-- I have also noticed this problem when combining PDFs with Preview. The two source PDFs can be copied and pasted fine, but when dragging a page from one file into the other file, then saving the combined PDF, the text in the combined document can't be copy/pasted. These are two documents both generated at the same time with Filemaker Pro 11 on Mac - I can't imagine they would have different encodings or any such thing.



Reputation: 21

I got a few PDF files from a Mac OS user. Selecting text is fine, but copy & paste just gives you garbage. I tried a bunch of PDF-to-Word converters, including Google Docs and Adobe's Save as Text; all of them give garbled text. – tigr – 2019-08-09T02:03:03.113

I suspect the OS X PDF shrinking is the culprit. Anyone out there aware of any means to "undo" such operation? Thanks! – tigr – 2019-08-09T02:05:03.277

I printed the PDF file to several (virtual) printers and got PDF files inflated to 4x the size. The printed files are apparently images: no text can be selected, whereas in the original it can (garbled, though). – tigr – 2019-08-09T09:35:00.050


There is another very easy way to make a workaround :)

Just print the document using CutePdf, Adobe 2 Pdf printer or any similar stuff. The bottom line is, that you need to print into the pdf format.

In many cases it will easily remove the problem.

Nick Olszanski


Reputation: 11


Solution that worked for me:

  • Upload the document to Google Drive/Docs
  • Google will import it (as of 2013) as a PDF
  • Open the PDF view and choose File > Open With > Google Docs
  • It will take about a minute to export the document

The results weren't perfect, but they got me 80% of the way there and provided enough text that I didn't have to retype everything!

Gavin Miller


Reputation: 1 706


SOLVED: (worked for me on Windows 8, Acrobat XI, Office 2010)

Option 1:

  1. Print from Acrobat using "Microsoft XPS Document Writer" Output is: "your file name.oxps"
  2. Open "...oxps" with XPS Viewer. *(see download link in comments below)
  3. Print to PDF (Acrobat PDF, or CutePDF), using the highest resolution (600 DPI).
  4. Open with Acrobat and use OCR (Searchable Image (Exact)) option.



  • Using the highest resolution and Searchable Image (Exact) will save your text without losing its clean appearance. Low resolution will make your text readable, but crappy looking.
  • Download Microsoft XPS (files): http://www.microsoft.com/en-us/download/details.aspx?id=11816
  • If you don't know what OCR is, or where to find Searchable Image (exact), or How to print using "Microsoft XPS Document Writer", PLEASE, Google it on your own, for your own best experiences.

*Download only if you do not have XPS installed.

Option 2:

Do similar, but save as image (png, tiff, ...), then you will have to combine all pages back in one "PDF" file.
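The same idea (throw away the broken text layer, render the pages at 600 DPI, then OCR them) can also be done with free command-line tools instead of XPS and Acrobat. This is a sketch that swaps in different tools from the ones above: it assumes Poppler's `pdftoppm` and Tesseract are installed and on the PATH, and the helper names are hypothetical.

```python
import glob
import os
import subprocess
import sys
import tempfile


def rasterize_cmd(pdf_path: str, prefix: str, dpi: int = 600) -> list[str]:
    # pdftoppm writes one PNG per page: <prefix>-1.png, <prefix>-2.png, ...
    # (page numbers are zero-padded to the width of the last page number).
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_cmd(image_path: str) -> list[str]:
    # Tesseract writes its result to <output base>.txt.
    base, _ext = os.path.splitext(image_path)
    return ["tesseract", image_path, base]


def ocr_pdf(pdf_path: str, dpi: int = 600) -> str:
    """Render each page to an image, OCR it, and return the combined text."""
    with tempfile.TemporaryDirectory() as workdir:
        prefix = os.path.join(workdir, "page")
        subprocess.run(rasterize_cmd(pdf_path, prefix, dpi), check=True)
        texts = []
        # Zero-padded page numbers mean a plain sort keeps the page order.
        for png in sorted(glob.glob(prefix + "-*.png")):
            subprocess.run(ocr_cmd(png), check=True)
            with open(os.path.splitext(png)[0] + ".txt", encoding="utf-8") as fh:
                texts.append(fh.read())
    return "\n".join(texts)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(ocr_pdf(sys.argv[1]))
```

Running `tesseract page-01.png page-01 pdf` instead should give a searchable single-page PDF per image, closer to Acrobat's Searchable Image output; you would still have to combine those pages into one file, as in Option 2.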



Reputation: 1

@Hennes Doing step 4 yields the error Acrobat could not perform OCR on this page because: This page contains renderable text – Fuhrmanator – 2015-01-28T19:59:28.283

'Renderable text' sounds like something which still needs to be drawn (rendered). Possibly that has already been done and stored as an OCR-able bitmap if you go via XPS. But that is just a guess. – Hennes – 2015-01-28T20:31:39.857

Steps 1, 2 and 3 seem a long way round when you could just skip to step 3, Print to PDF (e.g. from inside your PDF reader). No need to detour via XPS. – Hennes – 2013-03-25T00:41:16.773


There is a risk that the information won't be retrievable at all. PDF documents are essentially one document overlying another, one simple text, the other a picture. When you copy and paste from the document, you mark the text while looking at the picture, but what is copied to your clipboard is the corresponding piece of the text part.

Depending on the way the document is created, the quality and availability of the text part can differ greatly. If you save a word processor document in PDF format, using Acrobat, Word, a PDF printer driver or any other method, the quality will usually be excellent, since the text file can be created from the text of the original. Some special characters may become distorted, but plain text is usually fine.

If the document is created from a scanned image, however, the text part is typically created by OCR processing of the image, which can produce rather sorry results, especially if the original is less than optimal for the purpose.

A bad program used to create the PDF, or the wrong settings, might also cause the text part to become completely garbled, as could, conceivably, some kinds of encryption applied to the file after it has been created.

The bottom line is, if the text part of the document is really bad, there is no way to make it better. Your best bet would be to remove the text part altogether, and have the program redo the OCR process. I think that might be doable from within Acrobat, but I'm not entirely sure.



Reputation: 495


One possible reason for this could be that font embedding in the PDF was using a custom encoding, which is not correctly applied when copying text from the PDF.
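The extract in the question fits this explanation: each symbol seems to stand consistently for one character (`-#$` reads "the", `!` is the space, `090-$+` is "system"), which looks like a subset font that numbered its glyphs in its own order without a usable Unicode mapping. Under that assumption, a handful of words you can read on screen is enough to build a decoding table. A minimal sketch: `learn_mapping`, `decode` and the word pairs (read off the question's extract) are illustrative, and each font in the file (bold, italic, ...) may well need its own table.

```python
import re
import sys

# (garbled, plain) pairs read off the extract in the question.
SAMPLE_PAIRS = [
    ("!", " "), ("-#$", "the"), ("/'1", "and"), ("090-$+", "system"),
    ("1$2$)&,$40", "developers"), ("&,$4/-5'8", "operating"),
    ("65))", "will"), (".*5)1", "build"), ("1$25%$", "device"),
    ("4$/)5=$13", "realized,"),
]


def learn_mapping(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Build a garbled->plain character table from known word pairs."""
    table: dict[str, str] = {}
    for garbled, plain in pairs:
        if len(garbled) != len(plain):
            raise ValueError(f"length mismatch: {garbled!r} vs {plain!r}")
        for g, p in zip(garbled, plain):
            # A consistent substitution maps each symbol to exactly one character.
            if table.setdefault(g, p) != p:
                raise ValueError(f"{g!r} maps to both {table[g]!r} and {p!r}")
    return table


def decode(text: str, table: dict[str, str], unknown: str = "?") -> str:
    """Apply the table; symbols not learned yet become `unknown`."""
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
        elif ch.isspace():
            # Keep real whitespace (in the sample it seems to be added by
            # the extractor, next to the font's own space symbol "!").
            out.append(ch)
        else:
            out.append(unknown)
    # "!" plus the extractor's space gives double spaces; collapse them.
    return re.sub(r" {2,}", " ", "".join(out))


if __name__ == "__main__" and len(sys.argv) > 1:
    table = learn_mapping(SAMPLE_PAIRS)
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(decode(fh.read(), table))
```

Every `?` in the output points at a symbol you haven't learned yet (in the extract, `M` turns out to be "x" in "mix"), so you add another known word and run it again.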

You can apply different methods to save yourself from manually typing all of the content.

  1. Did you try to extract the text with one of the 'pdftotext.exe' tools downloadable throughout the 'net? (I'd recommend the one included in ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02pl4-win32.zip).
  2. The latest version of Acrobat Reader has an option "Save as Text...". This does not use copy'n'paste (which gave you the garbled text), but probably uses the same software routines as used for rendering the text on screen, and may therefore produce more usable results.
  3. If '2.' does not work, and if you have access to Acrobat Professional: try to re-distill the PDF using one of the font-embedding Distiller profiles.
  4. If '3.' does not work, despite you having access to Acrobat Professional: try to re-distill the PDF, but this time use the 'print as image' option (available via the 'Advanced' button in the lower left corner of the main print dialog). Make sure you use 600 dpi (although that may produce a huge file). Then open the resulting PDF in Acrobat Pro again and apply Acrobat's 'OCR' algorithm to it, which will add embedded text (not used for rendering on screen in the Reader, but used for searching and highlighting strings). Now you can try again to extract the text from this PDF, using any of the methods discussed above.
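Step 1 can be scripted together with a quick check that tells you whether to move on to steps 2-4. A sketch, assuming `pdftotext` (xpdf or Poppler) is on the PATH; `looks_garbled` and its 0.6 threshold are made-up heuristics for ordinary prose, not anything pdftotext provides. Since pdftotext reads the same font encoding data, a file with a broken mapping may well give the same garbage, which is when the re-distill/OCR steps come in.

```python
import subprocess
import sys


def pdftotext_cmd(pdf_path: str, txt_path: str) -> list[str]:
    # -layout keeps the physical page layout; -enc sets the output text encoding.
    return ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, txt_path]


def looks_garbled(text: str, threshold: float = 0.6) -> bool:
    """Rough check: in ordinary prose most non-space characters are letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True  # no text layer at all, so OCR is the only way
    letters = sum(c.isalpha() for c in chars)
    return letters / len(chars) < threshold


if __name__ == "__main__" and len(sys.argv) > 2:
    subprocess.run(pdftotext_cmd(sys.argv[1], sys.argv[2]), check=True)
    with open(sys.argv[2], encoding="utf-8") as fh:
        if looks_garbled(fh.read()):
            print("Extracted text looks garbled; try re-distilling or OCR.")
```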

Kurt Pfeifle


Reputation: 10 024

For me, using Acrobat Pro XI to reprint to PDF, but with *"Print as Image"* checked (at 600 dpi) in the Advanced... button/sub-dialog of the Print... dialog, was the trick. Then you can finally OCR the result properly. None of the other solutions mentioned on this page worked. Note: for a large document this may take a while, and the resulting PDF may be quite huge. – Glenn Slayden – 2018-02-01T06:28:14.627

@GlennSlayden: Glad my advice worked for you... What was missing in it that you thought it still didn't deserve an upvote? – Kurt Pfeifle – 2018-02-01T09:50:59.707

Um, I did upvote. It's still showing for me as '1'. My only complaint was that your answer was at the bottom and it took me a while to find it (not your fault...) – Glenn Slayden – 2018-02-01T16:20:41.113

Ok, @GlennSlayden, then that upvote must have been loooong ago (long before your comment above). – Kurt Pfeifle – 2018-02-01T19:07:35.997

No, I upvoted "12 hours ago" at the same time I wrote the comment... I still see a blue arrow which (I believe) means my vote is (the one) vote that's currently registered. And I do recall that it was '0' before I up-voted last night. – Glenn Slayden – 2018-02-01T19:11:07.740

Sorry then, @GlennSlayden. On my side it doesn't look like this answer received any upvotes in the last 3 months.... Yes, your interpretation of the blue arrow is correct. – Kurt Pfeifle – 2018-02-01T19:15:54.063


One of my users just reported the same issue (the PDF was created with Distiller for Windows): copied text came out garbled and he couldn’t search inside the document. I tried on my Mac and didn't find any issue. It turned out that I had used Apple’s Preview application, while he used Adobe Reader on his Windows machine. Then I tried Adobe Reader on my Mac and saw the same effect. To me it looks like:

  • Adobe Reader copies and searches the text as saved.

  • Apple’s Preview will copy and search after applying the encoding vector.

I can't say this for sure, but it would explain my observation. And it would indeed allow all kinds of re-encoding when saving combined/reduced files, as described in another post here: with Preview you can still get the text out again.

At first I thought it would be more logical to encode the embedded font subset as contiguous entries instead of leaving holes and keeping the original character positions. But then I realized that by applying an encoding vector to a font subset that keeps the original entries, frequently used characters can have fewer bits set to 1 in their byte and can be compressed better (it may lower the entropy of the overall text this way).
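The guess above is that Preview runs the character codes through the font's encoding vector before copying. To illustrate what that step involves, here is a sketch that parses a `/Differences` array (character codes to glyph names) and maps the glyph names to Unicode. The example array is hypothetical, chosen to mirror the codes seen in the question's extract (33 for the space, 35 for "h"), the helper names are made up, and the glyph-name table is only a tiny excerpt of the Adobe Glyph List. If a producer gave the glyphs meaningless names, this route can't recover anything either.

```python
import re

# Tiny excerpt of the Adobe Glyph List; a real decoder needs the full list.
GLYPH_TO_UNICODE = {"space": " ", "comma": ",", "period": ".", "hyphen": "-"}


def parse_differences(array_src: str) -> dict[int, str]:
    """Parse a /Differences array such as "[33 /space 35 /h /e]".

    A number sets the code of the next glyph name; every further name
    takes the following code.
    """
    codes: dict[int, str] = {}
    code = None
    for token in re.findall(r"/[^\s/\[\]]+|\d+", array_src):
        if token.startswith("/"):
            if code is None:
                raise ValueError("glyph name before any code")
            codes[code] = token[1:]
            code += 1
        else:
            code = int(token)
    return codes


def glyph_to_char(name: str) -> str:
    if len(name) == 1 and name.isalpha():
        return name  # names like "a" or "T" are the letters themselves
    if name.startswith("uni") and len(name) == 7:
        try:
            return chr(int(name[3:], 16))  # "uni0041" -> "A"
        except ValueError:
            pass
    return GLYPH_TO_UNICODE.get(name, "?")  # unknown or meaningless name


def apply_encoding(raw: str, differences: dict[int, str]) -> str:
    """Map each character code through the encoding vector to Unicode."""
    return "".join(glyph_to_char(differences.get(ord(c), ".notdef")) for c in raw)
```

With `parse_differences("[33 /space 35 /h /e 45 /t]")`, the garbled `-#$!` decodes to "the ", which is the kind of mapping a viewer that honours the encoding vector could apply.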



Reputation: 11


Uploading it to Google Docs and using the option View > Plain HTML gives copyable text that is around 80% correct, with a few spaces missing.

This thread, with an accepted answer to the same issue, explains this with a working example.



Reputation: 4 082


I have not tried the Google Docs option, as it is still not supported in my office. However, by printing the file to "ScanSoft PDF Create!" from "Acrobat 9" (which prints the entire file as an image) and opening the printed file in "Nuance PDF Converter" (it asked whether I wanted to make the image file searchable and editable, which I accepted), I was able to get a Word document I can easily copy and paste from. It's not perfect, though, with only around 80-90% accuracy. But hey, you still have the original PDF file to compare with and fix the parts that just can't be recovered. It saves you from typing the whole thing. My 2c.



Reputation: 1


I made some editable-text PDFs with an old version of Scansoft PDF Converter for Windows XP, and then combined the pages in Mac's Preview program. For each of the separate pages, I could search, copy and export text correctly from Adobe Reader on the Mac. When combined by Preview and saved as one file, all looked well on screen, but only a few passages were searchable/exportable correctly. That problem brought me here.

The posts here gave me some good pointers (thank you!). I looked at the file properties for fonts. The single page files from Win XP (where all is well) said the encoding was ANSI. The file combined in Preview (where copied text is garbled) showed encoding for most of the fonts as "Built-in" with a few as "Roman."

The solution to my problem was under my nose all the time — the Scansoft program itself can combine files. When I used Scansoft's combiner, and opened the file on the Mac, all fonts were shown as ANSI-encoded and all text exported/copied perfectly. Why on Earth I didn't combine them in PDF Converter in the first place, I don't know. Thanks, posters!

The same is true when opening the files on a Linux system.

I know this doesn't explain the Windows-only problems — unless the PDF had similar mixed origins?



Reputation: 1