How to find out why text is not searchable in a PDF (and make it searchable)

4

1

I have a PDF article (not created by me), but I cannot search for text in it. All PDF viewers I've tried return zero results for words that are obviously in there. I've tried Adobe Acrobat Professional 8, SumatraPDF and Google Chrome.

How can I find out why the document is not searchable?

Things I've checked:

  • The PDF producer is reported as 'pdftopdf' and the PDF version is reported as 1.3. However, it seems to have been created in something like MS Word or OpenOffice (but not *TEX).
  • It is definitely not a scanned document, as the font is crisp and clear at all zoom levels, and the text is selectable.
  • If I look at the security settings (Ctrl-D in Adobe Acrobat), everything is allowed (printing, copying, ...).
  • My search options do not have 'match case' turned on.
  • I cannot turn it into a searchable document using Acrobat's 'Recognize text using OCR', as it reports: 'This page contains renderable text'.

So, what else could be the reason for the PDF not being searchable? And how can I make it text-searchable?

Rabarberski

Posted 2013-03-06T09:45:02.073

Reputation: 7 494

Interesting. Does the document contain any sensitive data? If not, can you share it? – SparKot – 2013-03-06T09:49:53.757

@SparKot: I am not sure if I can share the document, so I would rather not, although I understand this would greatly aid in troubleshooting. – Rabarberski – 2013-03-06T10:02:32.847

Have you tried to upload it to Evernote and check if they can make it searchable? AFAIK they have a good OCR engine for that task. – ChaosCakeCoder – 2013-03-06T10:17:22.660

Answers

7

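  • It may have a custom font encoding that assigns codes to characters in a way that is incompatible with established encodings such as ASCII or Unicode, so the viewer cannot map the stored codes back to the letters you type in the search box.

  • It may render characters individually and out of sequence, so the letters of a word are not adjacent in the file.

  • It may have had its characters flattened to paths, in which case there is no text left to search.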
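
To illustrate the first point, here is a toy sketch in Python. It is not how any particular PDF producer works; the numbering scheme and function names are made up for illustration. It only shows why a search fails when glyph codes are assigned arbitrarily and no mapping back to real characters is available.

```python
# Toy model: a PDF page stores glyph codes, not Unicode text.
# The code assignment scheme below is invented for illustration only.

def make_custom_encoding(text):
    """Assign codes 1, 2, 3, ... to characters in order of first appearance."""
    char_to_code = {}
    for ch in text:
        if ch not in char_to_code:
            char_to_code[ch] = len(char_to_code) + 1
    return char_to_code

def naive_extract(codes):
    """Without a usable mapping, each code is treated as if it were a character."""
    return "".join(chr(code) for code in codes)

def extract_with_map(codes, code_to_char):
    """With a code-to-character map, the original text can be recovered."""
    return "".join(code_to_char[code] for code in codes)

page_text = "searchable text"
char_to_code = make_custom_encoding(page_text)
codes = [char_to_code[ch] for ch in page_text]

# The reverse map plays the role of the information a viewer needs for search.
code_to_char = {code: ch for ch, code in char_to_code.items()}

garbage = naive_extract(codes)
recovered = extract_with_map(codes, code_to_char)

print("text" in garbage)    # False: the codes do not spell the word
print("text" in recovered)  # True: the map restores the real characters
```

The page still draws correctly in both cases, because drawing only needs the glyph shapes. Search and copy/paste need the reverse map.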

See https://stackoverflow.com/questions/12703387/pdf-font-encoding
and https://stackoverflow.com/questions/4523283/how-do-you-debug-pdf-files.

To make it text searchable, the best way may be to go back to the original source (e.g. a Word document) and use a different process to produce the PDF. Alternatively you could try rendering your current PDF as a bitmap and then using OCR, but this will be tedious and produce poor results.
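
If you want a quick first check before going that far, a rough heuristic is to look at the raw bytes of the file for font dictionaries and for the keys that describe how codes map to characters. The sketch below uses only the Python standard library. It assumes the font dictionaries are stored uncompressed, which should be the case for a PDF 1.3 file but may not hold for newer files that use object streams. The function names are my own, and a proper PDF parser will give more reliable answers.

```python
import re

# Heuristic patterns over the raw bytes of a PDF file.
# "/Type /Font" marks a font dictionary; \b keeps "/FontDescriptor" from matching.
FONT_DICT = re.compile(rb"/Type\s*/Font\b")
# "/ToUnicode" points at a map from glyph codes to Unicode characters.
TO_UNICODE = re.compile(rb"/ToUnicode\b")
# "/Differences" appears in encoding dictionaries that override a base encoding.
DIFFERENCES = re.compile(rb"/Differences\b")

def font_mapping_summary(pdf_bytes):
    """Count font dictionaries and encoding-related keys in raw PDF bytes."""
    return {
        "font_dicts": len(FONT_DICT.findall(pdf_bytes)),
        "to_unicode_refs": len(TO_UNICODE.findall(pdf_bytes)),
        "differences_arrays": len(DIFFERENCES.findall(pdf_bytes)),
    }

def may_lack_text_mapping(pdf_bytes):
    """Hint only: fonts are present but none of them references a ToUnicode map."""
    summary = font_mapping_summary(pdf_bytes)
    return summary["font_dicts"] > 0 and summary["to_unicode_refs"] == 0

# Usage (the file name is hypothetical):
#     with open("article.pdf", "rb") as f:
#         print(font_mapping_summary(f.read()))
```

A positive result is only a hint. Fonts that use a standard encoding are usually searchable even without a ToUnicode map, so treat this as a reason to look closer, not as a diagnosis.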

RedGrittyBrick

Posted 2013-03-06T09:45:02.073

Reputation: 70 632

Ah, the encoding does indeed seem to be the issue. When I try to copy and paste text, I get garbage. And the Font tab in Acrobat says 'encoding: custom' for each listed font. – Rabarberski – 2013-03-06T10:30:33.100

1

I found a way around this problem. I went to Tools -> Edit Document Text, then for each page I hit Control-A (select all), right-clicked, went to Properties, and changed the font to something else. After I did this, the text was searchable and I could copy it!

Don

Posted 2013-03-06T09:45:02.073

Reputation: 11

I think the edit document text option is only available in the paid version of Acrobat. – Burgi – 2016-05-01T18:57:42.513

Probably - the original poster has Acrobat Professional 8. That should have it. This approach (changing the font) may work with other tools. – Don – 2016-05-04T03:03:24.127

0

I was having the same problem, and in frustration, googled to find an answer. It turns out that for me, the problem was simply that I was using Preview on my iMac to view and search the PDF. In most cases, searching works in Preview. But for a large book downloaded from Google Books, it didn't.

What worked was simply opening the PDF in Adobe Reader. (Duh, what a concept, I know.) Now I can search. This probably won't work for everyone with a Mac, but it might help someone.

Susan

Posted 2013-03-06T09:45:02.073

Reputation: 1

"I've tried with Adobe Acrobat Professional 8" OP said. Please read the question carefully. – NetwOrchestration – 2017-01-02T19:43:54.360

Please read the question again carefully. Your answer does not answer the original question. – DavidPostill – 2017-01-29T15:47:09.213

0

Go to Edit / Preferences, select 'Search' on the left-hand side of the Preferences screen, then click 'Purge Cache Contents'. Select OK, then close and reopen the document.

Hope this helps.

Posted 2013-03-06T09:45:02.073

Reputation: 1

0

So after trying a lot of things that didn't work, here's how I actually got this done:

  1. Find yourself a PDF-to-Word converter or something similar. (I recommend https://www.online-convert.com/ )

  2. Follow all the necessary steps to set up the conversion, but before you start it:

  3. Find the option that says something like 'optical character recognition' and turn it on.

  4. Convert your file and you should be golden.

Alex

Posted 2013-03-06T09:45:02.073

Reputation: 1