How to make the unsearchable content of certain PDFs searchable

While Windows Search can search the contents of most of my PDFs, I occasionally come across PDF files whose contents are not searchable, even though they contain regular, selectable and copyable text with no formatting abnormalities.

An example is the PDF of this article: http://www.ncbi.nlm.nih.gov/pubmed/23870130 (the contents of both the CellPress version and the PMC version are unsearchable for me).

Is there a way to make all such PDFs searchable? Or does one have to use specific solutions for each document? What would these solutions be?

Esoppant

Posted 2016-03-17T16:01:13.450

Reputation: 41

I cannot find the PDF on the page you have linked. Please provide a link to the actual PDF file or upload it to an accessible location. – Art Gertner – 2016-03-17T16:03:32.387

FWIW, I followed the link to here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3753670/pdf/nihms496383.pdf. I added that file to my Documents folder and immediately searched for "metabolic", and it was found inside that document.

– uSlackr – 2016-03-17T16:06:43.123

In general, such documents are unsearchable because the text is not stored as text but as an image of the page layout, so the words exist only as binary image data. In those cases I have usually run the document through OCR. There is an online OCR application that may suit your needs: http://www.onlineocr.net/

– Frank Thomas – 2016-03-17T16:06:50.977

@uSlackr It was found? Strange. Then I assume I have to change something about my Windows Search. – Esoppant – 2016-03-17T16:09:45.620

Make sure the folder the document is in is part of the index. See Indexing Options. – uSlackr – 2016-03-17T16:17:07.157

I have found via ExifTool that the CellPress version of that PDF contains a "robots noindex" line in its metadata. Perhaps it is related to that, though it does not explain why the PMC version is unsearchable for me. – Esoppant – 2016-03-17T16:22:25.623
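
To find other PDFs that carry a similar marker, a rough scan of the raw file bytes can help. This is only a sketch under assumptions: it assumes the metadata is stored uncompressed in the file (common for XMP metadata, but not guaranteed), and the pattern, `has_noindex_hint` and `scan_folder` are made-up names for illustration. Whether such a marker actually affects Windows Search is not established here.

```python
import re
from pathlib import Path

# Heuristic pattern (an assumption): the word "robots" followed closely by
# "noindex", as it might appear in uncompressed PDF metadata.
NOINDEX = re.compile(rb"robots.{0,40}?noindex", re.IGNORECASE | re.DOTALL)

def has_noindex_hint(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes contain a robots/noindex marker."""
    return NOINDEX.search(pdf_bytes) is not None

def scan_folder(folder):
    """List PDFs below `folder` whose raw bytes contain the marker."""
    return [p for p in Path(folder).rglob("*.pdf")
            if has_noindex_hint(p.read_bytes())]
```

Calling `scan_folder` on a documents folder returns the matching paths. A file whose metadata is compressed would be missed, so an empty result does not prove the marker is absent.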

@uSlackr It is inside a folder that is indexed. To make sure, I also selected "search file contents in every folder" in my folder options. Still no go. – Esoppant – 2016-03-17T16:23:48.910

Answers

  1. First, make sure that Windows Search indexing is enabled and that Windows indexes file contents, not just file properties.
  2. Make sure that .pdf is included in the indexed file types.
  3. Make sure the directory where you store the PDFs is included in the list of indexed locations.
  4. Try restarting the SearchIndexer.exe process.
  5. As a last resort, rebuild the index and restart the Windows Search service.
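
The service restart in the last step can also be scripted. The following is a minimal sketch, not a definitive procedure: it assumes the Windows Search service is registered under the name `WSearch`, the helper names `restart_commands` and `restart_windows_search` are invented for illustration, and it does not rebuild the index (that is still done from Indexing Options).

```python
import subprocess

SERVICE_NAME = "WSearch"  # assumed service name of Windows Search

def restart_commands(service=SERVICE_NAME):
    """Build the commands that stop and then start a Windows service."""
    return [["net", "stop", service], ["net", "start", service]]

def restart_windows_search(dry_run=True):
    """Print the commands, or run them when dry_run is False.

    Running them for real requires Windows and an elevated prompt.
    """
    for cmd in restart_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: "net stop" reports an error if the service
            # is already stopped, which should not abort the restart.
            subprocess.run(cmd, check=False)

restart_windows_search()  # dry run: only prints the two commands
```

Calling `restart_windows_search(dry_run=False)` from an administrator prompt runs the two commands instead of printing them.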

Art Gertner

Posted 2016-03-17T16:01:13.450

Reputation: 6 417