How to do OCR on a PDF document?

Possible Duplicate:
How to extract text with OCR from a PDF on Linux?

I have a few documents in English and Hebrew that I scanned in and converted to PDF format.

Is there some free or cheap utility that can process a scanned PDF and do OCR, at least in English, preferably also in Hebrew?

Thanks!

Shaul Behr

Posted 2010-02-16T16:40:57.153

Reputation: 1 437

– heavyd – 2010-02-16T16:47:19.620

The author of this question did not specify that he is running Linux. The so-called possible duplicate question is too localized, and may not apply at all to the author of this question. – eleven81 – 2010-02-16T17:03:47.460

@eleven81 - Correct, I was asking for Windows. – Shaul Behr – 2010-07-04T08:34:41.363

Not only is this not a duplicate, it's still unanswered. All 3 answers only yield extracted text, not a text-selectable PDF document. – cregox – 2013-06-28T16:05:14.703

Answers

I found a list of free OCR software for Windows.

However, these programs take image files as input, not PDFs, so you will first need to run the PDF through a PDF-to-JPG converter.
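If you would rather script the PDF-to-JPG step than use a GUI converter, Ghostscript can render each page from the command line. Here is a minimal Python sketch, assuming Ghostscript is installed and on the PATH (the console executable is `gs` on Linux/macOS and typically `gswin64c` on 64-bit Windows). The helper names `ghostscript_jpeg_command` and `pdf_to_jpegs` are hypothetical, made up for this example.

```python
import subprocess
from pathlib import Path


def ghostscript_jpeg_command(pdf_path, out_dir, dpi=300, gs_exe="gs"):
    """Build a Ghostscript command line that renders every page of pdf_path
    to a numbered JPEG file (page-001.jpg, page-002.jpg, ...) in out_dir."""
    # Ghostscript itself expands %03d to the zero-padded page number.
    pattern = str(Path(out_dir) / "page-%03d.jpg")
    return [
        gs_exe,
        "-dNOPAUSE", "-dBATCH", "-dSAFER",  # no prompts between pages, exit when done
        "-sDEVICE=jpeg",                    # JPEG output device
        f"-r{dpi}",                         # render resolution in dpi
        f"-sOutputFile={pattern}",
        str(pdf_path),
    ]


def pdf_to_jpegs(pdf_path, out_dir, dpi=300, gs_exe="gs"):
    """Run Ghostscript and return the sorted list of JPEG files it produced."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_jpeg_command(pdf_path, out_dir, dpi, gs_exe), check=True)
    return sorted(Path(out_dir).glob("page-*.jpg"))
```

For example, `pdf_to_jpegs("scan.pdf", "pages", gs_exe="gswin64c")` should leave `pages/page-001.jpg`, `pages/page-002.jpg`, and so on, ready to feed to one of the OCR programs. Around 300 dpi is a common choice for OCR input; if your OCR tool accepts PNG, the lossless `-sDEVICE=png16m` device avoids JPEG compression artifacts.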

eleven81

Posted 2010-02-16T16:40:57.153

Reputation: 12 423

I found an interesting idea that lets Google do all the work of OCR'ing the PDF files for you.

eleven81

Posted 2010-02-16T16:40:57.153

Reputation: 12 423

Rather than what's at that link, it's now simpler to just use http://docs.google.com/viewer.

– ShreevatsaR – 2010-08-29T02:37:04.940

Personally, I would use Ghostview to convert them to images, then Tesseract to convert those images to text. This is a totally free, open source, cross-platform solution that I have had very good results with when converting plain text. I don't use it for complex documents with tables and such, but for plain text you can't beat the price.
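For the Tesseract half of that workflow, here is a minimal Python sketch that OCRs a list of page images (for example, PNGs rendered from the PDF with Ghostscript, the engine behind Ghostview) in English and Hebrew. It assumes Tesseract 3.03 or later (needed for the `stdout` output target) is on the PATH and that the Hebrew language data (`heb.traineddata`) is installed alongside the English data. `tesseract_command` and `ocr_pages` are hypothetical helper names.

```python
import subprocess


def tesseract_command(image_path, langs=("eng", "heb"), tess_exe="tesseract"):
    """Build a Tesseract command that OCRs one page image and writes the
    recognized text to standard output."""
    # Multiple languages are joined with '+', e.g. "eng+heb".
    # Each language needs its own .traineddata file installed.
    return [tess_exe, str(image_path), "stdout", "-l", "+".join(langs)]


def ocr_pages(image_paths, langs=("eng", "heb"), tess_exe="tesseract"):
    """OCR each page image in order and join the pages' text with newlines."""
    texts = []
    for image in image_paths:
        result = subprocess.run(
            tesseract_command(image, langs, tess_exe),
            check=True,
            capture_output=True,
            encoding="utf-8",  # decode explicitly so Hebrew survives on Windows
        )
        texts.append(result.stdout)
    return "\n".join(texts)
```

Something like `print(ocr_pages(sorted(glob.glob("pages/page-*.png"))))` (after `import glob`) would then print the text of the whole document. Decoding as UTF-8 explicitly matters for Hebrew, because the default console encoding on Windows may not be able to represent it.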

Dennis

Posted 2010-02-16T16:40:57.153

Reputation: 5 768