How to do OCR on a PDF document?

6

2

Possible Duplicate:
How to extract text with OCR from a PDF on Linux?

I have a few documents in English and Hebrew that I scanned in and converted to PDF format.

Is there some free or cheap utility that can process a scanned PDF and do OCR, at least in English, preferably also in Hebrew?

Thanks!

Shaul Behr

Posted 2010-02-16T16:40:57.153

Reputation: 1 437

Question was closed 2010-02-16T16:54:55.327

6 The author of this question did not specify that he is running Linux. The so-called possible duplicate question is too localized, and may not apply at all to the author of this question. – eleven81 – 2010-02-16T17:03:47.460

3 @eleven81 - Correct, I was asking for Windows. – Shaul Behr – 2010-07-04T08:34:41.363

Not only is this not a duplicate, it's still unanswered. All 3 answers only yield extracted text, not a text-selectable PDF document. – cregox – 2013-06-28T16:05:14.703

Answers

1

I found a list of free OCR software for Windows.

  1. FreeOCR
  2. Tesseract
  3. WeOcr Tesseract Web Interface
  4. GOCR
  5. Windows GUI for GOCR
  6. OCR Desktop
  7. Simple OCR
  8. TopOCR

However, these programs take an image as input, not a PDF. To get one, first run the scanned PDF through a PDF-to-JPG converter.
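As a rough sketch of that conversion step, the Python snippet below builds a Ghostscript command that renders each page of a PDF to a JPEG. Ghostscript is only one possible converter and is my assumption here, not something this answer names. The helper names (build_gs_command, pdf_to_images), the 300 dpi default, and the page-%03d.jpg naming are illustrative. On Windows the console executable is usually gswin64c or gswin32c rather than gs.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, out_dir, dpi=300, gs_exe="gs"):
    """Return a Ghostscript argument list that renders every page to a JPEG.

    gs_exe is an assumption: "gs" on Linux/macOS, typically "gswin64c" on Windows.
    """
    # %03d is expanded by Ghostscript to the page number (001, 002, ...).
    out_pattern = str(Path(out_dir) / "page-%03d.jpg")
    return [
        gs_exe,
        "-dNOPAUSE",            # do not pause between pages
        "-dBATCH",              # exit when the input has been processed
        "-dSAFER",              # restrict file access from within the PDF
        "-sDEVICE=jpeg",        # output device: one JPEG per page
        f"-r{dpi}",             # rendering resolution in dots per inch
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def pdf_to_images(pdf_path, out_dir, dpi=300, gs_exe="gs"):
    """Run Ghostscript and return the generated page images in page order."""
    if shutil.which(gs_exe) is None:
        raise FileNotFoundError(f"Ghostscript executable not found: {gs_exe}")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_gs_command(pdf_path, out_dir, dpi, gs_exe), check=True)
    return sorted(Path(out_dir).glob("page-*.jpg"))
```

Calling pdf_to_images("scan.pdf", "pages") would then leave one JPEG per page in the pages folder, ready to feed to any of the OCR programs listed above. A resolution of around 300 dpi is a common starting point for OCR, but treat it as a value to tune.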

eleven81

Posted 2010-02-16T16:40:57.153

Reputation: 12 423

1

I found an interesting idea that lets Google do all the work of OCR'ing the PDF files for you.

eleven81

Posted 2010-02-16T16:40:57.153

Reputation: 12 423

Rather than what's at that link, it's now simpler to just use http://docs.google.com/viewer.

– ShreevatsaR – 2010-08-29T02:37:04.940

0

Personally, I would use Ghostview to convert the pages to images, then Tesseract to convert those images to text. This is a totally free, open-source, cross-platform solution that has given me very good results on plain text. I don't use it for complex documents with tables and such, but for plain text you can't beat the price.
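The Tesseract half of that workflow could look roughly like the Python sketch below, which assumes the page images already exist and that the tesseract command-line program is on the PATH. The helper names (build_tesseract_command, ocr_images) are hypothetical. The language codes eng and heb only work if the matching Tesseract language data is installed. The optional pdf argument asks recent Tesseract versions for a PDF with a selectable text layer instead of a plain .txt file; whether that is available depends on the installed version.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, languages=("eng",),
                            searchable_pdf=False, tesseract_exe="tesseract"):
    """Return the argument list for one Tesseract run on a single image."""
    image_path = Path(image_path)
    # Tesseract appends the extension itself (.txt or .pdf) to this base name.
    out_base = image_path.with_suffix("")
    cmd = [
        tesseract_exe,
        str(image_path),
        str(out_base),
        "-l", "+".join(languages),   # e.g. "eng+heb" for English plus Hebrew
    ]
    if searchable_pdf:
        cmd.append("pdf")            # config name: write a text-layer PDF
    return cmd


def ocr_images(image_paths, languages=("eng",)):
    """OCR each image to a .txt file next to it and return the joined text."""
    texts = []
    for image_path in image_paths:
        subprocess.run(build_tesseract_command(image_path, languages), check=True)
        txt_path = Path(image_path).with_suffix(".txt")
        texts.append(txt_path.read_text(encoding="utf-8"))
    return "\n".join(texts)
```

For example, ocr_images(["pages/page-001.jpg"], languages=("eng", "heb")) would write pages/page-001.txt and return its contents. As noted above, expect this to work best on plain text rather than on tables or complex layouts.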

Dennis

Posted 2010-02-16T16:40:57.153

Reputation: 5 768