OCR with non-language text


I am interested in using OCR to recognize text from a document that doesn't contain words. Rather, it contains a long string of "random" printed characters. I have been trying to use tesseract to scan the text, but it seems to be looking for words. Is there a way to tell tesseract to do plain character recognition only?

Daniel

Posted 2013-08-28T15:00:48.267

Reputation: 151

I have updated the question to fix the complaint. – Daniel – 2013-08-28T15:33:12.897

The old Presto! PageManager that came with the scanner did not do spellchecking by default (Windows); it has a spell checker, but it runs after OCR. I wonder if you could make the dictionary disappear in any software that does auto-correction; then it could not correct anything. By default, OCR does not look at whole words, except maybe for alignment. – Psycogeek – 2013-08-28T17:04:18.903

@Daniel - Now it's a question that can actually be answered. – Ramhound – 2013-08-28T17:08:07.367

Answers


Yes, you can disable the dictionaries by defining a configuration file containing:

load_system_dawg F
load_freq_dawg F

and pass that file's name on the tesseract command line, after the image and output base name.
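A minimal Python sketch of the same steps, assuming tesseract is installed and on PATH. The config file name `nodict` and the image/output names are hypothetical. It writes the two settings, one "variable value" pair per line, and builds the command in the order `tesseract image outputbase configfile`. As far as I know, tesseract looks for the config name in its tessdata configs directory first and falls back to treating it as a file path.

```python
import subprocess
from pathlib import Path

# The two settings from the answer: skip loading the system word
# dictionary and the frequent-words dictionary.
NO_DICT_SETTINGS = {
    "load_system_dawg": "F",
    "load_freq_dawg": "F",
}


def write_config(path):
    """Write one 'variable value' pair per line and return the file path."""
    lines = [f"{name} {value}" for name, value in NO_DICT_SETTINGS.items()]
    config = Path(path)
    config.write_text("\n".join(lines) + "\n")
    return config


def build_command(image, output_base, config_path):
    """Arguments in tesseract's order: image, output base name, config file."""
    return ["tesseract", str(image), str(output_base), str(config_path)]


def run_ocr(image, output_base, config_path):
    """Run tesseract (must be on PATH); it writes its text to output_base + '.txt'."""
    subprocess.run(build_command(image, output_base, config_path), check=True)
```

Usage, with hypothetical names: `run_ocr("scan.png", "out", write_config("nodict"))`. Keeping the settings in a file rather than hard-coding them makes it easy to reuse the same config from the shell.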

nguyenq

Posted 2013-08-28T15:00:48.267

Reputation: 156

This does appear to do what I wanted. Sadly, the results aren't much better for the text that I was working with, but it does answer the question. Thanks! – Daniel – 2013-10-08T17:46:25.580