Scanning, OCR and adding the scanned numbers


I need to scan hundreds of pages that each have a number written in the bottom right corner. After scanning, I need OCR software to recognize the number in the bottom right corner of each page and then sum all the numbers.
So far I have been able to scan the pages and, using OCR in Adobe Acrobat, identify the hand-marked numbers on them. Is there a mechanism, or any other OCR software, that would pass the values to programs like MS Excel?
I tried various OCR programs, such as Neurograph (open source) and trial versions of other OCR software, but could not link them directly to the scanned files. They can export the OCR'ed values to Excel, but they do not connect directly to the printer.
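One simple way to get OCR results into Excel without a direct link is to write them to a CSV file, which Excel opens natively. Below is a minimal Python sketch, assuming the OCR step has already produced (page name, value) pairs; the function name `totals_to_csv` and the column layout are hypothetical choices, not features of any OCR product.

```python
import csv
import io


def totals_to_csv(rows):
    """Turn (page, value) pairs into CSV text with a final TOTAL row.

    Returns (csv_text, total). The column names are hypothetical.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["page", "value"])
    total = 0
    for page, value in rows:
        writer.writerow([page, value])
        total += value
    # Append the sum so it is visible when the file is opened in Excel.
    writer.writerow(["TOTAL", total])
    return buf.getvalue(), total
```

The returned text can then be saved with `open(path, "w", newline="")` under any file name (for example a hypothetical `totals.csv`) and opened in Excel.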
Also, these programs run in batches. Is it possible to make the updates a continuous process?
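Batch processing can be turned into a continuous process by polling the folder the scanner writes into and handling each new file as soon as it appears. The following is a minimal polling sketch in Python; `select_new_scans`, `watch_folder`, the `handle_page` callback and the list of image extensions are all hypothetical. It assumes the scanning software finishes writing a file before it becomes visible (for example by writing under a temporary name and renaming it); on Linux, an event-based mechanism such as inotify is an alternative to polling.

```python
import os
import time

# Hypothetical set of image extensions the scanner might produce.
SCAN_EXTENSIONS = (".tif", ".tiff", ".png", ".jpg", ".jpeg")


def select_new_scans(names, seen, extensions=SCAN_EXTENSIONS):
    """Return the image file names from `names` that are not yet in `seen`, sorted."""
    return sorted(
        n for n in names if n.lower().endswith(extensions) and n not in seen
    )


def watch_folder(folder, handle_page, interval=2.0, max_cycles=None):
    """Poll `folder` and call handle_page(path) once for every new scan.

    Runs forever unless max_cycles is given.
    """
    seen = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for name in select_new_scans(os.listdir(folder), seen):
            handle_page(os.path.join(folder, name))
            seen.add(name)
        cycles += 1
        time.sleep(interval)
```

In a real setup, `handle_page` would run the OCR on the file and update the running total.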
Any suggestions?
System setup:
The intended system will be a Raspberry Pi connected to a scanner. The scanner would send its output to the Pi, which would in turn compute the sum and update a database with the total.
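For the database side of this setup, SQLite (bundled with Python) is a lightweight option on a Raspberry Pi. The sketch below assumes one numeric value per page and uses a hypothetical `pages` table; keying rows by page name means that rescanning a page replaces its value instead of counting it twice.

```python
import sqlite3


def init_db(path=":memory:"):
    """Open (or create) the database and make sure the hypothetical table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " page TEXT PRIMARY KEY,"
        " value INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def record_page(conn, page, value):
    """Store one page's value and return the updated total over all pages."""
    # INSERT OR REPLACE: a rescanned page overwrites its earlier value.
    conn.execute(
        "INSERT OR REPLACE INTO pages (page, value) VALUES (?, ?)", (page, value)
    )
    conn.commit()
    (total,) = conn.execute("SELECT COALESCE(SUM(value), 0) FROM pages").fetchone()
    return total
```

On the Pi, `init_db` would be given a file path instead of the in-memory default so the total survives restarts.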

Prasanna

Posted 2014-08-29T09:12:28.397

Reputation: 3 554

This can certainly be done if you are willing to write some custom code. Some OCR software (my experience is with ABBYY Cloud OCR SDK) can output results in an XML file, from which you can parse the data you need. – lzam – 2014-09-01T02:45:07.203
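As a rough illustration of that parsing step, the Python sketch below collects the text of every element with a given tag name and turns it into an integer. The tag name is a parameter because the actual ABBYY XML schema is not described here; the tag `value` used in the example is hypothetical, so check the SDK's documentation for the real structure.

```python
import re
import xml.etree.ElementTree as ET


def numbers_from_xml(xml_text, tag):
    """Return one integer per element whose local tag name equals `tag`."""
    root = ET.fromstring(xml_text)
    numbers = []
    for el in root.iter():
        local = el.tag.rsplit("}", 1)[-1]  # strip a "{namespace}" prefix if present
        if local == tag:
            text = "".join(el.itertext())
            # Assumes one number per element: all digits are joined, so an
            # OCR result like "4 2" becomes 42.
            digits = re.sub(r"[^0-9]", "", text)
            if digits:
                numbers.append(int(digits))
    return numbers
```

Summing the list, e.g. `sum(numbers_from_xml(xml_text, "value"))`, gives the total for a batch.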

@lzam Thanks for your reply. Would it be possible to do the coding on Linux? I intend to build a lean system: a scanner connected to a Raspberry Pi. The Raspberry Pi should take all the information from the scanner, compute the total and update a database with it. Is this too much to ask? – Prasanna – 2014-09-01T07:00:30.447

The Cloud OCR SDK I mentioned is a web service, so your OS doesn't really matter. They charge by the page, though. – lzam – 2014-09-02T01:37:50.133

Yes, the code can be written to run on Linux just as on any other operating system. – mc0e – 2014-09-05T12:08:26.840
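One common way to drive a scanner from Linux is SANE's `scanimage` command-line tool. The Python sketch below wraps it with `subprocess`; it assumes SANE is installed and the scanner is supported. `--resolution` is a backend-specific option that most, but not necessarily all, scanners expose, so `scanimage --help` (and `scanimage -L` to list devices) shows what a given device offers. The helper names and the example device string are hypothetical.

```python
import subprocess


def scan_command(resolution=300, device=None):
    """Build a scanimage command line that writes one TIFF page to stdout."""
    cmd = ["scanimage", "--format=tiff", "--resolution", str(resolution)]
    if device is not None:
        # Select the device before any backend-specific options.
        cmd[1:1] = ["--device-name=" + device]
    return cmd


def scan_page(out_path, resolution=300, device=None):
    """Scan one page into out_path; raises CalledProcessError if scanimage fails."""
    with open(out_path, "wb") as f:
        subprocess.run(scan_command(resolution, device), stdout=f, check=True)
```

Each scanned file could then be passed to the OCR step and its value added to the running total.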

@mc0e Can you please explain how it could be done? – Prasanna – 2014-09-05T12:36:05.640

Answers


If you are going to customize the hardware using a Raspberry Pi, you might as well customize your software too. The most popular and widely used OCR package is Tesseract OCR, which is often used together with the OpenCV image-processing library; both are open source and cross-platform. Together they will let you apply filters, do the OCR and possibly other nice things you may want.
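As a sketch of how the two could fit together for this question, the Python code below crops the bottom-right corner of a scanned page with OpenCV, binarizes it with Otsu thresholding and asks Tesseract (through the `pytesseract` wrapper) to read it as a single line of digits. It assumes the `opencv-python` and `pytesseract` packages and the Tesseract binary are installed; the corner fractions are guesses to tune, and the helper names are hypothetical. The OpenCV and pytesseract imports live inside `ocr_corner_number` so the pure helpers work without them. Tesseract is trained mainly on printed text, so accuracy on handwritten digits may be limited, and the digit whitelist may be ignored by some Tesseract versions.

```python
def corner_box(width, height, frac_w=0.2, frac_h=0.1):
    """Return (x0, y0, x1, y1) covering the bottom-right corner of a page.

    frac_w and frac_h are the fractions of page width and height to keep;
    the defaults are guesses to be tuned to where the numbers are written.
    """
    x0 = width - round(width * frac_w)
    y0 = height - round(height * frac_h)
    return x0, y0, width, height


def parse_number(text):
    """Keep only ASCII digits from OCR output; return an int, or None if none."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    return int(digits) if digits else None


def ocr_corner_number(image_path, frac_w=0.2, frac_h=0.1):
    """Read the number in the bottom-right corner of a scanned page."""
    import cv2  # from the opencv-python package
    import pytesseract  # needs the tesseract binary on PATH

    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(image_path)
    height, width = img.shape
    x0, y0, x1, y1 = corner_box(width, height, frac_w, frac_h)
    crop = img[y0:y1, x0:x1]
    # Otsu's method picks the black/white threshold automatically.
    _, binary = cv2.threshold(crop, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # --psm 7 treats the crop as one text line; the whitelist restricts to digits.
    text = pytesseract.image_to_string(
        binary, config="--psm 7 -c tessedit_char_whitelist=0123456789"
    )
    return parse_number(text)
```

Pages where `ocr_corner_number` returns `None` are worth flagging for a manual check rather than silently counting them as zero.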

I would recommend you look for some of the videos out there, which make it seem surprisingly easy to get set up.

https://code.google.com/p/tesseract-ocr/

http://opencv.org/

Conrado

Posted 2014-08-29T09:12:28.397

Reputation: 126

Hi Conrado, I appreciate your efforts. Let me have a look at them and get back to you if I have questions – Prasanna – 2014-09-05T19:43:04.730

I visited both pages but could not figure out how to put everything together to get it working – Prasanna – 2014-09-06T00:42:20.357