1

I need a Linux-based server that can be setup to receive images and transform them into text that will be inserted into a database. Is that possible, especially via an API to allow the organization to interact with the service if need be?

Wesley
crazybyte

4 Answers

3

Tesseract seems to be the best: http://code.google.com/p/tesseract-ocr/

Reviews seem to say it is the only one that beats retyping things. http://www.linux.com/archive/feature/138511 http://www.linux.com/archive/feed/57222

Do people not Google any more? It took 5 minutes of reading what I pulled up with "linux ocr" as my search terms.

Ronald Pottol
  • @Ronald Pottol I did Google for OCR on Linux and found, among others, tesseract and gocr, but I was curious to see whether there was a similar application that can be used as a server and that I might have missed in my search. That is why I asked such a general question. – crazybyte Mar 02 '10 at 12:21
  • Ah, I know the feeling (I've asked questions that I had researched well hoping for better answers). – Ronald Pottol Mar 02 '10 at 21:37
0

I had a project that required OCR. You can use GOCR for the OCR part. To convert JPEG images into the PBM image format that GOCR reads, you can use djpeg. If you need it to be integrated with the web, you can call the conversion and OCR steps from PHP, and from there implement saving to the DB.
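The conversion, OCR, and DB-saving steps described above can be sketched as follows. This is only a minimal illustration, written in Python rather than the PHP mentioned in the answer. It assumes `djpeg` and `gocr` are installed and on the `PATH`; the function names, the `ocr_results` table, and the use of SQLite are hypothetical choices made for the example.

```python
import sqlite3
import subprocess

# Read the PNM image from stdin ("-i -") and print recognized text to stdout.
GOCR_COMMAND = ["gocr", "-i", "-"]


def djpeg_command(jpeg_path):
    """Build the djpeg call that decodes a JPEG to grayscale PNM on stdout."""
    return ["djpeg", "-pnm", "-grayscale", jpeg_path]


def ocr_jpeg(jpeg_path):
    """Pipe djpeg's output into gocr and return the recognized text."""
    pnm = subprocess.run(
        djpeg_command(jpeg_path), check=True, capture_output=True
    ).stdout
    result = subprocess.run(
        GOCR_COMMAND, input=pnm, check=True, capture_output=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def save_text(conn, jpeg_path, text):
    """Store the OCR output, keyed by image path (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_results (path TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO ocr_results (path, body) VALUES (?, ?)",
        (jpeg_path, text),
    )
    conn.commit()
```

A caller would then do something like `save_text(conn, path, ocr_jpeg(path))` for each uploaded image, where `conn` is an open `sqlite3` connection.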

mxg
  • Here are the links: GOCR: http://jocr.sourceforge.net/ djpeg: http://linux.about.com/library/cmd/blcmdl1_djpeg.htm – mxg Feb 28 '10 at 20:12
0

I'd set up a message queue and submit tasks to it for processing. All you'd really need to do is upload the image file to a shared storage platform, maybe GlusterFS or similar, then push the file's path into the message queue. Then set up a process that listens to the queue, runs gocr on each file, and pushes the output into your database.

Easy.. In Theory. ;)
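As a rough sketch of the listener side, the worker loop might look like the following. It uses Python's in-process `queue.Queue` purely as a stand-in for a real message broker, and the `ocr` and `store` callables are hypothetical placeholders for "run gocr on the file" and "insert into the database".

```python
import queue
import threading

STOP = None  # sentinel pushed onto the queue to tell the worker to exit


def run_worker(tasks, ocr, store):
    """Consume image paths from the queue, OCR each one, and store the text."""
    while True:
        path = tasks.get()
        try:
            if path is STOP:
                break
            store(path, ocr(path))
        except Exception as exc:
            # One bad image should not take the whole worker down.
            print(f"OCR failed for {path}: {exc}")
        finally:
            tasks.task_done()


def start_worker(tasks, ocr, store):
    """Run the worker loop in a background thread and return the thread."""
    thread = threading.Thread(target=run_worker, args=(tasks, ocr, store))
    thread.start()
    return thread
```

The upload side would only need to call `tasks.put(path)` once the file is on shared storage. With a real broker the shape stays the same, but the queue would live outside the process so that several OCR workers on different machines can share the load.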

Tom O'Connor
  • Thanks for the suggestion. I had a similar (if not the same) idea. I only wanted to see whether there was an already developed server application that could be used. – crazybyte Mar 02 '10 at 12:23
  • I doubt there is. There are lots of components pre-made and available: the message queue, database, shared storage, and OCR package. All you need to do is provide the glue. – Tom O'Connor Mar 02 '10 at 15:22
  • There is such a server, but it's not free or open source. – crazybyte Mar 03 '10 at 07:56
0

Have you looked at WatchOCR? It is a free and open source OCR server that transforms image-only PDFs into text-searchable PDFs from a watched folder or network share.
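For readers unfamiliar with the watched-folder pattern, a generic polling sketch is shown below. It is not WatchOCR's implementation or API; the function name and the "track seen paths in a set" approach are assumptions made for illustration.

```python
import os


def find_new_pdfs(folder, seen):
    """Return PDF paths in `folder` that have not been reported before.

    `seen` is a set owned by the caller; it is updated in place so that
    repeated polls only return files that appeared since the last call.
    """
    new_paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.lower().endswith(".pdf") and path not in seen:
            seen.add(path)
            new_paths.append(path)
    return new_paths
```

A service built this way would call `find_new_pdfs` on a timer and hand each returned path to the OCR step.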

mgorven
rlangner