E-aksharayan

e-Aksharayan
Written in	C++
Operating system	Linux (32- and 64-bit), Windows (32-bit)
Available in	Interface: English; Recognition: Assamese, Bengali, Bodo, Devanagari, Kannada, Gujarati, Gurumukhi, Oriya, Malayalam, Meitei, Marathi, Tamil, Telugu, Tibetan and Urdu
Type	Optical character recognition
Website	ocr.tdil-dc.gov.in

e-Aksharayan is an optical character recognition engine for Indian languages. Some of the research behind e-Aksharayan has been published in conference proceedings and journals.[1][2][3][4]

Screenshots

OCR output for Devanagari
OCR output for Devanagari, sync between image and output
OCR output for Devanagari, spell checker

References

[1] Greedy Search for Active Learning of OCR
[2] Text graphic separation in Indian newspapers
[3] An OCR System for the Meetei Mayek Script
[4] Experiences of Integration and Performance Testing of Multilingual OCR for Printed Indian Scripts

External links

Official website

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

Optical character recognition software
Free software	CuneiForm, GOCR, Ocrad, OCRFeeder, OCRopus, Tesseract
Proprietary software	ABBYY FineReader, Asprise OCR, Microsoft Office Document Imaging, OmniPage, ReadSoft, SmartScore, TeleForm, VueScan
See also	Comparison of optical character recognition software