CuneiForm (software)
CuneiForm Cognitive OpenOCR is a freely distributed, open-source optical character recognition (OCR) system developed by the Russian software company Cognitive Technologies.
Original author(s) | Cognitive Technologies |
---|---|
Developer(s) | Cognitive Technologies |
Initial release | April 2, 2008 (source code)[1] |
Stable release | 1.1 / April 19, 2011 |
Written in | C and C++ |
Operating system | Cross-platform |
Type | Optical character recognition |
License | Freeware/BSD licenses |
Website | Launchpad |
CuneiForm OCR was developed by Cognitive Technologies as a commercial product in 1993. The system was bundled with many popular scanners, multifunction printers (MFPs) and software packages in Russia and elsewhere, including products from Corel (CorelDRAW), Hewlett-Packard, Epson, Xerox, Samsung, Brother, Mustek, OKI, Canon, Olivetti and others.
In 2008 Cognitive Technologies released the program's source code.
Features
CuneiForm converts electronic copies of paper documents and image files into editable form, automatically or semi-automatically, while preserving the original document's structure and fonts. The system includes two components: one for processing single documents and one for batch processing.
The list of languages supported by the system:
The system also supports mixed Russian and English text. Recognition of other language mixtures is supported only in a branch developed by Andrei Borovsky in 2009.[2] Training the system to recognize additional languages is difficult, because each language is tied to a .dat file whose structure and creation method the developers have not disclosed.
History
1993 - Cognitive Technologies signed an OEM contract with Corel, under the terms of which Cognitive's recognition library was embedded in CorelDRAW 3.0 and later versions, a package popular in publishing.
1994 – A contract with Hewlett-Packard to bundle CuneiForm OCR with all HP scanners imported into Russia. This was HP's first contract with a Russian software company.
1995 - A contract with the Japanese corporation Epson to supply its scanners with CuneiForm OCR.[3] An OEM contract was also signed with Brother Corporation, one of the world's largest manufacturers of fax machines, laser printers, scanners and other office equipment. Under the agreement, Brother's new IC-150 roller scanner shipped worldwide with Cognitive's scanning and recognition software.
1996 - An OEM agreement with Samsung Information Systems America, one of the world's largest manufacturers of monitors, fax machines, laser printers, MFPs and other office equipment. Under the agreement, the new Samsung OFFICE MASTER OML-8630A multifunction device was to ship worldwide with Cognitive's CuneiForm LE optical character recognition system.
- An OEM agreement with Xerox, a leading global manufacturer of office equipment, to equip the Xerox 3006 and Pro-610 multifunction devices with the CuneiForm recognition system.
- Release of CuneiForm '96 OCR, featuring what were described as the world's first adaptive recognition algorithms.
Adaptive recognition is a method that combines two types of printed-character recognition algorithms: multifont and omnifont. For each input document, the system builds an internal font from well-printed characters, dynamically adapting to the specific symbols of that document. The method thus combines the universality and efficiency of the omnifont approach with the high accuracy of font-specific (multifont) recognition, which substantially improves the recognition rate.
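The description of adaptive recognition above stays at a high level, so the following is only a minimal sketch of the general idea, not Cognitive Technologies' actual algorithm. It assumes that glyphs are already numeric feature vectors, uses a generic nearest-prototype classifier as a stand-in for the omnifont recognizer, and treats the averaged, confidently recognized glyphs of each character as the document's "internal font". All function names and the confidence threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]             # a glyph as a numeric feature vector (assumption)
Prototypes = Dict[str, List[float]]  # character label -> prototype vector


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(glyph: Vector, prototypes: Prototypes) -> Tuple[str, float]:
    """Return (label, confidence) for the closest prototype; confidence is in (0, 1]."""
    label, d = min(((lbl, distance(glyph, p)) for lbl, p in prototypes.items()),
                   key=lambda item: item[1])
    return label, 1.0 / (1.0 + d)


def build_internal_font(glyphs: List[Vector], generic: Prototypes,
                        threshold: float) -> Prototypes:
    """Average the confidently recognized glyphs of each label into document-specific prototypes."""
    samples: Dict[str, List[Vector]] = {}
    for glyph in glyphs:
        label, confidence = nearest(glyph, generic)  # first, omnifont-style pass
        if confidence >= threshold:                  # keep only "well printed" glyphs
            samples.setdefault(label, []).append(glyph)
    return {label: [sum(column) / len(vs) for column in zip(*vs)]
            for label, vs in samples.items()}


def adaptive_recognize(glyphs: List[Vector], generic: Prototypes,
                       threshold: float = 0.5) -> List[str]:
    """Second pass: reclassify every glyph, preferring the document's internal font."""
    internal = build_internal_font(glyphs, generic, threshold)
    prototypes = {**generic, **internal}  # internal prototypes override generic ones
    return [nearest(glyph, prototypes)[0] for glyph in glyphs]
```

For example, with generic prototypes `{"a": [0.0, 0.0], "b": [4.0, 0.0]}` and a document whose "a" glyphs sit at `[0.5, 0.0]` and "b" glyphs at `[4.8, 0.0]`, an ambiguous glyph at `[2.2, 0.0]` is read as "b" by the generic pass alone but as "a" once the internal font has been built from the clean glyphs.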
1997 – Neural network-based technology was used in CuneiForm for the first time. The neural-network character recognition algorithms work as follows: the character image to be recognized (the pattern) is scaled to a fixed standard size (normalized). The luminance values of the normalized pattern serve as the inputs to the neural network, which has one output for each recognizable character. The recognition result is the symbol corresponding to the largest value in the network's output vector.
- A new OEM agreement with Canon to equip multifunction devices imported into Russia with the CuneiForm system;
- A new OEM contract with OKI Europe Limited to equip OKI FAX 4100 and OKI FAX 5200 MFPs imported into Russia with the CuneiForm system;
- First release of CuneiForm MMX Update, a version of the OCR system for Intel MMX processors;
- NeuHause scanners began shipping with the CuneiForm recognition system;
- Release of CuneiForm 98 NEST, Russia's first network scanning system.
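The neural-network recognition scheme described in the 1997 entry can be illustrated with a short sketch. It assumes the character image is a list of rows of luminance values (0.0 = black ink, 1.0 = white paper), normalizes it with simple nearest-neighbour sampling, and uses a single fully connected layer with hand-picked weights as a stand-in for a trained network. The function names, the default size and the weights are hypothetical; the source does not describe CuneiForm's actual network topology or training.

```python
from typing import List, Sequence

Image = List[List[float]]  # rows of luminance values: 0.0 = black ink, 1.0 = white paper


def normalize(image: Image, size: int) -> Image:
    """Scale a character image to size x size with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [[image[r * height // size][c * width // size] for c in range(size)]
            for r in range(size)]


def forward(inputs: Sequence[float], weights: List[List[float]],
            biases: Sequence[float]) -> List[float]:
    """A single fully connected layer: one output per recognizable character."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def recognize(image: Image, alphabet: Sequence[str], weights: List[List[float]],
              biases: Sequence[float], size: int = 16) -> str:
    pattern = normalize(image, size)                       # 1. normalize to a standard size
    inputs = [value for row in pattern for value in row]   # 2. luminance values as inputs
    if any(len(row) != len(inputs) for row in weights):
        raise ValueError("each weight row needs size * size entries")
    outputs = forward(inputs, weights, biases)             # 3. one output per character
    if len(outputs) != len(alphabet):
        raise ValueError("need one output per character in the alphabet")
    best = max(range(len(outputs)), key=lambda i: outputs[i])  # 4. largest output wins
    return alphabet[best]
```

For example, with `size=2`, a 4 x 4 image whose bottom half is dark normalizes to `[[1.0, 1.0], [0.0, 0.0]]`; with the hand-picked weights `[[0, 0, -1, -1], [0, -1, 0, -1]]` and biases `[2, 2]` for the alphabet `["_", "|"]`, it is recognized as `"_"`. In a real system the weights would come from training rather than being chosen by hand.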
1999
- A new OEM contract with Olivetti to supply multifunction devices imported into Russia with the CuneiForm system;
- A distribution agreement with WSKA (France), a leading European software distributor, to distribute CuneiForm Direct OCR in Europe;
- Release of a new version, CuneiForm 2000, implementing the "Cognitive Analysis™" method: an expert system integrated into the recognition core analyzes the alternative results and confidence estimates produced by each recognition algorithm and chooses the best option;
- The "Meridian Table Segmentation™" method was developed to improve the accuracy with which the original table layout is recreated in the output document;
- The "What You Scan Is What You Get™" mechanism for recreating the original document layout was introduced. The technology preserves the placement of the scanned document's components, which is particularly important for documents with complex layouts: multicolumn text with headings, annotations, illustrations, tables and so on.
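The "Cognitive Analysis" step in CuneiForm 2000, choosing among the alternatives proposed by several recognition algorithms, can be loosely illustrated as follows. This is only a sketch under stated assumptions: each recognizer is assumed to return a mapping from candidate characters to confidences, and a simple weighted vote stands in for the expert system, whose actual rules are not described in the source. The function name and per-recognizer weights are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Hypotheses = Dict[str, float]  # candidate character -> confidence in [0, 1] (assumption)


def choose_best(outputs: Sequence[Hypotheses],
                weights: Sequence[float]) -> Tuple[str, float]:
    """Combine the alternatives proposed by several recognizers and pick the best one.

    A weighted sum of confidences is used here as a stand-in for expert-system rules.
    """
    if len(outputs) != len(weights):
        raise ValueError("need one weight per recognizer")
    totals: Dict[str, float] = {}
    for hypotheses, weight in zip(outputs, weights):
        for char, confidence in hypotheses.items():
            totals[char] = totals.get(char, 0.0) + weight * confidence
    best = max(totals, key=lambda char: totals[char])
    return best, totals[best]
```

For instance, if three recognizers propose `{"0": 0.6, "O": 0.4}`, `{"O": 0.7, "0": 0.2}` and `{"O": 0.5, "Q": 0.3}` with equal weights, "O" wins, even though the first recognizer on its own would have chosen "0".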
2001 - An OEM contract with Canon to equip its scanners and multifunction devices sold in Eastern Europe with Cognitive Technologies' CuneiForm OCR software.
Development prospects
- December 12, 2007: a freeware version of CuneiForm OCR was released, and the upcoming opening of its source code was announced.
- April 2, 2008: the source code of CuneiForm OCR was published under the BSD license; the source code of the system's user interface followed in the fall.
- The open-source version for Windows has not been updated since February 14, 2009. That version is no longer available for download; the download page offers the November 11, 2008 version instead.
- In 2009, graphical interfaces for the open-source version of CuneiForm based on the Qt 4 library, Cuneiform-Qt[4] and YAGF, were released. Starting with version 0.9.0,[5] the open-source version for Linux can be used as a library.
See also
- Puma.NET is a wrapper library for the Cognitive Technologies CuneiForm recognition engine that makes it easy to incorporate OCR functionality into any .NET Framework 2.0 (or later) application.