OCR for numbers with gray noisy background

I tried to run OCR on multiple scanned sheets containing numbers like the ones in this image (all with the same background, digits only):

[Image: sample scanned sheet, digits on a gray noisy background]

But every attempt failed! I tried offline OCR tools (gocr, tesseract) and a couple of online OCR services, but all of them TOTALLY failed!

What should I do?

ItsMe

Posted 2014-06-19T19:39:44.843

Reputation: 173

Answers

First you must preprocess those images. I recommend a batch tool like XnViewMP, which is free and multiplatform.

It has a built-in file browser: select all your images, then go to Tools - Batch convert and add actions like these:

[Screenshot: XnViewMP - Batch convert - Actions tab]

Here are my actions:

  1. HLS - make it grayscale:
    • Hue: 0
    • Lightness: 0
    • Saturation: -127
  2. Levels - lower the white point a bit so that the light gray noise turns white and disappears
    • Black point: 0
    • White point: 212 - may vary depending on image
  3. Reduce noise filter
  4. Adjust - increase the contrast
    • Brightness: 0
    • Contrast: 127 - this one matters
    • Gamma: 1.06
  5. Minimum - make the black strokes thicker
    • Filter size: 5x5 - may vary depending on image
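
For anyone who prefers a script over the GUI, here is a rough Python sketch of the same five actions using Pillow. Pillow is a swapped-in tool, not what this answer used, and its filters only approximate XnViewMP's: the 3x3 median filter standing in for "Reduce noise" and the contrast factor of 3.0 are assumptions, the small gamma tweak is skipped, and preprocess_file is a hypothetical helper name.

```python
from PIL import Image, ImageEnhance, ImageFilter, ImageOps

# Values taken from the XnViewMP actions; tune them per image.
WHITE_POINT = 212      # Levels: white point (black point stays 0)
MIN_FILTER_SIZE = 5    # Minimum filter: 5x5
# Assumption: Pillow's contrast factor (1.0 = unchanged) does not map
# one-to-one onto XnViewMP's Contrast slider (127); 3.0 is a guess.
CONTRAST_FACTOR = 3.0


def preprocess(img):
    """Approximate the XnViewMP batch actions; returns a mode 'L' image."""
    # 1. HLS with saturation -127: drop the color, keep the lightness
    gray = ImageOps.grayscale(img)
    # 2. Levels 0..WHITE_POINT: stretch so every value >= WHITE_POINT
    #    becomes 255, which pushes the light gray noise to white
    lut = [min(255, round(v * 255 / WHITE_POINT)) for v in range(256)]
    gray = gray.point(lut)
    # 3. Reduce noise: a 3x3 median filter stands in for XnViewMP's filter
    #    (assumption: XnViewMP's exact algorithm may differ)
    gray = gray.filter(ImageFilter.MedianFilter(3))
    # 4. Increase contrast (the gamma 1.06 step is omitted in this sketch)
    gray = ImageEnhance.Contrast(gray).enhance(CONTRAST_FACTOR)
    # 5. Minimum filter: each pixel takes the darkest value in its
    #    neighbourhood, so the dark digit strokes get thicker
    gray = gray.filter(ImageFilter.MinFilter(MIN_FILTER_SIZE))
    return gray


def preprocess_file(src_path, dst_path):
    """Hypothetical helper: preprocess one scan and save it as TIFF."""
    with Image.open(src_path) as img:
        preprocess(img).save(dst_path, format="TIFF")
```

Calling preprocess_file("scan.png", "test.tif") on each sheet would produce a TIFF suitable for the tesseract command in this answer; WHITE_POINT and MIN_FILTER_SIZE are the two knobs the answer says may need adjusting.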

Don't forget to save as TIFF (see the Output tab). After that, run tesseract:

tesseract test.tif text -psm 7

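Note that I selected PSM (page segmentation mode) 7: treat the image as a single text line. If you have multiple lines, you'll probably need mode 6 (a single uniform block of text) or 3 (fully automatic page segmentation). In tesseract 4.0 and later the option is spelled --psm 7.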
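
The same call can be scripted from Python. This is a minimal sketch, assuming tesseract is on the PATH: it builds the argument list (covering both the old -psm and the newer --psm spelling), maps the line count to a PSM the way described here, and optionally restricts output to digits via the tesseract config variable tessedit_char_whitelist, whose support varies by tesseract version and engine. The helper names are hypothetical.

```python
import shutil
import subprocess

DIGITS = "0123456789"


def choose_psm(num_lines):
    """PSM 7 treats the image as one text line; PSM 6 as one uniform block."""
    return 7 if num_lines == 1 else 6


def build_tesseract_cmd(image_path, out_base, psm=7, digits_only=False,
                        legacy_flag=False):
    """Return the tesseract argv; tesseract writes out_base + '.txt'."""
    # tesseract 3.x spells the option "-psm"; 4.0 and later use "--psm".
    psm_flag = "-psm" if legacy_flag else "--psm"
    cmd = ["tesseract", image_path, out_base, psm_flag, str(psm)]
    if digits_only:
        # Assumption: whether the whitelist is honored depends on your build.
        cmd += ["-c", "tessedit_char_whitelist=" + DIGITS]
    return cmd


def run_tesseract(image_path, out_base, **options):
    """Hypothetical helper: run tesseract and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    cmd = build_tesseract_cmd(image_path, out_base, **options)
    subprocess.run(cmd, check=True)
    with open(out_base + ".txt", encoding="utf-8") as f:
        return f.read().strip()
```

For the single-line sheet in the question, run_tesseract("test.tif", "text", psm=choose_psm(1)) corresponds to the command line shown above.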

And here are the contents of text.txt output file:

570 394 666 638 043

Cornelius

Posted 2014-06-19T19:39:44.843

Reputation: 2 524

I wonder if those actions can also be done with GraphicsMagick.

– Cristian Ciupitu – 2014-06-19T21:02:33.140

I tried to recognize your image with ABBYY's OCR technology: OCR SDK result

You can find more information about ABBYY's products at abbyy.com.
I work for ABBYY and am ready to help if you have questions.

Vitalik

Posted 2014-06-19T19:39:44.843

Reputation: 141

Is there a digits-only mode, to increase the recognition rate on scratched images? – ItsMe – 2014-07-15T13:33:41.297

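  import cv2
  import pytesseract
  from PIL import Image

  # Read the scan directly as a single-channel grayscale image
  im = cv2.imread('noisyNumbers.png', cv2.IMREAD_GRAYSCALE)

  # Display the grayscale image until a key is pressed
  cv2.imshow('Gray', im)
  cv2.waitKey(0)
  cv2.destroyAllWindows()

  # Save the grayscale copy and pass it to tesseract
  # (JPEG is lossy; a PNG or TIFF copy avoids compression artifacts)
  cv2.imwrite('noisyNumbers.jpg', im)
  print(pytesseract.image_to_string(Image.open('noisyNumbers.jpg')))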
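
Converting to grayscale alone leaves the gray noise in place. One common extra step, swapped in here and not part of this answer, is global binarization with Otsu's method, which picks the threshold from the image histogram instead of a hand-tuned value like the 212 white point. OpenCV provides it as cv2.threshold(im, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU); the NumPy-only sketch below shows the computation itself, assuming an 8-bit (uint8) grayscale array such as the one cv2.imread returns with IMREAD_GRAYSCALE.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the Otsu threshold t for a uint8 grayscale array.

    Pixels <= t form the dark class (digits), pixels > t the light class
    (background). t maximizes the between-class variance (up to a constant).
    """
    levels = np.arange(256, dtype=np.int64)
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.int64)
    w0 = np.cumsum(hist)            # pixel count with value <= t
    w1 = w0[-1] - w0                # pixel count with value > t
    s0 = np.cumsum(hist * levels)   # intensity sum of the dark class
    s1 = s0[-1] - s0                # intensity sum of the light class
    valid = (w0 > 0) & (w1 > 0)     # skip thresholds that leave a class empty
    between = np.zeros(256)
    m0 = s0[valid] / w0[valid]      # mean of the dark class
    m1 = s1[valid] / w1[valid]      # mean of the light class
    between[valid] = w0[valid] * w1[valid] * (m0 - m1) ** 2
    return int(np.argmax(between))


def binarize(gray):
    """Map pixels at or below the Otsu threshold to 0, the rest to 255."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

With the snippet above, binarize(im) could replace im before cv2.imwrite; a single global threshold works best when the background noise is uniformly lighter than the digits, as in the sample.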

jram

Posted 2014-06-19T19:39:44.843

Reputation: 1

Welcome to Super User! Can you edit your answer to explain the code you gave above? Thanks! – bertieb – 2018-10-30T12:59:15.553