Tesseract detection problems due to huge space between characters

I am using Tesseract for OCR to read the text on a receipt, but it is not recognizing the prices.

My guess is that the large blank gap between the product name and the price makes Tesseract assume the line has ended, so it moves on to the next line and drops the price.

For example, the receipt might look like:

           Super Market Receipt

1. Wheat Bread                    xx $
2. Yoghurt                        yy $
   Total:                         zz $

What Tesseract detects is something like:

1. Wheat Bread
2. Yoghurt
Total:

Questions:

1) Is my hypothesis correct? Is the problem caused by the blank space between the product name and the price?

2) Is there any way to work around this?

BTW, I am using pytesseract, a Python wrapper for Tesseract, in case that is relevant.
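To make the hypothesis testable, here is a minimal sketch of one way to check it. It is not a confirmed fix. It asks Tesseract for word-level boxes via `pytesseract.image_to_data` and regroups them into rows by vertical position, so a price far to the right still ends up on the same row as its product. The helper names (`build_config`, `group_words_into_rows`, `ocr_receipt_rows`), the 10-pixel tolerance, and the choice of `--psm 6` (treat the image as a single uniform block of text) together with `preserve_interword_spaces=1` are my own assumptions, and they may or may not help on a given receipt.

```python
def build_config(psm=6, preserve_spaces=True):
    """Build a Tesseract config string (hypothetical helper).

    --psm 6 asks Tesseract to assume a single uniform block of text.
    preserve_interword_spaces=1 asks it to keep wide gaps between words.
    """
    parts = [f"--psm {psm}"]
    if preserve_spaces:
        parts += ["-c", "preserve_interword_spaces=1"]
    return " ".join(parts)


def group_words_into_rows(data, y_tolerance=10):
    """Regroup word boxes into rows using their vertical centers.

    `data` is a dict shaped like the output of pytesseract.image_to_data
    with output_type=Output.DICT (keys: text, left, top, height, ...).
    `y_tolerance` is in pixels and is an assumed value to tune per image.
    """
    words = []
    for text, left, top, height in zip(
        data["text"], data["left"], data["top"], data["height"]
    ):
        if text.strip():  # skip the empty entries Tesseract emits for blocks/lines
            words.append((top + height / 2, left, text))
    words.sort()  # top to bottom

    rows = []  # each entry: [y_center_of_first_word, [(left, text), ...]]
    for y, left, text in words:
        if rows and abs(y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append([y, [(left, text)]])

    # Order the words of each row left to right and join them.
    return [" ".join(t for _, t in sorted(items)) for _, items in rows]


def ocr_receipt_rows(image_path):
    """Run Tesseract on an image file and return one string per receipt row."""
    # Imported here so the grouping logic above can be used without Tesseract.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path),
        config=build_config(),
        output_type=pytesseract.Output.DICT,
    )
    return group_words_into_rows(data)
```

If `ocr_receipt_rows("receipt.png")` (the file name is a placeholder) returns rows that include the prices, then Tesseract does see them and the issue is only in how lines are segmented. If the prices are missing from the word boxes as well, the gap is probably not the whole story and the image itself may need preprocessing.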

python3
ocr
tesseract-ocr

TheOCRGuy

Posted 2019-06-15T15:44:08.050

Reputation: 1


No answers