How can an image of a scanned page be automatically divided into words, similar to reCaptcha?

-1

I have an image of a page from a book and I want to divide it into separate little cropped words. Is there a way to do that?

webmagnets

Posted 2015-09-27T11:39:33.897

Reputation: 99

Are you talking about OCR, as you tagged this, or chopping up the image into individual word images? – fixer1234 – 2015-09-28T00:37:48.877

Chopping up the image into individual word images. Didn't know what to tag it. – webmagnets – 2015-09-28T03:39:13.247

Assuming the lines are equally spaced, you could automate splitting off each line, probably using common image software that does batch operations (I'm thinking of IrfanView, but you don't indicate your OS). Separating each word is trickier. You might be able to copy the page to a layer and use a filter to blur the words heavily, to the point where they become darkish blobs. Then select by a color range that includes the word blobs but not the lighter gaps in between, and apply the selection to the original layer. Not sure how you would save each word to a separate file, though. – fixer1234 – 2015-09-28T04:13:18.427
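
A rough sketch of the idea in the comment above, in Python, assuming the page has already been binarized into a grid of 0 (paper) and 1 (ink). Instead of a 2-D blur it uses projection profiles: blank rows separate lines, and within a line, blank column gaps wider than `max_letter_gap` separate words (bridging the smaller gaps plays the role of the blur). The names `runs`, `word_boxes`, and `max_letter_gap` are made up for this illustration. It does not need equally spaced lines, but it does assume the scan is not skewed.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive
Box = Tuple[int, int, int, int]


def runs(flags: List[bool], max_gap: int = 0) -> List[Tuple[int, int]]:
    """Return [start, end) runs of True values.

    Runs separated by at most max_gap False values are merged into one.
    """
    out: List[Tuple[int, int]] = []
    start = None
    # A trailing False closes a run that reaches the end of the list.
    for i, flag in enumerate(flags + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if out and start - out[-1][1] <= max_gap:
                out[-1] = (out[-1][0], i)  # small gap: extend previous run
            else:
                out.append((start, i))
            start = None
    return out


def word_boxes(page: List[List[int]], max_letter_gap: int = 1) -> List[Box]:
    """Find one bounding box per word on a binarized page (1 = ink)."""
    boxes: List[Box] = []
    # Text lines are runs of rows that contain any ink.
    row_has_ink = [any(row) for row in page]
    for top, bottom in runs(row_has_ink):
        width = len(page[0])
        # Within a line, mark the columns that contain any ink.
        col_has_ink = [
            any(page[y][x] for y in range(top, bottom)) for x in range(width)
        ]
        # Gaps up to max_letter_gap are treated as spaces between letters.
        for left, right in runs(col_has_ink, max_letter_gap):
            boxes.append((left, top, right, bottom))
    return boxes


# Tiny hand-made "page": two words on the first line, one on the second.
demo = [
    "..........",
    ".##.#..##.",
    ".##.#..##.",
    "..........",
    ".###......",
    "..........",
]
page = [[1 if c == "#" else 0 for c in row] for row in demo]
print(word_boxes(page))  # [(1, 1, 5, 3), (7, 1, 9, 3), (1, 4, 4, 5)]
```

Each box uses the (left, top, right, bottom) order that Pillow's `Image.crop` expects, so saving every word to its own file could be a loop over `img.crop(box).save(...)`. On a real scan the gap threshold would need tuning to the font size, and noise specks would have to be filtered out first.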

Answers

0

Install a document-scanner app on your phone. Capture the page you want with the phone's camera; the app will recognize the text and extract it for you. You can then edit the text and save it. Would that work for you?

Johnson15

Posted 2015-09-27T11:39:33.897

Reputation: 1

If you'd read the comments on the question, you'd have noticed that the OP didn't mean OCR but chopping the image into little pieces (each containing a word). Thus, your solution is not applicable. – zagrimsan – 2015-10-23T09:24:46.217