Recognize ASCII art numbers

15

2

Challenge

Recognize ASCII art numbers. To make things interesting, three random points in the image might be flipped. For example:

 ***** 
 *  ** 
    ** 

   **  
  **   
 **    

Input

A 7x7 ASCII art number generated by the Python script below.

Output

A digit.

Testing script

Here's a Python script (2.6+) to generate the test cases:

import random

digits = '''\
  ***  
 ** ** 
**   **
**   **
**   **
 ** ** 
  ***  

   *   
 ***   
   *   
   *   
   *   
   *   
 ***** 

  ***  
 *  ** 
     * 
    ** 
   **  
  **   
 ******

  ***  
 *  ** 
     * 
  ***  
     * 
 *  ** 
  ***  

   **  
  ***  
 * **  
*  **  
****** 
   **  
   **  

 ***** 
 **    
 ****  
     * 
     * 
 *   * 
  ***  

  **** 
 **    
 ***** 
 *   * 
 **  **
 **  * 
  **** 

 ***** 
    ** 
    ** 
   **  
   **  
  **   
 **    

  **** 
 **  **
 **  **
  **** 
 **  **
 **  **
  **** 

  ***  
 ** ** 
**   **
 **  * 
  **** 
    ** 
 ****  '''.split('\n\n')

def speckle(image, num_speckles):
    grid = [list(row) for row in image.split('\n')]

    for i in range(num_speckles):
        row = random.choice(grid)
        row[random.randint(0, 6)] = random.choice([' ', '*'])

    return '\n'.join([''.join(row) for row in grid])

digit = random.choice(digits)

print(speckle(digit, 3))

Blender

Posted 2014-03-06T06:53:12.567

Reputation: 665

Are you sure the Hamming distance between each two digits is more than 6? – John Dvorak – 2014-03-06T07:01:35.437

@JanDvorak: I tweaked the font so that this won't be a problem. Do you see one? – Blender – 2014-03-06T07:03:37.870
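
The concern raised in these comments can be checked mechanically. The following is a minimal Python 3 sketch, not part of the original post: it counts the differing cells between every pair of glyphs from the test script and prints the smallest count. The names `FONT`, `hamming` and `min_pairwise_distance` are made up for illustration, and each glyph is written on one line with `|` between rows so that trailing spaces stay visible.

```python
from itertools import combinations

# The ten glyphs from the test script, one per line, rows separated by '|'.
FONT = [
    "  ***  | ** ** |**   **|**   **|**   **| ** ** |  ***  ",
    "   *   | ***   |   *   |   *   |   *   |   *   | ***** ",
    "  ***  | *  ** |     * |    ** |   **  |  **   | ******",
    "  ***  | *  ** |     * |  ***  |     * | *  ** |  ***  ",
    "   **  |  ***  | * **  |*  **  |****** |   **  |   **  ",
    " ***** | **    | ****  |     * |     * | *   * |  ***  ",
    "  **** | **    | ***** | *   * | **  **| **  * |  **** ",
    " ***** |    ** |    ** |   **  |   **  |  **   | **    ",
    "  **** | **  **| **  **|  **** | **  **| **  **|  **** ",
    "  ***  | ** ** |**   **| **  * |  **** |    ** | ****  ",
]

def hamming(a, b):
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(glyphs):
    """Smallest Hamming distance over all pairs of distinct glyphs."""
    return min(hamming(a, b) for a, b in combinations(glyphs, 2))

print(min_pairwise_distance(FONT))
```

Since `speckle` rewrites at most three cells, a minimum pairwise distance above 6 means a noisy image always stays strictly closer to its own glyph than to any other.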

Answers

9

APL (87 85)

1-⍨⊃⍒(,↑{7↑'*'=⍞}¨⍳7)∘(+.=)¨{49↑,(16/2)⊤⎕UCS⍵}¨↓10 3⍴'嵝䍝뫂傁ဣ␋䠁䊫낫䢝䊅넂垵僡ᑨ嘙쐅嘹䜝䪀슪퀪岹亝尵䌧뮢'

Explanation:

Each possible ASCII number is encoded in 48 bits. (The 49th bit is always zero anyway). The string 嵝䍝뫂傁ဣ␋䠁䊫낫䢝䊅넂垵僡ᑨ嘙쐅嘹䜝䪀슪퀪岹亝尵䌧뮢 has three characters per ASCII number, each of which encodes 16 bits.
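
As a rough illustration of the storage idea only, here is a Python 3 sketch that packs the first 48 cells of a glyph into three 16-bit integers and unpacks them again, padding with a zero for the 49th cell. The row-major, most-significant-bit-first order is an assumption and is not necessarily the layout used in the APL string; `pack48` and `unpack49` are hypothetical helper names.

```python
ZERO = ["  ***  ", " ** ** ", "**   **", "**   **", "**   **", " ** ** ", "  ***  "]

def to_bits(rows):
    """Flatten 7 rows of 7 characters into 49 bits (1 means '*')."""
    return [1 if ch == "*" else 0 for row in rows for ch in row.ljust(7)[:7]]

def pack48(rows):
    """Pack the first 48 bits into three 16-bit integers (bit order assumed)."""
    bits = to_bits(rows)[:48]
    words = []
    for start in range(0, 48, 16):
        word = 0
        for bit in bits[start:start + 16]:
            word = (word << 1) | bit
        words.append(word)
    return words

def unpack49(words):
    """Expand three 16-bit integers to 48 bits and append a zero 49th bit."""
    bits = []
    for word in words:
        bits.extend((word >> shift) & 1 for shift in range(15, -1, -1))
    return bits + [0]

words = pack48(ZERO)
print(words, unpack49(words) == to_bits(ZERO))
```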

  • ↓10 3⍴: split the data string into 10 3-char groups, each of which encodes a number.
  • {...: for each of the groups:
    • (16/2)⊤⎕UCS⍵: get the first 16 bits of each of the three characters
    • ,: concatenate the bit arrays into one array
    • 49↑: take the first 49 elements. There are only 48, so this is equivalent to adding a 0 at the end.
  • ,↑{7↑'*'=⍞}¨⍳7: read 7 lines of 7 characters from the keyboard, make a bit array for each line where 1 means the character was a *, and join them together.
  • (+.=)¨: for each possible digit, calculate how many bits the input has in common with the digit.
  • ⍒: get the indices for a downwards sort of that list, so that the first item in the result is the index of the largest number in the previous list.
  • ⊃: take the first item, which is the index of the digit
  • 1-⍨: subtract one, because APL indices are 1-based.
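
The matching steps above can be sketched in Python 3 as follows. This is an illustrative rewrite rather than a translation of the APL: it compares against the glyphs from the test script directly instead of decoding them from a packed string, and the names `FONT`, `TEMPLATES` and `classify` are made up here. Each glyph is written on one line with `|` between rows.

```python
FONT = [
    "  ***  | ** ** |**   **|**   **|**   **| ** ** |  ***  ",
    "   *   | ***   |   *   |   *   |   *   |   *   | ***** ",
    "  ***  | *  ** |     * |    ** |   **  |  **   | ******",
    "  ***  | *  ** |     * |  ***  |     * | *  ** |  ***  ",
    "   **  |  ***  | * **  |*  **  |****** |   **  |   **  ",
    " ***** | **    | ****  |     * |     * | *   * |  ***  ",
    "  **** | **    | ***** | *   * | **  **| **  * |  **** ",
    " ***** |    ** |    ** |   **  |   **  |  **   | **    ",
    "  **** | **  **| **  **|  **** | **  **| **  **|  **** ",
    "  ***  | ** ** |**   **| **  * |  **** |    ** | ****  ",
]

# One 49-element bit list per digit; True means the cell holds '*'.
TEMPLATES = [[ch == "*" for ch in glyph.replace("|", "")] for glyph in FONT]

def classify(rows):
    """Return the digit whose template shares the most cells with the input."""
    bits = [ch == "*" for row in rows for ch in row.ljust(7)[:7]]
    scores = [sum(a == b for a, b in zip(bits, t)) for t in TEMPLATES]
    # index() returns the first maximum; list indices are already 0-based.
    return scores.index(max(scores))

# Noisy 9 taken from the tesseract transcript further down the page.
noisy = ["  ***  ", " ** ** ", "*    **", " **  * ", "  **** ", "    ***", " ****  "]
print(classify(noisy))  # prints 9
```

Python lists are 0-based, so the final "subtract one" step of the APL version has no counterpart here.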

marinus

Posted 2014-03-06T06:53:12.567

Reputation: 30 224

wow 87? must be the longest APL program ever. – izabera – 2014-03-06T11:53:30.407

I always thought APL always looks like Greek. Now Chinese as well?!? – Digital Trauma – 2014-03-06T15:33:35.923

5

Python

I'm sure there will be OCR solutions, but the probability of mine being accurate is much higher.

import difflib as x;r=range;s='2***3**1**1**3****3****3**1**1**3***23*4***6*6*6*6*4*****12***3*2**6*5**4**4**4******2***3*2**6*3***7*2*2**3***23**4***3*1**2*2**2******4**5**21*****2**5****7*6*2*3*3***22****2**5*****2*3*2**2**1**2*3****11*****5**5**4**5**4**4**42****2**2**1**2**2****2**2**1**2**2****12***3**1**1**3**1**2*3****5**2****2'
for c in r(8):s=s.replace(str(c),' '*c)
s=map(''.join,zip(*[iter(s)]*7));a=[raw_input("") for i in r(7)];l=[[x.SequenceMatcher('','|'.join(a),'|'.join(s[i*7:(i+1)*7])).ratio()] for i in r(10)];print l.index(max(l))

Input one line of text at a time.

Not sure of a better way to deal with the asterisks without increasing the character count.
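
For readers who want the idea without the golfing, here is a Python 3 sketch of the same approach. It is an assumption-laden rewrite, not the answer's code: the glyphs are written out literally instead of being run-length decoded, and `FONT` and `classify` are illustrative names. Note that `SequenceMatcher.ratio()` is a similarity heuristic, so unlike a cell-by-cell comparison it carries no guarantee for noisy input.

```python
from difflib import SequenceMatcher

# Glyphs from the test script, one per line, rows joined with '|'
# (the same separator the golfed answer uses when joining rows).
FONT = [
    "  ***  | ** ** |**   **|**   **|**   **| ** ** |  ***  ",
    "   *   | ***   |   *   |   *   |   *   |   *   | ***** ",
    "  ***  | *  ** |     * |    ** |   **  |  **   | ******",
    "  ***  | *  ** |     * |  ***  |     * | *  ** |  ***  ",
    "   **  |  ***  | * **  |*  **  |****** |   **  |   **  ",
    " ***** | **    | ****  |     * |     * | *   * |  ***  ",
    "  **** | **    | ***** | *   * | **  **| **  * |  **** ",
    " ***** |    ** |    ** |   **  |   **  |  **   | **    ",
    "  **** | **  **| **  **|  **** | **  **| **  **|  **** ",
    "  ***  | ** ** |**   **| **  * |  **** |    ** | ****  ",
]

def classify(rows):
    """Return the digit whose glyph has the highest difflib similarity ratio."""
    image = "|".join(row.ljust(7)[:7] for row in rows)
    ratios = [SequenceMatcher(None, image, glyph).ratio() for glyph in FONT]
    return ratios.index(max(ratios))

# Unmodified glyphs match themselves with ratio 1.0.
print([classify(glyph.split("|")) for glyph in FONT])
# prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```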

grovesNL

Posted 2014-03-06T06:53:12.567

Reputation: 6 736

4

JavaScript (ES6), 89

f=n=>(a=1,[a=(a+a^c.charCodeAt())%35 for(c of n)],[4,25,5,16,0,11,32,13,10,1].indexOf(a))

Usage:

> f("  ***  \n *  ** \n     * \n    ** \n   **  \n  **   \n ******")
2

Un-golfed version:

f = (n) => (
  // Initialize the digit's hash.
  a=1,
  // Hash the digit.
  // 35 is used because the resulting hash is unique for the first ten digits.
  // Moreover, it generates 4 1-digit hashes.
  [a = (a + a ^ c.charCodeAt()) % 35 for(c of n)],
  // Compare the hash to pre-computed digit hash.
  // The matching hash index is the digit.
  [4,25,5,16,0,11,32,13,10,1].indexOf(a)
)
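
The same hash can be written in Python 3 to make the mechanism easier to follow. This is a sketch for illustration; `HASHES`, `digit_hash` and `recognize` are names chosen here, and returning -1 for an unknown hash mirrors what `indexOf` does in the JavaScript version.

```python
# Pre-computed table from the answer: HASHES[d] is the hash of digit d.
HASHES = [4, 25, 5, 16, 0, 11, 32, 13, 10, 1]

def digit_hash(image):
    """Fold every character of the image, newlines included, into one value."""
    a = 1
    for ch in image:
        # '+' binds tighter than '^' in both JavaScript and Python,
        # so this is ((a + a) ^ code) % 35.
        a = (a + a ^ ord(ch)) % 35
    return a

def recognize(image):
    """Look the hash up in the table; -1 if it is not a known digit hash."""
    h = digit_hash(image)
    return HASHES.index(h) if h in HASHES else -1

TWO = "\n".join([
    "  ***  ",
    " *  ** ",
    "     * ",
    "    ** ",
    "   **  ",
    "  **   ",
    " ******",
])
print(recognize(TWO))  # prints 2
```

Because the hash depends on every character, an image with a changed cell is only recognised if its hash happens to land on the right table entry.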

Florent

Posted 2014-03-06T06:53:12.567

Reputation: 2 557

Does this work if the input isn't exactly equal to one of the digits? According to the question, three pixels may be flipped and it should still work. – marinus – 2014-03-06T12:23:00.570

3

Bash+ImageMagick+tesseract, 316 chars

Here's a stab at an OCR solution. It's not very accurate though, even when telling tesseract that we have just one char and it is a digit. Moderately golfed, but still somewhat readable:

w=0
c()((w=${#2}>w?${#2}:w))
mapfile -c1 -Cc -t l
h=${#l[@]}
{
echo "# ImageMagick pixel enumeration: $w,$h,1,gray"
for y in ${!l[@]};{
for((x=0;x<w;x++));{
[ "${l[$y]:$x:1}" != " " ]
echo "$x,$y: ($?,$?,$?)"
}
}
}|convert txt:- i.png
tesseract i.png o -psm 10 <(echo "tessedit_char_whitelist 0123456789")
cat o.txt

The script takes input from stdin, so we can pipe from the test script.

Note I have put tee >( cat 1>&2 ) in the pipeline just so we can see what the test script actually generated.

Example output (This was a pretty good run with only 1 incorrect char out of 6):

$ python ./asciitest.py | tee >(cat 1>&2 ) | ./scanascii.sh
  ***  
 ** ** 
*    **
 **  * 
  **** 
    ***
 ****  
Tesseract Open Source OCR Engine v3.02 with Leptonica
9

$ python ./asciitest.py | tee >(cat 1>&2 ) | ./scanascii.sh
   *   
 ***  *
   *   
   *   
   *   
   *   
 ***** 
Tesseract Open Source OCR Engine v3.02 with Leptonica
1

$ python ./asciitest.py | tee >(cat 1>&2 ) | ./scanascii.sh
  ***  
 ** ** 
**   **
**   **
**   **
  * ** 
  ***  
Tesseract Open Source OCR Engine v3.02 with Leptonica
0

$ python ./asciitest.py | tee >(cat 1>&2 ) | ./scanascii.sh
 ***** 
 **    
 ****  
     * 
     * 
 **  * 
  ***  
Tesseract Open Source OCR Engine v3.02 with Leptonica
5

$ python ./asciitest.py | tee >(cat 1>&2 ) | ./scanascii.sh
  **** 
 **    
 ***** 
 *   * 
*** ***
 **  **
  **** 
Tesseract Open Source OCR Engine v3.02 with Leptonica
5

$ python ./asciitest.py | tee >(cat 1>&2 ) | ./scanascii.sh
  ***  
 *  ** 
     * 
    ** 
   *** 
  **   
 ******
Tesseract Open Source OCR Engine v3.02 with Leptonica
2

$ 

Digital Trauma

Posted 2014-03-06T06:53:12.567

Reputation: 64 644

1

LÖVE2D, 560 Bytes

t=...;g=love.graphics g.setNewFont(124)g.setBackgroundColor(255,255,255)A=g.newCanvas()B=g.newCanvas()x=1 y=1 g.setColor(255,255,255)g.setCanvas(B)g.clear(0,0,0)for i=1,#t do x=x+1 if t:sub(i,i)=="\n"then x=1 y=y+1 end if t:sub(i,i)=="*"then g.rectangle("fill",x*16,y*16,16,16)end end u=B:newImageData()g.setCanvas(A)S={}for i=0,9 do g.clear(0,0,0,0)g.print(i,48,0)r=A:newImageData()s={i=i,s=0}for x=0,16*8 do for y=0,16*8 do a=u:getPixel(x,y)b=r:getPixel(x,y)s.s=s.s+math.abs(a-b)end end S[i+1]=s end table.sort(S,function(a,b)return a.s<b.s end)print(S[1].i)

It first draws a blocky representation of the input text. Then, for each digit 0 to 9, it renders that digit with the built-in font, sums the per-pixel difference against the blocky image, and prints the digit with the smallest difference. Very basic OCR. It matches all the test cases, and performs reasonably well with mutations.

Call with:

love.exe "" "INPUT"

ATaco

Posted 2014-03-06T06:53:12.567

Reputation: 7 898