Captcha Capture


Captchas aren't meant to be machine-readable, and that's terrible. It is time to end this obvious bout of unfounded hatred for machines just trying to let you know about great deals on ṗḧ@ṛḿ4ćëüäḷṡ ƻđȺɏ.

Note: All captchas were made with a library that has been modified so that it mostly generates images that are much easier to read than most real captchas. Don't expect code here to work well in the real world.

Your task

Make a program that translates given captchas to their text.

Default acceptable image I/O methods apply for input.
Output must be the text contained in the captcha, but other than that default acceptable general I/O methods apply.

Example

Input:

dlznuu.png

Output: dlznuu

Rules

  • Please don't just hardcode the images or image hashes or such. If anyone does this I'm going to generate another testcase folder and then everyone will be unhappy. (Hardcoding stuff like "this is how a letter looks" is allowed, but creativity is encouraged.)
  • Don't rely on the filename or anything but the actual image data for anything other than validating your output.
  • Your program must be below 50,000 bytes.
  • No using Tesseract. Come on, you thought it was gonna be that easy?
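The rules allow hardcoding what each letter looks like. One minimal sketch of that idea is nearest-template matching. It assumes the captcha has already been binarized into a grid of 0/1 pixels (1 = ink) and that the six letters sit in roughly equal-width columns. The helper names and the templates table are hypothetical, and the real test images are warped enough that plain equal-width slicing plus pixel comparison will probably need considerable refinement.

```python
from typing import Dict, List

# A binarized image: a list of pixel rows, 1 = ink, 0 = background.
Grid = List[List[int]]


def split_columns(image: Grid, count: int = 6) -> List[Grid]:
    """Cut the image into `count` equal-width vertical slices.

    Any leftover columns at the right edge (width % count) are dropped.
    """
    step = len(image[0]) // count
    return [[row[i * step:(i + 1) * step] for row in image] for i in range(count)]


def hamming(a: Grid, b: Grid) -> int:
    """Number of pixel positions where two equal-sized grids differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph: Grid, templates: Dict[str, Grid]) -> str:
    """Pick the letter whose template differs from `glyph` in the fewest pixels.

    Hypothetical: each template must have the same size as one slice.
    """
    return min(templates, key=lambda letter: hamming(glyph, templates[letter]))


def read_captcha(image: Grid, templates: Dict[str, Grid]) -> str:
    """Classify each slice independently and join the letters."""
    return "".join(classify(glyph, templates) for glyph in split_columns(image))
```

For a 200-pixel-wide captcha each slice is 33 pixels wide and the last 2 columns are ignored. Splitting at empty pixel columns instead of at fixed offsets would likely cope better with letters of different widths.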

Test cases:

  • All captchas have 6 letters in them.
  • All captchas are 200 by 80 pixels.
  • All characters are lowercase.
  • All characters are in the range a-z, with no numbers or special characters.
  • All captchas fit the regex /^[a-z]{6}$/.
  • Every testcase is named [correct_result].png
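The file-name convention can be used to check answers, though not to produce them. A small sketch of such a checker (the helper names are hypothetical) validates both the output format and the expected result:

```python
import os
import re

# fullmatch() anchors at both ends, like /^[a-z]{6}$/ in the challenge text.
ANSWER_PATTERN = re.compile(r"[a-z]{6}")


def is_well_formed(text: str) -> bool:
    """True if `text` is exactly six lowercase ASCII letters."""
    return ANSWER_PATTERN.fullmatch(text) is not None


def expected_answer(path: str) -> str:
    """Read the correct result from a test file named [correct_result].png.

    Per the rules, use this only to check a solver's output, never as input to it.
    """
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext != ".png" or not is_well_formed(stem):
        raise ValueError(f"unexpected test file name: {path!r}")
    return stem


def check(path: str, output: str) -> bool:
    """True if the solver's output for `path` is well formed and correct."""
    return is_well_formed(output) and output == expected_answer(path)
```

fullmatch is used instead of re.match with ^...$ because in Python $ also matches just before a trailing newline, so "dlznuu\n" would slip through. An unanchored pattern such as [a-z]{6} would also accept longer strings like "dlznuuu".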

6 example images for starting out (gist)
1000 images (tar.gz)

Scoring:

Test your program on all 1000 images provided in the tar.gz. (The gist files are just examples.)
Your program's score is (amount_correct / 1000) * 100%.
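The scoring formula can be computed with a short helper like the one below. This is a sketch; the function name and the dict-of-results layout are assumptions.

```python
from typing import Dict


def score(predictions: Dict[str, str], expected: Dict[str, str]) -> float:
    """Percentage of test cases answered exactly right.

    Both dicts map a test-case name to text; a missing prediction counts as wrong.
    """
    correct = sum(predictions.get(name) == answer for name, answer in expected.items())
    return correct / len(expected) * 100
```

For example, 937 exact matches out of 1000 give a score of 93.7. With exactly 1000 cases this is the same as amount_correct / 10.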

NO_BOOT_DEVICE

Posted 2018-11-03T22:25:44.970

Reputation: 419

Question was closed 2018-11-24T03:39:36.427

Um, / 1000 * 100 is just / 10? – Erik the Outgolfer – 2018-11-03T22:39:14.253

@EriktheOutgolfer That is maybe slightly less clear. *100 emphasizes that the score is being displayed as a percentage. – dylnan – 2018-11-03T22:59:08.377

@EriktheOutgolfer Exactly what dylnan said -- I made the conscious choice not to do that because it'd be more obvious this way, in my opinion. – NO_BOOT_DEVICE – 2018-11-04T02:12:34.163

Is there a reason you didn't just say "your program's score is the percentage correct on the 1000 test cases"? – Kamil Drakari – 2018-11-04T02:42:50.573

@Abigail The result should always be 6 lowercase letters in a row, because that's what the content of the captcha always is. – NO_BOOT_DEVICE – 2018-11-04T23:04:22.157

@Abigail ...I thought I had said /^[a-z]{6}$/ and not /[a-z]{6}/, but on closer inspection I hadn't. Nice catch. – NO_BOOT_DEVICE – 2018-11-05T05:55:52.200

Are any OCR libraries allowed? – Logern – 2018-11-16T22:26:46.800

@KamilDrakari, even that's way more complicated than it needs to be. "Your program's score is the number of correct results from the test battery." – Peter Taylor – 2018-11-23T13:43:49.560

No answers