Defeat SVGCaptcha

80

6

I came across SVGCaptcha, and immediately knew it was a bad idea.

I would like you to show just how bad an idea this is by extracting the validation code from the SVG images that code produces.


An example image looks like this:
8u4x81f
Here is the source of the example image:

<?xml version="1.0" encoding="utf-8"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
        "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
    <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xml:space="preserve"
             width="200" height="40"
    > <rect x="0" y="0" width="200" height="40" 
        style="stroke: none; fill: none;" >
        </rect> <text style="fill: #4d9363;" x="5" y="34" font-size="20" transform="translate(5, 34) rotate(-17) translate(-5, -34)">8</text>
<text style="fill: #be8b33;" x="125" y="29" font-size="21" transform="translate(125, 29) rotate(17) translate(-125, -29)">f</text>
<text style="fill: #d561ff;" x="45" y="35" font-size="20" transform="translate(45, 35) rotate(-2) translate(-45, -35)">4</text>
<text style="fill: #3de754;" x="85" y="31" font-size="21" transform="translate(85, 31) rotate(-9) translate(-85, -31)">8</text>
<text style="fill: #5ed4bf;" x="25" y="33" font-size="22" transform="translate(25, 33) rotate(16) translate(-25, -33)">u</text>
<text style="fill: #894aee;" x="105" y="28" font-size="25" transform="translate(105, 28) rotate(9) translate(-105, -28)">1</text>
<text style="fill: #e4c437;" x="65" y="32" font-size="20" transform="translate(65, 32) rotate(17) translate(-65, -32)">x</text>
</svg>

The input is the SVG image, which is a textual format.

The only real restriction is that your code must produce the values in the correct order.
The input <text> elements are in random order, so you have to pay attention to the x attribute in the <text> tag.
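As an ungolfed point of reference, the task can be sketched in Python roughly as follows. This is only an illustration, not one of the submitted answers; the function name read_captcha is made up here, and the regular expression assumes the attribute layout shown in the sample (a single character per <text> element and an integer x attribute).

```python
import re

def read_captcha(svg: str) -> str:
    # Capture the x attribute and the single character of every <text> element.
    pairs = re.findall(r'<text\b[^>]*\bx="(\d+)"[^>]*>(.)</text>', svg)
    # Sort numerically by x; a plain string sort would put "105" before "25".
    pairs.sort(key=lambda pair: int(pair[0]))
    return "".join(ch for _, ch in pairs)
```

The <rect> element also carries an x attribute, which is why the pattern is anchored on <text.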


Score is the number of bytes in the code


Since the code currently does two transforms that cancel each other out, you can ignore them, but if you do take them into consideration, go ahead and take a 30% reduction from your score.
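To see why the transforms can be ignored for ordering purposes, here is a small Python sketch of the chain translate(cx, cy) rotate(a) translate(-cx, -cy) used in the sample. It assumes the usual SVG convention that the rightmost transform in the list is applied to the point first; the function name apply_transform is hypothetical. Strictly speaking the two translates turn the rotation into a rotation about (cx, cy), so the text anchor itself does not move.

```python
import math

def apply_transform(x, y, cx, cy, degrees):
    """Apply translate(cx, cy) rotate(degrees) translate(-cx, -cy) to (x, y)."""
    # Rightmost transform first: move the rotation centre to the origin.
    px, py = x - cx, y - cy
    # Rotate about the origin.
    t = math.radians(degrees)
    rx = px * math.cos(t) - py * math.sin(t)
    ry = px * math.sin(t) + py * math.cos(t)
    # Leftmost transform last: move the rotation centre back.
    return rx + cx, ry + cy
```

Since every <text> element is rotated about its own (x, y), calling apply_transform(5, 34, 5, 34, -17) gives back the point (5, 34), and sorting by the raw x attribute yields the same order as sorting by the transformed anchor.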

Brad Gilbert b2gills

Posted 2015-11-15T22:25:21.813

Reputation: 12 713

You haven't actually stated explicitly what the input and output are: I'm presuming the SVG file and the letters contained therein? And it's not clear to me whether answers are required to actually implement the SVG spec or whether they can assume that the SVG is generated by the current version of SVGCaptcha and so the transforms can be ignored. – Peter Taylor – 2015-11-15T22:51:17.123

I suggest limiting the output to STDOUT or function return value, and making it [tag:code-golf] – TheDoctor – 2015-11-15T22:54:02.483

@AlexA. So there can't be a criterion on quality that can't really be quantified? You can't qualify on how funny the code is, or how clear it is? I really want the answers to be so dead simple that people who don't even know a single language could eventually figure it out. – Brad Gilbert b2gills – 2015-11-15T22:55:10.470

@PeterTaylor I totally didn't look to see if the translations actually did anything. I would prefer if it did take them into consideration, but since the code is even dumber than I thought, any answer could ignore them. – Brad Gilbert b2gills – 2015-11-15T22:58:16.980

No, questions need an objective, quantifiable winning criterion to be on-topic for this site. – Alex A. – 2015-11-15T23:00:10.567

If you want the answers to be "simple", why don't you just make it codegolf so that the shortest possible answer wins? Then we can get this open again. Yes SVGCaptcha is a hilariously dumb idea (I suspect the inventors do in fact know that, though there isn't a trace of irony shown on the linked page.) And it's been good for a laugh, but the purpose of this site is to host challenges, with objective rules. – Level River St – 2015-11-15T23:24:01.847

@steveverrill I really wanted them to be elegant. The shortest code in Perl 5&6 are very often not the simplest for example. (one of the reasons most people think that Perl is hard to read) But in the interest of opening it is now a golfing competition – Brad Gilbert b2gills – 2015-11-15T23:39:22.003

I'm not sure how relevant [tag:image-processing] is here. – SuperJedi224 – 2015-11-16T00:42:15.150

This question is now the 4th result when googling 'svgcaptcha' :) – Blue – 2015-11-16T09:33:35.823

It is the 3rd now! – Ioannes – 2015-11-16T12:44:39.270

The Google result for svgcaptcha itself, and the relevance of this OP to the keyword, is the elegant and ironic proof of how bad the idea itself is. ;) – Zaibis – 2015-11-16T17:12:23.373

Can we output it as a list of characters instead of String? I.e. [8, u, 4, x, 8, 1, f] instead of 8u4x81f? – Kevin Cruijssen – 2018-01-18T09:30:13.603

@KevinCruijssen In some languages there is no difference. So sure why not. – Brad Gilbert b2gills – 2018-01-18T14:52:33.987

Answers

18

Bash, 63 56 39 bytes

cat<<_|grep -o 'x=.*>'|cut -c4-|sort -n|grep -o '>.</t'|cut -c2

grep -o 'x=.*>'|cut -c4-|sort -n|grep -o '>.</t'|cut -c2

grep -o 'x=.*<'|sort -k1.4n|rev|cut -c2

Note: requires cat, grep, sort, rev, and cut. Takes input from stdin. The output is separated by line breaks on stdout. Make sure to press CTRL+D (not COMMAND+D on Mac) when finished entering the CAPTCHA. Input must be followed by a newline and then '_'.
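For readers less familiar with the shell tools, the final 39-byte pipeline can be mimicked step by step in Python. This is a rough sketch rather than an exact emulation of grep, sort, rev and cut; the names bash_pipeline and _numeric_key are made up for the illustration, and it assumes each <text> element sits on its own line as in the sample.

```python
import re

def _numeric_key(match: str) -> int:
    # sort -k1.4n: compare the number that starts at character 4, just after x="
    digits = re.match(r"\d+", match[3:])
    return int(digits.group(0)) if digits else 0

def bash_pipeline(svg: str) -> str:
    # grep -o 'x=.*<': per line, keep the greedy match from the first x= to the last <
    hits = []
    for line in svg.splitlines():
        found = re.search(r"x=.*<", line)
        if found:
            hits.append(found.group(0))
    hits.sort(key=_numeric_key)
    # rev | cut -c2: the second character from the end is the captcha character
    return "\n".join(hit[-2] for hit in hits)
```

The <rect> line drops out on its own: it contains x= but no < after it, so the grep pattern never matches there.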

EDIT: Saved 13 bytes.

EDIT 2: Saved 20 bytes thanks to @manatwork!

Coder-256

Posted 2015-11-15T22:25:21.813

Reputation: 291

GNU coreutils sort supports character position in the keydef: cut -c4-|sort -n → sort -k1.4n. – manatwork – 2018-01-18T10:56:32.663

@manatwork Thanks, I updated the answer. – Coder-256 – 2018-01-18T21:55:47.587

13

CJam, 26 bytes

q"x="/2>{'"/1=i}${'>/1=c}/

Try it online in the CJam interpreter.

How it works

q     e# Read all input from STDIN.
"x="/ e# Split it at occurrences of "x=".
2>    e# Discard the first two chunks (head and container).
{     e# Sort the remaining chunks by the following key:
  '"/ e#   Split at occurrences of '"'.
  1=  e#   Select the second chunk (digits of x="<digits>").
  i   e#   Cast to integer.
}$    e#
{     e# For each of the sorted chunks:
  '>/ e#   Split at occurrences of '>'.
  1=  e#   Select the second chunk.
  c   e#   Cast to character.
}/    e#
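The same split-based idea reads roughly like this in Python. It is a loose transliteration for illustration only (the name cjam_style is made up), and like the CJam program it relies on the first two chunks being the document head and the <rect> element.

```python
def cjam_style(svg: str) -> str:
    # Split at "x=" and discard the first two chunks (head and <rect> container).
    chunks = svg.split('x=')[2:]
    # Sort by the digits between the first pair of double quotes: x="<digits>".
    chunks.sort(key=lambda chunk: int(chunk.split('"')[1]))
    # The captcha character is the first character after the first '>'.
    return "".join(chunk.split('>')[1][0] for chunk in chunks)
```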

Dennis

Posted 2015-11-15T22:25:21.813

Reputation: 196 637

8

JavaScript, 95 93 91 bytes

l=[],r=/x="(\d*).*>(.)/g;while(e=r.exec(document.lastChild.innerHTML))l[e[1]]=e[2];l.join``

edit: -2 bytes changing documentRoot to lastChild; -2 bytes changing join('') to join``, thanks Vɪʜᴀɴ

Enter the code in the browser console on a page containing the SVG in question; it writes to the console output.
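The trick here is that no explicit sort is needed: each character is stored in an array slot indexed by its x value, and joining the array skips the empty slots. A rough Python equivalent, with the made-up name sparse_slots and a regular expression that is slightly stricter than the golfed one, might look like this:

```python
import re

def sparse_slots(svg: str) -> str:
    # Collect (x, character) pairs from the <text> elements.
    pairs = [(int(x), ch)
             for x, ch in re.findall(r'<text[^>]*\sx="(\d+)"[^>]*>(.)<', svg)]
    # One slot per possible x value; unused slots stay empty strings.
    slots = [""] * (max((x for x, _ in pairs), default=-1) + 1)
    for x, ch in pairs:
        slots[x] = ch
    # Joining in index order is the same as sorting by x.
    return "".join(slots)
```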

Nickson

Posted 2015-11-15T22:25:21.813

Reputation: 121

document.rootElement is returning undefined. I've tried Firefox and Safari – Downgoat – 2015-11-16T02:31:11.657

This was only tested in Chrome, I'll look into what could be changed. – Nickson – 2015-11-16T02:32:41.630

It appears to work in Firefox, is SVG the only content of the file? – Nickson – 2015-11-16T02:39:33.497

Okay, tried it in Chrome, now it worked. +1. You can also save two bytes by changing the ('') to two backticks: `` – Downgoat – 2015-11-16T02:41:55.807

This is 78: t=>(l=[],r=/x="(\d*).*?>(.)/g,eval("while(e=r.exec(t))l[e[1]]=e[2];l.join``")) (takes xml string as parameter, returns captcha text) – DankMemes – 2015-11-16T07:25:50.313

7

Perl, 40 bytes

39 bytes code + 1 for -n

$a[$1]=$2 for/x="(.+)".+(.)</g}{print@a

Example:

perl -ne '$a[$1]=$2 for/x="(.+)".+(.)</g}{print@a' <<< '<example from above>'
8u4x81f

Dom Hastings

Posted 2015-11-15T22:25:21.813

Reputation: 16 415

Man that is just full of warnings if you turn them on. Excellent use of Perl's default lax nature. – Brad Gilbert b2gills – 2015-11-16T15:44:45.277

@BradGilbertb2gills Yeah, I try not to test the warnings, I'm so surprised any golfed code even works sometimes! – Dom Hastings – 2015-11-17T08:49:54.190

7

Bash + GNU utilities, 53

grep -Po '(?<=x=").*(?=<)'|sort -n|grep -Po '(?<=>).'

Like this answer, output is one char per line.

Digital Trauma

Posted 2015-11-15T22:25:21.813

Reputation: 64 644

3

Befunge, 79 bytes

It feels like it should be possible to golf at least one more byte off of this, but I've been working on it for a couple of days now, and this is as good as I could get it.

<%*:"~"*"~"_~&45*/99p1v-">":_|#`0:~<
#,:#g7-#:_@^-+*"x~=":+_~99g7p>999p#^_>>#1+

Try it online!

Explanation

Source code with execution paths highlighted

* Make the execution direction right-to-left, and wrap around to start the main loop.
* Read a char from stdin, and test for the end-of-file value.
* If it's not end-of-file, check if it's a >.
* If it's not a >, add it to the value on the stack which tracks the last two characters, and check if the current pair matches x=.
* If not, multiply by 126 and mod with 126² to drop the oldest value from the pair and make space for the next character.
* Wrap around again to repeat the main loop.
* When an x= pair is encountered, skip the next character (the quote), read an integer (the x value), and divide by 20. This becomes the current offset which is saved for later.
* When a > is encountered, read the next character (typically one of the captcha letters), and save that at the current offset in an "array". Reset the offset to 9, so the captcha letter won't be overwritten when later > characters are encountered.
* Finally, when the end-of-file is reached, iterate over the 7 values saved in the array and output them one by one. That should give you all the captcha letters in the correct order.

I'm glossing over some of the details here, since the code paths overlap each other in ways which are a little difficult to explain, but it should give you a general idea of how the algorithm works.
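The algorithm described above can be sketched as a small character-by-character scanner in Python. This is an approximation of the idea, not a translation of the Befunge playfield: the name befunge_style is made up, the previous character is tracked directly instead of through the multiply-and-mod encoding, and the offset is assumed to start on the scratch slot.

```python
def befunge_style(svg: str) -> str:
    slots = [""] * 10   # slots 0-6 hold the answer, slot 9 is a scratch cell
    offset = 9          # assumption: start parked on the scratch cell
    prev = ""
    i = 0
    while i < len(svg):
        c = svg[i]
        i += 1
        if c == ">":
            # Save the character after '>' at the current offset, then park again.
            if i < len(svg):
                slots[offset] = svg[i]
                i += 1
            offset = 9
            prev = ""
        elif prev == "x" and c == "=":
            i += 1      # skip the opening quote
            start = i
            while i < len(svg) and svg[i].isdigit():
                i += 1
            if i > start:
                # x is 5, 25, ..., 125 in the sample, so x // 20 is 0..6.
                offset = int(svg[start:i]) // 20
            prev = ""
        else:
            prev = c
    return "".join(slots[:7])
```

The <rect> element's x="0" briefly writes into slot 0, but the <text> element with x="5" overwrites it later, which matches the behaviour described in the list.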

James Holderness

Posted 2015-11-15T22:25:21.813

Reputation: 8 298

3

Perl 6, 68 bytes

say [~] lines.map({/'x="'(\d+).*(.)\</??(+$0=>$1)!!()}).sort».value

Brad Gilbert b2gills

Posted 2015-11-15T22:25:21.813

Reputation: 12 713

2

V, 28 26 25 24 bytes

d5j́x=
ún
J́">
lH$dÍî

Try it online!

Explanation:

d5j              delete first 6 lines
   Í<0x81>x=     In every line, replace everything up to x=" (inclusive) by nothing
ún               Sort numerically
J                Join line (closing </svg>) with next line
 Í<0x81>">       In every line, replace everything up to "> by nothing
l␖H$d            Visual block around closing </text> tags, delete
     Íî          In every line, replace \n by nothing.

HexDump:

00000000: 6435 6acd 8178 3d0a fa6e 0a4a cd81 223e  d5j..x=..n.J..">
00000010: 0a6c 1648 2464 cdee                      .l.H$d..

oktupol

Posted 2015-11-15T22:25:21.813

Reputation: 697

2

QuadS, 49 bytes

∊c[⍋⊃x c←↓⍎¨@1⍉(⊢⍴⍨2,⍨.5×≢)3↓⍵]
x="(\d+)
>(.)<
\1

Try it online!

Finds x values (digit-runs after x=") and "letters" (pinned by closing and opening tags), then executes the following APL (where ⍵ is the list of found x values and letters, in order of appearance):

3↓⍵ drop the first three elements (spaces around <rect/rect> and the <rect's x value).

(…) apply the following tacit function on that:

≢ the number of remaining items

.5× halve that

2,⍨ append a two

⊢⍴⍨ reshape to that shape (i.e. an n×2 matrix)

⍉ transpose (to a 2×n matrix)

⍎¨@1 execute each string in the first row (turning them into numbers)

↓ split the matrix into two vectors (one per row)

x c← store those two in x (x values) and c (characters) respectively

⊃ pick the first (x)

⍋ grade up (the indices into x which would sort x)

c[…] use that to index into c

∊ ϵnlist (flatten) because each letter is a string by itself


The equivalent APL expression of the entire QuadS program is:

∊c[⍋⊃x c←↓⍎¨@1⍉(⊢⍴⍨2,⍨.5×≢)3↓'x="(\d+)"' '>(.)<'⎕S'\1'⊢⎕]
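For readers who do not speak APL, the same pipeline can be approximated in Python. The sketch below is illustrative only (grade_up_style is a made-up name): it collects the matches of both patterns in order of appearance, drops the first three, separates x values from characters, and then indexes the characters by the "grade up" permutation.

```python
import re

def grade_up_style(svg: str) -> str:
    # Two alternatives in one pattern, so matches come out in order of appearance.
    found = [num or ch for num, ch in re.findall(r'x="(\d+)|>(.)<', svg)]
    found = found[3:]                    # drop ' ', the <rect> x value, ' '
    xs = [int(v) for v in found[0::2]]   # x values, turned into numbers
    cs = found[1::2]                     # captcha characters
    # Grade up: the indices into xs that would sort xs.
    order = sorted(range(len(xs)), key=xs.__getitem__)
    return "".join(cs[i] for i in order)
```

As with the QuadS program, this depends on the exact whitespace of the sample, since the first three matches are assumed to come from around the <rect> element.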

Adám

Posted 2015-11-15T22:25:21.813

Reputation: 37 779

2

Python2, 129 bytes

import re,sys
print''.join(t[1] for t in sorted(re.findall(r'(\d+), -\d+\)"\>(.)\</t',sys.stdin.read()),key=lambda t:int(t[0])))

Takes the SVG source on stdin, produces the code on stdout.

orlp

Posted 2015-11-15T22:25:21.813

Reputation: 37 067

How does this sort the output? The <text> elements are in a random order, and the only real requirement is that you have to put them in the correct order. That means you have to use the x from the <text> and follow any transforms. – Brad Gilbert b2gills – 2015-11-15T22:36:26.937

@BradGilbertb2gills I missed that the first time around, fixed now. – orlp – 2015-11-15T22:43:35.413

2

Mathematica, 106 bytes

""<>(v=ImportString[#~StringDrop~157,"XML"][[2,3,4;;;;2]])[[;;,3]][[Ordering[FromDigits/@v[[;;,2,2,2]]]]]&

Note: The input needs to be in exactly the format specified by the example.

LegionMammal978

Posted 2015-11-15T22:25:21.813

Reputation: 15 731

1

Java 8, 197 173 bytes

import java.util.*;s->{String a[]=s.split("x=\""),r="";Map m=new TreeMap();for(int i=2;i<a.length;m.put(new Long(a[i].split("\"")[0]),a[i++].split(">|<")[1]));return m.values();}

Outputs a java.util.Collection of characters.

Explanation:

Try it online.

import java.util.*;           // Required import for Map and TreeMap
s->{                          // Method with a String parameter, returning a Collection
  String a[]=s.split("x=\""), //  Split the input by `x="`, and store it as String-array
         r="";                //  Result-String, starting empty
  Map m=new TreeMap();        //  Create a sorted key-value map
  for(int i=2;                //  Skip the first two items in the array,
      i<a.length;             //  and loop over the rest
    m.put(new Long(a[i].split("\"")[0]),
                              //   Split by `"` and use the first item as number-key
          a[i++].split(">|<")[1]));
                              //   Split by `>` AND `<`, and use the second item as value
    return m.values();}       //  Return the values of the sorted map as result

Kevin Cruijssen

Posted 2015-11-15T22:25:21.813

Reputation: 67 575

1

Gema, 65 characters

x\="<D>*\>?=@set{$1;?}
?=
\Z=${5}${25}${45}${65}${85}${105}${125}

In Gema there is no sorting, but fortunately it is not even needed.

Sample run:

bash-4.4$ gema 'x\="<D>*\>?=@set{$1;?};?=;\Z=${5}${25}${45}${65}${85}${105}${125}' < captcha.svg
8u4x81f
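Sorting is unnecessary because the x values in the sample are always 5, 25, ..., 125, so the characters can simply be looked up at those fixed positions. A Python sketch of the same idea follows; fixed_positions is a made-up name, and the assumption that exactly these seven x values occur is taken from the sample, not from any documentation of SVGCaptcha.

```python
import re

def fixed_positions(svg: str) -> str:
    # Map each x value (as text) to the first character after the tag's '>'.
    seen = dict(re.findall(r'x="(\d+)"[^>]*>(.)', svg, flags=re.DOTALL))
    # Assumption from the sample: characters sit at x = 5, 25, ..., 125.
    return "".join(seen[str(x)] for x in range(5, 126, 20))
```

A captcha with a different spacing or length would raise a KeyError or be truncated, which is the price of skipping the sort.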

manatwork

Posted 2015-11-15T22:25:21.813

Reputation: 17 865

1

XMLStarlet, 46 characters

xmlstarlet sel -t -m //_:text -s A:N:U @x -v .

Hopefully this is a valid solution, as XMLStarlet is a transpiler that generates and executes XSLT code, which is a Turing-complete language.

Sample run:

bash-4.4$ xmlstarlet sel -t -m //_:text -s A:N:U @x -v . < captcha.svg 
8u4x81f
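The same "treat it as XML" approach can be sketched with Python's standard library: select the <text> elements in the SVG namespace, sort them numerically by their x attribute, and print their contents. The name xml_style is made up for this illustration, and the sketch assumes every <text> element has a numeric x attribute.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def xml_style(svg: str) -> str:
    # Parse from bytes so that an XML encoding declaration, if present, is honoured.
    root = ET.fromstring(svg.encode("utf-8"))
    # Equivalent of matching //_:text in the default (SVG) namespace.
    texts = root.iter(SVG_NS + "text")
    # Equivalent of -s A:N:U @x: ascending numeric sort on the x attribute.
    ordered = sorted(texts, key=lambda el: float(el.get("x")))
    return "".join(el.text or "" for el in ordered)
```

Unlike the regex-based answers, this does not depend on attribute order or line breaks, only on the document being well-formed XML.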

manatwork

Posted 2015-11-15T22:25:21.813

Reputation: 17 865

1

PHP, 96 bytes

Given that $i is the input string

preg_match_all('|x="(\d+).*(.)\<|',$i,$m);$a=array_combine($m[1],$m[2]);ksort($a);echo join($a);

Jordi Kroon

Posted 2015-11-15T22:25:21.813

Reputation: 111

1

Instead of array_combine() + ksort() you could use array_multisort() like this: array_multisort($m[1],$m[2]);echo join($m[2]);. But please note that solutions are expected to handle input and output themselves (unless the language does it automatically), instead of expecting to find the input in a variable or just leave the result in a variable. See related meta.

– manatwork – 2018-01-19T09:12:09.730

1

Clean, 277 150 bytes

Yay pattern matching!

import StdEnv,StdLib
?s=map snd(sort(zip(map(toInt o toString)[takeWhile isDigit h\\['" x="':h]<-tails s],[c\\[c:t]<-tails s|take 7 t==['</text>']])))

Try it online!

Defines the function ?, taking [Char] and giving [Char].

Οurous

Posted 2015-11-15T22:25:21.813

Reputation: 7 916