Reverse a regex

27

2

The Challenge

Given a valid regex, output a regex that that matches the same set of strings, but reversed.

The Task

This challenge uses the most basic regex operations: ^, $, ?, +, *, [], {}, |. There's no such thing as capture groups or any of that complicated stuff. Special characters can be escaped.

Sample Input/Output

Note: Invalid input will never be given, and there are generally multiple possible answers for a given input!

Input      | Sample Output
-----------|-------------
abc        | cba
tuv?       | v?ut
a(b|c)     | (c|b)a
1[23]      | [23]1
a([bc]|cd) | (dc|[bc])a
^a[^bc]d$  | ^d[^bc]a$
x[yz]{1,2} | [yz]{1,2}x
p{2}       | p{2}
q{7,}      | q{7,}
\[c[de]    | [de]c\[
ab[c       | <output undefined>
a(?bc)     | <output undefined>
a[]]bc     | <output undefined>

Demo

Working demo that demonstrates correct inputs/outputs. This has some additional logic for validating inputs that isn't necessary in a real answer. Consider invalid inputs to be undefined behavior.

Specifics

For simplicity, all special characters either have their special meaning or are escaped; that is, [[] isn't a character range for [. Length ranges come from standard POSIX EREs; that is, {n}, {n,}, and {n,m} are supported. The character ranges [] and [^] are supported. Because of these rules, and since no invalid input is given, you really need only copy the contents of these directly to the output. Lastly, greediness doesn't matter, i.e. it doesn't matter if the reversed regex finds a different match first, it just needs to find a match for the same set of strings.

Scoring

Smallest program in bytes (barring cheating such as network requests) wins. Program can either use real IO or simply define a function.

TND

Posted 2015-11-04T13:31:21.617

Reputation: 563

1Because there's nothing for the ? to attach to. Try typing /a(?bc)/ into the browser's console. – TND – 2015-11-04T15:16:13.403

3Looks good now. You might want to add something like (^a|b)(c$|d) as a test case though. – Martin Ender – 2015-11-04T16:16:36.567

Can we assume that the input will contain only printable ASCII characters? In particular, no linefeed characters? – Martin Ender – 2015-11-04T19:25:34.307

As a side note, I'm pretty sure those "cheating" programs are covered in the standard loopholes post, which is somewhere that I can't find right now because I'm dumb. You might want to link to that. – Fund Monica's Lawsuit – 2015-11-04T20:30:16.083

1Should we consider qualifiers applied on groups i.e. (a)?(b)+(b)+(a)?? – kennytm – 2015-11-04T20:57:45.147

1Your list of regex operation is missing (), which is used in your example. – n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳ – 2015-11-06T03:39:36.897

Answers

7

Retina, 136 114 110 bytes

Yo dawg, I heard you like regex...

^
;
(T`^$`$^`;.
(.*);(\[(\\.|[^]])*]|\\.|.)([*+?]|{\d+(,|,\d+)?})?
$2$4!$1;
^\(!
) 
^\)(.*)!(.+?) 
($2$1
;$|!
<empty>

Where <empty> represents an empty trailing line. Run the code from a single file with the -s flag.

...when you wanna reverse regex you should use regex. When you wanna use regex you should use a regex-based programming language.

This code assumes that the input contains neither ; nor ! nor spaces. While I agree that that's a pretty strong and potentially invalid assumption, you could replace those three in the code with any three unprintable characters (like null bytes, bell character, <DEL> you name it), and it wouldn't affect the code size or functionality at all.

I'll add an explanation when I'm done golfing.

Martin Ender

Posted 2015-11-04T13:31:21.617

Reputation: 184 808

3"I herd* you liek*" – lirtosiast – 2015-11-05T18:00:50.523

I think the code also assumes that the regex does not contain any raw new line character. – n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳ – 2015-11-06T03:03:48.180

@n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳ Oh, that is true, I was under the assumption that there will be no unprintable characters in the input. I will fix that once we get clarification from the OP. (If any characters can appear, there are still certain combinations that won't show up in what this challenge considers a valid regex, e.g. ]]], so either way, this answer won't need much modification.) – Martin Ender – 2015-11-06T07:34:32.567

done golfing after over a year? :P – Okx – 2017-02-03T16:40:53.810

2

JavaScript ES6, 574 bytes

I can probably remove a few var statements.

R=e=>{for(var s=0,c=[],h=/(\?|\+|\{\d*,*\d*\}|\*)(\?*)/,t=0;t<e.length;t++)switch(s){case 0:switch(e[t]){case"\\":t++,c.push("\\"+e[t]);break;case"[":j=t,s=1;break;case"(":k=t,s=2;break;default:var l=e.search(h,t);(l>=t+1||0>l)&&c.push(l==t+1?e[t]+e.slice(t,e.length).match(h)[0]:e[t])}break;case 1:"\\"==e[t]?t++:"]"==e[t]&&(c.push(e.slice(j,t+1)+(e.search(h,t)==t+1?e.slice(t,e.length).match(h)[0]:"")),s=0);break;case 2:"\\"==e[t]?t++:")"==e[t]&&(a=R(e.slice(k+1,t)),c.push("("+a+")"),s=0)}c.reverse(),r=c;var i=c.length-1;return"^"==c[i]&&(r[i]="$"),"$"==c[0]&&(r[0]="^"),r.join``}}

JS ES6, untested, 559 bytes

Will test at home.

R=e=>{for(s=0,c=[],h=/(\?|\+|\{\d*,*\d*\}|\*)(\?*)/,t=0;t<e.length;t++)switch(s){case 0:switch(e[t]){case"\\":t++,c.push`\\${e[t]}`;break;case"[":j=t,s=1;break;case"(":k=t,s=2;break;default:l=e.search(h,t);(l>=t+1||0>l)&&c.push(l==t+1?e[t]+e.slice(t,e.length).match(h)[0]:e[t])}break;case 1:"\\"==e[t]?t++:"]"==e[t]&&(c.push(e.slice(j,t+1)+(e.search(h,t)==t+1?e.slice(t,e.length).match(h)[0]:"")),s=0);break;case 2:"\\"==e[t]?t++:")"==e[t]&&(a=R(e.slice(k+1,t)),c.push`(${a})`,s=0)}c.reverse(),r=c;i=c.length-1;return"^"==c[i]&&(r[i]="$"),"$"==c[0]&&(r[0]="^"),r.join``}}

JavaScript ES5, ungolfed, 961 bytes

function revRegex(str){
 var mode = 0;
 var oS = [];
 var post = /(\?|\+|\{\d*,*\d*\}|\*)(\?*)/;
 for(var i=0;i<str.length;i++){
  switch(mode){
   case 0: switch(str[i]){
    case "\\": i++; oS.push("\\"+str[i]); break;
    case "[": j=i; mode = 1; break;
    case "(": k=i; mode = 2; break;
    default:
     var pLoc = str.search(post,i);
     if(pLoc>=i+1||pLoc<0){ // current is not pLoc
      if(pLoc==i+1){
       oS.push(str[i] + str.slice(i,str.length).match(post)[0]);
      } else {
       oS.push(str[i]);
      }
     }
   }; break;
   case 1: if(str[i]=="\\") i++; else if(str[i]=="]"){oS.push

(str.slice(j,i+1)+(str.search(post,i)==i+1?str.slice

(i,str.length).match(post)[0]:""));mode = 0}; break;
   case 2: if(str[i]=="\\") i++; else if(str[i]==")")

{a=revRegex(str.slice(k+1,i));oS.push("("+a+")");mode = 

0};break;
  }
 }
 oS.reverse();
 r=oS;
 var l=oS.length-1;
 if(oS[l]=="^") r[l]="$";
 if(oS[0]=="$") r[0]="^";
 return r.join("");
}

Conor O'Brien

Posted 2015-11-04T13:31:21.617

Reputation: 36 228

5lol, you used regex to reverse regex :D – user41805 – 2015-11-04T17:26:53.440

3@ΚριτικσιΛίθος Yes, I did :D I'd use it to parse HTML if I could... – Conor O'Brien – 2015-11-04T17:28:25.667

4"if you could" – Optimizer – 2015-11-04T19:12:11.923

@Optimizer Indeed!

– Conor O'Brien – 2015-11-04T19:13:17.570

Nice work! If you're still looking to golf this, I'm 99% sure that a few if(x==y)do_stuff(); statements will be at least 25% shorter than switch(x){case y:do_stuff();break;}. – ETHproductions – 2015-11-04T19:13:44.317

@ETHproductions Thanks :) I think so, too; I'll put in the untested answer, but, atm, I'm isolated from Firefox. – Conor O'Brien – 2015-11-04T19:15:26.107

1@CᴏɴᴏʀO'Bʀɪᴇɴ i parsed html codes with regex but got a serious problem with unicode chars – Abr001am – 2015-11-04T19:21:20.640

1@Agawa001 o__o Pro tip: this is the only valid way to parse HTML with Regex: code.replace(/<(.+)>.+?</\1>/,""); – Conor O'Brien – 2015-11-04T19:26:59.940

2@CᴏɴᴏʀO'Bʀɪᴇɴ This is an alternative: `code.replace(/.*/, "trollolol"); – user41805 – 2015-11-04T20:09:08.597

@CᴏɴᴏʀO'Bʀɪᴇɴ What if I write <a onclick="map(b => b.toUpperCase())">Upper Case</a>? Or heck, even just <a href="http://google.com/">Blah</a>? Your code doesn't strip those out! Anyway, time for me to go back to Hell. Regards, Satan. – Fund Monica's Lawsuit – 2015-11-04T20:35:53.907

The 574 bytes one gives R("a{3}b") -> "a{3}" – feersum – 2015-11-04T21:08:41.167

@feersum So does the other one... let me debug. – Conor O'Brien – 2015-11-04T21:12:24.930

Here is a test case. At least the ungolfed version seems to be dropping several tokens. I get this result. – Martin Ender – 2015-11-05T07:13:39.247

@MartinBüttner Thanks for the test case. I unfortunately cannot debug probably for another 6-7 hours. – Conor O'Brien – 2015-11-05T11:58:37.903

2

JavaScript ES6, 343 bytes

t=r=>(u="substr",T="(?:[+?*]|{\\d+(?:,\\d*)?})?)([^]*)",(c=r[0])=="(")?([n,s]=v(r[u](1)),[_,q,s]=s.match("^(\\)"+T),["("+n+q,s]):c==")"||c==null?["",r]:c=="^"?["^",r[u](1)]:c=="$"?["^",r[u](1)]:r.match("^("+/(?:\[(?:[^\]\\]|\\[^])*\]|[^[\\]|\\[^])/.source+T).slice(1);(v=r=>{var o="";do{[n,r]=t(r);o=n+o;}while(n&&r);return[o,r]})(prompt())[0]

Original code (the functions, but without prompt):

function reverse(regex) {
    var out = "";
    do {
        // console.log("calling term");
        var [node, regex] = term(regex);
        // console.log("reverse: " + [node, regex]);
        out = node + out;
    } while (node && regex);
    return [out, regex];
}

function term(regex) {
    switch (regex[0]) {
        case "(":
            // console.log("calling reverse");
            var [node, sequel] = reverse(regex.substr(1));
            // console.log("term: " + regex + " / " + [node, sequel]);
            var [_, quantifier, sequel] = sequel.match(/^(\)(?:[+?*]|{\d+(?:,\d*)?})?)([^]*)/);
            return ["(" + node + quantifier, sequel];
        case ")":
        case void 0:
            return ["", regex];
        case "^":
            return ["$", regex.substr(1)];
        case "$":
            return ["^", regex.substr(1)];
        default:
            return regex.match(/^((?:\[(?:[^\]\\]|\\[^])*\]|[^[\\]|\\[^])(?:[+?*]|{\d+(?:,\d+)?})?)([^]*)/).slice(1);
    }
}

reverse("^\\(([The(){}*\\] ]{2,3}world\\\\(begin(ner|ning)?|ends*)+|Con\\|ti\\*n\\)ue...[^%\\[\\]()\\\\])$")[0]

The code is implemented as a recursive top-down parser, so it may cause stack overflow on deeply nested input.

The code may cause infinite loop in invalid cases, since I don't test them, taking advantage of the "undefined behavior" clause.

n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳

Posted 2015-11-04T13:31:21.617

Reputation: 5 683

0

Python 3, 144 bytes

(This one does not support qualifiers on a group like (a)+(b)*(cde)?.)

import re;f=lambda x:''.join({e:e,'^':'$','$':'^','(':')',')':'('}[e]for e in re.findall(r'(?:\\.|\[(?:\\?.)+?\]|.)(?:[?+*]|\{.+?\})?',x)[::-1])

Test cases:

test_cases = [
    # Provided test cases
    r'abc',
    r'tuv?',
    r'a(b|c)',
    r'1[23]',
    r'a([bc]|cd)',
    r'^a[^bc]d$',
    r'x[yz]{1,2}',
    r'p{2}',
    r'q{7,}',
    r'\[c[de]',

    # Pathological cases
    r'a{3}b',
    r'(a)?(b)+',            # <-- currently failing!
    r'[\[]{5}[^\]]{6}',
    r'[\[]\]{7}',
    r'[\[\]]{8,9}',
    r'\(\)\^\$',

    # Undefined cases
    r'ab[c',
    r'a(?bc)',
    r'a[]]bc',
]

for t in test_cases:
    print(t, '->', f(t))

Result:

abc -> cba
tuv? -> v?ut
a(b|c) -> (c|b)a
1[23] -> [23]1
a([bc]|cd) -> (dc|[bc])a
^a[^bc]d$ -> ^d[^bc]a$
x[yz]{1,2} -> [yz]{1,2}x
p{2} -> p{2}
q{7,} -> q{7,}
\[c[de] -> [de]c\[
a{3}b -> ba{3}
(a)?(b)+ -> )+b))?a)                    # <-- note: wrong
[\[]{5}[^\]]{6} -> [^\]]{6}[\[]{5}
[\[]\]{7} -> \]{7}[\[]
[\[\]]{8,9} -> [\[\]]{8,9}
\(\)\^\$ -> \$\^\)\(
ab[c -> c[ba
a(?bc) -> (cb(?a
a[]]bc -> cb[]]a

kennytm

Posted 2015-11-04T13:31:21.617

Reputation: 6 847