18
2
The Problem
I have a bunch of regular expressions that I need to use in some code, but I'm using a programming language that doesn't support regex! Luckily, I know that the test string will have a maximum length and will be composed of printable ASCII only.
The Challenge
You must input a regex and a number n
, and output every string composed of printable ASCII (ASCII codes 32 to 126 inclusive, to
~
, no tabs or newlines) of length less than or equal to n
that matches that regex. You may not use built-in regular expressions or regex matching functions in your code at all. Regular expressions will be limited to the following:
- Literal characters (and escapes, which force a character to be literal, so
\.
is a literal.
,\n
is a literaln
(equivalent to justn
), and\w
is equivalent tow
. You do not need to support escape sequences.) .
- wildcard (any character)- Character classes,
[abc]
means "a or b or c" and[d-f]
means anything from d to f (so, d or e or f). The only characters that have special meaning in a character class are[
and]
(which will always be escaped, so don't worry about those),\
(the escape character, of course),^
at the beginning of the character class (which is a negation), and-
(which is a range). |
- the OR operator, alternation.foo|bar
means eitherfoo
orbar
, and(ab|cd)e
matches eitherabe
orcde
.*
- match the previous token repeated zero or more times, greedy (it tries to repeat as many times as possible)+
- repeated one or more times, greedy?
- zero or one times- Grouping with parentheses, to group tokens for
|
,*
.+
, or?
The input regex will always be valid (i.e., you do not have to handle input like ?abc
or (foo
or any invalid input). You may output the strings in any order you would like, but each string must appear only once (don't output any duplicates).
The Test Cases
Input: .*
, 1
Output: (empty string), ,
!
, "
, ..., }
, ~
Input: w\w+
, 3
Output: ww
, www
Input: [abx-z][^ -}][\\]
, 3
Output: a~\
, b~\
, x~\
, y~\
, z~\
Input: ab*a|c[de]*
, 3
Output: c
, cd
, ce
, aa
, cde
, ced
, cdd
, cee
, aba
Input: (foo)+(bar)?!?
, 6
Output: foo
, foo!
, foofoo
, foobar
Input: (a+|b*c)d
, 4
Output: ad
, cd
, aad
, bcd
, aaad
, bbcd
Input: p+cg
, 4
Output: pcg
, ppcg
Input: a{3}
, 4
Output: a{3}
The Winner
This is code-golf, so the shortest code in bytes will win!
Anybody been able to do this with R? – Kaleb – 2017-02-08T10:29:58.803
Are we allowed to support escape sequences? Then this is trivial. – John Dvorak – 2014-04-05T13:39:37.427
@JanDvorak No, you are not, as specified in the challenge and as demonstrated in one of the test cases. – Doorknob – 2014-04-05T13:40:16.863
@JanDvorak Actually, I thought I had a rule against built-in regex functions, but apparently not. Edited it in now – Doorknob – 2014-04-05T13:42:17.517
3Your explanation of
|
makes very little sense. It doesn't seem to handle nested groups ora|b|c
. What's wrong with using the standard explanations in terms of how strongly concatenation and alternation bind? (And you have no excuse for not using the sandbox) – Peter Taylor – 2014-04-05T13:44:17.270@PeterTaylor Because I have no idea what those standard explanations are? ;) You could edit my post with the "standard" terms, or just link me to a resource on this so I can do it myself. – Doorknob – 2014-04-05T13:45:34.120
1
@PeterTaylor Actually, he has an excuse: http://meta.codegolf.stackexchange.com/q/1305/9498
– Justin – 2014-04-05T15:15:38.867http://www.regular-expressions.info/quickstart.html – Peter Taylor – 2014-04-05T16:01:02.213
@PeterTaylor Ok, edited to just say "alternation" – Doorknob – 2014-04-05T16:15:26.180
I like this Regex Debugger
– F. Hauri – 2014-04-05T17:10:45.2532Judging by your examplesthe pattern has to match the entire string? (As opposed to a substring) – Martin Ender – 2014-04-05T21:01:09.970
@m.buettner Yes, it does. – Doorknob – 2014-04-06T17:39:03.367
@Doorknob You should probably put the {\d,\d} quantifier in the The Challenge section if you're going to have it in the The Test Cases section – Mouq – 2014-04-07T00:00:31.337
@Mouq Why is that? It's a test case, not a rule. – Doorknob – 2014-04-07T00:22:42.430
@Doorknob It's contradictory if your rules don't say you need it but the test cases say you do – Mouq – 2014-04-07T00:36:18.533
@Mouq "need it"? Need what? I suppose you could say it falls under the first category ("Literal characters"), but I don't see the need to clarify. – Doorknob – 2014-04-07T00:37:47.840
I wasn't trying to nit-pick; I just don't see how "Literal characters" covers "a{3}" – Mouq – 2014-04-07T00:59:42.117
@Mouq Umm...
a
is a literal character,{
is a literal character, .... What would you suggest I add to the post? Sorry, I don't see your point. – Doorknob – 2014-04-07T01:01:23.627Hmm. Puzzles like this make me think I ought to learn regular expressions. – Kyle Kanos – 2014-04-07T13:28:35.353
3
@KyleKanos It's a shame real world problems don't make you think you ought to learn regular expressions. :P But they are not as inaccessible as they might seem: http://www.regular-expressions.info/tutorial.html
– Martin Ender – 2014-04-07T19:34:26.730Noticed a wee typo in the rules, going over them to check my entry: '...of length less than n'. You meant less than or equal to, going by the examples, and both entries work that way. – bazzargh – 2014-04-13T01:24:41.153
@bazzargh Thanks; fixed – Doorknob – 2014-04-13T01:25:57.657
What about
abcd
,2
? Do we have to handle cases like that? – Justin – 2014-04-13T02:32:41.030How come
ab*a|c[de]*
matches the empty string? Is this a bug in the test cases? – John Dvorak – 2014-11-20T09:45:24.407Are we required to only output each matched string once? May we output
["aa","a","aa"]
foraa|a+
? – John Dvorak – 2014-11-20T09:48:28.330@JanDvorak Yep, I'm assuming that empty string was a mistake. There cannot be duplicate outputs; edited. – Doorknob – 2014-11-20T22:56:53.487