Duct Tape a Regex Decider

11

Your task is to create a program that determines whether a given string is a valid regular expression or not using code snippets sourced from sites on the StackExchange network.

For the purposes of this challenge, the regular expression dialect will be a stripped down and mostly minimal set of meta-characters: ()*?|\. As such, you will not be able to use built-in regex parsers.

  • \ is used to escape meta-characters. It must be followed by a meta-character.
  • Unescaped parentheses must be balanced
  • * and ? must be preceded by either a non-meta-character, a parenthesized group, or an escaped meta-character.
  • All other printable ASCII characters plus newline, tab, and space must be supported as non-meta characters. What happens with a string containing other characters is undefined.
  • The actual meaning of the regex is not important for this challenge.

Examples

Truthy:
  abc
  a?
  (a|)*
  ()
  a|b*
  \*
  \\
  \\*
  a*b?(cd|e)
  +
  [
  }
  (123\))*
  \|
  (a(b(c|d)*e)*f)*
  (|\)*)
  (abc)+*
  (abc)+
  +abc

^ last test case is an actual newline

Falsy:
  ?abc
  *
  **
  \
  (
  a*?
  a?*
  ?
  a)
  (\)
  (|\)*
  \()
  |*
  (?:abc)
  \\**
  \n

Scoring

Your overall score is the number of snippets taken from questions and answers around StackExchange.

  • Repeated snippets count for as many times as they are used.
  • Whitespace can be added and removed freely (because of Python, Haskell, and other whitespace-sensitive languages) and does not count toward your snippet count.
    • The exception would be if your code is actually written in Whitespace.
  • Snippets are allowed from any StackExchange site as long as they come from questions, answers, and comments that are older (Including by edit time - use older revisions if necessary) than this challenge. (Sep 24, 2019 @ 3:30 PM UTC)
  • Snippets can come from anywhere in a question, answer, or comment body, whether it's in a preformatted code block or not.
  • Splicing a snippet into the middle of another causes the outer snippet to count as two snippets

Lowest score wins!

Beefster

Posted 2019-09-24T15:41:17.387

Reputation: 6 651

1@RobinRyder yes, changed – Beefster – 2019-09-24T20:26:43.887

Can the post be older than or equal to this challenge, i.e. can we use snippets from the body of this challenge? – Jo King – 2019-09-24T22:39:47.530

1"As such, you will not be able to use built-in regex parsers" Is that to say that its designed to thwart using that for a simple ya/nay, or that we are forbidden from using regex at all in our answers? – user0721090601 – 2019-09-24T23:24:53.230

@guifa it's designed so that you can't just take your language's regex engine and see if it compiles the given regex. Every language I know of supports a larger set of meta-characters and specialized capture groups, so they wouldn't match this set of characters correctly in all cases. – Beefster – 2019-09-25T14:05:28.890

@JoKing, I'm going to say no on that. – Beefster – 2019-09-25T14:06:37.880

Does it have to be more than one snippet? Is the program allowed to have output? – S.S. Anne – 2019-09-25T20:56:51.643

What if we cut out a part from the middle? – S.S. Anne – 2019-09-25T20:58:24.690

1@JL2210 That would make it two snippets: one for the beginning and one for the end. You can use a single snippet as long as it passes all the test cases and comes from an answer/question/post that's older than this challenge – Beefster – 2019-09-25T21:03:42.237

Unused, empty input? – S.S. Anne – 2019-09-25T21:07:12.070

Must the snippet be in the same language as the submission? – boboquack – 2019-09-26T02:05:47.567

@boboquack very yes. It doesn't even need to be a programming language. – Beefster – 2019-09-26T04:33:58.713

Answers

6

Perl 6, 20 snippets

{$_ eq m/[[<-[()*?|\\]>|\\<[()*?|\\]>|'(' <~~>* ')']<[*?]>?|\|]+/}

Try it online!

The snippets are taken from:

{$_ eq, m/[, <-[, ()*?, |\\, ]>, |\\, <[, ()*?, |\\, ]>, |, '(' <~~>* ')', <[, *?, ]>, ?|, \|, ]+/, }.

This is mostly the greedy approach (made obvious by all the one or two character snippets). I used SymbolHound to search for the individual characters, and the only real optimisation was the '(' <~~>* ')' snippet, which is taken from my own answer on recursive Perl 6 regexes.

Explanation:

This basically checks if the input is equal to a greedy match of a valid regex. The reason we can't just use the regex itself and add ^$ to mark the ends is because we are using a recursive regex, which wouldn't work if there were ^$ markers. The regex itself is:

m/[                             ]+/   # Match one or more times
   [              ]  # Any of 
    <-[()*?|\\]> |     # Not a metacharacter
    \\<[()*?|\\]>      # A metacharacter preceded by a \
    '(' <~~>* ')'      # Brackets surrounding a valid regex
                   <[*?]>?  # Optionally followed by a ? or *
                           | \|    # Or just the | metacharacter

Jo King

Posted 2019-09-24T15:41:17.387

Reputation: 38 234

TIL ~~, thanks! – user0721090601 – 2019-09-25T04:54:29.593

@guifa Yes, I learnt that through the P6 specification, which has a lot of things that haven't been properly documented yet. I suspect ~~ doesn't appear because it is not yet fully implemented (for example <~~0>), though there are other hidden gems in there.

– Jo King – 2019-09-25T06:44:58.467