All of these regexes use their engines' respective case insensitivity flags, so that has not been counted towards the byte counts. Even though some use \pL
(a shorthand for \p{L}
) instead of [A-Z]
, they still need the flag, due to comparing characters via backreference.
Regex (Perl 5 / PCRE / Boost / Python 3), 21, 22, 24, or 25 bytes
Quite simply, this regex works by asserting that there are at least 26 alphabetical characters in the string that do not match with any character to the right of themselves. Each of these characters will be the last occurrence of a given letter of the alphabet in the string; as such, they are all guaranteed to be different letters, and asserting that there are 26 of them implies that the full set A-Z is covered.
(.*(\pL)(?!.*\2)){26}
(21 bytes) - Try it online! (Perl 5)
Using \pL
instead of [A-Z]
to match alphabetical characters imposes the requirement that the input must be in ASCII, because \pL
matches all Unicode alphabetical characters (including those that are extended ASCII in common codepages); so an accented letter, for example, would count towards consuming the target of 26 loop iterations. Some other regex engines only support this syntax in the form of \p{L}
(which offers no advantage over [A-Z]
for this particular problem, being equal in length) and not with the shortened syntax of \pL
.
This bare-bones version of the regex is extremely slow, due to an excessively huge amount of backtracking both for pangrams and non-pangrams, but will always give the correct result when given enough time. Its search for the 26 matches proceeds in the most pessimal way possible.
Using a lazy quantifier speeds it up greatly when matching pangrams, but it's still incredibly slow to yield non-matches when given non-pangrams:
(.*?(\pL)(?!.*\2)){26}
(22 bytes)
Try it online! (Perl 5)
Try it online! (PCRE2 / C++, with backtrack limit adjusted)
Try it online! (PCRE2 / PHP, with backtrack limit adjusted)
Changing the main loop to an atomic group allows the regex to also yield non-matches at a reasonable speed:
(?>.*?(\pL)(?!.*\1)){26}
(24 bytes) - Try it on regex101 (PCRE1)
Adding an anchor is not necessary to make the regex yield correct results, but speeds up non-matches further, by preventing the regex engine from continuing to try for a match at every character of the string (because if it fails to match at the beginning, we know it's guaranteed not to match anywhere in the same string, but the regex engine can't know that):
Try it online! (Python 3) (24 bytes) - anchor is implied
^(?>.*?(\pL)(?!.*\1)){26}
(25 bytes)
Try it on regex101 (PCRE1)
Try it online! (Perl 5)
Try it online! (PCRE2 / C++)
Try it online! (PCRE2 / PHP)
Try it online! (Boost / C++)
Regex (.NET / Java / Ruby), 23, 24, 26, or 27 bytes
These regex engines don't support \pL
(they support \p{L}
, but that isn't useful when we already need the case insensitivity flag anyway), thus [A-Z]
is used:
(.*([A-Z])(?!.*\2)){26}
(23 bytes)
(.*?([A-Z])(?!.*\2)){26}
(24 bytes)
(?>.*?([A-Z])(?!.*\1)){26}
(26 bytes)
^(?>.*?([A-Z])(?!.*\1)){26}
(27 bytes)
^ # 1. Anchor to the start of the string (without this, the regex would still
# work but would be slower in its non-matches, due to trying to match at
# every character position in the string, which we know will fail, but
# the regex engine can't know)
(?> # 2. Start an atomic group (every complete iteration of matching this group
# is set in stone, and will not be backtracked); without only a normal
# group, the regex would still work, but with most non-pangram inputs,
# would take longer than the age of the universe to yield a non-match,
# due to trying every way of matching ".*?" at every iteration of the loop
.*? # 3. Skip zero or more characters, as few as possible in order to make the
# following match
([A-Z]) # 4. Capture and consume an alphabetical character in \1
(?! # 5. Negative lookahead: Match outside (with zero-width) only if the inside
# does not match
.* # 6. Skip forward by zero characters or more in an attempt to make the
# following expresison match
\1 # 7. Match the character captured in \1
) # 8. The effect of this negative lookahead is to assert that at no point
# right of where \1 was captured does any character match \1
){26} # 9. Only match if this group can be matched exactly 26 times in a row,
# i.e. if we can find 26 characters in the range A-Z that don't match
# any character right of themselves, i.e. that we can find the last
# occurrence of 26 different alphabetical character in the string.
Try it online! (.NET / C#)
Try it online! (Java)
Try it online! (Ruby)
Regex (ECMAScript), 23, 24, 32, or 33 bytes
The same progression of speed applies:
(.*([A-Z])(?!.*\2)){26}
(23 bytes) - Try it online!
(.*?([A-Z])(?!.*\2)){26}
(24 bytes) - Try it online!
ECMAScript lacks atomic groups, so they must be emulated using lookahead+capture+backref:
((?=(.*?([A-Z])(?!.*\3)))\2){26}
(32 bytes) - Try it online!
^((?=(.*?([A-Z])(?!.*\3)))\2){26}
(33 bytes) - Try it online!
Edit: Silly me, I assumed that this problem required variable-length lookbehind or other tricks that substitute for it, without even trying to do it without that. Thanks to @Neil for pointing this out.
3Plus points if your code can check if input is a Pungram. – timmyRS – 2016-05-17T16:43:08.570
6Question name request: Did the quick brown fox jump over the lazy dog? – None – 2016-08-04T11:59:32.633