ECMAScript Regex, 733+ 690+ 158 119 118 (117) bytes
My interest in regex has been sparked with renewed vigor after over 4½ years of inactivity. As such, I went in search of more natural number sets and functions to match with unary ECMAScript regexes, resumed improving my regex engine, and started brushing up on PCRE as well.
I'm fascinated by the alienness of constructing mathematical functions in ECMAScript regex. Problems must be approached from an entirely different perspective, and until the arrival of a key insight, it's unknown whether they're solvable at all. It forces casting a much wider net in finding which mathematical properties might be able to be used to make a particular problem solvable.
Matching factorial numbers was a problem I didn't even consider tackling in 2014 – or if I did, only momentarily, dismissing it as too unlikely to be possible. But last month, I realized that it could be done.
As with my other ECMA regex posts, I'll give a warning: I highly recommend learning how to solve unary mathematical problems in ECMAScript regex. It's been a fascinating journey for me, and I don't want to spoil it for anybody who might potentially want to try it themselves, especially those with an interest in number theory. See this earlier post for a list of consecutively spoiler-tagged recommended problems to solve one by one.
So do not read any further if you don't want some advanced unary regex magic spoiled for you. If you do want to take a shot at figuring out this magic yourself, I highly recommend starting by solving some problems in ECMAScript regex as outlined in that post linked above.
This was my idea:
The problem with matching this number set, as with most others, is that in ECMA it's usually not possible to keep track of two changing numbers in a loop. Sometimes they can be multiplexed (e.g. powers of the same base can be added together unambiguously), but it depends on their properties. So I couldn't just start with the input number, and divide it by an incrementally increasing dividend until reaching 1 (or so I thought, at least).
Then I did some research on the multiplicities of prime factors in factorial numbers, and learned that there's a formula for this – and it's one that I could probably implement in an ECMA regex!
After stewing on it for a while, and constructing some other regexes in the meantime, I took up the task of writing the factorial regex. It took a number of hours, but ended up working nicely. As an added bonus, the algorithm could return the inverse factorial as a match. There was no avoiding it, even; by the very nature of how it must be implemented in ECMA, it's necessary to take a guess as to what the inverse factorial is before doing anything else.
The downside was that this algorithm made for a very long regex... but I was pleased that it ended up requiring a technique used in my 651 byte multiplication regex (the one that ended up being obsolete, because a different method made for a 50 byte regex). I had been hoping a problem would pop up that required this trick: Operating on two numbers, which are both powers of the same base, in a loop, by adding them together unambiguously and separating them at each iteration.
But because of the difficulty and length of this algorithm, I used molecular lookaheads (of the form (?*...)
) to implement it. That is a feature not in ECMAScript or any other mainstream regex engine, but one that I had implemented in my engine. Without any captures inside a molecular lookahead, it's functionally equivalent to an atomic lookahead, but with captures it can be very powerful. The engine will backtrack into the lookahead, and this can be used to conjecture a value which cycles through all possibilities (for later testing) without consuming characters of the input. Using them can make for a much cleaner implementation. (Variable-length lookbehind is at the very least equal in power to molecular lookahead, but the latter tends to make for more straightforward and elegant implementations.)
So the 733 and 690 byte lengths do not actually represent ECMAScript-compatible incarnations of the solution – hence the "+" after them; it's surely possible to port that algorithm to pure ECMAScript (which would increase its length quite a bit) but I didn't get around to it... because I thought of a much simpler and more compact algorithm! One that could easily be implemented without molecular lookaheads. It's also significantly faster.
This new one, like the previous, must take a guess at the inverse factorial, cycling through all possibilities and testing them for a match. It divides N by 2 to make room for the work it needs to do, and then seeds a loop in which it will repeatedly divide the input by a divisor that starts at 3 and increments each time. (As such, 1! and 2! can't be matched by the main algorithm, and must be dealt with separately.) The divisor is kept track of by adding it to the running quotient; these two numbers can be unambiguously separated because, assuming M! == N, the running quotient will continue to be divisible by M until it equals M.
This regex does division-by-a-variable in the innermost portion of the loop. The division algorithm is the same as in my other regexes (and similar to the multiplication algorithm): for A≤B, A*B=C if any only if C%A=0 and B is the largest number which satisfies B≤C and C%B=0 and (C-B-(A-1))%(B-1)=0, where C is the dividend, A is the divisor, and B is the quotient. (A similar algorithm can be used for the case that A≥B, and if it is not known how A compares to B, one extra divisibility test is all that is needed.)
So I love that the problem was able to be reduced to even less complexity than my golf-optimized Fibonacci regex, but I do sigh with disappointment that my multiplexing-powers-of-the-same-base technique will have to wait for another problem that actually requires it, because this one doesn't. It's the story of my 651 byte multiplication algorithm being supplanted by a 50 byte one, all over again!
Edit: I was able to drop 1 byte (119 → 118) using a trick found by Grimy that can futher shorten division in the case that the quotient is guaranteed to be greater than or equal to the divisor.
With no further ado, here's the regex:
True/false version (118 bytes):
^((x*)x*)(?=\1$)(?=(xxx\2)+$)((?=\2\3*(x(?!\3)xx(x*)))\6(?=\5+$)(?=((x*)(?=\5(\8*$))x)\7*$)x\9(?=x\6\3+$))*\2\3$|^xx?$
Try it online!
Return inverse factorial or no-match (124 bytes):
^(?=((x*)x*)(?=\1$)(?=(xxx\2)+$)((?=\2\3*(x(?!\3)xx(x*)))\6(?=\5+$)(?=((x*)(?=\5(\8*$))x)\7*$)x\9(?=x\6\3+$))*\2\3$)\3|^xx?$
Try it online!
Return inverse factorial or no-match, in ECMAScript + \K
(120 bytes):
^((x*)x*)(?=\1$)(?=(xxx\2)+$)((?=\2\3*(x(?!\3)xx(x*)))\6(?=\5+$)(?=((x*)(?=\5(\8*$))x)\7*$)x\9(?=x\6\3+$))*\2\K\3$|^xx?$
And the free-spaced version with comments:
^
(?= # Remove this lookahead and the \3 following it, while
# preserving its contents unchanged, to get a 119 byte
# regex that only returns match / no-match.
((x*)x*)(?=\1$) # Assert that tail is even; \1 = tail / 2;
# \2 = (conjectured N for which tail == N!)-3; tail = \1
(?=(xxx\2)+$) # \3 = \2+3 == N; Assert that tail is divisible by \3
# The loop is seeded: X = \1; I = 3; tail = X + I-3
(
(?=\2\3*(x(?!\3)xx(x*))) # \5 = I; \6 = I-3; Assert that \5 <= \3
\6 # tail = X
(?=\5+$) # Assert that tail is divisible by \5
(?=
( # \7 = tail / \5
(x*) # \8 = \7-1
(?=\5(\8*$)) # \9 = tool for making tail = \5\8
x
)
\7*$
)
x\9 # Prepare the next iteration of the loop: X = \7; I += 1;
# tail = X + I-3
(?=x\6\3+$) # Assert that \7 is divisible by \3
)*
\2\3$
)
\3 # Return N, the inverse factorial, as a match
|
^xx?$ # Match 1 and 2, which the main algorithm can't handle
The full history of my golf optimizations of these regexes is on github:
regex for matching factorial numbers - multiplicity-comparing method, with molecular lookahead.txt
regex for matching factorial numbers.txt (the one shown above)
Note that ((x*)x*)
can be changed to ((x*)+)
, dropping the size by 1 byte (to 117 bytes) with no loss of correct functionality – but the regex exponentially explodes in slowness. However, this trick, while it works in PCRE and .NET, does not work in ECMAScript, due to its behavior when encountering a zero-length match in a loop. ((x+)+)
would work in ECMAScript, but this would break the regex, because for \$n=3!\$, \2
needs to capture a value of \$3-3=0\$ (and changing the regex to be 1-indexed would undo the golf benefit of this).
The .NET regex engine does not emulate this behavior in its ECMAScript mode, and thus the 117 byte regex works:
Try it online! (exponential-slowdown version, with .NET regex engine + ECMAScript emulation)
2If the language supports only numbers in the range {0,1}, can I expect the input to always be
1
? – eush77 – 2017-05-20T09:45:53.26311
@eush77 Abusing native number types to trivialize a problem is forbidden by default.
– Dennis – 2017-05-20T17:56:40.6231is 4! a truthy? – tuskiomi – 2017-05-20T22:04:38.523
Question: Why aren't you using the I/O defaults? – CalculatorFeline – 2017-05-22T23:15:58.563