
I am trying to write a regex that matches if both the words cat AND dog appear in the input, with multi-line support.

matches

cat asdjfaldsfj dog
####
does NOT match

cat adfasdf8989
####
matches

dog adlsjf88989 cat
####
matches

cat asdf8a89sdf8
a sdf asd f ads f ads fasdf
dog  a dsf ads fads f
asdfadsfadsf

The regex I'm using is pretty simple:

/^(?=.*\bcat\b)(?=.*\bdog\b).*$/gs

The problem is that this only finds the first occurrence, because .* is greedy. I want the following input to count as two matches, but it only matches once:

cat asdf8a89sdf8
a sdf asd f ads f ads fasdf
dog  a dsf ads fads f
asdfadsfadsf
cat asdf8a89sdf8
a sdf asd f ads f ads fasdf
dog  a dsf ads fads f
asdfadsfadsf

Even without the second cat STUFF dog STUFF block, the match still runs all the way to the end of the input.
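
A minimal sketch that reproduces this behaviour. It assumes Python's standard `re` module, since the question doesn't say which engine is used. `re.S` stands in for /s, and `finditer` plays the role of /g, because Python has no g flag. The names `TEXT` and `QUESTION_PATTERN` are illustrative only.

```python
import re

# Sample input from the question: two "cat ... dog" blocks.
TEXT = (
    "cat asdf8a89sdf8\n"
    "a sdf asd f ads f ads fasdf\n"
    "dog  a dsf ads fads f\n"
    "asdfadsfadsf\n"
    "cat asdf8a89sdf8\n"
    "a sdf asd f ads f ads fasdf\n"
    "dog  a dsf ads fads f\n"
    "asdfadsfadsf"
)

# The question's pattern, with re.S in place of /s.
QUESTION_PATTERN = re.compile(r"^(?=.*\bcat\b)(?=.*\bdog\b).*$", re.S)

matches = [m.group() for m in QUESTION_PATTERN.finditer(TEXT)]
print(len(matches))        # 1
print(matches[0] == TEXT)  # True: greedy .* with re.S ran to the end of the string
```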

SquidZ00

Some hints, but not a complete answer.

With /s, .* eats everything up to the end of the string. Switching to the non-greedy .*? matches a minimal string instead. The lookaheads are zero-width, so the text they check is never forced into the match. My usual strategy for handling that is to include anchors in the lookaheads, but multi-line matching makes this difficult.
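
To illustrate the zero-width point, here is a small sketch, again assuming Python's `re` module. Drop the trailing `$` and compare the lazy and greedy versions: both lookaheads succeed at position 0, but the lazy `.*?` then consumes nothing at all.

```python
import re

TEXT = "cat asdf8a89sdf8\ndog  a dsf ads fads f\n"

# Both lookaheads succeed at position 0, but lookaheads consume no characters.
lazy = re.search(r"^(?=.*\bcat\b)(?=.*\bdog\b).*?", TEXT, re.S)
greedy = re.search(r"^(?=.*\bcat\b)(?=.*\bdog\b).*", TEXT, re.S)

print(repr(lazy.group()))          # '' : the lazy .*? takes the minimal (empty) string
print(greedy.end() == len(TEXT))   # True: the greedy .* eats to the end of the string
```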

/m is required if you want to match multiple times within the same string and still use the ^ and $ anchors; otherwise they match only at the beginning and end of the string.
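
A quick sketch of how the anchors change, assuming Python's `re`, where `re.M` corresponds to /m. The toy input is made up for illustration.

```python
import re

TEXT = "cat one\ndog two\ncat three"

# Without re.M, ^ and $ anchor only at the start and end of the whole string.
starts_plain = re.findall(r"^\w+", TEXT)       # ['cat']
ends_plain = re.findall(r"\w+$", TEXT)         # ['three']

# With re.M, they also anchor just after / just before every newline.
starts_multi = re.findall(r"^\w+", TEXT, re.M)  # ['cat', 'dog', 'cat']
ends_multi = re.findall(r"\w+$", TEXT, re.M)    # ['one', 'two', 'three']

print(starts_plain, ends_plain)
print(starts_multi, ends_multi)
```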

Unless you really need a general-case solution, it's probably worth trying one that orders your subpatterns manually, e.g.:

(?smx)(?(DEFINE)  # g is not a valid inline option; get repeated matches from preg_match_all or the tool's g flag
  (?<a>\bcat\b)
  (?<b>\bdog\b)
)
^.*?(?:
      (?&a).*?(?&b)|  # cat before dog
      (?&b).*?(?&a)   # dog before cat
    )[^\n]*
$
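
Python's standard `re` module has no `(?(DEFINE)...)` or `(?&name)`. The sketch below ports the same pattern by inlining the two subpatterns. It is an assumed translation, not the answer's original engine. `CAT`, `DOG` and `BOTH` are illustrative names.

```python
import re

CAT = r"\bcat\b"
DOG = r"\bdog\b"

# Same structure as the PCRE pattern above; the DEFINE'd subpatterns are inlined
# because Python's re has no (?(DEFINE)...) or (?&name) subroutine calls.
BOTH = re.compile(
    rf"""
    ^.*?(?:
        {CAT}.*?{DOG}   # cat before dog
      | {DOG}.*?{CAT}   # dog before cat
    )[^\n]*
    $
    """,
    re.S | re.M | re.X,
)

TEXT = (
    "cat asdf8a89sdf8\n"
    "a sdf asd f ads f ads fasdf\n"
    "dog  a dsf ads fads f\n"
    "asdfadsfadsf\n"
    "cat asdf8a89sdf8\n"
    "a sdf asd f ads f ads fasdf\n"
    "dog  a dsf ads fads f\n"
    "asdfadsfadsf"
)

blocks = BOTH.findall(TEXT)
print(len(blocks))  # 2
```

Each match stops at the end of the line containing the second word, so the search restarts at the next line start. The second match therefore also picks up the leftover "asdfadsfadsf" line before the second cat. That fits the "hints, not a complete answer" caveat.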

There are some really interesting things you can do with recursive subpatterns and relative backreferences, but I wasn't able to structure them into a general case for N lookaheads without the number of matching steps skyrocketing into the 10k+ range.
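
One workaround outside the regex engine is to generate the manually ordered alternation in code, with one alternative per ordering of the words. This is a swapped-in technique, not the recursive-subpattern approach mentioned above. The sketch assumes Python's `re`, and `all_words_pattern` is a hypothetical helper. N words produce N! alternatives, so it is only practical for a handful of words.

```python
import itertools
import re


def all_words_pattern(words):
    """Hypothetical helper: match from a line start through the line holding the
    last required word, trying every ordering of the words."""
    subpatterns = [rf"\b{re.escape(w)}\b" for w in words]
    # One alternative per ordering: N words give N! alternatives.
    orderings = [".*?".join(p) for p in itertools.permutations(subpatterns)]
    return re.compile(r"^.*?(?:" + "|".join(orderings) + r")[^\n]*$", re.S | re.M)


pattern = all_words_pattern(["cat", "dog", "bird"])
print(pattern.search("bird x\ncat y\ndog z").group())
print(pattern.search("cat dog"))  # None: "bird" is missing
```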

Andrew Domaszek