What is the simplest/best `Regex` that will never ...
# announcements
m
What is the simplest/best `Regex` that will never find/match any input string? Seems that each regex engine out there has a different best practice for making an ‘impossible’ regex.
l
why tho
m
An obscure class needs to implement an interface and it never wants to match. Just like how a subclass might override a function to do nothing.
What I have so far: `"""\b\B""".toRegex()`
https://pl.kotl.in/743V8u8ok
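A minimal sketch of the use case described above, assuming a made-up interface. `InputFilter` and `NeverFilter` are hypothetical names for illustration only; the thread does not show the real interface.

```kotlin
// Hypothetical interface, invented for illustration; not from the original thread.
interface InputFilter {
    val pattern: Regex
    fun accepts(input: String): Boolean = pattern.containsMatchIn(input)
}

// An implementation that should never match, analogous to a no-op override.
object NeverFilter : InputFilter {
    // \b requires a word boundary and \B requires a non-boundary at the same
    // position, so the two assertions cannot both hold.
    override val pattern: Regex = """\b\B""".toRegex()
}

fun main() {
    val samples = listOf("", "a", "hello world", "line1\nline2", " ")
    for (s in samples) {
        println("${s.length} chars -> accepted=${NeverFilter.accepts(s)}")
    }
}
```

This only spot-checks a handful of inputs; it is not a proof that the pattern is unmatchable.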
e
I'm partial to `"""\Z.""".toRegex()` (a.k.a. `"$.".toRegex()`, as long as there isn't `options = setOf(RegexOption.MULTILINE, RegexOption.DOT_MATCHES_ALL)`)
you could of course do the opposite `""".\A""".toRegex()` (or `".^".toRegex()`, provided there aren't those regex options)
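A small sketch of why the options caveat above matters, assuming Kotlin/JVM. The input string and the names `plain` and `withOptions` are made up for illustration; `$` is written as `\$` here only to keep it clearly out of string-template syntax.

```kotlin
val input = "a\nb"

// Default options: `$` only anchors at the end of the input (or just before a
// final line terminator), and `.` does not match line terminators.
val plain = "\$.".toRegex()

// With both options set, `$` also anchors before the inner newline and `.`
// is allowed to consume that newline.
val withOptions = "\$.".toRegex(
    setOf(RegexOption.MULTILINE, RegexOption.DOT_MATCHES_ALL)
)

fun main() {
    println("default options: ${plain.containsMatchIn(input)}")
    println("MULTILINE + DOT_MATCHES_ALL: ${withOptions.containsMatchIn(input)}")
}
```

So a pattern like `$.` is only "impossible" as long as nobody constructs it with those options.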
m
It turns out `$.` and `.^` are not efficient. So better to use `^.^` which is just 3 steps regardless of the length of input text. Testing using https://regex101.com/
e
it's in multiline mode by default, take the `m` off the modifiers list
I'm also pretty sure Java's regex engine is smarter than that
m
I can’t get `^.^` to match anything regardless of the multiline flag
e
well yes, you need the /s flag to get . to match a newline character as well
m
Hmm, but I don’t want it to match a newline character as well. If I use `^.^` it seems to do what I want and is performant
e
- no, you don't want it to match a newline character.
- I gave a warning on ^$ etc. due to options changing their behavior, I almost always prefer to use \A\Z instead because they always anchor to start and end, not lines, regardless of option.
- the performance characteristics on regex101 are a lie.
if you're targeting Java specifically, then a zero-width negative lookahead is another pattern that never matches: `"(?!)".toRegex()`
it just won't work in Javascript, if you're using Kotlin/JS, though, whereas "$." etc. will
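A short sketch of the anchor point made above, assuming Kotlin/JVM: under `MULTILINE`, `^` turns into a line anchor while `\A` keeps anchoring to the start of the input, and the empty negative lookahead fails at every position. The sample input and the names `lineAnchored`, `inputAnchored` and `emptyLookahead` are illustrative only.

```kotlin
val multiline = setOf(RegexOption.MULTILINE)
val input = "a\nb"

// `^` becomes a line anchor under MULTILINE, so it can match before "b".
val lineAnchored = "^b".toRegex(multiline)

// `\A` still means "start of input" under MULTILINE, and the input starts with "a".
val inputAnchored = """\Ab""".toRegex(multiline)

// Empty negative lookahead: "the empty pattern must not match here",
// which cannot be satisfied at any position.
val emptyLookahead = "(?!)".toRegex()

fun main() {
    println("^b with MULTILINE:  ${lineAnchored.containsMatchIn(input)}")
    println("\\Ab with MULTILINE: ${inputAnchored.containsMatchIn(input)}")
    println("(?!):               ${emptyLookahead.containsMatchIn(input)}")
}
```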
m
Ok, I’m with you now.
\A.\A
From what I can see, that will always fail regardless of flags
e
or `\Z.\A`, or just `\Z.` or `.\A` like I suggested at first
m
Seems `.\A` is much slower https://pl.kotl.in/bHniyg34T
`\Z.` is also much slower, but not nearly as slow as `.\A`
I can’t see any benefit of using `\Z.\A` over `\Z.`
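A rough timing sketch for comparing the candidates on the JVM itself rather than on regex101, in the spirit of the playground link above (whose actual code is not shown here). `timeFind` is a hypothetical helper, and the input size and repeat counts are arbitrary assumptions. Numbers from a loop like this are noisy and depend on JIT warm-up, so treat them as a rough ordering at best.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical helper: total nanoseconds for `repeats` failed searches.
fun timeFind(regex: Regex, input: String, repeats: Int = 100): Long =
    measureNanoTime { repeat(repeats) { regex.containsMatchIn(input) } }

fun main() {
    val input = "x".repeat(100_000)
    val candidates = listOf("""\A.\A""", """\Z.""", """.\A""", """\Z.\A""")
    for (p in candidates) {
        val regex = p.toRegex()
        timeFind(regex, input, repeats = 5) // crude warm-up, result discarded
        val nanos = timeFind(regex, input)
        println("$p: ${nanos / 1_000} µs for 100 searches, matched=${regex.containsMatchIn(input)}")
    }
}
```

For anything beyond a quick comparison, a proper benchmark harness such as JMH would be the safer choice.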
v
I usually use `[^\w\W]` which tries to match one character that is neither a word character nor a non-word character. You can probably speed it up by anchoring at the beginning like `\A[^\w\W]`. If you have possessive quantifiers you can probably also do `.*+.` which should be quite efficient.
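A quick spot-check of the three suggestions above, assuming Kotlin/JVM with default options. The sample strings and the helper `anyMatch` are made up for illustration, and a handful of samples is of course not a proof.

```kotlin
val candidates = listOf("""[^\w\W]""", """\A[^\w\W]""", """.*+.""")

val samples = listOf("", "a", "_", "hello world", "tab\tand\nnewline", "ünïcödé", "123 !?")

// Hypothetical helper: true if the pattern finds a match in any sample.
fun anyMatch(pattern: String): Boolean {
    val regex = pattern.toRegex()
    return samples.any { regex.containsMatchIn(it) }
}

fun main() {
    for (p in candidates) {
        println("$p matches some sample: ${anyMatch(p)}")
    }
}
```

The possessive `.*+` takes every character `.` can match and never gives any back, so the trailing `.` is left with either a line terminator or the end of the input.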
m
@Vampire pretty good, `\A[^\w\W]` is just two steps, so very efficient. https://regex101.com/r/AKtYle/1 https://pl.kotl.in/hdbNH73It
Another advantage over `\A.\A` is you don’t need to use `@Suppress("RegExpUnexpectedAnchor")`