What is the simplest/best `Regex` that will never find/match | kotlinlang #announcements

Mark

05/04/2021, 3:10 AM

What is the simplest/best `Regex` that will never find/match any input string? It seems that each regex engine out there has a different best practice for making an ‘impossible’ regex.

LeoColman

05/04/2021, 3:12 AM

why tho

Mark

05/04/2021, 3:13 AM

An obscure class needs to implement an interface and it never wants to match. Just like how a subclass might override a function to do nothing.

Mark

05/04/2021, 3:18 AM

What I have so far:

"""\b\B""".toRegex()

https://pl.kotl.in/743V8u8ok
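A minimal sketch of the use case described above, assuming a hypothetical `LineFilter` interface (the thread never shows the real one; every name here is invented for illustration). The implementation satisfies the interface but supplies the `\b\B` pattern so that it never accepts anything:

```kotlin
// Hypothetical interface, invented for illustration; not from the thread.
interface LineFilter {
    val pattern: Regex
    fun accepts(line: String): Boolean = pattern.containsMatchIn(line)
}

// An implementation that has to provide a pattern but should never match.
// \b\B asks for a position that is both a word boundary and not a word
// boundary, which cannot exist.
object NeverFilter : LineFilter {
    override val pattern: Regex = """\b\B""".toRegex()
}

fun main() {
    println(NeverFilter.accepts("hello world")) // false
    println(NeverFilter.accepts(""))            // false
}
```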

ephemient

05/04/2021, 4:20 AM

I'm partial to `"""\Z.""".toRegex()` (a.k.a. `"$.".toRegex()`, as long as there isn't `options = setOf(RegexOption.MULTILINE, RegexOption.DOT_MATCHES_ALL)`)

ephemient

05/04/2021, 4:21 AM

you could of course do the opposite: `""".\A""".toRegex()` (or `".^".toRegex()`, provided there aren't those regex options)
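The caveat about options can be checked with a small sketch like the one below, assuming Kotlin/JVM. The sample input `"a\nb"` and the helper name `matchesSample` are made up for illustration. With default options neither pattern finds anything, but with both `MULTILINE` and `DOT_MATCHES_ALL` set, `$` and `^` anchor at line breaks and `.` is allowed to consume the newline between them:

```kotlin
val bothOptions = setOf(RegexOption.MULTILINE, RegexOption.DOT_MATCHES_ALL)
val sample = "a\nb"

// Illustrative helper: does the pattern find anything in the sample input?
fun matchesSample(pattern: String, options: Set<RegexOption> = emptySet()): Boolean =
    pattern.toRegex(options).containsMatchIn(sample)

fun main() {
    // Default options: nothing can follow the end, nothing can precede the start.
    println(matchesSample("$."))              // false
    println(matchesSample(".^"))              // false

    // Both options set: the newline sits right next to a line anchor.
    println(matchesSample("$.", bothOptions)) // true
    println(matchesSample(".^", bothOptions)) // true
}
```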

Mark

05/04/2021, 5:40 AM

Yeah, that looks good. How about `^\b$`?

Mark

05/04/2021, 5:42 AM

https://pl.kotl.in/STNqGwo1r

Mark

05/04/2021, 5:56 AM

It turns out `$.` and `.^` are not efficient, so it's better to use `^.^`, which takes just 3 steps regardless of the length of the input text. Tested using https://regex101.com/

ephemient

05/04/2021, 5:56 AM

it's in multiline mode by default, take the multiline flag off the modifiers list

ephemient

05/04/2021, 5:58 AM

I'm also pretty sure java's regex is a smarter DFA than that

Mark

05/04/2021, 6:03 AM

I can’t get `^.^` to match anything regardless of the multiline flag

ephemient

05/04/2021, 6:04 AM

well yes, you need the /s flag to get `.` to match a newline character as well

Mark

05/04/2021, 6:05 AM

Hmm, but I don’t want it to match a newline character as well. If I use `^.^` it seems to do what I want and is performant

ephemient

05/04/2021, 6:07 AM

- no, you don't want it to match a newline character.
- I gave a warning on `^` `$` etc. due to options changing their behavior. I almost always prefer to use `\A` `\Z` instead because they always anchor to start and end, not lines, regardless of option.
- the performance characteristics on regex101 are a lie.
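To illustrate the point about anchors, here is a small sketch assuming Kotlin/JVM. The input and the helper name `found` are invented for the example. Under `MULTILINE`, `^` also matches after a line break, while `\A` stays pinned to the start of the whole input:

```kotlin
val multiline = setOf(RegexOption.MULTILINE)
val text = "a\nb"

// Illustrative helper: search `text` with MULTILINE enabled.
fun found(pattern: String): Boolean = pattern.toRegex(multiline).containsMatchIn(text)

fun main() {
    println(found("^b"))        // true: ^ matches at the start of the second line
    println(found("""\Ab"""))   // false: \A only matches at the very start of the input
}
```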

ephemient

05/04/2021, 6:08 AM

if you're targeting Java specifically, then a zero-width negative lookahead is another pattern that never matches:

"(?!)".toRegex()

ephemient

05/04/2021, 6:09 AM

it just won't work in Javascript, if you're using Kotlin/JS, though, whereas "$." etc. will
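A quick check of the empty negative lookahead, as a sketch that assumes Kotlin/JVM only (behaviour on other targets is not tested here). The sample strings are arbitrary:

```kotlin
// (?!) is a negative lookahead for the empty pattern. The empty pattern
// always matches, so its negation fails at every position.
val never: Regex = "(?!)".toRegex()

fun main() {
    for (s in listOf("", "abc", "line1\nline2")) {
        println(never.containsMatchIn(s)) // false for each sample
    }
    println(never.matchEntire(""))        // null
}
```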

Mark

05/04/2021, 6:09 AM

Ok, I’m with you now. `\A.\A`

Mark

05/04/2021, 6:12 AM

From what I can see, that will always fail regardless of flags
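One way to sanity-check that claim is sketched below, assuming Kotlin/JVM. It only covers the two options discussed in this thread (`MULTILINE` and `DOT_MATCHES_ALL`) and a handful of made-up sample inputs, so it is a spot check rather than a proof:

```kotlin
val impossible = """\A.\A"""

// Every combination of the two options discussed in the thread.
val optionSets: List<Set<RegexOption>> = listOf(
    emptySet(),
    setOf(RegexOption.MULTILINE),
    setOf(RegexOption.DOT_MATCHES_ALL),
    setOf(RegexOption.MULTILINE, RegexOption.DOT_MATCHES_ALL)
)

// Arbitrary sample inputs, including empty and newline-only strings.
val samples = listOf("", "a", "\n", "a\nb", "\n\n")

// True if any option set lets the pattern match any sample.
fun anyMatch(): Boolean = optionSets.any { options ->
    val regex = impossible.toRegex(options)
    samples.any { regex.containsMatchIn(it) }
}

fun main() {
    // After `.` consumes a character, the position is no longer the start
    // of the input, so the second \A cannot succeed.
    println(anyMatch()) // false
}
```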

ephemient

05/04/2021, 6:15 AM

`\Z.\A`, or just `\Z.` or `.\A` like I suggested at first

Mark

05/04/2021, 6:22 AM

Seems `.\A` is much slower (https://pl.kotl.in/bHniyg34T). `\Z.` is also much slower, but not nearly as slow as `.\A`. I can’t see any benefit of using `\Z.\A` over `\Z.`
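A rough way to compare the candidates locally is sketched below. It is not the code behind the playground link above: the helper `timeSearch`, the input size, and the repeat count are assumptions chosen for illustration, and `measureNanoTime` assumes Kotlin/JVM. A naive timing loop like this is noisy (JIT warm-up, GC), so treat the numbers as a hint only:

```kotlin
import kotlin.system.measureNanoTime

// Illustrative helper: total time in nanoseconds for `repeats` failed searches.
fun timeSearch(pattern: String, input: String, repeats: Int = 100): Long {
    val regex = pattern.toRegex()
    return measureNanoTime {
        repeat(repeats) { regex.containsMatchIn(input) }
    }
}

fun main() {
    val input = "x".repeat(10_000) // arbitrary long input with no newlines
    for (p in listOf("""\A.\A""", """\Z.""", """.\A""")) {
        println("$p: ${timeSearch(p, input)} ns")
    }
}
```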

Vampire

05/04/2021, 7:52 AM

I usually use `[^\w\W]`, which tries to match one character that is neither a word nor a non-word character. You can probably speed it up by anchoring at the beginning, like `\A[^\w\W]`. If you have possessive quantifiers, you can probably also do `.*+.`, which should be quite efficient.

👍 1

Mark

05/04/2021, 8:55 AM

@Vampire pretty good, `\A[^\w\W]` is just two steps, so very efficient. https://regex101.com/r/AKtYle/1 https://pl.kotl.in/hdbNH73It
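The three patterns from this suggestion can be spot-checked with a sketch like the following, assuming Kotlin/JVM (where possessive quantifiers are available). The sample inputs and the helper name `matchesAnySample` are made up for illustration:

```kotlin
val candidates = listOf("""[^\w\W]""", """\A[^\w\W]""", """.*+.""")

// Arbitrary samples: empty, word and non-word characters, newlines, non-ASCII.
val inputs = listOf("", "a", "_", " ", "\n", "héllo wörld", "a\nb")

// Illustrative helper: does the pattern match anywhere in any sample?
fun matchesAnySample(pattern: String): Boolean {
    val regex = pattern.toRegex()
    return inputs.any { regex.containsMatchIn(it) }
}

fun main() {
    for (c in candidates) {
        // Every character is either \w or \W, so the negated class is empty;
        // .*+ refuses to give back the character the final . would need.
        println("$c -> ${matchesAnySample(c)}") // false for each candidate
    }
}
```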

Mark

05/04/2021, 9:03 AM

Another advantage over `\A.\A` is that you don’t need to use `@Suppress("RegExpUnexpectedAnchor")`
