Vampire
07/07/2023, 1:36 PM
hho
07/07/2023, 2:11 PM
Vampire
07/07/2023, 2:27 PM
Vampire
07/07/2023, 2:28 PM
"$foo".replace("""\W++""".toRegex(), "_")
Joffrey
07/07/2023, 2:45 PM
. or - ?
Joffrey
07/07/2023, 2:45 PM
Also, am I about to learn something, or is the second + unnecessary?
Robert Williams
07/07/2023, 2:50 PM
Vampire
07/07/2023, 2:50 PM
"Doesn't this rule out a bunch of valid things"
Yes, it is just the ultra-nuclear option. 😄
Vampire
07/07/2023, 2:51 PM
"Also, am I about to learn something, or is the second + unnecessary?"
Actually, as it is at the end of the regex, it is indeed unnecessary here.
Joffrey
07/07/2023, 2:51 PM
++ is different from a single +?
Vampire
07/07/2023, 3:01 PM
If you match 0001 with 0+2, the regex engine matches 000, then sees 1 is not 2. Then it backtracks and matches 00, then sees 0 is not 2. Then it backtracks and matches 0, then sees 0 is not 2. Now there is no way to backtrack more and the match fails.
If you match with 0++2, it matches 000, sees 1 is not 2. As ++ is possessive, it cannot backtrack into the match and the match fails immediately.
But if used wrongly, you can get false negatives.
If you, for example, match 0++01 against 0001, you will not have a match either, as the 0++ possessively matches the 000 and the next character is 1, not 0.
So the match fails, while with 0+01 it would have succeeded.
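The cases above can be checked with a small Kotlin sketch. It assumes Kotlin/JVM, where toRegex() and Regex(...) use the Java regex engine, which supports possessive quantifiers; the names greedy and possessive are only illustrative.

```kotlin
// Greedy quantifier: 0+ gives characters back while the engine looks for a match.
val greedy = Regex("0+01")

// Possessive quantifier: 0++ keeps everything it matched and never backtracks.
val possessive = Regex("0++01")

fun main() {
    val input = "0001"

    // 0+ backtracks from 000 to 00, leaving "01" for the rest of the pattern.
    println(greedy.matches(input))      // true

    // 0++ holds on to 000, so the literal 0 that follows has nothing to match.
    println(possessive.matches(input))  // false

    // Neither of these can match; the possessive form just stops trying sooner.
    println(Regex("0+2").containsMatchIn(input))   // false
    println(Regex("0++2").containsMatchIn(input))  // false
}
```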
Simplified: if the thing you repeat cannot appear directly after it, you can make the quantifier possessive to get the fail-fast behavior.
Vampire
07/07/2023, 3:02 PM
Joffrey
07/07/2023, 3:06 PM
?), but didn't know about this third possibility. Thanks.
Vampire
07/07/2023, 3:07 PM
Vampire
07/07/2023, 3:07 PM
Vampire
07/07/2023, 3:08 PM
Adam S
07/07/2023, 3:13 PM
"Is there some built-in utility method to sanitize a String for usage in a File name/path?"
Sanitize as in ‘make look pretty’, or also check to make sure the path is absolute? E.g. "../test 123" is invalid because it’s relative, or it would be prettified to "___test_123"?
Chris Lee
07/07/2023, 3:18 PM
"invalid because it’s relative"
“Invalid” is situational here: a relative path is a valid path from the OS’s perspective; it may not be valid for your use case. Ditto for HTTP URLs: /something/../whatever is a valid HTTP URL path, but your app may not permit relative paths (I always prevent relative paths by default here for security reasons).
Vampire
07/07/2023, 3:20 PM
Adam S
07/07/2023, 3:20 PM
. and / are valid path characters, but in some contexts they’re not permissible
Vampire
07/07/2023, 3:21 PM
"$foo".replace("""[^\w.-]++""".toRegex(), "_").removeSuffix("_")
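A minimal sketch of how that one-liner behaves on a few inputs, wrapped in a helper for readability. The names sanitizeFileName and unsafeChars are hypothetical, not standard-library API, and the sample inputs are made up for illustration. It assumes Kotlin/JVM, where \w covers only ASCII letters, digits and underscore by default.

```kotlin
// Any run of characters that is not a word character, a dot or a hyphen.
val unsafeChars = """[^\w.-]++""".toRegex()

// Hypothetical helper around the one-liner from the message above.
fun sanitizeFileName(name: String): String =
    name.replace(unsafeChars, "_").removeSuffix("_")

fun main() {
    println(sanitizeFileName("report 2023/07.txt"))  // report_2023_07.txt
    println(sanitizeFileName("draft?"))              // draft

    // Dots are allowed, so a leading ".." survives; only the slash is replaced.
    println(sanitizeFileName("../test 123"))         // .._test_123

    // Different inputs can end up with the same name.
    println(sanitizeFileName("a b"))                 // a_b
    println(sanitizeFileName("a/b"))                 // a_b
}
```

Whether the surviving ".." or the colliding names matter depends on where the result is used.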
Adam S
07/07/2023, 3:22 PM
Vampire
07/07/2023, 3:34 PM
Robert Williams
07/07/2023, 3:40 PM
_
Vampire
07/07/2023, 3:40 PM
Joffrey
07/07/2023, 3:43 PM
_. Depending on your case, it may or may not be a problem.
The URLEncoder approach doesn't suffer from this problem
Vampire
07/07/2023, 3:44 PM
Adam S
07/07/2023, 4:23 PM
Vampire
07/07/2023, 4:29 PM
Adam S
07/07/2023, 4:29 PM
Ruckus
07/08/2023, 1:58 AM
: is a valid character in Linux, but not Windows. When I need to worry about platform-independent names, I like to limit them to the POSIX portable set, but how to sanitize is a different can of worms. For example, do you want to remove invalid characters/combinations or replace them?
Ruckus
07/08/2023, 2:01 AM
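Ruckus's remove-or-replace question can be sketched in Kotlin as follows. The POSIX portable filename character set is A-Z, a-z, 0-9, period, underscore and hyphen. The names nonPortable and toPortableName and the sample file name are hypothetical, and this is only one possible policy, not a complete sanitizer (it does not handle empty results, reserved names or a leading hyphen).

```kotlin
// Any single character outside the POSIX portable filename character set.
val nonPortable = Regex("[^A-Za-z0-9._-]")

// Hypothetical helper: pass "" to remove offending characters,
// or a string such as "_" to replace each of them.
fun toPortableName(name: String, replacement: String = "_"): String =
    // escapeReplacement keeps characters like $ in the replacement literal.
    name.replace(nonPortable, Regex.escapeReplacement(replacement))

fun main() {
    // The colon and the space are each replaced.
    println(toPortableName("notes: draft.txt"))      // notes__draft.txt

    // The colon and the space are removed instead.
    println(toPortableName("notes: draft.txt", ""))  // notesdraft.txt
}
```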