# language-evolution
c
Hi. I'd like to have a string escape sequence that makes it easier to include astral code points (values greater than 0xFFFF). So I could write e.g. "\u{1F995}" instead of "\uD83E\uDD95" or "🦕". Granted, the last variant is often preferable. But there are cases where (programming) fonts are missing glyphs or the code point isn't associated with a visible character. In such cases, having to use escapes of the surrogate characters just doesn't make for a great developer experience. Does this sound like something that has a chance of being added to the language? Or would I be wasting my time if I wrote up a KEEP? (I already have a prototype that adds support to the compiler.)
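For readers less familiar with UTF-16: the surrogate escapes above are derived mechanically from the code point. The sketch below (the function names are hypothetical, not stdlib API) shows the standard computation: subtract 0x10000, put the top 10 bits into a high surrogate (base 0xD800) and the bottom 10 bits into a low surrogate (base 0xDC00).

```kotlin
/**
 * Hypothetical helper: splits an astral code point (U+10000..U+10FFFF)
 * into the two UTF-16 code units (surrogates) that represent it.
 */
fun toSurrogatePair(codePoint: Int): Pair<Char, Char> {
    require(codePoint in 0x10000..0x10FFFF) { "Not an astral code point: $codePoint" }
    val offset = codePoint - 0x10000          // 20-bit value
    val high = 0xD800 + (offset shr 10)       // top 10 bits -> high surrogate
    val low = 0xDC00 + (offset and 0x3FF)     // bottom 10 bits -> low surrogate
    return Pair(high.toChar(), low.toChar())
}

/** Hypothetical helper: spells a code point as today's pair of backslash-u escapes. */
fun surrogateEscape(codePoint: Int): String {
    val (high, low) = toSurrogatePair(codePoint)
    // Surrogates always lie in D800..DFFF, so four hex digits suffice.
    return "\\u" + high.code.toString(16).uppercase() +
        "\\u" + low.code.toString(16).uppercase()
}
```

For example, surrogateEscape(0x1F995) produces the text \uD83E\uDD95, the escape quoted in the message.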
c
In my experience, these show up rarely enough that "\uD83E\uDD95" is good enough. Would you disagree?
c
Obviously. Otherwise I wouldn't have typed the message 🙂
A lot of other programming languages have also added an escape sequence that supports non-BMP code points.
c
Can these characters be represented by a Char?
c
No. That's kind of the "source" of the problem. But I'm not suggesting we figure out a way to support '\u{1F995}'. Just "\u{1F995}".
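Why '\u{1F995}' is off the table: a Kotlin Char is a single 16-bit UTF-16 code unit, so it cannot go above U+FFFF, while a String can hold an astral code point as two Chars. A small illustration (codeUnits and dino are hypothetical names):

```kotlin
// A Char is one UTF-16 code unit: its largest value is U+FFFF.
val maxChar: Int = Char.MAX_VALUE.code

// U+1F995 written with today's escapes: one code point, but two Chars.
val dino = "\uD83E\uDD95"

/** Hypothetical helper: hex values of the Chars (code units) in [s]. */
fun codeUnits(s: String): List<String> =
    s.map { "0x" + it.code.toString(16).uppercase() }
```

Here dino.length is 2 and codeUnits(dino) gives [0xD83E, 0xDD95]; dino[0] on its own is only a high surrogate, not the dinosaur.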
y
What's the problem with writing the character directly in the text? It feels clearer to me, no?
c
I believe the first message already answers that question.
Imagine you want to use a code point that is currently unassigned. There is no glyph associated with that code point.
c
Is that really frequent?
c
Does it matter? Most people don't know very much about Unicode or how strings are encoded in their programming language. That doesn't mean you shouldn't make life easier for those who do know such things. (I'm also offering to do most of the work)
c
Most of the work? Does that include the PRs to GitHub, GitLab, Bitbucket and all other editors so it displays properly?
c
Sure, why not.
c
Alternative proposal:
```kotlin
// Would return the UTF-16 string for this code point
val Long.astral: String get() = TODO()
```
Instead of:
```
"\u{1F995}"
```
you get:
```kotlin
"${0x1F995L.astral}"
```
That's slightly more verbose, sure, but it is still quite simple, and more importantly doesn't require a language change.
(If this gets added to the stdlib, it will most likely be called .decodeUnicode or similar, I guess.)
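One possible body for the proposed astral property, as a sketch only: it assumes Kotlin/JVM because it delegates to java.lang.Character.toChars, and astral is the thread's hypothetical name, not an existing stdlib member. The range check reflects that the property is meant for code points above 0xFFFF.

```kotlin
/**
 * Sketch of the `astral` property proposed in the thread (not a stdlib API).
 * Assumes Kotlin/JVM: java.lang.Character.toChars does the UTF-16 encoding.
 */
val Long.astral: String
    get() {
        require(this in 0x10000L..0x10FFFFL) {
            "Not an astral code point: 0x${toString(16).uppercase()}"
        }
        // toChars returns the high and low surrogate as a CharArray.
        return Character.toChars(toInt()).concatToString()
    }
```

With this, "${0x1F995L.astral}" equals "\uD83E\uDD95". In this sketch the conversion runs every time the template is evaluated.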
c
That would add runtime overhead (and deviate from what pretty much every other programming language is doing).
c
> That would add runtime overhead
Not necessarily; there are plenty of methods in the stdlib that are intrinsics.
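A related difference, separate from the overhead question: an escape inside a literal is resolved by the compiler, so the string can be a compile-time constant, whereas a template calling a property such as the proposed astral is not a constant expression unless the compiler were taught to fold it specially. A minimal illustration (DINO is a hypothetical name):

```kotlin
// Escapes are resolved at compile time, so this is a valid `const val`.
const val DINO = "\uD83E\uDD95"

// By contrast, a template calling a property is not a constant expression,
// so something like the following would be rejected by the compiler:
// const val DINO2 = "${0x1F995L.astral}"
```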
k
> Does that include the PRs to GitHub, GitLab, Bitbucket and all other editors so it displays properly?
Why would they need to do that? If you display code in GitHub that has the line println("\u03C0"), does GitHub display it as println("π")?
c
Escape sequences could be highlighted using a different color. But it looks like that's currently not happening. See https://gist.github.com/cketti/db967f067bfc2d227a1f57c224c77be4
j
I think adding \U like Python has is pretty reasonable.
c
I like \u{…} better. No leading zeros required.
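To make the difference between the two spellings concrete: Python's \U form takes exactly eight hex digits (\U0001F995), whereas the \u{…} form takes a variable number, so the lexer has to find the closing brace. The following is a rough, hypothetical sketch of how such an escape could be expanded into UTF-16 text. It is not the prototype mentioned at the start of the thread, and expandBraceEscapes is an illustrative name.

```kotlin
/**
 * Hypothetical sketch: expands every backslash-u-brace escape in [source]
 * (1 to 6 hex digits, no leading zeros required) into UTF-16 text.
 * Everything else is copied unchanged.
 */
fun expandBraceEscapes(source: String): String {
    val out = StringBuilder()
    var i = 0
    while (i < source.length) {
        // Look for an opening backslash-u-brace and its closing brace.
        val close = if (source.startsWith("\\u{", i)) source.indexOf('}', i + 3) else -1
        if (close == -1) {
            out.append(source[i])
            i++
            continue
        }
        val hex = source.substring(i + 3, close)
        require(hex.length in 1..6 && hex.all { it in '0'..'9' || it.lowercaseChar() in 'a'..'f' }) {
            "Malformed escape: $hex"
        }
        val codePoint = hex.toInt(16)
        require(codePoint <= 0x10FFFF) { "Code point out of range: $hex" }
        if (codePoint < 0x10000) {
            // BMP code point: a single Char.
            out.append(codePoint.toChar())
        } else {
            // Astral code point: encode as a surrogate pair.
            val offset = codePoint - 0x10000
            out.append((0xD800 + (offset shr 10)).toChar())
            out.append((0xDC00 + (offset and 0x3FF)).toChar())
        }
        i = close + 1
    }
    return out.toString()
}
```

For instance, the source text \u{3C0} expands to π, and \u{1F995} expands to the same two-Char string as "\uD83E\uDD95".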