Is unicode encoding in string literals working as ...
# announcements
b
Is Unicode encoding in string literals working as it is supposed to? I’m trying to encode a flag emoji character. Flags are encoded as two regional indicators, basically the 2-char ISO country code. So a German flag would be
\u1f1e9\u1f1ea
but for some reason only the first 4 hex digits of each escape are recognized as part of the character. Do I have to encode Unicode code points as a surrogate pair if they aren’t part of the BMP, or is there a better way to do this?
s
Before I go dig up my code that uses it, are you sure your editor can display it?
b
No, not really. I’m working on a game engine, so this is part of a test I’m writing. I also printed all the code points of the string and the first code point is
\u1f1e
followed by a
9
Also, looking at IntelliJ syntax highlighting, the 9 has a different color than the rest of the Unicode escape
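A minimal sketch of what was observed here, assuming Kotlin 1.5+ (for `Char.code`): a `\u` escape consumes exactly four hex digits, so the fifth digit is left over as an ordinary character. `codeUnitsHex` is a hypothetical helper introduced only for this illustration.

```kotlin
// Hypothetical helper: lists each UTF-16 code unit of a string as lowercase hex.
fun codeUnitsHex(s: String): List<String> = s.map { it.code.toString(16) }

fun main() {
    // Parsed as the escape \u1f1e (U+1F1E) followed by the literal character '9' (U+0039).
    val s = "\u1f1e9"
    println(s.length)        // 2
    println(codeUnitsHex(s)) // [1f1e, 39]
}
```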
s
We don't use it often but I'm pretty sure it works:
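private val wantedRegex: Regex by lazy { "^[a-zA-Z]{2}$".toRegex() }

// Converts a two-letter ISO country code into a flag emoji (two regional indicator symbols).
private fun isoToUnicodeFlag(iso: String): String? {
    return if (wantedRegex.matches(iso)) {
        val first = charToFunkyIsoChar(iso[0].lowercaseChar())
        val second = charToFunkyIsoChar(iso[1].lowercaseChar())
        // Each regional indicator is a surrogate pair: high surrogate \uD83C plus a computed low surrogate.
        "\uD83C$first\uD83C$second"
    } else null
}

// https://en.wikipedia.org/wiki/Regional_Indicator_Symbol#Unicode_Block
// Maps 'a'..'z' to the low surrogate of U+1F1E6..U+1F1FF (the low surrogates start at \uDDE6).
private fun charToFunkyIsoChar(char: Char): Char {
    return ((char.code - 'a'.code) + '\uDDE6'.code).toChar()
}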
It's odd that I don't use the same \u1f1e as you do
b
You are encoding them as surrogate pairs. This means you represent them as they would be stored in UTF-16. The real code point is U+1F1E9, which is also its representation in UTF-32.
I guess the problem is that the JVM represents strings in UTF-16, so a \u escape can only express a single 16-bit code unit and can’t contain any Unicode character that needs more than 16 bits, even if the literal is used for strings.
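One way to avoid writing surrogate pairs by hand, sketched under the assumption of a JVM target (where `StringBuilder.appendCodePoint` and `String.codePointCount` are available): build the string from full code points and let the runtime produce the UTF-16 pairs. `fromCodePoints` is a hypothetical helper name, not something from the code above.

```kotlin
// Hypothetical helper: builds a string from full Unicode code points (JVM only).
fun fromCodePoints(vararg codePoints: Int): String =
    buildString { codePoints.forEach { appendCodePoint(it) } }

fun main() {
    // U+1F1E9 and U+1F1EA are the regional indicators for D and E.
    val flag = fromCodePoints(0x1F1E9, 0x1F1EA)
    println(flag == "\uD83C\uDDE9\uD83C\uDDEA")  // true: same string as the surrogate-pair literal
    println(flag.length)                         // 4 UTF-16 code units
    println(flag.codePointCount(0, flag.length)) // 2 code points
}
```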
s
Ah right
Yeah
b
“Ah right, Yeah” is my reaction exactly. I’ve been trying for the last 3 days to wrap my head around Unicode and how to use it on the JVM. Ever tried to ask Kotlin what the length of this string is
"👨‍👩‍👦‍👦"
? I think the answer is 6 or 7.
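A quick check, assuming the emoji is the usual zero-width-joiner sequence man + ZWJ + woman + ZWJ + boy + ZWJ + boy and a JVM target: under that assumption `length` counts UTF-16 code units and reports 11, while the code point count is 7. The variable names here are made up for illustration.

```kotlin
// Assumed composition of the family emoji, written with explicit escapes.
val zwj = "\u200D"         // ZERO WIDTH JOINER
val man = "\uD83D\uDC68"   // U+1F468
val woman = "\uD83D\uDC69" // U+1F469
val boy = "\uD83D\uDC66"   // U+1F466
val family = man + zwj + woman + zwj + boy + zwj + boy

fun main() {
    println(family.length)                           // 11 UTF-16 code units (4 * 2 + 3)
    println(family.codePointCount(0, family.length)) // 7 code points (4 emoji + 3 joiners)
}
```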
s
Unicode is always a headscratcher 🙃