#announcements

Paul Woitaschek

09/17/2019, 9:34 AM
How would I convert this encoded emoji to a string?
U+1F3CA-200D-2642-FE0F

diesieben07

09/17/2019, 9:39 AM
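// Strip the "U+" prefix, split on '-', and parse each part as a hexadecimal code point.
val codePoints = "U+1F3CA-200D-2642-FE0F"
    .substring(2)
    .split('-')
    .map { it.toInt(16) }

// Append every code point; appendCodePoint emits a surrogate pair where one is needed.
val string = buildString {
    for (cp in codePoints) {
        appendCodePoint(cp)
    }
}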
Could probably be improved efficiency-wise, but that's the general idea.

Paul Woitaschek

09/17/2019, 9:50 AM
Thanks, that's super helpful. I've been trying to figure this out for an hour now. Performance doesn't matter at all.
This is how it ended up in my code:
// input is the encoded string, e.g. "U+1F3CA-200D-2642-FE0F"
buildString {
  input.drop(2).split('-').forEach {
    appendCodePoint(it.toInt(16))
  }
}
But what exactly is happening here? Is it actually 4 emojis that are mangled together?

spand

09/17/2019, 9:53 AM
I only see two

diesieben07

09/17/2019, 9:54 AM
An emoji can be made up of multiple Unicode code points. For example, this one has a "male modifier" at the end.

Paul Woitaschek

09/17/2019, 9:55 AM
And how would I write the code points if I didn't want them to be merged, but wanted to display them one after another in a string?

diesieben07

09/17/2019, 9:55 AM
Here you have the "person swimming" emoji (U+1F3CA) plus the "man modifier" code point (U+2642), glued together by a zero-width joiner (U+200D), which gives you the "man swimming" emoji.
You can't; the modifiers are not designed to be displayed on their own.
They modify what comes before them (similar to how you can turn a into ä with a following combining code point).
You will get them shown (♂️) if they don't follow a supported emoji or the font doesn't support the merged emoji.
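To make the composition above visible, here is a minimal sketch, assuming Kotlin on the JVM. It lists each code point in the string with its Unicode name and shows what happens to the sequence when the zero-width joiner (U+200D) is dropped. The helper names `describe` and `withoutJoiners` are made up for this illustration, and whether the remaining parts render side by side still depends on the font.

```kotlin
// Hypothetical helpers for illustration; not part of the original thread.

// Lists every code point as "U+XXXX NAME", using the JDK's Unicode name table.
fun describe(text: String): List<String> =
    text.codePoints().toArray().map { cp ->
        "U+%04X %s".format(cp, Character.getName(cp))
    }

// Removes U+200D (zero-width joiner), the code point that asks the renderer
// to merge its neighbours into a single emoji.
fun withoutJoiners(text: String): String = buildString {
    text.codePoints().toArray()
        .filter { it != 0x200D }
        .forEach { appendCodePoint(it) }
}

fun main() {
    // The "man swimming" sequence from the thread, built code point by code point.
    val manSwimming = buildString {
        listOf(0x1F3CA, 0x200D, 0x2642, 0xFE0F).forEach { appendCodePoint(it) }
    }
    describe(manSwimming).forEach(::println)
    println(withoutJoiners(manSwimming))
}
```

Without the joiner, the male sign no longer follows a sequence the font can merge, so it falls into the "shown on its own" case described above.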

Paul Woitaschek

09/17/2019, 9:59 AM
And if I wanted to unit test on Android that all of these work correctly, would it work to create a Paint, set the typeface I use, and call Paint#hasGlyph?

diesieben07

09/17/2019, 9:59 AM
I can't really help you with Android, sorry.

Paul Woitaschek

09/17/2019, 10:00 AM
But this thing (swimmer+male) could be considered a glyph?
I'm heavily confused by all of this toInt, radix, glyph, surrogate topic thing 😄

diesieben07

09/17/2019, 10:01 AM
The technical term is "grapheme" (or "grapheme cluster" in Unicode terminology), as far as I know. Theoretically, you can merge as many code points as you want into one single "emoji" (e.g. to form a family of 2 children and 2 parents from individual person emojis).
Well, the toInt and radix part is just there to parse the hexadecimal representation of the Unicode code points.
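As a rough illustration of that parsing step, here is a minimal sketch, assuming Kotlin on the JVM. The function name `parseCodePoints` is hypothetical and not from the conversation; it adds input checks that the shorter snippets earlier in the thread skip.

```kotlin
// Hypothetical helper (not from the thread): parses a string such as
// "U+1F3CA-200D-2642-FE0F" into a list of Unicode code points.
fun parseCodePoints(encoded: String): List<Int> {
    require(encoded.startsWith("U+")) { "Expected a \"U+\" prefix: $encoded" }
    return encoded.removePrefix("U+").split('-').map { hex ->
        // Radix 16 makes toIntOrNull read the digits as hexadecimal.
        val cp = hex.toIntOrNull(16)
            ?: throw IllegalArgumentException("Not hexadecimal: $hex")
        require(Character.isValidCodePoint(cp)) { "Not a Unicode code point: $hex" }
        cp
    }
}

fun main() {
    println(parseCodePoints("U+1F3CA-200D-2642-FE0F")) // [127946, 8205, 9794, 65039]
}
```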
Surrogates are an artifact of how UTF-16 (which Java uses!) works.
UTF-16 characters (code units, i.e. a Java char) are 16 bits, but that's not enough to represent all of Unicode (particularly emoji). That's why there are surrogate characters, which represent one Unicode code point as 2 UTF-16 characters.
Then on top of that you have the fact that multiple Unicode code points can be merged into one "grapheme" (what a user would call a "character"). These code points can themselves be made up of 2 UTF-16 surrogate characters.
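The first two layers described above (UTF-16 chars and code points) can be seen with a short sketch, again assuming Kotlin 1.5+ on the JVM. The helper `utf16Units` is a made-up name for this example. Counting graphemes is left out on purpose, because the result depends on the JDK version or library used.

```kotlin
// Hypothetical helper: the UTF-16 chars (as numbers) that encode one code point.
fun utf16Units(codePoint: Int): List<Int> =
    Character.toChars(codePoint).map { it.code }

fun main() {
    // The "man swimming" sequence from the thread, built code point by code point.
    val manSwimming = buildString {
        listOf(0x1F3CA, 0x200D, 0x2642, 0xFE0F).forEach { appendCodePoint(it) }
    }

    // length counts UTF-16 chars: U+1F3CA needs a surrogate pair,
    // the other three code points fit in one char each.
    println(manSwimming.length)                                // 5
    // codePointCount counts Unicode code points.
    println(manSwimming.codePointCount(0, manSwimming.length)) // 4

    // U+1F3CA lies above U+FFFF, so it is stored as a high and a low surrogate.
    println(utf16Units(0x1F3CA).map { "%04X".format(it) })     // [D83C, DFCA]
    // U+2642 fits in a single 16-bit char.
    println(utf16Units(0x2642).map { "%04X".format(it) })      // [2642]
}
```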

Paul Woitaschek

09/17/2019, 10:11 AM
Thanks