How would I convert this encoded emoji to a string...
# announcements
p
How would I convert this encoded emoji to a string?
U+1F3CA-200D-2642-FE0F
d
val codePoints = "U+1F3CA-200D-2642-FE0F"
    .substring(2)
    .split('-')
    .map { it.toInt(16) }

val string = buildString {
    for (cp in codePoints) {
        appendCodePoint(cp)
    }
}
Could probably be improved efficiency-wise, but that's the general idea.
p
Thanks, that's super helpful. I've been trying to figure this out for an hour now. Performance doesn't matter at all.
This is how it ended up in my code:
buildString {
  input.drop(2).split('-').forEach {
    appendCodePoint(it.toInt(16))
  }
}
But what exactly is happening here? Is it actually 4 emojis that are mangled together?
s
I only see two
d
Emoji can be made up of multiple Unicode code points. For example, this one has a "male" modifier at the end.
p
And how would I write the code points if I didn't want them to be merged, but wanted to display them one after another in a string?
d
Here you have the "person swimming" emoji (U+1F3CA), a zero-width joiner (U+200D), and the male sign (U+2642) with a variation selector (U+FE0F), which together give you the "man swimming" emoji.
You can't really; the modifiers are not designed to be displayed on their own.
They modify what comes before them (similar to how you can turn a into ä with a following combining code point).
You will get them shown (♂️) if they don't follow a supported emoji or the font doesn't support the merged emoji.
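As a rough sketch of that last point: the sequence is only merged because of the zero-width joiner (U+200D), so leaving it out gives the renderer nothing to join. The helper names `parseCodePoints` and `withoutJoiners` are made up for illustration and are not from the thread, and whether the result really shows up as two separate symbols depends on the font and renderer.

```kotlin
// Hypothetical helper: parses "U+1F3CA-200D-2642-FE0F" into code points.
fun parseCodePoints(input: String): List<Int> =
    input.removePrefix("U+").split('-').map { it.toInt(16) }

const val ZERO_WIDTH_JOINER = 0x200D

// Hypothetical helper: builds a string from the code points but skips the
// joiner, so there is no ZWJ sequence left for the renderer to merge.
fun withoutJoiners(codePoints: List<Int>): String = buildString {
    for (cp in codePoints) {
        if (cp != ZERO_WIDTH_JOINER) appendCodePoint(cp)
    }
}

fun main() {
    val codePoints = parseCodePoints("U+1F3CA-200D-2642-FE0F")
    for (cp in codePoints) {
        // Character.getName returns the Unicode name, or null if unassigned.
        println("U+%04X %s".format(cp, Character.getName(cp)))
    }
    println(withoutJoiners(codePoints))
}
```

Printing the names of the individual code points is also a handy way to see what a sequence is actually made of.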
p
And if I wanted to unit test on Android that all of these work correctly, would it work to create a Paint, set the typeface I use, and call Paint#hasGlyph?
d
I can't really help you with Android, sorry.
p
But this thing (swimmer+male) could be considered a glyph?
I'm heavily confused by all of this toInt, radix, glyph, surrogate topic thing 😄
d
The technical term is "grapheme" (or "grapheme cluster"), as far as I know. Theoretically, any number of code points can be combined into one single "emoji" (e.g. to form a family of 2 children and 2 parents from individual person emojis).
Well, the toInt and radix are just there to parse the hexadecimal representation of the Unicode code point.
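To make the radix part concrete, here is a minimal sketch. `parseHexCodePoint` is a hypothetical name used only for this example; it simply wraps the standard `String.toInt(radix)` call from the snippets above.

```kotlin
// Hypothetical helper: "1F3CA" is base-16 text, so it is parsed with radix 16.
fun parseHexCodePoint(hex: String): Int = hex.toInt(16)

fun main() {
    val codePoint = parseHexCodePoint("1F3CA")
    println(codePoint)                           // decimal value: 127946
    println(codePoint == 0x1F3CA)                // same number as the hex literal
    println(codePoint.toString(16).uppercase())  // back to hex text: 1F3CA
}
```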
Surrogates are an artifact of how UTF-16 (which Java uses!) works.
UTF-16 characters (code units) are 16 bits, but that's not enough to represent all of Unicode (particularly emoji). That's why there are surrogate characters, which represent one Unicode code point as 2 UTF-16 characters.
Then, on top of that, you have the fact that multiple Unicode code points can be merged into one "grapheme" (what a user would call a "character"). Each of these code points can itself be made up of 2 UTF-16 surrogate characters.
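The three layers (UTF-16 characters, code points, graphemes) can be seen in a small sketch like the one below. `fromCodePoints` is a hypothetical helper for this example; the `Character` and `String` methods are the standard JVM ones.

```kotlin
// Hypothetical helper: builds a string from the given Unicode code points.
fun fromCodePoints(vararg codePoints: Int): String = buildString {
    for (cp in codePoints) appendCodePoint(cp)
}

fun main() {
    // U+1F3CA does not fit into 16 bits, so UTF-16 stores it as a surrogate pair.
    val swimmer = fromCodePoints(0x1F3CA)
    println(swimmer.length)                             // 2 UTF-16 characters
    println(swimmer.codePointCount(0, swimmer.length))  // 1 code point
    println(Character.isHighSurrogate(swimmer[0]))      // true
    println(Character.isLowSurrogate(swimmer[1]))       // true

    // The full sequence: 4 code points, the first of which needs 2 characters.
    val manSwimming = fromCodePoints(0x1F3CA, 0x200D, 0x2642, 0xFE0F)
    println(manSwimming.length)                                 // 5
    println(manSwimming.codePointCount(0, manSwimming.length))  // 4
}
```

So what a user sees as one "character" here is 4 code points and 5 UTF-16 characters, which is why `length` is a poor way to count emoji.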
p
Thanks