How would I convert this encoded emoji to a string? (kotlinlang #announcements)

Paul Woitaschek

09/17/2019, 9:34 AM

How would I convert this encoded emoji to a string?

U+1F3CA-200D-2642-FE0F

diesieben07

09/17/2019, 9:39 AM

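// Parse "U+1F3CA-200D-2642-FE0F" into a list of integer code points.
val codePoints = "U+1F3CA-200D-2642-FE0F"
    .substring(2)           // drop the leading "U+"
    .split('-')             // one hex string per code point
    .map { it.toInt(16) }   // parse each as base 16

// Append every code point; the JVM StringBuilder emits surrogate pairs where needed.
val string = buildString {
    for (cp in codePoints) {
        appendCodePoint(cp)
    }
}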

diesieben07

09/17/2019, 9:46 AM

Could probably be improved efficiency-wise, but that's the general idea.

Paul Woitaschek

09/17/2019, 9:50 AM

Thanks, that's super helpful, I've been trying to figure this out for an hour now. Performance doesn't matter at all.

Paul Woitaschek

09/17/2019, 9:51 AM

This is how it ended up in my code

buildString {
  input.drop(2).split('-').forEach {
    appendCodePoint(it.toInt(16))
  }
}

➕ 1

Paul Woitaschek

09/17/2019, 9:52 AM

But what exactly is happening here? Is it actually 4 emojis that are mangled together?

spand

09/17/2019, 9:53 AM

I only see two

diesieben07

09/17/2019, 9:54 AM

Emoji can be made up of multiple Unicode code points. For example, this one has a "male modifier" at the end.

Paul Woitaschek

09/17/2019, 9:55 AM

And how would I write the code points if I didn't want them to be merged but wanted to display them one after another in a string?

diesieben07

09/17/2019, 9:55 AM

Here you have the "person swimming" emoji plus the "man modifier" codepoint which gives you the "man swimming" emoji
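To see what each part of the sequence is, the code points can be listed with their Unicode names. This is only a sketch: `describeCodePoints` is a made-up helper name, and it assumes a JVM where `Character.getName` knows these code points. In Unicode terms the sequence is SWIMMER, ZERO WIDTH JOINER, MALE SIGN and VARIATION SELECTOR-16, so the "man modifier" is really the joiner plus the male sign plus a selector that asks for emoji presentation.

```kotlin
// Hypothetical helper: lists each code point of an encoded sequence with its Unicode name.
fun describeCodePoints(encoded: String): List<String> =
    encoded.removePrefix("U+")
        .split('-')
        .map { hex ->
            val codePoint = hex.toInt(16)
            // getName returns null for unassigned code points, hence the fallback.
            "U+$hex ${Character.getName(codePoint) ?: "<unknown>"}"
        }

fun main() {
    describeCodePoints("U+1F3CA-200D-2642-FE0F").forEach(::println)
}
```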

diesieben07

09/17/2019, 9:55 AM

You can't, the modifiers are not designed to be displayed.

diesieben07

09/17/2019, 9:56 AM

They modify what comes before them (similar to how you can turn a into ä with a following modifier code point)
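The a-to-ä case can be checked directly. A minimal sketch, assuming a JVM target: `a` followed by U+0308 (COMBINING DIAERESIS) is two code points that render as one character, and NFC normalization folds them into the single precomposed code point U+00E4. The function name `composeNfc` is just for illustration.

```kotlin
import java.text.Normalizer

// Illustrative helper: compose base letters and combining marks where a precomposed form exists.
fun composeNfc(text: String): String =
    Normalizer.normalize(text, Normalizer.Form.NFC)

fun main() {
    val decomposed = "a\u0308"            // 'a' + COMBINING DIAERESIS
    val composed = composeNfc(decomposed)

    println(decomposed.length)            // 2
    println(composed.length)              // 1
    println(composed == "\u00E4")         // true
}
```

Emoji ZWJ sequences have no precomposed form like this; they stay as several code points and rely on the font to draw them as one image.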

diesieben07

09/17/2019, 9:56 AM

You will get them shown (♂️) if they don't follow a supported emoji or the font doesn't support the merged emoji

Paul Woitaschek

09/17/2019, 9:59 AM

And if I wanted to unit test on Android that all these work correctly, would it work to create a Paint, set the typeface I use and call Paint#hasGlyph?

diesieben07

09/17/2019, 9:59 AM

I can't really help you with Android, sorry.

Paul Woitaschek

09/17/2019, 10:00 AM

But this thing (swimmer + male) could be considered a glyph?

Paul Woitaschek

09/17/2019, 10:01 AM

I'm heavily confused by all of this toInt, radix, glyph, surrogate topic thing 😄

diesieben07

09/17/2019, 10:01 AM

The technical term is "grapheme", as far as I know. Theoretically there can be as many code points as you want in one single "emoji" (e.g. to form a family of 2 children and 2 parents from individual person emojis).

diesieben07

09/17/2019, 10:01 AM

Well, the toInt and radix part is just to parse the hexadecimal Unicode code point representation.
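As a small illustration of the radix part: `toInt(16)` reads the text as base 16, so "1F3CA" becomes the number 127946, which is the code point itself. `parseHexCodePoint` is a hypothetical name used only for this sketch.

```kotlin
// Hypothetical helper: parse one hexadecimal code point such as "1F3CA".
fun parseHexCodePoint(hex: String): Int = hex.toInt(radix = 16)

fun main() {
    val codePoint = parseHexCodePoint("1F3CA")
    println(codePoint)               // 127946
    println(codePoint.toString(16))  // 1f3ca
}
```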

diesieben07

09/17/2019, 10:02 AM

Surrogates are an artifact of how UTF-16 (which Java uses!) works.

diesieben07

09/17/2019, 10:02 AM

UTF-16 characters (code units, i.e. a Char) are 16 bits, but that's not enough to represent all of Unicode (particularly emoji). That's why there are surrogate characters, which represent one Unicode code point as 2 UTF-16 characters.
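A quick way to see the surrogates, sketched for the JVM with a made-up helper name `toUtf16Units`: `Character.toChars` returns the UTF-16 units for a code point, which is two for the swimmer (it lies above U+FFFF) and one for the male sign.

```kotlin
// Hypothetical helper: the UTF-16 code units of one code point, as 4-digit hex strings.
fun toUtf16Units(codePoint: Int): List<String> =
    Character.toChars(codePoint).map { "%04X".format(it.code) }

fun main() {
    println(toUtf16Units(0x1F3CA))  // [D83C, DFCA] -> a high and a low surrogate
    println(toUtf16Units(0x2642))   // [2642]       -> fits into a single Char
    println(Character.isHighSurrogate('\uD83C'))  // true
}
```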

diesieben07

09/17/2019, 10:04 AM

Then on top of that you have the fact that multiple Unicode code points can be merged into one "grapheme" (what a user would call a "character"). Each of these code points can itself be made up of 2 UTF-16 surrogate characters.
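The three levels can be counted side by side for the emoji from this thread. This is a sketch with hypothetical helper names (`fromCodePoints`, `countGraphemes`). The Char and code point counts are fixed, but the grapheme count from `java.text.BreakIterator` depends on the runtime: as far as I know, older JDKs split emoji ZWJ sequences into several pieces, so that number is printed without an expected value.

```kotlin
import java.text.BreakIterator

// Hypothetical helper: build a String from raw code points.
fun fromCodePoints(vararg codePoints: Int): String =
    buildString { codePoints.forEach { appendCodePoint(it) } }

// Hypothetical helper: count the "user-perceived characters" the runtime detects.
fun countGraphemes(text: String): Int {
    val boundaries = BreakIterator.getCharacterInstance()
    boundaries.setText(text)
    var count = 0
    while (boundaries.next() != BreakIterator.DONE) count++
    return count
}

fun main() {
    val manSwimming = fromCodePoints(0x1F3CA, 0x200D, 0x2642, 0xFE0F)

    println(manSwimming.length)                                 // 5 UTF-16 Chars
    println(manSwimming.codePointCount(0, manSwimming.length))  // 4 code points
    println(countGraphemes(manSwimming))                        // runtime-dependent
}
```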

Paul Woitaschek

09/17/2019, 10:11 AM

Thanks
