How do I turn a unicode scalar value into a string...
# getting-started
m
How do I turn a unicode scalar value into a string/char? I tried
toChar()
but that fails for 3-byte values like 0x122C5 (CUNEIFORM SIGN SHID TIMES IM)
k
What do you mean by a Unicode scalar value? A String like “0x122C5” (7 ASCII characters)? And you need to convert it to a string that contains the matching Unicode character?
m
0x122C5 is the integer value
o
You have to encode the Int code point into UTF-16 code units yourself. There aren't standard library facilities for that.
You can find the algorithm on Wikipedia: https://en.wikipedia.org/wiki/UTF-16#Examples
To encode U+10437 (𐐷) to UTF-16:

    Subtract 0x10000 from the code point, leaving 0x0437.
    For the high surrogate, shift right by 10 (divide by 0x400), then add 0xD800, resulting in 0x0001 + 0xD800 = 0xD801.
    For the low surrogate, take the low 10 bits (remainder of dividing by 0x400), then add 0xDC00, resulting in 0x0037 + 0xDC00 = 0xDC37.
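The steps quoted above could be sketched in Kotlin roughly as follows. This is only an illustration, not an official API: the function name `codePointToString` is made up here, and the range checks are an assumption about how invalid input should be handled.

```kotlin
// Hypothetical helper (name chosen for this example): converts a Unicode
// scalar value to a String by encoding it as UTF-16 code units by hand.
fun codePointToString(codePoint: Int): String {
    // Assumption: reject anything that is not a valid scalar value.
    require(codePoint in 0..0x10FFFF) { "Not a Unicode code point: $codePoint" }
    require(codePoint !in 0xD800..0xDFFF) { "Surrogates are not scalar values: $codePoint" }

    // Values in the Basic Multilingual Plane fit in a single Char.
    if (codePoint < 0x10000) return codePoint.toChar().toString()

    // Supplementary planes: subtract 0x10000, then split the remaining 20 bits.
    val offset = codePoint - 0x10000
    val high = (0xD800 + (offset shr 10)).toChar()    // top 10 bits
    val low = (0xDC00 + (offset and 0x3FF)).toChar()  // low 10 bits
    return buildString {
        append(high)
        append(low)
    }
}

fun main() {
    println(codePointToString(0x10437)) // the example from the quoted steps
    println(codePointToString(0x122C5)) // the cuneiform sign from the question
}
```

For 0x122C5 this yields the surrogate pair 0xD808 0xDEC5, so the resulting String has length 2 even though it holds one character.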
d
If you're on the JVM, you can use a CharsetDecoder to process it.
Or String(bytes, charset)
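A minimal sketch of the String(bytes, charset) route on the JVM, assuming the runtime provides a "UTF-32BE" charset (common on standard JDKs, but not one of the charsets every JVM is required to support). The function name `codePointToStringJvm` is made up for this example.

```kotlin
import java.nio.ByteBuffer
import java.nio.charset.Charset

// Hypothetical helper: writes the code point as 4 big-endian bytes (which is
// its UTF-32BE encoding) and lets the JVM decode those bytes into a String.
fun codePointToStringJvm(codePoint: Int): String {
    // ByteBuffer is big-endian by default, matching UTF-32BE.
    val bytes = ByteBuffer.allocate(4).putInt(codePoint).array()
    // Assumption: Charset.forName throws if "UTF-32BE" is unavailable.
    return String(bytes, Charset.forName("UTF-32BE"))
}

fun main() {
    println(codePointToStringJvm(0x122C5))
}
```

Unlike a hand-written encoder, this version does not validate its input itself; how malformed values are handled is left to the decoder.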