# kotlin-native

Edoardo Luppi

11/07/2023, 1:14 PM
Is there a way to get the number of bytes allocated for a `String` in Native? For context, I need to pass the size in bytes to `MultiByteToWideChar`:
```kotlin
MultiByteToWideChar(CP_UTF8.toUInt(), 0u, this /* String */, /* Length in bytes */, null, 0)
```

mbonnin

11/07/2023, 1:18 PM
You can encode your `String` to a `ByteArray` first?
Or does `MultiByteToWideChar` take a `String` as input? (in which case the API is a bit weird)
> Pointer to the character string to convert.
What the hell is a "character string"?
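
For the original question, one portable way to get the UTF-8 size is to encode first and measure the result. A minimal sketch, assuming only the Kotlin standard library; `utf8ByteCount` is a hypothetical helper name, not an existing API:

```kotlin
// Hypothetical helper: number of bytes the string occupies once encoded as UTF-8.
// Note: this allocates a temporary ByteArray. It does not report how many bytes
// the runtime has allocated internally for the String object itself.
fun utf8ByteCount(text: String): Int = text.encodeToByteArray().size
```

For instance, `utf8ByteCount("éèùô")` is 8 while `"éèùô".length` is 4, because each of those characters takes two bytes in UTF-8. The count excludes any NULL terminator.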

Edoardo Luppi

11/07/2023, 1:21 PM
`MultiByteToWideChar` serves the purpose of converting the UTF-8 Kotlin String to a UTF-16 buffer. The function requires the UTF-8 String length in bytes

mbonnin

11/07/2023, 1:22 PM
Are strings UTF-8 in K/N? (on the JVM they are UTF-16)
https://kotlinlang.org/docs/mapping-strings-from-c.html
> Kotlin/Native uses UTF-8 character encoding by default.
All good then, you can pass -1 to `cbMultiByte`
> Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails.
I'm assuming the Kotlin string is null-terminated, but anything else would really be asking for trouble

Edoardo Luppi

11/07/2023, 1:30 PM
> you can pass -1
That's probably the fastest way, although I'd have preferred not including the NULL termination. Also, I don't take for granted that it's NULL-terminated. See this for UTF-8/16 in Native: https://kotlinlang.slack.com/archives/C3SGXARS6/p1699288840089989?thread_ts=1699264329.174739&cid=C3SGXARS6
👀 1

mbonnin

11/07/2023, 1:31 PM
> I'd have preferred not including the NULL termination
I don't think you have the choice? The Kotlin runtime will add it for you?
Kotlin maps `_In_NLS_string_(cbMultiByte)LPCCH lpMultiByteStr` to `String` because I'm guessing it's all `const char *` under the hood
> Unlike other pointers, the parameters of type const char* are represented as a Kotlin String.
This is convenient in your case: just pass the Kotlin String and the underlying `const char *` pointer will be used, which should really always contain a null terminator
I wonder how subscripts are handled. If you do
```kotlin
val a = "hello"
for (i in a.indices) {
    println(a[i])
}
```
is it going to scan the UTF-8 representation n times? That would not be cool (it could be avoided by programming differently, I guess, but it would be good to know)
But the fact that `const char *` is mapped to `String` is a strong indicator that the internal representation is a null-terminated UTF-8 string
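
On the indexing question: Kotlin defines `String` in terms of 16-bit `Char` values on every target, so `a[i]` returns a UTF-16 code unit rather than a byte of a UTF-8 buffer. That hints that the `const char *` mapping may involve a conversion at the interop boundary rather than exposing the internal storage, but that part is an assumption here; nothing in this thread confirms it. A small sketch (the `describe` helper is hypothetical) of how the three counts can differ:

```kotlin
// Hypothetical helper returning (UTF-16 code units, UTF-8 bytes, code points).
// It only illustrates API-level semantics; it says nothing definitive about
// how the Native runtime stores strings internally.
fun describe(text: String): Triple<Int, Int, Int> {
    val charCount = text.length                    // UTF-16 code units
    val utf8Bytes = text.encodeToByteArray().size  // bytes after UTF-8 encoding
    var codePoints = 0
    var i = 0
    while (i < text.length) {
        // A high surrogate followed by a low surrogate forms a single code point.
        val isPair = text[i].isHighSurrogate() &&
            i + 1 < text.length && text[i + 1].isLowSurrogate()
        i += if (isPair) 2 else 1
        codePoints++
    }
    return Triple(charCount, utf8Bytes, codePoints)
}
```

For plain ASCII such as `"hello"` all three numbers agree; they diverge as soon as accented characters or emoji appear, which is why `length` cannot stand in for `cbMultiByte`.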

Edoardo Luppi

11/07/2023, 1:42 PM
Sorry, was in a call lol
> The Kotlin runtime will add it for you
But in this context I'm working purely with WinAPIs. What I'm doing is converting the UTF-8 String I've read with Okio to a UTF-16 buffer, which I then convert to ISO-8859-1 with `WideCharToMultiByte`
```kotlin
internal actual fun FileSystem.writeLatin1(path: Path, content: String) {
  write(path, mustCreate = false) {
    val utf16 = content.toUtf16String()
    val latin1 = utf16.toLatin1String()
    write(latin1.buffer) // Just write bytes
  }
}
```
I guess I could just strip away the last NULL byte?

mbonnin

11/07/2023, 1:45 PM
Gosh windows API...
Can't you convert from UTF-8 to ISO-8859 directly?

Edoardo Luppi

11/07/2023, 1:45 PM
Nope, as far as I know you need to go through UTF-16 first, which is the native Windows encoding.

mbonnin

11/07/2023, 1:46 PM
`WideCharToMultiByte` seems to take a `char *` as input though
Side note: there's a bunch of Unicode code points that don't have a representation in ISO-8859-1, right?
So your conversion might have some loss

Edoardo Luppi

11/07/2023, 1:50 PM
> `WideCharToMultiByte` seems to take a `char *` as input though
It's a wide char, so it's not possible to pass in a Kotlin `String`.
> So your conversion might have some loss
All this is for reading .properties files used in a Java context. So I'm reading using UTF-8 > no data loss > manipulate > convert back to ISO > write to file

mbonnin

11/07/2023, 1:54 PM
`lpMultiByteStr` is a
```c
typedef CONST CHAR * LPCCH
```
So it's not a wide char?

Edoardo Luppi

11/07/2023, 1:55 PM
But in the `WideCharToMultiByte` context, `lpMultiByteStr` is the output, so the only value you can pass is a buffer that's going to be filled by the API.

mbonnin

11/07/2023, 1:56 PM
Aaahhhh
I think it's input? Or am I looking at the wrong place?

Edoardo Luppi

11/07/2023, 1:58 PM
You're looking at `MultiByteToWideChar`, which is the step to convert from UTF-8 (multi-byte) to UTF-16 (wide chars)
There is no API to convert directly between multi-byte encodings

mbonnin

11/07/2023, 1:59 PM
Aaah gotcha 👍
TBH, at this point I would write the ISO-8859 encoder myself using okio and `readUtf8CodePoint()`
ISO-8859 is 191 chars, 128 of them should be pretty straightforward
That leaves you with a lookup table of 63 chars
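
A rough idea of what such a hand-written encoder could look like in common code. ISO-8859-1 byte values coincide with Unicode code points U+0000 through U+00FF, so this sketch gets by without a lookup table. It works on a `String` rather than okio's `readUtf8CodePoint()` to stay self-contained; `encodeLatin1` and its `replacement` parameter are hypothetical names, and substituting `?` for unrepresentable characters is an assumption (for .properties files, a `\uXXXX` escape may be the better policy):

```kotlin
// Hypothetical helper: encode a Kotlin String as ISO-8859-1 bytes.
// Characters above U+00FF cannot be represented and are written as `replacement`.
fun encodeLatin1(text: String, replacement: Byte = '?'.code.toByte()): ByteArray {
    val out = ByteArray(text.length) // never more than one byte per Char
    var size = 0
    var i = 0
    while (i < text.length) {
        val c = text[i]
        if (c.code <= 0xFF) {
            // Latin-1 byte value equals the Unicode code point.
            out[size++] = c.code.toByte()
        } else {
            out[size++] = replacement
            // Consume the low surrogate too, so one code point yields one replacement.
            if (c.isHighSurrogate() && i + 1 < text.length && text[i + 1].isLowSurrogate()) i++
        }
        i++
    }
    return out.copyOf(size)
}
```

The resulting `ByteArray` carries no NULL terminator and needs no Windows API, so it could live in `commonMain` and be written out with okio as plain bytes.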

Edoardo Luppi

11/07/2023, 2:01 PM
That's probably a better alternative, you're right. It also avoids dealing with Windows APIs, so it can go into the common source set

mbonnin

11/07/2023, 2:01 PM
And yea, no LPWCHARRSTRCHGRAAA 😂

Edoardo Luppi

11/07/2023, 2:02 PM
Hahahaha my god, that Windows naming convention drives me crazy every time. A typealias that typealiases a typealias for a typealias
💯 1
Btw, in the KMP survey I wrote down that MinGW needs more love. Windows support seems abandoned

mbonnin

11/07/2023, 2:12 PM
Got curious and wrote this:
```kotlin
// Untested (written on macOS); assumed to run inside a memScoped { } block.
val buffer = Buffer().writeUtf8("éèùô")

// Step 1: UTF-8 -> UTF-16
val wideChars = allocArray<WCHARVar>(50)
var converted = MultiByteToWideChar(
    CodePage = CP_UTF8.toUInt(),
    dwFlags = 0u,
    lpMultiByteStr = buffer.readUtf8(),
    cbMultiByte = -1, // -1: treat the input as null-terminated
    lpWideCharStr = wideChars,
    cchWideChar = 50
)

// Step 2: UTF-16 -> ISO-8859-1 (Windows code page 28591)
val iso88591 = allocArray<CHARVar>(50)
converted = WideCharToMultiByte(
    CodePage = 28591.convert(),
    dwFlags = 0u,
    lpWideCharStr = wideChars,
    cchWideChar = converted,
    lpMultiByteStr = iso88591,
    50,   // cbMultiByte
    null, // lpDefaultChar
    null  // lpUsedDefaultChar
)
```
but I can't run it on macOS 🤦‍♂️
Anyways, I'll leave the Windows API alone.
`commonMain` FTW!

Edoardo Luppi

11/07/2023, 2:13 PM
😂 I'll take your suggestion and use the UTF-8 code points
😄 1