Edoardo Luppi
11/07/2023, 1:14 PM
[How do I get the size in bytes of a] String in Native?
For context, I need to pass the size in bytes to MultiByteToWideChar
MultiByteToWideChar(CP_UTF8.toUInt(), 0u, this /* String */, /* Length in bytes */, null, 0)
mbonnin
11/07/2023, 1:18 PM
[Convert the] String to a ByteArray first?
[Does] MultiByteToWideChar take a String as input? (in which case the API is a bit weird)
"Pointer to the character string to convert."
What the hell is a "character string"?
Edoardo Luppi
11/07/2023, 1:21 PM
MultiByteToWideChar serves the purpose of converting the UTF-8 Kotlin String to a UTF-16 buffer. The function requires the UTF-8 String length in bytes
mbonnin
11/07/2023, 1:22 PM
cbMultiByte
"Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails."
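If passing -1 is not an option, one way to obtain the byte count is to encode the string first and take the size of the result. The following is a minimal sketch, assuming the extra allocation done by the standard library's `encodeToByteArray()` (which encodes as UTF-8) is acceptable; `utf8ByteLength` is a hypothetical helper name, not something mentioned in the thread.

```kotlin
// Hypothetical helper: number of bytes the string occupies once encoded as UTF-8.
// It allocates a temporary ByteArray, so treat it as a sketch, not an optimized solution.
fun String.utf8ByteLength(): Int = encodeToByteArray().size

fun main() {
    val ascii = "hello"
    val accented = "\u00E9\u00E8\u00F9\u00F4" // the "éèùô" sample, written with escapes

    // String.length counts UTF-16 code units, which differs from the UTF-8 byte count
    // as soon as the text contains non-ASCII characters.
    println("${ascii.length} chars, ${ascii.utf8ByteLength()} bytes")
    println("${accented.length} chars, ${accented.utf8ByteLength()} bytes")
}
```

The value returned by the helper excludes any null terminator, so it could be passed as cbMultiByte when the terminator should not be part of the conversion.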
Edoardo Luppi
11/07/2023, 1:30 PM
"you can pass -1"
That's probably the fastest way, although I'd have preferred not including the NULL termination. Also I don't take for granted it's NULL terminated. See this for UTF-8/16 in Native https://kotlinlang.slack.com/archives/C3SGXARS6/p1699288840089989?thread_ts=1699264329.174739&cid=C3SGXARS6
mbonnin
11/07/2023, 1:31 PM
"I'd have preferred not including the NULL termination"
I don't think you have the choice? The Kotlin runtime will add it for you?
[...] _In_NLS_string_(cbMultiByte) LPCCH lpMultiByteStr to String because I'm guessing it's all const char * under the hood
"Unlike other pointers, the parameters of type const char* are represented as a Kotlin String."
[...] const char * pointer will be used, which should really always contain a null terminator
val a = "hello"
for (i in a.indices) {
    println(a.get(i))
}
is it going to scan the utf8 representation n times? That would not be cool (could be avoided by programming differently I guess but would be good to know)
[That] const char * is mapped to String is a strong indicator that the internal representation is a null-terminated utf-8 string
Edoardo Luppi
11/07/2023, 1:42 PM
"The Kotlin runtime will add it for you"
But in this context I'm working purely with WinAPIs. What I'm doing is converting the UTF-8 String I've read with Okio to the UTF-16 buffer, which I then convert to ISO-8859-1 with WideCharToMultiByte
internal actual fun FileSystem.writeLatin1(path: Path, content: String) {
    write(path, mustCreate = false) {
        val utf16 = content.toUtf16String()
        val latin1 = utf16.toLatin1String()
        write(latin1.buffer) // Just write bytes
    }
}
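For comparison, every ISO-8859-1 byte corresponds to the Unicode code point with the same value, so the Latin-1 step can also be sketched in plain Kotlin with no WinAPI calls. This is only an illustration under that assumption: `toLatin1Bytes` and the '?' replacement byte are hypothetical choices, not the thread's `toLatin1String()` implementation.

```kotlin
// Hypothetical helper: encode a String as ISO-8859-1 (Latin-1) bytes.
// Code points U+0000..U+00FF map one-to-one to Latin-1 bytes. Any other
// UTF-16 code unit (including each half of a surrogate pair) is replaced,
// so the conversion is lossy for text outside Latin-1.
fun String.toLatin1Bytes(replacement: Byte = '?'.code.toByte()): ByteArray =
    ByteArray(length) { i ->
        val code = this[i].code
        if (code <= 0xFF) code.toByte() else replacement
    }

fun main() {
    // "café €": the euro sign (U+20AC) is not representable in ISO-8859-1
    val bytes = "caf\u00E9 \u20AC".toLatin1Bytes()
    println(bytes.joinToString(" ") { (it.toInt() and 0xFF).toString(16) })
}
```

Because the helper only uses the standard library, it could live in commonMain and be exercised on any target.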
mbonnin
11/07/2023, 1:45 PM
Edoardo Luppi
11/07/2023, 1:45 PM
mbonnin
11/07/2023, 1:46 PM
WideCharToMultiByte seems to take a char * as input though
Edoardo Luppi
11/07/2023, 1:50 PM
"WideCharToMultiByte seems to take a char * as input though"
It's a wide char, so not possible to pass in a Kotlin String.
"So your conversion might have some loss"
All this for reading .properties files used in a Java context. So I'm reading using UTF-8 > No data loss > manipulate > convert back to ISO > write to file
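Java's .properties format represents characters outside ISO-8859-1 as `\uXXXX` escapes, so the "convert back to ISO" step can stay lossless if such characters are escaped before encoding. The sketch below illustrates only that idea; `escapeNonLatin1` is a hypothetical helper, and it deliberately ignores the other escapes the format defines (for example for `=`, `:`, or leading whitespace in keys).

```kotlin
// Hypothetical helper: replace every UTF-16 code unit above U+00FF with a
// \uXXXX escape so the result can be written as ISO-8859-1 without loss.
// Characters outside the BMP come out as two escapes (one per surrogate).
fun String.escapeNonLatin1(): String = buildString {
    for (ch in this@escapeNonLatin1) {
        if (ch.code <= 0xFF) {
            append(ch)
        } else {
            append("\\u")
            append(ch.code.toString(16).uppercase().padStart(4, '0'))
        }
    }
}

fun main() {
    // 'é' fits in Latin-1 and is kept as-is; '€' (U+20AC) does not and is escaped
    println("price=caf\u00E9 5\u20AC".escapeNonLatin1())
}
```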
mbonnin
11/07/2023, 1:54 PM
typedef CONST CHAR * LPCCH
So it's not a wide char?
Edoardo Luppi
11/07/2023, 1:55 PM
[In the] WideCharToMultiByte context, lpMultiByteStr is the output, so the only value you can pass is a buffer that's going to be filled by the API.
mbonnin
11/07/2023, 1:56 PM
Edoardo Luppi
11/07/2023, 1:58 PM
[...] MultiByteToWideChar which is the step to convert from UTF-8 (multi byte) to UTF-16 (wide chars)
mbonnin
11/07/2023, 1:59 PM
[...] readUtf8CodePoint()
Edoardo Luppi
11/07/2023, 2:01 PM
mbonnin
11/07/2023, 2:01 PM
Edoardo Luppi
11/07/2023, 2:02 PM
mbonnin
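11/07/2023, 2:12 PM
val buffer = Buffer().writeUtf8("éèùô")
val wideChars = allocArray<WCHARVar>(50)
// UTF-8 -> UTF-16. With cbMultiByte = -1 the input is treated as null-terminated,
// and the returned count includes the terminating null character.
var converted = MultiByteToWideChar(
    CodePage = CP_UTF8.toUInt(),
    dwFlags = 0u,
    lpMultiByteStr = buffer.readUtf8(),
    cbMultiByte = -1,
    lpWideCharStr = wideChars,
    cchWideChar = 50
)
val iso88591 = allocArray<CHARVar>(50)
// UTF-16 -> ISO-8859-1 (code page 28591)
converted = WideCharToMultiByte(
    CodePage = 28591.convert(),
    dwFlags = 0u,
    lpWideCharStr = wideChars,
    cchWideChar = converted,
    lpMultiByteStr = iso88591,
    50, // cbMultiByte: size of the output buffer in bytes
    null, // lpDefaultChar
    null // lpUsedDefaultChar
)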
but I can't run it on macOS 🤦‍♂️
commonMain FTW!
Edoardo Luppi
11/07/2023, 2:13 PM