kotlinlang #kotlin-native

Edoardo Luppi

11/07/2023, 1:14 PM

Is there a way to get the amount of allocated bytes for a `String` in Native? For context, I need to pass the size in bytes to `MultiByteToWideChar`:


MultiByteToWideChar(CP_UTF8.toUInt(), 0u, this /* String */, /* Length in bytes */, null, 0)

mbonnin

11/07/2023, 1:18 PM

You can encode your `String` to a `ByteArray` first?
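A minimal sketch of that suggestion in common Kotlin: `encodeToByteArray()` encodes the string as UTF-8, so the array size is the byte count `cbMultiByte` expects (no terminator included). `utf8ByteLength` and `utf8ByteLengthNoAlloc` are hypothetical helper names; the second one counts the same bytes without allocating, assuming well-formed UTF-16 input.

```kotlin
// Hypothetical helper: size in bytes of the UTF-8 encoding of `s`.
// Allocates a temporary ByteArray; no NUL terminator is counted.
fun utf8ByteLength(s: String): Int = s.encodeToByteArray().size

// Hypothetical allocation-free variant applying the UTF-8 length rules per code point.
// Assumes well-formed UTF-16: an unpaired surrogate is counted as 3 bytes (U+FFFD),
// which may not match every platform's replacement behaviour.
fun utf8ByteLengthNoAlloc(s: String): Int {
    var bytes = 0
    var i = 0
    while (i < s.length) {
        val c = s[i]
        bytes += when {
            c.code < 0x80 -> 1   // ASCII
            c.code < 0x800 -> 2  // U+0080..U+07FF
            c.isHighSurrogate() && i + 1 < s.length && s[i + 1].isLowSurrogate() -> {
                i++              // skip the low surrogate of the pair
                4                // supplementary code point
            }
            else -> 3            // rest of the BMP
        }
        i++
    }
    return bytes
}
```

For example, `utf8ByteLength("éèùô")` is 8, while `"éèùô".length` is 4.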

mbonnin

11/07/2023, 1:18 PM

Or does `MultiByteToWideChar` take a `String` as input? (In which case the API is a bit weird.)

mbonnin

11/07/2023, 1:19 PM

https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar 🤔

mbonnin

11/07/2023, 1:21 PM

> Pointer to the character string to convert.

What the hell is a "character string"?

Edoardo Luppi

11/07/2023, 1:21 PM

`MultiByteToWideChar` serves the purpose of converting the UTF-8 Kotlin String to a UTF-16 buffer. The function requires the length of the UTF-8 string in bytes.

mbonnin

11/07/2023, 1:22 PM

Are strings UTF-8 in K/N? (On the JVM they are UTF-16.)

mbonnin

11/07/2023, 1:25 PM

https://kotlinlang.org/docs/mapping-strings-from-c.html

> Kotlin/Native uses UTF-8 character encoding by default.

mbonnin

11/07/2023, 1:25 PM

All good then, you can pass -1 to `cbMultiByte`

mbonnin

11/07/2023, 1:25 PM


Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails.

mbonnin

11/07/2023, 1:26 PM

I'm assuming the Kotlin string is null-terminated, but anything else would really be asking for trouble

Edoardo Luppi

11/07/2023, 1:30 PM

> you can pass -1

That's probably the fastest way, although I'd have preferred not including the NULL terminator. Also, I don't take for granted that it's NULL-terminated. See this for UTF-8/16 in Native: https://kotlinlang.slack.com/archives/C3SGXARS6/p1699288840089989?thread_ts=1699264329.174739&cid=C3SGXARS6


mbonnin

11/07/2023, 1:31 PM

> I'd have preferred not including the NULL termination

I don't think you have a choice? The Kotlin runtime will add it for you?

mbonnin

11/07/2023, 1:33 PM

Kotlin maps `_In_NLS_string_(cbMultiByte)LPCCH lpMultiByteStr` to `String` because, I'm guessing, it's all `const char *` under the hood

mbonnin

11/07/2023, 1:34 PM


Unlike other pointers, the parameters of type const char* are represented as a Kotlin String.

mbonnin

11/07/2023, 1:35 PM

This is convenient in your case: just pass the Kotlin String and the underlying `const char *` pointer will be used, which should really always contain a null terminator

mbonnin

11/07/2023, 1:40 PM

I wonder how subscripts are handled. If you do

val a = "hello"
for (i in a.indices) {
    println(a[i])
}

is it going to scan the UTF-8 representation n times? That wouldn't be great (it could be avoided by programming differently, I guess, but it would be good to know)
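As far as I know, Kotlin/Native keeps a `String` in memory as UTF-16 code units, like the JVM does, so `a[i]` is a plain indexed read and the loop does not rescan a UTF-8 buffer. The catch is that indices and `length` count UTF-16 units, not characters. A small common-Kotlin sketch (the `codePointCount` helper is hypothetical) shows the difference:

```kotlin
// Hypothetical helper: counts Unicode code points, treating a valid
// surrogate pair (two Chars) as a single code point.
fun codePointCount(s: String): Int {
    var count = 0
    var i = 0
    while (i < s.length) {
        val isPair = s[i].isHighSurrogate() && i + 1 < s.length && s[i + 1].isLowSurrogate()
        i += if (isPair) 2 else 1
        count++
    }
    return count
}
```

For `"\uD83D\uDE00"` (an emoji outside the BMP), `length` is 2 but `codePointCount` returns 1, so iterating `indices` visits two `Char`s, one per surrogate.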

mbonnin

11/07/2023, 1:42 PM

But the fact that `const char *` is mapped to `String` is a strong indicator that the internal representation is a null-terminated UTF-8 string
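A hedged note on that inference: as far as I know, the `const char *` mapping works by copying. When a `String` is passed to such a parameter, cinterop converts it into a temporary null-terminated UTF-8 buffer (the same bytes `String.cstr` produces) that lives only for the duration of the call, so the mapping says little about the runtime's own representation. The common-Kotlin sketch below mimics that buffer with a hypothetical `nulTerminatedUtf8` helper:

```kotlin
// Hypothetical helper: mimics the temporary C string built for a `const char *`
// argument: the UTF-8 bytes of `s` followed by a single NUL byte.
fun nulTerminatedUtf8(s: String): ByteArray = s.encodeToByteArray() + 0.toByte()
```

`nulTerminatedUtf8("éèùô")` is 9 bytes: 8 for the text plus the terminator. With `cbMultiByte = -1`, `MultiByteToWideChar` processes the input up to and including that NUL, so the count it returns includes the terminating null character; subtracting 1 gives the length without it.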

Edoardo Luppi

11/07/2023, 1:42 PM

Sorry, was in a call lol

> The Kotlin runtime will add it for you

But in this context I'm working purely with WinAPIs. What I'm doing is converting the UTF-8 String I've read with Okio to a UTF-16 buffer, which I then convert to ISO-8859-1 with `WideCharToMultiByte`

Edoardo Luppi

11/07/2023, 1:43 PM


internal actual fun FileSystem.writeLatin1(path: Path, content: String) {
  write(path, mustCreate = false) {
    val utf16 = content.toUtf16String()
    val latin1 = utf16.toLatin1String()
    write(latin1.buffer) // Just write bytes
  }
}

Edoardo Luppi

11/07/2023, 1:44 PM

I guess I could just strip away the last NULL byte?

mbonnin

11/07/2023, 1:45 PM

Gosh, Windows APIs...

mbonnin

11/07/2023, 1:45 PM

Can't you convert from UTF-8 to ISO-8859-1 directly?

Edoardo Luppi

11/07/2023, 1:45 PM

Nope, as far as I know you need to go through UTF-16 first, which is the native Windows encoding.

mbonnin

11/07/2023, 1:46 PM

`WideCharToMultiByte` seems to take a `char *` as input though

mbonnin

11/07/2023, 1:48 PM

Side note: there are a bunch of Unicode code points that don't have a representation in ISO-8859-1, right?

mbonnin

11/07/2023, 1:48 PM

So your conversion might have some loss

Edoardo Luppi

11/07/2023, 1:50 PM

> `WideCharToMultiByte` seems to take a `char *` as input though

The input is a wide-char string, so it's not possible to pass in a Kotlin `String`

> So your conversion might have some loss

All this is for reading .properties files used in a Java context. So I'm reading using UTF-8 > no data loss > manipulate > convert back to ISO-8859-1 > write to file
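Since the target is a `.properties` file read from Java: `java.util.Properties.load(InputStream)` reads ISO-8859-1 and understands `\uXXXX` escapes, so one way to keep the round trip lossless (an assumption about the use case, not something proposed in the thread) is to escape anything outside Latin-1 before encoding. `escapeNonLatin1` is a hypothetical helper; characters outside the BMP become two escapes, one per UTF-16 surrogate, which is also how Java represents them.

```kotlin
// Hypothetical helper: escapes every Char that ISO-8859-1 cannot represent
// (code > 0xFF) as a \uXXXX sequence, a form Properties.load decodes.
// It only handles the encoding problem, not other .properties escaping
// (separators such as '=' or ':', existing backslashes, etc.).
fun escapeNonLatin1(s: String): String = buildString {
    for (c in s) {
        if (c.code <= 0xFF) {
            append(c)
        } else {
            append("\\u")
            append(c.code.toString(16).uppercase().padStart(4, '0'))
        }
    }
}
```

For example, `escapeNonLatin1("é€")` gives `é\u20AC`: the `é` still fits in one Latin-1 byte, while the euro sign survives as an escape instead of being replaced.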

mbonnin

11/07/2023, 1:54 PM

`lpMultiByteStr` is an `LPCCH`:

typedef CONST CHAR * LPCCH

So it's not a wide char?

Edoardo Luppi

11/07/2023, 1:55 PM

But in the `WideCharToMultiByte` context, `lpMultiByteStr` is the output, so the only value you can pass is a buffer that's going to be filled by the API.

mbonnin

11/07/2023, 1:56 PM

Aaahhhh

mbonnin

11/07/2023, 1:57 PM

I think it's input? Or am I looking at the wrong place?

Edoardo Luppi

11/07/2023, 1:58 PM

You're looking at `MultiByteToWideChar`, which is the step that converts from UTF-8 (multi-byte) to UTF-16 (wide chars)

Edoardo Luppi

11/07/2023, 1:59 PM

There is no API to convert directly from one multi-byte encoding to another

mbonnin

11/07/2023, 1:59 PM

Aaah gotcha 👍

mbonnin

11/07/2023, 2:00 PM

TBH, at this point I would write the ISO-8859-1 encoder myself using Okio and `readUtf8CodePoint()`

mbonnin

11/07/2023, 2:01 PM

ISO-8859-1 has 191 characters; 128 of them should be pretty straightforward

mbonnin

11/07/2023, 2:01 PM

leaves you with a lookup table of 63 chars
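A minimal common-Kotlin sketch of that approach. It walks the `String`'s `Char`s directly rather than using Okio's `readUtf8CodePoint()`, so it has no dependencies; `encodeLatin1` and the `?` replacement policy are assumptions for illustration. One simplification worth knowing: ISO-8859-1 byte values coincide with the first 256 Unicode code points, so every code point up to U+00FF maps to the byte with the same value, and a lookup table may not be needed at all.

```kotlin
// Hypothetical helper: encodes `s` as ISO-8859-1 (Latin-1).
// Latin-1 byte values 0x00..0xFF equal Unicode code points U+0000..U+00FF,
// so no lookup table is needed; anything else becomes `replacement`.
fun encodeLatin1(s: String, replacement: Byte = '?'.code.toByte()): ByteArray {
    val out = ByteArray(s.length) // never more bytes than Chars
    var n = 0
    var i = 0
    while (i < s.length) {
        val c = s[i]
        if (c.isHighSurrogate() && i + 1 < s.length && s[i + 1].isLowSurrogate()) {
            // Supplementary code point (two Chars): never representable in Latin-1.
            out[n++] = replacement
            i += 2
        } else {
            out[n++] = if (c.code <= 0xFF) c.code.toByte() else replacement
            i += 1
        }
    }
    return out.copyOf(n)
}
```

In an Okio-based version, the resulting array could be written with `BufferedSink.write(ByteArray)`, the same call `writeLatin1` uses above.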

Edoardo Luppi

11/07/2023, 2:01 PM

That's probably a better alternative, you're right. It also avoids dealing with Windows APIs, so it can live in the common source set

mbonnin

11/07/2023, 2:01 PM

And yea, no LPWCHARRSTRCHGRAAA 😂

Edoardo Luppi

11/07/2023, 2:02 PM

Hahahaha my god, that Windows naming convention drives me crazy every time. A typealias of a typealias of a typealias of a typealias


Edoardo Luppi

11/07/2023, 2:03 PM

Btw, in the KMP survey I wrote that MinGW needs more love. Windows support seems abandoned

mbonnin

11/07/2023, 2:12 PM

Got curious and wrote this:


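import kotlinx.cinterop.*
import okio.Buffer
import platform.windows.*

fun main() = memScoped {
    val buffer = Buffer().writeUtf8("éèùô")

    // UTF-8 -> UTF-16. cbMultiByte = -1 means the input is null-terminated;
    // the returned count then includes the terminator.
    val wideChars = allocArray<WCHARVar>(50)
    val converted = MultiByteToWideChar(
        CodePage = CP_UTF8.toUInt(),
        dwFlags = 0u,
        lpMultiByteStr = buffer.readUtf8(),
        cbMultiByte = -1,
        lpWideCharStr = wideChars,
        cchWideChar = 50
    )

    // UTF-16 -> ISO-8859-1 (code page 28591)
    val iso88591 = allocArray<CHARVar>(50)
    val written = WideCharToMultiByte(
        CodePage = 28591.convert(),
        dwFlags = 0u,
        lpWideCharStr = wideChars,
        cchWideChar = converted,
        lpMultiByteStr = iso88591,
        50,   // cbMultiByte: size of the output buffer
        null, // lpDefaultChar: use the system default for unmappable chars
        null  // lpUsedDefaultChar: don't report whether the default was used
    )
    println("Wrote $written bytes")
}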

but I can't run it on macOS 🤦‍♂️

mbonnin

11/07/2023, 2:12 PM

Anyways, I'll leave the Windows API alone. `commonMain` FTW!

Edoardo Luppi

11/07/2023, 2:13 PM

😂 I'll take your suggestion and use the UTF-8 code points

