# kotlin-native
r
Hi everyone, I’m trying to use `platform.linux.gettext` in Kotlin/Native, but I always get a corrupted string back: the first 8 characters are replaced with unexpected data. Has anyone successfully used `gettext`, or noticed any issues with it? I’ve opened an issue to track this problem: KT-73948: Corrupted String Returned by gettext in Kotlin/Native. Any insights would be greatly appreciated!
a
I can take an outsider look really quick, but it'd be sweet if you set the affected Kotlin version on the issue ^^
Also, could you try replacing toKString() with toKStringFromUtf8() and see if that makes a difference for you?
r
sure, I didn't notice I could set it, it's now set to 2.1.0 👍 regarding using `toKStringFromUtf8()`, it doesn't fix it unfortunately, I will update the ticket with this info
💯 1
a
I will take a closer look then and see if i can find something. Thank you ^^
r
thank you
a
I didn't try to reproduce this yet, but 8 chars specifically sounds very suspicious to me now that I think about it. That's the size of a pointer on a 64-bit target, so maybe this really is an interop issue where the stack frame gets messed up somehow. I will try to recreate this when I'm on my PC again later today. Could you try to use a regular for-loop to iterate over the C string using the subscript-operator overload provided by CPointer and then dump the individual chars as hex? That way we can at least tell whether it is an interop issue or an issue with K/N's string conversion function(s).
But that's weird too, the toKString() function uses a simple loop as well afaik.. I'll keep you updated on my findings if there's any ^^
🙏 1
r
hey I'm not sure if I understood what you suggested to do but, if I did, this is the output:
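import kotlinx.cinterop.*
import platform.linux.*
import platform.posix.*

@OptIn(ExperimentalForeignApi::class)
fun main() {
    // Passing null only queries the current locale; pass "" to adopt the environment's locale
    setlocale(LC_ALL, null)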
    // You might need to call bindtextdomain if you had translations, but here it's just demonstration
    bindtextdomain("messages", ".")
    textdomain("messages")

    val rawString = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    val ptr = gettext(rawString)

    // Dump the raw bytes as hex to see what we actually got from gettext
    if (ptr != null) {
        print("Raw bytes from gettext: ")
        var i = 0
        while (true) {
            val c = ptr.get(i)
            if (c.toInt() == 0) break
            // Print each character as a hex byte
            print("\\x${c.toUByte().toString(16).uppercase()}")
            i++
        }
        println()
    }

    val translated = ptr?.toKString() ?: rawString
    println("gettext result = \"$translated\"")
}
output:
> Task :samples:playground:runDebugExecutableLinuxX64
Raw bytes from gettext: \xDC\xC\xDF\x6E\x87\x7A\x10\xC5\x73\x75\x6D\x20\x64\x6F\x6C\x6F\x72\x20\x73\x69\x74\x20\x61\x6D\x65\x74\x2C\x20\x63\x6F\x6E\x73\x65\x63\x74\x65\x74\x75\x72\x20\x61\x64\x69\x70\x69\x73\x63\x69\x6E\x67\x20\x65\x6C\x69\x74\x2E
gettext result = "��n�z�sum dolor sit amet, consectetur adipiscing elit."
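To pin down exactly which bytes differ, the dump can be compared against the original message byte by byte. This is only a sketch in plain Kotlin (no cinterop); the helper names `parseHexDump` and `mismatchedIndices` are made up for illustration, and only the first 16 bytes of the dump above are used.

```kotlin
// Parse a dump such as \xDC\xC\xDF into bytes; the log above does not zero-pad hex digits.
fun parseHexDump(dump: String): ByteArray =
    dump.split("\\x")
        .filter { it.isNotEmpty() }
        .map { it.toInt(16).toByte() }
        .toByteArray()

// Offsets at which the two arrays disagree, compared over the shorter length.
fun mismatchedIndices(expected: ByteArray, actual: ByteArray): List<Int> =
    (0 until minOf(expected.size, actual.size)).filter { expected[it] != actual[it] }

fun main() {
    val original = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    // First 16 bytes of the dump printed above (raw string, so \x stays literal).
    val dump = """\xDC\xC\xDF\x6E\x87\x7A\x10\xC5\x73\x75\x6D\x20\x64\x6F\x6C\x6F"""
    val diff = mismatchedIndices(original.encodeToByteArray(), parseHexDump(dump))
    println("Corrupted byte offsets: $diff") // prints: Corrupted byte offsets: [0, 1, 2, 3, 4, 5, 6, 7]
}
```

Only offsets 0 through 7 differ; from offset 8 onward the bytes match the input message, so the returned pointer does still point into the original text.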
a
That's exactly what I wanted you to do ^^ Thank you. This is so weird, neither endianness arrangement of the first 8 chars looks like a valid address to me, so it's not a pointer on the stack. It's something else.
r
hey @ephemient do you have any suggestion for this issue? Do you think it's a real bug or am I doing something wrong?
e
$ ./gradlew linkLinuxX64DebugExecutable
$ gdb build/bin/linuxX64/debugExecutable/gettext.kexe
(gdb) break dcgettext
Breakpoint 1 at 0x2f1030
(gdb) run
Thread 1 "gettext.kexe" hit Breakpoint 1, __GI___dcgettext (domainname=0x0, msgid=0x353988 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", category=5) at ./intl/dcgettext.c:47
(gdb) finish
Value returned is $1 = 0x353988 "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
(gdb) watch *0x353988
(gdb) continue
Thread 1 "gettext.kexe" hit Hardware watchpoint 2: *0x353988

Old value = 1701998412
New value = 1463964853
tcache_put (tc_idx=3, chunk=0x353970) at ./malloc/malloc.c:3177
(gdb) where
#0  tcache_put (tc_idx=3, chunk=0x353970) at ./malloc/malloc.c:3177
#1  _int_free (av=0x7ffff7e25c60 <main_arena>, p=0x353970, have_lock=have_lock@entry=0) at ./malloc/malloc.c:4477
#2  0x00007ffff7cebf1f in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3385
#3  0x0000000000260c0a in kfun:kotlinx.cinterop.nativeMemUtils#freeRaw(kotlin.native.internal.NativePtr){} (_this=0x30d530 <__unnamed_2033>, mem=0x353980)
    at /mnt/agent/work/b5c630f73501b353/kotlin/kotlin-native/Interop/Runtime/src/native/kotlin/kotlinx/cinterop/NativeMem.kt:127
#4  0x0000000000260a3c in kfun:kotlinx.cinterop.nativeMemUtils#free(kotlin.native.internal.NativePtr){} (_this=0x30d530 <__unnamed_2033>, mem=0x353980)
    at /mnt/agent/work/b5c630f73501b353/kotlin/kotlin-native/Interop/Runtime/src/native/kotlin/kotlinx/cinterop/NativeMem.kt:115
#5  0x000000000025e0b5 in kfun:kotlinx.cinterop.nativeHeap#free(kotlin.native.internal.NativePtr){} (_this=0x30d3c8 <__unnamed_1994>, mem=0x353980)
    at /mnt/agent/work/b5c630f73501b353/kotlin/kotlin-native/Interop/Runtime/src/main/kotlin/kotlinx/cinterop/Utils.kt:33
#6  0x00000000002afa79 in kfun:kotlinx.cinterop.NativeFreeablePlacement#free(kotlin.native.internal.NativePtr){}-trampoline ()
    at /mnt/agent/work/b5c630f73501b353/kotlin/kotlin-native/Interop/Runtime/src/main/kotlin/kotlinx/cinterop/Utils.kt:21
#7  0x000000000025df6c in kfun:kotlinx.cinterop#free__at__kotlinx.cinterop.NativeFreeablePlacement(kotlinx.cinterop.NativePointed){} (_this=0x30d3c8 <__unnamed_1994>, pointed=0x353980)
    at /mnt/agent/work/b5c630f73501b353/kotlin/kotlin-native/Interop/Runtime/src/main/kotlin/kotlinx/cinterop/Utils.kt:27
#8  0x000000000025e9d0 in kfun:kotlinx.cinterop.ArenaBase#clearImpl(){} (_this=0x7ffff6bcc0a0)
    at /mnt/agent/work/b5c630f73501b353/kotlin/kotlin-native/Interop/Runtime/src/main/kotlin/kotlinx/cinterop/Utils.kt:94
so it's getting freed at the end of the arena
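As a sanity check on the watchpoint, the "Old value" can be decoded as four little-endian bytes, assuming the x86-64 byte order of the linuxX64 target. This is a rough sketch in plain Kotlin; the helper name `littleEndianAscii` is invented for illustration.

```kotlin
// Render a 32-bit value as its four little-endian bytes, showing printable ASCII
// as characters and everything else as a \xNN escape.
fun littleEndianAscii(value: Int): String =
    (0 until 4).joinToString("") { i ->
        val b = (value ushr (8 * i)) and 0xFF
        if (b in 0x20..0x7E) Char(b).toString()
        else "\\x" + b.toString(16).uppercase().padStart(2, '0')
    }

fun main() {
    // "Old value" reported by the gdb watchpoint above.
    println(littleEndianAscii(1701998412)) // prints: Lore
}
```

So the watched word held the start of "Lorem ipsum…" until `free` ran, which fits the backtrace: the bytes are overwritten by the allocator, not by `gettext`.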
what I'm pretty sure is happening:
• Kotlin allocates a native C string representing `rawString` for the duration of the `gettext` call
• `gettext` doesn't find a match in its catalog, so it returns the input string
• Kotlin frees the temporary C string
• now you have a dangling pointer
I think the `gettext` APIs cannot be safe to use with automatic string conversion due to this. if you make your own bindings with `noStringConversion` then you'd have control over when it gets freed
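For reference, a custom binding along these lines might use a cinterop definition file like the sketch below. The file name, the package name `example.gettext`, and the choice of functions are assumptions for illustration; `headers`, `package` and `noStringConversion` are the standard .def properties.

```
# gettext.def (hypothetical file and package names)
headers = libintl.h
package = example.gettext
# Take const char* parameters as raw C pointers instead of Kotlin Strings
noStringConversion = gettext dgettext dcgettext
```

With such a definition the listed functions take `CPointer<ByteVar>` arguments, so the caller decides how long the input buffer lives, for example by allocating it inside a `memScoped` block that outlives every use of the returned pointer.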
a
Cool that you figured this out! 😄 I think it would be great to have control over which CInterop functions get marshalled and which do not based on some filter in the .def file for this exact reason
e
there is control, with `noStringConversion`. but if you're using `platform.linux` then the decision has already been made for you
a
okay fair, true.
r
@ephemient Thank you for the suggestion! Since I’m creating bindings for GLib, I applied your approach to the GLib wrappers for gettext (`noStringConversion = g_dcgettext g_dgettext g_dngettext g_dpgettext`), and it seems to work fine:
fun main() = memScoped {
    // Passing null only queries the current locale; pass "" to adopt the environment's locale
    setlocale(LC_ALL, null)

    // Bind text domain to current directory
    bindtextdomain("messages", ".")
    textdomain("messages")

    // Prepare your raw string in a stable C buffer
    val rawString = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    val cString = rawString.cstr.ptr

    // Also prepare the domain name as a C string
    val domainName = "messages".cstr.ptr

    // Call dgettext with noStringConversion
    val resultPtr = g_dgettext(domainName, cString)

    // Convert the raw pointer to a Kotlin String
    val translated = resultPtr?.toKString() ?: rawString

    println("dgettext result = \"$translated\"")
}
However, I’m wondering if the current behavior of `platform.linux.gettext` is still considered a bug. As it stands, it provides a broken binding for this very popular library. Should this be addressed in Kotlin/Native directly?
e
yes I think it's bad that Kotlin ships with a binding that's so unsafe to use. I'm not sure if there's a way to fix it though, there are probably other unsafe APIs and the surface is too large to inspect them all…
(btw you can create defs for the same functions, they won't conflict. I'd either use glib wrappers only, or the underlying functions, not a mix. it works out on linux but just in case there's platforms where the glib wrappers aren't just simple wrappers)
👍 1