# webassembly
b
Hi. I am trying to move some image processing related code from js to wasm to speed things up. While the calculations themselves do run considerably faster indeed, passing the input/output-arrays between js and wasm is taking more time than I think it should. This is why I think it shouldn't:
```kotlin
@JsExport
fun fillSlow(n: Int): Int32Array {
    val array = Int32Array(n)
    for (i in 0..<n) {
        array[i] = i
    }
    return array
}
```
Calling the wasm-function `fillSlow` from js takes 4-5 times as long as running the function in js directly. My guess is that's because to write it makes `n` calls back into the imported js-function `'org.khronos.webgl.setMethodImplForInt32Array' : (obj, index, value) => { obj[index] = value; }`, and the wasm->js overhead is that expensive. Fair enough:
```kotlin
@JsExport
fun fillFast(n: Int): Int32Array = withScopedMemoryAllocator { allocator ->
    val p = allocator.allocate(n * 4)
    var addr = p.address.toInt()
    for (i in 0..<n) {
        p.storeInt(i)
        addr += 4
    }
    importArrayFromMem(p.address.toInt(), n)
}

fun importArrayFromMem(address: Int, length: Int): Int32Array =
    js("new Int32Array(wasmExports.memory.buffer, address, length)")
```
Calling the wasm-function `fillFast` from js is sufficiently fast, in fact 2-3x faster than even running `fillSlow` in js directly. It does all the operations that I think should be required to do the same work: compute an offset (`addr += 4`) and store a value (`p.storeInt`). It is obviously not producing the right result however, as it keeps writing to the same address, i.e. to the beginning of the array. Unfortunately, changing this to `(p+addr).storeInt(i)` immediately makes it extremely slow, 2-3x slower than running `fillSlow` in wasm and over 10x slower than running `fillSlow` in js directly. The main reason for the slowdown appears to be the creation of all the `Pointer`-objects that are required in order to call `storeInt`. Even if I remove the store/add-instructions and only create a new pointer in every iteration, the function remains this slow. Is there perhaps another way that allows writes to memory without having to create all those pointer-objects? Thanks 🙂 (I test on kotlin multiplatform 1.9.24 and Chrome)
s
Hi. `(p+addr).storeInt(i)` should be fast and should not create any Pointer objects (it is a `value` class). Are you running a “production” or a “development” build?
b
Hi. Yes you're right there. I'm running a production build (wasmJsBrowserProductionLibraryDistribution)
it may also be related to `UInt` somehow: if I change the loop to
```kotlin
var addr = p.address
for (i in 0..<n) {
    addr += 4u
}
```
it's still way slower than when incrementing the Int
s
Oh, this does sound like an issue with value classes performance. Could you share some of your findings in a new ticket: https://kotl.in/issue? Unfortunately we don’t have an API for accessing memory without a Pointer. But you can use an unstable internal API to test the hypothesis, like this:
```kotlin
@Suppress("INVISIBLE_MEMBER", "INVISIBLE_REFERENCE")
@kotlin.wasm.internal.WasmOp(kotlin.wasm.internal.WasmOp.I32_STORE)
public fun myStore(addr: Int, value: Int): Unit =
    error("intrinsic")
```
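A sketch of how the intrinsic above might be wired into the fill loop, so that each iteration is just an `Int` add and a raw store with no `Pointer` arithmetic. `fillWithIntrinsic` is a hypothetical name, `importArrayFromMem` is the helper from the first message, and the `WasmOp` annotation is an internal, unstable API that may change or disappear between compiler versions.

```kotlin
import kotlin.wasm.unsafe.UnsafeWasmMemoryApi
import kotlin.wasm.unsafe.withScopedMemoryAllocator
import org.khronos.webgl.Int32Array

// Internal, unstable intrinsic as suggested in the thread.
// The body is never executed; the compiler replaces the call with an i32.store.
@Suppress("INVISIBLE_MEMBER", "INVISIBLE_REFERENCE")
@kotlin.wasm.internal.WasmOp(kotlin.wasm.internal.WasmOp.I32_STORE)
fun myStore(addr: Int, value: Int): Unit =
    error("intrinsic")

// Helper from the first message; it assumes `wasmExports` is visible
// to the generated JS glue code.
fun importArrayFromMem(address: Int, length: Int): Int32Array =
    js("new Int32Array(wasmExports.memory.buffer, address, length)")

// Hypothetical name; for testing the value-class hypothesis only.
@OptIn(UnsafeWasmMemoryApi::class)
@JsExport
fun fillWithIntrinsic(n: Int): Int32Array = withScopedMemoryAllocator { allocator ->
    val base = allocator.allocate(n * 4).address.toInt() // 4 bytes per Int
    var addr = base
    for (i in 0..<n) {
        myStore(addr, i) // raw store at the absolute linear-memory address
        addr += 4
    }
    importArrayFromMem(base, n)
}
```

Because this bypasses the public `Pointer` API, it is only suitable for confirming where the overhead comes from, not as a long-term solution.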
b
ah great, i saw those internal op-functions but didn't know how to access them. Yes I'll try that and create a ticket, thanks!
cool, that does work fast indeed!
I created https://youtrack.jetbrains.com/issue/KT-68329/Performance-issue-with-value-classes. Thanks again for your fast response 🙂