# multiplatform
p
Hello guys, I'm using Kotlin Multiplatform to build a WebRTC-based video conference SDK that supports iOS and Linux via Kotlin/Native. I'm following the "single-thread worker task queue" programming model: every incoming API call is scheduled onto the task queue of a single worker thread. On the native platforms (iOS and Linux) I'm using the `Worker` API, and since `Worker` requires all referenced objects to be frozen, all my state is stored in a `HashMap` held in an `AtomicReference`. Whenever I need to update the state, I make a deep copy of the `HashMap`, modify the copy, and then call `state.compareAndSet(state.value, newState)` to store it. This appeared to work fine at first (about half a year ago), but during more testing recently the app randomly aborts with `runtime assert: Must be newly frozen` in `runtime/src/main/cpp/Memory.cpp`. I tried Kotlin 1.3.61 and 1.4-M1, and it happens on both versions. Here is my `Worker` and state code:
```kotlin
import kotlin.native.concurrent.*

actual class WorkerTaskQueue<T> actual constructor(state: T) {
  private val worker = Worker.start()
  private val state = AtomicReference(state.freeze())

  actual fun state(): T = state.value

  actual fun updateState(newState: T): T {
    newState.freeze()
    while (!state.compareAndSet(state.value, newState)) {
    }
    return state.value
  }

  actual fun execute(task: () -> Unit) {
    worker.executeAfter(0, task.freeze())
  }
}
```
The abort is triggered by the `newState.freeze()` call above. I tried to simplify my project, or the scenario, to reproduce this abort, but I failed: I can only reproduce it when an iOS client and a Linux client join the same meeting on a local test server, so I'm not sure it's appropriate to share the full project. I can provide it if necessary, though. Please help me, thanks!
k
Not that it matters, but why this?
```kotlin
while (!state.compareAndSet(state.value, newState)) {
}
```
Is your actual code copying the hash map in the empty block?
Where’s the state being updated from? I guess that’s not important for the particular issue, but if there’s code you’re omitting, it would be good to see.
We have an isolated-state structure you might like: you keep the map mutable, but local to a single worker thread. It’s really hard to understand what else is going on, though, so it’s hard to say.
If you’re changing that map often, though, and there’s a lot of data in it, HashMap may not be a great option.
p
Hi @kpgalligan, thank you so much for your reply! I'll answer your questions one by one.
1. About the while loop: I just want to make sure the set succeeds. The hash map won't be changed during the loop, because I only update state on the worker thread.
2. As I said in point 1, the hash map is only updated on the worker thread. Actually, my current code sets the initial state on the main thread, but that could be removed.
3. The hash map changes very often.
Could you please tell me more about the "state isolated structure"? I've only been using Kotlin and Kotlin Multiplatform for about half a year, and the current HashMap + AtomicReference approach is something I figured out half a year ago right here, in this Slack channel.
k
Well, there are a few points here. I am concerned about the error you’re seeing, as it’s potentially a race condition in memory management, but that’s a different topic. If only one thread is ever setting state in the AtomicReference, a plain set should never fail. `compareAndSet` only exists for when you need to ensure safe concurrent access, but that’s a bigger topic.
`updateState` could look like:
```kotlin
actual fun updateState(newState: T): T {
  state.value = newState.freeze()
  return state.value
}
```
Again, it shouldn’t fail unless you have found some kind of memory race condition, and in that case `compareAndSet` won’t help.
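To make the single-writer point concrete: the loop `while (!state.compareAndSet(state.value, newState)) {}` re-reads `state.value` as the expected value on every iteration, so it behaves like an unconditional set and gives no protection against lost updates. `compareAndSet` only pays off when several threads write and each new value is computed from the exact value it expects to replace, retrying on failure. Below is a minimal sketch of that copy-on-write retry loop, assuming the JVM's `java.util.concurrent.atomic.AtomicReference` so it can run anywhere; on Kotlin/Native 1.3.x the same shape would apply with `kotlin.native.concurrent.AtomicReference`, where the stored map would also have to be frozen. `casUpdate` and `runConcurrentWriters` are hypothetical names used only for illustration.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical helper: copy-on-write update of an immutable map.
// The next map is computed FROM the value we expect to replace, and the
// update is retried if another thread swapped in a different map meanwhile.
// (The JDK's AtomicReference.updateAndGet implements the same idea.)
fun <K, V> AtomicReference<Map<K, V>>.casUpdate(
    transform: (Map<K, V>) -> Map<K, V>
): Map<K, V> {
    var next: Map<K, V>
    do {
        val expected = get()         // snapshot the current map
        next = transform(expected)   // build a modified copy of that snapshot
        // compareAndSet compares by identity: it succeeds only if nobody
        // replaced `expected` since we read it.
    } while (!compareAndSet(expected, next))
    return next
}

// Hypothetical demo: several threads add distinct keys concurrently.
// Thanks to the retry loop no update is lost. Returns the final entry count.
fun runConcurrentWriters(threadCount: Int, writesPerThread: Int): Int {
    val state = AtomicReference<Map<String, Int>>(emptyMap())
    val threads = List(threadCount) { t ->
        Thread {
            repeat(writesPerThread) { i ->
                state.casUpdate { current -> current + ("t$t-$i" to i) }
            }
        }
    }
    threads.forEach { it.start() }
    threads.forEach { it.join() }
    return state.get().size
}
```

With a single writer thread, as in the original design, the `compareAndSet` here could never fail, which is why a plain `state.value = newState.freeze()` is sufficient there.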
I have a library called Stately. It has an older set of collections, including a HashMap implemented on top of AtomicReference, so you don’t need to deep-copy it on each change. Performance isn’t great, but lots of changes to a HashMap that needs a full copy each time wouldn’t be great either.
However, stately-isolate (https://github.com/touchlab/Stately#stately-isolate) could keep a mutable map in isolation. There are more or less performant ways of implementing the same thing for your specific case, but I don’t really understand how the state is actually being updated. We’re currently on 1.3.72, but should be publishing for 1.4 at some point.
I’m very overdue for blog posts about isolated state, but here’s an older one: https://dev.to/touchlab/kotlin-native-isolated-state-50l1
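The isolated-state idea can be sketched roughly like this: a plain mutable `HashMap` that only one dedicated thread ever touches, with every read and write submitted to that thread as a task. This is a JVM analogy built on a single-thread `ExecutorService`, not Stately's API; on Kotlin/Native the dedicated thread would be a `Worker`, and stately-isolate works in the same spirit. `ConfinedMap`, `withMap`, and `fillFromThreads` are hypothetical names.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical helper: the mutable map lives on one dedicated thread and is
// only ever touched by tasks running there, so it needs no deep copies,
// no freezing, and no compare-and-set.
class ConfinedMap<K, V> : AutoCloseable {
    private val owner: ExecutorService = Executors.newSingleThreadExecutor()
    private val map = HashMap<K, V>() // confined: read/written only on `owner`

    // Runs `block` on the owning thread and waits for its result.
    // `block` must not let the map reference escape, or confinement breaks.
    fun <R> withMap(block: (MutableMap<K, V>) -> R): R =
        owner.submit(Callable<R> { block(map) }).get()

    override fun close() = owner.shutdown()
}

// Hypothetical demo: several threads write through the confined map; all
// writes are serialized on the owning thread. Returns the final entry count.
fun fillFromThreads(threadCount: Int, writesPerThread: Int): Int {
    val confined = ConfinedMap<String, Int>()
    try {
        val threads = List(threadCount) { t ->
            Thread {
                repeat(writesPerThread) { i ->
                    confined.withMap { it.put("t$t-$i", i) }
                }
            }
        }
        threads.forEach { it.start() }
        threads.forEach { it.join() }
        return confined.withMap { it.size }
    } finally {
        confined.close()
    }
}
```

The trade-off is that every access becomes a cross-thread hop plus a blocking wait, but an update no longer copies the whole map, which matters when the map changes very often.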
p
Okay, I'll try Stately later and come back with the results, thank you! BTW, the abort happens when I call `newState.freeze()`, so I think removing `compareAndSet` wouldn't help, right?
Hi @kpgalligan, I read your blog posts, watched your KotlinConf 2019 talk, and also checked out stately-isolate; they're awesome! I did a quick-and-dirty test: I replaced my `WorkerTaskQueue` with `IsolateState`, removed all the `freeze` calls on my state, and called `ensureNeverFrozen()` in the state's init function. The abort still happens:
```
/Users/teamcity1/teamcity_work/4d622a065c544371/runtime/src/main/cpp/Memory.cpp:2500: runtime assert: Must be newly frozen
(lldb) bt
* thread #2, stop reason = signal SIGABRT
    frame #0: 0x00000001a2d62df0 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x00000001a2c82930 libsystem_pthread.dylib`pthread_kill + 228
    frame #2: 0x00000001a2c10ba4 libsystem_c.dylib`abort + 104
    frame #3: 0x0000000100fcc634 AvConf`konan::abort() + 12
    frame #4: 0x0000000100fc4d74 AvConf`RuntimeAssertFailed(char const*, char const*) + 168
    frame #5: 0x0000000100fdbbd0 AvConf`FreezeSubgraph + 16404
    frame #6: 0x0000000100ff98cc AvConf`Kotlin_Worker_freezeInternal + 32
  * frame #7: 0x0000000100d53764 AvConf`kfun:kotlin.native.concurrent.freeze@T.(<this>=0x000000028111e348){0<kotlin.Any?>}Generic at Freezing.kt:33:5
    frame #8: 0x0000000100defe70 AvConf`kfun:co.touchlab.stately.isolate.BackgroundStateRunner.stateRun$lambda-1#internal(it=0x000000028111cfc8) at BackgroundStateRunner.kt:13:26
    frame #9: 0x0000000100ff6d8c AvConf`Worker::processQueueElement(bool) + 2220
    frame #10: 0x0000000100ff7f1c AvConf`(anonymous namespace)::workerRoutine(void*) + 72
    frame #11: 0x00000001a2c818fc libsystem_pthread.dylib`_pthread_start + 168
```
BTW, I'm using 1.3.71 now. Is this some race condition inside KN itself?
Additional information: I only get objects out of `IsolateState` in two places in my code, and the abort doesn't seem to happen in those two places, but I'm not 100% sure...
k
It’s really hard to talk about this in any detail without some kind of runnable sample. I’ve only seen snapshots of what you’re doing. Ideally, if I could run something that crashes locally, it would be a lot easier to discuss.
p
Hi @kpgalligan, thank you for your reply. Yeah, I understand. I tried to prepare a simple project that reproduces it, but I failed. I could share the whole project, but it's a little complicated, so I'm not sure if that's appropriate.
k
If it’s not a secret project and I can build it without crazy setup, I’d take a look. You have a very unique error. If nothing else, I’ve learned how to narrow down the cause of these kinds of things.
p
That sounds great! Thank you so much! I'll try to prepare a step-by-step doc for it.
Hi @kpgalligan, I tried to push my project to GitHub, but I don't have enough LFS storage quota, so I created a snapshot of the debug branch and sent it through Slack here. Please check `docs/kn_freeze_abort_debug.md` for the setup guide. Thank you so much!
Hi @kpgalligan, have you had a chance to try the project I sent yesterday? I really appreciate your help, thank you in advance!
k
Yesterday got busy. Just took a look. Some things are missing.
p
Thank you, Kevin! Are you able to reproduce it when running the iOSExample project?
I'll check the missing part right now
Please ignore the other platforms; I only changed code for iOS, so Android/Windows still have compile issues...
k
I ran it on a phone. Still running rebuild, so haven’t run it again, but it seems like I haven’t run into the issue. I assume I need to run it a few times?
p
Are you seeing the camera preview and the "retrying" toast? If so, it should happen within several minutes; at least, I have reproduced it within several minutes many times.
k
I see nothing, so I guess it’s not working as expected