I have got `Zipline` to implement a plugin system....
# squarelibraries
s
I have got `Zipline` to implement a plugin system.
• Most of the plugins are working, but one is making my app crash without any helpful information: no catchable exception is thrown, and even the application-level `UncaughtExceptionHandler` isn't invoked.
• The app just crashes outright when loading that plugin. Using a debugger I can follow it as far as `ZiplineLoader.load`, then it just closes with the following in logcat. I'd appreciate it if you could help me make sense of what's going on and point me in the right direction. Full error trace in thread 🧵
```
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xa9bb6ffed65f03c6 in tid 1400 (DefaultDispatch), pid 1341 (ndbound.preview)
2023-08-07 23:52:43.486  1341-1393  EGL_emulation           in.shabinder.soundbound.preview D app_time_stats: avg=1101.57ms min=4.20ms max=35475.07ms count=33
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A Cmdline: in.shabinder.soundbound.preview
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A  pid: 1341, tid: 1400, name: DefaultDispatch  >>> in.shabinder.soundbound.preview <<<
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #00 pc 000000000005d6f0  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so (BuildId: 75be7cfbfc363fdc714c69f104ad8a92d2b77b84)
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #01 pc 0000000000055500  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #02 pc 000000000005d1a4  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #03 pc 00000000000562fc  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #04 pc 00000000000563b8  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #05 pc 00000000000563b8  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #06 pc 000000000005622c  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #07 pc 00000000000563b8  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #08 pc 000000000005622c  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #09 pc 000000000005622c  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #10 pc 000000000005622c  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #11 pc 00000000000563b8  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #12 pc 000000000005622c  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #13 pc 00000000000563b8  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #14 pc 00000000000563b8  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #15 pc 00000000000563b8  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #16 pc 000000000005d1a4  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #17 pc 00000000000562fc  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #18 pc 00000000000563b8  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #19 pc 000000000004f818  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #20 pc 000000000005e544  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #21 pc 00000000000403a4  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so
2023-08-07 23:52:43.595  1472-1472  DEBUG crash_dump64 A #22 pc 0000000000042c50  /data/app/~~rjydBkrqsHT23TVoRZYBgQ==/in.shabinder.soundbound.preview-kt6F8SHLSTrrNg_RqPwgIw==/lib/arm64/libquickjs.so (Java_app_cash_zipline_QuickJs_execute+24)
```
l
A SIGSEGV is a low-level error (not in the JVM or JS runtime, but in C or equivalent). That's why you can't catch it. A SIGSEGV means that a pointer to invalid memory was dereferenced. It could be a NULL pointer, a pointer that was freed some time ago, or random data being interpreted as a pointer. Are you binding to or otherwise calling C code?
s
Zipline runs on QuickJS, but I don't have any low-level code in my plugins. Thanks for the info about the crash as well.
Weird, the crashes seem to happen when I use `supervisorScope {}` and `async {}` in my `ZiplineService`s.
Am I missing something? Is this a known limitation, or should I report it in GitHub issues? PS: yup, can confirm, using either `async {}` / `launch {}` results in said crash.
PS2: using `GlobalScope` also leads to crashes...
j
Likely a stack overflow
The main place we’ve caused ourselves grief in our own use is stack overflows, either due to legitimately too-small stacks
or due to infinite recursion
s
Sorry, I didn't get you. So this is a known limitation? We can't do `async` / `launch` concurrency from Zipline services because it can't handle the continuations on the stack on the QuickJS side?
j
Coroutines work fine
You can’t use multiple threads
Your Zipline dispatcher needs to use exactly one thread, the same thread you interact with it on
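A minimal sketch of that rule, using only the JDK rather than Zipline or kotlinx.coroutines. `ThreadConfinedEngine` is a hypothetical stand-in for a non-thread-safe engine such as a QuickJS instance, and `EngineHost` (also hypothetical) both creates it on and routes every call through a single thread named "Zipline". Unlike the real engine, which may crash hard with a SIGSEGV like the one in the log above, the stand-in fails fast with an `IllegalStateException` when touched from the wrong thread.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical stand-in for a non-thread-safe engine. It remembers the thread
// that created it and refuses calls from any other thread.
class ThreadConfinedEngine {
    private val owner: Thread = Thread.currentThread()
    private var calls = 0

    fun call(name: String): String {
        check(Thread.currentThread() === owner) {
            "engine created on ${owner.name} but called from ${Thread.currentThread().name}"
        }
        calls++
        return "$name#$calls"
    }
}

// Owns exactly one thread; the engine is created on it and only ever used on it.
class EngineHost : AutoCloseable {
    private val executor: ExecutorService =
        Executors.newSingleThreadExecutor { r -> Thread(r, "Zipline") }

    // Construct the engine *on* the "Zipline" thread so that thread owns it.
    private val engine: ThreadConfinedEngine =
        executor.submit(Callable { ThreadConfinedEngine() }).get()

    // Every call hops onto the "Zipline" thread and blocks for the result.
    fun call(name: String): String =
        executor.submit(Callable { engine.call(name) }).get()

    override fun close() = executor.shutdown()
}
```

Creating the engine on the same executor matters as much as calling it there; creating it on one thread and using it on another is exactly the situation described in this thread.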
s
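Suspend functions do, yes. Is this multi-threaded?
```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.supervisorScope

suspend fun loadAll() = supervisorScope {
    listOf(
        async { /* call plugin A */ },
        async { /* call plugin B */ },
    ).awaitAll()
}
```
I suppose not if I initialize it like below?
```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.asCoroutineDispatcher
private val ziplineExecutorService = Executors.newSingleThreadExecutor { Thread(it, "Zipline") }
private val ziplineDispatcher = ziplineExecutorService.asCoroutineDispatcher()
```
But even with the above setup I can't use `async` within `supervisorScope`/`coroutineScope` in a suspend fun in a `ZiplineService`.
Let me know if I'm unclear; I can share more details on my current implementation.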
j
That looks good
The only host (ie. JVM) thread that can ever interact with your Zipline instance is that one
If you’re creating your Zipline instance on another thread, you’re going to have a bad time
You can use withContext or something to run on that dispatcher when you talk to Zipline
s
To clarify: whenever I need to run anything from my Zipline service, I first need to use `withContext` on the host side to move that operation onto this dispatcher. So I shouldn't pass my Zipline services around; instead I should use a wrapper that runs everything by switching context to the above dispatcher and only then interacts with any Zipline service I've taken from the above loader, yes?
j
Sure
Zipline won’t do thread hopping for you
But it’ll crash hard if you use two threads with it simultaneously
There’s lots of different ways to accomplish this
It’s similar to Android being strict about views on the main thread
s
Understood, thanks for the details. Also, since Zipline is already aware of all the functions and details of my services, down the line it could handle this for us, maybe as opt-in behaviour? One last request:
> There’s lots of different ways to accomplish this.
Can you name the widely used ones? I'll read up on them before trying to implement this, and learn something too.
j
It’s unlikely we’ll handle it automatically for you… when you do your own thread hopping you can batch up hops for best performance
If we did it automatically it’d be slow, because we’d need to hop for each call
As for strategies, my best recommendation is to move every class that touches a Zipline API to be given the dispatcher and to be responsible for making calls on that dispatcher
vs. making it automatic
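A sketch of that strategy, assuming a JDK single-thread executor in place of the coroutine dispatcher; in coroutine code, `withContext(ziplineDispatcher) { ... }` would play the role of `onZiplineThread`. `Plugin` and `PluginRepository` are hypothetical names, not Zipline APIs. The repository is handed the executor, owns every hop onto it, and shows the batching point: one hop can cover several plugin calls instead of one hop per call.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical plugin API that must only be touched on the Zipline thread.
fun interface Plugin {
    fun search(query: String): List<String>
}

// Given the single-threaded executor; responsible for every hop onto it,
// so callers never touch Plugin instances directly.
class PluginRepository(
    private val ziplineExecutor: ExecutorService,
    private val plugins: List<Plugin>,
) {
    // Counts thread hops so the two strategies can be compared.
    val hops = AtomicInteger(0)

    private fun <T> onZiplineThread(block: () -> T): T {
        hops.incrementAndGet()
        return ziplineExecutor.submit(Callable { block() }).get()
    }

    // One hop per plugin call: simple, but pays the thread switch N times.
    fun searchAllPerCall(query: String): List<String> =
        plugins.flatMap { plugin -> onZiplineThread { plugin.search(query) } }

    // One hop for the whole batch: all plugin calls run back to back
    // on the Zipline thread.
    fun searchAllBatched(query: String): List<String> =
        onZiplineThread { plugins.flatMap { it.search(query) } }
}
```

Because callers only ever see `PluginRepository`, no other class needs to know which thread Zipline lives on.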
Another strategy is to just not have multiple threads in your program
You could use the main thread for Zipline too
The only reason to have multiple threads is so you can have concurrency, and you don’t necessarily need that
s
I understand your point of view, but in my use case all of my Zipline services are network- and data-related; I've created an HttpClient service pointing to the host's Ktor to achieve that. So
> move every class that touches a Zipline API to be given the dispatcher
this looks better, at least for my use case: I can create a repository that uses all the plugins and schedules loads on said dispatcher. Thanks again for all this info; it looks good, I'll try to implement it and see where I land. Feel free to share any other recommendations or gotchas I should keep in mind.
Got it working 🙌, incorrect threading was the issue. Already loving Zipline: with Wasm support there are so many possibilities for using libraries from other languages, and even with JS, if and when external Node modules get supported, that unlocks great possibilities. Thanks for sharing all this info, and for Zipline.
j
Yay! Very glad to hear it’s working okay. Please report any bugs you find!
s
https://github.com/cashapp/zipline/issues/1099 👀 Seeing these crashes on Crashlytics.
https://github.com/cashapp/zipline/issues/1163 I'm having a hard time debugging issues with some of my services. Any help is appreciated.
j
This looks like a kotlinx.serialization problem?
s
https://kotlinlang.slack.com/archives/C5HT9AL7Q/p1699451378061239?thread_ts=1699394797.750469&cid=C5HT9AL7Q There's no reason for it to be: if I run the same function directly on the host without Zipline, it works correctly. We're now discussing this in two threads 🫠, sorry.
The function is wrapped in `runCatching` in Zipline, and the host also wraps the call in a try/catch. The exception isn't caught by either and crashes the app, although, as mentioned, the logs show the whole function ran.
j
I’m drawn to this part of the stacktrace:
```
at captureStack (runtime/coreRuntime.kt:12)
	at ClassCastException_init_$Create$ (kotlin-kotlin-stdlib-js-ir.js)
	at THROW_CCE (runtime/hacks.kt)
	at <anonymous> (soundbound-extensions-lib.js)
	at <anonymous> (opt/buildAgent/work/b2fef8360e1bcf3d/formats/json/commonMain/src/kotlinx/serialization/json/internal/Polymorphic.kt:20)
```
kotlinx.serialization is calling some of your code (don’t know what), and that’s throwing a `ClassCastException`.
Is it possible your `Json` instance has a different configuration in guest vs. host code?
s
Nope, it's stored in a module common to both.
j
darn
That’s a good design BTW!
s
Code snippet: this is the function, and I can see its last log line in logcat, so I don't think there's any code left that could crash. I strongly suspect this is coming from some kind of Zipline bridge:
```
app.cash.zipline.ZiplineException: CompletionHandlerException: Exception in completion handler InvokeOnCompletion@4[job@5] for DeferredCoroutine{Completed}@5
```
Which completion handler? 👀
Something interesting: my function's execution isn't stopped when the exception appears partway through, but the host never returns and finally crashes when I try to invoke this service again. Conclusions:
• It only happens on this function call.
• The function code seems to run to completion in the VM.
• The host never returns.
• The exception appears partway through, but only when this function runs.
Now it happened as soon as the function was invoked from the host, but inside the VM the function still proceeds normally; it just never returns to the host. 🫠
j
It’s a crash in serialization, so that’s a good place to start? Print the encoded call from `EventListener`, look at the encoded JSON, and maybe write a `commonTest` unit test that encodes and decodes them directly?
s
If I use this module directly in my JUnit Test, it works perfectly fine...
j
Actually, hang on, this is happening during encoding, which is a really weird time for an exception like this
I wonder if your `List` contains an element that isn’t in the type parameters? Like a `List<Int>` that contains a String?
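A JVM illustration of the kind of mismatch being suggested: because generic type arguments are erased at runtime, a `List<Int>` can end up holding a `String`, and nothing fails until an element is actually used as an `Int`. On the JVM that surfaces as a `ClassCastException`; in the Kotlin/JS trace above the same kind of failure goes through `THROW_CCE`. The function names are hypothetical, and this is only a sketch of the failure mode, not a diagnosis of this particular bug.

```kotlin
// Builds a list whose static type lies about its contents. Generics are
// erased at runtime, so this unchecked cast itself succeeds.
@Suppress("UNCHECKED_CAST")
fun smuggledList(): List<Int> = listOf<Any>(1, 2, "three") as List<Int>

// The failure only appears later, when an element is used as an Int.
fun totalOf(list: List<Int>): Int {
    var total = 0
    for (n in list) total += n // ClassCastException when n is really a String
    return total
}
```

This is also why the crash can show up far from the code that built the bad list: the list is created without error and only blows up where it is consumed, such as inside a serializer.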
s
Shouldn't my JUnit test crash as well then? I inspected it and it's all right.
Would having the code help diagnose it? I can add you to a private repo if that would help. It will need some setup, though, since for testing I just have everything from the related projects published to mavenLocal.
I've literally commented everything out and it's barebones now.
j
Sure, invite me to a private repo?
Though I’m unlikely to see it til the end of the week; I’m currently sneaking some coding time while on vacation!
🤫
s
Yeah, I'll try some more things, clean things up a bit, and invite you there with steps to reproduce. Thanks 🙌
Enjoy your vacation
Something weird is definitely going on... I've created a minimal reproducer and will try more things; if that doesn't help, I'll send you the invite and the steps to get it to build.
It seems related to recursion/looping. If I remove the function that loops, it runs, but with it in there it crashes after the last line, even though the whole function ran.
I've sent a DM with the reproducer; I'd appreciate some help whenever you find the time.
j
Nice
TY
s
I've narrowed down the issue; my hunch about recursion was wrong. Instead it's the line below: commenting it out makes this work. But this doesn't explain how this function runs fine on its own yet crashes when used in the snippet above (thisCrashes/thisWorks).
Is Kotlin/JS (IR) the culprit? 🤔
I ask since the code below runs as well; I'm just not able to return its result.
j
Is there anything suspending here?
s
What do you mean exactly? All the lines above are sequential, but the function itself is suspending.
Also, did you get the invite for the repo?
j
yep! ty
I wonder if this code is actually just using a ton of stack frames when it could instead do something like tailrec
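A sketch of the `tailrec` suggestion, using a hypothetical linked `Node` chain. The plain recursive version uses one stack frame per node, so long chains can overflow the stack (sooner on threads with small stacks); in the `tailrec` version the recursive call is in tail position, so the Kotlin compiler turns it into a loop and stack depth stays constant.

```kotlin
// Hypothetical linked structure, e.g. pages of results chained together.
class Node(val value: Int, val next: Node?)

// Plain recursion: one stack frame per node.
fun sumRecursive(node: Node?): Long =
    if (node == null) 0L else node.value + sumRecursive(node.next)

// Tail recursion: the recursive call is the last operation, so the compiler
// rewrites it as a loop and no extra stack frames are used.
tailrec fun sumTailrec(node: Node?, acc: Long = 0L): Long =
    if (node == null) acc else sumTailrec(node.next, acc + node.value)

// Builds the chain 1 -> 2 -> ... -> length iteratively.
fun buildChain(length: Int): Node? {
    var head: Node? = null
    for (i in length downTo 1) head = Node(i, head)
    return head
}
```

The compiler warns if a function marked `tailrec` is not actually tail recursive, so the modifier doubles as a check that the rewrite applies.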
s
The current code isn't recursive; it's purely imperative and uses a `while` loop. It also still doesn't explain the actual difference between the `thisCrashes` and `thisWorks` methods, and why one works and the other doesn't 🤔
Some more renditions of what works and what doesn't
image.png
Finally fixed; it's something related to Zipline only. The change below fixed all the scenarios. I've pushed it to the reproducer (commits 3ad29a18, 3a28eb07); it should help with fixing this in Zipline itself.
j
I’m really intrigued
s
The JVM samples seem resolved, but on Android it still logs the `THROW_CCE`, and the new function `relatedSongs()` never returns, but at least the whole app doesn't crash anymore. I'll be awaiting your message once you've checked the reproducer and diagnosed the cause of this weird issue.
Fixed that as well: there was some issue with returning a list from a service; returning an object works.
j
Is Zipline incorrectly recursively calling the function back on itself?
s
Nope, just changing the return type from a list to an object containing a list property fixed these issues.
j
Oh, I don’t like that; our serialization should be smarter about that
I.e. it shouldn’t make a difference
s
For some reason it does, and the weird `THROW_CCE` now makes some sense if the serialization itself is confused or wrong in the generated code.