Jeff Lockhart
11/23/2022, 1:30 AM
`kotlinx-coroutines-test`. Each test is run within a `runTest()` coroutine. All the tests execute and pass on the JVM. On iOS they usually pass as well, but often (~30% of the time) one of the tests hangs, blocking the test suite's completion. I've seen almost every test cause this, from the first to the last, so it's not caused by a specific test.
If I add a print statement to the bottom of each `runTest()` call, the print statement is always executed. So the hang happens after the test code completes, but `runTest()` apparently isn't returning for some reason.
Any idea what could be causing this? The code is based on this test suite from SQLDelight, but uses multiplatform paging and a different database.

Jeff Lockhart
11/23/2022, 2:37 AM
`runTest()` that's hanging. If I wrap the call with:
```kotlin
import kotlinx.coroutines.test.TestScope
import kotlinx.coroutines.test.runTest

// Wraps runTest and logs once it has returned
fun runTestAndLogCompletion(testBody: suspend TestScope.() -> Unit) {
    runTest(testBody = testBody)
    println("completed")
}
```
"completed" is logged before the test goes on to hang indefinitely. So it's not clear what could be causing this, other than the iOS test runner itself.

Jeff Lockhart
11/23/2022, 3:05 AM
`kotlinx-coroutines-test` that hang like this.

andylamax
11/23/2022, 6:43 AM

Jeff Lockhart
11/23/2022, 4:44 PM
`kotlinx-coroutines-test` and hang on the iOS target?

andylamax
11/25/2022, 2:04 AM

Jeff Lockhart
11/25/2022, 7:24 AM

andylamax
11/25/2022, 2:10 PM

andylamax
11/25/2022, 2:10 PM

Jeff Lockhart
11/25/2022, 4:24 PM

Jeff Lockhart
11/28/2022, 4:30 PM
`kotlinx-coroutines-test` on Apple targets? Do you have any suggestions on how we might diagnose the cause?

Dmitry Khalanskiy [JB]
11/29/2022, 11:27 AM
`runTest` does finish, but then again, the flakiness does mean that there's some non-determinism involved. Very odd indeed.

andylamax
11/29/2022, 12:18 PM

Jeff Lockhart
11/29/2022, 5:55 PM
`kotlinx-coroutines-test` is just one of the things specific to this code, vs. other tests in my project that have never experienced this. The other thing specific to these tests is the paging extension code itself. But again, it also completes execution before going on to hang. I'll see if I can put together something for you to take a look at.

Jeff Lockhart
03/06/2023, 9:33 PM
`runTest { ... }`, I'm able to work around the hanging on iOS. I haven't been able to get any of the tests to hang after replacing `runTest { ... }` with this `runTestAndPause { ... }`:
```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.test.TestScope
import kotlinx.coroutines.test.runTest

fun runTestAndPause(
    testBody: suspend TestScope.() -> Unit
) {
    runTest(testBody = testBody)
    // Short real-time pause after runTest returns (the workaround for the iOS hang)
    runBlocking { delay(1) }
}
```
Without this workaround, 1 of the 23 tests in this specific test suite that uses `kotlinx-coroutines-test` will almost always hang, preventing the suite from completing (although occasionally they all complete).

Dmitry Khalanskiy [JB]
03/07/2023, 3:03 PM
`runBlocking { delay(1) }` with something other than `runBlocking`? I suppose there is a way to sleep for a given amount of time on iOS. Worst comes to worst, there's the non-optimizable busy-loop
```kotlin
import kotlin.random.Random
import kotlin.test.assertTrue

repeat(10000) {
    assertTrue(Random.nextInt(until = 100) < 100)
}
```
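One way to follow the busy-loop idea in common test code, without `runBlocking` or a platform-specific sleep, is to spin on the stdlib's monotonic clock instead of on random numbers. This is a minimal sketch, assuming Kotlin 1.9+ (where `TimeSource.Monotonic.markNow()` and `elapsedNow()` are stable); `busyWait` is a hypothetical helper, not part of any library, and it keeps a CPU core busy for the whole duration, so it only suits very short pauses like the 1 ms one above.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds
import kotlin.time.TimeSource

// Hypothetical helper: spins until `duration` has elapsed on the monotonic clock.
// Pure stdlib, so it compiles in commonTest without runBlocking or a platform sleep.
fun busyWait(duration: Duration) {
    val start = TimeSource.Monotonic.markNow()
    while (start.elapsedNow() < duration) {
        // Intentionally empty: each iteration re-reads the clock, so the loop can't be elided.
    }
}

fun main() {
    val before = TimeSource.Monotonic.markNow()
    busyWait(1.milliseconds) // same pause length as `runBlocking { delay(1) }`
    println(before.elapsedNow() >= 1.milliseconds) // true
}
```

In a wrapper like `runTestAndPause`, the `runBlocking { delay(1) }` line would become `busyWait(1.milliseconds)`.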
Jeff Lockhart
03/07/2023, 7:21 PM
`runBlocking { delay(1) }` for the ease of multiplatform support in common tests. I just tested with a `ThreadUtils.sleep(1)` `expect` function, where the iOS implementation is `NSThread.sleepForTimeInterval(millis.toDouble() / 1000)`, and this also prevents the tests from hanging.

Jeff Lockhart
03/07/2023, 7:52 PM

Dmitry Khalanskiy [JB]
03/08/2023, 11:30 AM

Jeff Lockhart
03/08/2023, 6:28 PM
`println("completed")` statement wasn't enough to prevent the tests from hanging. The odd thing was just that the print statement logged and then the test still went on to hang indefinitely.

Jeff Lockhart
03/08/2023, 7:36 PM
`runBlocking`. But I ended up not being able to find a good replacement for `TestScope.advanceUntilIdle()`, which I use essentially to wait until the coroutine suspends before performing a check. I'll have to play with the tests some more to see if I can rework this part and whether it's still reproducible without the coroutines-test dependency.

Dmitry Khalanskiy [JB]
03/09/2023, 10:46 AM
`advanceUntilIdle`, but as a (typically) non-idiomatic but robust approach, a large enough `delay` does the trick.

Jeff Lockhart
03/09/2023, 11:43 PM
`TestScope.advanceUntilIdle()` is used here.
I haven't been able to reproduce the hanging with the SQLDelight tests (I ported the SQLDelight paging extension to multiplatform and ran the tests a bunch in the process). The SQLDelight tests run faster than my other database tests, though, so it could just be different timing conditions.

Dmitry Khalanskiy [JB]
10/25/2023, 11:08 AM

Jeff Lockhart
10/27/2023, 9:47 PM
`TestScope.advanceUntilIdle()` no longer does what I need it to, so I've had to replace it with an arbitrary delay now.)
I went back to an older commit from before I removed coroutines-test and reproduced this again. If you run `./gradlew :couchbase-lite-paging:cleanAllTests :couchbase-lite-paging:iosX64Test` repeatedly on the *paging-ios-tests-hang* branch, eventually the iOS tests will hang indefinitely.
Based on other times I've experienced this same issue of iOS tests hanging indefinitely, it seems to be caused by some background thread still being active when the test execution completes, which would explain why delaying for a short period at the end of the test resolves the problem. JVM tests don't exhibit this behavior, though.

Dmitry Khalanskiy [JB]
10/30/2023, 9:40 AM
> `TestScope.advanceUntilIdle()` no longer does what I need it to

We take breakage seriously, so if, after upgrading to some version of the test library, the behavior of `advanceUntilIdle` changed, it's a cause for concern. Could you describe how exactly the behavior changed so that we could decide if it's a regression or intended behavior that should be documented?

> the *paging-ios-tests-hang* branch, eventually the iOS tests will hang indefinitely

Thank you for the reproducer! We'll look into it.

Dmitry Khalanskiy [JB]
10/30/2023, 12:30 PM

Dmitry Khalanskiy [JB]
10/30/2023, 1:19 PM

andylamax
10/30/2023, 2:44 PM

Dmitry Khalanskiy [JB]
10/30/2023, 3:08 PM
`CBLQuery.execute`.

Jeff Lockhart
10/30/2023, 5:37 PM
> Could you describe how exactly the behavior changed so that we could decide if it's a regression or intended behavior that should be documented?

Sorry, to clarify: it didn't break after a coroutines-test update, but after a change in my library's code. To avoid double-querying, with both the initial query execution and the query change listener, the code now uses the query change listener for the initial query as well. But because the query is no longer executed on the coroutine directly, `TestScope.advanceUntilIdle()` doesn't do what it was doing before, which was waiting for the query work to complete before checking the results.
I need to find a better way to definitively wait for the paging query work to complete, rather than an arbitrary delay, as every once in a while a test will fail.
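A common alternative to an arbitrary delay is to suspend until the observed state actually reaches the expected value, with a timeout as an upper bound so that a missing update fails the test instead of hanging it. The sketch below assumes kotlinx-coroutines-core and assumes the paging results can be observed as a `StateFlow`; `awaitResults` and `loadedItems` are hypothetical names, and the `MutableStateFlow` only stands in for whatever the pager really exposes.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeout
import kotlin.time.Duration
import kotlin.time.Duration.Companion.seconds

// Hypothetical helper: suspends until `results` holds a value matching `predicate`,
// throwing TimeoutCancellationException if that takes longer than `timeout`.
suspend fun <T> awaitResults(
    results: StateFlow<T>,
    timeout: Duration = 5.seconds,
    predicate: (T) -> Boolean,
): T = withTimeout(timeout) { results.first { predicate(it) } }

fun main() = runBlocking {
    // Stand-in for the pager's loaded results (an assumption, not the real paging API).
    val loadedItems = MutableStateFlow<List<String>>(emptyList())

    launch {
        delay(100) // simulate query work finishing later, on another coroutine
        loadedItems.value = listOf("a", "b", "c")
    }

    val items = awaitResults(loadedItems) { it.isNotEmpty() }
    println(items) // [a, b, c]
}
```

One hedged caveat: inside `runTest`, `delay` and `withTimeout` run on virtual time, so if the results are produced on a real background thread (as with the query change listener), the wait may need to run on a real dispatcher, for example wrapped in `withContext(Dispatchers.Default)`.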
> How long does the bug usually take to reproduce?

Usually it happens within a dozen or so runs. My computer has 16 cores / 32 threads, so I'm not sure if that makes a difference in reproducing it more often. I just ran again on that branch and it happened on the first run, then on the fifth, sixth, and tenth runs after that.
> are you sure this is still the same issue? ...the test is not finished

Interesting, I'll have to look into this some more. I just tried adding the `println("completed")` log to see if this was doing what it was doing before. The first time the tests hung, I didn't see "completed" logged. But the second time, I did. Maybe there are multiple possible causes of a deadlock going on.

Jeff Lockhart
10/30/2023, 9:55 PM
`CBLQuery.execute`, and the second logs "completed" but actually hangs during the `@AfterTest` database deletion. I didn't think about how this could be the cause! I should have used the debugger to check the deadlocked stack trace originally (this didn't use to work well for iOS).
The fact that I haven't been seeing these hanging tests in my latest code leads me to believe the locking issue may have been resolved by another change. I'll let you know if I see this again and can confirm the code causing it is from Kotlin or coroutines. Thank you for your help looking into this!

Dmitry Khalanskiy [JB]
10/31/2023, 7:38 AM
> I need to find a better way to definitively wait for the paging query work to complete, rather than an arbitrary delay, as every once in a while a test will fail.

You may be interested in https://github.com/Kotlin/kotlinx.coroutines/issues/3919

> It definitely seems to happen less frequently when run on a debugger!

It doesn't have to run in a debugger from the start. I ran `while true; do $your_command; done` in the terminal, and when the test hung, I used the Xcode debugger to attach to the already-running process `test.kexe`: https://stackoverflow.com/questions/9721830/how-to-attach-debugger-to-ios-app-after-launch.

Jeff Lockhart
11/01/2023, 10:25 PM
`delay` with `awaitAllChildren`, the tests just hang. I need to look into possible APIs to get a proper signal for when the pager has its results.