I have a suite of common tests that run on iOS and JVM that kotlinlang #coroutines

Jeff Lockhart

11/23/2022, 1:30 AM

I have a suite of common tests that run on iOS and JVM that use `kotlinx-coroutines-test`. Each test is run within a `runTest()` coroutine. All the tests execute and pass on JVM. On iOS they usually pass as well, but often (~30% of the time) one of the tests hangs, blocking the test suite's completion. I've seen almost all the tests cause this, from the first to the last, so it's not caused by a specific test. If I add a print statement to the bottom of each of the `runTest()` calls, the print statement is always executed, so it's hanging after the test code completes, but `runTest()` apparently isn't returning for some reason. Any idea what could be the cause of this? The code is based on this test suite from SQLDelight, but using multiplatform paging and a different database.

Jeff Lockhart

11/23/2022, 2:37 AM

Even weirder, it's not even `runTest()` that's hanging. If I wrap the call with:

fun runTestAndLogCompletion(testBody: suspend TestScope.() -> Unit) {
    runTest(testBody = testBody)
    println("completed")
}

"completed" is logged before the test goes on to hang indefinitely. It's not clear what could be causing this, other than the iOS test runner itself then.

Jeff Lockhart

11/23/2022, 3:05 AM

All the other tests in my project run without issue. It's only these ones that use `kotlinx-coroutines-test` that hang like this.

andylamax

11/23/2022, 6:43 AM

I have experienced this before, though I never considered it to be the library's problem, just my own partial knowledge... But maybe, just maybe, the library doesn't play very well yet with some Apple targets.

Jeff Lockhart

11/23/2022, 4:44 PM

Interesting. So these tests you've experienced this with are also using `kotlinx-coroutines-test` and hang on the iOS target?

andylamax

11/25/2022, 2:04 AM

Not the iOS target per se, but I have had a test that would pass on every target except watchOS (I've forgotten if it was arm64 or the other one). When I ran it again it passed, so it was really flaky.

Jeff Lockhart

11/25/2022, 7:24 AM

Oh, ok. And when the test doesn't pass, does it never complete at all and just hang?

andylamax

11/25/2022, 2:10 PM

exactly that, it just hangs

andylamax

11/25/2022, 2:10 PM

doesn't run to completion somehow

Jeff Lockhart

11/25/2022, 4:24 PM

Definitely sounds like the same thing!

Jeff Lockhart

11/28/2022, 4:30 PM

@Dmitry Khalanskiy [JB] have you seen this hanging behavior with `kotlinx-coroutines-test` on Apple targets? Do you have any suggestions on how we might diagnose the cause?

Dmitry Khalanskiy [JB]

11/29/2022, 11:27 AM

Nope, your report is the first one I've seen. The most straightforward way to diagnose this is to send us some code that triggers this (it's okay if you do this in private as well), preferably small and self-contained. I don't think coroutines have anything to do with this, given that `runTest` does finish, but then again, the flakiness does mean that there's some non-determinism involved. Very odd indeed.

andylamax

11/29/2022, 12:18 PM

Let me see if I can put together something. I have an OSS lib I can share

Jeff Lockhart

11/29/2022, 5:55 PM

Thanks. My library isn't currently open source, but I'm working on getting it there. The fact that everything finishes and it still hangs is certainly baffling. `kotlinx-coroutines-test` is just one of the things specific to this code, vs other tests in my project that haven't ever experienced this. The other thing specific to these tests would be the paging extension code itself. But again, it also completes execution before going on to hang. I'll see if I can put together something for you to be able to take a look at.

Jeff Lockhart

03/06/2023, 9:33 PM

Revisiting this, I've found that by introducing a 1ms delay after `runTest { ... }`, I'm able to work around the hanging on iOS. I haven't been able to get any of the tests to hang after replacing `runTest { ... }` with this `runTestAndPause { ... }`:

fun runTestAndPause(
    testBody: suspend TestScope.() -> Unit
) {
    runTest(testBody = testBody)
    runBlocking { delay(1) }
}

Without this workaround, 1 of the 23 tests in this specific test suite that uses `kotlinx-coroutines-test` will almost always hang, preventing the suite from completing (although occasionally they will all complete).

Dmitry Khalanskiy [JB]

03/07/2023, 3:03 PM

What if you replace `runBlocking { delay(1) }` with something other than `runBlocking`? I suppose there is a way to sleep for a given amount of time on iOS. Worst comes to worst, there's the non-optimizable busy-loop:

repeat(10000) {
  assertTrue(Random.nextInt(until = 100) < 100)
}

Jeff Lockhart

03/07/2023, 7:21 PM

I used `runBlocking { delay(1) }` for the ease of multiplatform support in common tests. I just tested with a `ThreadUtils.sleep(1)` expect function where the iOS implementation is `NSThread.sleepForTimeInterval(millis.toDouble() / 1000)` and this also works to prevent the tests from hanging.

Jeff Lockhart

03/07/2023, 7:52 PM

If I run the tests enough, after dozens of runs, they still will occasionally hang on the 21st test (without the delay, they usually hang sooner). So the small delay seems to usually allow whatever causes the deadlock to clear up. If I add a 50ms or 100ms delay, I haven't been able to get the tests to hang. But of course now the suite takes 1-2 seconds longer to run, which is considerably longer than the tests themselves take (~300-400ms).
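
For a rough sense of that overhead, the arithmetic can be sketched as below. It assumes the pause is paid exactly once per test and uses the 23-test suite size mentioned earlier; `addedSuiteTime` is a hypothetical helper, not part of any library.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds

// Hypothetical helper: total time added to a suite when every test
// pauses once, for pauseMillis milliseconds, after it finishes.
fun addedSuiteTime(testCount: Int, pauseMillis: Int): Duration =
    pauseMillis.milliseconds * testCount
```

With 23 tests, a 50ms pause adds about 1.15 seconds and a 100ms pause about 2.3 seconds, roughly in line with the 1-2 seconds observed, while a 1ms pause adds only 23ms.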

Dmitry Khalanskiy [JB]

03/08/2023, 11:30 AM

Filed an issue: https://github.com/Kotlin/kotlinx.coroutines/issues/3666

Jeff Lockhart

03/08/2023, 6:28 PM

Thank you! Just to clarify, the `println("completed")` statement wasn't enough to prevent the tests from hanging. The odd thing was just that the print statement logged and then the test still went on to hang indefinitely.

Jeff Lockhart

03/08/2023, 7:36 PM

I tried removing the coroutines test dependency, to use just pure coroutines with `runBlocking`. But I ended up not being able to find a good replacement for `TestScope.advanceUntilIdle()`, which essentially waits until the coroutine suspends so that a check can be performed. I'll have to play with the tests some more to see if I can rework this part and see if it's still reproducible without the coroutines test dependency.

Dmitry Khalanskiy [JB]

03/09/2023, 10:46 AM

Got it, fixed the issue description. Without knowing what your tests do exactly, it's tough to say how you can get rid of `advanceUntilIdle`, but as a (typically) non-idiomatic but robust approach, a large enough `delay` does the trick.
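
A minimal sketch of that delay-based approach, assuming plain `runBlocking` instead of `runTest` so that `delay` waits in real time. The function name `loadWithFixedDelay`, the fake background work, and the durations are illustrative assumptions, not code from the tests under discussion.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical example: start background work, wait a fixed amount of real
// time, then return whatever results are available (null if none yet).
fun loadWithFixedDelay(workMillis: Long, waitMillis: Long): List<Int>? = runBlocking {
    val results = CompletableDeferred<List<Int>>()

    // Stand-in for the real work, running off the test thread.
    val job = launch(Dispatchers.Default) {
        delay(workMillis)
        results.complete(listOf(1, 2, 3))
    }

    // Under runBlocking this is a real wait, not virtual time.
    delay(waitMillis)

    // await() returns immediately here because completion was checked first.
    val snapshot = if (results.isCompleted) results.await() else null

    // Cancel unfinished work so runBlocking does not wait for it.
    job.cancel()
    snapshot
}
```

The weakness is visible in the signature: the check only succeeds when `waitMillis` comfortably exceeds the time the work needs, so a delay that is too short produces an occasional failure rather than a hang.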

Jeff Lockhart

03/09/2023, 11:43 PM

The tests are based on this test suite from SQLDelight multiplatform paging extension, modified to use a different database. `TestScope.advanceUntilIdle()` is used here. I haven't been able to reproduce the hanging with the SQLDelight tests (I ported the SQLDelight paging extension to multiplatform and ran the tests a bunch in the process). The SQLDelight tests run faster than my other database tests though. So it could just be different timing conditions.

Dmitry Khalanskiy [JB]

10/25/2023, 11:08 AM

Hello! Does anyone have a publicly available project where this reproduces?

Jeff Lockhart

10/27/2023, 9:47 PM

I open sourced my library recently, although I haven't been experiencing tests hanging on the most recent versions of my code, even after removing the delay workaround. I'm also no longer using the coroutines-test library, which could be a contributing factor. (I'm no longer using coroutines-test because `TestScope.advanceUntilIdle()` no longer does what I need it to, so I've had to replace it with an arbitrary delay now.) I went back to an older commit before I removed coroutines-test and reproduced this again. If you run `./gradlew :couchbase-lite-paging:cleanAllTests :couchbase-lite-paging:iosX64Test` repeatedly on the *paging-ios-tests-hang* branch, eventually the iOS tests will hang indefinitely. Based on other times I've experienced this same issue of iOS tests hanging indefinitely, it seems to be caused by some background thread still being active when the test execution completes, which would explain why delaying a short period at the end of the test may resolve the problem. JVM tests don't exhibit this same behavior though.

Dmitry Khalanskiy [JB]

10/30/2023, 9:40 AM

`TestScope.advanceUntilIdle()` no longer does what I need it to

We take breakage seriously, so if, after upgrading to some version of the test library, the behavior of `advanceUntilIdle` changed, it's a cause for concern. Could you describe how exactly the behavior changed so that we could decide if it's a regression or intended behavior that should be documented?

the paging-ios-tests-hang branch, eventually the iOS tests will hang indefinitely

Thank you for the reproducer! We'll look into it.

Dmitry Khalanskiy [JB]

10/30/2023, 12:30 PM

How long does the bug usually take to reproduce? I've been running it for ten minutes, but the tests consistently pass without issues, using the command you provided on the correct branch.

Dmitry Khalanskiy [JB]

10/30/2023, 1:19 PM

Never mind, it reproduced after an hour of attempts!

andylamax

10/30/2023, 2:44 PM

Now that is one tricky bug

👌 1

Dmitry Khalanskiy [JB]

10/30/2023, 3:08 PM

@Jeff Lockhart, are you sure this is still the same issue? Yes, your tests do occasionally hang, but when I attach a debugger to one of them, the test is not finished; instead, the main thread hangs inside calls to `CBLQuery.execute`.

Jeff Lockhart

10/30/2023, 5:37 PM

Could you describe how exactly the behavior changed so that we could decide if it's a regression or intended behavior that should be documented?

Sorry, to clarify, it didn't break after a coroutines-test update, but after a change in my library's code. In order to avoid double-querying, with both the initial query execute and the query change listener, the code now uses the query change listener for the initial query as well. But because the query is not executed on the coroutine directly anymore, `TestScope.advanceUntilIdle()` doesn't do what it was doing before, waiting for the completion of the query work before checking the results. I need to find a better way to definitively wait for the paging query work to complete, rather than an arbitrary delay, as every once in a while a test will fail.
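
One possible way to get such a signal, sketched under assumptions: `FakeQuery` is a hypothetical stand-in for a listener-based query (not the Couchbase Lite or paging API), and a JVM-only `thread` simulates work that happens outside any coroutine. Because that work never goes through the test scheduler, `advanceUntilIdle()` has nothing to wait for, whereas awaiting a `CompletableDeferred` completed by the listener suspends until the results actually arrive.

```kotlin
import kotlin.concurrent.thread
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.test.advanceUntilIdle
import kotlinx.coroutines.test.runTest

// Hypothetical query whose results are delivered to a change listener
// from a background thread, outside of any coroutine.
class FakeQuery {
    fun addChangeListener(listener: (List<Int>) -> Unit) {
        thread {
            Thread.sleep(50) // simulated query work
            listener(listOf(1, 2, 3))
        }
    }
}

data class Observation(val doneAfterIdle: Boolean, val results: List<Int>)

@OptIn(ExperimentalCoroutinesApi::class)
fun observeQuery(): Observation {
    var observation: Observation? = null
    runTest {
        val signal = CompletableDeferred<List<Int>>()
        FakeQuery().addChangeListener { signal.complete(it) }

        // Only runs tasks queued on the test scheduler; the background
        // thread is invisible to it, so this returns right away.
        advanceUntilIdle()
        val doneAfterIdle = signal.isCompleted // almost certainly false here

        // Suspends until the listener completes the deferred.
        observation = Observation(doneAfterIdle, signal.await())
    }
    return observation!!
}
```

Whether this fits depends on the real API exposing a callback or flow that marks the end of the query work; the sketch only shows the waiting side.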

How long does the bug usually take to reproduce?

Usually it happens within a dozen or so runs. My computer has 16 cores / 32 threads. So not sure if that makes a difference in reproducing more often. I just ran again on that branch and it happened on the first run, then on the fifth, sixth, and tenth runs after that.

are you sure this is still the same issue? ...the test is not finished

Interesting, I'll have to look into this some more. I just tried adding the `println("completed")` log to see if this was doing what it was doing before. The first time the tests hung, I didn't see "completed" logged. But the second time, I did. Maybe there are multiple possible causes for a deadlock going on.

Jeff Lockhart

10/30/2023, 9:55 PM

It definitely seems to happen less frequently when run on a debugger! I managed to get it to hang in two ways: in the first, the test is still running inside `CBLQuery.execute`, and in the second, it logs "completed" but actually hangs during the `@AfterTest` database deletion. I didn't think about how this could be the cause! I should have used the debugger to check the deadlocked stack trace originally (this didn't use to work well for iOS). The fact I haven't been seeing these hanging tests in my latest code leads me to believe the locking issue may have been resolved with another change. I'll let you know if I see this again and can confirm the code causing it is from Kotlin or coroutines. Thank you for your help looking into this!

Dmitry Khalanskiy [JB]

10/31/2023, 7:38 AM

I need to find a better way to definitively wait for the paging query work to complete, rather than an arbitrary delay, as every once in a while a test will fail.

You may be interested in https://github.com/Kotlin/kotlinx.coroutines/issues/3919

It definitely seems to happen less frequently when run on a debugger!

It doesn't have to run in a debugger from the start. I ran `while true; do $your_command; done` in the terminal, and when the test hung, I used the Xcode debugger to attach to the already-running process `test.kexe`: https://stackoverflow.com/questions/9721830/how-to-attach-debugger-to-ios-app-after-launch.

Jeff Lockhart

11/01/2023, 10:25 PM

Thanks for the links. I did read that coroutines issue, but it seems my use case isn't covered by the solutions described. If I replace the `delay` with `awaitAllChildren`, the tests just hang. I need to look into possible APIs to get a proper signal for when the pager has its results.
