# coroutines
p
As of the latest coroutine updates, we have a lot of flaky tests, failing with:
kotlinx.coroutines.test.UncompletedCoroutinesError: After waiting for 10s, the test coroutine is not completing
They all have in common that they use mockk, and therefore ByteBuddy - and that seems to just be expensive, especially on the first invocations. I found this (closed) issue: https://github.com/Kotlin/kotlinx.coroutines/issues/3270 Did someone find a workaround here?
❤️ 1
👀 1
j
You can pass a higher timeout to the runTest function. I had to change it because the Windows CI on GitHub Actions was taking more than 10 seconds.
p
Yeah, that's what I thought as well. Create a utility module with its own runTest function and write a lint rule prohibiting usage of the coroutines one 🙈
👍 1
d
Or you could initialize the mocking mechanism outside runTest.
p
We would need to refactor hundreds of tests. But actually I wonder: shouldn't timeouts be the responsibility of the test framework? JUnit 4 and 5 already have support for that. With that being said, I see multiple possible solutions:
• Everyone creates their own runTest functions because of busy, possibly slow CIs and slow ByteBuddy making all tests flaky: this doesn't make sense to me because of the huge overhead.
• Adding an option to make it globally configurable, e.g. through an environment variable - could be a viable option.
• Reverting the change - does make sense to me. I'd rather wait 50 seconds longer for a genuinely stalled test than find out the CI failed again due to flakiness and restart the tests a few times until they succeed.
• Moving all mocking setup outside runTest. That will only partly work, because often it's the first invocation that's slow. It also forces users to work around the issue and split up their logic for purely technical reasons.
• Removing the timeout from runTest and instead adding a runTestWithTimeout function - also a viable option.
I think something really needs to be done here - this will affect every developer who uses ByteBuddy (so, for instance, mockk users) and runs tests in a CI.
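The "own runTest function" and "runTestWithTimeout" options above could look roughly like the following sketch. This assumes a kotlinx-coroutines-test version whose runTest accepts a timeout: Duration parameter (available since 1.7.0); the 60-second default is an invented value, not anything the library prescribes.

```kotlin
import kotlinx.coroutines.test.TestResult
import kotlinx.coroutines.test.TestScope
import kotlinx.coroutines.test.runTest
import kotlin.time.Duration
import kotlin.time.Duration.Companion.seconds

// Project-wide replacement for runTest with a more generous default;
// a lint rule could then ban direct use of the library function.
fun runTestWithTimeout(
    timeout: Duration = 60.seconds, // invented default, tune per CI
    testBody: suspend TestScope.() -> Unit,
): TestResult = runTest(timeout = timeout, testBody = testBody)
```

Tests would then call `runTestWithTimeout { ... }` exactly as they call `runTest { ... }` today.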
j
My use case is similar in the sense that it is normal for the test to take more time: it is using property testing, so it is running a lot of tests in one
c
since it's just the first invocation of mockk that's slow, you can just do a mockk call in a static block. the subsequent calls will be faster.
p
So every single test should start with warming up mocks so the coroutines timeout measures correctly? That’s a lot of noise
c
not every single test. just one static block in your test codebase
p
What’s a “static block”?
p
And in which test would you put that?
(And in which module)
c
does not matter. just something that is loaded when the tests are loaded
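In Kotlin terms, the "static block" suggestion above could be sketched as an object whose initializer runs once per test JVM. This assumes mockk is on the test classpath; the names MockkWarmup and BaseTest are illustrative, not from the thread.

```kotlin
import io.mockk.mockk

// Runs once when the object is first referenced; pays the expensive
// first ByteBuddy/mockk invocation before any timed test body starts.
object MockkWarmup {
    init {
        mockk<Runnable>(relaxed = true)
    }
}

// Referencing the object from a shared base class (or any code that
// every test touches) triggers the warm-up exactly once.
abstract class BaseTest {
    init {
        MockkWarmup
    }
}
```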
p
As the test order is not deterministic it would need to be put into every test
c
or somewhere thats imported from every test
p
That does not sound like a good solution for every developer and project
c
my solution is to not use mockk.
🙄 1
because 3 seconds is more than my whole test suite takes
p
🙈
c
apart from that I think most people have a shared class that is imported in all test cases, and that's where I would put it.
I wrote a very simple java proxy based mocking library because I found mockk too slow: https://github.com/failgood/failgood/blob/main/failgood/jvm/test/failgood/mock/MockExample.kt
d
But actually I wonder: shouldn’t timeouts be the responsibility of the test framework?
Ideally, maybe, but as far as I know, there's no cross-platform mechanism for this. When writing tests in common code, you have no way to indicate "This particular test may take a whole minute, whereas all other tests take ten seconds at most".
Everyone creates their own runTest functions because of busy, possibly slow CIs and slow ByteBuddy making all tests flaky: this doesn't make sense to me because of the huge overhead
"Everyone" here is a bit misleading. The overwhelming majority of tests don't have this problem. 10 seconds is an enormous amount of time by computing standards.
Adding an option to make it globally configurable, i.e. through an env variable - could be a viable option.
Agreed. If there's enough demand, we should look into it.
Reverting the change - does make sense to me. I'd rather wait 50 seconds longer for a genuinely stalled test than find out the CI failed again due to flakiness and restart the tests a few times until they succeed
This only makes sense if the problems from having a 10-second default are worse than the problems from not having it, and in the issue you linked, there are some compelling arguments for this default. If it turns out that the default actually doesn't make sense for most use cases, then this change should be reverted, sure, but so far it's not at all clear.
Moving all mocking setup outside runTest. That will only partly work because often it's the first invocation that's slow. Also this forces users to work around the issue and split up their logic due to technical reasons
Maybe you have a case here. Depends on what the mocking code looks like. I imagined something like
@Test fun testObtainingWebpage() = runTest {
  ensureStableConnection()
  val result = getPage("https://google.com/")
  assertIs<Success>(result)
  assertEquals("Google", result.title)
}

@Test fun testTimingOut() = runTest {
  ensureFlakyConnection()
  val result = getPage("https://google.com")
  assertIs<Failure>(result)
}

@BeforeTest fun mockConnection() {
  // initialization here
}
So the code in runTest roughly looks like some execution similar to what's going on in production, interspersed with checking the results and controlling the details of the execution context, and the code in BeforeTest performs the initialization of the components needed for that. Maybe the typical usage of mocking is somehow different and doesn't map to this pattern, I don't really know.
Removing the timeout from runTest and instead adding a runTestWithTimeout function - also a viable option.
This, too, depends on the use case that's most commonly needed. If it turns out that the overwhelming majority of people do have a test suite where most tests finish in under a second, and a test taking 10 seconds certainly means that it hung, then the right choice, runTest with a timeout, should be easily accessible.
My use case is similar in the sense that it is normal for the test to take more time: it is using property testing, so it is running a lot of tests in one
It's surprising to see that someone is using property testing with coroutines! I'd expect to see property tests of some well-defined pure code that performs a specific computation, but not property tests of concurrent procedures.
p
Our tests which use coroutines are not really doing big delays. For instance, we use coroutines for all DB operations:
withContext(Dispatchers.IO) { db.writeStuff() }
In many cases it's not easy to move the mocking call outside runTest (and that would also put a lot of mental burden on the developer to not forget about this). For instance, we have some utility functions that set up classes with their mocking, and these functions also take the scope of the runTest as an input in order to properly set up the dispatchers.
In our case, even the simplest tests start failing on the CI when it's under high load. So we really would need to adjust the timeout for every single test. Just this block of code:
@Test
fun myMockingTest() {
  measureTime {
    mockk<UUID>()
  }.also {
    println("That took $it")
  }
}
takes 1.5 seconds on my M1 Max. On a slower CI runner, which is possibly also running multiple projects in parallel using containerization, this already gets flaky.
(And it’s just the initialization of ByteBuddy. If I add it in a repeat loop we’ll end up with
That took 1.409419875s
That took 173.458us
That took 61.917us
That took 61.208us
That took 110.25us
That took 64.5us
That took 60.083us
That took 62.875us
That took 57.417us
That took 58us
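A sketch of the repeat loop that would produce timings like the above (assuming mockk and kotlin.time on the classpath; the loop count is arbitrary):

```kotlin
import io.mockk.mockk
import java.util.UUID
import kotlin.time.measureTime

fun main() {
    // The first iteration pays the one-time ByteBuddy initialization cost;
    // subsequent iterations only pay the per-mock cost (microseconds).
    repeat(10) {
        measureTime { mockk<UUID>() }
            .also { println("That took $it") }
    }
}
```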
j
@Dmitry Khalanskiy [JB] in my case it is with Kotest, so it is like a "forEach" function with coroutines. I think the long-term, ideal support would be allowing suspend tests, so the framework can adapt to them and offer all its features in the coroutines world too
g
apart from that I think most people have a shared class that is imported in all test cases, and there i would put it.
It's not true in my experience; there is just nothing to share in most cases, and the vast majority of projects I work on don't have any base classes. And it's even more rare recently with multi-module projects and a composition-over-inheritance approach
my solution is to not use mockk.
It's not a general solution, it's an attempt at a workaround for a particular case
10 seconds is an enormous amount of time by computing standards
It really depends on what is going on and what kind of project it is. For example, when sync of the project in IDEA takes 5-10 minutes, or when execution of a hello-world Kotlin script takes a couple of seconds thanks to classpath size and JVM warm-up
I think it's not a question of what the ideal test is, why some test suites run in milliseconds and some in hours, whether one should use a mocking library or not, or which mock library is better. All those discussions are distracting from the actual issue of the hardcoded timeout.
A 10-second timeout doesn't help make slow tests faster; the only actual help is that it fails faster in case there is a deadlock or another bug. The issue is that now there is no simple way to configure it if I want a test to fail later or even faster. As you said, 3 seconds per whole test suite sounds great; in that case, why would individual tests have a 10-second timeout instead of 1 second or less?
The core of the proposal is to make it configurable, not to force everyone to rewrite their test suite just to avoid the default runTest timeout
👍 3
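A minimal sketch of the configurability being proposed, assuming an invented TEST_TIMEOUT_SECONDS environment variable (kotlinx-coroutines-test has no such setting; this only shows the resolution logic a global option might use):

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.seconds

// Resolve a test timeout from an (invented) environment variable,
// falling back to the library's 10-second default when it is unset
// or unparseable.
fun resolveTestTimeout(
    env: String? = System.getenv("TEST_TIMEOUT_SECONDS"),
): Duration = env?.toLongOrNull()?.seconds ?: 10.seconds

fun main() {
    println(resolveTestTimeout(null)) // falls back to the default
    println(resolveTestTimeout("60")) // overridden by the variable
}
```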
c
even in a test suite that runs in 3 seconds I use long timeouts, because like you said timeouts are just there to make sure the test suite does not hang forever, and not to indicate that a test is too slow.
g
60 seconds will also prevent it, same as 3, 10, or 100 seconds. But the problem with a too-short timeout is that I may need 20 more minutes to rerun all CI checks just because CI was under load and init took 11 seconds instead of the usual 3 🥹
👍 3