Hi! Does anyone have any pointers on how to debug a coroutine that doesn’t seem to wake up at the correct time? I have a test where a message is picked up from a queue, which results in a request to a stub (which returns 404). The problem is that even though the stub is up and ready, the suspended coroutine (it’s a post from the ktor HTTP client) is not “woken up” until about 30 seconds later, when the same message gets republished on the queue. The coroutine that I suspect is not being woken up correctly is wrapped in a repeat. I see the first iteration happen, but then it just goes silent.
gildor
05/24/2019, 3:23 PM
Looks like a problem in the coroutine adapter implementation, or maybe a deadlock
gildor
05/24/2019, 3:25 PM
When a coroutine is not resumed, it usually means the callback was never received, so there is no simple way to debug it other than checking every step of the code's invocation
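One way to check each step, as a minimal sketch: wrap the suspicious suspending call in a logging helper with a timeout. `traced` is a hypothetical name, not part of kotlinx.coroutines or ktor; the sketch assumes the JVM and the kotlinx-coroutines-core dependency, and it only helps if the wrapped call is cancellable.

```kotlin
import kotlinx.coroutines.TimeoutCancellationException
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeout

// Hypothetical helper: logs when a suspending call starts and when it is left,
// and fails loudly if the call has not resumed within timeoutMs.
suspend fun <T> traced(label: String, timeoutMs: Long, block: suspend () -> T): T {
    println("[$label] suspending on ${Thread.currentThread().name}")
    val start = System.currentTimeMillis()
    return try {
        withTimeout(timeoutMs) { block() }
    } catch (e: TimeoutCancellationException) {
        println("[$label] not resumed within $timeoutMs ms")
        throw e
    } finally {
        val elapsed = System.currentTimeMillis() - start
        println("[$label] left after $elapsed ms on ${Thread.currentThread().name}")
    }
}

fun main(): Unit = runBlocking {
    // A call that resumes normally.
    println(traced("fast", 1_000) { delay(50); "done" })

    // delay() stands in for a call whose callback never arrives.
    try {
        traced("stuck", 200) { delay(10_000); "never" }
    } catch (e: TimeoutCancellationException) {
        println("caught timeout for the stuck call")
    }
}
```

If even the "not resumed" line never shows up, that hints at a blocked dispatcher thread rather than a lost callback, because the timeout itself needs a free thread to resume the coroutine.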
gildor
05/24/2019, 3:28 PM
Do you use channels? Or just suspend functions?
gotoOla
05/26/2019, 7:40 PM
Just suspending functions. One of them is an HTTP post with the ktor client (which hits a 404 endpoint); the other "suspension point" is that the post is wrapped in a repeat, which hits a delay() if the post does not succeed
gildor
05/26/2019, 11:27 PM
Are you sure that it really never returns? Maybe 30 seconds before the repeat is not enough?
gotoOla
05/27/2019, 7:16 AM
ah, so the logic is like this:
Do an HTTP call (ktorClient.post - suspending function)
If that fails, try again after 0.1 seconds (delay - suspending function)
If that fails, try again after 0.5 seconds (delay - suspending function)
If that fails, try again after 1 second (delay - suspending function)
If that fails -> throw Exception
I have dummy debug logs around pretty much everything at this point, and even though the ktorClient calls a 404 endpoint (which returns immediately), it doesn’t return and log that it will retry until 30 seconds later. The message that triggers this whole loop has a “visibility timeout” of 30 seconds, so the behavior I am seeing is that when the message gets republished, the first coroutine suddenly wakes up as well and they are both dealt with
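The retry steps listed above can be sketched roughly like this. It is a simplified stand-in, not the actual code: `retryWithDelays` is a hypothetical name, the failing lambda replaces the real ktorClient.post call, and the sketch assumes the kotlinx-coroutines-core dependency.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical helper: one initial attempt plus one retry per entry in delaysMs,
// then gives up by throwing.
suspend fun <T> retryWithDelays(
    delaysMs: List<Long> = listOf(100L, 500L, 1_000L),
    block: suspend (attempt: Int) -> T
): T {
    var lastError: Exception? = null
    for (attempt in 0..delaysMs.size) {
        try {
            println("attempt $attempt: calling")
            return block(attempt)
        } catch (e: CancellationException) {
            throw e // never swallow cancellation
        } catch (e: Exception) {
            lastError = e
            println("attempt $attempt: failed with ${e.message}")
        }
        if (attempt < delaysMs.size) {
            println("attempt $attempt: waiting ${delaysMs[attempt]} ms before retrying")
            delay(delaysMs[attempt])
        }
    }
    throw IllegalStateException("all ${delaysMs.size + 1} attempts failed", lastError)
}

fun main(): Unit = runBlocking {
    var calls = 0
    try {
        // Stand-in for the post against the 404 stub: always fails.
        retryWithDelays<String> { calls++; error("HTTP 404") }
    } catch (e: IllegalStateException) {
        println("gave up after $calls calls: ${e.message}")
    }
}
```

With a log line on each side of every suspension point, the last line printed before the 30 second gap shows whether it is the post or the delay() that is not resuming.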
gildor
05/28/2019, 9:53 AM
It may be a bug of course, but it's hard to say without a reproduction. Maybe you could create a sample and report it to the ktor issue tracker