# ktor
r
I think I found a race in the ktor `SelectorManager` on the JVM when it is closed almost immediately after being opened because of a connection failure. It appears to be a race between the `close` call (which is non-suspending) and coroutine cancellation. If I call `close`, then delay 2s, then cancel the scope, the hang reproduces every time. If I cancel the scope and then call `close`, either immediately or after 2s, the hang reproduces sometimes. Stack trace in 🧵.
a
Can you please file an issue with the reproducer attached?
r
I updated the post with my findings. Here is the stack trace:
Coroutine "selector#256":StandaloneCoroutine{Cancelling}@145e4c7c, state: SUSPENDED
	at io.ktor.network.selector.ActorSelectorManager.receiveOrNullSuspend(ActorSelectorManager.kt:162)
	at io.ktor.network.selector.ActorSelectorManager.process(ActorSelectorManager.kt:89)
	at io.ktor.network.selector.ActorSelectorManager$1.invokeSuspend(ActorSelectorManager.kt:43)
	(Coroutine creation stacktrace)
	at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
	at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
	at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
	at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
	at io.ktor.network.selector.ActorSelectorManager.<init>(ActorSelectorManager.kt:38)
	at io.ktor.network.selector.SelectorManagerKt.SelectorManager(SelectorManager.kt:13)
If this is not a known issue I'll see if I can create a reproducer I can share.
@Aleksei Tirman [JB] @e5l I still can't create a reproducer for this, but when it happens the server is shutting down, and the client is trying to connect. I see this packet flow:
c -> s  SYN seq=0
s -> c  SYN, ACK seq=0 ack=1
c -> s  ACK seq=1 ack=1
s -> c  RST, ACK  seq=1 ack=1
The `RST, ACK` appears to have the wrong ack sequence: it's acking 1 again even though that has already been acked, unless I'm missing something. So it appears the ktor client never sees that `RST` and thinks the connection is established when in fact it is not. Now that connection's selector is hung, and any operation on that connection (even trying to close it) hangs. The server is also Ktor FWIW.
Here is the packet capture