Parijat Shah
03/23/2023, 9:14 PM
mbonnin
03/23/2023, 9:22 PM
`viewModelScope`
that should be done automatically
Parijat Shah
03/27/2023, 4:47 PM
`webSocketReopenWhen`
subscription never recovers when the device comes out of sleep mode.
I looked at the source code of `SubscriptionWsProtocol.kt`, `WsProtocol.kt` & `WebSocketNetworkTransport.kt`, did some digging, and here are my findings.
1. When the device goes into sleep mode, a `java.net.SocketException` (software caused connection abort) is thrown in `WsProtocol`, on line 149, in the `run` method.
2. In `WebSocketNetworkTransport.kt`, on line 151, we have a `while(true)` block that consumes the messages produced by the `messages` channel.
3. When the exception (`java.net.SocketException`) is thrown, a message of type `NetworkError` is received as expected, and since we have enabled `webSocketReopenWhen`, the following code on line 157 gets executed:
if (reopenWhen?.invoke(message.cause, reopenAttemptCount) == true) {
  reopenAttemptCount++
  activeMessages.values.forEach {
    // Re-queue all start messages
    // This will restart the websocket
    messages.trySend(it)
  }
}
We have about 5 active subscriptions, so this runs 5 times, sending 5 messages of type `Command`. Each of these messages is handled by trying to create the protocol and initialize the connection (lines 203 to 215).
But since the device is still in sleep mode, an exception is thrown:
try {
  protocol!!.connectionInit()
} catch (e: Exception) {
  // Error initializing the connection
  protocol = null
  messages.send(NetworkError(e))
  continue
}
This results in 5 more messages of type `NetworkError` being sent to the channel.
At this point, the queue holds 5 `NetworkError` messages.
On the next iteration (line 152), the next message is received, which we know is of type `NetworkError`, so the same `reopenWhen` code on line 157 gets executed. But here a single `NetworkError` message results in 5 messages of type `Command`, so by the time all 5 `NetworkError` messages have been received, we have already sent 25 messages of type `Command`.
If the device is still in sleep mode, these 25 `Command` messages result in another 25 messages of type `NetworkError`. If you have a 60-second delay in your `reopenWhen` block, just receiving these 25 messages takes 25 minutes, and after that you have 125 messages of type `Command`, which result in another 125 messages of type `NetworkError`.
You can see now that even after the device comes out of sleep mode and the network is available, the `Command` message is buried under `NetworkError` messages.
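The growth described above can be reproduced with a small toy model. This is only a sketch under stated assumptions: it is not Apollo Kotlin code, it ignores the `reopenWhen` delay itself, and `Msg`, `simulate` and its parameters are hypothetical names made up for illustration. It assumes the network stays off, so every `Command` turns into one `NetworkError`, and every `NetworkError` re-queues one `Command` per active subscription.

```kotlin
// Toy model of the single message queue; not the library's implementation.
// Msg, simulate and the parameter names are hypothetical.
enum class Msg { COMMAND, NETWORK_ERROR }

/**
 * Runs the loop while the network stays off and returns the queue size
 * observed each time the queue holds only NETWORK_ERROR messages.
 */
fun simulate(activeSubscriptions: Int, rounds: Int): List<Int> {
    val queue = ArrayDeque<Msg>()
    val sizes = mutableListOf<Int>()
    // The first SocketException puts a single NetworkError on the queue.
    queue.addLast(Msg.NETWORK_ERROR)
    repeat(rounds) {
        // Each NetworkError re-queues one Command per active subscription.
        repeat(queue.size) {
            queue.removeFirst()
            repeat(activeSubscriptions) { queue.addLast(Msg.COMMAND) }
        }
        // Each Command fails to initialize the connection and becomes a NetworkError.
        repeat(queue.size) {
            queue.removeFirst()
            queue.addLast(Msg.NETWORK_ERROR)
        }
        sizes.add(queue.size)
    }
    return sizes
}

fun main() {
    val delaySeconds = 60 // assumed reopenWhen delay
    simulate(activeSubscriptions = 5, rounds = 3).forEachIndexed { round, size ->
        val minutes = size * delaySeconds / 60
        println("round ${round + 1}: $size NetworkError messages, $minutes min of delay to drain them")
    }
}
```

With 5 active subscriptions the sizes come out as 5, 25 and 125, which matches the numbers in the walkthrough.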
I think there is a need to revisit how the network error is handled. One way could be to dedupe the messages: there is no need to send the same message 125 times when all you need is one for the logic to work.
mbonnin
03/27/2023, 6:26 PM
`reopenWhen?.invoke(message.cause, reopenAttemptCount)`
is running. It's a single message queue, so if you suspend 60s in `reopenWhen`, then no other message can be queued?
Parijat Shah
03/27/2023, 7:02 PM
activeMessages.values.forEach {
  // Re-queue all start messages
  // This will restart the websocket
  messages.trySend(it)
}
This runs 5 times since `activeMessages` has size 5, so 5 messages of type `Command` are sent.
Now you are looking at the message queue: Command, Command, Command, Command, Command
Each of these is handled by creating the protocol and trying to initialize the connection. Since the network is still off, a network exception is thrown:
messages.send(NetworkError(e))
Now you are looking at the queue:
NE, NE, NE, NE, NE
Each of these results in `reopenWhen` suspending for 60 seconds, so after 5 minutes the queue will look like:
Command, Command ... Command (25 messages in total)
Each of these commands results in 1 exception, so you are now looking at the queue:
NE, NE, NE ... NE (25 messages)
Now for each NE, you wait 60 seconds and go through the `activeMessages` loop, resulting in:
Command, Command ... Command (125 messages in total)
If the network is still off, this results in:
NE, NE ... NE (125 messages)
At this point, let's say the network comes back on after 60 seconds: the `Command` message won't get processed until all 125 messages of type `NetworkError` are processed.
Each NE requires a 60-second wait, so we are easily looking at 2+ hours after the network is back on before there is an attempt to create the protocol and initialize the connection.
This can be easily reproduced by enabling `webSocketReopenWhen` and turning off the network once the subscription starts. You will see the queue size grow rather quickly. If the network is off for a long time, this queue can grow very large.
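The dedupe idea suggested earlier could look roughly like the sketch below. It is a toy model, not the library's code or its eventual fix: `DedupingQueue`, `Msg`, `Command`, `NetworkError` (as modelled here) and `maxQueueSizeWhileOffline` are hypothetical names, and it assumes that re-queuing a start message that is already pending is redundant. A message is skipped when an equal one is already waiting in the queue.

```kotlin
// Hypothetical model of a deduplicating message queue; not Apollo Kotlin's implementation.
sealed interface Msg
data class Command(val subscriptionId: Int) : Msg
object NetworkError : Msg

/** A FIFO queue that skips a message when an equal one is already pending. */
class DedupingQueue {
    // LinkedHashSet keeps insertion order and rejects duplicates.
    private val pending = LinkedHashSet<Msg>()

    /** Returns false when the message was dropped as a duplicate. */
    fun offer(msg: Msg): Boolean = pending.add(msg)

    /** Removes and returns the oldest pending message, or null when empty. */
    fun poll(): Msg? {
        val iterator = pending.iterator()
        if (!iterator.hasNext()) return null
        val msg = iterator.next()
        iterator.remove()
        return msg
    }

    val size: Int get() = pending.size
}

/** Processes [steps] messages with the network off and returns the largest queue size seen. */
fun maxQueueSizeWhileOffline(activeSubscriptions: Int, steps: Int): Int {
    val queue = DedupingQueue()
    queue.offer(NetworkError)
    var maxSize = queue.size
    repeat(steps) {
        when (queue.poll()) {
            // A NetworkError re-queues one start Command per active subscription.
            is NetworkError -> repeat(activeSubscriptions) { id -> queue.offer(Command(id)) }
            // With the network off, every Command fails and reports a NetworkError.
            is Command -> queue.offer(NetworkError)
            null -> return maxSize
        }
        maxSize = maxOf(maxSize, queue.size)
    }
    return maxSize
}

fun main() {
    println("max queue size: ${maxQueueSizeWhileOffline(activeSubscriptions = 5, steps = 1000)}")
}
```

In this model the queue never holds more than one message per active subscription, so a single `NetworkError` is enough to drive the reopen logic, and a `Command` sent once the network is back is not stuck behind a backlog.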
`activeMessages`
sooner you will run into the problem.
mbonnin
03/27/2023, 7:05 PM
Parijat Shah
03/27/2023, 7:06 PM
mbonnin
03/27/2023, 7:06 PM
Parijat Shah
03/27/2023, 7:07 PM
mbonnin
03/27/2023, 7:08 PM
Parijat Shah
03/27/2023, 7:09 PM