Parijat Shah
03/23/2023, 9:14 PM
mbonnin
03/23/2023, 9:22 PM
mbonnin
03/23/2023, 9:23 PM
`viewModelScope` that should be done automatically
Parijat Shah
03/27/2023, 4:47 PM
`webSocketReopenWhen` subscription never recovers when the device comes out of sleep mode.
I looked at the source code of `SubscriptionWsProtocol.kt`, `WsProtocol.kt` and `WebSocketNetworkTransport.kt`, did some digging, and here are my findings.
1. When the device goes into sleep mode, a `java.net.SocketException` (software caused connection abort) is thrown in `WsProtocol`, on line 149, in the `run` method.
2. In `WebSocketNetworkTransport.kt`, on line 151, there is a `while(true)` block that consumes the messages produced by the message channel.
3. When the exception (`java.net.SocketException`) is thrown, a message of type `NetworkError` is received as expected and, since we have enabled `webSocketReopenWhen`, the following code on line 157 gets executed:
    if (reopenWhen?.invoke(message.cause, reopenAttemptCount) == true) {
        reopenAttemptCount++
        activeMessages.values.forEach {
            // Re-queue all start messages
            // This will restart the websocket
            messages.trySend(it)
        }
    }
We have 5 active subscriptions, so this loop runs 5 times, sending 5 messages of type `Command`. Each of these messages is handled by trying to create the protocol and initialize the connection (lines 203 to 215).
But since the device is still in sleep mode, an exception is thrown:
    try {
        protocol!!.connectionInit()
    } catch (e: Exception) {
        // Error initializing the connection
        protocol = null
        messages.send(NetworkError(e))
        continue
    }
This results in 5 more messages of type `NetworkError` being sent to the channel.
At this point, you have 5 messages in the queue.
On the next iteration (line 152), the next message is received, which we know is of type `NetworkError`, so the same `reopenWhen` code on line 157 is executed. But here a single `NetworkError` message results in 5 messages of type `Command`, so by the time all 5 `NetworkError` messages have been received, we have already sent 25 messages of type `Command`.
If the device is still in sleep mode, these 25 messages of type `Command` result in another 25 messages of type `NetworkError`.
If you have a 60-second delay in your `reopenWhen` block, handling these 25 messages takes 25 minutes by itself. After that you have 125 messages of type `Command`, which result in another 125 messages of type `NetworkError`.
You can see now that even after the device comes out of sleep mode and the network is available, the `Command` message is buried under `NetworkError` messages.
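The growth described above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: `Message`, `Command`, `NetworkError` and `simulateOffline` are hypothetical names written for this illustration (not the Apollo Kotlin classes), `reopenWhen` is assumed to always return true, every connection attempt is assumed to fail while offline, and delays are ignored.

```kotlin
// Simplified model of the message loop; NOT the Apollo Kotlin source.
sealed interface Message
data class Command(val subscriptionId: Int) : Message
object NetworkError : Message

/**
 * Runs the loop while the network stays down and returns the queue size
 * after each round (one pass over everything queued at the start of the round).
 */
fun simulateOffline(activeSubscriptions: Int, rounds: Int): List<Int> {
    val queue = ArrayDeque<Message>()
    // The first NetworkError is the one raised when the socket is aborted.
    queue.addLast(NetworkError)
    val sizes = mutableListOf<Int>()
    repeat(rounds) {
        repeat(queue.size) {
            when (queue.removeFirst()) {
                // Assumed: reopenWhen returned true, so one start message
                // is re-queued per active subscription.
                is NetworkError -> repeat(activeSubscriptions) { id -> queue.addLast(Command(id)) }
                // Assumed: connectionInit() fails while offline, so each
                // Command produces one more NetworkError.
                is Command -> queue.addLast(NetworkError)
            }
        }
        sizes += queue.size
    }
    return sizes
}

fun main() {
    // prints [5, 5, 25, 25, 125, 125]
    println(simulateOffline(activeSubscriptions = 5, rounds = 6))
}
```

With 5 active subscriptions the backlog multiplies by 5 every two rounds, which matches the 5, 25, 125 sequence above.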
I think there is a need to revisit how the network error is handled. One way could be to dedupe the messages: there is no need to send the same message 125 times when one is all the logic needs to work.
mbonnin
03/27/2023, 6:26 PM
mbonnin
03/27/2023, 6:28 PM
`reopenWhen?.invoke(message.cause, reopenAttemptCount)` is running. It's a single message queue, so if you suspend 60s in `reopenWhen` then no other message can be queued?
mbonnin
03/27/2023, 6:28 PM
Parijat Shah
03/27/2023, 7:02 PM
    activeMessages.values.forEach {
        // Re-queue all start messages
        // This will restart the websocket
        messages.trySend(it)
    }
This runs 5 times since the size of `activeMessages` is 5, and each time a message of type `Command` is sent.
Now you are looking at this message queue: Command, Command, Command, Command, Command
Each of these is handled by creating the protocol and trying to initialize the connection. Since the network is still off, a network exception is thrown:
    messages.send(NetworkError(e))
Now you are looking at the queue
NE, NE, NE, NE, NE
Each of these results in `reopenWhen` suspending for 60 seconds, so after 5 minutes (5 NE × 60 seconds) the queue looks like
Command, Command ... Command (25 messages in total)
Each of these commands results in 1 exception,
so you are now looking at the queue
NE, NE, NE ... NE (25 messages)
Now for each NE, you wait 60 seconds and go through the `activeMessages` loop, resulting in
Command, Command ... Command (125 messages in total)
If the network is still off, this results in
NE, NE ... NE (125 messages)
At this point, let's say the network comes back on after 60 seconds: the `Command` message won't get processed until
all 125 messages of type `NetworkError` are processed.
Each NE requires a 60-second wait, so we are easily looking at 2+ hours after the network is back on before there is an attempt to create the protocol and initialize the connection.
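The waiting times quoted above follow from multiplying the backlog by the delay. A minimal sketch, assuming each `NetworkError` suspends the single message loop for the full `reopenWhen` delay and nothing else takes time (`drainTime`, `pendingNetworkErrors` and `reopenDelay` are hypothetical names for this illustration):

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.seconds

/**
 * Time needed to drain a backlog of NetworkError messages when each one
 * suspends the message loop for reopenDelay.
 */
fun drainTime(pendingNetworkErrors: Int, reopenDelay: Duration): Duration =
    reopenDelay * pendingNetworkErrors

fun main() {
    val delay = 60.seconds
    for (backlog in listOf(5, 25, 125)) {
        // prints 5 min, 25 min and 125 min respectively
        println("$backlog NetworkError messages -> ${drainTime(backlog, delay).inWholeMinutes} min")
    }
}
```

A backlog of 125 messages therefore takes 125 minutes to drain, which is where the "2+ hours" figure comes from.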
This can be easily reproduced by enabling `webSocketReopenWhen` and turning off the network once the subscription starts. You will see the queue size grow rather quickly. If the network is off for a long time, the queue can grow very large.
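One way to keep the queue bounded, along the lines of the deduplication suggested earlier, is to refuse a message when an equal one is already waiting. The sketch below is an illustration only: `DedupingQueue` and the message types are hypothetical and not part of Apollo Kotlin, and whether dropping duplicates is safe in the real transport is an assumption that would need checking against its actual logic.

```kotlin
// Hypothetical message types for illustration; NOT the Apollo Kotlin classes.
sealed interface Message
data class Command(val subscriptionId: Int) : Message
object NetworkError : Message

/** FIFO queue that refuses to enqueue a message equal to one already waiting. */
class DedupingQueue {
    // LinkedHashSet keeps insertion order and rejects duplicates.
    private val pending = LinkedHashSet<Message>()

    val size: Int get() = pending.size

    /** Returns false when an equal message is already queued. */
    fun offer(message: Message): Boolean = pending.add(message)

    /** Removes and returns the oldest message, or null when the queue is empty. */
    fun poll(): Message? {
        val iterator = pending.iterator()
        if (!iterator.hasNext()) return null
        val next = iterator.next()
        iterator.remove()
        return next
    }
}

fun main() {
    val queue = DedupingQueue()
    queue.offer(NetworkError)
    repeat(6) {
        repeat(queue.size) {
            when (queue.poll()) {
                // Re-queue one start message per (assumed) active subscription.
                is NetworkError -> repeat(5) { id -> queue.offer(Command(id)) }
                // Connection attempt fails while offline; duplicates are dropped.
                is Command -> queue.offer(NetworkError)
                null -> Unit
            }
        }
        println(queue.size) // prints 5, 1, 5, 1, 5, 1 over the six rounds
    }
}
```

Under the same offline scenario with 5 subscriptions, the queue never holds more than 5 messages, so a `Command` is retried at most one `reopenWhen` delay after the network returns.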
Parijat Shah
03/27/2023, 7:05 PM
`activeMessages` sooner you will run into the problem.
mbonnin
03/27/2023, 7:05 PM
Parijat Shah
03/27/2023, 7:06 PM
mbonnin
03/27/2023, 7:06 PM
Parijat Shah
03/27/2023, 7:07 PM
mbonnin
03/27/2023, 7:08 PM
Parijat Shah
03/27/2023, 7:09 PM
Parijat Shah
03/27/2023, 7:33 PM