# apollo-kotlin
p
Hello. Is there any way to enable pings in the Apollo Kotlin client for WebSockets? I am running into a problem when using subscriptions while the device goes into sleep mode. Our backend sends a message at a fixed interval, but when the device goes into sleep mode, the network turns off, and eventually our backend gives up without an ack and closes the connection. On the client, since the network is off, the FIN is never received. When the device eventually wakes up, if the client is sending pings, it will get an exception on write, which hopefully will be handled by establishing a new socket connection. Without a ping/exception, I have a stale connection.
m
Depends on your protocol. `GraphQLWsProtocol` has `pingPayload` and `pongPayload` (source).
More generally, I'd recommend closing the subscription when the device goes into deep sleep. If you run your subscriptions in `viewModelScope`, that should be done automatically.
p
Hi Martin. I confirmed that an exception is actually thrown eventually when the device goes into sleep mode. I had wrongly concluded that it was not thrown after finding out that, even with `webSocketReopenWhen`, the subscription never recovers when the device comes out of sleep mode. I looked at the source code of `SubscriptionWsProtocol.kt`, `WsProtocol.kt` & `WebSocketNetworkTransport.kt`, did some digging, and here are my findings.
1. When the device goes into sleep mode, `java.net.SocketException` (software caused connection abort) is thrown in `WsProtocol`, on line 149, in the method `run`.
2. In `WebSocketNetworkTransport.kt`, on line 151, we have a `while(true)` block that consumes messages produced by the `messages` channel.
3. When the exception (`java.net.SocketException`) is thrown, a message of type `NetworkError` is received as expected, and since we have enabled `webSocketReopenWhen`, the following code on line 157 gets executed.
if (reopenWhen?.invoke(message.cause, reopenAttemptCount) == true) {
              reopenAttemptCount++
              activeMessages.values.forEach {
                // Re-queue all start messages
                // This will restart the websocket
                messages.trySend(it)
              }
            }
We have about 5 active subscriptions, so this runs 5 times, sending 5 messages of type `Command`. Each of these messages is handled by trying to create the `protocol` and initialize the `connection` (lines 203 to 215). But since the device is still in sleep mode, an exception is thrown:
try {
              protocol!!.connectionInit()
            } catch (e: Exception) {
              // Error initializing the connection
              protocol = null
              messages.send(NetworkError(e))
              continue
            }
This results in 5 more messages of type `NetworkError` being sent to the channel. At this point, you have 5 messages. At the next iteration (line 152), the next message is received, which we know is of type `NetworkError`, so the same `reopenWhen` on line 157 gets executed. But here just 1 `NetworkError` message now results in 5 messages of type `Command`, so by the time all 5 `NetworkError` messages are received, we have already sent 25 messages of type `Command`. If the device is still in sleep mode, these 25 `Command` messages result in another 25 messages of type `NetworkError`. If you have a 60-second delay in your `reopenWhen` block, just receiving these 25 messages takes 25 minutes, and after that you have 125 messages of type `Command`, which result in another 125 messages of type `NetworkError`.
You can see now that even after the device comes out of sleep mode and the network is available, the `Command` message is buried under `NetworkError` messages. I think there is a need to revisit how the network error is handled; one way could be to dedupe the messages. There is no need to send the same message 125 times when all you need is one for the logic to work.
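To make the dedupe suggestion concrete, here is a minimal sketch of a simplified model of the loop described above. It is not the library's code and not a proposed patch: `simulate`, the `Message` types, and the "skip if already queued" rule are all assumptions made for illustration, and the network is assumed to stay off for the whole run. Each pass drains whatever was in the queue at the start of the pass and records the queue size afterwards.

```kotlin
// Simplified stand-ins for the messages discussed above (hypothetical types,
// not the ones in WebSocketNetworkTransport.kt).
sealed interface Message
data class Command(val id: Int) : Message
object NetworkError : Message

// Models the consumer loop with the network permanently off.
// Returns the queue size after each pass over the queue.
fun simulate(subscriptions: Int, passes: Int, dedupe: Boolean): List<Int> {
    val activeMessages = (1..subscriptions).map { Command(it) }
    val queue = ArrayDeque<Message>()
    queue.addLast(NetworkError) // the initial SocketException
    val sizes = mutableListOf<Int>()
    repeat(passes) {
        val batch = queue.size
        repeat(batch) {
            when (queue.removeFirst()) {
                is NetworkError -> {
                    // reopenWhen returned true: re-queue all start messages
                    activeMessages.forEach { cmd ->
                        if (!dedupe || cmd !in queue) queue.addLast(cmd)
                    }
                }
                is Command -> {
                    // connectionInit() fails because the network is off
                    if (!dedupe || NetworkError !in queue) queue.addLast(NetworkError)
                }
            }
        }
        sizes += queue.size
    }
    return sizes
}

fun main() {
    println(simulate(subscriptions = 5, passes = 6, dedupe = false)) // [5, 5, 25, 25, 125, 125]
    println(simulate(subscriptions = 5, passes = 6, dedupe = true))  // [5, 1, 5, 1, 5, 1]
}
```

In this model the queue without deduplication grows by a factor of 5 every two passes, while the deduplicated queue never holds more than one entry per active subscription plus one error.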
m
Thanks for the thorough review!
What I don't understand is how you continue enqueuing messages while `reopenWhen?.invoke(message.cause, reopenAttemptCount)` is running. It's a single message queue, so if you suspend for 60s in `reopenWhen`, then no other message can be queued?
I know it's not easy with subscriptions but if you have a reproducer, that'd help a ton
p
I think I didn't put it correctly. After the 60-second wait is over,
activeMessages.values.forEach {
                // Re-queue all start messages
                // This will restart the websocket
                messages.trySend(it)
              }

This runs 5 times since the size of `activeMessages` is 5, and a message of type `Command` is sent each time.
Now you are looking at the message queue: Command, Command, Command, Command, Command

Each of these is handled by creating the protocol and trying to initialize the connection. Since the network is still off, a network exception is thrown:

messages.send(NetworkError(e))

Now you are looking at the queue: NE, NE, NE, NE, NE
Each of these results in `reopenWhen` suspending for 60 seconds, so after 5 minutes the queue looks like: Command, Command ... Command (25 messages in total)
Each of these commands results in 1 exception, so you are now looking at the queue: NE, NE, NE ... NE (25 messages)
Now for each NE, you wait 60 seconds and go through the `activeMessages` loop, resulting in: Command, Command ... Command (125 messages in total)
If the network is still off, this results in: NE, NE ... NE (125 messages)

At this point, let's say the network comes back on. The `Command` message won't get processed until the 125 messages of type `NetworkError` are processed. Each NE requires a 60-second wait, so we are easily looking at 2+ hours after the network is back on before there is an attempt to create the protocol and initialize the connection.
This can be easily reproduced by enabling `webSocketReopenWhen` and turning off the network once the subscription starts. You will see the queue size growing rather quickly. If the network is off for a long time, this queue can grow to a large number.
Also, you can't reproduce it with just one subscription. You need more than one; the larger the size of `activeMessages`, the sooner you will run into the problem.
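The arithmetic in this walk-through can be written down as a small worked example. This is only a sketch of the numbers as described in the thread: `pendingErrors` and `drainMinutes` are hypothetical helper names, and it assumes every `NetworkError` costs one full `reopenWhen` delay and that the network stays off for the given number of cycles.

```kotlin
// Number of NetworkError messages queued after `failedCycles` reconnect cycles
// with the network still off: each error re-queues one Command per active
// subscription, and each Command fails with one new error.
fun pendingErrors(subscriptions: Int, failedCycles: Int): Long {
    var count = 1L // the initial SocketException
    repeat(failedCycles) { count *= subscriptions }
    return count
}

// Minutes spent suspended in reopenWhen while draining those errors,
// assuming one `delaySeconds` suspension per queued NetworkError.
fun drainMinutes(subscriptions: Int, failedCycles: Int, delaySeconds: Long = 60): Long =
    pendingErrors(subscriptions, failedCycles) * delaySeconds / 60

fun main() {
    for (subscriptions in listOf(1, 5)) {
        for (cycle in 1..3) {
            println(
                "subscriptions=$subscriptions cycle=$cycle " +
                    "errors=${pendingErrors(subscriptions, cycle)} " +
                    "wait=${drainMinutes(subscriptions, cycle)} min"
            )
        }
    }
}
```

With 5 subscriptions and a 60-second delay, the third cycle leaves 125 errors and about 125 minutes of waiting, which matches the "2+ hours" estimate. With a single subscription the count stays at 1, which is why one subscription does not reproduce the growth.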
m
I'll take a look. Do you mind opening an issue so that we keep track of this?
p
How do I go about it?
p
Will do that. My description of the problem may not be the best. I am not much of a writer. Hope that's OK.
m
You'd be surprised how much more detailed your report is compared to the average bug report 🙂 .
p
Ok. Will file the report then.