# functional
**dave08**: @simon.vergauwen It seems like Ktor uses `Runtime.getRuntime().addShutdownHook(hook)`, is that only SIGINT? It seems like in your video it was clearly using SIGINT, but in my current version it's not...
**simon.vergauwen**: Hey @dave08, I was talking in the context of Kotlin/Native; on the JVM, `ShutdownHook` also works on `SIGTERM` and other termination signals. If you prefer using `addShutdownHook { }` instead of `Resource`, then also be sure to add a `Thread.sleep(30_000)` to take into account the delay for the LoadBalancer/Ingress, and to manually `close` all your resources like `PrometheusMetricRegistry`, `HikariDataSource`, Kafka, etc.
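A minimal sketch of that manual approach, assuming a Ktor 2.x `embeddedServer` with Netty; the `HikariDataSource` setup is just a placeholder for your own resources, and the 30-second sleep mirrors the suggestion above:

```kotlin
import com.zaxxer.hikari.HikariDataSource
import io.ktor.server.engine.embeddedServer
import io.ktor.server.netty.Netty

fun main() {
    // Placeholder resource; substitute your own metrics registry, Kafka clients, etc.
    val dataSource = HikariDataSource().apply { jdbcUrl = "jdbc:postgresql://localhost/app" }

    val server = embeddedServer(Netty, port = 8080) {
        // install plugins / routing here
    }

    Runtime.getRuntime().addShutdownHook(Thread {
        // Give the LoadBalancer/Ingress time to stop routing traffic to this pod.
        Thread.sleep(30_000)
        // Stop accepting new requests, then close resources manually.
        server.stop(1_000, 5_000) // gracePeriodMillis, timeoutMillis
        dataSource.close()
    })

    server.start(wait = true)
}
```

Closing the server before the data source mirrors the reverse-acquisition-order release that `Resource` handles for you automatically.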
**dave08**: And `Resource` doesn't need to sleep? I mean, k8s wouldn't put up the new instance until the old one is down, so for a bunch of instances it would be another 30 sec. per service to wait...
**simon.vergauwen**: The `suspendapp-ktor` integration takes care of that delay, like I showed and explained in the webinar.

> I mean, k8s wouldn't put up the new instance until the old one is down

With `RollingUpdate`s or auto-scaling the problem is that the load balancer will still send requests to your terminating pods, and without the delay that results in `502 Bad Gateway`. You can find the full example with more details, and references to other Kubernetes resources, in the repo: https://github.com/nomisRev/ktor-k8s-zero-downtime

I also updated my other Ktor example this weekend to reflect all these changes and updated the libraries: https://github.com/nomisRev/ktor-arrow-example
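For context, `suspendapp-ktor` usage looks roughly like this (a sketch based on the SuspendApp README; the package names, port, and route are assumptions and the exact `server` signature may differ by version, so check the repo above for the authoritative example):

```kotlin
import arrow.continuations.SuspendApp
import arrow.continuations.ktor.server
import arrow.fx.coroutines.resourceScope
import io.ktor.server.netty.Netty
import io.ktor.server.response.respondText
import io.ktor.server.routing.get
import io.ktor.server.routing.routing
import kotlinx.coroutines.awaitCancellation

fun main() = SuspendApp {
    resourceScope {
        // `server` installs Ktor as a Resource: on SIGTERM/SIGINT it waits before
        // stopping, so the LoadBalancer/Ingress can drain traffic off this pod.
        server(Netty, port = 8080) {
            routing {
                get("/") { call.respondText("Hello") }
            }
        }
        // Keep the app alive until a termination signal cancels the SuspendApp scope.
        awaitCancellation()
    }
}
```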
**dave08**: And if I use their `application.environment.monitor.subscribe(ApplicationStopped)` hook + that addShutdownHook, would I need that wait?

> the load balancer will still send requests to your terminating pods

I thought the readinessProbe takes care of that? I'm surprised k8s isn't smart enough to stop sending requests to terminating pods... thanks for that piece of knowledge!
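For reference, that Ktor hook (2.x style) looks roughly like this; `closeables` is a hypothetical stand-in for whatever you need to release, and subscribing to `ApplicationStopped` only runs cleanup, it does not by itself add the LoadBalancer/Ingress delay discussed above:

```kotlin
import io.ktor.server.application.Application
import io.ktor.server.application.ApplicationStopped

// Hypothetical list of resources to release on shutdown (DataSource, metrics registry, ...).
fun Application.configureShutdown(closeables: List<AutoCloseable>) {
    environment.monitor.subscribe(ApplicationStopped) {
        closeables.forEach { it.close() }
    }
}
```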
**simon.vergauwen**: Sadly it's not, but k8s isn't built for networking applications specifically. It's a higher-level abstraction for distributed systems, although networking is probably its most common use in practice. With any sufficiently large system that combines many different concerns there are sadly cross-cutting issues: network <~> distributed computation.

> I thought the readinessProbe takes care of that?

The readiness probe is used for start-up, not shutdown. The health probe isn't sufficient for this concern either, at least not in my experience in practice, and all the resources I've come across and the open issues I linked in the repo seem to confirm that.
**dave08**: We have Istio on our cluster, did you see the same problem with Istio?
**simon.vergauwen**: I'm not very familiar with Istio, but it could potentially fix this issue. Another popular alternative to sleeping/delaying in the application is adding a `bin/sleep` in the `preStop` hook of the K8s yaml files, which delays the `SIGTERM` going to the pod instead of delaying the `SIGTERM` inside the pod. Doing a quick search on their GitHub, it seems there are some open & closed issues related to this on Istio. Some were closed in December 2022, so it's possibly fixed. If you see `502 Bad Gateway` around the times you're doing rolling updates / up-and-down scaling, then it's probably not fixed. That might be a good indicator to keep an eye on in your metrics system.