# github-workflows-kt
v
Was there an update to the bindings server yesterday? A workflow failed yesterday due to the server: https://github.com/spockframework/spock/actions/runs/13081698170/job/36506444705#step:3:7

If it was due to an update, can we maybe introduce a less disruptive update strategy, like first starting the new server and only ditching the old one once the new one is fully operational, or something similar? If it was not, what happened? :-)

cc: @LeoColman
l
Yes, there was. A kernel upgrade was applied at around 19h BRT. I see your failure was around 19:17 BRT, so I believe that must be the cause.
> If it was due to an update, can we maybe introduce a less disruptive update strategy, like first starting the new server and only ditching the old one once the new one is fully operational, or something similar?
I don't really know how to handle this scenario where a machine reboot is necessary. It's manual, so I can choose the time, but we have a single machine only. Do you have any suggestions?
An easy thing I can do is delay updates and apply them once a month or so, but the disruption would still happen.
v
Ah, ok, then it is hard without a second machine, yes. What I fear is that if it gets even more widely used and such interruptions affect many projects, it might become bad for the reputation, or people at least start switching off the consistency checks. My idea was about only a new software deployment, not a kernel update. :-)
l
The solution is probably having a load balancer that is able to point to a second machine when the first one goes down, but we don't have that kind of resources yet.
The perfect solution for the moment would be a way to say "Hey, wait 20 seconds and try again" until the app is up again.
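Since the thread doesn't show how the consistency check contacts the server, here is a minimal client-side sketch of that "wait and try again" idea in Kotlin, assuming the check fetches bindings over plain HTTP; the function name, attempt count, and delay are illustrative, not part of the project:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical helper: retry an HTTP GET a few times with a fixed delay,
// so a short server restart doesn't fail the whole consistency check.
fun fetchWithRetry(url: String, attempts: Int = 5, delayMillis: Long = 20_000): String {
    val client = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder(URI.create(url)).GET().build()
    var lastError: Exception? = null
    repeat(attempts) { attempt ->
        try {
            val response = client.send(request, HttpResponse.BodyHandlers.ofString())
            if (response.statusCode() in 200..299) return response.body()
            lastError = RuntimeException("HTTP ${response.statusCode()} from $url")
        } catch (e: java.io.IOException) {
            lastError = e // e.g. connection refused while the server restarts
        }
        if (attempt < attempts - 1) Thread.sleep(delayMillis)
    }
    throw IllegalStateException("Giving up on $url after $attempts attempts", lastError)
}
```

If the check shells out to curl instead, curl's built-in `--retry`, `--retry-delay`, and `--retry-connrefused` flags achieve the same effect without custom code.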
o
Hi, sorry for intervening, but have you considered moving the server to a cloud provider? Many of them provide a free tier that might be sufficient for the server (at least for the current load): https://github.com/cloudcommunity/Cloud-Free-Tier-Comparison It should definitely help with resolving unexpected interruptions and smoothing updates.
l
No problem on the intervention! Opinions are very welcome 🙂

No, that wasn't on our minds. We currently host on bare metal on Hetzner. The problem with free tiers is that they're not flexible and sometimes end without warning. We also have a bit more infrastructure (e.g. Grafana and Prometheus) that probably wouldn't fit on a small machine that easily.

That said, a secondary server behind a load balancer, running just the application, would solve this problem very well. A free-tier machine could host the load balancer + the backup server.
p
I actually did consider the cloud, but a seemingly inherent issue there is that there's no way to set an upper limit on spending once the free tier is exhausted. I'm afraid that e.g. some bug or very high traffic may lead to excessive logging, and logging costs. If we find a cloud solution that is indeed free (or with a deterministic upper cost), I'm happy to consider it.
Another problem is that with the current naive caching, more than one instance would cause some cache misses, which would increase the time of the consistency check and of refreshing the dependencies during development. This, however, could be addressed somehow, and is less of an issue than the cloud costs.
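One way the cache-miss problem could be addressed, sketched below in Kotlin: route each request deterministically by hashing the requested binding, so every binding always lands on the same instance and that instance's cache stays warm. The backend URLs and the coordinate string are placeholders, not the project's actual setup:

```kotlin
// Hypothetical sketch: pin each binding to one backend by hashing its
// coordinate, so per-instance caches keep their hit rate even with
// several servers behind a load balancer.
val backends = listOf(
    "https://bindings-primary.example.org",
    "https://bindings-backup.example.org",
)

fun backendFor(bindingCoordinate: String): String {
    // floorMod keeps the index non-negative for negative hash codes.
    val index = Math.floorMod(bindingCoordinate.hashCode(), backends.size)
    return backends[index]
}

fun main() {
    // Always resolves to the same backend for the same binding.
    println(backendFor("actions/checkout@v4"))
}
```

Plain modulo routing reshuffles most keys whenever the backend list changes, which is why real load balancers use consistent hashing for this; for a fixed pair of machines, though, the simple version already avoids most misses.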
l
For completeness' sake:
> My idea was about only a new software deployment, not a kernel update. 🙂
Software deployments are already rolled out with 2 pods up: the rollout waits for the new pod to be fully up before shutting down the old one.
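The thread doesn't include the actual manifest, but a rolling update with one surge pod and zero unavailable pods is the standard Kubernetes way to get this behaviour. A sketch using the fabric8 Kubernetes client DSL in Kotlin, with the deployment name, image, port, and probe path all invented for illustration:

```kotlin
import io.fabric8.kubernetes.api.model.IntOrString
import io.fabric8.kubernetes.api.model.apps.Deployment
import io.fabric8.kubernetes.api.model.apps.DeploymentBuilder

// Hypothetical deployment: maxSurge = 1 brings the new pod up next to the
// old one, and maxUnavailable = 0 forbids killing the old pod until the
// new one passes its readiness probe.
fun bindingsServerDeployment(): Deployment = DeploymentBuilder()
    .withNewMetadata().withName("bindings-server").endMetadata()
    .withNewSpec()
        .withReplicas(1)
        .withNewSelector().addToMatchLabels("app", "bindings-server").endSelector()
        .withNewStrategy()
            .withType("RollingUpdate")
            .withNewRollingUpdate()
                .withMaxSurge(IntOrString(1))
                .withMaxUnavailable(IntOrString(0))
            .endRollingUpdate()
        .endStrategy()
        .withNewTemplate()
            .withNewMetadata().addToLabels("app", "bindings-server").endMetadata()
            .withNewSpec()
                .addNewContainer()
                    .withName("server")
                    .withImage("ghcr.io/example/bindings-server:latest") // placeholder
                    .withNewReadinessProbe()
                        .withNewHttpGet()
                            .withPath("/status") // hypothetical health endpoint
                            .withPort(IntOrString(8080))
                        .endHttpGet()
                    .endReadinessProbe()
                .endContainer()
            .endSpec()
        .endTemplate()
    .endSpec()
    .build()
```

The same idea works outside Kubernetes too: start the new process on a spare port, health-check it, switch the proxy over, then stop the old one.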