thanks < Vampire> for noticing and reporting a problem with kotlinlang #github-workflows-kt

Piotr Krzemiński

03/25/2025, 1:02 PM

thanks @Vampire for noticing and reporting a problem with Personal Access Tokens not being able to list versions for some actions (ref), and @LeoColman for driving a fix by adding a feature to authorize as a GitHub app where this problem doesn't exist 🎉 the power of community and open-source success baby

🦜 1

👌 1

Vampire

04/01/2025, 3:17 AM

Is something borked with the GitHub App? Errorcount increases and my workflows are failing with 500 for the maven-metadata.xml 😞 (cc @LeoColman)

Vampire

04/01/2025, 3:18 AM

For example

<https://bindings.krzeminski.it/Wandalen/wretry.action__main___major/maven-metadata.xml>

Piotr Krzemiński

04/01/2025, 4:47 AM

in the logs I see only:

2025-04-01 04:30:17,005 <INFO > <eventLoopGroupProxy-4-5            > <[]> <{request-id=y585sbwpwgngn8t}> <                        io.ktor.server.Application> 500 Internal Server Error: GET - /Wandalen/wretry.action__main___major/maven-metadata.xml in 19799ms

no stack trace, very long latency...

Piotr Krzemiński

04/01/2025, 4:47 AM

ideas for next steps:
• check if the token is fine, e.g. isn't marked as invalidated in GitHub's UI or something
• try to reproduce locally

Piotr Krzemiński

04/01/2025, 4:48 AM

unfortunately I won't be able to take care of this today, so all hope is in Leo

Piotr Krzemiński

04/01/2025, 4:49 AM

@Vampire can you try useLocalBindingsServerAsFallback to unblock your workflows? (you'll have to regenerate the YAMLs)

Piotr Krzemiński

04/01/2025, 6:29 AM

Both private keys (what I called "token" previously) look healthy

Vampire

04/01/2025, 6:55 AM

I will not be able to regenerate, unless I use a local server with a working token. And the option wouldn't help anyway, unless I provide it with a token.

Vampire

04/01/2025, 6:56 AM

You should probably document that

Vampire

04/01/2025, 6:57 AM

Btw. with a locally started server with token it worked.

Piotr Krzemiński

04/01/2025, 7:24 AM

> You should probably document that

yes, I'm planning to create a new page in the docs, devoted to the server, including what to do if it malfunctions

Vampire

04/01/2025, 7:31 AM

I meant documenting that you cannot use it if dynamic versions (anything else) are needed unless a token is provided that works with all used actions.

👍 1

Vampire

04/01/2025, 7:45 AM

Maybe the cached access token is expired somehow? Or maybe it has to do with summer-time switch?

Vampire

04/01/2025, 7:46 AM

But at 3:25 AM GMT+2 it still worked while at least since 4:41 AM GMT+2 it is broken
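
Editor's sketch of the expiry hypothesis above: GitHub App installation access tokens expire after one hour, so a cache has to refetch them in time. The Kotlin below is a minimal illustration, not the server's actual code. `CachedToken`, `TokenCache`, and the five-minute safety margin are hypothetical names and values. Comparing `Instant` values keeps the check independent of time zones and summer-time switches.

```kotlin
import java.time.Clock
import java.time.Duration
import java.time.Instant

// Hypothetical holder for a token and its absolute expiry time.
data class CachedToken(val value: String, val expiresAt: Instant)

// Hypothetical cache: reuses the token until shortly before it expires.
class TokenCache(
    private val clock: Clock = Clock.systemUTC(),
    private val safetyMargin: Duration = Duration.ofMinutes(5),
    private val fetch: () -> CachedToken,
) {
    private var cached: CachedToken? = null

    @Synchronized
    fun get(): String {
        val current = cached
        val now = clock.instant()
        // Instants are absolute points in time, so DST changes do not matter here.
        if (current != null && now.isBefore(current.expiresAt.minus(safetyMargin))) {
            return current.value
        }
        return fetch().also { cached = it }.value
    }
}
```

If the real cache already behaves like this, the expiry theory can be ruled out quickly.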

Vampire

04/01/2025, 8:05 AM

Btw. the client id should probably also be configured via env variable, that makes it easier to run an own instance with own GitHub App

👍 1

Vampire

04/01/2025, 8:11 AM

And probably the installation id whatever that is

👍 1
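
Editor's sketch of the two suggestions above: a minimal Kotlin example of reading the GitHub App settings from environment variables. `APP_PRIVATE_KEY` is mentioned later in this thread. `APP_CLIENT_ID` and `APP_INSTALLATION_ID` are hypothetical names chosen for illustration, as are `GitHubAppConfig` and `loadGitHubAppConfig`.

```kotlin
// Hypothetical names: APP_CLIENT_ID and APP_INSTALLATION_ID are illustrative,
// not necessarily the variables the server reads.
data class GitHubAppConfig(
    val privateKey: String,
    val clientId: String,
    val installationId: String,
)

fun loadGitHubAppConfig(env: Map<String, String> = System.getenv()): GitHubAppConfig {
    // Fails fast with the name of the missing variable.
    fun required(name: String): String =
        env[name]?.takeIf { it.isNotBlank() }
            ?: error("Missing environment variable $name")

    return GitHubAppConfig(
        privateKey = required("APP_PRIVATE_KEY"),
        clientId = required("APP_CLIENT_ID"),
        installationId = required("APP_INSTALLATION_ID"),
    )
}
```

Passing the environment as a map keeps the function testable and lets someone run their own instance with their own GitHub App without code changes.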

Vampire

04/01/2025, 8:33 AM

2025-04-01 08:31:25,912 <TRACE> <eventLoopGroupProxy-4-8            > <[]> <{request-id=okucrz5scomwqmh}> <          io.ktor.client.plugins.HttpCallValidator> Processing exception java.io.EOFException: Failed to parse HTTP response: the server prematurely closed the connection for request <https://api.github.com/app/installations/62885502/access_tokens>
2025-04-01 08:31:25,913 <DEBUG> <eventLoopGroupProxy-4-8            > <[]> <{request-id=okucrz5scomwqmh}> <                        io.ktor.server.Application> Unhandled: GET - /Wandalen/wretry.action__main___major/maven-metadata.xml. Exception class java.io.EOFException: Failed to parse HTTP response: the server prematurely closed the connection
java.io.EOFException: Failed to parse HTTP response: the server prematurely closed the connection
	at io.ktor.client.engine.cio.UtilsKt$readResponse$2.invokeSuspend(utils.kt:172) ~[ktor-client-cio-jvm-3.1.1.jar:3.1.1]
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[kotlin-stdlib-2.1.20.jar:2.1.20-release-217]
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:113) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
2025-04-01 08:31:25,923 <INFO > <eventLoopGroupProxy-4-8            > <[]> <{request-id=okucrz5scomwqmh}> <                        io.ktor.server.Application> 500 Internal Server Error: GET - /Wandalen/wretry.action__main___major/maven-metadata.xml in 20016ms

Vampire

04/01/2025, 8:35 AM

So it fails to get an access token after 20 seconds

Piotr Krzemiński

04/01/2025, 8:38 AM

interesting, why did it suddenly stop working...? 🤔 maybe something wrong with egress (networking)?

Piotr Krzemiński

04/01/2025, 8:39 AM

but on the other hand, fetching actions' manifests works. Maybe it's a different endpoint

Piotr Krzemiński

04/01/2025, 8:40 AM

did you do anything special to get the stack trace, or did I miss it in the logs?

Vampire

04/01/2025, 8:40 AM

Look at the log level 🙂

Vampire

04/01/2025, 8:40 AM

By default the server is configured to log INFO+

Piotr Krzemiński

04/01/2025, 8:40 AM

oh... 💡

Piotr Krzemiński

04/01/2025, 8:40 AM

we should be able to dynamically change the log level

Vampire

04/01/2025, 8:41 AM

I did

Piotr Krzemiński

04/01/2025, 8:41 AM

it's possible already with the Log4j config: https://github.com/typesafegithub/github-workflows-kt/blob/caca02b0800944d3acb643ecf880f57036ce12ed/jit-binding-server/src/main/resources/log4j2.xml#L12

Vampire

04/01/2025, 8:41 AM

Exactly

Piotr Krzemiński

04/01/2025, 8:41 AM

I just didn't think of it in the morning, TIL

Vampire

04/01/2025, 8:41 AM

The way I configured the logging, the config file is not in the jar but separate and watched for changes

👍 1

Vampire

04/01/2025, 8:42 AM

So I just opened the console to the docker container and replaced INFO with ALL in the file

👍 1

Vampire

04/01/2025, 8:42 AM

Alternatively you can also set the `LOG_LEVEL` env variable and restart the container, but by editing the file you do not have a service interruption

Vampire

04/01/2025, 8:43 AM

And do not lose the state that might be part of the problem

Piotr Krzemiński

04/01/2025, 9:10 AM

but separate and watched for changes

thanks, valuable info!

Vampire

04/01/2025, 9:34 AM

Network-wise it seems to work, I just did the access token request from the docker container using curl and it worked

Piotr Krzemiński

04/01/2025, 9:36 AM

I meant networking on Leo's cluster, maybe DNS broke or something?

Vampire

04/01/2025, 9:37 AM

I did it from the actual bindings server docker container that is right now running the bindings server

Piotr Krzemiński

04/01/2025, 9:37 AM

ah! got it

Vampire

04/01/2025, 9:38 AM

It does work fine when I run the server locally with the app private key

🤔 1

Piotr Krzemiński

04/01/2025, 9:39 AM

to be checked if it stopped working upon deployment, but thinking quickly, I think the timeline doesn't match

Vampire

04/01/2025, 9:39 AM

According to Grafana uptime is 5.75 days, so I'd say no

Vampire

04/01/2025, 9:40 AM

The docker container even says uptime 32 days 😕

Piotr Krzemiński

04/01/2025, 9:40 AM

the latest deployment was on Sunday: https://github.com/typesafegithub/github-workflows-kt/actions/runs/14151082606, so I think Grafana is lying

Piotr Krzemiński

04/01/2025, 9:41 AM

unless this step failed silently: https://github.com/typesafegithub/github-workflows-kt/blob/db1830e782e74cb9e181a1247f6273d4784e1e2a/.github/workflows/bindings-server.main.kts#L171

Vampire

04/01/2025, 9:41 AM

I think the docker container is lying with 32 days, but the container also says "last update" was "2025-03-26 16:43:07" which matches what Grafana says

Piotr Krzemiński

04/01/2025, 9:42 AM

if so, then the question is why didn't the Sunday's deployment perform the actual image refresh

Piotr Krzemiński

04/01/2025, 9:42 AM

a few questions to answer here 🙂

Vampire

04/01/2025, 9:42 AM

definitely

Piotr Krzemiński

04/01/2025, 9:43 AM

https://hub.docker.com/r/krzema12/github-workflows-kt-jit-binding-server/tags

Vampire

04/01/2025, 9:47 AM

Yeah, no, that is not the image used. It uses `sha256:a5b10be062341cb9bdbe2abbf3026145ca58fca5e09df4d29977978fef24459f`

Vampire

04/01/2025, 9:47 AM

What is `TRIGGER_IMAGE_PULL`?

Piotr Krzemiński

04/01/2025, 9:49 AM

A web hook Leo set up a long time ago, and it's stored in a secret: https://github.com/typesafegithub/github-workflows-kt/blob/db1830e782e74cb9e181a1247f6273d4784e1e2a/.github/workflows/bindings-server.main.kts#L33

Piotr Krzemiński

04/01/2025, 9:49 AM

Maybe the web hook is gone, but then why doesn't the request fail? Maybe by default the exit code is 0...

Vampire

04/01/2025, 9:50 AM

Yeah, but what is the value? 🙂

👍 1

Vampire

04/01/2025, 9:50 AM

Not for public channel of course

Piotr Krzemiński

04/01/2025, 9:53 AM

I'm afraid GitHub doesn't let me view it, I'd have to hack around to get it in a workflow

Piotr Krzemiński

04/01/2025, 9:53 AM

It just lets me set a new value

Vampire

04/01/2025, 9:54 AM

Yeah, it's secret. 😄 And GitHub will also not print it but mask it if you try to print it. You would need to use tmate and then get it via SSH or something like that.

Vampire

04/01/2025, 9:57 AM

But anyway I don't see any changes since that image that should change anything, just minor library updates

👍 1

Vampire

04/01/2025, 9:59 AM

I'll restart the server now and see whether that at least fixes it for now, agreed?

Piotr Krzemiński

04/01/2025, 10:00 AM

sure

Vampire

04/01/2025, 10:02 AM

Yes, after restart it works again now, even without re-pull

Piotr Krzemiński

04/01/2025, 10:06 AM

🤔 very weird. Thanks for mitigating the problem!

Piotr Krzemiński

04/01/2025, 10:06 AM

maybe Leo will know something more about this problem

Vampire

04/01/2025, 10:08 AM

Regarding the "pull trigger", both https://github.com/typesafegithub/github-workflows-kt/actions/runs/14151082606/job/39644397189 and https://github.com/typesafegithub/github-workflows-kt/actions/runs/14059892760/job/39368035063 (last manual run by you and scheduled run) show

Vampire

04/01/2025, 10:09 AM

So the `curl` per se is successful, it just did not do what it should have done. I wonder why the manual run then actually triggered the service to be updated, unless Leo did it manually.

Piotr Krzemiński

04/01/2025, 10:11 AM

thanks for noticing the response 🤦‍♂️ I didn't think of it

Piotr Krzemiński

04/01/2025, 10:12 AM

it hasn't been working for months

Vampire

04/01/2025, 10:12 AM

https://docs.portainer.io/user/docker/containers/webhooks says webhooks are only for portainer business edition. Maybe he had some evaluation period where it worked but is not anymore as it is the community edition?

💡 1

Piotr Krzemiński

04/01/2025, 10:12 AM

I'll add an assertion that the call was successful, and ask Leo to create the webhook

👌 1
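
Editor's sketch of the assertion mentioned above: `curl` exits with code 0 on HTTP error responses unless told otherwise (for example with `--fail`), so the status code has to be checked explicitly. The Kotlin below shows the same check with the JDK HTTP client. `triggerImagePull` and `requireWebhookAccepted` are hypothetical helpers, not code from the workflow.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Throws IllegalStateException if the webhook did not answer with a 2xx status.
fun requireWebhookAccepted(statusCode: Int, body: String) {
    check(statusCode in 200..299) {
        "Deployment webhook returned HTTP $statusCode: $body"
    }
}

// Hypothetical helper: POSTs to the webhook URL and fails loudly on errors.
fun triggerImagePull(webhookUrl: String) {
    val request = HttpRequest.newBuilder(URI.create(webhookUrl))
        .POST(HttpRequest.BodyPublishers.noBody())
        .build()
    val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
    requireWebhookAccepted(response.statusCode(), response.body())
}
```

A 2xx status only proves the webhook accepted the call. Whether the deployment actually happened needs a separate check.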

Vampire

04/01/2025, 10:17 AM

Ah, it was probably this stack webhook:

Vampire

04/01/2025, 10:18 AM

Hm, no, that webhook gives a different message

{
  "details": "Object not found inside the database",
  "message": "Unable to find the stack by webhook ID"
}

Vampire

04/01/2025, 10:24 AM

Ah, it probably updated last week because of Leo updating https://github.com/LeoColman/MyStack/commit/a7e847b151061ad96fce0ec690b3bbc4d473ad88

Vampire

04/01/2025, 10:33 AM

So we have:
• externalize client id
• externalize installation id
• better error logging if access token retrieval fails after 20 seconds (currently only a stacktrace on DEBUG level)
  ◦ Use https://hub4j.github.io/github-api/ for talking to GitHub API for increased reliability and built-in error-handling
• fail on failing auto-deployment
• fix auto-deployment
• hopefully find out why access token retrieval failed unless restarted and how to fix it (probably with GH API error handling there is more and important information i.e. the error message by GH)
• fall back to PAT if authenticating with the GitHub app fails + record a metric for such events
• anything else?

👍 1

Piotr Krzemiński

04/01/2025, 10:36 AM

I'd add a better error message logged together with the HTTP 500, right now it was just

500 Internal Server Error: GET - /Wandalen/wretry.action__main___major/maven-metadata.xml in 19799ms

and even without the stack trace, I'd expect more info

Vampire

04/01/2025, 10:43 AM

I'm also surprised that even with log level `ALL` there was not that much information, nothing about what ktor tried to do, only the `EOFException` at `TRACE` and the stacktrace at `DEBUG`. I would have expected some more logging.

Vampire

04/01/2025, 10:44 AM

But besides that, the missing details about the problem are probably the same issue as recently with the PAT problem.

Vampire

04/01/2025, 10:44 AM

You try to parse the response as access token response but you get some error response that fails to parse and thus gives that exception

👍 1
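
Editor's sketch of the point above: check the HTTP status before treating the body as an access token response, so that GitHub's own error text reaches the logs. This is a minimal Kotlin illustration. `ApiResponse`, `GitHubApiException`, and `extractAccessToken` are hypothetical names, and the regex stands in for a real JSON parser.

```kotlin
// Minimal stand-in for an HTTP response; a real client type would be used instead.
data class ApiResponse(val status: Int, val body: String)

class GitHubApiException(message: String) : RuntimeException(message)

// Crude extraction for illustration only; a JSON library would normally do this.
val tokenPattern = Regex("\"token\"\\s*:\\s*\"([^\"]+)\"")

fun extractAccessToken(response: ApiResponse): String {
    if (response.status !in 200..299) {
        // Surface the error body instead of failing later with a parse error.
        throw GitHubApiException(
            "Access token request failed with HTTP ${response.status}: ${response.body}"
        )
    }
    return tokenPattern.find(response.body)?.groupValues?.get(1)
        ?: throw GitHubApiException("No token field in response: ${response.body}")
}
```

A dedicated GitHub API library would typically handle this, which is the direction suggested in this thread.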

Vampire

04/01/2025, 10:48 AM

Probably the same can also happen at the other two `body` calls where you get tagger from tag or author from commit

👍 1

Piotr Krzemiński

04/01/2025, 10:50 AM

it's this kind of code that is easy to write the happy path handling for, and easy to forget about the "what if" path 😄

Vampire

04/01/2025, 10:51 AM

Yeah, better to use a GitHub API wrapper that does proper error handling 😄 😛

Piotr Krzemiński

04/01/2025, 10:53 AM

hmm, I cannot recall if I didn't use the library because no decent one/official was available, or I thought I'll quickly code it on my own... now I see https://hub4j.github.io/github-api/, maybe there's something Kotlin-specific? If you have experience with a specific lib, let me know!

Vampire

04/01/2025, 10:54 AM

With the PAT-problem you said you just didn't want to use a big lib for "just one call", which obviously it is not. 😄

Vampire

04/01/2025, 10:54 AM

That's the one I typically used for talking to GitHub API too

Piotr Krzemiński

04/01/2025, 10:55 AM

I obviously missed the moment when the project grew sufficiently to justify adding a dependency on the lib 😛

Piotr Krzemiński

04/01/2025, 10:55 AM

which I'm partially happy about

Piotr Krzemiński

04/01/2025, 10:57 AM

could you edit your list of action items to add using the lib?

👌 1

Vampire

04/01/2025, 10:57 AM

Well, I usually use the `net.wooga.github` Gradle plugin which uses that API under the hood and exposes it for usage in custom tasks

👍 1

Vampire

04/01/2025, 10:57 AM

https://github.com/Vampire/setup-wsl/blob/master/gradle/build-logic/src/main/kotlin/net/kautler/publishing.gradle.kts#L230

LeoColman

04/01/2025, 11:41 AM

I'll read this with more details later today, but @Piotr Krzemiński briefed me of what is going on and I'm up-to-date 🙂

👍 1

👌 1

Piotr Krzemiński

04/01/2025, 11:42 AM

and we figured out one more action item: fall back to PAT if authenticating with the GitHub app fails + record a metric for such events
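
Editor's sketch of this action item: try the GitHub App token first, fall back to the personal access token on failure, and count each fallback. This is a minimal Kotlin illustration. `resolveGitHubToken` and `patFallbackCount` are hypothetical, and the counter stands in for whatever metrics registry the server uses.

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Counts fallbacks; a real server would feed a metrics registry instead.
val patFallbackCount = AtomicLong()

fun resolveGitHubToken(
    fetchAppToken: () -> String,
    personalAccessToken: String?,
): String =
    try {
        fetchAppToken()
    } catch (e: Exception) {
        // Without a usable PAT there is nothing to fall back to.
        val pat = personalAccessToken?.takeIf { it.isNotBlank() }
            ?: throw IllegalStateException("GitHub App auth failed and no PAT is configured", e)
        patFallbackCount.incrementAndGet()
        System.err.println("WARN: GitHub App auth failed, falling back to PAT: ${e.message}")
        pat
    }
```

With a PAT, listing versions for some actions may not work, so the fallback degrades service rather than fully restoring it.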

Vampire

04/01/2025, 1:16 PM

added above

👍 1

LeoColman

04/01/2025, 1:18 PM

We fixed the webhook for deploy. The previous one was offline, but a new docker image made the server fetch in 5 minutes and deploy the new image, so deploy was working. Now it's working as intended

Vampire

04/01/2025, 1:19 PM

Where is that webhook defined?

LeoColman

04/01/2025, 1:20 PM

Github Actions Secrets, I think. @Piotr Krzemiński has access

LeoColman

04/01/2025, 1:20 PM

It's generated by portainer at the host

Vampire

04/01/2025, 1:20 PM

I meant the webhook side, not the GHA side

Vampire

04/01/2025, 1:21 PM

Where in portainer? I only found one on the stack that seems to only work with business edition

LeoColman

04/01/2025, 1:22 PM

I only found one on the stack that seems to only work with business edition

That's the one. I don't really know what it means with the toggleable features, but we tested and the deploy worked

Piotr Krzemiński

04/01/2025, 1:23 PM

to be precise: the webhook was called successfully, but I haven't actually checked if the request was honored by Portainer

🤔 1

Vampire

04/01/2025, 1:24 PM

Really? Interesting. I copied that link and sent it a POST request but it said the hook was not found, but also with a different detail message than the workflow showed

Vampire

04/01/2025, 1:24 PM

Now the call to the webhook indeed is successful. What did you do to make the webhook work again?

LeoColman

04/01/2025, 1:25 PM

Just moved from polling to webhook 😂

Piotr Krzemiński

04/01/2025, 1:26 PM

I got a new webhook URL from Leo and put it under the secret in GH

Vampire

04/01/2025, 1:27 PM

That is not an alternative, is it? I understood that it polls and additionally you can trigger the webhook.

Vampire

04/01/2025, 1:27 PM

Also, triggering the webhook does not do anything.

LeoColman

04/01/2025, 1:27 PM

I understood that it polls and additionally you can trigger the webhook.

Hmm.. Maybe, I don't really know

LeoColman

04/01/2025, 1:27 PM

to be precise: the webhook was called successfully, but I haven't actually checked if the request was honored by Portainer

It didn't honor. It's still on an old image

Vampire

04/01/2025, 1:27 PM

I just triggered the webhook "successfully", but the container is still the one I restarted at 12:00 CEST

Vampire

04/01/2025, 1:28 PM

Probably because the webhook just gets the compose file from your repo and the web ui says if there was a change it would apply it

Vampire

04/01/2025, 1:28 PM

But there was no change

Vampire

04/01/2025, 1:28 PM

To work as intended the re-pull image would need to work and that is a business-edition-only feature

LeoColman

04/01/2025, 1:28 PM

So we'd need the repull from business edition anyway?

LeoColman

04/01/2025, 1:28 PM

Damn. Let me investigate further, but I think you're right. I'll see if there's a workaround without having an explicit version

Vampire

04/01/2025, 1:28 PM

unless you invent something else somehow that triggers it

Vampire

04/01/2025, 1:30 PM

We could for example add a "shutdown" api to the server so that it quits and then portainer will hopefully start a new container, but it would probably still not re-pull

LeoColman

04/01/2025, 1:30 PM

Could use watchdog

Vampire

04/01/2025, 1:31 PM

you would probably need to somehow make a pull from inside a container, but that does probably not work

LeoColman

04/01/2025, 1:32 PM

https://github.com/containrrr/watchtower I meant this guy

LeoColman

04/01/2025, 1:32 PM

Not watchdog 😂

Vampire

04/01/2025, 1:32 PM

You would probably need to make a commit to https://github.com/LeoColman/MyStack/blob/main/github-workflows-kt/docker-compose.yml that changes it with some UUID so that there is a change and then trigger the webhook

LeoColman

04/01/2025, 1:33 PM

that changes it with some UUID

Could even be the version tag

Vampire

04/01/2025, 1:33 PM

We do not recommend using Watchtower in a commercial or production environment

😄

Vampire

04/01/2025, 1:34 PM

But yeah, that watchtower is basically what I meant. You map the docker socket from the host into a container, so that the container can then tell the docker daemon to pull the image.

LeoColman

04/01/2025, 1:35 PM

It shouldn't be used in production because the alternative is better (using the right tool for the job), but our alternative is to implement it by hand, which I think is worse

🤷‍♂️ 1

LeoColman

04/01/2025, 1:37 PM

Maybe not feasible on the short term, but do you know if moving from Portainer to K8s is something feasible? I don't have that hard love for Portainer, and this would eventually solve many problems on all my stacks hahah

Vampire

04/01/2025, 1:47 PM

I used neither so far, sorry

Piotr Krzemiński

04/01/2025, 1:48 PM

I personally have bad experience with k8s, maybe because I haven't had an opportunity to learn it properly. For me, it makes simple things overly complex. But if you know it and manage it, and the service works, I'm fine with it 😄

LeoColman

04/02/2025, 12:16 PM

Watchtower enabled for our container only. Let's see how it goes. Updates should happen as soon as the docker image is created in dockerhub and the webhook is no longer necessary (probably worked only as a placebo anyways)

❤️ 2

Piotr Krzemiński

04/02/2025, 1:07 PM

ok, I'll remove calling the webhook, thanks!

👌 1

LeoColman

04/02/2025, 11:43 PM

time="2025-04-02T23:43:21Z" level=warning msg="Could not do a head request for \"krzema12/github-workflows-kt-jit-binding-server:latest@sha256:4d302b1122bed8f58bcf56f4bbe6fa27517d9f1d653cb76feaf626c4439b06b8\", falling back to regular pull." container=/github-workflows-kt_github-workflows-kt.1.xojfjx7xhyho6xdwu6ikur6f0 image="krzema12/github-workflows-kt-jit-binding-server:latest@sha256:4d302b1122bed8f58bcf56f4bbe6fa27517d9f1d653cb76feaf626c4439b06b8"

time="2025-04-02T23:43:21Z" level=warning msg="Reason: Parsed container image ref has no tag: docker.io/krzema12/github-workflows-kt-jit-binding-server@sha256:4d302b1122bed8f58bcf56f4bbe6fa27517d9f1d653cb76feaf626c4439b06b8" container=/github-workflows-kt_github-workflows-kt.1.xojfjx7xhyho6xdwu6ikur6f0 image="krzema12/github-workflows-kt-jit-binding-server:latest@sha256:4d302b1122bed8f58bcf56f4bbe6fa27517d9f1d653cb76feaf626c4439b06b8"

time="2025-04-02T23:43:22Z" level=info msg="Session done" Failed=0 Scanned=1 Updated=0 notify=no

Some problems with watchtower

😭 2

Piotr Krzemiński

04/03/2025, 6:19 AM

@LeoColman getting failures when listing versions (e.g. https://bindings.krzeminski.it/actions/checkout/maven-metadata.xml) logs:

```
java.lang.IllegalStateException: Missing environment variables for generating an auth token. There are two options:

1. Create a personal access token at https://github.com/settings/tokens.
   The token needs to have public_repo scope. Then, set it in `GITHUB_TOKEN` env var.
   With this approach, listing versions for some actions may not work.

2. Create a GitHub app, and generate a private key. Then, set it in `APP_PRIVATE_KEY` env var.
   With this approach, listing versions for all actions works.
```

Piotr Krzemiński

04/03/2025, 6:20 AM

`APP_PRIVATE_KEY` is filled, `GITHUB_TOKEN` is empty - I think we should remove `GITHUB_TOKEN` entirely, let me try it

Piotr Krzemiński

04/03/2025, 6:22 AM

the same is happening

Piotr Krzemiński

04/03/2025, 6:25 AM

ok, works - I needed to point to the newest image manually in Portainer

Vampire

04/03/2025, 7:34 AM

Yeah, that's the portainer issue Leo mentioned above probably

Piotr Krzemiński

04/03/2025, 9:55 AM

the two env vars (for the client and installation ID) were absent, I had to re-add them 🤔

LeoColman

04/03/2025, 10:17 AM

In the container itself? I think they were set for the environment

Piotr Krzemiński

04/03/2025, 10:18 AM

Yes, sorry, maybe - so now we may have duplicated env vars

LeoColman

04/03/2025, 10:23 AM

Ok, it was actually problematic. I didn't tell the stack to get the env variables from the env

LeoColman

04/03/2025, 10:23 AM

Your solution was the right one for a temp fix

👍 1

👌 1

Vampire

04/09/2025, 11:36 AM

As far as I can see most points of above are handled. Missing is
• migrating remaining calls to GitHub API lib
• adding a metric when falling back to token, currently only a warning is logged 🙂

👍 1

Piotr Krzemiński

04/09/2025, 11:55 AM

yep, both tracked in the issues

👌 1

Vampire

04/11/2025, 6:47 AM

Something is going on again

Vampire

04/11/2025, 7:38 AM

2025-04-11 07:36:37,166 <TRACE> <eventLoopGroupProxy-4-6            > <[]> <{request-id=whdh0up66n95zs5}> <typesafegithub.workflows.shared.internal.GithubApi> REQUEST <https://api.github.com/repos/actions/cache/git/matching-refs/tags/v> failed with exception: java.io.EOFException: Not enough data available
2025-04-11 07:36:37,166 <TRACE> <eventLoopGroupProxy-4-6            > <[]> <{request-id=whdh0up66n95zs5}> <          io.ktor.client.plugins.HttpCallValidator> Processing exception java.io.EOFException: Not enough data available for request <https://api.github.com/repos/actions/cache/git/matching-refs/tags/v>
2025-04-11 07:36:37,167 <DEBUG> <eventLoopGroupProxy-4-6            > <[]> <{request-id=whdh0up66n95zs5}> <                        io.ktor.server.Application> Unhandled: GET - /actions/cache__restore___major/maven-metadata.xml. Exception class java.io.EOFException: Not enough data available
java.io.EOFException: Not enough data available
	at io.ktor.utils.io.ByteReadChannelOperationsKt.readByte(ByteReadChannelOperations.kt:48) ~[ktor-io-jvm-3.1.2.jar:3.1.2]
	at io.ktor.utils.io.ByteReadChannelOperationsKt$readByte$1.invokeSuspend(ByteReadChannelOperations.kt) ~[ktor-io-jvm-3.1.2.jar:3.1.2]
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[kotlin-stdlib-2.1.20.jar:2.1.20-release-217]
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:113) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704) ~[kotlinx-coroutines-core-jvm-1.10.1.jar:?]

Vampire

04/11/2025, 7:42 AM

Doing the same request with the same bearer token from my machine works without problems 😕

Vampire

04/11/2025, 7:43 AM

I restarted the container, now it works again

thank you color 1

Piotr Krzemiński

04/11/2025, 7:44 AM

ack, @LeoColman we should get the alerting up and running 😬

👍 1

Piotr Krzemiński

04/11/2025, 7:44 AM

this time the problem is different than previously when it stopped working, right?

Vampire

04/11/2025, 7:44 AM

Vampire

04/11/2025, 7:44 AM

The only difference is that we now have more logging

👍 1

Vampire

04/11/2025, 7:45 AM

Well, last incident we only got the 500 line so it could well be a different problem

Piotr Krzemiński

04/11/2025, 7:45 AM

I cannot investigate right now (crunch at work), let's see how it behaves after the restart

Vampire

04/11/2025, 7:45 AM

But it is really strange that the container cannot get the response while with the same bearer token I locally can get it

👍 1

Vampire

04/11/2025, 7:46 AM

Right now it works again

Vampire

04/11/2025, 7:47 AM

If it were rate limiting, there should be a corresponding response, not just nothing

Piotr Krzemiński

04/11/2025, 9:33 AM

and also, if it were rate limiting, the restart wouldn't fix the issue because the rate depends on the number of incoming calls - we don't have e.g. some worker threads that could make the calls on their own

Piotr Krzemiński

04/11/2025, 2:35 PM

I'm wondering if it has something to do with the large number of "runnable" threads. Probably the thread pool just grows to a particular upper cap, and it's only around 100 so it doesn't seem bad, I just observed it

Vampire

04/11/2025, 5:27 PM

Could well be the problem though, yes

LeoColman

04/11/2025, 9:00 PM

You guys can add yourselves to https://grafana.bindings.colman.com.br/alerting/notifications?search= When I create the alert, Grafana will send an e-mail to you when the conditions (>=1 error) are met

👍 1

Piotr Krzemiński

04/12/2025, 9:26 PM

added myself as well, but there's this warning next to our contact points - does it work in practice?

LeoColman

04/12/2025, 9:31 PM

No, currently it's doing nothing. I wanted you to take a look at the platform before adding an alert

LeoColman

04/12/2025, 9:31 PM

What we are missing is defining how many errors for an alert and I think we are good to create an alert

Vampire

04/12/2025, 11:53 PM

1? 🙂

👍 1

Piotr Krzemiński

04/13/2025, 2:59 PM

Yeah, any failure is worth investigating

Piotr Krzemiński

04/13/2025, 3:01 PM

BTW, I usually call an "error" a user error, so HTTP 4xx. Failures are server faults, so HTTP 5xx. It's good to agree on our common terminology

LeoColman

04/13/2025, 3:08 PM

Ok for me to email only on server failures i.e. 5xx

👍 1

LeoColman

04/13/2025, 3:23 PM

Alerts for 5xx set up to our e-mails 🎉

👌 2

LeoColman

04/13/2025, 3:23 PM

Now to test it.... 🤔

Vampire

04/13/2025, 4:58 PM

Of course 5xx only, 4xx does not make sense unless we abuse it. 😄

Vampire

04/13/2025, 5:01 PM

I would just not use "error" and "failure", especially as I would use them the other way around, in line with tests, where a failure is the softer thing that means a value should have been 1 but was 2, while an error is an unexpected exception or something else unexpected. So it's best to be explicit and call them user error and server error, or 4xx error and 5xx error; then there can hardly be misunderstandings. 🙂

👍 1
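
Editor's sketch of the agreed terminology and alert rule: 4xx is a user error, 5xx is a server error, and an alert fires once at least one 5xx is seen. This is a minimal Kotlin illustration. `ResponseClass`, `classify`, and `shouldAlert` are hypothetical names, and the real alert is a Grafana rule, not application code.

```kotlin
enum class ResponseClass { USER_ERROR, SERVER_ERROR, OTHER }

// 4xx responses are user errors, 5xx responses are server errors.
fun classify(status: Int): ResponseClass = when (status) {
    in 400..499 -> ResponseClass.USER_ERROR
    in 500..599 -> ResponseClass.SERVER_ERROR
    else -> ResponseClass.OTHER
}

// Alert as soon as at least one 5xx response is seen; 4xx responses are ignored.
fun shouldAlert(statuses: List<Int>): Boolean =
    statuses.any { classify(it) == ResponseClass.SERVER_ERROR }
```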
