# ktor
a
Hey everyone 👋 The Ktor team is considering improvements to the OpenTelemetry experience with Ktor, and your feedback would be incredibly helpful! We have two quick requests:
1. Are there any specific metrics you’re missing from the default set provided by the OpenTelemetry SDK with the Ktor plugin? Please share your suggestions in the 🧵.
2. If you’re using OpenTelemetry in your Ktor project and are open to briefly chatting, please react to this message with a 🔭 emoji. I’ll reach out with a few quick questions.
Thanks so much for your help.
🔭 2
m
hey, are you interested in distributed tracing with OTEL as well, or only metrics? We have a mixed setup: Prometheus metrics via Micrometer, and OTEL for sending trace data to Tempo
➕ 1
a
With the first question, we want to verify whether the standard OpenTelemetry SDK metrics suffice for most use cases. If we spot extra metrics that many teams commonly add, it would make sense to bake those into our defaults.
m
I have some feedback on metrics in general:
• I would really love to see some of the engine and coroutine metrics exposed in some way. By far our biggest problem with observability is some route blocking the netty coroutines and there's no visibility into when this happens. We do some horrible reflection today to get some of these things into prometheus.
• this might be more about ktor errors / status codes, but the standard micrometer ktor plugin over-reports 5xx replies (e.g. a closed connection will count towards a 5xx response due to channel closed) which limits the per-route metric usefulness a bit. For some reason deserialization errors (e.g. request has an invalid shape when deserializing json with `kotlinx.serialization`) also generate 500 responses instead of 400.
• ktor 3 launched without opentelemetry support for dist tracing, we had to operate an in-house fork of the ktor 2 otel tracing support for a while - it's fine if the alpha/beta/rc versions don't have all bases covered but a stable major version should have all of these pieces in place. Especially as the OTEL support was contributed by JB.
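On the deserialization point, a possible application-side workaround is to map the exception to a 400 with the StatusPages plugin. This is only a sketch: the function name and response text are hypothetical, and which exception type actually reaches the handler depends on the Ktor version and on how the body is received, so treat `SerializationException` here as an assumption to verify.

```kotlin
import io.ktor.http.HttpStatusCode
import io.ktor.server.application.Application
import io.ktor.server.application.install
import io.ktor.server.plugins.statuspages.StatusPages
import io.ktor.server.response.respond
import kotlinx.serialization.SerializationException

// Hypothetical helper: answer malformed request bodies with 400 instead of 500.
// Assumes the ktor-server-status-pages and kotlinx-serialization artifacts
// are on the classpath.
fun Application.mapDeserializationErrorsTo400() {
    install(StatusPages) {
        // Assumption: the failure surfaces as SerializationException.
        // Adjust the type if your setup throws a different exception.
        exception<SerializationException> { call, _ ->
            // Fixed text so parser internals are not leaked to clients.
            call.respond(HttpStatusCode.BadRequest, "Malformed request body")
        }
    }
}
```

This only changes the status code sent to the client; whether the metrics plugin then records the request as 4xx is something to check in your own setup.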
w.r.t. reflection hacks, we specifically started to expose the netty call group event executor `pendingTasks()` to prometheus for gauging how well routes were behaving
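For anyone curious what exposing that queue depth might look like, here is a minimal sketch, assuming Netty and Micrometer are on the classpath. The function name, metric name and tag are hypothetical. Getting hold of the call group from Ktor's Netty engine is deliberately left out, since as mentioned that part currently needs reflection.

```kotlin
import io.micrometer.core.instrument.Gauge
import io.micrometer.core.instrument.MeterRegistry
import io.netty.util.concurrent.EventExecutorGroup
import io.netty.util.concurrent.SingleThreadEventExecutor

// Hypothetical helper: registers a gauge that reports the total number of
// queued tasks across all single-threaded executors in the given group.
fun registerPendingTasksGauge(
    registry: MeterRegistry,
    group: EventExecutorGroup,   // obtaining this from Ktor is not shown here
    groupName: String,
): Gauge =
    Gauge.builder("netty.executor.pending.tasks", group) { g ->
        // EventExecutorGroup is Iterable<EventExecutor>; only
        // SingleThreadEventExecutor exposes pendingTasks().
        g.filterIsInstance<SingleThreadEventExecutor>()
            .sumOf { it.pendingTasks() }
            .toDouble()
    }
        .description("Tasks waiting in the executor queues")
        .tag("group", groupName)
        .register(registry)
```

The gauge is sampled on every scrape, so a value that keeps growing suggests some handler is blocking the executor threads. Micrometer holds the observed object weakly by default, which is fine as long as the engine keeps the group alive.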