Hi all We are using KTOR in a multiplatform library we devel kotlinlang #ktor

Andreas Dybdahl

02/27/2025, 9:34 AM

Hi all,

We are using Ktor in a multiplatform library that we develop to share code between our mobile and TV app platforms. We use Ktor to make API calls and handle authentication. We are currently on Ktor 3.0.3 and use OAuth via the Auth plugin with the Bearer auth configuration.

We have been investigating an issue for 1-2 months now where users are logged out because a refreshTokens call fails with a 401. We reproduce the issue by reloading a page in our app very quickly: each reload starts a lot of API calls at the same time, and shortly afterwards we cancel them and replace them with new ones because the page is refreshed again.

The issue generally follows this pattern:

1. The app makes API calls intermittently; the accessToken expires and we get a 401 response on a call.
2. Ktor triggers the Bearer config's `refreshTokens` lambda, and we call our backend using the supplied `oldTokens.refreshToken` to refresh the token.
3. Step 2 succeeds, the token is updated and the call from step 1 completes successfully.
4. At some point the app again makes 10-15 simultaneous requests while the accessToken has just expired, and soon after starting the calls we cancel them because a new refresh of the page happens in the app.
5. Ktor triggers the `refreshTokens` lambda again, twice (A and B), in very quick succession.
6. Call A takes the `oldTokens.refreshToken`, refreshes successfully against our backend, and returns a BearerTokens object with the updated token from the lambda.
7. Call B takes the `oldTokens.refreshToken` as well, but it is the token from before step 6 ran, so it is now invalid. Call B makes a refresh call to our backend and gets a 401, because the supplied token was replaced by the refresh in step 6 about 15 ms earlier.
8. Because we get a 401 on a refreshToken call, we return null from the `refreshTokens` lambda, and the user is logged out and sent to the login screen.

We have read up extensively on the causes of the issue, and we have found in multiple threads on YouTrack and in this channel that the Bearer auth handling in Ktor should be thread safe by design, but this does not match our findings.

• We tried a locking mechanism using a Mutex that keeps the `loadTokens` lambda locked while the `refreshTokens` lambda is refreshing a token. This does not work: the plugin documentation wrongly states that loadTokens is called before every call, but it is only called once, and subsequent tokens are loaded from an internal cache in Ktor.
• We tried a mechanism that unmarshals the JWT at the start of the `refreshTokens` lambda and checks the token's expiry time. If the token has not yet expired, we do not refresh it against our backend but immediately return the same tokens that were passed in as `oldTokens`. We did this because our locking mechanism did not prevent multiple `refreshTokens` calls from happening at almost the same time. This does not work either; the issue still happens.
• We also tried a Mutex-based lock that blocks any read of a token from our internal token storage while the `refreshTokens` lambda is running, and vice versa: any token read blocks `refreshTokens` until the read is done. However, this makes parallel calls impossible, as they lock each other, which makes the calls sequential and thus extremely slow. It locks up the app and causes an ANR that prompts the user to close the app because it is not responding.

So we have tried quite a few things, but the issue persists. Somehow the `refreshTokens` lambda is triggered twice in a very short timeframe. The first request goes well and refreshes the token. The second request does not receive this new token in `oldTokens`; it receives the now invalidated refreshToken, makes a call, gets a 401 and logs the user out.

Our working hypothesis is that the two calls to `refreshTokens` happen because Ktor allows the lambda to be called again immediately afterwards, but with the old token. Ktor treats the token that was just returned as invalid because the call was cancelled, although in reality we had already completed our refresh API call before being cancelled, so the token was valid and should have replaced the one in the internal token cache.

We have avoided the `clearToken` function, which seems to cause more issues than it solves and, according to other users, also seems to cause multiple refresh calls. We use the `markAsRefreshTokenRequest()` function on the request builder for the refresh token call. We use a single Ktor HttpClient for all authenticated requests, as recommended in the threads we could find on YouTrack.

Here is our current Bearer auth config:

install(Auth) {
    bearer {
        // Specify which calls don't need to refresh after a 401 response.
        sendWithoutRequest { request ->
            // This callback should return true when we are making a request to the login endpoint, as this endpoint should be sent without waiting for the 401.
            val result = !(request.url.encodedPath.contains("auth/token")
                    && (request.body as? PostAuthenticateUserRequestNetworkModel)?.grantType == GrantTypeNetworkModel.PASSWORD)

            Napier.d(tag = "Kmm.Auth", message = "SendWithoutRequest: $result")

            return@sendWithoutRequest result
        }

        // Invoked during requests
        loadTokens {
            Napier.d(tag = "Kmm.Auth", message = "LoadTokens")
            onLoadTokens(oAuthTokenStorage)
        }

        // Refresh invalid access token
        refreshTokens {
            refreshMutex.withLock {
                if (jwtUtils.parseJwtPayloadToExpireTime(oldTokens!!.accessToken)!!.epochSeconds > Clock.System.now().epochSeconds) {
                    return@refreshTokens oldTokens
                }
                PerformedCall.from(
                    jsonRequestBody = response.request.content.toString(),
                    response = response.bodyAsText(),
                    endpointName = response.request.url.encodedPathAndQuery,
                    headers = response.request.headers.entries(),
                    url = response.request.url.toString(),
                )?.let {
                    options.onCallPerformed.invoke(it)
                }
                Napier.d(tag = "Kmm.Auth", message = "RefreshTokens $oldTokens")
                withContext(NonCancellable) {
                    onRefreshToken(
                        options = options,
                        baseParamsProvider = ntvbBaseParamsProvider,
                        oAuthTokenStorage = oAuthTokenStorage,
                        oldTokens = oldTokens,
                    )
                }
            }
        }
    }
}

I hope someone can help us, as we cannot seem to fix the issue no matter what we try.
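For readers following the A/B race in steps 5-8, here is a minimal sketch of a "compare before refreshing" guard. It does not use Ktor at all: `Tokens`, `FakeBackend` and `TokenRefresher` are hypothetical names, and the fake backend simply assumes rotating refresh tokens that are valid once. The idea is that the refresher keeps its own copy of the latest tokens and, under a Mutex, only calls the backend when the caller's refresh token still matches that copy. Note that this only models two overlapping refreshes; it says nothing about what the Auth plugin does with the returned value when the calling coroutine is cancelled.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical token pair, standing in for Ktor's BearerTokens.
data class Tokens(val accessToken: String, val refreshToken: String)

// Hypothetical backend (assumption): every refresh token can be used exactly once.
class FakeBackend {
    private var generation = 0
    private var validRefreshToken = "refresh-0"
    var refreshCalls = 0
        private set

    suspend fun refresh(refreshToken: String): Tokens? {
        refreshCalls++
        delay(15) // simulated network latency
        if (refreshToken != validRefreshToken) return null // models the 401
        generation++
        validRefreshToken = "refresh-$generation"
        return Tokens("access-$generation", validRefreshToken)
    }
}

class TokenRefresher(private val backend: FakeBackend, initial: Tokens) {
    private val mutex = Mutex()

    // Our own copy of the most recent tokens, independent of any client-side cache.
    var stored: Tokens = initial
        private set

    // `old` is what the caller believes the current tokens are; it may be stale.
    suspend fun refresh(old: Tokens): Tokens? = mutex.withLock {
        // Someone else already rotated the tokens: reuse their result.
        if (stored.refreshToken != old.refreshToken) return@withLock stored
        // Otherwise perform the refresh and remember the new pair on success.
        backend.refresh(old.refreshToken)?.also { stored = it }
    }
}
```

In this model, two coroutines that both call `refresh` with the same stale `Tokens` end up with the same new pair, and the backend sees a single refresh request. Only the refresh path is serialized, so ordinary token reads do not contend on the Mutex.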

🧵 2

Vita Sokolova

02/27/2025, 12:24 PM

We were also struggling with token updates in my app. I’m not sure if I can help you solve it, but I share your feeling that the token update mechanism in Ktor needs some love, improvements and locks. For us the main point of misunderstanding was that even though `loadTokens()` takes a lambda, it caches the tokens under the hood and doesn’t call the lambda every time tokens are needed. The only way to force an update is to call `httpClient.invalidateBearerTokens()` every time your tokens change. But you probably already know that.
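The caching behaviour described here can be modelled in a few lines. This is only an illustration of "load once, then serve from cache until cleared"; `CachedTokens` and `TokenStorage` are made-up names and are not Ktor classes.

```kotlin
// Hypothetical token pair, standing in for Ktor's BearerTokens.
data class Tokens(val accessToken: String, val refreshToken: String)

// Hypothetical app-side storage that counts how often it is read.
class TokenStorage(var current: Tokens) {
    var reads = 0
        private set

    fun read(): Tokens {
        reads++
        return current
    }
}

// Hypothetical holder: calls `load` once and then keeps returning the cached value.
class CachedTokens(private val load: () -> Tokens?) {
    private var cached: Tokens? = null
    private var loaded = false

    fun get(): Tokens? {
        if (!loaded) {
            cached = load()
            loaded = true
        }
        return cached
    }

    // Forget the cached value so that the next get() calls `load` again.
    fun clear() {
        cached = null
        loaded = false
    }
}
```

With a holder like this, changing `TokenStorage.current` has no visible effect until `clear()` is called, which is why a lock placed inside the load lambda does not guard later token reads.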

💯 2

Aleksei Tirman [JB]

02/27/2025, 12:28 PM

We have an issue to address this general problem by providing control over the tokens' storage. It would be great if you could create a sample project or write a self-contained code snippet that reproduces the `refreshTokens` block being triggered twice in a very short timeframe.

Andreas Dybdahl

02/27/2025, 2:42 PM

@Vita Sokolova I had not heard about the invalidate function you mentioned, but I can't seem to find it. I searched for the function in the entire Ktor github repository and did not find it. Is it called what you stated, or was it maybe a function in an earlier version of ktor?

Andreas Dybdahl

02/27/2025, 2:43 PM

@Aleksei Tirman [JB] I am working on a unit test that will recreate the issue, but I have not cracked it yet. I will let you know when I have any updates.

🙏 1

Vita Sokolova

02/27/2025, 2:57 PM

Sorry, my bad, it’s my extension function:

fun HttpClient.invalidateBearerTokens() {
    authProviders
        .filterIsInstance<BearerAuthProvider>()
        .singleOrNull()?.clearToken()
}

Andreas Dybdahl

02/28/2025, 7:01 AM

@Vita Sokolova it was the clearToken function I thought you were using as well. Thanks for the feedback anyway.

Andreas Dybdahl

03/04/2025, 10:53 AM

@Aleksei Tirman [JB] I have been trying to recreate the issue we are seeing in a clean HttpClient with a MockEngine simulating a Bearer auth scheme, but I have been completely unable to recreate it outside of our app environment. It is really weird that we can't recreate the issue in a test when it happens all the time in the app.

Aleksei Tirman [JB]

03/04/2025, 12:36 PM

Would the solution of giving the users control for the tokens storage (KTOR-8180) solve your problem? Or would you rather have a solution for the current problem?

Andreas Dybdahl

03/05/2025, 7:03 AM

We are working with our BE team to implement a grace period for tokens, so that we can solve the problem there until we have more control over which tokens are sent using Ktor.
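As a rough illustration of what such a grace period could look like on the backend, here is a small in-memory model. Everything in it is an assumption made for the example (the `RotatingTokenIssuer` name, the token format, a single user, time passed in as a parameter); the thread does not describe the actual backend implementation.

```kotlin
// Hypothetical token pair returned by the backend.
data class TokenPair(val accessToken: String, val refreshToken: String)

// Hypothetical single-user issuer with refresh token rotation and a grace period.
class RotatingTokenIssuer(private val graceMillis: Long) {
    private var generation = 0
    private var current = TokenPair("access-0", "refresh-0")
    private var previousRefreshToken: String? = null
    private var rotatedAtMillis = 0L

    fun refresh(refreshToken: String, nowMillis: Long): TokenPair? {
        if (refreshToken == current.refreshToken) {
            // Normal rotation: remember the token being replaced and when it happened.
            previousRefreshToken = current.refreshToken
            rotatedAtMillis = nowMillis
            generation++
            current = TokenPair("access-$generation", "refresh-$generation")
            return current
        }
        // The previous token is still accepted for a short while after rotation and
        // yields the pair that was already issued, instead of a 401.
        val withinGrace = refreshToken == previousRefreshToken &&
            nowMillis - rotatedAtMillis <= graceMillis
        return if (withinGrace) current else null
    }
}
```

In this model, a client that repeats a refresh with the just-rotated token inside the grace window receives the same new pair again, so a lost or discarded refresh response no longer forces a logout. The length of the window is a trade-off against how long a leaked old refresh token remains usable.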

Andreas Dybdahl

03/05/2025, 2:43 PM

Untitled.kt

Andreas Dybdahl

03/05/2025, 2:51 PM

@Aleksei Tirman [JB] I have an update on the matter. I was able to find the reason for our problem, and I also created a test that confirms this is what happens, so you can recreate it yourself.

The problem happens because the coroutine scopes that start coroutines tied to a UI (in Android, for example) are often cancelled when the user goes to a different page while networking is in progress. When cancellation happens between the actual request to the backend that refreshes a token and the return of the refreshTokens lambda, a cancellation exception is thrown, so the tokens in Ktor's cache are now stale and are never replaced with the updated ones.

If we wrap our refresh logic in a `withContext(NonCancellable)` block to prevent cancellation from happening inside it at all, the issue still occurs. The parent scope that calls our refreshTokens lambda is still cancelled, so Ktor completely ignores the new tokens returned from the lambda, because cancellation exceptions still happen inside the Ktor code in the Auth plugin. So we are unable to resolve the issue on the client side: it happens both when we allow cancellation and when we don't, as the problem occurs inside the Auth plugin.

You can run the test I attached above, where I have mocked a server with one endpoint to refresh tokens and one to get some data.

In regard to your question above about whether the proposed solution of making the client the TokenHolder would help: it would still not solve the problem, as it seems to happen inside the Auth plugin itself.

Even if you fix this issue, it could still happen, because the root cause (the server receiving an actual request and creating a new token, but the client never getting a response) can still occur when the client loses its internet connection at about the same point where the cancellation happens in the test. That would be a much rarer case though, so I still believe a fix in Ktor would be very beneficial.
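The general coroutine mechanism behind this can be shown without Ktor. The sketch below is not the Auth plugin's code; `Outcome` and `simulate` are hypothetical names, and the `delay` stands in for the refresh HTTP call. It models a caller that runs the refresh inside `withContext(NonCancellable)` and then does cancellable work with the result, while the surrounding job is cancelled mid-refresh.

```kotlin
import kotlinx.coroutines.NonCancellable
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlinx.coroutines.yield

// Records what each side of the exchange ended up with.
class Outcome {
    var serverIssued: String? = null // what the "backend" handed out
    var clientStored: String? = null // what the caller managed to keep
}

suspend fun simulate(): Outcome = coroutineScope {
    val outcome = Outcome()
    val request = launch {
        // The refresh itself is protected from cancellation and runs to completion.
        val tokens = withContext(NonCancellable) {
            delay(50) // stand-in for the refresh HTTP call
            outcome.serverIssued = "new-tokens"
            "new-tokens"
        }
        // Stand-in for whatever the caller does next: any cancellable suspension
        // point throws CancellationException here, because the job was cancelled
        // while the refresh was in flight.
        yield()
        outcome.clientStored = tokens
    }
    delay(10)               // let the refresh start
    request.cancelAndJoin() // e.g. the user navigated away
    outcome
}
```

After `runBlocking { simulate() }`, `serverIssued` is set but `clientStored` is still null: the refresh happened, yet the result was dropped after the protected block returned. Persisting the new tokens to the app's own storage inside the `NonCancellable` block is one way to avoid losing them on the app side, though it does not change what the caller does with the return value.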

Aleksei Tirman [JB]

03/06/2025, 10:00 AM

Thank you for the thorough explanation and the test. I've filed an issue to address this problem.

🙌 1

Andreas Dybdahl

03/07/2025, 7:12 AM

Thank you to you too Aleksei, I will follow the issue for any further updates while we fix the issue in our BE temporarily too.

Kaira Diagne

06/18/2025, 1:24 PM

My team would also be very happy to see this issue addressed. We’re facing a similar problem where refresh tokens can be reused due to coroutine cancellation during the token refresh flow, which sometimes leads to forced logouts. This is something we want to avoid at all costs to ensure a smooth user experience. It would prevent valid refresh responses from being discarded and reduce the risk of token reuse errors. I believe many teams would benefit from this, as most common auth providers either support or expect this kind of behavior. I appreciate the investigation and the proposed improvements, this would be a great addition to the plugin.
