Andreas Dybdahl
02/27/2025, 9:34 AM
refreshTokens lambda, and we make a call to our backend using the oldTokens.refreshToken supplied to refresh the refresh token.
3. Step 2 succeeds, the token is updated, and the call from step 1 completes successfully.
4. At some point the app again makes 10-15 simultaneous requests just after the accessToken has expired, and shortly after starting these calls we cancel them because a new refresh is happening on the page in the app.
5. Ktor triggers the refreshTokens lambda again, twice (A and B) in quick succession.
6. Call A takes oldTokens.refreshToken, refreshes successfully against our backend, and returns a BearerTokens object with the updated token from the lambda.
7. Call B takes oldTokens.refreshToken as well, but this is the token from before step 6 ran, so it is now invalid. It makes a refresh call to our backend and gets a 401, because the supplied token had been refreshed, and thus invalidated, 15 ms earlier.
8. Because we get a 401 on a refresh token call, we return null from the refreshTokens lambda, and the user is logged out and sent to the login screen.
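Steps 6 to 8 can be replayed in isolation with a small sketch. It assumes a backend that rotates refresh tokens on every successful refresh; RotatingTokenBackend and replayDoubleRefresh are hypothetical names used only for illustration, not part of Ktor or of the code in this thread.

```kotlin
// Hypothetical stand-in for a backend that rotates refresh tokens:
// every successful refresh invalidates the refresh token that was used.
class RotatingTokenBackend(initialRefreshToken: String) {
    private var validRefreshToken = initialRefreshToken
    private var counter = 0

    // Returns the new refresh token, or null to model a 401 response.
    fun refresh(refreshToken: String): String? {
        if (refreshToken != validRefreshToken) return null
        counter += 1
        validRefreshToken = "refresh-$counter"
        return validRefreshToken
    }
}

// Replays steps 6 to 8: calls A and B both present the same old refresh token.
fun replayDoubleRefresh(): Pair<String?, String?> {
    val backend = RotatingTokenBackend(initialRefreshToken = "refresh-0")
    val oldRefreshToken = "refresh-0"
    val resultA = backend.refresh(oldRefreshToken) // succeeds and rotates the token
    val resultB = backend.refresh(oldRefreshToken) // stale token, modelled 401
    return resultA to resultB
}
```

Under these assumptions call A gets a new token and call B gets null, which is the point where the refreshTokens lambda would return null and the user would be logged out.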
We have read up extensively on the causes of the issue, and we have found in multiple threads on YouTrack and in this channel that the Bearer auth handling in Ktor should be thread-safe by design, but our findings suggest that this is not the case.
• We have tried implementing a locking mechanism using a Mutex that locks the loadTokens lambda while the refreshTokens lambda is refreshing a token. This does not work, because the plugin documentation wrongly states that loadTokens is called before every call. That is not the case: it is called only once, and subsequent tokens are loaded from an internal cache in Ktor.
• We have tried implementing a mechanism that decodes the JWT at the start of the refreshTokens lambda and checks the token's expiry time. If the token has not yet expired, it does not refresh it against our backend but immediately returns the same tokens that were passed in as oldTokens. We did this because our locking mechanism did not prevent multiple refreshTokens calls from happening at almost the same time. This does not work either; the issue still occurs.
• We have also tried a locking mechanism using a Mutex that blocks any read of a token from our internal token storage while the refreshTokens lambda is running and, vice versa, blocks the refreshTokens lambda until any ongoing token read is done. However, this makes parallel calls impossible: the calls lock each other and effectively run sequentially, which is extremely slow, locks up the app, and causes an ANR that prompts the user to close the app because it is not responding.
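For comparison with the locking attempts above, here is a minimal sketch of a variant that locks only the refresh path and compares the tokens being refreshed with the tokens currently stored, instead of checking the JWT expiry. It is an assumption-laden illustration, not the code from this thread: Tokens, SingleFlightRefresher, and runSingleFlightDemo are hypothetical names, Tokens stands in for Ktor's BearerTokens, and the backend call is a stub. It does not address the cancellation case discussed later in the thread.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical token pair; stands in for Ktor's BearerTokens.
data class Tokens(val accessToken: String, val refreshToken: String)

// Hypothetical refresher: only the refresh path takes the lock, reads stay lock-free.
class SingleFlightRefresher(
    initial: Tokens,
    private val callBackend: suspend (Tokens) -> Tokens?,
) {
    private val mutex = Mutex()

    // JVM annotation; makes lock-free reads see the latest write.
    @Volatile
    var current: Tokens = initial
        private set

    suspend fun refresh(oldTokens: Tokens): Tokens? = mutex.withLock {
        val stored = current
        // Another caller already rotated the tokens while we waited for the lock:
        // reuse its result instead of sending the stale refresh token.
        if (stored != oldTokens) return@withLock stored
        callBackend(oldTokens)?.also { current = it }
    }
}

// Runs two concurrent refreshes with the same old tokens.
// Returns (number of backend calls, whether both callers got the same tokens).
fun runSingleFlightDemo(): Pair<Int, Boolean> = runBlocking {
    var backendCalls = 0
    val old = Tokens("access-0", "refresh-0")
    val refresher = SingleFlightRefresher(old) { _ ->
        backendCalls += 1
        delay(20) // simulated network latency
        Tokens("access-$backendCalls", "refresh-$backendCalls")
    }
    val results = listOf(
        async { refresher.refresh(old) },
        async { refresher.refresh(old) },
    ).awaitAll()
    backendCalls to (results[0] == results[1])
}
```

In this sketch the second caller waits for the lock, sees that the stored tokens differ from its oldTokens, and returns the stored tokens without contacting the backend, so ordinary token reads never block on the Mutex.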
So we have tried quite a few things, but the issue persists. Somehow the refreshTokens lambda is triggered twice within a very short timeframe: the first request goes well and refreshes the token, while the second request does not receive this new token in oldTokens but the now-invalidated refreshToken, and thus makes a call, gets a 401, and logs the user out.
We are working with the hypothesis that the two calls to refreshTokens happen because Ktor allows the refreshTokens lambda to be called again immediately afterwards, but now with the old token, because Ktor thinks the token just returned was not valid due to the call being cancelled. In reality, we had already made our API call to refresh before being cancelled, so the token was valid and should have been replaced in the internal token cache.
We have avoided the clearToken function, which seems to cause more issues than it solves and, according to other users, also seems to cause multiple refresh calls.
We are using the markAsRefreshTokenRequest() function on the request builder for the refresh token call we are making.
We are using a single Ktor HTTP client to make all authenticated requests, as recommended in the threads we could find on YouTrack.
Here is our current Bearer auth config:
install(Auth) {
    bearer {
        // Specify which calls don't need to refresh after a 401 response.
        sendWithoutRequest { request ->
            // This callback should return true when we are making a request to the login endpoint, as this endpoint should be sent without waiting for the 401.
            val result = !(request.url.encodedPath.contains("auth/token")
                && (request.body as? PostAuthenticateUserRequestNetworkModel)?.grantType == GrantTypeNetworkModel.PASSWORD)
            Napier.d(tag = "Kmm.Auth", message = "SendWithoutRequest: $result")
            return@sendWithoutRequest result
        }
        // Invoked during requests
        loadTokens {
            Napier.d(tag = "Kmm.Auth", message = "LoadTokens")
            onLoadTokens(oAuthTokenStorage)
        }
        // Refresh invalid access token
        refreshTokens {
            refreshMutex.withLock {
                if (jwtUtils.parseJwtPayloadToExpireTime(oldTokens!!.accessToken)!!.epochSeconds > Clock.System.now().epochSeconds) {
                    return@refreshTokens oldTokens
                }
                PerformedCall.from(
                    jsonRequestBody = response.request.content.toString(),
                    response = response.bodyAsText(),
                    endpointName = response.request.url.encodedPathAndQuery,
                    headers = response.request.headers.entries(),
                    url = response.request.url.toString(),
                )?.let {
                    options.onCallPerformed.invoke(it)
                }
                Napier.d(tag = "Kmm.Auth", message = "RefreshTokens $oldTokens")
                withContext(NonCancellable) {
                    onRefreshToken(
                        options = options,
                        baseParamsProvider = ntvbBaseParamsProvider,
                        oAuthTokenStorage = oAuthTokenStorage,
                        oldTokens = oldTokens,
                    )
                }
            }
        }
    }
}
I hope someone can help us, as we cannot seem to fix the issue no matter what we try.
Vita Sokolova
02/27/2025, 12:24 PM
loadTokens() takes a lambda function and caches these tokens under the hood; it doesn’t call this lambda every time tokens are needed. The only way to force an update is to call httpClient.invalidateBearerTokens() every time your tokens change. But you probably already know that.
Aleksei Tirman [JB]
02/27/2025, 12:28 PM
refreshTokens block triggered twice in a very short timeframe.
Andreas Dybdahl
02/27/2025, 2:42 PM
Andreas Dybdahl
02/27/2025, 2:43 PM
Vita Sokolova
02/27/2025, 2:57 PM
fun HttpClient.invalidateBearerTokens() {
    authProviders
        .filterIsInstance<BearerAuthProvider>()
        .singleOrNull()?.clearToken()
}
Andreas Dybdahl
02/28/2025, 7:01 AM
Andreas Dybdahl
03/04/2025, 10:53 AM
Aleksei Tirman [JB]
03/04/2025, 12:36 PM
Andreas Dybdahl
03/05/2025, 7:03 AM
Andreas Dybdahl
03/05/2025, 2:43 PM
Andreas Dybdahl
03/05/2025, 2:51 PM
withContext(NonCancellable) block to prevent cancellations from happening at all, the issue still occurs, as the parent scope that calls our refreshTokens lambda is still being cancelled. Ktor therefore completely ignores the new tokens returned from the lambda, because cancellation exceptions still happen inside the Ktor code in the Auth plugin.
So we are unable to resolve the issue on the client side: it happens both when we allow cancellations and when we don't, because the problem occurs inside the Auth plugin.
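As a rough illustration of the coroutine behaviour described here (this is neither the attached test nor Ktor's actual Auth code), the following sketch shows how a refresh wrapped in withContext(NonCancellable) can complete on the backend while its result is still lost because the caller was cancelled. FakeBackend and runCancelledRefreshDemo are hypothetical names, and the delay(1) after the refresh is an assumption standing in for any further cancellable suspension point in the caller.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.NonCancellable
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical backend that rotates the refresh token on every call.
class FakeBackend {
    var validRefreshToken = "refresh-0"
        private set
    val requestReceived = CompletableDeferred<Unit>()

    suspend fun refresh(): String {
        requestReceived.complete(Unit)
        delay(50) // simulated network latency
        validRefreshToken = "refresh-1"
        return validRefreshToken
    }
}

// Returns (token the backend considers valid, token the client ended up storing).
fun runCancelledRefreshDemo(): Pair<String, String> = runBlocking {
    val backend = FakeBackend()
    var clientRefreshToken = "refresh-0"

    val caller = launch {
        // The refresh itself cannot be cancelled...
        val fresh = withContext(NonCancellable) { backend.refresh() }
        // ...but the caller can: the next cancellable suspension point throws
        // CancellationException, so the new token is never stored.
        delay(1)
        clientRefreshToken = fresh
    }

    backend.requestReceived.await() // the backend has started handling the request
    caller.cancelAndJoin()          // e.g. the page refreshes and cancels its requests
    backend.validRefreshToken to clientRefreshToken
}
```

In this model the backend ends up on the rotated token while the client still holds the old one, so the next refresh attempt with the old token would be rejected.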
You can run the test I attached above, in which I have mocked a server with one endpoint for refreshing tokens and one for getting some data.
In regard to your question above about the proposed solution of making the client the TokenHolder: this would still not solve the problem, as it seems to happen inside the Auth plugin itself.
Even if you fixed this issue, it could still happen, because the root cause (the server receiving an actual request and creating a new token, but the client never getting the response) can also occur when the client loses its internet connection at about the same moment the cancellation happens in the test. This would be a much rarer case, though, so I still believe that a solution in Ktor would be very beneficial as well.
Aleksei Tirman [JB]
03/06/2025, 10:00 AM
Andreas Dybdahl
03/07/2025, 7:12 AM
Kaira Diagne
06/18/2025, 1:24 PM