# coroutines
r
Given the following example, how do I ensure `expensiveLookup` only executes once (for a given id) and subsequent executions receive the cached value?
suspend fun expensiveLookup(id: String): ExpensiveResult {
  val cached = cache.get(id)
  if (cached != null) {
    return cached
  }

  val result = // logic here
  
  cache.put(id, result)
  return result
}
This blog post from Doordash in 2018 has an interesting solution. Is there a 'better' way of doing this now? https://doordash.engineering/2018/08/03/avoiding-cache-stampede-at-doordash/
m
What I did recently was store a `Deferred<T>` in the cache, and then have multiple calls to lookup wait for the same response. It has some complications:
1. The owner of the cache needs its own scope so that it can call `async`. This is what keeps the deferred running until all the callers are no longer using it.
2. I needed to implement some ref counting, so I could cancel the work if all the calls to `expensiveLookup` were canceled.
There are probably other ways to solve the above issues, but having a scope owned by the cache worked for me. Another thought would be for the first caller creating the `Deferred` to do the work and, if it gets canceled, to signal that in the `Deferred` so another caller can try to start the work.
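Roughly what that looked like, as a sketch (the `DeferredCache` name and shape are invented here, and the ref counting is only hinted at in a comment):
```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.async
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

class DeferredCache(private val load: suspend (String) -> ExpensiveResult) {
    // The cache owns this scope, so a Deferred keeps running even if the caller
    // that created it goes away.
    private val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

    private val mutex = Mutex()
    private val inFlight = mutableMapOf<String, Deferred<ExpensiveResult>>()

    suspend fun get(id: String): ExpensiveResult {
        val deferred = mutex.withLock {
            inFlight.getOrPut(id) { scope.async { load(id) } }
        }
        // Ref counting would go here: track waiters per id and cancel the deferred
        // (and remove it from the map) when the last waiter is canceled.
        return deferred.await()
    }
}
```
One caveat with this shape: a failed lookup stays in the map, so later callers see the same exception unless you remove the entry.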
t
a simple `Mutex.withLock` around the function (`suspend fun expensiveLookup(id: String): ExpensiveResult = myCacheMutex.withLock {`) will achieve the same thing. Using a map will give you a little more control over the locking, but `computeIfAbsent` will need some kind of lock mechanism (`ConcurrentHashMap` just uses `synchronized` for this method, so it will be no faster than `Mutex`, and with the deferred overhead (which also uses locks, I suspect) probably slower). They are using the wrong pattern to avoid a "cache miss storm", however, as they say:
> and the remaining coroutines should wait on the returned `Deferred` rather than spinning up their own read
While this fixes one half of the cache storm (exhausting your upstream layer with many requests), now suddenly, for a short while, all your requests that were blindingly fast before are almost as slow as the one request refreshing the cache.

The real solution is to always keep serving the cached content and to update it asynchronously when it gets out of date. Ironically, this is super easy to implement: always serve the cached value using lock-free reads; only then check whether it's out of date (this can be done probabilistically, i.e. not on every request, if you have really heavy loads); and only if it is out of date, lock, check no one else is refreshing the cache, start the update asynchronously (like @mkrussel suggests, you don't want to use the request's scope for this), and unlock.

Of course this still leaves the case when your cache is simply empty (e.g. on startup, or if you have some cache pressure mechanism); then you fall back on the above methods. The way you manage this in your infra is to first "warm up" an instance before you pelt it with requests. The Doordash solution is terrible for this, because your load balancer will think the server is warmed up, but every time a cache entry disappears the average request time shoots up a bit. They probably just paper over this with larger margins / more infra, or have such low loads that it's just a few requests. Or maybe they just drop some requests every now and then 🤷
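A minimal sketch of that serve-stale, refresh-in-the-background idea (the `StaleCache` name, the TTL check, and the flag-based refresh guard are just one illustrative way to wire it up; it skips the probabilistic check and the cold-cache case):
```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.launch
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicBoolean

class StaleCache<V : Any>(
    private val scope: CoroutineScope,   // owned by the cache, not by any request
    private val ttlMillis: Long,
    private val load: suspend (String) -> V,
) {
    private data class Entry<V>(val value: V, val loadedAt: Long)

    private val entries = ConcurrentHashMap<String, Entry<V>>()
    private val refreshing = ConcurrentHashMap<String, AtomicBoolean>()

    // Returns the cached value if present (lock-free read), kicking off a background
    // refresh when it is stale. Returns null when the cache is cold; callers then
    // fall back to a blocking fill (e.g. the Deferred approach above).
    fun getOrNull(id: String, now: Long = System.currentTimeMillis()): V? {
        val entry = entries[id] ?: return null
        if (now - entry.loadedAt > ttlMillis) {
            val flag = refreshing.computeIfAbsent(id) { AtomicBoolean(false) }
            if (flag.compareAndSet(false, true)) {      // only one refresher per id
                scope.launch {
                    try {
                        entries[id] = Entry(load(id), System.currentTimeMillis())
                    } finally {
                        flag.set(false)
                    }
                }
            }
        }
        return entry.value   // always serve what we have, even if stale
    }
}
```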
j
You could also use `StateFlow`
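Something along these lines, as a rough sketch (the `StateFlowCache` name, the null sentinel, and the cache-owned scope are assumptions, not a standard API):
```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.filterNotNull
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.launch
import java.util.concurrent.ConcurrentHashMap

// One MutableStateFlow per id; the first caller starts the lookup,
// everyone else suspends until a value is published.
class StateFlowCache(
    private val scope: CoroutineScope,
    private val load: suspend (String) -> ExpensiveResult,
) {
    private val flows = ConcurrentHashMap<String, MutableStateFlow<ExpensiveResult?>>()

    suspend fun get(id: String): ExpensiveResult {
        val flow = flows.computeIfAbsent(id) { key ->
            MutableStateFlow<ExpensiveResult?>(null).also { state ->
                scope.launch { state.value = load(key) }
            }
        }
        // Note: if load() fails, this sketch never completes; real code would
        // publish a Result (or rethrow) instead of leaving the flow at null.
        return flow.filterNotNull().first()
    }
}
```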
r
I appreciate everyone's input. 🙂 @Tijl: I agree that the implementation is suboptimal; however, I don't think `Mutex.withLock` is semantically equivalent. The first method would only lock requests for the same `id` (forgive the terrible ad-hoc code):
launch {
    println("Starting A1...")
    debouncer.debounce("A", this) { expensiveLookup("A") }.await()
    println("Done A1")
}
launch { 
    println("Starting B1...")
    debouncer.debounce("B", this) { expensiveLookup("B") }.await()
    println("Done B1")
}
launch {
    println("Starting A2...")
    debouncer.debounce("A", this) { expensiveLookup("A") }.await()
    println("Done A2")
}
---
Starting A1...
Starting B1...
Starting A2...
[A] Waiting..
[B] Waiting..
[A] Done!
Done A1
Done A2
[B] Done!
Done B1
Wouldn't wrapping the entire function in a Mutex cause it to synchronize for every request, regardless of the owner?
launch {
    println("Starting A1...")
    expensiveLookup("A")
    println("Done A1")
}
launch {
    println("Starting B1...")
    expensiveLookup("B")
    println("Done B1")
}
launch {
    println("Starting A2...")
    expensiveLookup("A")
    println("Done B1")
}
---
Starting A1...
[A] Waiting..
Starting B1...
Starting A2...
Exception in thread "main" java.lang.IllegalStateException: Already locked by A
 at kotlinx.coroutines.sync.MutexImpl.tryLock (Mutex.kt:177) 
 at kotlinx.coroutines.sync.MutexImpl.lock (Mutex.kt:188) 
 at FileKt.expensiveLookup (File.kt:73)
Your comments about properly managing caches and infrastructure are on point. That said, I don't actually care about cache stampedes.
> Of course this still leaves the case when your cache is simply empty
This is my use case: we're parsing 100GBs of logs, which requires external look-up for things like user info, IP info, resource info, etc., and want to avoid extraneous requests. e.g., if we have 10k unique users spread across 10M log lines, ideally we'd make 10k requests (at most).
t
the write lock (`computeIfAbsent`) in the linked implementation is on the map, so it's one global lock. Only on top of that do you get the additional deferred locking, which is indeed per id. It's true the lock I suggest should only be for filling the cache (not when there is a cache item), so after `val result` (not your whole method); the naming threw me off there
actually I'm probably incorrect: it seems that even for quite old versions, `computeIfAbsent` uses a key reservation system, so the locking is effectively per key rather than on the whole map. Using a deferred in combination with that kind of makes sense, even if the contrast with "approaches using locking" does not.
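A sketch of that combination, for what it's worth (names are mine; the retry-on-failure part is the "signal it so another caller can try" idea from above):
```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.async
import java.util.concurrent.ConcurrentHashMap

// computeIfAbsent reserves the key, so only the first caller for a given id
// creates the Deferred; everyone else awaits the same one.
class LookupDeduplicator(
    private val scope: CoroutineScope,   // owned by the cache, per the discussion above
    private val load: suspend (String) -> ExpensiveResult,
) {
    private val inFlight = ConcurrentHashMap<String, Deferred<ExpensiveResult>>()

    suspend fun get(id: String): ExpensiveResult {
        val deferred = inFlight.computeIfAbsent(id) { key ->
            scope.async { load(key) }
        }
        return try {
            deferred.await()
        } catch (e: Throwable) {
            // Drop a failed/canceled deferred so a later caller can retry.
            inFlight.remove(id, deferred)
            throw e
        }
    }
}
```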