# coroutines
j
Did some basic performance evaluation of file loading on Android with different scheduling methods. I found that `Dispatchers.IO` produced significant garbage overhead. Log snippet:
```
Background concurrent copying GC freed 1692(904KB) AllocSpace objects, 127(2MB) LOS objects, 49% free, 3MB/7MB, paused 390us total 103.292ms
Background concurrent copying GC freed 1965(1129KB) AllocSpace objects, 163(3MB) LOS objects, 50% free, 3MB/7MB, paused 552us total 118.408ms
Long wait of 8.836ms for Thread[20,tid=16153,Native,Thread*=0xe0c63c00,peer=0x12c40078,"DefaultDispatcher-worker-6"] suspension!
Background concurrent copying GC freed 2006(1273KB) AllocSpace objects, 197(4MB) LOS objects, 50% free, 3MB/7MB, paused 613us total 120.900ms
Background concurrent copying GC freed 2420(1437KB) AllocSpace objects, 216(4MB) LOS objects, 37% free, 10MB/16MB, paused 1.202ms total 256.912ms
Waiting for a blocking GC ProfileSaver
WaitForGcToComplete blocked ProfileSaver on HeapTrim for 29.549ms
Background concurrent copying GC freed 2242(1268KB) AllocSpace objects, 189(4MB) LOS objects, 49% free, 3MB/6MB, paused 520us total 102.964ms
Background concurrent copying GC freed 1782(1071KB) AllocSpace objects, 154(3MB) LOS objects, 49% free, 5MB/10MB, paused 386us total 112.534ms
```
Any clue why this might be?
m
The IO thread pool is dynamic; it can grow to 64 threads, so it's probably new thread allocations (but there's probably something else too)
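One way to test that hypothesis would be to cap the IO pool so it cannot spawn extra workers and rerun the benchmark; a minimal sketch, assuming kotlinx-coroutines 1.6+ where `limitedParallelism` exists (not part of the original discussion):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical experiment: run the same loop against a capped view of
// Dispatchers.IO. If the GC churn drops, thread creation is implicated.
// (The "kotlinx.coroutines.io.parallelism" system property is an older,
// process-wide way to cap the IO pool.)
@OptIn(ExperimentalCoroutinesApi::class)
fun main() = runBlocking {
    val cappedIo = Dispatchers.IO.limitedParallelism(2)
    repeat(1000) {
        withContext(cappedIo) {
            // asset-loading body goes here
        }
    }
}
```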
j
Thread count is indeed higher when using `Dispatchers.IO`, but I still think the amount of garbage is disproportionate. In my test the number of distinct threads spawned for 1000 iterations is 5~6 on IO versus 2 on the default dispatcher.
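A sketch of how such a count can be taken (my reconstruction, not the original test code): the switches run one at a time, so a plain set collecting worker-thread names per hop is enough.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Record which worker thread serves each of the 1000 withContext hops.
fun main() = runBlocking {
    val seen = mutableSetOf<String>()
    repeat(1000) {
        withContext(Dispatchers.IO) {
            seen += Thread.currentThread().name
        }
    }
    println("Distinct IO worker threads: ${seen.size} -> $seen")
}
```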
m
could you provide the test code?
j
```kotlin
import android.app.Application
import androidx.test.core.app.ApplicationProvider
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlin.system.measureTimeMillis

val timeline = mutableListOf<Long>()
repeat(1000) {
  val ms = measureTimeMillis {
    // Block the test thread on a fresh coroutine that hops to the IO pool,
    // reads the asset, and returns the text.
    runBlocking {
      withContext(Dispatchers.IO) {
        ApplicationProvider.getApplicationContext<Application>().assets
          .open("testAssetFile.json").use { it.reader().readText() }
      }
    }
  }
  timeline += ms
}

println("Avg=${timeline.average()}ms, max=${timeline.max()}, min=${timeline.min()}")
```
Only with `Dispatchers.IO` do I see the GC working overtime.
m
maybe if you put `runBlocking` around `repeat`
j
That wouldn’t make the test representative. I’m comparing directly blocking the main thread versus offloading to background threads; I measure the time until the result is delivered back to the main thread, hence the `runBlocking` per iteration.
Scratch that. On second thought it should be equivalent. Tried it; same result.
In this case it doesn’t spawn extra threads, just one, but the garbage overhead remains the same.
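For reference, the direct-blocking side of that comparison presumably looks something like this (a sketch, not the original code): the asset is read on the calling thread with no coroutines involved.

```kotlin
import android.app.Application
import androidx.test.core.app.ApplicationProvider
import kotlin.system.measureTimeMillis

// Baseline: time 1000 asset reads performed directly on the calling thread.
fun runBaseline() {
    val timeline = mutableListOf<Long>()
    repeat(1000) {
        timeline += measureTimeMillis {
            ApplicationProvider.getApplicationContext<Application>().assets
                .open("testAssetFile.json").use { it.reader().readText() }
        }
    }
    println("Avg=${timeline.average()}ms, max=${timeline.max()}, min=${timeline.min()}")
}
```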
m
what would happen if you started one coroutine that would do the json reading 1000 times?
j
You mean this? That’s what I just described.
```kotlin
runBlocking {
  repeat(1000) {
    val ms = measureTimeMillis {
      withContext(Dispatchers.IO) {
        loadAsset("testAssetFile.json")
      }
    }
    timeline += ms
  }
}
```
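(`loadAsset` is never shown in the thread; presumably it wraps the same asset read as the first snippet. A hypothetical definition:)

```kotlin
import android.app.Application
import androidx.test.core.app.ApplicationProvider

// Hypothetical helper: open the named asset and read it fully as text.
fun loadAsset(name: String): String =
    ApplicationProvider.getApplicationContext<Application>().assets
        .open(name).use { it.reader().readText() }
```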
m
is there a chance that reading assets on a background thread in a test is what's causing the garbage?
j
Repeating the reads within the same context doesn’t produce the garbage overhead, but that’s to be expected.
```kotlin
runBlocking {
  withContext(Dispatchers.IO) {
    repeat(1000) {
      val ms = measureTimeMillis {
        loadAsset("testAssetFile.json")
      }
      timeline += ms
    }
  }
}
```
m
@elizarov is this GC expected on 1000 `withContext` switches?
d
Hmm, what if you do `runBlocking(Dispatchers.Default)`?
j
This produces far less overhead.
```kotlin
runBlocking(Dispatchers.Default) {
  repeat(1000) {
    val ms = measureTimeMillis {
      withContext(Dispatchers.IO) {
        loadAsset("testAssetFile.json")
      }
    }
    timeline += ms
  }
}
```
It produces more overhead than `runBlocking { withContext(Dispatchers.Default) { … } }` but not as much as with just the IO dispatcher. I see ~6 concurrent sweeps with `runBlocking(Dispatchers.Default)`, versus 1~2 concurrent sweeps with `runBlocking { withContext(Dispatchers.Default) { … } }`.
d
Default and IO share a thread pool, so I guess the overhead is in spawning new threads in the pool.
I see this as a wontfix. Not much can be done about thread creation, but perhaps the threads could be kept alive for a bit.
Using an actor is an option, but not a very nice one.
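A sketch of that kind of workaround (my illustration, reusing the hypothetical `loadAsset` from above): a long-lived single-thread dispatcher avoids worker churn entirely, at the cost of owning the thread's lifecycle.

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.withContext

// One dedicated thread that stays alive across all loads; no pool growth.
val assetDispatcher = Executors.newSingleThreadExecutor()
    .asCoroutineDispatcher()

suspend fun loadAssetDedicated(name: String): String =
    withContext(assetDispatcher) { loadAsset(name) }

// Call assetDispatcher.close() once the loader is no longer needed.
```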
j
As far as I could tell, the number of threads spawned was too low to explain this amount of garbage purely by thread creation. If I find the time I’ll do some memory profiling and share the results.
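A rough first pass at that profiling could use ART's runtime stats (a sketch, not what was actually run; `runBenchmark` stands in for the 1000-iteration loop above):

```kotlin
import android.os.Debug

// Compare documented ART runtime stats (API 23+) before and after the loop
// to get GC count and total bytes allocated during the benchmark.
fun measureGcPressure(runBenchmark: () -> Unit) {
    val gcBefore = Debug.getRuntimeStat("art.gc.gc-count").toLong()
    val allocBefore = Debug.getRuntimeStat("art.gc.bytes-allocated").toLong()
    runBenchmark()
    val gcDelta = Debug.getRuntimeStat("art.gc.gc-count").toLong() - gcBefore
    val allocDelta = Debug.getRuntimeStat("art.gc.bytes-allocated").toLong() - allocBefore
    println("GCs during run: $gcDelta, bytes allocated: $allocDelta")
}
```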
g
`withContext` for sure does allocations (lambda, new context, coroutine scope) and the code above is suboptimal; the real question is what it's actually allocating and how big this GC pressure is. Also, 64 threads by themselves are 32 MB of memory.
l
@gildor I thought it was about 1 MB per thread, so 64 MB
g
0.5 MB is the default on desktop as I remember, but yeah, it may also be 64 MB for sure