# coroutines
j
Did some basic performance evaluation of file loading on Android with different scheduling methods. I found that `Dispatchers.IO` produced significant garbage overhead. Log snippet:
```
Background concurrent copying GC freed 1692(904KB) AllocSpace objects, 127(2MB) LOS objects, 49% free, 3MB/7MB, paused 390us total 103.292ms
Background concurrent copying GC freed 1965(1129KB) AllocSpace objects, 163(3MB) LOS objects, 50% free, 3MB/7MB, paused 552us total 118.408ms
Long wait of 8.836ms for Thread[20,tid=16153,Native,Thread*=0xe0c63c00,peer=0x12c40078,"DefaultDispatcher-worker-6"] suspension!
Background concurrent copying GC freed 2006(1273KB) AllocSpace objects, 197(4MB) LOS objects, 50% free, 3MB/7MB, paused 613us total 120.900ms
Background concurrent copying GC freed 2420(1437KB) AllocSpace objects, 216(4MB) LOS objects, 37% free, 10MB/16MB, paused 1.202ms total 256.912ms
Waiting for a blocking GC ProfileSaver
WaitForGcToComplete blocked ProfileSaver on HeapTrim for 29.549ms
Background concurrent copying GC freed 2242(1268KB) AllocSpace objects, 189(4MB) LOS objects, 49% free, 3MB/6MB, paused 520us total 102.964ms
Background concurrent copying GC freed 1782(1071KB) AllocSpace objects, 154(3MB) LOS objects, 49% free, 5MB/10MB, paused 386us total 112.534ms
```
Any clue why this might be?
m
The IO thread pool is dynamic; it can grow to 64 threads, so it's probably new thread allocations (but there's probably something else too)
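One way to test that hypothesis would be to cap the IO pool so it cannot spawn extra workers and rerun the benchmark; a minimal sketch, assuming kotlinx-coroutines 1.6+ where `limitedParallelism` exists (not part of the original discussion):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical experiment: run the same loop against a capped view of
// Dispatchers.IO. If the GC churn drops, thread creation is implicated.
// (The "kotlinx.coroutines.io.parallelism" system property is an older,
// process-wide way to cap the IO pool.)
@OptIn(ExperimentalCoroutinesApi::class)
fun main() = runBlocking {
    val cappedIo = Dispatchers.IO.limitedParallelism(2)
    repeat(1000) {
        withContext(cappedIo) {
            // asset-loading body goes here
        }
    }
}
```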
j
Thread count is indeed higher when using `Dispatchers.IO`, but I still think the amount of garbage is disproportionate. In my test the number of distinct threads spawned for 1000 iterations is 5~6 on IO versus 2 on the default dispatcher.
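A sketch of how such a count can be taken (my reconstruction, not the original test code): the switches run one at a time, so a plain set collecting worker-thread names per hop is enough.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Record which worker thread serves each of the 1000 withContext hops.
fun main() = runBlocking {
    val seen = mutableSetOf<String>()
    repeat(1000) {
        withContext(Dispatchers.IO) {
            seen += Thread.currentThread().name
        }
    }
    println("Distinct IO worker threads: ${seen.size} -> $seen")
}
```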
m
could you provide the test code?
j
```kotlin
import android.app.Application
import androidx.test.core.app.ApplicationProvider
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlin.system.measureTimeMillis

val timeline = mutableListOf<Long>()
repeat(1000) {
  val ms = measureTimeMillis {
    // Block the test thread on a fresh coroutine that hops to the IO pool,
    // reads the asset, and returns the text.
    runBlocking {
      withContext(Dispatchers.IO) {
        ApplicationProvider.getApplicationContext<Application>().assets
          .open("testAssetFile.json").use { it.reader().readText() }
      }
    }
  }
  timeline += ms
}

println("Avg=${timeline.average()}ms, max=${timeline.max()}, min=${timeline.min()}")
```
Only with `Dispatchers.IO` do I see the GC working overtime.
m
maybe if you put `runBlocking` around `repeat`
j
That wouldn’t make the test representative. I’m comparing directly blocking the main thread versus offloading to background threads; I measure the time until the result is delivered back to the main thread, hence the `runBlocking` per iteration.
Scratch that. On second thought it should be equivalent. Tried it; same result.
In this case it doesn’t spawn extra threads, just one, but the garbage overhead remains the same.
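For reference, the direct-blocking side of that comparison presumably looks something like this (a sketch, not the original code): the asset is read on the calling thread with no coroutines involved.

```kotlin
import android.app.Application
import androidx.test.core.app.ApplicationProvider
import kotlin.system.measureTimeMillis

// Baseline: time 1000 asset reads performed directly on the calling thread.
fun runBaseline() {
    val timeline = mutableListOf<Long>()
    repeat(1000) {
        timeline += measureTimeMillis {
            ApplicationProvider.getApplicationContext<Application>().assets
                .open("testAssetFile.json").use { it.reader().readText() }
        }
    }
    println("Avg=${timeline.average()}ms, max=${timeline.max()}, min=${timeline.min()}")
}
```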
m
what would happen if you started one coroutine that would do the json reading 1000 times?
j
You mean this? That’s what I just described.
```kotlin
runBlocking {
  repeat(1000) {
    val ms = measureTimeMillis {
      withContext(Dispatchers.IO) {
        loadAsset("testAssetFile.json")
      }
    }
    timeline += ms
  }
}
```
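(`loadAsset` is never shown in the thread; presumably it wraps the same asset read as the first snippet. A hypothetical definition:)

```kotlin
import android.app.Application
import androidx.test.core.app.ApplicationProvider

// Hypothetical helper: open the named asset and read it fully as text.
fun loadAsset(name: String): String =
    ApplicationProvider.getApplicationContext<Application>().assets
        .open(name).use { it.reader().readText() }
```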
m
is there a chance that reading assets on a background thread in a test is what's causing the garbage?
j
Repeating the reads within the same context doesn’t produce the garbage overhead, but that’s to be expected.
```kotlin
runBlocking {
  withContext(Dispatchers.IO) {
    repeat(1000) {
      val ms = measureTimeMillis {
        loadAsset("testAssetFile.json")
      }
      timeline += ms
    }
  }
}
```
m
@elizarov is this GC expected on 1000 `withContext` switches?
d
Hmm, what if you do `runBlocking(Dispatchers.Default)`?
j
This produces far less overhead.
```kotlin
runBlocking(Dispatchers.Default) {
  repeat(1000) {
    val ms = measureTimeMillis {
      withContext(Dispatchers.IO) {
        loadAsset("testAssetFile.json")
      }
    }
    timeline += ms
  }
}
```
It produces more overhead than `runBlocking { withContext(Dispatchers.Default) { … } }` but not as much as with just the IO dispatcher. I see ~6 concurrent sweeps with `runBlocking(Dispatchers.Default)`, versus 1~2 concurrent sweeps with `runBlocking { withContext(Dispatchers.Default) { … } }`.
d
Default and IO share a thread pool, so I guess the overhead is in spawning new threads in the pool.
I see this as a wontfix. Not much can be done about thread creation, but perhaps the threads could be kept alive for a bit.
Using an actor is an option, but not a very nice one.
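A sketch of that kind of workaround (my illustration, reusing the hypothetical `loadAsset` from above): a long-lived single-thread dispatcher avoids worker churn entirely, at the cost of owning the thread's lifecycle.

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.withContext

// One dedicated thread that stays alive across all loads; no pool growth.
val assetDispatcher = Executors.newSingleThreadExecutor()
    .asCoroutineDispatcher()

suspend fun loadAssetDedicated(name: String): String =
    withContext(assetDispatcher) { loadAsset(name) }

// Call assetDispatcher.close() once the loader is no longer needed.
```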
j
As far as I could tell, the number of threads spawned was too low to explain this amount of garbage purely by thread creation. If I find the time I’ll do some memory profiling and share the results.
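A rough first pass at that profiling could use ART's runtime stats (a sketch, not what was actually run; `runBenchmark` stands in for the 1000-iteration loop above):

```kotlin
import android.os.Debug

// Compare documented ART runtime stats (API 23+) before and after the loop
// to get GC count and total bytes allocated during the benchmark.
fun measureGcPressure(runBenchmark: () -> Unit) {
    val gcBefore = Debug.getRuntimeStat("art.gc.gc-count").toLong()
    val allocBefore = Debug.getRuntimeStat("art.gc.bytes-allocated").toLong()
    runBenchmark()
    val gcDelta = Debug.getRuntimeStat("art.gc.gc-count").toLong() - gcBefore
    val allocDelta = Debug.getRuntimeStat("art.gc.bytes-allocated").toLong() - allocBefore
    println("GCs during run: $gcDelta, bytes allocated: $allocDelta")
}
```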
g
`withContext` for sure does allocations (lambda, new context, coroutine scope) and the code above is suboptimal; the real question is what it's actually allocating and how big this GC pressure is. Also, 64 threads by themselves are 32 MB of memory.
l
@gildor I thought it was about 1 MB per thread, so 64 MB
g
0.5 MB is the default on desktop as I remember, but yeah, it may also be 64 MB for sure