# coroutines
b
Hello Everyone, I'm not sure if this is the right place to ask, but I'm trying to deepen my understanding of Kotlin coroutines, particularly regarding the term *non-blocking* and how dispatchers behave under different workloads. Recently, I was assigned a task where I need to process thousands of files — primarily involving downloading them (I/O-bound) and then performing CPU-intensive processing on each file. Naturally, I considered leveraging coroutines for parallelisation. I have a few conceptual doubts:
1. *I/O-bound tasks and `Dispatchers.IO`:* When using `Dispatchers.IO`, I understand that it's optimized for I/O operations. Internally, does this utilize mechanisms like Java NIO or even OS-level calls such as `epoll_create1()`? Is it accurate to think of it as posting a task to the OS and letting the OS notify the application when the I/O completes?
2. *CPU-bound tasks and `Dispatchers.Default`:* For CPU-bound tasks, we typically use `Dispatchers.Default`, which is backed by a shared pool of threads equal to the number of CPU cores (e.g. 8 threads for 8 cores).
◦ Does this mean that only up to 8 CPU-bound coroutine tasks can be executed concurrently? I know more can be submitted, but they would be scheduled based on thread availability.
◦ Are these tasks tied to specific CPU cores? For instance, if Task A starts on Core 1, will it continue on Core 1 for the sake of cache locality (i.e. core affinity), or is that not guaranteed?
◦ Is time-slicing used when there are more tasks than available cores?
Would really appreciate any clarification or pointers. Thanks in advance!
r
The only meaningful difference between the `IO` dispatcher and the `Default` dispatcher is the size of their thread pools. `Dispatchers.IO` has no special hidden functionality to actually make it better at I/O operations. `Dispatchers.Default` has a thread pool with size equal to the number of CPU cores, and `Dispatchers.IO` allows for many more additional threads.

To better understand the intent behind these dispatchers, let's step back from Kotlin and coroutines for a moment. Imagine your machine has 4 CPU cores. If you launch a program with 4 threads or fewer running in parallel, each thread can do its work using its own CPU core. If you launch a program with 5 threads or more running in parallel, some of the threads will need to share usage of CPU cores; in this scenario the OS is responsible for scheduling how multiple threads can use the same CPU core, and it will typically have the threads take turns using it. Threads that are sharing CPU cores will take longer to do their work than ones that each have their own CPU core. CPU cores are the bottleneck on overall computational speed.

Coming back to Kotlin coroutines, this is why `Dispatchers.Default` only makes `n` worker threads for `n` CPU cores; more worker threads will do nothing to speed up CPU-bound work (and might even slow it down, due to factors such as context switching, thrashing, etc.).

The whole goal of `suspend` functions, coroutines, etc. is to make better use of the CPU cores that you have. If you have one coroutine that at some point needs to wait for something else (perhaps a network call, disk I/O, or the completion of another coroutine) and it's not using any CPU power, the coroutine system allows other tasks to step in and make use of that thread / CPU core that would otherwise be idle.

The trick with `Dispatchers.IO` is that it's only useful if you have code which you know is waiting on something else (e.g. a network call) but the function you're calling is not declared with `suspend`; this is often the case with low-level system calls. Any non-suspending function holds onto its thread and doesn't let others use it. This is a *blocking* function call; it *blocks* the thread from doing other work. If this blocking function is being run on a thread owned by `Dispatchers.Default`, it's hogging a thread that could be used for other work, and you might have an idle CPU core even though there is work waiting. This is when you want to use `Dispatchers.IO`, which will run it in an extra thread outside of the `n` owned by `Dispatchers.Default`. If you know the task is primarily waiting and not using CPU power, this extra thread will not slow down your main CPU work.

Here is a tutorial video that I personally found very helpful in understanding when to use `Dispatchers.IO`:

https://www.youtube.com/watch?v=J5h0IJs8aV0

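A minimal sketch of the pattern described above, assuming `kotlinx.coroutines`; `blockingRead` is a hypothetical stand-in for any legacy, non-suspending I/O call:

```kotlin
import kotlinx.coroutines.*

// A legacy, non-suspending API: it blocks its thread until the "I/O" finishes.
fun blockingRead(): String {
    Thread.sleep(200) // stands in for a blocking socket or file read
    return "payload"
}

fun main() = runBlocking {
    // Wrapped in Dispatchers.IO, the blocking call runs on one of the IO
    // pool's extra threads instead of hogging one of Dispatchers.Default's
    // n CPU-sized threads, which stay free for CPU-bound coroutines.
    val data = withContext(Dispatchers.IO) { blockingRead() }
    println(data)
}
```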
To try to directly answer your bullet-point questions about `Dispatchers.Default`:
• If you have 8 CPU cores, only 8 tasks will be actively running code at any given moment in time. This does not mean that [the number of tasks that have been started and not yet finished] is limited to 8. All of the tasks owned by the dispatcher will take turns using the 8 threads, depending on their suspension points. When one suspends and leaves a thread unused, a different one that's ready and waiting will step up to take the thread.
• No. The docs state: "a coroutine is not bound to any particular thread. It may suspend its execution in one thread and resume in another one." The main exception is if you have no suspension points, in which case there is no opportunity for anything to get switched up.
• I'm not sure I know what you mean by time-slicing in this scenario, but perhaps this answers your question: tasks define their own suspension points; the dispatcher does not arbitrarily pause a task after some time frame of it being active.
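A small sketch illustrating the second point, assuming `kotlinx.coroutines`: a coroutine can be resumed on a different worker thread after a suspension point, and the dispatcher makes no affinity guarantee.

```kotlin
import kotlinx.coroutines.*

// Records which worker thread runs the coroutine before and after a
// suspension point on Dispatchers.Default.
suspend fun threadBeforeAndAfter(): Pair<String, String> =
    withContext(Dispatchers.Default) {
        val before = Thread.currentThread().name
        delay(100) // suspension point: the thread is released for other work
        val after = Thread.currentThread().name
        before to after
    }

fun main() = runBlocking {
    val (before, after) = threadBeforeAndAfter()
    // `before` and `after` are often, but not always, different worker
    // threads -- resumption lands on whichever worker is free.
    println("before=$before after=$after")
}
```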
b
Thanks @rkechols
d
To the OP, yeah, I leapt to hoping/believing `Dispatchers.IO` replaced the low-level I/O ops to suspend and thus gave nearly optimal context switching. Alas, it's only at the `suspend fun` level.