# coroutines
j
Running Kotlin coroutines on Project Loom's virtual threads: https://kt.academy/article/dispatcher-loom
c
This doesn't really show an increase in performance. Sure, `Thread.sleep` is faster on Loom, but it's not clear to me whether Loom improves the performance of real code
j
any suggestions? I don't think my ISP will be very happy if I send a million emails twice. My database won't allow me to open a million connections. I don't think Linux would allow me to concurrently open a million files. If I hit someone's API with a million concurrent requests, my IP will be blocked for DoS-ing them, assuming I can open that many connections. `Thread.sleep` is a nice practical way to show that you can spin up a million threads without running into context-switching issues. Actual performance will depend on what you use it for. In my case, swapping `Dispatchers.IO` for `Dispatchers.LOOM` when sending bulk email from an AWS micro instance did make a difference: instead of my HTTP requests/second throughput dropping as the number of outgoing emails increased, I no longer had this performance drop and was able to keep the production system running on a micro instance
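For context, `Dispatchers.LOOM` is not a stock kotlinx.coroutines API; the linked article defines it as a dispatcher backed by virtual threads. A minimal sketch of that idea (assumes JDK 21+ and kotlinx-coroutines; `loomDispatcher` and `runMany` are names made up here):

```kotlin
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger

// A Loom-backed dispatcher: every coroutine runs on its own virtual thread,
// so blocking calls park the virtual thread instead of tying up a carrier.
val loomDispatcher = Executors.newVirtualThreadPerTaskExecutor().asCoroutineDispatcher()

fun runMany(n: Int): Int {
    val completed = AtomicInteger()
    runBlocking {
        repeat(n) {
            launch(loomDispatcher) {
                Thread.sleep(5) // blocking, but only parks this virtual thread
                completed.incrementAndGet()
            }
        }
    }
    // runBlocking waits for all launched children before returning
    return completed.get()
}
```

With `Dispatchers.IO` the same code would be throttled by the 64-thread default pool; here the sleeps all overlap.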
c
Well first, it would be interesting to compare with `delay`. But even then, “it makes doing nothing faster” is not a strong argument.
m
I was going to ask the exact same question. Is there anything else besides `Thread.sleep` that benefits from Loom?
I guess any POSIX `select` needs to schedule a new thread, right?
c
In theory, since all blocking calls actually happen in the JDK, the idea was that the JDK developers could transparently replace blocking calls with non-blocking calls at the edge
I have no idea if that's actually going to be a thing though
If they do manage to do it, that would mean even IO tasks wouldn't block threads, which would be amazing, but I haven't followed Loom for years now so I don't know if it's still on the table or if it was just an idea they had at some point
m
I mean the kernel doesn't have that many IO primitives, it's mostly either `sleep()` or `select()` IIRC from my low-level C days
So the JVM is limited in what it can do. Whenever some `inputStream` calls `select()`, my understanding is it's going to schedule a new (real, not virtual) thread.
Unless there are some virtual threads in the kernel these days...
c
Disclaimer: I really don't know much about this, but I believe there are alternative system calls which take a callback or signal invoked when they're completed, which would lift the blocking out of the threads
I could be completely wrong though
m
I see, interesting..
It seems to exist
My understanding of Loom when it was introduced was that one of the goals was to allow green threads to use these, but again I have no idea what became of this
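The callback style does already exist in the JDK itself: NIO.2's asynchronous channels take a `CompletionHandler` instead of blocking the caller. A self-contained loopback sketch (pure JDK APIs; `readOneMessage` is a name invented for this example):

```kotlin
import java.net.InetSocketAddress
import java.nio.ByteBuffer
import java.nio.channels.AsynchronousServerSocketChannel
import java.nio.channels.AsynchronousSocketChannel
import java.nio.channels.CompletionHandler
import java.util.concurrent.CountDownLatch

// Callback-style IO in the JDK: the read completes on a background pool,
// so no thread of ours blocks while waiting for the bytes to arrive.
fun readOneMessage(): String {
    val server = AsynchronousServerSocketChannel.open().bind(InetSocketAddress(0))
    val received = StringBuilder()
    val done = CountDownLatch(1)

    server.accept(null, object : CompletionHandler<AsynchronousSocketChannel, Void?> {
        override fun completed(conn: AsynchronousSocketChannel, a: Void?) {
            val buf = ByteBuffer.allocate(64)
            conn.read(buf, null, object : CompletionHandler<Int, Void?> {
                override fun completed(n: Int, a: Void?) {
                    buf.flip()
                    received.append(Charsets.UTF_8.decode(buf))
                    done.countDown()
                }
                override fun failed(exc: Throwable, a: Void?) = done.countDown()
            })
        }
        override fun failed(exc: Throwable, a: Void?) = done.countDown()
    })

    // Client side: connect and write; Future.get() used here just for brevity.
    val client = AsynchronousSocketChannel.open()
    client.connect(server.localAddress as InetSocketAddress).get()
    client.write(ByteBuffer.wrap("ping".toByteArray())).get()
    done.await()
    client.close(); server.close()
    return received.toString()
}
```

The `completed` callback is exactly the "invoked when they're completed" shape described above, surfaced at the Java API level rather than the syscall level.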
m
> This is where select(2), poll(2) and the epoll(7) family of system calls come in
This is the `select()` I was talking about. I guess what the JVM can do is keep track of the file handles that your app is using and "group" a `select()` across all of them
I would expect this to be done by Netty or other frameworks for any performance-sensitive app, but there might be use cases beyond that
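That "group a select across all of them" idea is what `java.nio`'s `Selector` exposes: register many non-blocking channels with one selector and wait for all of them in a single call (backed by epoll/kqueue where available). A minimal sketch with one server channel (`acceptWithSelector` is a made-up name):

```kotlin
import java.net.InetSocketAddress
import java.nio.channels.SelectionKey
import java.nio.channels.Selector
import java.nio.channels.ServerSocketChannel
import java.nio.channels.SocketChannel

// One Selector multiplexes readiness for many channels: a single select()
// call instead of one blocked thread per channel.
fun acceptWithSelector(): Boolean {
    val selector = Selector.open()
    val server = ServerSocketChannel.open().apply {
        configureBlocking(false)
        bind(InetSocketAddress(0))
        register(selector, SelectionKey.OP_ACCEPT)
    }
    val port = (server.localAddress as InetSocketAddress).port

    // Kick off a client connection so the server channel becomes acceptable.
    val client = SocketChannel.open(InetSocketAddress("127.0.0.1", port))

    var accepted = false
    if (selector.select(5_000) > 0) {          // waits on ALL registered channels at once
        for (key in selector.selectedKeys()) {
            if (key.isAcceptable) {
                (key.channel() as ServerSocketChannel).accept()?.close()
                accepted = true
            }
        }
        selector.selectedKeys().clear()        // keys must be cleared manually
    }
    client.close(); server.close(); selector.close()
    return accepted
}
```

In a real server you would register many channels and loop over `select()`, which is essentially what Netty's event loop does.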
c
I did not check that at all, but how does ThreadLocal behave with virtual threads? That could make them a bit more expensive than coroutines
m
Yikes, no idea about `ThreadLocal`..
c
https://www.javaadvent.com/2020/12/project-loom-and-structured-concurrency.html Ouch, they seem to be copied to each new virtual thread
m
Are `ThreadLocal`s used a lot though?
Now that you're talking about it, I remember Moshi uses it..
So might be a thing
c
Well if you're using Coroutines it's a very bad idea to use them anyway
But I have seen a lot of them in old codebases ported to Kotlin
It seems a lot of people used to confuse thread locals with locks 😕
m
😕
c
IMO ThreadLocal was already a code smell even in regular Java with regular threads
The only safe use case I can think of is a cache, where a cache miss is just a performance hit but doesn't impact the logic
(and that's still safe with coroutines)
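The "cache" shape described above can be sketched with a classic example: `SimpleDateFormat` is not thread-safe, so each thread lazily gets its own copy. Losing a copy (e.g. landing on a fresh thread) only costs a re-construction, never correctness (`dateFormat` and `formatDay` are names invented here):

```kotlin
import java.text.SimpleDateFormat
import java.util.Date
import java.util.TimeZone

// ThreadLocal as a per-thread cache: a "miss" (a new thread) just pays the
// construction cost again; no program logic depends on WHICH copy you get.
private val dateFormat = ThreadLocal.withInitial {
    SimpleDateFormat("yyyy-MM-dd").apply { timeZone = TimeZone.getTimeZone("UTC") }
}

fun formatDay(epochMillis: Long): String = dateFormat.get().format(Date(epochMillis))
```

This is also why it stays safe under coroutines: if a coroutine resumes on a different thread, it simply sees that thread's formatter instead.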
m
In that case, yea, no need to copy the thread local to each virtual thread, it can be shared
1 real thread == 1 thread local
On a separate topic: @janvladimirmostert what API are you using to send the emails?
FWIW, the code that does the select in the JDK is there
(at least some of it, there are gazillion different paths in the JDK but that file shows clearly the difference between virtual and regular threads)
c
So Loom will indeed have benefits for IO tasks, that's great!
m
Yup
c
Well that's the kind of things I would have wanted to know from an article about Loom and Coroutines 🙂
m
I wonder how the JVM does it without compiler support. Does it rewrite the bytecode to put "Continuation" or does it do something else...
c
Does it even need to do that? It controls all 'suspension' points and the heavy lifting is done by the kernel
I don't think it's much different for the JDK than blocking is, since it's all deep in the internals and everything else just runs on top of it unaware
m
yea maybe...
c
Well now that we know that, I would assume the future of Kotlin coroutines is that Dispatchers.IO will end up using Loom automatically, but we'll have to wait to see
k
> I mean the kernel doesn’t have that many IO primitives, it’s mostly either `sleep()` or `select()` IIRC from my low level C days
This isn’t true. On Linux you have `select`, `poll`, `epoll`, and `io_uring`. I’m actually working on binding io_uring to K/N. On BSD you have `kqueue` and `select`. On MinGW you have `ioctl`, which is IIRC similar to kqueue.
select is really bad as the performance hit scales with the number of file descriptors you’re watching: the whole fd set is copied and scanned linearly on every call
poll and epoll can only work on socket-type things, not file-based things.
io_uring works async on all io, file and socket based.
m
@kevin.cianfarini I put `select` and `poll` in the same category, `poll` being an improvement on `select` IIRC?
k
From what I remember when looking into Java NIO it will pick the best system call for a given system to do async io. I’m sure they’d do the same for loom
I actually don’t remember the history of select/poll/epoll
m
Like every time it's the same concept of setting a few bit flags and waiting on several file descriptors/sockets at once
k
kind of. io_uring is a bit different because it can theoretically work on any arbitrary syscall
not just io stuff
it describes what syscall you want to perform and then sends the request to the kernel via a memory-mapped queue, the kernel completes it, and then gives it back to you in a different memory-mapped queue. The big perf benefit here is that you don’t have to make as many (or any!) syscalls
m
My point is that Loom can "collect" file descriptors (or syscalls) across different virtual threads and ask the kernel about all of them at once, meaning it schedules only once
k
oh, yes!!
I expect loom will do all sorts of neat things with io and make it totally transparent to the user
m
So it's doing something that Kotlin coroutines can't
k
It will absolutely have benefits
yes.
It’s not to say that Kotlin/Native couldn’t achieve something with a similar level of performance without Loom
m
Oh right, Kotlin Native 👀
k
Loom is just virtual-machine-managed magic around transforming what seem to be blocking IO syscalls into nonblocking ones.
The VM would then choose what else in the program to run
In Kotlin world, a dispatcher would choose what else to run. This is contingent upon someone using coroutines, though.
It’s a different layer of abstraction.
c
That's why I would expect `Dispatchers.IO` to end up using Loom at some point in the future, so we would get all of this for free for current coroutine-based programs, without any behavior difference for other platforms
k
I would agree with that.
To an extent. I’m sure there will be use cases for real blocking IO
Also I’m sure the coroutines people are thinking about this much more accurately than we are. I don’t suspect slapping a `Dispatchers.LOOM` on will be the best interop story between coroutines and Loom.
They might have some other tricks up their sleeves.
c
It's probably an OK first-adoption strategy if you want to start using Loom before it's fully released and the Coroutine devs have had the time to do their thing
k
And re, this:
> So the JVM is limited in what it can do. Whenever some `inputStream` calls `select()`, my understanding is it’s going to schedule a new (real, not virtual) thread.
I don’t think that’s necessarily true. A general pattern I’ve seen used is to have one “non-blocking main thread” and one “worker thread”. The worker thread is the entity responsible for notifying consumers of IO completion events and dispatching them. The main thread will request some IO thing to be performed and go do other non-blocking things. That’s how the non-blocking IO impl works on top of `select(2)` in the ktor-network package, and that’s how my io_uring impl works. If you want a program to be truly single-threaded and perform IO work, you have to transform the entire program into an event loop. This essentially condenses the work of a dispatcher-type thing and the business logic of your program into a single thing. The io_uring examples do this, where the main function is pretty much just a wrapper around a `while (!done)` loop.
So I doubt Loom will be spawning a new background thread for every IO thing for you. That wouldn’t present many benefits on top of what we already have with managed thread pools. Threads take resources
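The one-worker-plus-event-loop split described above can be sketched with a plain blocking queue: the worker performs the (blocking) IO and posts completion events, while the main loop stays free between events. A toy model where the "IO" is simulated with a sleep (`runEventLoop` and the `done:` prefix are inventions of this sketch):

```kotlin
import java.util.concurrent.LinkedBlockingQueue
import kotlin.concurrent.thread

// Worker/event-loop split: the worker does the blocking work and posts
// completions; the main loop just drains them, like a `while (!done)` loop.
fun runEventLoop(requests: List<String>): List<String> {
    val completions = LinkedBlockingQueue<String>()

    thread(name = "io-worker") {
        for (req in requests) {
            Thread.sleep(1)               // stand-in for a blocking syscall
            completions.put("done:$req")  // notify the loop of completion
        }
    }

    val results = mutableListOf<String>()
    while (results.size < requests.size) {  // the event loop
        results += completions.take()       // handle one completion event
        // ...other non-blocking work could run here between events...
    }
    return results
}
```

With io_uring the queue is memory-mapped and shared with the kernel instead of living in the process, but the control flow is the same shape.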
m
> So the JVM is limited in what it can do. Whenever some `inputStream` calls `select()`, my understanding is it’s going to schedule a new (real, not virtual) thread.
> I don’t think that’s necessarily true
Yup, that was my wrong initial assumption. Loom can actually `select()`/`poll()` from multiple `inputStreams` in different virtual threads and schedule only one carrier thread. Cool stuff!
This is just out: https://mastodon.social/@jessewilson/109680215462297851. Haven't listened to it yet but planning to