# coroutines
j
https://kotlinlang.org/docs/reference/coroutines/coroutine-context-and-dispatchers.html#dispatchers-and-threads appears to be the starting point I am looking for, and I find `newSingleThreadContext`. I think to myself that having threads with CPU affinity is good for breaking a memory-mapped file into N lines/CPU chunks. Upon clicking on that link I read in bold letters: "*NOTE: This API will be replaced in the future.*" In C++ there is OpenMP, where a pragma lets you simply annotate a loop to parallelize it. Is there something so simple I am missing from the coroutines docs?
o
Raw coroutines are more about good concurrency primitives than high-level parallelization. You can accomplish decent parallelization by simply spawning a number of coroutines using `launch` that all take from the same Channel, or by splitting the job in advance and handing it off to a number of `async` jobs that produce a result you can combine. Basically, you build it yourself out of what exists.
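For instance, a rough sketch of the channel-of-workers version might look like this (`processLine` is just a made-up stand-in for whatever the real per-item work is):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

// Placeholder for the real per-item work.
fun processLine(line: String): Int = line.length

fun main() = runBlocking {
    val work = Channel<String>()
    val results = Channel<Int>()

    // One worker per CPU; each repeatedly takes items from the shared channel.
    val workers = List(Runtime.getRuntime().availableProcessors()) {
        launch(Dispatchers.Default) {
            for (line in work) results.send(processLine(line))
        }
    }

    // Feed the channel, then close it so the workers' loops terminate.
    launch {
        listOf("alpha", "bravo", "charlie").forEach { work.send(it) }
        work.close()
    }

    // Collect exactly as many results as we sent.
    repeat(3) { println(results.receive()) }
    workers.forEach { it.join() }
    results.close()
}
```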
I don't think you can actually get CPU affinity guarantees from the JVM, so even `newSingleThreadContext` probably doesn't do what you think it does, and may be counter-productive.
c
`Dispatchers.Default` is probably your best bet. It uses a thread pool with threads equal to the number of CPUs available, so it is good for optimally running CPU-bound tasks (like processing memory buffers). So just break up your buffers, send each task to an `async` call, and it should run about as optimally as it can on your hardware
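Something along these lines, roughly (`sumChunk` is a made-up placeholder for the actual per-chunk processing):

```kotlin
import kotlinx.coroutines.*

// Placeholder for the real per-chunk work.
fun sumChunk(data: ByteArray, from: Int, to: Int): Long {
    var sum = 0L
    for (i in from until to) sum += data[i]
    return sum
}

fun main() = runBlocking {
    val data = ByteArray(1 shl 20) { (it % 7).toByte() }
    val cpus = Runtime.getRuntime().availableProcessors()
    val chunk = (data.size + cpus - 1) / cpus

    // One async job per chunk, all on Dispatchers.Default's CPU-sized pool.
    val partials = (0 until cpus).map { i ->
        async(Dispatchers.Default) {
            val from = i * chunk
            val to = minOf(from + chunk, data.size)
            sumChunk(data, from, to)
        }
    }
    println("total = ${partials.awaitAll().sum()}")
}
```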
o
You can technically get some parallel stuff going with `Flow`, but it's not primarily designed for it at the moment, giving you only `flatMapMerge`/`flattenMerge`
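Roughly like this (`expensiveDecode` is a fake stand-in for per-record work, and depending on the library version you may need the preview opt-in shown):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Fake stand-in for real per-record decode/IO work.
suspend fun expensiveDecode(i: Int): String {
    delay(100)
    return "record-$i"
}

@OptIn(FlowPreview::class)  // flatMapMerge is a preview API in recent versions
fun main() = runBlocking {
    (1..8).asFlow()
        .flatMapMerge(concurrency = 4) { i ->
            // Each inner flow runs its work off the main thread, up to 4 at a time.
            flow { emit(expensiveDecode(i)) }.flowOn(Dispatchers.Default)
        }
        .collect { println(it) }
}
```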
j
I have a memory-mapped ByteBuffer; I think that `buf.duplicate()` is enough to scope a coroutine operation concurrently among millions of lines and dozens of cores.
I have each ByteBuffer line returning a Flow, for performing the seek at decode time.
Fixed-width records lend themselves well to random-access memory-mapped cursors. The "2GB per core" ideal of Hadoop sort of hits MAXINT as a limit early with the JVM's NIO ByteBuffers, without some additional mental accounting I haven't applied, like a separate mmap per core. (Sorry, thinking out loud.) Having coroutines overly dispatched could really kill the partitioned access, but I wanted to be able to queue up enormous numbers of flows to be dispatched in IO elevator-friendly ways.
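Something like one mapping per chunk is what I'm picturing, roughly (the file name and record width here are made up):

```kotlin
import kotlinx.coroutines.*
import java.io.RandomAccessFile
import java.nio.channels.FileChannel

// Hypothetical fixed record width; a single MappedByteBuffer tops out at
// Int.MAX_VALUE bytes, so each window below must stay under that.
const val RECORD_WIDTH = 64L

fun main() = runBlocking {
    RandomAccessFile("data.bin", "r").use { raf ->
        val channel = raf.channel
        val cpus = Runtime.getRuntime().availableProcessors()
        val records = channel.size() / RECORD_WIDTH
        val recordsPerChunk = (records + cpus - 1) / cpus

        val counts = (0 until cpus).map { i ->
            async(Dispatchers.Default) {
                val firstRecord = i * recordsPerChunk
                val lastRecord = minOf(firstRecord + recordsPerChunk, records)
                if (firstRecord >= lastRecord) return@async 0L
                // Each coroutine gets its own mapping, so no shared position state.
                val buf = channel.map(
                    FileChannel.MapMode.READ_ONLY,
                    firstRecord * RECORD_WIDTH,
                    (lastRecord - firstRecord) * RECORD_WIDTH
                )
                var n = 0L
                while (buf.remaining() >= RECORD_WIDTH) {
                    // "Decode" one record by skipping over it.
                    buf.position(buf.position() + RECORD_WIDTH.toInt())
                    n++
                }
                n
            }
        }
        println("records seen: ${counts.awaitAll().sum()}")
    }
}
```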
The idea would be to be able to map a dataframe's permutations given flows, to arbitrary depths, and backlog the IO to arrive at opportunistic flow fulfillment with optional ordered reassembly.
`pmap` will at least not decrease throughput over the single-threaded code now.
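For reference, the `pmap` I mean isn't from the library, just the usual hand-rolled extension, roughly:

```kotlin
import kotlinx.coroutines.*

// Hand-rolled pmap: fan each item out to an async job, await them all in input order.
suspend fun <A, B> Iterable<A>.pmap(f: suspend (A) -> B): List<B> = coroutineScope {
    map { async(Dispatchers.Default) { f(it) } }.awaitAll()
}

fun main() = runBlocking {
    val squares = (1..10).toList().pmap { it * it }
    println(squares)  // results come back in input order
}
```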