# coroutines
b
produce<String>(capacity = 1024) {
    input.forEachLine { async { send(it) } }
}.map {
    async {
        val tokenized = tokenizerFactory.create(pp.preProcess(it)).tokens.joinToString(" ")
        tokenized + "\n"
    }
}.map {
    output.appendText(it.await())
}
g
I suppose you are using sequences; this will not work, a sequence is a sequential primitive
You have to get rid of forEachLine or write your own suspend version
The source of this problem is that sequence extensions are not inlined (and they cannot be inlined due to the lazy nature of sequences). Because of that you cannot use suspend functions inside them, so you have to run async on each item, which in general is not the best idea in terms of performance, and you also cannot use map and the other primitives properly
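A minimal sketch of that "own suspend version" idea, assuming input is a java.io.File (lineProducer is just an illustrative name): useLines is an inline function, so send() can be called directly from the produce body, with no async per line.

import java.io.File
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.channels.produce

// Read the file lazily and send each line from the suspending produce body.
// send() suspends when the 1024-line buffer is full, so reading naturally
// slows down to the speed of the consumers.
fun CoroutineScope.lineProducer(input: File): ReceiveChannel<String> =
    produce<String>(capacity = 1024) {
        input.useLines { lines ->
            for (line in lines) {
                send(line)
            }
        }
    }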
b
hmmm
s
produce<String>(capacity = 1024) {
    val that = this
    listOf("1","2","3").asSequence().forEach { async { that.send(it) } }
}
would a closure work?
b
forEachLine is coming from a file
and the file is too big to be loaded in memory
val reader = produce<String>(Dispatchers.IO, capacity = 1024) {
    input.forEachLine { runBlocking { send(it) } }
    this.close()
}

val writer = produce<String>(context, capacity = 8) {
    reader.consumeEach {
        val tokenized = tokenizerFactory.create(pp.preProcess(it)).tokens.joinToString(" ")
        send(tokenized + "\n")
    }
}

writer.consumeEach {
    output.appendText(it)
}
now I tried that, with runBlocking on the send (because if capacity is 1024, it should really block only if it is full, right?)
s
runBlocking shouldn't be used here
b
the problem I had with using async is that I would get billions of asyncs (at least that's what memory usage looked like)
s
maybe try: (untested)
val that = this
input.forEachLine { that.send(it) }
b
if I use async, my problem is that I cannot close the stream either
if I do what you wrote
it says that suspension functions can only be called within a coroutine body
(which is the reason why I found that runBlocking was working)
s
offer?
input.forEachLine { offer(it) }
b
but then I lose the lines that weren't added…
so if I start buffering (which is likely, as processing is much slower than file reading), then I lose lines randomly
s
you could check the result of offer() and loop until true with a delay/sleep ?
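Roughly what that could look like inside the produce block, untested (and offer was later deprecated in favour of trySend): the forEachLine lambda is not suspending, so the only way to wait is to block the thread, which is tolerable only on Dispatchers.IO.

produce<String>(Dispatchers.IO, capacity = 1024) {
    input.forEachLine { line ->
        // offer() returns false when the buffer is full; back off and retry.
        // Thread.sleep blocks the IO thread, not the whole coroutine machinery.
        while (!offer(line)) {
            Thread.sleep(10)
        }
    }
}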
b
what is wrong with using runBlocking?
s
I was under the impression there should only be one runBlocking, as it turns the current thread into an event loop; I'm probably out of my depth here
from docs
Runs a new coroutine and blocks the current thread interruptibly until its completion. This function should not be used from a coroutine. It is designed to bridge regular blocking code to libraries that are written in suspending style, to be used in main functions and in tests. The default CoroutineDispatcher for this builder is an internal implementation of event loop that processes continuations in this blocked thread until the completion of this coroutine. See CoroutineDispatcher for the other implementations that are provided by kotlinx.coroutines.
b
I see…
so it works, but only by luck…
s
Let us know what you find. I would figure, since the library is not using coroutines and you're running in the Dispatchers.IO pool, it would be acceptable to Thread.sleep(10) until you are able to re-offer inside forEachLine
but that's only a guess
b
I have to admit I am a bit lost with all of that… I wanted to do a [Read Lines]->[Process lines in parallel]->[Write processed lines]
I can make it work with map, except that then the line processing is not parallel…
s
produce gives you a channel
pass the channel to consumers
repeat(8) { for(x in channel) { consume(x) } }
8 consumers
b
does that stop only when channel is closed?
s
runBlocking {
    val channel = produce<Double>(capacity = 1024) {
        while (true) {
            send(Math.random())
        }
    }

    repeat(8) {
        launch {
            for (item in channel) {
                println(item)
            }
        }
    }
}
yes
runBlocking {
    val channel = produce<Double>(capacity = 1024) {
        while (true) {
            send(Math.random())
        }
    }

    repeat(8) {
        launch {
            for (item in channel) {
                println(item)
            }
        }
    }

    delay(500)
    channel.cancel()
}
a better example
runBlocking {
    val channel = produce<Double>(capacity = 1024) {
        var x = 0
        while (true) {
            send(Math.random())
            x++
            if (x > 100) {
                break // closing
            }
        }
    }

    repeat(8) {
        launch {
            for (item in channel) {
                println(item)
            }
        }
    }
}
println("all done")
I think I recall an example like this in

https://www.youtube.com/watch?v=a3agLJQ6vt8

b
If I do that it just gets stuck in the repeat
ooh I see why
because I was writing to a writer channel
s
how do you mean?
b
I was writing to a writer channel but not reading from it
val reader = produce<String>(capacity = 1024) {
    input.useLines {
        val iterat = it.iterator()
        while (iterat.hasNext()) {
            send(iterat.next())
        }
    }
    println("Can close now")
    this.close()
}

val writer = launch(Dispatchers.IO) {
    writerChannel.consumeEach {
        output.appendText(it)
    }
}

repeat(8) {
    println("Analyzer number $it")
    for (x in reader) {
        val tokenized = tokenizerFactory.create(pp.preProcess(x)).tokens.joinToString(" ")
        writerChannel.send(tokenized + "\n")
    }
}

writerChannel.close()
writer.join()
that's what I have so far
but it prints:
Analyzer number 0 (spent all the processing time here)
Can close now
Analyzer number 1 (go really quickly across all of the following)
Analyzer number 2
Analyzer number 3
Analyzer number 4
Analyzer number 5
Analyzer number 6
Analyzer number 7
I tried to put repeat(8) { launch { … }} but it doesn't seem to improve
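One thing to watch for with repeat(8) { launch { … } } in the snippet above: writerChannel.close() then runs as soon as the eight jobs are launched, not after they finish, so the workers' remaining send() calls fail. A sketch of a joined version, reusing the names from that snippet (reader, writerChannel, writer, tokenizerFactory, pp):

val analyzers = List(8) {
    launch(Dispatchers.Default) {
        // all workers share the reader channel, so each line is processed exactly once
        for (x in reader) {
            val tokenized = tokenizerFactory.create(pp.preProcess(x)).tokens.joinToString(" ")
            writerChannel.send(tokenized + "\n")
        }
    }
}
analyzers.joinAll()     // wait until the reader channel is fully drained
writerChannel.close()   // only then can the writer's consumeEach complete
writer.join()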
g
I believe for this use case the channel just adds unnecessary overhead, because this operation is completely blocking by nature, so I would rewrite it in a sequential way. I'm not sure what you want to achieve and why you need this channel, so I will show 2 examples
👍 2
b
[Reading Lines]->[n threads each processing a line]->[Writing Lines]
g
withContext(Dispatchers.IO) {
    output.bufferedWriter().use { out ->
        input.forEachLine { line ->
            val tokenized = tokenizerFactory.create(pp.preProcess(line)).tokens.joinToString(" ")
            out.write(tokenized + "\n")
        }
    }
}
also it's significantly more efficient, because appendText opens and closes the file on each operation, which causes huge overhead
b
With the test file I go (with my solution) from 24s to 9s (on a 2 CPU machine)
so yes that seem to help
but then on the full file on the production machine, I end up with only a single CPU used
the full run would take hours, but it seems to run as fast as what I had initially (really close to your solution except using appendText)
And I'm back to my initial idea to be able to use the 12 threads of that machine (24 w/ HT but it is disabled) to do the tokenizer part, which is what takes the most time in that whole thing
g
I see what you mean, it's more efficient, but not parallel. If your processing is really heavy you can use an async solution where the file is read line by line, multiple workers process the lines, and then they are written out, but it only makes sense for really heavy processing
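For reference, a sketch of that shape (not code from this thread): one producer reading lines on Dispatchers.IO, a configurable number of CPU workers, and a single writer that owns the output file; tokenize stands in for the tokenizerFactory/preProcess call.

import java.io.File
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.channels.produce

fun process(input: File, output: File, workers: Int, tokenize: (String) -> String) = runBlocking {
    // producer: reads lines lazily, suspends when 1024 lines are buffered
    val lines = produce<String>(Dispatchers.IO, capacity = 1024) {
        input.useLines { seq -> for (line in seq) send(line) }
    }
    val results = Channel<String>(capacity = 1024)

    // workers: CPU-bound tokenizing on the Default pool
    val jobs = List(workers) {
        launch(Dispatchers.Default) {
            for (line in lines) results.send(tokenize(line) + "\n")
        }
    }
    launch { jobs.joinAll(); results.close() } // close the result channel once every worker is done

    // single writer keeps the output file open for the whole run
    output.bufferedWriter().use { out ->
        for (line in results) out.write(line) // note: output order is not preserved
    }
}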
b
well I'm processing text files that are >20G
g
I understand, but I mean the "processing" of a single line
otherwise the cost of multithreading and atomics in channels may be higher than, or close to, the cost of your processing
and you will end up with a solution that utilises all cores but is not efficient, and maybe not significantly faster, or even slower
b
yep that's exactly what happened to me 😄
g
and it makes sense
b
so maybe I should do processing by blocks
take x thousand lines
and have workers doing that
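A rough sketch of that block idea (the chunk size is an arbitrary guess, and tokenize again stands in for the real tokenizer call), assuming the same input and output files as above: each block of lines is fanned out to a handful of async tasks on the CPU pool, so the coordination cost is paid once per few thousand lines instead of once per line.

runBlocking {
    output.bufferedWriter().use { out ->
        input.useLines { lineSeq ->
            for (block in lineSeq.chunked(8 * 10_000)) {
                // split the block into 8 parts and tokenize them in parallel
                val parts = block.chunked(10_000).map { part ->
                    async(Dispatchers.Default) { part.map { tokenize(it) + "\n" } }
                }
                // write out block by block; order within a block is preserved
                parts.awaitAll().forEach { tokenizedPart ->
                    tokenizedPart.forEach { out.write(it) }
                }
            }
        }
    }
}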
g
yeah, if you have a lot of files it's not a problem, each core will just work on its own file
for cases when you have one huge file, splitting it up into multiple files would be a good solution
it's actually the usual problem of parallelising any work: parallel processing is efficient only in some use cases. You can find many talks on Google about this related to ForkJoinPool and parallel streams from Java 8, but the general approach also applies to coroutines
One more addition to my code above: it would probably be better to support coroutine cancellation in this code, so you can rewrite it like this:
withContext(Dispatchers.IO) {
    output.bufferedWriter().use { out ->
        input.bufferedReader().use { inp ->
            while (isActive) {
                val line = inp.readLine() ?: break
                out.write(line + "\n")
            }
        }
    }
}
take x thousand lines
Or even more, depending on line length, because you pay a significant price for IO operations, especially for opening/closing files
b
Or using parallel streams:
output.bufferedWriter().use { out ->
    input.bufferedReader().lines().parallel().forEach {
        out.write(tokenizerFactory.create(pp.preProcess(it)).tokens.joinToString(" ") + "\n")
    }
}
This is by far the fastest option, at least 10-15 times faster than anything else I tried. It uses more memory though… But I can spare some for that
g
did you compare it with a single-thread version that just splits the file?
also, are you sure that the order of lines is not important for your case? because forEach does not emit items in the original order
Anyway, though parallel streams are heavily optimized for parallelising this kind of job, I'm really not sure that it makes sense for this use case, only if the tokenizer is really heavy
but it's a relatively easy way to parallelise it; splitting the file and processing each part separately may be even more efficient, since you do not pay the price of context switches
also you can do something similar with coroutines, but it would probably be less efficient, just because parallel streams are optimized for such tasks
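A sketch of that split-first idea (the part count, file names and tokenize are all assumptions, not anything from this thread): write the big input into N part files round-robin in one pass, then let each part be processed by plain sequential code, one coroutine per file.

runBlocking {
    // 1) split: one pass over the big file, cheap line-by-line writes
    val partFiles = List(12) { File("part-$it.txt") }
    val writers = partFiles.map { it.bufferedWriter() }
    input.useLines { lines ->
        lines.forEachIndexed { i, line -> writers[i % writers.size].write(line + "\n") }
    }
    writers.forEach { it.close() }

    // 2) process: each part is handled sequentially by its own coroutine on the CPU pool
    partFiles.mapIndexed { idx, part ->
        launch(Dispatchers.Default) {
            File("out-$idx.txt").bufferedWriter().use { out ->
                part.useLines { lines -> lines.forEach { out.write(tokenize(it) + "\n") } }
            }
        }
    }.joinAll()
}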
b
order of lines is not important, no
tokenizer is pretty heavy
also parallel streams have an issue with batch size that increases over time
works well
but yes, splitting the file once it is generated may be an option in the future
(the input file)
m
I had to solve something like this recently. I ended up writing https://bitbucket.org/marshallpierce/task-throttle/src/master/ to allow me to have bounded concurrency so I could have some i/o work happening before the CPU was ready for it so the CPU was never idle.
I can go into more detail if you're still looking for further optimization
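Not that library, just a sketch of the general bounded-concurrency idea with kotlinx.coroutines' Semaphore (mapBounded is an illustrative name): at most limit transforms run at once, so I/O can stay a little ahead of the CPU without an unbounded number of tasks executing simultaneously.

import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Map a collection with at most `limit` transforms running concurrently.
// Note: every item still gets its own coroutine; only execution is throttled.
suspend fun <T, R> Iterable<T>.mapBounded(limit: Int, transform: suspend (T) -> R): List<R> =
    coroutineScope {
        val gate = Semaphore(limit)
        map { item ->
            async(Dispatchers.Default) { gate.withPermit { transform(item) } }
        }.awaitAll()
    }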
b
neat! thanks. That's something I tried to do, but I made it slower than anything else