# getting-started
a
Hi all, I have a list of strings and I need to batch it into a list of lists where each inner list has at most 10 items AND at most 100 chars in total. For example:
input: [ "this is item number one",
         "this is item number two",
         "this is item number three",
         "this is item number four",
         "this is item number five",
         "this is item number six"
       ]

output: [
         ["this is item number one", "this is item number two", "this is item number three", "this is item number four"],
         ["this is item number five", "this is item number six"]
       ]
I wrote this:
private fun List<String>.batch(): List<List<String>> {
    return fold(mutableListOf(mutableListOf<String>())) { accumulator, string ->
        val currentBucket = accumulator.last()
        val currentBucketCharsCount = currentBucket.sumOf { item -> item.length }
        val willBucketExceedMaximumCharCount = currentBucketCharsCount + string.length > 100
        val willBucketExceedMaximumSize = currentBucket.size + 1 > 10
        if (willBucketExceedMaximumCharCount || willBucketExceedMaximumSize) {
            accumulator.add(mutableListOf(string))
        } else {
            currentBucket.add(string)
        }
        accumulator
    }
}
but I have a feeling that it can be done in a cleaner/simpler way. Any ideas?
r
sequences sound great for readability here
fun List<String>.batch(): Sequence<List<String>> = sequence {
    var currentBucket = mutableListOf<String>()
    var currentCharCount = 0

    for (item in this@batch) {
        val willBucketExceedMaximumCharCount = currentCharCount + item.length > 100
        val willBucketExceedMaximumSize = currentBucket.size + 1 > 10

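        // Flush only a non-empty bucket, so a first item longer than 100 chars doesn't yield an empty list
        if (currentBucket.isNotEmpty() && (willBucketExceedMaximumCharCount || willBucketExceedMaximumSize)) {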
            yield(currentBucket)
            currentBucket = mutableListOf()
            currentCharCount = 0
        } 
    
        currentBucket.add(item)
        currentCharCount += item.length
    }

    if (currentBucket.isNotEmpty()) {
        yield(currentBucket)
    }
}
it's also an easy optimization: keeping a running char count means you don't need to iterate the last bucket every time
a
Oh I didn’t think of that! Thanks for the tip. I need it to be a list at the end so that I can map over it and call a suspend func. `Sequence.map` doesn’t work with suspend functions. So although using a sequence here made the performance better for me (3ms vs 6ms), when I call `.toList()` afterwards the runtime actually goes up to 9ms.
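If the result has to be a `List` anyway, one option is to skip the `Sequence` and collect the buckets directly while keeping the running char count. A minimal sketch under the same 10-item / 100-char limits; `batchToList`, `maxItems`, and `maxChars` are names made up for illustration, and no timings are claimed for it:

```kotlin
// Hypothetical helper: same batching rule as above, but returns a List directly.
fun List<String>.batchToList(maxItems: Int = 10, maxChars: Int = 100): List<List<String>> {
    val batches = mutableListOf<List<String>>()
    var currentBucket = mutableListOf<String>()
    var currentCharCount = 0

    for (item in this) {
        val tooManyChars = currentCharCount + item.length > maxChars
        val tooManyItems = currentBucket.size + 1 > maxItems

        // Close the current bucket only if it already holds something,
        // so an oversized single item still gets a bucket of its own.
        if (currentBucket.isNotEmpty() && (tooManyChars || tooManyItems)) {
            batches.add(currentBucket)
            currentBucket = mutableListOf()
            currentCharCount = 0
        }

        currentBucket.add(item)
        currentCharCount += item.length
    }

    if (currentBucket.isNotEmpty()) {
        batches.add(currentBucket)
    }
    return batches
}
```

Calling `batchToList()` on the six example strings from the first message should give the same two buckets (four items, then two), and an empty input gives an empty list.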
m
I wish ".windowed()" had a lambda step size, so you could do something like this:
list.asSequence().windowed(10, partialWindows = true) { window ->
        window.runningFold(0) { acc, s -> acc + s.length }.indexOfFirst { it > 100 }.let {
            when (it) {
                -1 -> 10
                in 1..10 -> it
                else -> error("can't decide step size")
            }
        }
    }
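Since `windowed` has no such parameter, a similar "window size decided by a lambda" effect can be approximated with a small custom extension. This is only a sketch: `chunkedWhile` and `batchWithChunkedWhile` are hypothetical names, not part of the Kotlin standard library, and the predicate below re-sums the chunk on each call, so it trades the running-count optimization for brevity.

```kotlin
// Hypothetical helper (not in the stdlib): keeps appending to the current chunk
// while canAppend(chunk, next) is true, otherwise starts a new chunk.
fun <T> Sequence<T>.chunkedWhile(
    canAppend: (chunk: List<T>, next: T) -> Boolean
): Sequence<List<T>> = sequence {
    var chunk = mutableListOf<T>()
    for (item in this@chunkedWhile) {
        // Never emit an empty chunk: the first item of a chunk is always accepted.
        if (chunk.isNotEmpty() && !canAppend(chunk, item)) {
            yield(chunk)
            chunk = mutableListOf()
        }
        chunk.add(item)
    }
    if (chunk.isNotEmpty()) yield(chunk)
}

// The batching rule from this thread, expressed as a predicate.
fun List<String>.batchWithChunkedWhile(): List<List<String>> =
    asSequence()
        .chunkedWhile { chunk, next ->
            chunk.size < 10 && chunk.sumOf { it.length } + next.length <= 100
        }
        .toList()
```

The predicate sees the chunk built so far plus the candidate item, which is roughly what a lambda-driven step size would have provided.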
r
You could use `Flow` instead of `Sequence` if you have suspending stuff to do
a
An example of using flow would be something like this, right?
fun List<String>.batch(): Flow<List<String>> = flow {
    var currentBucket = mutableListOf<String>()
    var currentCharCount = 0

    for (item in this@batch) {
        val willBucketExceedMaximumCharCount = currentCharCount + item.length > 100
        val willBucketExceedMaximumSize = currentBucket.size + 1 > 10

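        // Flush only a non-empty bucket, so a first item longer than 100 chars doesn't emit an empty list
        if (currentBucket.isNotEmpty() && (willBucketExceedMaximumCharCount || willBucketExceedMaximumSize)) {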
            emit(currentBucket)
            currentBucket = mutableListOf()
            currentCharCount = 0
        } 
    
        currentBucket.add(item)
        currentCharCount += item.length
    }

    if (currentBucket.isNotEmpty()) {
        emit(currentBucket)
    }
}
If yes, I had this before, when the `batch` function was returning a `List<List<String>>`:
val requests = request.contentsList.batch()
val responses = requests.map { batch ->
    doAsyncJob(batch)
}.awaitAll()
How can I do this now after changing the `batch` function to return a `Flow<List<String>>`? Sorry for so many questions.
I can do this, but I’m not sure if it’s the best way to do it:
val requests = request.contentsList.batch()
val responses = requests.map { batch ->
    doAsyncJob(batch)
}.toList().awaitAll()
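For reference, the `.map { ... }.toList().awaitAll()` chain can be sketched end to end as below. It assumes kotlinx.coroutines is on the classpath and that `doAsyncJob` returns a `Deferred`, as the `awaitAll()` call in the thread implies. The body of `doAsyncJob` here (summing lengths after a short delay) and the name `processBatches` are invented stand-ins for illustration, not the real job.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.map
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for the thread's doAsyncJob: starts work and returns a Deferred.
fun CoroutineScope.doAsyncJob(batch: List<String>): Deferred<Int> = async {
    delay(10) // pretend to do something suspending, e.g. a network call
    batch.sumOf { it.length }
}

// Collects the flow of batches, starts one async job per batch, then awaits them all.
suspend fun processBatches(batches: Flow<List<String>>): List<Int> = coroutineScope {
    batches
        .map { batch -> doAsyncJob(batch) } // Flow.map accepts a suspending lambda
        .toList()                           // suspends until the flow has completed
        .awaitAll()                         // waits for every job; results keep batch order
}
```

Because `doAsyncJob` only starts the work and returns right away, `map` does not wait for one batch to finish before starting the next; the waiting happens once, in `awaitAll()`. From a `suspend` function or inside `runBlocking`, the call would look like `processBatches(request.contentsList.batch())`.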