# announcements
n
Hi everyone. Is it possible to sum average values of several Int lists in more beautiful way, to reduce code duplication?
val red = data.red.map { it.toUByte().toInt() }.average()
val green = data.green.map { it.toUByte().toInt() }.average()
val blue = data.blue.map { it.toUByte().toInt() }.average()
val luminance = red + green + blue
j
Haven't tested but something like this perhaps?
val luminance = listOf(data.red, data.green, data.blue).sumOf { cs ->
    cs.map { 
        it.toUByte().toInt()
    }.average()
}
m
Or how about this:
val luminance = with(data) {
    sequenceOf(red, green, blue)
        .map { it.map(SomeClass::toUByte).map(UByte::toInt) }
        .map { it.average() }
        .sum()
}
New to Kotlin myself … didn’t know the sumOf function. Nice!
🍻 1
n
Thanks to both of you!
n
one thing to note here: if your data is very large, all 3 approaches will create new lists the same size as the original pixel data
👍 1
n
Is it possible to avoid it?
n
You can avoid this in e.g. Jonathan's approach by changing
cs.map to cs.asSequence().map
now the map becomes lazy, and when you take the average it should just do it via accumulation and never store a new list (assuming a sane approach in the stdlib)
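An untested sketch of what that lazy variant might look like (averageLuminance is just a name I made up, and I'm assuming the channels are List<Int> holding raw byte values, as the original .toUByte() call suggests):

```kotlin
// asSequence() makes the map lazy, so no intermediate list the size
// of the pixel data is allocated; average() accumulates as it goes.
fun averageLuminance(red: List<Int>, green: List<Int>, blue: List<Int>): Double =
    listOf(red, green, blue).sumOf { cs ->
        cs.asSequence()
            .map { it.toUByte().toInt() }  // reinterpret signed byte values as 0..255
            .average()
    }
```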
n
Wow, sounds great, thanks!
n
np
m
@Nir unfortunately sequences don’t support average calculations, do they?
Yes, they do. Neat! 🙂
It would be nice to have an averageOf, but I suppose we cannot have all the agg operations built in 🙂
Something like
listOf("a", "xx").averageOf { it.length }
Makes sense as an extension function, to hide the asSequence call.
n
Yes, that would be a nice addition
thankfully it's easy to write on our own
extension functions are pretty fantastic 🙂
m
Yes. This is how it would look, following the sumOf implementation. Kind of messy, but optimal (inlined) performance.
inline fun <T> Iterable<T>.averageOf(selector: (T) -> Int): Double {
    var sum: Long = 0L
    var count = 0
    for (element in this) {
        sum += selector(element)
        count++
    }
    return sum.toDouble() / count
}

listOf("a", "xx").averageOf { it.length }
n
@Michael Böiers there are actually a bunch of issues with that, FYI
sorry, not a bunch, just one 🙂
you can have overflow issues very easily
You most likely want to use an algorithm where you continuously update the average
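For illustration, a running-mean update might look like this (my own sketch, not stdlib code):

```kotlin
// Welford-style running mean: instead of accumulating a raw sum,
// keep the current estimate and nudge it toward each new value.
// The estimate stays within the range of the data, so it cannot
// overflow the way a raw Long/Int sum can.
fun Iterable<Int>.runningAverage(): Double {
    var mean = 0.0
    var count = 0
    for (x in this) {
        count++
        mean += (x - mean) / count
    }
    require(count > 0) { "average of an empty collection is undefined" }
    return mean
}
```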
m
The obvious issue is dealing with an empty iterable 🙂
Hm … would a sum of ints ever overflow a long?
n
the empty iterable is a fair point, but presumably it could be like the reduce (I think? reduce, not fold) family algorithms where it's just an error to use with an empty container
m
Here’s the official average implementation, I think you had a fair point regarding the overflow. 🙂
n
it's possible that they would not I suppose, I'd have to think about it, but even then, it would only work for the very specific case of Int
there might still be other issues there
that's actually surprisingly naive
m
How so? I don’t think the average can be calculated without storing sum and count.
n
yeah, you use an update algorithm
you basically have an estimate of the mean that you update. And even better than that, there are algorithms like this that further try to compensate for numeric issues: https://en.wikipedia.org/wiki/Kahan_summation_algorithm
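A minimal Kahan-summation sketch in Kotlin (my own, just to show the idea from that article):

```kotlin
// Kahan (compensated) summation: carry the rounding error of each
// addition in a separate compensation term and feed it back in, so
// small addends are not swallowed by a large running sum.
fun kahanSum(values: DoubleArray): Double {
    var sum = 0.0
    var c = 0.0            // running compensation for lost low-order bits
    for (v in values) {
        val y = v - c      // apply the compensation
        val t = sum + y    // big + small: low-order bits of y may be lost
        c = (t - sum) - y  // algebraically 0; recovers what was rounded away
        sum = t
    }
    return sum
}
```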
j
Not sure I’d expect a standard average function to do something fancy at a cost in accuracy, though. 🤔 Don’t think the jdk IntStream implementation uses update? I might be wrong.
n
it's the other way around, it will be more accurate
j
You mean in the overflow case?
n
there are all kinds of cases
not just overflow
there's a whole body of prior art on these things, and usually the most naive approach is not used
j
Even for Ints? I wouldn't expect an issue in most cases (except for overflow). Regardless, I agree that the current implementation seems a bit on the sparse side. The one for Double is exactly the same... Also, shouldn't sum overflow way before count?
Edit: right, sum is Double in the current implementation
(JDK IntPipeline doesn't seem to use compensation, DoublePipeline however does for sum. No overflow checks there at all)
🤷‍♂️ Interesting nonetheless. Kahan summation added to reading list. 👍
👍 1
n
yeah, it's quite the rabbit hole
you can get into the weeds, but I guess one thing that's fairly clear: if you simply feed in the same number over and over and over, you'd want to get that number back as the average, no matter how long the list is. That seems like a reasonable requirement (to me)
and just adding things up and dividing can fail that very easily
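To make that concrete, a quick sketch (my own illustration, function names made up):

```kotlin
// Naively summing the same Double over and over accumulates rounding
// error, so the naive average of N copies of 0.1 drifts away from 0.1.
fun naiveAverage(value: Double, n: Int): Double {
    var sum = 0.0
    repeat(n) { sum += value }
    return sum / n
}

// A running-mean update returns 0.1 exactly: once the estimate equals
// the data, (value - mean) is exactly zero and the mean never moves.
fun runningMean(value: Double, n: Int): Double {
    var mean = 0.0
    for (i in 1..n) mean += (value - mean) / i
    return mean
}
```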
m
I’m not sure. I hear what you’re saying, but another way to look at this is: “I would want a function like average() to do the naive computation that everybody knows, which is summing all the values and dividing by the count. Then we could have other functions that are named by algorithm, so that clients know what they can expect.”
j
I think the point that Nir made is that if you are summing up Doubles the way the stdlib is doing you will end up with unexpected (read imprecise) results for larger data sets. The java stdlib uses long when summing Ints, and compensates for imprecision when summing up Doubles.
In other words: the naive approach doesn't work for floats.
Or at least not very well. 🙂
n
Indeed. I think that a function called average should be a "reasonable" implementation of computing the average. That is, it should do a pretty good job, and perform pretty well. Looking at the prior art, you can hugely improve results at minor performance cost. So I don't see an intrinsic argument for the naive approach.
It's a bit like arguing that sort should be bubble sort, or insertion sort. These are, after all, the simplest algorithms.
In practice it's pretty much the opposite, most standard library sorts tend to use more complex algorithms than any of the basic ones, and/or hybrid algorithms.
They don't use the absolute craziest algorithms that exist, perhaps because of implementation complexity, and because you start to enter a regime where there are major trade-offs between approaches. But they do use reasonably sophisticated approaches.