https://kotlinlang.org logo
Title
p

pakoito

03/25/2020, 3:34 AM
So I want to minimize both the necessary calculations as well as the traversals, creating no intermediate data structures.
Sequence it is then, or plain old imperative code
Sequence is the only data structure that doesn’t leave intermediates. Still it’s limited in its API IIRC
if performance is critical, then go for imperative code and loops. No need to be dogmatic about it. Local mutability is okay.
👍 1
j

Jakub Pi

03/26/2020, 3:52 AM
class matcher(private val similarity: Similarity, private vararg val f : BiPred) {
    fun bind() : (Data) -> (Data) -> EnumSet<Similarity> =
        { data ->
            { otherdata -> f
                .map{ partial(data, it) }
                .fold(true) { acc, func -> acc && func(otherdata)} //TODO can short-circuit here
                .let { if (it) EnumSet.of(similarity) else EnumSet.noneOf(Similarity::class.java)}}
        }

    companion object {
        fun bind (matchers : List<matcher>) : (Data) -> (Data) -> EnumSet<Similarity> {
            return when (matchers.size) {
                0 -> { _ -> { EnumSet.noneOf(Similarity::class.java) } }
                1 -> matchers.first().bind()
                else ->
                    { d: Data ->
                        { d2: Data ->
                            matchers
                                .map { matcher -> matcher.bind() }
                                .fold(EnumSet.noneOf(Similarity::class.java)) { acc, f ->
                                    acc.apply { addAll(f(d)(d2)) }
                                }
                        }
                    }
            }
        }
    }
}
Ended up with this, which gives me the most flexibility. I can decide later how to invoke it. Is there a better name than bind? That probably has a very specific meaning in FP
Also, I know what you meant by short circuiting the fold (exit early) but I think the && with a false on the left should already spare the function call, so worst case I evaluate 4-6 more boolean operations.
What I tried to run the whole thing as a sequence I got stackoverflow errors since everything was lazy until my entire data set was in memory and I actually tried to evaluate it. It all ended up being Sequences nested about 4000 deep with an emptySequence on the very inside for 95% of the elements 😕
p

pakoito

03/26/2020, 12:18 PM
fun bind() :
this is a specific version of a general pattern, which is currying
if you did
fun bind(d: Data, d1: Data) = TODO()

(::bind).curried()
you'll do the same
👍 1
as you have it today, you might as well make it a
val
instead
in the JVM having curried functions as values takes space on the heap, which is why keeping them in code and take references when used is preferred. Otherwise you're allocating on class load when it may not even be used.
j

Jakub Pi

03/26/2020, 2:47 PM
Just realised arrow also has
.uncurried()
which will convert my bind into a regular function. Don't quite understand what you mean by your last point. My curried function is a return value, so nothing should be allocated unless I invoke
bind()
which is just a regular function. I can see how having a curried function results in an extra allocation due to an intermediate object being returned on invocation with the first parameter. But what do you mean by making it a val? you mean instead of using fun? Looking at it another way, my
matcher
class is really a factory for comparison operations, so I could rename
bind()
to
comparator()
. Would it be more efficient to return
(Data, Data) -> EnumSet<Similarity>
to avoid currying altogether?
p

pakoito

03/26/2020, 2:52 PM
so my point was taking your example to the extreme
j

Jakub Pi

03/26/2020, 2:53 PM
So I could rewrite it to take two parameters via
{ d, d2 ->
then assign the return value to a
val
p

pakoito

03/26/2020, 2:53 PM
in OCaml, Haskell or Scala I could do
let myFun x y = x + y
let plusOne x = myFun 3
the compiler knows and understand partial application, the code lives in the
.code
part of the binary, so to speak
in Java or Kotlin if I do
val myFun = { x -> { y -> x + y } }
it allocates an object on the heap for the function, which when invoked calls into the
.code
part of it
in scala you can do (pseudocode I don’t remember the syntax)
class Bla {
  def x = 1
  def bla y = y + x 
}
and when you create a new Bla it’s only one object allocated
class Bla {
  val x = 1
  val bla = { y -> x }
}
that allocates two objects per instance
which means that in Scala if you allocate 100k Blas, you have 100k objects on the heap. In Kotlin you’d put 200k
which is why we don’t write functions on the
val
style
j

Jakub Pi

03/26/2020, 2:57 PM
Yep gotcha. I think it's unavoidable in my case. I really do have a factory of comparators. I coud probably do something with macros in clojure, but not sure how I can keep my comparators from being allocated dynamically in Kotlin. However, I should only have 6-8 comparators created and they will only be created by my factory as necessary.
👍🏼 1
p

pakoito

03/26/2020, 2:57 PM
instead, we write them as
fun
and take a function reference when it’s needed
cool 😄
j

Jakub Pi

03/26/2020, 2:58 PM
Thanks for all your help, learned a lot
p

pakoito

03/26/2020, 3:02 PM
What I tried to run the whole thing as a sequence I got stackoverflow errors since everything was lazy
loool I missed this
j

Jakub Pi

03/26/2020, 3:08 PM
Hehe. Yep, I don't think sequences was the right way to go. The cost of running the comparisons varies drastically. What I really want is a scheduler and coroutines.
p

pakoito

03/26/2020, 4:10 PM
why not using arrow.IO
it's stack safe, and IIRC couroutines aren't unless you write them tailrec