> So I want to minimize both the necessary calc...
# functional
p
So I want to minimize both the necessary calculations as well as the traversals, creating no intermediate data structures.
Sequence it is then, or plain old imperative code
Sequence is the only data structure that doesn’t leave intermediates. Still it’s limited in its API IIRC
if performance is critical, then go for imperative code and loops. No need to be dogmatic about it. Local mutability is okay.
👍 1
j
Copy code
class matcher(private val similarity: Similarity, private vararg val f : BiPred) {
    fun bind() : (Data) -> (Data) -> EnumSet<Similarity> =
        { data ->
            { otherdata -> f
                .map{ partial(data, it) }
                .fold(true) { acc, func -> acc && func(otherdata)} //TODO can short-circuit here
                .let { if (it) EnumSet.of(similarity) else EnumSet.noneOf(Similarity::class.java)}}
        }

    companion object {
        fun bind (matchers : List<matcher>) : (Data) -> (Data) -> EnumSet<Similarity> {
            return when (matchers.size) {
                0 -> { _ -> { EnumSet.noneOf(Similarity::class.java) } }
                1 -> matchers.first().bind()
                else ->
                    { d: Data ->
                        { d2: Data ->
                            matchers
                                .map { matcher -> matcher.bind() }
                                .fold(EnumSet.noneOf(Similarity::class.java)) { acc, f ->
                                    acc.apply { addAll(f(d)(d2)) }
                                }
                        }
                    }
            }
        }
    }
}
Ended up with this, which gives me the most flexibility. I can decide later how to invoke it. Is there a better name than bind? That probably has a very specific meaning in FP
Also, I know what you meant by short circuiting the fold (exit early) but I think the && with a false on the left should already spare the function call, so worst case I evaluate 4-6 more boolean operations.
What I tried to run the whole thing as a sequence I got stackoverflow errors since everything was lazy until my entire data set was in memory and I actually tried to evaluate it. It all ended up being Sequences nested about 4000 deep with an emptySequence on the very inside for 95% of the elements 😕
p
fun bind() :
this is a specific version of a general pattern, which is currying
if you did
Copy code
fun bind(d: Data, d1: Data) = TODO()

(::bind).curried()
you'll do the same
👍 1
as you have it today, you might as well make it a
val
instead
in the JVM having curried functions as values takes space on the heap, which is why keeping them in code and take references when used is preferred. Otherwise you're allocating on class load when it may not even be used.
j
Just realised arrow also has
.uncurried()
which will convert my bind into a regular function. Don't quite understand what you mean by your last point. My curried function is a return value, so nothing should be allocated unless I invoke
bind()
which is just a regular function. I can see how having a curried function results in an extra allocation due to an intermediate object being returned on invocation with the first parameter. But what do you mean by making it a val? you mean instead of using fun? Looking at it another way, my
matcher
class is really a factory for comparison operations, so I could rename
bind()
to
comparator()
. Would it be more efficient to return
(Data, Data) -> EnumSet<Similarity>
to avoid currying altogether?
p
so my point was taking your example to the extreme
j
So I could rewrite it to take two parameters via
{ d, d2 ->
then assign the return value to a
val
p
in OCaml, Haskell or Scala I could do
Copy code
let myFun x y = x + y
let plusOne x = myFun 3
the compiler knows and understand partial application, the code lives in the
.code
part of the binary, so to speak
in Java or Kotlin if I do
Copy code
val myFun = { x -> { y -> x + y } }
it allocates an object on the heap for the function, which when invoked calls into the
.code
part of it
in scala you can do (pseudocode I don’t remember the syntax)
Copy code
class Bla {
  def x = 1
  def bla y = y + x 
}
and when you create a new Bla it’s only one object allocated
Copy code
class Bla {
  val x = 1
  val bla = { y -> x }
}
that allocates two objects per instance
which means that in Scala if you allocate 100k Blas, you have 100k objects on the heap. In Kotlin you’d put 200k
which is why we don’t write functions on the
val
style
j
Yep gotcha. I think it's unavoidable in my case. I really do have a factory of comparators. I coud probably do something with macros in clojure, but not sure how I can keep my comparators from being allocated dynamically in Kotlin. However, I should only have 6-8 comparators created and they will only be created by my factory as necessary.
👍🏼 1
p
instead, we write them as
fun
and take a function reference when it’s needed
cool 😄
j
Thanks for all your help, learned a lot
p
What I tried to run the whole thing as a sequence I got stackoverflow errors since everything was lazy
loool I missed this
j
Hehe. Yep, I don't think sequences was the right way to go. The cost of running the comparisons varies drastically. What I really want is a scheduler and coroutines.
p
why not using arrow.IO
it's stack safe, and IIRC couroutines aren't unless you write them tailrec