gt So I want to minimize both the necessary calculations as kotlinlang #functional

> So I want to minimize both the necessary calc...

pakoito

03/25/2020, 3:34 AM

So I want to minimize both the necessary calculations as well as the traversals, creating no intermediate data structures.

Sequence it is then, or plain old imperative code

pakoito

03/25/2020, 3:37 AM

Sequence is the only data structure that doesn’t leave intermediates. Still it’s limited in its API IIRC

pakoito

03/25/2020, 3:37 AM

if performance is critical, then go for imperative code and loops. No need to be dogmatic about it. Local mutability is okay.

👍 1

Jakub Pi

03/26/2020, 3:52 AM

Copy code

class matcher(private val similarity: Similarity, private vararg val f : BiPred) {
    fun bind() : (Data) -> (Data) -> EnumSet<Similarity> =
        { data ->
            { otherdata -> f
                .map{ partial(data, it) }
                .fold(true) { acc, func -> acc && func(otherdata)} //TODO can short-circuit here
                .let { if (it) EnumSet.of(similarity) else EnumSet.noneOf(Similarity::class.java)}}
        }

    companion object {
        fun bind (matchers : List<matcher>) : (Data) -> (Data) -> EnumSet<Similarity> {
            return when (matchers.size) {
                0 -> { _ -> { EnumSet.noneOf(Similarity::class.java) } }
                1 -> matchers.first().bind()
                else ->
                    { d: Data ->
                        { d2: Data ->
                            matchers
                                .map { matcher -> matcher.bind() }
                                .fold(EnumSet.noneOf(Similarity::class.java)) { acc, f ->
                                    acc.apply { addAll(f(d)(d2)) }
                                }
                        }
                    }
            }
        }
    }
}

Ended up with this, which gives me the most flexibility. I can decide later how to invoke it. Is there a better name than bind? That probably has a very specific meaning in FP

Jakub Pi

03/26/2020, 3:54 AM

Also, I know what you meant by short circuiting the fold (exit early) but I think the && with a false on the left should already spare the function call, so worst case I evaluate 4-6 more boolean operations.

Jakub Pi

03/26/2020, 4:00 AM

What I tried to run the whole thing as a sequence I got stackoverflow errors since everything was lazy until my entire data set was in memory and I actually tried to evaluate it. It all ended up being Sequences nested about 4000 deep with an emptySequence on the very inside for 95% of the elements 😕

pakoito

03/26/2020, 12:18 PM

fun bind() :

this is a specific version of a general pattern, which is currying

pakoito

03/26/2020, 12:18 PM

if you did

pakoito

03/26/2020, 12:19 PM

Copy code

fun bind(d: Data, d1: Data) = TODO()

(::bind).curried()

you'll do the same

👍 1

pakoito

03/26/2020, 12:19 PM

as you have it today, you might as well make it a

val

instead

pakoito

03/26/2020, 12:21 PM

in the JVM having curried functions as values takes space on the heap, which is why keeping them in code and take references when used is preferred. Otherwise you're allocating on class load when it may not even be used.

Jakub Pi

03/26/2020, 2:47 PM

Just realised arrow also has

.uncurried()

which will convert my bind into a regular function. Don't quite understand what you mean by your last point. My curried function is a return value, so nothing should be allocated unless I invoke

bind()

which is just a regular function. I can see how having a curried function results in an extra allocation due to an intermediate object being returned on invocation with the first parameter. But what do you mean by making it a val? you mean instead of using fun? Looking at it another way, my

matcher

class is really a factory for comparison operations, so I could rename

bind()

comparator()

. Would it be more efficient to return

(Data, Data) -> EnumSet<Similarity>

to avoid currying altogether?

pakoito

03/26/2020, 2:52 PM

so my point was taking your example to the extreme

Jakub Pi

03/26/2020, 2:53 PM

So I could rewrite it to take two parameters via

{ d, d2 ->

then assign the return value to a

val

pakoito

03/26/2020, 2:53 PM

in OCaml, Haskell or Scala I could do

Copy code

let myFun x y = x + y
let plusOne x = myFun 3

pakoito

03/26/2020, 2:53 PM

the compiler knows and understand partial application, the code lives in the

.code

part of the binary, so to speak

pakoito

03/26/2020, 2:53 PM

in Java or Kotlin if I do

pakoito

03/26/2020, 2:54 PM

Copy code

val myFun = { x -> { y -> x + y } }

pakoito

03/26/2020, 2:54 PM

it allocates an object on the heap for the function, which when invoked calls into the

.code

part of it

pakoito

03/26/2020, 2:55 PM

in scala you can do (pseudocode I don’t remember the syntax)

pakoito

03/26/2020, 2:55 PM

Copy code

class Bla {
  def x = 1
  def bla y = y + x 
}

and when you create a new Bla it’s only one object allocated

pakoito

03/26/2020, 2:56 PM

Copy code

class Bla {
  val x = 1
  val bla = { y -> x }
}

that allocates two objects per instance

pakoito

03/26/2020, 2:57 PM

which means that in Scala if you allocate 100k Blas, you have 100k objects on the heap. In Kotlin you’d put 200k

pakoito

03/26/2020, 2:57 PM

which is why we don’t write functions on the

val

style

Jakub Pi

03/26/2020, 2:57 PM

Yep gotcha. I think it's unavoidable in my case. I really do have a factory of comparators. I coud probably do something with macros in clojure, but not sure how I can keep my comparators from being allocated dynamically in Kotlin. However, I should only have 6-8 comparators created and they will only be created by my factory as necessary.

👍🏼 1

pakoito

03/26/2020, 2:57 PM

instead, we write them as

fun

and take a function reference when it’s needed

pakoito

03/26/2020, 2:58 PM

cool 😄

Jakub Pi

03/26/2020, 2:58 PM

Thanks for all your help, learned a lot

pakoito

03/26/2020, 3:02 PM

What I tried to run the whole thing as a sequence I got stackoverflow errors since everything was lazy

loool I missed this

Jakub Pi

03/26/2020, 3:08 PM

Hehe. Yep, I don't think sequences was the right way to go. The cost of running the comparisons varies drastically. What I really want is a scheduler and coroutines.

pakoito

03/26/2020, 4:10 PM

why not using arrow.IO

pakoito

03/26/2020, 4:10 PM

it's stack safe, and IIRC couroutines aren't unless you write them tailrec

2 Views

Open in Slack

Previous Next