# arrow
k
I am building an app maximizing my Arrow usage to try and learn how to use it. I start with an
IO { File("/path/to…").walk() }
and apply IO transforms to the sequence to end up with an
IO<Sequence<IO<Foo>>>
and then use
flatMap
and
sequence
to transform it to
IO<Sequence<Foo>>
. When I
unsafeRunSync
the result, I get a
StackOverflowError
, presumably from the part where I “lift” the nested IO out of the Sequence (has to iterate over all the elements in the sequence I think). Given that you’re ideally going to have a single place at the “edge” to run the IO, is there a better way I should structure it? Assume I’m starting with
IO<Sequence<IO<Foo>>>
and can do whatever to it. Is
myIO.unsafeRunSync().forEach { it.unsafeRunSync() }
the way I should go about doing it at the edge? Basically run the top-level IO to get the
Sequence
and then start taking values, which are themselves
IO
, and run them? Here is a simplified version of what I have now:
fun buildFoo(file: File): IO<String> = IO { "test" }
val program: IO<Sequence<File>> = IO { File("/Users/kylegoetz/Desktop").walk() }
val transformedProgram: IO<Sequence<IO<String>>> = program.map { files ->
    files.map { file -> buildFoo(file) }
}
val noNestedIO: IO<Sequence<String>> = transformedProgram.flatMap {
    it.filterNotNull().k().sequence(IO.applicative()).fix()
}

noNestedIO.unsafeRunSync() // StackOverflowError
What I’m really doing in that inner IO is applying custom transforms to the file data based on filetype to get a fingerprint for dupe detection.
j
Can you try forcing the entire sequence the unsafe way with
filterNotNull().k().map { it.unsafeRunSync() }.toList()
inside
transformedProgram.flatMap { ... }
(just wrap it in
IO { ... }
to get the types right)? Just as a quick check to see where the problem actually is: if that does not overflow, it's likely
sequence
; if it does, it's somewhere else... Sequences behave pretty badly with
traverse
in terms of laziness, so it might be there 😕
k
OK, I’ll give that a try. Just a heads up, my program ran all night for who knows how long while I was asleep before finally overflowing, so I probably can’t get back to you soon with a result. Will have to let it run for a long time.
j
Ah ok, so how long is the sequence? I recently had trouble with folding/rebuilding a really long sequence, which is what traverse/sequence does
k
It’s insanely long. I’m not sure how long. The sequence length is however many files there are, recursively, in my Documents folder, which includes a lot of Node.js projects with an outlandish number of nested files in the dependencies. Tens of thousands, possibly more?
My goal is to learn how to use FP appropriately. This problem is just my first attempt at being strict on FP stuff. Trying to figure out how to handle
IO<Sequence<IO<CostlyOperation>>>
at the edge, like
top.unsafeRunSync().forEach { it.unsafeRunSync() }
j
Basically
sequence() = traverse(::identity) ≈ foldRight(Applicative.just(emptySequence())) { v, acc -> v.ap(acc.map { seq -> { el -> sequenceOf(el) + seq } }) }
. The key point here is that in
foldRight
the sequence is rebuilt and the + operator on sequences is not stacksafe. I'll look through the sources to see if that is actually correct, but I think that may be it 😕
traverse
and
sequence
are the right choice, they just don't play super nicely with
Sequence
k
I’m not so experienced with sequences and lazily evaluated stuff, but my mental model of how sequence must work in order to “pull” the nested IO out of the Sequence is that the entire Sequence would have to be evaluated and then re-wrapped in an IO
j
^^ That is correct. However it is done step by step using
IO.ap
sequence
is an alias for
traverse(::identity)
which for sequence you can see here: https://github.com/arrow-kt/arrow/blob/master/modules/core/arrow-core-data/src/main/kotlin/arrow/core/SequenceK.kt#L32 It is implemented as a fold which folds over the content, combines it with
ap
and puts the sequence back together after combining
And it does use
Sequence.plus
which is not stacksafe, which is a fault on kotlin's side I think. Basically
superLongSequence().fold(emptySequence()) { acc, v -> sequenceOf(v) + acc }.first() == Stackoverflow
. At least when I last tested it
Here is a small way to reproduce it:
generateSequence(0) { it + 1 }.take(100000).fold(emptySequence<Int>()) { acc, v ->
    acc + v // uses the Sequence<T>.plus(element: T) overload
}.first()
This sucks and should not happen; btw this is using Kotlin built-ins only
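For contrast, a stack-safe way to do the same accumulation (a stdlib-only sketch, not what Arrow does internally) is to build a strict collection first and only view it as a sequence at the end:

```kotlin
// Accumulate into a strict list instead of nesting wrapper sequences,
// then expose it as a Sequence. No deep iterator chain, so .first() is cheap.
val first = generateSequence(0) { it + 1 }
    .take(100_000)
    .fold(mutableListOf<Int>()) { acc, v -> acc.apply { add(v) } }
    .asSequence()
    .first()
// first == 0, no StackOverflowError
```

This of course gives up laziness of the intermediate result, which is exactly the trade-off being discussed here.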
k
This will be cooler once I get smart enough to understand this deep level 🙂
j
Btw I am not even sure this is the cause. The way
foldRight
and
traverse
are implemented for
Sequences
is just not great. There is also no easy way to improve it afaik. So tbh I think going for
transformedProgram.flatMap { seq -> IO { seq.filterNotNull().map { io -> io.unsafeRunSync() } } }
isn't too bad; you just lose all the concurrency guarantees and benefits Arrow gives when actually using
IO
the intended way
k
So you’re saying JB’s implementation of the Sequence.plus operator fun is not stack safe but there’s some other option out there that would be, and it’s a fixable issue that should have a bug filed, or it’s something unfixable and inherent to how one would do Sequence.plus ?
j
JB's plus operator is definitely not stack-safe (at least not this overload:
operator fun <T> Sequence<T>.plus(element: T): Sequence<T>
). The same code works fine with
operator fun <T> Sequence<T>.plus(elements: Sequence<T>): Sequence<T>
btw 😄 Fun times. We are using the latter in
traverse
which is why I think there might be something else going on. This is all very weird, you are not the only one lost in this 🙈
🙂 1
k
OK. But if I understand FP theory from Arrow docs correctly, all the `run`s should take place at the “edge” in one place, so I should really be doing
fun main() {
    val program: IO<Sequence<IO<Any>>> = TODO()
    program.unsafeRunSync().forEach { it.unsafeRunSync() }
}

right?
so all my unsafe runs are in the same place?
rather than trying to move that IO out of the Seq and then flatMapping it away wherever the top-level IO is constructed in my code
j
Well, if you can, sure. However the best way is to only have one call to any of the
unsafe
methods. But this can't really be avoided when working with
IO
in sequences because of these issues. If only stackoverflows weren't so cryptic
k
The way my program is structured, for unit testing everything, is that I have an App class that takes a bunch of use cases returning IOs. App.execute returns all the calculations wrapped in a single IO. This is the
IO<Sequence<IO<Whatever>>>
. So I can unit test the App.execute by injecting mocks. In my main() that calls App(…).execute() I have full control. Regarding `unsafe`: is it called that because the contents should be considered impure/have side effects, or is it called that because it is not stack-safe inside?
j
The former. Calling an unsafe method performs side-effects.
IO<A>
is always pure because it does not perform side-effects, but when calling any of the
unsafe
methods that changes.
k
OK, I understand that part. IO<A> is basically a value representing a function that will be impure, but IO<A> itself is not because it’s a value not an actual function-being-run. To use non-technical language.
💯 3
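That mental model can be sketched in a few lines of plain Kotlin (MiniIO is a hypothetical stand-in for illustration, not Arrow's actual IO implementation):

```kotlin
// A value that *describes* an effect; nothing runs until unsafeRunSync().
class MiniIO<A>(private val thunk: () -> A) {
    fun <B> map(f: (A) -> B): MiniIO<B> = MiniIO { f(thunk()) }
    fun <B> flatMap(f: (A) -> MiniIO<B>): MiniIO<B> = MiniIO { f(thunk()).unsafeRunSync() }
    fun unsafeRunSync(): A = thunk() // the only place effects actually happen
}

var effects = 0
val io = MiniIO { effects++; "hello" }.map { it.uppercase() }
// effects == 0 here: building and mapping the IO performed nothing
val result = io.unsafeRunSync()
// result == "HELLO", effects == 1
```

Constructing and transforming the value is pure; only the single run at the edge performs the side effect.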
j
IO
is stacksafe; functional programs cannot really afford not being stacksafe 😉 We have tests that explicitly check that. However for sequences that's a different story, because the issues are likely somewhere else
k
So theoretically
assertEquals(expectedIO, actualIO)
could be done in a unit test instead of
assertEquals(expectedValue, actualIO.unsafeRunSync())
unless IO has some eq operator that would prohibit it
r
If you encapsulate running it, then yes, but IO values are not comparable because they are lazy
I mean you can achieve that DSL but you have to run IO to compare it
j
Eq
for
IO
is undefined, we have a custom
Eq
instance for
IO
only for tests which uses
unsafeRunTimed
internally
k
I guess I was thinking of IO { list of instructions } as something like an array of instructions, and you could compare two IOs as equal if they have the same set of instructions, but the results of those IOs’ unsafeRunSync() could still be different because the contents of the effect { … } would be different in each one.
like
IO { !effect { repository.getAll() } }
would be equal to
IO { !effect { repository.getAll() } }
because they’re the same instruction sets, but that doesn’t mean the
unsafeRunSync
results will be the same, because then the effects are applied. But that's my flawed conception of what laziness actually is 🙂
j
Btw I am trying to reproduce something,
traverse
on sequences is really slow. Like the actual call to
traverse
not even running the resulting datatype. Not entirely sure what is causing it, but imo that is unacceptable levels of slowness. There is no real alternative, but this just sucks. @kyleg the best solution is to simply not use
traverse
on sequences and just straight up run whatever code you need to run. Just wrap the entire thing in
IO
so you get some control over your side-effects back 😞
k
Cool beans. Thanks.
j
You are not wrong with
IO { list of instructions }
. In the end
IO
is just a powerful wrapper around a function. That is also where the problem is: there is no practical way to implement function equality, especially not without running it, and once you run a function from
IO
you have side-effects and those are even harder to catch and test
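The "no practical function equality" point is easy to see with plain Kotlin lambdas (a small illustration, independent of Arrow):

```kotlin
val f: () -> Int = { 1 }
val g: () -> Int = { 1 }

// Structurally identical lambda literals are still distinct objects,
// so equality on the unevaluated functions is useless:
val sameFunction = f == g   // false
// You can only compare them by running them:
val sameResult = f() == g() // true, but running is exactly what IO defers
```

This is why the test-only Eq instance for IO has to run the effects (via unsafeRunTimed) to compare results.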
So after the pr with
lazyAp
is merged you can actually call
traverse
on sequences again, it will still be very very slow, but it will run your code. If you don't need the resulting sequence it's usually better to call
traverse_
which throws away the sequence and makes it just as fast as
traverse
on any other datatype. (it also has a
sequence_
version). I'll link the PR later, but the earliest this will be in Arrow is the next snapshot, or 0.10.5 as the next release. In short: use `traverse_`/`sequence_` whenever possible with sequences. With that change,
traverse
on sequences no longer hangs or throws without ever running any effect; it also only evaluates sequences as far as needed, which means it can even work on infinite sequences. Will link the PR later
With that PR merged there are no more stack overflows, traverse on sequences is now fast, and unless you care about memory usage the choice between `traverse_` and `traverse` does not matter. Basically all problems fixed
arrow 1
🔝 1
🎉 1
r
Thanks Jannis!