kyleg
12/19/2019, 5:32 PM
`IO { File("/path/to…").walk() }` and apply IO transforms to the sequence to end up with an `IO<Sequence<IO<Foo>>>`, and then use `flatMap` and `sequence` to transform it to `IO<Sequence<Foo>>`. When I `unsafeRunSync` the result, I get a `StackOverflowError`, presumably from the part where I “lift” the nested IO out of the Sequence (it has to iterate over all the elements in the sequence, I think).
Given that you’re ideally going to have a single place at the “edge” to run the IO, is there a better way I should structure it? Assume I’m starting with `IO<Sequence<IO<Foo>>>` and can do whatever to it. Is `myIO.unsafeRunSync().forEach { it.unsafeRunSync() }` the way I should go about doing it at the edge? Basically run the top-level IO to get the `Sequence` and then start taking values, which are themselves `IO`, and run them?
Here is a simplified version of what I have now:
fun buildFoo(file: File): IO<String> = IO { "test" }

val program: IO<Sequence<File>> = IO { File("/Users/kylegoetz/Desktop").walk() }

val transformedProgram: IO<Sequence<IO<String>>> = program.map {
    it.map {
        buildFoo(it)
    }
}

val noNestedIO: IO<Sequence<String>> = transformedProgram.flatMap {
    it.filterNotNull().k().sequence(IO.applicative()).fix()
}

noNestedIO.unsafeRunSync() // StackOverflowError
kyleg
12/19/2019, 5:35 PM
Jannis
12/19/2019, 5:58 PM
`filterNotNull().k().map { it.unsafeRunSync() }.toList()` inside `transformedProgram.flatMap { ... }` (just wrap it in `IO { ... }` to get the types right). If that does not fail, there is something off with `.sequence()`. Just as a quick check to see where the problem actually is: if that does not overflow, it's likely `sequence`; if it does, it's somewhere else...
Sequences behave pretty badly with `traverse` in terms of laziness, so it might be there 😕
kyleg
12/19/2019, 5:59 PM
Jannis
12/19/2019, 6:01 PM
kyleg
12/19/2019, 6:02 PM
kyleg
12/19/2019, 6:04 PM
`IO<Sequence<IO<CostlyOperation>>>` at the edge, like `top.unsafeRunSync().forEach { it.unsafeRunSync() }`
Jannis
12/19/2019, 6:04 PM
`sequence() = traverse(::identity) = foldRight(Applicative.just(emptyList())) { (acc, v) -> v.ap(acc.map { { el -> sequenceOf(el) + it } }) }`. The key point here is that in `foldRight` the sequence is rebuilt, and the `+` operator on sequences is not stack-safe. I'll look through the sources to see if that is actually correct, but I think that may be it 😕
Jannis
12/19/2019, 6:05 PM
`traverse` and `sequence` are the right choice, they just don't play super nicely with `Sequence`
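A minimal sketch of one way to avoid rebuilding a sequence with repeated `+`, using only the Kotlin standard library. The helper name `buildSafely` is hypothetical and not part of Arrow or Kotlin. It assumes, in line with the explanation above, that each `+` on a sequence adds another lazy wrapper that has to be unwound when the first element is requested; collecting into a list first avoids that nesting at the cost of evaluating everything eagerly.

```kotlin
// Hypothetical helper: accumulate into a list instead of chaining `acc + v`.
// Assumption: chained `+` nests one lazy wrapper per element; a list does not.
fun buildSafely(n: Int): Sequence<Int> {
    val buffer = ArrayList<Int>(n)
    for (v in generateSequence(0) { it + 1 }.take(n)) {
        buffer.add(v)
    }
    // The result is still a Sequence, but it is backed by a flat list.
    return buffer.asSequence()
}

fun main() {
    val seq = buildSafely(100_000)
    println(seq.first())
    println(seq.count())
}
```

The trade-off is that the source sequence is fully consumed up front, so this would not work for an infinite sequence.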
kyleg
12/19/2019, 6:05 PM
Jannis
12/19/2019, 6:06 PM
`IO.ap`
Jannis
12/19/2019, 6:08 PM
`sequence` is an alias for `traverse(::identity)`, which for sequence you can see here: https://github.com/arrow-kt/arrow/blob/master/modules/core/arrow-core-data/src/main/kotlin/arrow/core/SequenceK.kt#L32
It is implemented as a fold which folds over the content, combines it with `ap`, and puts the sequence back together after combining
Jannis
12/19/2019, 6:10 PM
`Sequence.plus`, which is not stack-safe, which is a fault on Kotlin's side I think. Basically `superLongSequence().fold(emptySequence()) { acc, v -> sequenceOf(v) + acc }.first()` results in a StackOverflowError. At least when I last tested it
Jannis
12/19/2019, 6:15 PM
generateSequence(0) { it + 1 }.take(100000).asSequence().fold(emptySequence<Int>()) { acc, v ->
    acc + v
}.first()
This sucks and should not happen. By the way, this is using Kotlin built-ins only
kyleg
12/19/2019, 6:21 PM
Jannis
12/19/2019, 6:22 PM
`foldRight` and `traverse` are implemented for Sequences is just not great. There is also no easy way to improve it afaik. So tbh I think going for `transformedProgram.flatMap { IO { it.filterNotNull().map { it.unsafeRunSync() } } }` isn't too bad, you just lose all the concurrency guarantees and benefits Arrow gives when actually using `IO` the intended way
kyleg
12/19/2019, 6:23 PM
Jannis
12/19/2019, 6:25 PM
`fun Sequence<A>.plus(a: A): Sequence<A>`) the same thing works with `fun Sequence<A>.plus(seq: Sequence<A>): Sequence<A>` btw 😄 Fun times. We are using the latter in `traverse`, which is why I think there might be something else going on. This is all very weird, you are not the only one lost in this 🙈
kyleg
12/19/2019, 6:25 PM
fun main() {
    val program: IO<Sequence<IO<Any>>> = TODO()
    // Run the outer IO to get the Sequence, then run each inner IO in turn.
    program.unsafeRunSync().forEach { it.unsafeRunSync() }
}
right?
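A rough sketch of the "single run at the edge" idea, following the shape of the `flatMap { IO { ... } }` workaround suggested earlier in the thread. `SimpleIO` and `flatten` are hypothetical stand-ins written for this illustration; `SimpleIO` only wraps a function and is not Arrow's `IO` (no stack safety, error handling, or concurrency). The point is only that the nested effects can be collapsed into one outer value, so `main` calls an `unsafe` method once.

```kotlin
// Hypothetical stand-in for an IO type: it only wraps a thunk and is NOT Arrow's IO.
class SimpleIO<A>(private val thunk: () -> A) {
    fun unsafeRunSync(): A = thunk()
    fun <B> flatMap(f: (A) -> SimpleIO<B>): SimpleIO<B> =
        SimpleIO { f(thunk()).unsafeRunSync() }
}

// Collapse the nesting by running each inner effect inside one outer SimpleIO.
// toList() forces the lazy sequence while the outer effect is running.
fun <A> flatten(nested: SimpleIO<Sequence<SimpleIO<A>>>): SimpleIO<List<A>> =
    nested.flatMap { inner -> SimpleIO { inner.map { it.unsafeRunSync() }.toList() } }

fun main() {
    val log = mutableListOf<String>()
    val program: SimpleIO<Sequence<SimpleIO<String>>> = SimpleIO {
        sequenceOf("a", "b", "c").map { name -> SimpleIO { log.add(name); name + "!" } }
    }
    val flat = flatten(program)
    println(log)                  // prints [] because nothing has run yet
    println(flat.unsafeRunSync()) // prints [a!, b!, c!] after the single run at the edge
}
```

Note that without `toList()` the inner effects would only run when the returned sequence is iterated, because `Sequence.map` is lazy.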
kyleg
12/19/2019, 6:25 PM
kyleg
12/19/2019, 6:26 PM
Jannis
12/19/2019, 6:27 PM
`unsafe` methods. But this can't really be avoided when working with `IO` in sequences because of these issues. If only stack overflows weren't so cryptic
kyleg
12/19/2019, 6:29 PM
`IO<Sequence<IO<Whatever>>>`. So I can unit test the `App.execute` by injecting mocks. In my `main()` that calls `App(…).execute()` I have full control.
Regarding `unsafe`: is it called that because the contents should be considered impure/have side effects, or is it called that because it is not stack-safe inside?
Jannis
12/19/2019, 6:31 PM
`IO<A>` is always pure because it does not perform side-effects, but when calling any of the `unsafe` methods that changes.
kyleg
12/19/2019, 6:32 PM
Jannis
12/19/2019, 6:32 PM
`IO` is stack-safe; functional programs cannot really afford not being stack-safe 😉 We have tests that explicitly check that. However, for sequences that's a different story because the issues are likely somewhere else
kyleg
12/19/2019, 6:33 PM
`assertEquals(expectedIO, actualIO)` could be done in a unit test instead of `assertEquals(expectedValue, actualIO.unsafeRunSync())`
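To make the two testing styles concrete, here is a small sketch with a hypothetical `Thunk` class standing in for an effect wrapper. It is not Arrow's `IO` and says nothing about how Arrow defines equality; it only illustrates that building such a value runs nothing, and that two wrappers around identical code are still distinct objects unless they are run and their results compared.

```kotlin
// Hypothetical stand-in for an effect wrapper (NOT Arrow's IO): it just holds a function.
class Thunk<A>(private val body: () -> A) {
    fun unsafeRunSync(): A = body()
}

fun main() {
    var calls = 0
    val expected = Thunk { calls++; 42 }
    val actual = Thunk { calls++; 42 }

    println(calls)               // prints 0: constructing the wrappers ran nothing
    println(expected == actual)  // prints false: default equality compares references
    println(expected.unsafeRunSync() == actual.unsafeRunSync()) // prints true
    println(calls)               // prints 2: each run performed its side effect
}
```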
kyleg
12/19/2019, 6:33 PM
raulraja
12/19/2019, 6:37 PM
raulraja
12/19/2019, 6:37 PM
Jannis
12/19/2019, 6:37 PM
`Eq` for `IO` is undefined; we have a custom `Eq` instance for `IO` only for tests, which uses `unsafeRunTimed` internally
kyleg
12/19/2019, 6:48 PM
kyleg
12/19/2019, 6:50 PM
`IO { !effect { repository.getAll() } }` would be equal to `IO { !effect { repository.getAll() } }` because they’re the same instruction sets, but that doesn’t mean the `unsafeRunSync` results will be the same because then the effects are applied. But that’s my flawed conception of what laziness actually is 🙂
Jannis
12/19/2019, 6:51 PM
`traverse` on sequences is really slow. Like the actual call to `traverse`, not even running the resulting datatype. Not entirely sure what is causing it, but imo that is an unacceptable level of slowness. There is no real alternative, but this just sucks. @kyleg the best solution is to simply not use `traverse` on sequences and just straight up run whatever code you need to run. Just wrap the entire thing in `IO` so you get some control over your side-effects back 😞
kyleg
12/19/2019, 6:51 PM
Jannis
12/19/2019, 6:57 PM
`IO { list of instructions }`. In the end `IO` is just a powerful wrapper around a function. That is also where the problem is: there is no practical way to implement function equality, especially not without running it, and once you run a function from `IO` you have side-effects, and those are even harder to catch and test
Jannis
12/20/2019, 2:29 AM
`lazyAp` is merged you can actually call `traverse` on sequences again; it will still be very, very slow, but it will run your code. If you don't need the resulting sequence, it's usually better to call `traverse_`, which throws away the sequence and makes it just as fast as `traverse` on any other datatype (it also has a `sequence_` version). I'll link the PR later, but the earliest this will be in Arrow is the next snapshots, or 0.10.5 as the next release.
In short: use `traverse_`/`sequence_` whenever possible with sequences. Using `traverse` on sequences now no longer hangs and throws without ever running any effect; it also only evaluates sequences as far as needed, which means it can also work on infinite sequences.
Will link the PR later
Jannis
12/20/2019, 2:46 AM
Jannis
12/21/2019, 10:19 AM
raulraja
12/21/2019, 10:25 AM