# advent-of-code
a
⚠️ Day 8 Solution Thread  ⚠️
d
At least I got sub 300 for part 1, that makes me happy!
`when` to the rescue yet again!!
a
Ikr, `when` is really nice here!
Copy code
private val input = readInput("input8.txt")

data class Instruction(val name: String, val argument: Int)

fun main() {
    val instructions = input.split("\n").map { it.split(" ") }.map { Instruction(it[0], it[1].toInt()) }
    fun part1(): Int {
        var accumulator = 0
        var currentLine = 0
        val linesRun = mutableListOf<Int>()

        while (currentLine < instructions.size && currentLine !in linesRun) {
            linesRun += currentLine
            val instruction = instructions[currentLine]
            when (instruction.name) {
                "acc" -> {
                    accumulator += instruction.argument
                    currentLine++
                }
                "nop" -> currentLine++
                "jmp" -> currentLine += instruction.argument
            }
        }
        return accumulator
    }
    println("Part 1: ${part1()}")

    fun part2() {
        instructions
            .mapIndexed { index, instruction -> index to instruction }
            .filter { it.second.name in listOf("jmp", "nop") }
            .map { it.first }.forEach { a ->
                var accumulator = 0
                val testInstructions = instructions.toMutableList().apply {
                    val oldInstruction = this[a]
                    this[a] = oldInstruction.copy(name = if (oldInstruction.name == "jmp") "nop" else "jmp")
                }

                var currentLine = 0
                val linesRun = mutableListOf<Int>()
                while (currentLine < testInstructions.size && currentLine !in linesRun) {
                    val instruction = testInstructions[currentLine]
                    linesRun += currentLine
                    when (instruction.name) {
                        "acc" -> {
                            accumulator += instruction.argument
                            currentLine++
                        }
                        "nop" -> currentLine++
                        "jmp" -> currentLine += instruction.argument
                    }
                }
                if (currentLine == testInstructions.size) {
                    println("Part 2: $accumulator")
                    return@forEach
                }
            }
    }

    part2()

}
Did the naive solution of going through every jmp/nop, switching it, and trying. Still ran in under a second
d
I took a gamble and only did nop -> jmp, but that had no solution so I had to change it to jmp -> nop. Sigh
😥 1
a
Unlucky 😞
d
ikr. 😞 wait - do you have an inner `input` shadowing your outer `input`? lol -- naming is hard, naming fast is super hard
a
Yep @David Whittaker, I do 😂 I forgot that I named the actual file input “input”. I’m going to clean this one up though
e
My solution. Kept wondering if there is a more optimal way to do part 2 instead of replacing every `jmp`/`nop` with the opposite, checking for loops, trying the next instruction, etc. I suppose you could backtrack from the point where it loops back, but part 2 runs in under 2ms (avg. of 20k runs), so... good enough!
j
Yeah, brute forcing is more than fast enough. I considered launching simulators in parallel coroutines, but it probably wouldn't solve much in this case.
Copy code
| Platform         | Average (ms) | Measurements (ms) |
| -----------------| ------------:|------------------:|
| GraalVM          | 8.7±9.5      | `47, 15, 16, 7, 5, 5, 3, 4, 5, 6, 5, 5, 7, 5, 5, 5, 5, 6, 6, 4` |
| Node JS          | 23.3±14.1    | `81, 37, 20, 22, 24, 17, 20, 17, 18, 23, 15, 18, 17, 18, 15, 16, 16, 20, 19, 23` |
| Native           | 124±17       | `106, 134, 141, 112, 136, 105, 139, 106, 134, 107, 153, 108, 128, 135, 117, 119, 154, 90, 118, 124` |
separate module in case of reuse... not sure if we should expect that or not
n
Yeah I didn't see an obviously better way than brute force, would be interested to see
e
@Nir I think one could backtrack from the point where a loop "connects" and change, for example, the previous command and test it; if it's still broken, change the one before that, and so on. Basically, when you detect a loop, run the program in reverse, changing the instructions and trying again from that point.
n
I thought about that but it's a bit vague, you don't really know which instruction in the loop sequence to change
So at best it seems like more guided brute force
e
All of them, one at a time. Much like brute-forcing from the top. Except a bit different.
Like:
Copy code
detect loop
pointer--
replace instruction if necessary // decrement accumulator if it's an acc
run code
if detects loop, put the original instruction back, pointer--, try again
else hooray!
Of course, you'd need to run the altered instructions in a different "computer". Or save which instruction you changed and what address you were at.
Though, I think the worst-case time complexity is still essentially the same.
n
There isn't really any way to run it backwards; you'd need to keep all the state
So it's complicating the data structures and such considerably for maybe a better heuristic... Might make the worst case worse
e
Yeah. Alternatively, you could let it loop once more to see which commands are executed in the loop and then try those one by one going forwards. But then I guess it is possible that a `jmp` will jump you into a loop that you otherwise wouldn't have got to.
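Roughly what that narrowing could look like in Kotlin (a sketch only; `Instruction` matches the data class from the solutions above, the helper names are made up):
Copy code
// Only instructions the original, looping run visits can be the one to flip:
// execution is identical up to the flipped instruction, and the original run
// never leaves its visited set.
data class Instruction(val name: String, val argument: Int)

// Runs the program; returns the accumulator if it terminates, or null if it loops.
fun runToEnd(program: List<Instruction>): Int? {
    var acc = 0
    var pc = 0
    val seen = mutableSetOf<Int>()
    while (pc < program.size) {
        if (!seen.add(pc)) return null // revisited an instruction -> infinite loop
        val (name, arg) = program[pc]
        when (name) {
            "acc" -> { acc += arg; pc++ }
            "jmp" -> pc += arg
            else -> pc++ // "nop"
        }
    }
    return acc
}

fun part2Narrowed(program: List<Instruction>): Int? {
    // Collect the instructions the unmodified (looping) run actually executes.
    val visited = mutableSetOf<Int>()
    var pc = 0
    while (pc < program.size && visited.add(pc)) {
        pc += if (program[pc].name == "jmp") program[pc].argument else 1
    }
    // Flip only jmp/nop instructions inside that set and re-run each candidate.
    return visited
        .filter { program[it].name != "acc" }
        .firstNotNullOfOrNull { i ->
            val flipped = program.toMutableList()
            flipped[i] = program[i].copy(name = if (program[i].name == "jmp") "nop" else "jmp")
            runToEnd(flipped)
        }
}
A flipped `jmp` can still land you in some other loop, but `runToEnd` just reports that and the next candidate gets tried.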
t
I brute forced part 2. Basically I generate new instruction sets one at a time and try them out until one terminates successfully rather than failing. Meh. I did write a data class for the instructions ("instruction, flip thine self!") and the computer ("run until you stop and tell me how you stopped"). From there it was just mapping data. I'll probably revise it before I post it, but work calls for now!
e
I implemented it in Haskell (was easier there), will port to Kotlin later - doing a graph search for part 2 is way faster than brute force
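For what it's worth, the graph-search idea can look roughly like this in Kotlin (a sketch, not a port of the Haskell version; `Instruction` is the same data class as in the snippets above, and it relies on the puzzle's guarantee that flipping exactly one jmp/nop fixes the program):
Copy code
fun part2GraphSearch(program: List<Instruction>): Int {
    val n = program.size
    fun next(i: Int, name: String) = if (name == "jmp") i + program[i].argument else i + 1

    // Reverse edges: predecessors[j] = every i whose (unflipped) successor is j.
    val predecessors = Array(n + 1) { mutableListOf<Int>() }
    for (i in 0 until n) {
        val j = next(i, program[i].name)
        if (j in 0..n) predecessors[j] += i
    }

    // BFS backwards from the "terminated" node (index n): every index that can reach it.
    val terminates = BooleanArray(n + 1)
    terminates[n] = true
    val queue = ArrayDeque(listOf(n))
    while (queue.isNotEmpty()) {
        val j = queue.removeFirst()
        for (i in predecessors[j]) if (!terminates[i]) { terminates[i] = true; queue += i }
    }

    // Walk the original program; the first jmp/nop whose *flipped* successor terminates
    // is the one to change. After flipping once, just keep running to the end.
    var acc = 0
    var pc = 0
    var flipped = false
    while (pc < n) {
        val inst = program[pc]
        if (!flipped && inst.name != "acc") {
            val alt = next(pc, if (inst.name == "jmp") "nop" else "jmp")
            if (alt in 0..n && terminates[alt]) {
                flipped = true
                pc = alt
                continue
            }
        }
        when (inst.name) {
            "acc" -> { acc += inst.argument; pc++ }
            "jmp" -> pc += inst.argument
            else -> pc++
        }
    }
    return acc
}
The reverse BFS and the final walk each touch every instruction at most a constant number of times, so the whole thing is linear in the program size.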
t
My brute force:
Copy code
fun solvePart2(): Int =
        instructions
            .indices
            .asSequence()
            .mapNotNull { index -> instructions.flipIndexOrNull(index) }
            .mapNotNull { inst ->
                Computer(inst).run {
                    if (runUntilTerminate() == Computer.ExecutionState.TERMINATED) accumulator
                    else null
                }
            }.first()
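In case anyone wants to read that snippet standalone, the pieces it references could look roughly like this (a sketch only, not the actual `Computer`/`flipIndexOrNull` behind the code above):
Copy code
data class Instruction(val name: String, val argument: Int) {
    // "instruction, flip thine self!" - returns the flipped version, or null for acc.
    fun flipOrNull(): Instruction? = when (name) {
        "jmp" -> copy(name = "nop")
        "nop" -> copy(name = "jmp")
        else -> null
    }
}

// Copy of the program with the instruction at index flipped, or null if it's an acc.
fun List<Instruction>.flipIndexOrNull(index: Int): List<Instruction>? =
    this[index].flipOrNull()?.let { flippedInst -> toMutableList().also { it[index] = flippedInst } }

class Computer(private val program: List<Instruction>) {
    enum class ExecutionState { TERMINATED, LOOPED }

    var accumulator = 0
        private set

    // "run until you stop and tell me how you stopped"
    fun runUntilTerminate(): ExecutionState {
        val seen = mutableSetOf<Int>()
        var pc = 0
        while (pc < program.size) {
            if (!seen.add(pc)) return ExecutionState.LOOPED
            val inst = program[pc]
            when (inst.name) {
                "acc" -> { accumulator += inst.argument; pc++ }
                "jmp" -> pc += inst.argument
                else -> pc++
            }
        }
        return ExecutionState.TERMINATED
    }
}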
n
@Edgars I was actually wrong, it's almost certainly possible to do it in linear time, just the approach we were discussing isn't the way
Made another thread for it
e
I saw. Looks like way too much effort. Unless you want to run all AoC solutions in 9ms or something.
n
Yeah I took a stab at getting it to work, and my approach was wrong.
Trying to give up on it but the nagging voice in my head won't leave me alone, haha
e
Don't forget, all work and no play makes Jack a dull boy.
n
eh I'm more interested in chasing the algorithm than raw perf, in kotlin
reasoning about constants in performance in languages like kotlin, python, etc just hurts my brain 🙂
rewriting it in C++ (or Rust) is the best improvement
e
I'm writing and running equivalent benchmarks in 4 languages, so I think I've got a decent handle on the constant factors.
n
what ratios of constant factors are you typically seeing between these 4 languages?
e
obviously, it'll depend on the task. for AoC thus far it's roughly: Haskell ~2-8x slower than Kotlin/JVM (much worse when handling strings, due to the non-packed default representation; I know how to fix that but I'm lazy); Python ~3-10x slower than the JVM (Python 3.9 is faster than whatever version I was using last year); Rust about the same to 2x faster than the JVM
n
interesting
b
I have some ideas for a bit less brute-forcy approach (kind of backtracking, keeping state when jumping or nop-ing)
But my solution takes 42ms (including loading the file)
so meh
b
Day8Bench.naive                sample  6372  0.784 ± 0.004  ms/op
Day8Bench.naive:naive·p0.00    sample        0.685          ms/op
Day8Bench.naive:naive·p0.50    sample        0.766          ms/op
Day8Bench.naive:naive·p0.90    sample        0.825          ms/op
Day8Bench.naive:naive·p0.95    sample        0.911          ms/op
Day8Bench.naive:naive·p0.99    sample        1.332          ms/op
Day8Bench.naive:naive·p0.999   sample        1.606          ms/op
Day8Bench.naive:naive·p0.9999  sample        2.544          ms/op
Day8Bench.naive:naive·p1.00    sample        2.544          ms/op
that's much faster than what I thought
(this includes loading the file)
n
I wonder if JVM based programs have a bit of an unfair advantage in a microbenchmark situation like this
the JVM will be running continuously as the benchmark loops, probably never returning the memory used in the solution to the OS
b
well in real life you don't leave your JVM after each operation anyway
except if you do some kind of serverless stuff
n
Sure, but in real life non-naive C++ code would never allocate arrays in a hot path, for example
b
I can try by asking the JVM to do a GC after
n
it just makes me wonder what the best way to really compare apples to apples is
b
I mean I can check how long it takes to run it
Benchmark                      Mode    Cnt   Score ± Error   Units
Day8Bench.naive                sample  749   6.732 ± 0.312   ms/op
Day8Bench.naive:naive·p0.00    sample         4.801           ms/op
Day8Bench.naive:naive·p0.50    sample         5.603           ms/op
Day8Bench.naive:naive·p0.90    sample        11.731           ms/op
Day8Bench.naive:naive·p0.95    sample        12.362           ms/op
Day8Bench.naive:naive·p0.99    sample        15.827           ms/op
Day8Bench.naive:naive·p0.999   sample        21.856           ms/op
Day8Bench.naive:naive·p0.9999  sample        21.856           ms/op
Day8Bench.naive:naive·p1.00    sample        21.856           ms/op
that's with a GC cycle at each run
(this is part 1 and part 2 together)
e
Makes me wonder if I should gc after each benchmark as well then. Otherwise I get, like, 20k runs and 2ms for the whole thing.
b
well 6ms for the whole thing is not bad either 😄
and I didn't tweak the GC
also I'm storing a lot of useless data
this is absolutely not an optimized solution
e
Just wondering if running the same thing over and over again for a minute is a reasonable benchmark. I used to do just 5 runs, I bet I'd get, like, 20ms then. 😄
b
that's why you use jmh… to try to account for that
it did run it 749 times here for example
I could ask it to do more if I want
also the machine is pretty busy
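For anyone who hasn't used it, a JMH benchmark in Kotlin looks roughly like this (a sketch; the class name mirrors the Day8Bench output above, but the body is a placeholder rather than the real solver, and it's typically wired up through a Gradle plugin such as kotlinx-benchmark):
Copy code
import org.openjdk.jmh.annotations.*
import java.io.File
import java.util.concurrent.TimeUnit

@State(Scope.Benchmark)
@BenchmarkMode(Mode.SampleTime) // "sample" mode, matching the percentile output above
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 5)
open class Day8Bench { // benchmark classes must be non-final, hence open (or use the allopen plugin)

    @Benchmark
    fun naive(): Int {
        // The file read stays inside the measured body, since the numbers above include it.
        // The sum is just a stand-in for "parse the program and run part 1 + part 2".
        val lines = File("input8.txt").readLines()
        return lines.sumOf { it.length }
    }
}
JMH handles the warmup and measurement iterations and reports the percentiles, which is what accounts for the run-to-run variance being discussed here.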
e
TIL. Will check that out.
b
Copy code
% time ./gradlew run                                                                                                                                                                       
> Task :run
(1859, 1235)

BUILD SUCCESSFUL in 528ms
2 actionable tasks: 1 executed, 1 up-to-date
./gradlew run  0.84s user 0.05s system 98% cpu 0.902 total
that includes… gradle
Copy code
time ./bin/advent-of-code                                                                                                                                                             
(1859, 1235)
./bin/advent-of-code  0.17s user 0.04s system 127% cpu 0.170 total
After clearing all the caches (on Linux, dropping caches of types 1, 2 and 3):
Copy code
time ./bin/advent-of-code                                                                                                                                                             
(1859, 1235)
./bin/advent-of-code  0.15s user 0.03s system 105% cpu 0.179 total
so yeah there is some JVM start time overhead, but really that's not as bad as people say
export JAVA_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx5m -Xlog:gc" ; time ./bin/advent-of-code
hah, that even works with -Xmx5m
and it is not that slow either
I'm trying with aotc
that may help even more; I already gained 20ms by using -Xshare:dump and -Xshare:on
won 20ms more with AOTC
n
yeah these startup times are really not bad
makes me think that command line utilities in kotlin (even JVM) could be a decent choice
b
yes
they absolutely are
and these are full run times 😄
n
it's funny because if you exclude the JVM there aren't really any other obvious choices for command line utilities in linux. python is super slow, Go is super.... well, Go.
And C/C++/Rust are overkill 99% of the time
b
I like rust
but I found that I write Kotlin so much faster than Rust
and that it is fast enough for 99.9% of my problems
jaotc is heavy :D
Copy code
export JAVA_OPTS="-Xshare:on -XX:SharedArchiveFile=classes.jsa  -XX:+UnlockExperimentalVMOptions -XX:AOTLibrary=./java_base.so " ; time ./bin/advent-of-code                          
Helloworld
./bin/advent-of-code  0.03s user 0.02s system 106% cpu 0.049 total
just displaying hello world
so there is quite an overhead still
Copy code
/tmp/o  0.00s user 0.00s system 77% cpu 0.002 total
a C hello world
Copy code
python /tmp/o.py  0.02s user 0.01s system 96% cpu 0.021 total
python starts faster than I thought
n
yeah not having GC is a big productivity penalty
b
(I'm doing 10 runs, not clearing caches, and showing lowest result)
I disabled the GC in Java to see if it helped
doesn't change anything in terms of speed
(using the Epsilon GC)
n
said as someone who's been writing C++ professionally for close to a decade. I'd never opt for a no-GC language in an application where I could tolerate a GC.
which is practically all of them
b
so it is not the GC that slows that program down
I think it has to do with class loading
I can reach 70ms on the command line
with Kotlin and AOT
142ms fresh start
that's still pretty bad
40ms for the hello world, that's pretty close to python 😄
it is heavy on processing though…
that would work for a command, but painful for a script
n
I finally fully understood an O(N) solution
was not easy for me
e
Copy code
$ cc -nostdlib -ohello hello.S
$ time ./hello
Hello, world!
0.00user 0.00system 0:00.00elapsed 66%CPU (0avgtext+0avgdata 344maxresident)k
0inputs+0outputs (0major+27minor)pagefaults 0swaps
b
With caches cleared: Python 56 ms, Kotlin (AOTC and all) 80 ms
also I'm using perf now to get real stats
That's the stats of today's parts1 and 2:
Copy code
114.79 msec task-clock                #    1.517 CPUs utilized            ( +-  2.38% )
               430      context-switches          #    0.004 M/sec                    ( +-  0.73% )
                12      cpu-migrations            #    0.104 K/sec                    ( +-  3.51% )
             6,018      page-faults               #    0.052 M/sec                    ( +-  0.29% )
       457,879,005      cycles                    #    3.989 GHz                      ( +-  0.51% )
       499,379,616      instructions              #    1.09  insn per cycle           ( +-  0.17% )
        96,479,627      branches                  #  840.454 M/sec                    ( +-  0.17% )
         4,003,302      branch-misses             #    4.15% of all branches          ( +-  0.24% )

           0.07567 +- 0.00199 seconds time elapsed  ( +-  2.63% )
the JVM is FAST