Hello, As I explained a few months ago in <discus...
# compiler
a
Hello, As I explained a few months ago in discuss I'm doing a MsC thesis about optimizations in Kotlin. I have been analyzing the source code of the official Kotlin compiler project. Unfortunately this was more complex than expected, mainly due to its complexity and lack of documentation regarding optimization aspects.Β  I think I managed to get the general view of how the compiler is structured. I attach the diagram of how I understand it works. Any comment/discussion about it would be really appreciated, either if something is wrong, incomplete, or suggestions to improve it.Β  Note: even though I don't think it is much different, the code I'm analyzing is the 1.4.20 version/tag. Thank you. [edit: forgot to mention I'm interested in the JVM part]
πŸ‘ 2
🀘 10
🀘🏼 1
r
This is great, thanks Abel. I think this is accurate maybe for that version but it’s continuously evolving specifically as they are introducing FIR and several other parts of the compiler. @dmitriy.novozhilov Does Jetbrains or someone keep a live version of something like this somewhere public? This would be a great community resource to have public and alive for people to better understand how the compiler works and evolves over time.
😊 1
βž• 1
d
@raulraja Yeah, I think we will write some documentation about FIR and probably go to some conference or smth. Actually there is couple of speeches made by Simon Ogorodnik (FIR team tech lead), but they all are in russian
πŸ‘ 1
@Abel Backend part of your diagram is correct, but there a mistakes in frontend part. If we told about combinations FE 1.0 + Old JVM backend and FE 1.0 + JVM IR (without FIR), than there is no fork of compiler pipeline after PSI stage. Whole frontend lays in
:compiler:resolution
and
:compiler:frontend
modules and
:compiler:frontend.java
contains only part of resolution which is special for java interop (e.g. scopes for java classes), so they all work together and produces BindingContext (big map which contains information about all program) in pair to PSI. After that old backend takes PSI and BindingContext and generates bytecode over it, and JVM IR backend transforms PSI + BindingContext to IR and after that works with it
πŸ‘ 1
And also compiler doesn't modify PSI at all
y
Good job! This is definitely a clear high level overview of the pipeline. I did something similar earlier last year, though it's more of a reading notes and not very straightforward to read: https://github.com/overminder/kitchen-sink/blob/master/writings/kotlin-compiler/notes/main.pdf
πŸ‘ 3
d
@Yux You made very good investigation. Feel free to ask me any specific question about compiler implementation or kotlin project infrastructure (e.g. which tests better to run to debug and investigate specific parts of the compiler)
❀️ 1
a
Thank you everyone for the responses so far. @dmitriy.novozhilov thanks for the clarifications. If I understood correctly you mean to add the compiler/resolution as a special part of the frontend. Which part of the pipeline is it reponsible for? The code transformations? Maybe the semantic analyzer? You also mention to add a BindingContext (which seems to be like a container of all the current data, as you say) next to the PSI tree, or perhaps as a whole in the frontend? I admit the frontend part was the most difficult to analyze (specially the connection with the old backend) and if the compiler doesn't modify the PSI, where/when are all the transformations made? For example ConstantExpressionEvaluator.kt (in the frontend package) seems to be responsible for the constant expression optimization, which seems to be applied in the old path, but not with the new path where it is applied by FoldConstantLowering (in ir/backend.common). Maybe it is applied directly when generating the bytecode? @Yux thanks! I tried to search for existing documentation, unfortunately I did not find yours. Will definitely analyze it and compare with my own notes.
d
:compiler:resolution
is just one of modules of frontend, alongside with
:compiler:frontend
(there was an idea to keep in different modules logic which depend and not depend on PSI, but we found later that core problem of old frontend not just PSI, but and descriptors also). Check simple diagram I made
if the compiler doesn't modify the PSI, where/when are all the transformations made
Frontend stores all information that it inferred (types, descriptors, diagnostics, constant evaluation, CFA results etc) in
BindingContext
, where (in most cases) PSI is a key to specific map in it which contains, for example, descriptors
πŸ‘ 1
a
I see, so on my diagram the code transformations on the left is applied right after the psi generation, and instead of modifying the psi it attaches a binding context to it. Let me quickly update it to see if it matches yours. I'm still a bit confused as to where the modifications are really performed (like the ConstantExpressionEvaluator). Does this means with the new IR branch modifications/optimizations are made twice (once in the frontend and then on the ir backend)? Or perhaps the IR backend takes only parts of the bindingContext to avoid this duplication?
d
I know only about two constant evaluators: -
ConstantExpressionEvaluator
which is used in FE 1.0, and its results are reused in both backends -
IrInterpreter
-- new mechanism for constant evaluation of simple constant expressions and prototype of evaluating functions calls in compile time. It is used in pair FIR + IR backend right after transformation from FIR to IR (FIR doesn't have analog of
ConstantExpressionEvaluator
by design)
But I'm not very familiar with details of our backends, so it's possible that there is something else mechanism for constant evaluation
a
OK, I'll check with more detail those evaluators. Thanks again
d
Yeah, this diagram is OK. But I suggest to rename
Code transformations
to
Code analysis
in frontend block.
☺️ 1
πŸ‘ 1
a
πŸ‘ 2
😍 2
m
For a reference, links to talks by Simon mentioned by @dmitriy.novozhilov β€’ September 2020

https://youtu.be/S2--aYB2GiE?t=15β–Ύ

β€’ a newer one from december 2020

https://www.youtube.com/watch?v=GEqgkaiBPdAβ–Ύ

πŸ‘ 2
r
Links to the slides since they are hidden in the description. They are useful even if I don’t understand Russian. Thanks @Michal Harakal! https://docs.google.com/presentation/d/e/2PACX-1vTzajwYJfmUi_Nn2nJBULi9bszNmjbO3c8K8dHRnK7vgz3AELunB6J7sfBodC2sKoaKAHibgEt_XjaQ/pub?slide=id.p
πŸ‘ 2
m
Thank you @raulraja
a
@dmitriy.novozhilov Now that I'm revising the diagram I noticed you talked about 'FIR', which supposedly is 'Frontend IR'. I though it was just the name of all the frontend (or the second part) but I'm starting to suspect it is a whole different path, like a third branch on my diagram. Where exactly is the FIR and how can I add it to the diagram? Thanks in advance
d
FIR fully replaces part which you named "Code analysis" It takes raw PSI as input, produces RawFir, which transforms in different stages, filling that tree with semantic information, and then resolved FIR is transformed to backend IR, which is input for backend Check this slide from presentation above: https://docs.google.com/presentation/d/e/2PACX-1vTzajwYJfmUi_Nn2nJBULi9bszNmjbO3c8K8dHRnK7vgz3AELunB6J7sfBodC2sKoaKAHibgEt_XjaQ/pub#slide=id.g955e8c1462_0_221
a
So, it is really a third branch? Or it is that 'Code analysis' box?
d
I'd say it's the second branch of
frontend
box
a
like this?
d
Yep. And there is another "Intermediate code generator" named
fir2ir
a
I see, thanks. I start to understand more that "problem of too many IR's" πŸ˜… I assume the output of the fir2ir converter is the same as the psi2ir, so the backend is the same no matter which fronted is used (and in that screenshot above the arrow should have pointed to the IR tree instead?). I'll update the diagram with this new knowledge and post it here too.
πŸ‘ 1
I'm not sure if I should put the BindingContext, IR and modified FIR trees outside the frontend like the diagrams of those slides, but I'll keep them inside for now. (I also changed the object colors to match the slides). Thank you again for all your help Dmitriy.
πŸ†’ 3
❀️ 7
d
This diagram is correct
πŸ‘ 4
a
How I wish those linked videos were in English πŸ˜“
h
Hey @Abel, is your thesis public available?
v
@andylamax Great change to learn Russian
@Abel @dmitriy.novozhilov @raulraja @Yux @Michal Harakal Thank you