Hello As I explained a few months ago in <https discuss kotl kotlinlang #compiler

Hello, As I explained a few months ago in <discus...

Abel

03/10/2021, 4:45 PM

Hello, As I explained a few months ago in discuss I'm doing a MsC thesis about optimizations in Kotlin. I have been analyzing the source code of the official Kotlin compiler project. Unfortunately this was more complex than expected, mainly due to its complexity and lack of documentation regarding optimization aspects. I think I managed to get the general view of how the compiler is structured. I attach the diagram of how I understand it works. Any comment/discussion about it would be really appreciated, either if something is wrong, incomplete, or suggestions to improve it. Note: even though I don't think it is much different, the code I'm analyzing is the 1.4.20 version/tag. Thank you. [edit: forgot to mention I'm interested in the JVM part]

👍 2

🤘 10

🤘🏼 1

raulraja

03/10/2021, 5:05 PM

This is great, thanks Abel. I think this is accurate maybe for that version but it’s continuously evolving specifically as they are introducing FIR and several other parts of the compiler. @dmitriy.novozhilov Does Jetbrains or someone keep a live version of something like this somewhere public? This would be a great community resource to have public and alive for people to better understand how the compiler works and evolves over time.

😊 1

➕ 1

dmitriy.novozhilov

03/10/2021, 7:13 PM

@raulraja Yeah, I think we will write some documentation about FIR and probably go to some conference or smth. Actually there is couple of speeches made by Simon Ogorodnik (FIR team tech lead), but they all are in russian

👍 1

dmitriy.novozhilov

03/10/2021, 7:17 PM

@Abel Backend part of your diagram is correct, but there a mistakes in frontend part. If we told about combinations FE 1.0 + Old JVM backend and FE 1.0 + JVM IR (without FIR), than there is no fork of compiler pipeline after PSI stage. Whole frontend lays in

:compiler:resolution

and

:compiler:frontend

modules and

:compiler:frontend.java

contains only part of resolution which is special for java interop (e.g. scopes for java classes), so they all work together and produces BindingContext (big map which contains information about all program) in pair to PSI. After that old backend takes PSI and BindingContext and generates bytecode over it, and JVM IR backend transforms PSI + BindingContext to IR and after that works with it

👍 1

dmitriy.novozhilov

03/10/2021, 7:18 PM

And also compiler doesn't modify PSI at all

Yux

03/10/2021, 7:20 PM

Good job! This is definitely a clear high level overview of the pipeline. I did something similar earlier last year, though it's more of a reading notes and not very straightforward to read: https://github.com/overminder/kitchen-sink/blob/master/writings/kotlin-compiler/notes/main.pdf

👍 3

dmitriy.novozhilov

03/10/2021, 9:06 PM

@Yux You made very good investigation. Feel free to ask me any specific question about compiler implementation or kotlin project infrastructure (e.g. which tests better to run to debug and investigate specific parts of the compiler)

❤️ 1

Abel

03/10/2021, 9:30 PM

Thank you everyone for the responses so far. @dmitriy.novozhilov thanks for the clarifications. If I understood correctly you mean to add the compiler/resolution as a special part of the frontend. Which part of the pipeline is it reponsible for? The code transformations? Maybe the semantic analyzer? You also mention to add a BindingContext (which seems to be like a container of all the current data, as you say) next to the PSI tree, or perhaps as a whole in the frontend? I admit the frontend part was the most difficult to analyze (specially the connection with the old backend) and if the compiler doesn't modify the PSI, where/when are all the transformations made? For example ConstantExpressionEvaluator.kt (in the frontend package) seems to be responsible for the constant expression optimization, which seems to be applied in the old path, but not with the new path where it is applied by FoldConstantLowering (in ir/backend.common). Maybe it is applied directly when generating the bytecode? @Yux thanks! I tried to search for existing documentation, unfortunately I did not find yours. Will definitely analyze it and compare with my own notes.

dmitriy.novozhilov

03/11/2021, 8:57 AM

:compiler:resolution

is just one of modules of frontend, alongside with

:compiler:frontend

(there was an idea to keep in different modules logic which depend and not depend on PSI, but we found later that core problem of old frontend not just PSI, but and descriptors also). Check simple diagram I made

if the compiler doesn't modify the PSI, where/when are all the transformations made

Frontend stores all information that it inferred (types, descriptors, diagnostics, constant evaluation, CFA results etc) in

BindingContext

, where (in most cases) PSI is a key to specific map in it which contains, for example, descriptors

👍 1

Abel

03/11/2021, 10:10 AM

I see, so on my diagram the code transformations on the left is applied right after the psi generation, and instead of modifying the psi it attaches a binding context to it. Let me quickly update it to see if it matches yours. I'm still a bit confused as to where the modifications are really performed (like the ConstantExpressionEvaluator). Does this means with the new IR branch modifications/optimizations are made twice (once in the frontend and then on the ir backend)? Or perhaps the IR backend takes only parts of the bindingContext to avoid this duplication?

dmitriy.novozhilov

03/11/2021, 10:15 AM

I know only about two constant evaluators: -

ConstantExpressionEvaluator

which is used in FE 1.0, and its results are reused in both backends -

IrInterpreter

-- new mechanism for constant evaluation of simple constant expressions and prototype of evaluating functions calls in compile time. It is used in pair FIR + IR backend right after transformation from FIR to IR (FIR doesn't have analog of

ConstantExpressionEvaluator

by design)

dmitriy.novozhilov

03/11/2021, 10:15 AM

But I'm not very familiar with details of our backends, so it's possible that there is something else mechanism for constant evaluation

Abel

03/11/2021, 10:17 AM

OK, I'll check with more detail those evaluators. Thanks again

Abel

03/11/2021, 10:45 AM

dmitriy.novozhilov

03/11/2021, 10:47 AM

Yeah, this diagram is OK. But I suggest to rename

Code transformations

Code analysis

in frontend block.

☺️ 1

👍 1

Abel

03/11/2021, 10:48 AM

👍 2

😍 2

Michal Harakal

03/11/2021, 11:53 AM

For a reference, links to talks by Simon mentioned by @dmitriy.novozhilov • September 2020

https://youtu.be/S2--aYB2GiE?t=15▾

• a newer one from december 2020

https://www.youtube.com/watch?v=GEqgkaiBPdA▾

👍 2

raulraja

03/11/2021, 12:24 PM

Links to the slides since they are hidden in the description. They are useful even if I don’t understand Russian. Thanks @Michal Harakal! https://docs.google.com/presentation/d/e/2PACX-1vTzajwYJfmUi_Nn2nJBULi9bszNmjbO3c8K8dHRnK7vgz3AELunB6J7sfBodC2sKoaKAHibgEt_XjaQ/pub?slide=id.p

👍 2

Michal Harakal

03/11/2021, 12:26 PM

Thank you @raulraja

Abel

04/08/2021, 4:53 PM

@dmitriy.novozhilov Now that I'm revising the diagram I noticed you talked about 'FIR', which supposedly is 'Frontend IR'. I though it was just the name of all the frontend (or the second part) but I'm starting to suspect it is a whole different path, like a third branch on my diagram. Where exactly is the FIR and how can I add it to the diagram? Thanks in advance

dmitriy.novozhilov

04/08/2021, 4:57 PM

FIR fully replaces part which you named "Code analysis" It takes raw PSI as input, produces RawFir, which transforms in different stages, filling that tree with semantic information, and then resolved FIR is transformed to backend IR, which is input for backend Check this slide from presentation above: https://docs.google.com/presentation/d/e/2PACX-1vTzajwYJfmUi_Nn2nJBULi9bszNmjbO3c8K8dHRnK7vgz3AELunB6J7sfBodC2sKoaKAHibgEt_XjaQ/pub#slide=id.g955e8c1462_0_221

Abel

04/08/2021, 5:10 PM

So, it is really a third branch? Or it is that 'Code analysis' box?

dmitriy.novozhilov

04/08/2021, 5:31 PM

I'd say it's the second branch of

frontend

box

Abel

04/08/2021, 5:34 PM

like this?

dmitriy.novozhilov

04/08/2021, 7:48 PM

Yep. And there is another "Intermediate code generator" named

fir2ir

Abel

04/09/2021, 2:47 PM

I see, thanks. I start to understand more that "problem of too many IR's" 😅 I assume the output of the fir2ir converter is the same as the psi2ir, so the backend is the same no matter which fronted is used (and in that screenshot above the arrow should have pointed to the IR tree instead?). I'll update the diagram with this new knowledge and post it here too.

👍 1

Abel

04/09/2021, 4:00 PM

I'm not sure if I should put the BindingContext, IR and modified FIR trees outside the frontend like the diagrams of those slides, but I'll keep them inside for now. (I also changed the object colors to match the slides). Thank you again for all your help Dmitriy.

🆒 3

❤️ 7

dmitriy.novozhilov

04/09/2021, 4:14 PM

This diagram is correct

👍 4

andylamax

12/30/2021, 1:21 AM

How I wish those linked videos were in English 😓

hfhbd

06/20/2022, 6:54 PM

Hey @Abel, is your thesis public available?

ventura

01/28/2023, 10:23 AM

@andylamax Great change to learn Russian

ventura

01/28/2023, 10:25 AM

[thread] *Kotlin IR Documentation (Stack Codes & Tuples)*

ventura

01/28/2023, 10:26 AM

@Abel @dmitriy.novozhilov @raulraja @Yux @Michal Harakal Thank you

19 Views

Open in Slack

Previous Next