# compiler
v
[thread] Kotlin IR Documentation (Stack Code & Tuples)
💪🏾 1
💪 2
💪🏼 1
@bnorm @SerVB @Piotr Krzemiński @Ryan Kay @dmitriy.novozhilov When I started studying the Kotlin compiler, I tried to find a file, document, table, etc. with a consolidated description of the intermediate representation.
I am closely studying @Piotr Krzemiński's Python compiler, and he mentioned that someone had already brought up an initiative to write up the tuple code. Can anyone confirm this? If there is no such document, I am willing to write it, because it would really simplify my effort here 😉
p
I recall some effort to document the IR and/or the compiler architecture...
v
TL;DR: Before chatting with my advisor, I am evaluating how feasible it would be to produce CIL code for the .NET CLR, based on the current Kotlin JVM compiler.
p
v
@Piotr Krzemiński thx. I am also tracking #16707
👍 1
@Piotr Krzemiński I commented on both tickets and added some references. I am trying to do a Mini C# proof of concept (like MiniJava from the book Modern Compiler Implementation in Java), so that I can check with my folks at the university whether a junior researcher or a graduate student would be interested in participating in the project.
Maybe @kevinmost, @jvg or @svtk can offer some insight on this
j
Kotlin .NET sounds really cool, but it also looks like a ginormous task. Mapping Kotlin's generics to .NET's generics would be a master's/PhD project on its own, and then there are .NET value types, very different enums, COM interop, P/Invoke, unsafe code, etc. to consider. But perhaps a mini C# is feasible, as long as you are willing to scope it down significantly
m
From my general research it seems that, besides generics and (nice) C# interop, it should not be that hard to roughly support a .NET target, i.e. produce code that runs on .NET. The JVM and CLI instruction sets are so similar that I would go as far as to say it is sensible to copy-paste the JVM backend and start from there. Features like COM interop, P/Invoke and unsafe code are mostly tied to the backend; for the frontend, annotations and stdlib should suffice, so they can be quite easy. For value types, assuming only immutable ones are supported, one should extend Kotlin value classes to be capable of having more than 1 property (which is going to be done anyway at some point) and handle the rest in the backend too. The hard part is codegen: there are many JVM libraries dealing with JVM code and metadata (such as `objectweb.asm`), but symmetrically the libraries for .NET code and metadata run only on .NET. So one would either have to find some way to use them from Kotlin code (e.g. parts of the Roslyn compiler, which is also written in C#) or rewrite everything.
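To make the instruction-set similarity concrete, here is a minimal sketch that prints the same tiny stack program as JVM and as CIL mnemonics. The `StackOp` types and the `toJvm`/`toCil` helpers are hypothetical names invented for this illustration and are not part of the Kotlin compiler; the sketch only covers `int` arguments of a static method and emits text, not real class files or assemblies.

```kotlin
// Hypothetical toy stack IR; none of these names come from the Kotlin compiler.
sealed interface StackOp
data class LoadArg(val index: Int) : StackOp  // push an int argument of a static method
object AddInt : StackOp                        // pop two ints, push their sum
object MulInt : StackOp                        // pop two ints, push their product
object ReturnInt : StackOp                     // return the int on top of the stack

// JVM mnemonics for int operands (assumes argument indices below 256).
fun toJvm(op: StackOp): String = when (op) {
    is LoadArg -> if (op.index in 0..3) "iload_${op.index}" else "iload ${op.index}"
    is AddInt -> "iadd"
    is MulInt -> "imul"
    is ReturnInt -> "ireturn"
}

// CIL mnemonics for the same operations (assumes argument indices below 256).
fun toCil(op: StackOp): String = when (op) {
    is LoadArg -> if (op.index in 0..3) "ldarg.${op.index}" else "ldarg.s ${op.index}"
    is AddInt -> "add"
    is MulInt -> "mul"
    is ReturnInt -> "ret"
}

// Body of a static method equivalent to `fun add(a: Int, b: Int) = a + b`.
fun addBody(): List<StackOp> = listOf(LoadArg(0), LoadArg(1), AddInt, ReturnInt)
```

Calling `addBody().map(::toJvm)` yields `iload_0, iload_1, iadd, ireturn`, while `addBody().map(::toCil)` yields `ldarg.0, ldarg.1, add, ret`. The easy part maps almost one to one; the differences mentioned above (generics, value types, metadata) are exactly what this sketch leaves out.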
j
Ah yes, a classic compiler bootstrapping problem 🙂 A couple thoughts on potential routes to take:
• implement/port a library like that (Cecil comes to mind) to run on JVM,
• serialize the Kotlin IR to a portable format (e.g. protos/json/yaml) and then deserialize it in a piece of .NET code that can call those libraries,
• JNI to the native .NET Metadata API
👍 1
for the second option you'd have to do some awful hacks to represent the .NET API as JVM types so the frontend can do type resolution (e.g. generating stub jars)
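As a rough illustration of the second option, the sketch below dumps a toy tree to JSON that a separate .NET program could read back. `ToyIrNode`, `jsonString` and `toJson` are hypothetical names made up for this example; the real Kotlin IR has far more node kinds, symbols and types, and this sketch does not touch the type-resolution problem mentioned above.

```kotlin
// Hypothetical, heavily simplified node; the real Kotlin IR classes are far richer.
data class ToyIrNode(
    val kind: String,
    val name: String,
    val children: List<ToyIrNode> = emptyList()
)

// Escape the characters that must not appear raw inside a JSON string.
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) {
        when {
            c == '"' -> append("\\\"")
            c == '\\' -> append("\\\\")
            c == '\n' -> append("\\n")
            c == '\r' -> append("\\r")
            c == '\t' -> append("\\t")
            c < ' ' -> append("\\u%04x".format(c.code))
            else -> append(c)
        }
    }
    append('"')
}

// Serialize the tree depth-first into one compact JSON object per node.
fun toJson(node: ToyIrNode): String {
    val children = node.children.joinToString(",") { toJson(it) }
    return "{\"kind\":" + jsonString(node.kind) +
        ",\"name\":" + jsonString(node.name) +
        ",\"children\":[" + children + "]}"
}
```

For example, `toJson(ToyIrNode("fun", "add", listOf(ToyIrNode("param", "a"))))` produces a nested object with one child, which any JSON parser on the .NET side could consume before driving a library such as Cecil.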
v
@jvg that's why I am in contact with a previous PhD advisor of mine. Originally I started my grad research in parallel compiler optimizations for multi-core CPUs. If the project is big enough, it could be possible to get support from the university
@jvg I am trying to identify the step immediately before the `objectweb.asm` generation, so I can discover how KIR (Kotlin Intermediate Representation) becomes not only JVM bytecode but also JS code
m
@ventura I have worked recently in this area of the Kotlin compiler, so I can quickly guide you through it. There is generally a strong separation between the frontend (lexer, parser, AST, semantics, resolution, analysis) and the backend (IR and codegen) in the Kotlin design. Based on your goals, I assume you're only interested in the latter. So the flow of the compiler backend is:
• Backend IR is created from the frontend structures (which are also a kind of IR, depending on how you define it, but are in the process of being replaced by a new tree-based one named FIR) - this happens in the `compiler/ir/ir.psi2ir` module.
• This (backend) IR is ownership-wise a tree structure, but really a graph, where references to non-child elements happen indirectly through `IrSymbol`. Most of the node classes are in `compiler/ir/ir.tree`. Its root is an `IrModuleFragment` node for each compiler module.
• An ordered list of phases is executed (known as lowering), each of which transforms the IR tree in some way, so that at the end it looks as close to the to-be-emitted code as possible while still retaining its structure. The list of phases differs for each backend, e.g. the JVM ones are defined in `compiler/ir/backend.jvm/lower/src/org/jetbrains/kotlin/backend/jvm/JvmLower.kt`. Many common bits of these phases are shared between backends (`compiler/ir/backend.common/src/org/jetbrains/kotlin/backend/common/lower`), though.
• The lowered IR tree is then converted into a backend-specific representation and transformed a bit further. E.g. for JS this is roughly a JS AST. AFAIK this part for JVM is the only place where you can find something like what you mention as a tuple representation.
• Based on that representation the target code is emitted, e.g. using `objectweb.asm` for JVM.
So we may say there are actually at least 4 apparently distinct internal representations (AST, frontend representation, backend IR, codegen-specific) in the whole compiler pipeline. In the source code, though, the term IR most commonly refers to the 3rd one.
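The lowering idea described above can be sketched with a toy tree. Everything here (`ToySymbol`, `ToyExpr`, `rewrite`, `runPhases` and the two phases) is a hypothetical stand-in written for illustration; it is not the API of `compiler/ir/ir.tree` or of the real lowering phases, which work on much richer nodes.

```kotlin
// All names below are hypothetical stand-ins, not the real classes in compiler/ir/ir.tree.
class ToySymbol(val name: String)  // indirection used to refer to a non-child declaration

sealed interface ToyExpr
data class Const(val value: Int) : ToyExpr
data class Plus(val left: ToyExpr, val right: ToyExpr) : ToyExpr
data class Call(val callee: ToySymbol, val args: List<ToyExpr>) : ToyExpr

// A lowering phase is modelled as a plain tree-to-tree function.
typealias Lowering = (ToyExpr) -> ToyExpr

// Rebuild the tree bottom-up, applying `rule` to every node after its children.
fun rewrite(expr: ToyExpr, rule: (ToyExpr) -> ToyExpr): ToyExpr {
    val rebuilt: ToyExpr = when (expr) {
        is Const -> expr
        is Plus -> Plus(rewrite(expr.left, rule), rewrite(expr.right, rule))
        is Call -> Call(expr.callee, expr.args.map { rewrite(it, rule) })
    }
    return rule(rebuilt)
}

// Phase 1: fold `Const + Const` into a single constant.
fun foldRule(e: ToyExpr): ToyExpr {
    if (e is Plus) {
        val l = e.left
        val r = e.right
        if (l is Const && r is Const) return Const(l.value + r.value)
    }
    return e
}

// Phase 2: replace the remaining `+` nodes with a call through a symbol.
val intPlus = ToySymbol("Int.plus")
fun plusToCallRule(e: ToyExpr): ToyExpr =
    if (e is Plus) Call(intPlus, listOf(e.left, e.right)) else e

val foldConstants: Lowering = { tree -> rewrite(tree, ::foldRule) }
val plusToCall: Lowering = { tree -> rewrite(tree, ::plusToCallRule) }

// Run the ordered list of phases, feeding each phase the previous result.
fun runPhases(root: ToyExpr, phases: List<Lowering>): ToyExpr =
    phases.fold(root) { tree, phase -> phase(tree) }
```

With `listOf(foldConstants, plusToCall)`, the tree `Plus(Plus(Const(1), Const(2)), Call(foo, emptyList()))` first becomes `Plus(Const(3), Call(foo, []))` and then `Call(intPlus, [Const(3), Call(foo, [])])`. The order matters: running `plusToCall` first would leave nothing for `foldConstants` to fold, which is one reason to keep the phases in an explicit ordered list.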
❤️ 3
v
@mcpiroman lowering, instruction selection, and instruction optimizations come for free in LLVM. So, apart from the fact that LLVM is written in C++, I wonder why the Kotlin Valars chose to write these from scratch in Java/Kotlin. I don't even know if JavaCC was used in the first language versions 🤔
It would be possible to convert JVM code to run on the CLR using IKVM, but I would like to follow the standard backend flow
@mcpiroman I think I just figured out why I was getting confused. I was searching for something like Instruction.kt (which is in the frontend) inside the backend, instead of IrSymbol.kt.
`IrElementVisitor` ⬅️ `IrElementTransformer` ⬅️ `IrModuleFragment`
Thank you @Abel @Yux @dmitriy.novozhilov @mcpiroman @jvg @svtk for the help so far
kotlin-k2-compiler.png
@jvg This is a screenshot from the Kotlin K2 Compiler seminar (thank you @svtk ❤️)
According to @mcpiroman, there are at least 4 internal representations:
1. AST (Abstract Syntax Tree)
2. FIR (Frontend Intermediate Representation)
3. IR (Intermediate Representation)
4. MIR (modified, codegen-specific IR)
After the K2 release, I thought that LLVM (i.e. Kotlin Native), JS, and JVM shared the same _"backend IR"_ (i.e. `backend.common`).
m
I thought that LLVM (i.e. Kotlin Native), JS, and JVM shared the same _"backend IR"_ (i.e. `backend.common`).
For the most part they do. Only near the end is it converted into a backend-specific one. Some more operations are then applied to that representation, but fewer than on the IR.