How to Compile Your Language

This guide is intended to be a practical introduction to designing your own language and implementing a modern compiler for it. The compiler's source code is available on GitHub.

When designing a language, it helps to have an idea of what the language will be used for. Is it intended to make systems programming safer, like Rust? Is it targeting AI developers, like Mojo?

In this case, the goal of the language is to showcase various algorithms and techniques that are used in the implementation of some of the most popular languages like C++, Kotlin, or Rust.

The guide also covers how to create a platform-specific executable with the help of the LLVM compiler infrastructure, which all of the previously mentioned languages use for the same purpose. Yes, even Kotlin can be compiled to a native executable since the introduction of Kotlin/Native.

What Does Every Language Have in Common?

When creating a new language, the first question is how to get started. Every existing language defines something that your language must define too: the entry point where execution begins.

In scripting languages like JavaScript, execution usually starts from the first line of the source file, while most other languages, including yours, treat the main() function as their entry point.

fn main(): void {}

When designing the syntax of the main() function, one key goal was to make it easily recognizable to developers with a background in an already popular language.

For the past 50 years, a function declaration has typically consisted of the name of the function followed by its parameter list enclosed in ( and ). At first glance, it is tempting to introduce some exotic new syntax like main<> {}, but in many popular languages <> already means something completely different: a generic argument list. Using such syntax for a function definition would likely confuse developers who are trying to get familiar with the new language, which is something to keep in mind.
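Because every program needs an entry point, the compiler has to verify that one exists. The following is a minimal sketch of how such a semantic check might look, assuming a hypothetical, simplified FunctionDecl record produced by the parser and requiring main to take no parameters and return void, as in fn main(): void {}. The checkEntryPoint function and its error messages are illustrative, not taken from the guide's compiler.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a function declaration after parsing.
struct FunctionDecl {
  std::string name;
  std::vector<std::string> paramTypes;  // one entry per declared parameter
  std::string returnType;               // e.g. "void"
};

// Returns an error message if the program lacks a valid entry point, or
// std::nullopt if exactly one `main` exists with no parameters and a
// `void` return type.
std::optional<std::string>
checkEntryPoint(const std::vector<FunctionDecl> &functions) {
  const FunctionDecl *mainFn = nullptr;
  for (const FunctionDecl &fn : functions) {
    assert(!fn.name.empty() && "the parser should never produce unnamed functions");
    if (fn.name != "main")
      continue;
    if (mainFn)
      return "redefinition of 'main'";
    mainFn = &fn;
  }

  if (!mainFn)
    return "main function not found";
  if (!mainFn->paramTypes.empty())
    return "'main' function is expected to take no arguments";
  if (mainFn->returnType != "void")
    return "'main' function is expected to have 'void' type";
  return std::nullopt;
}
```

A check like this only makes sense after parsing, since it inspects declarations rather than raw text, which is why it belongs to the semantic analysis part of the frontend.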

How Is This Text Turned into an Executable?

Indeed, so far the main() function is just a few words of text stored in a file. A compiler usually consists of three major pieces: a frontend, an optimizer, and a backend.

The frontend contains the actual implementation of the language. It is responsible for ensuring that a program written in the language contains no errors and for reporting every issue it finds to the developer.

After validating the program, the frontend turns it into an intermediate representation (IR), on which the optimizer performs a series of transformations that result in a more efficient program.
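Constant folding is one of the simplest transformations an optimizer performs. The sketch below applies it to a hypothetical, minimal expression IR; the Expr node type and the constant, variable, binary and fold helpers are illustrative and are not LLVM's IR or API. Any + or * whose operands are both known at compile time is replaced by its result.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal IR: a node is an integer constant, a variable,
// or a binary '+' / '*' over two child nodes.
struct Expr {
  enum class Kind { Const, Var, Add, Mul };
  Kind kind = Kind::Const;
  int value = 0;                   // used when kind == Const
  std::string name;                // used when kind == Var
  std::shared_ptr<Expr> lhs, rhs;  // used when kind is Add or Mul
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr constant(int value) {
  auto e = std::make_shared<Expr>();
  e->value = value;  // kind already defaults to Const
  return e;
}

ExprPtr variable(const std::string &name) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::Var;
  e->name = name;
  return e;
}

ExprPtr binary(Expr::Kind kind, ExprPtr lhs, ExprPtr rhs) {
  auto e = std::make_shared<Expr>();
  e->kind = kind;
  e->lhs = std::move(lhs);
  e->rhs = std::move(rhs);
  return e;
}

// Constant folding: replace every '+' or '*' whose operands are both known
// at compile time with its result, so the program does not recompute it at
// run time. (Overflow checks are omitted for brevity.)
ExprPtr fold(const ExprPtr &e) {
  assert(e && "IR nodes must not be null");
  if (e->kind == Expr::Kind::Const || e->kind == Expr::Kind::Var)
    return e;

  ExprPtr lhs = fold(e->lhs);
  ExprPtr rhs = fold(e->rhs);
  if (lhs->kind == Expr::Kind::Const && rhs->kind == Expr::Kind::Const)
    return constant(e->kind == Expr::Kind::Add ? lhs->value + rhs->value
                                               : lhs->value * rhs->value);
  return binary(e->kind, lhs, rhs);
}
```

Folding 2 + 3 * 4 yields the single constant 14, while in x + 3 * 4 only the 3 * 4 subtree can be folded, because x is unknown until run time. LLVM's optimizer performs folding like this, along with many other transformations, on its own IR.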

After the program has been optimized, it is passed to the backend, which turns it into a series of instructions that a specific target can execute. The steps the backend performs vary with the target: register-based targets like x86, ARM, or RISC-V assembly require different steps than stack-based targets like WebAssembly or JVM bytecode.
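To see why the backend's steps depend on the target, the sketch below lowers the same expression, a + b * c, in two ways: once into stack-machine instructions, the style used by WebAssembly and JVM bytecode, and once into three-address code over virtual registers, closer to what a register-based target starts from. The Node type, the leaf, binop, emitStack and emitRegs helpers, and the instruction spelling are all hypothetical; they are not real WebAssembly, JVM or x86 syntax.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression tree: a leaf names a variable, an inner node
// holds an operator ("add" or "mul") and two operands.
struct Node {
  std::string op;   // empty for leaves
  std::string var;  // variable name for leaves
  std::unique_ptr<Node> lhs, rhs;
};

std::unique_ptr<Node> leaf(const std::string &var) {
  auto n = std::make_unique<Node>();
  n->var = var;
  return n;
}

std::unique_ptr<Node> binop(const std::string &op, std::unique_ptr<Node> l,
                            std::unique_ptr<Node> r) {
  auto n = std::make_unique<Node>();
  n->op = op;
  n->lhs = std::move(l);
  n->rhs = std::move(r);
  return n;
}

// Stack-based lowering: operands are pushed first, then each operator pops
// its two inputs and pushes the result (a post-order traversal).
void emitStack(const Node &n, std::vector<std::string> &out) {
  if (n.op.empty()) {
    out.push_back("load " + n.var);
    return;
  }
  assert(n.lhs && n.rhs && "operator nodes need two operands");
  emitStack(*n.lhs, out);
  emitStack(*n.rhs, out);
  out.push_back(n.op);
}

// Register-based lowering: every value gets a fresh virtual register.
// Returns the register holding the value of `n`.
std::string emitRegs(const Node &n, std::vector<std::string> &out) {
  std::string instr;
  if (n.op.empty()) {
    instr = "load " + n.var;
  } else {
    std::string l = emitRegs(*n.lhs, out);
    std::string r = emitRegs(*n.rhs, out);
    instr = n.op + " " + l + ", " + r;
  }
  // Each instruction defines exactly one register, so the instruction's
  // index doubles as its register number.
  std::string dst = "r" + std::to_string(out.size());
  out.push_back(dst + " = " + instr);
  return dst;
}
```

For a + b * c, the stack version produces load a, load b, load c, mul, add and needs no register names at all, while the register version produces r0 = load a through r4 = add r0, r3 and assumes an unlimited supply of virtual registers. Mapping those onto a target's finite set of physical registers (register allocation) is one of the extra steps a register-based backend has to perform.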

Is It Possible to Learn All These Topics?

Yes, with enough time. However, there is no need to learn all of them to create a successful language. In fact, many popular modern languages, such as C++ (with Clang), Rust, Swift, Haskell (with GHC's LLVM backend), or Kotlin/Native, rely on LLVM for optimization and code generation.

This guide also uses LLVM to create an executable and focuses on implementing the frontend, which consists of three parts: the lexer, the parser, and the semantic analyzer.
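The lexer is the first of these parts: it turns raw characters into tokens for the parser. A minimal sketch that is just large enough to tokenize fn main(): void {} could look like the following. The TokenKind names and the lex function are hypothetical, and a real lexer would also track source locations and handle numbers, comments, and multi-character operators.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds, just enough for `fn main(): void {}`.
enum class TokenKind {
  KwFn, KwVoid, Identifier, LPar, RPar, Colon, LBrace, RBrace, Eof, Unknown
};

struct Token {
  TokenKind kind;
  std::string text;
};

std::vector<Token> lex(const std::string &src) {
  std::vector<Token> tokens;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {  // whitespace only separates tokens
      ++i;
      continue;
    }

    // Identifiers and keywords: a letter followed by letters or digits.
    if (std::isalpha(c)) {
      std::size_t start = i;
      while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i])))
        ++i;
      assert(start < i && "identifier scan must consume at least one character");
      std::string word = src.substr(start, i - start);
      TokenKind kind = word == "fn"     ? TokenKind::KwFn
                       : word == "void" ? TokenKind::KwVoid
                                        : TokenKind::Identifier;
      tokens.push_back({kind, word});
      continue;
    }

    // Single-character punctuation; anything else is reported as Unknown.
    TokenKind kind;
    switch (c) {
    case '(': kind = TokenKind::LPar; break;
    case ')': kind = TokenKind::RPar; break;
    case ':': kind = TokenKind::Colon; break;
    case '{': kind = TokenKind::LBrace; break;
    case '}': kind = TokenKind::RBrace; break;
    default:  kind = TokenKind::Unknown; break;
    }
    tokens.push_back({kind, std::string(1, src[i])});
    ++i;
  }
  tokens.push_back({TokenKind::Eof, ""});  // marks the end of the input
  return tokens;
}
```

Lexing fn main(): void {} produces nine tokens: fn, main, (, ), :, void, {, } and a final end-of-file marker, which the parser then consumes.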