This guide is a practical introduction to designing your own language and implementing a modern compiler for it. The compiler's source code is available on GitHub.
When designing a language, it helps to have an idea of what the language will be used for. Is it intended to make systems programming safer, like Rust? Is it targeting AI developers, like Mojo?
In this case, the goal of the language is to showcase various algorithms and techniques that are used in the implementation of some of the most popular languages like C++, Kotlin, or Rust.
The guide also covers how to create a platform-specific executable with the help of the LLVM compiler infrastructure, which all of the previously mentioned languages use for the same purpose. Yes, even Kotlin can be compiled to a native executable with the introduction of Kotlin/Native.
When creating a new language, the first question is how to get started. Every existing language defines an entry point from which execution begins, and your language must define one too.
In scripting languages like JavaScript, the execution of the code usually starts from the first line of the source file, while most programming languages, including your language, treat the main() function as their entry point.
fn main(): void {}
When designing the syntax of the main() function, one key goal was to make it easily recognizable to developers with a background in an already popular language.
For the past 50 years, a function declaration has typically been written as the name of the function followed by the list of arguments enclosed in ( and ). At first glance, it is tempting to introduce some new exotic syntax like main<> {}, but in many popular languages <> already means something completely different: a generic argument list. Using such syntax for a function definition would probably confuse developers who are trying to get familiar with the new language, which is something to keep in mind.
So far, the main() function is just a few words of text stored in a file, and turning that text into something executable is the job of the compiler. A compiler usually consists of 3 major pieces: a frontend, an optimizer and a backend.
The frontend contains the actual implementation of the language. It is responsible for ensuring that the program written in the specific language doesn't contain any errors, and for reporting every issue it finds to the developer. After validating the program, it turns it into an intermediate representation (IR), on which the optimizer performs a series of transformations that result in a more efficient program.
After the program has been optimized, it is passed to the backend, which turns it into a series of instructions that a specific target can execute.
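The three stages described above can be sketched in a few lines of Python. This is only an illustration, not the compiler built in this guide: the toy source language (lines such as `a = 2 * 3` and `return b`), the tuple-based IR, the `params` argument and the output mnemonics are all assumptions made up for this example.

```python
def frontend(source, params=()):
    """Validate the program and lower it to IR tuples.

    Hypothetical IR: (op, dest, lhs, rhs) for arithmetic, ("ret", value) for return.
    """
    ir, defined = [], set(params)
    for number, line in enumerate(source.strip().splitlines(), start=1):
        parts = line.split()
        if len(parts) == 2 and parts[0] == "return":
            operands, instr = [parts[1]], ("ret", parts[1])
        elif (len(parts) == 5 and parts[0].isidentifier()
              and parts[1] == "=" and parts[3] in ("+", "*")):
            operands = [parts[2], parts[4]]
            instr = (parts[3], parts[0], parts[2], parts[4])
        else:
            raise SyntaxError(f"line {number}: cannot parse {line!r}")
        # Report the use of a variable that has not been defined yet.
        for operand in operands:
            if not operand.isdigit() and operand not in defined:
                raise NameError(f"line {number}: undefined variable {operand!r}")
        if instr[0] != "ret":
            defined.add(instr[1])
        ir.append(instr)
    return ir


def optimizer(ir):
    """Constant folding: evaluate operations whose operands are known constants."""
    known, optimized = {}, []
    for instr in ir:
        if instr[0] == "ret":
            optimized.append(("ret", known.get(instr[1], instr[1])))
            continue
        op, dest, lhs, rhs = instr
        lhs, rhs = known.get(lhs, lhs), known.get(rhs, rhs)
        if lhs.isdigit() and rhs.isdigit():
            value = int(lhs) + int(rhs) if op == "+" else int(lhs) * int(rhs)
            known[dest] = str(value)  # no instruction needed at run time
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            optimized.append((op, dest, lhs, rhs))
    return optimized


def backend(ir):
    """Emit text for a made-up target; the mnemonics are illustrative only."""
    mnemonics = {"+": "add", "*": "mul"}
    lines = []
    for instr in ir:
        if instr[0] == "ret":
            lines.append(f"ret {instr[1]}")
        else:
            op, dest, lhs, rhs = instr
            lines.append(f"{mnemonics[op]} {dest}, {lhs}, {rhs}")
    return "\n".join(lines)


def compile_program(source, params=()):
    return backend(optimizer(frontend(source, params)))
```

Calling `compile_program("a = 2 * 3\nb = a + x\nreturn b", params=("x",))` yields the two lines `add b, 6, x` and `ret b`: the frontend accepted the program, the optimizer replaced `2 * 3` with `6`, and the backend printed what was left.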
The steps the backend performs can vary based on the target. Register-based targets like x86, ARM or RISC-V assembly require different steps than stack-based targets like WebAssembly or JVM Bytecode.
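To make the difference concrete, the sketch below lowers the same expression, `1 + 2 * 3`, for an imaginary stack machine and an imaginary register machine. The nested-tuple expression format and the mnemonics (`push`, `li`, `add`, `mul`) are assumptions for illustration and do not correspond to real WebAssembly or x86 instructions. The register version also hands out an unlimited number of virtual registers, whereas a real backend has to map them onto the finite set the hardware provides.

```python
import itertools

MNEMONICS = {"+": "add", "*": "mul"}


def emit_stack(node, out):
    """Post-order walk: operands are pushed, operators pop two values and push one."""
    if isinstance(node, int):
        out.append(f"push {node}")
    else:
        op, lhs, rhs = node
        emit_stack(lhs, out)
        emit_stack(rhs, out)
        out.append(MNEMONICS[op])
    return out


def emit_register(node, out, fresh):
    """Give every intermediate value a named virtual register and return that name."""
    if isinstance(node, int):
        reg = f"r{next(fresh)}"
        out.append(f"li {reg}, {node}")  # load an immediate value
        return reg
    op, lhs, rhs = node
    left = emit_register(lhs, out, fresh)
    right = emit_register(rhs, out, fresh)
    reg = f"r{next(fresh)}"
    out.append(f"{MNEMONICS[op]} {reg}, {left}, {right}")
    return reg


# 1 + 2 * 3, written as a nested tuple (op, lhs, rhs)
expression = ("+", 1, ("*", 2, 3))

stack_code = emit_stack(expression, [])
register_code = []
result_register = emit_register(expression, register_code, itertools.count())
```

In the stack version the operands are implicit, so the instructions carry no names at all. In the register version every instruction has to say where its inputs live and where its result goes, which is why register-based backends need extra steps such as register allocation.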
With enough time, it is possible to learn how to generate code for all of these targets. However, there is no need to learn all of them to create a successful language. In fact, many popular modern languages such as C++, Rust, Swift, Haskell and Kotlin/Native have implementations that rely on LLVM for optimization and code generation.
This guide also uses LLVM to create an executable and focuses on implementing the frontend, which consists of 3 parts: the lexer, the parser and the semantic analyzer.
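As a rough preview of how these three parts divide the work, here is a minimal Python sketch that handles only declarations shaped like `fn main(): void {}`. It is not the implementation developed in this guide. The token set, the `FunctionDecl` node and the semantic rules (only `void` is a known type, names must be unique, `main` must exist) are assumptions chosen to keep the example short.

```python
import re
from dataclasses import dataclass

# Lexer: turns characters into tokens (identifiers and single-character punctuation).
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<punct>[(){}:]))")


def lex(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(match.group(match.lastgroup))
        pos = match.end()
    return tokens


@dataclass
class FunctionDecl:
    name: str
    return_type: str


# Parser: checks the token order and builds tree nodes.
def parse(tokens):
    """Parse zero or more declarations of the form: fn NAME ( ) : TYPE { }"""
    functions, pos = [], 0

    def expect(expected=None):
        nonlocal pos
        token = tokens[pos] if pos < len(tokens) else None
        if expected is None:
            # Any identifier is accepted; keywords are not reserved in this sketch.
            ok = token is not None and token.isidentifier()
        else:
            ok = token == expected
        if not ok:
            wanted = expected or "an identifier"
            raise SyntaxError(f"token {pos}: expected {wanted}, got {token!r}")
        pos += 1
        return token

    while pos < len(tokens):
        expect("fn")
        name = expect()
        expect("(")
        expect(")")
        expect(":")
        return_type = expect()
        expect("{")
        expect("}")
        functions.append(FunctionDecl(name, return_type))
    return functions


# Semantic analyzer: checks rules the grammar alone cannot express.
def analyze(functions):
    errors, seen = [], set()
    for function in functions:
        if function.return_type != "void":  # assumed: 'void' is the only type
            errors.append(f"unknown type '{function.return_type}'")
        if function.name in seen:
            errors.append(f"redeclaration of '{function.name}'")
        seen.add(function.name)
    if "main" not in seen:
        errors.append("'main' function not found")
    return errors
```

With these definitions, `analyze(parse(lex("fn main(): void {}")))` returns an empty list of errors, while the exotic `main<> {}` spelling is already rejected by the lexer because `<` is not a token in this sketch.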