Compile Time Known Values

The result of some expressions can be evaluated during compilation. This allows the compiler to reason better about the source code and to generate more efficient code.

fn returnNumber(): number {
    return 2 + 3 * 4;
}

In the above function, it can be calculated that the returned value is 14 during compilation, so instead of emitting the addition and the multiplication to the IR, it is enough to emit only a single constant.

A Tree-walk Interpreter

There are multiple solutions for constant expression evaluation, but one of the most widespread is the tree-walk interpreter. It is built on the observation that the post-order evaluation of the AST yields the result of the expression.

     ┌───┐                 
  ┌──│ + │──┐              Post-order traversal:
  │  └───┘  │              
┌───┐     ┌───┐            1.) 1
│ 1 │  ┌──│ * │──┐         2.) 3
└───┘  │  └───┘  │         3.) 4
     ┌───┐      ┌───┐      4.) *
     │ 3 │      │ 4 │      5.) +
     └───┘      └───┘

As a result, this interpreter traverses the AST nodes in post order and evaluates what it sees.

Inside the compiler, it can be invoked on resolved expressions on demand. Since valid expressions can only evaluate to numbers, the interpreter returns an std::optional<double>, which holds the result if it managed to evaluate the expression.

For the evaluation of some operators, it matters if side effects are allowed or not, so the interpreter also provides an option to change how side effects are treated.

class ConstantExpressionEvaluator {
public:
  std::optional<double> evaluate(const ResolvedExpr &expr,
                                 bool allowSideEffects);
};

The simplest expression to evaluate is the number literal, which simply evaluates to its already known value.

std::optional<double>
ConstantExpressionEvaluator::evaluate(const ResolvedExpr &expr,
                                      bool allowSideEffects) {
  if (const auto *numberLiteral =
          dynamic_cast<const ResolvedNumberLiteral *>(&expr))
    return numberLiteral->value;

  ...

  return std::nullopt;
}

The grouping expression is also easy to calculate, as it evaluates to the value of the expression it contains.

std::optional<double>
ConstantExpressionEvaluator::evaluate(const ResolvedExpr &expr,
                                      bool allowSideEffects) {
  ...

  if (const auto *groupingExpr =
          dynamic_castl<const ResolvedGroupingExpr *>(&expr))
    return evaluate(*groupingExpr->expr, allowSideEffects);

  ...
}

For unary operators, every different operator needs to be handled separately, so a dedicated method is created for that purpose.

std::optional<double>
ConstantExpressionEvaluator::evaluate(const ResolvedExpr &expr,
                                      bool allowSideEffects) {
  ...

  if (const auto *unaryOperator =
          dynamic_cast<const ResolvedUnaryOperator *>(&expr))
    return evaluateUnaryOperator(*unaryOperator, allowSideEffects);

  ...
}

First, the operand is evaluated, and if it is a known value, either its numeric or logical value is negated depending on the operator.

std::optional<double> ConstantExpressionEvaluator::evaluateUnaryOperator(
    const ResolvedUnaryOperator &unop, bool allowSideEffects) {
  std::optional<double> operand = evaluate(*unop.operand, allowSideEffects);
  if (!operand)
    return std::nullopt;

  if (unop.op == TokenKind::Excl)
    return !*toBool(operand);

  if (unop.op == TokenKind::Minus)
    return -*operand;

  llvm_unreachable("unexpected unary operator");
}

The toBool() helper converts a numeric value to a logical value. If the value is 0, it is considered false, otherwise the value is considered true.

std::optional<bool> toBool(std::optional<double> d) {
  if (!d)
    return std::nullopt;

  return d != 0.0;
}

Handling Side Effects

For each binary operator the LHS is evaluated first, but lazily evaluated operators like && and || need special handling before the RHS can be evaluated too.

While running the program side effects matter, so in case of return sideEffect() || 1; even though it's known at compile time that the condition is always true, replacing the statement with return 1; would eliminate the side effect.

In other cases such as if sideEffect() || 1 { ... } the compiler might want to reason about whether the condition is always true or not. In these cases, the side effect doesn't matter as the condition is not going to be replaced with a different expression.

If the LHS of || is not known in compile time and side effects are not allowed, the result of the operator is unknown. If the LHS is true in compile time, the result of the operator is also true.

std::optional<double> ConstantExpressionEvaluator::evaluateBinaryOperator(
    const ResolvedBinaryOperator &binop, bool allowSideEffects) {
  std::optional<double> lhs = evaluate(*binop.lhs);

  if (!lhs && !allowSideEffects)
    return std::nullopt;

  if (binop.op == TokenKind::PipePipe) {
    if (toBool(lhs) == true)
      return 1.0;

    ...
  }

  ...
}

If the LHS is false or side effects are allowed and the RHS is known to be true, the result is true.

std::optional<double> ConstantExpressionEvaluator::evaluateBinaryOperator(
    const ResolvedBinaryOperator &binop, bool allowSideEffects) {
  ...

  if (binop.op == TokenKind::PipePipe) {
    ...

    std::optional<double> rhs = evaluate(*binop.rhs, allowSideEffects);
    if (toBool(rhs) == true)
      return 1.0;

    ...
  }

  ...
}

If both the LHS and the RHS are known, but none of them is true, the result is false. In every other case, the result cannot be calculated in compile time.

std::optional<double> ConstantExpressionEvaluator::evaluateBinaryOperator(
    const ResolvedBinaryOperator &binop, bool allowSideEffects) {
  ...

  if (binop.op == TokenKind::PipePipe) {
    ...

    if (lhs && rhs)
      return 0.0;

    return std::nullopt;
  }

  ...
}

The && operator is handled the same way, except with the opposite logical values. If the LHS or RHS is known to be false when side effects don't matter, the result is false. If both of them are known to be true, the result is true and the result is unknown in every other case.

std::optional<double> ConstantExpressionEvaluator::evaluateBinaryOperator(
    const ResolvedBinaryOperator &binop) {
  ...

  if (binop.op == TokenKind::AmpAmp) {
    if (toBool(lhs) == false)
      return 0.0;

    std::optional<double> rhs = evaluate(*binop.rhs, allowSideEffects);
    if (toBool(rhs) == false)
      return 0.0;

    if (lhs && rhs)
      return 1.0;

    return std::nullopt;
  }

  ...
}

For the rest of the binary operators, if both of their sides are known in compile time, the respective operation is performed on their values.

std::optional<double> ConstantExpressionEvaluator::evaluateBinaryOperator(
    const ResolvedBinaryOperator &binop) {
  ...

  if (!lhs)
    return std::nullopt;

  std::optional<double> rhs = evaluate(*binop.rhs);
  if (!rhs)
    return std::nullopt;

  switch (binop.op) {
  case TokenKind::Asterisk:
    return *lhs * *rhs;
  case TokenKind::Slash:
    return *lhs / *rhs;
  case TokenKind::Plus:
    return *lhs + *rhs;
  case TokenKind::Minus:
    return *lhs - *rhs;
  case TokenKind::Lt:
    return *lhs < *rhs;
  case TokenKind::Gt:
    return *lhs > *rhs;
  case TokenKind::EqualEqual:
    return *lhs == *rhs;
  default:
    llvm_unreachable("unexpected binary operator");
  }
}

Storing the Result

The ResolvedExpr node is equipped with the capability of storing its compile-time known value so that it can be reused later without recalculation for further reasoning about the semantics of the source file and more efficient code generation.

Instead of directly storing the value in the node, a ConstantValueContainer utility is created.

template <typename Ty> class ConstantValueContainer {
  std::optional<Ty> value = std::nullopt;

public:
  void setConstantValue(std::optional<Ty> val) { value = std::move(val); }
  std::optional<Ty> getConstantValue() const { return value; }
};

This utility then becomes the base class of the ResolvedExpr node.

struct ResolvedExpr : public ConstantValueContainer<double>,
                      public ResolvedStmt {
  ...
};

To be able to visualize the calculated value of an expression, the dump() method of every resolved expression needs to be extended.

void ResolvedNumberLiteral::dump(size_t level) const {
  ...
  if (auto val = getConstantValue())
    std::cerr << indent(level) << "| value: " << *val << '\n';
}

// The rest of the dump() method extensions are omitted.
...

With the ability to store the result, the evaluate() method can be made more efficient by checking whether the given expression already has its value calculated and returning that value immediately.

std::optional<double>
ConstantExpressionEvaluator::evaluate(const ResolvedExpr &expr) {
  if (std::optional<double> val = expr.getConstantValue())
    return val;

  ...
}

The same optimization can also be used during code generation so that instead of a series of instructions, only a constant value is generated.

llvm::Value *Codegen::generateExpr(const ResolvedExpr &expr) {
  if (auto val = expr.getConstantValue())
    return llvm::ConstantFP::get(builder.getDoubleTy(), *val);

  ...
}

Calling the Interpreter

The constant expression evaluator is instantiated with Sema and called on demand in a set of cases.

class Sema {
  ConstantExpressionEvaluator cee;
  ...
};

One such case is calculating the value of an argument passed to a function. Since the expression is expected to be replaced with a constant value, the allowSideEffects option is set be false.

std::unique_ptr<ResolvedCallExpr> Sema::resolveCallExpr(const CallExpr &call) {
  ...
  for (auto &&arg : call.arguments) {
    ...
      return report(resolvedArg->location, "unexpected type of argument");

    resolvedArg->setConstantValue(cee.evaluate(*resolvedArg, false));

    ...
  }
  ...
}

The other case is optimizing the value in a return statement the same way, without allowing side effects.

std::unique_ptr<ResolvedReturnStmt>
Sema::resolveReturnStmt(const ReturnStmt &returnStmt) {
  ...
  if (returnStmt.expr) {
    ...

    resolvedExpr->setConstantValue(cee.evaluate(*resolvedExpr, false));
  }
  ...
}