Fueled Evaluation for Decidable Type Checking

Term	Beta normal form
`(\x -> x y) z`	`z y`
`\x -> (\y -> x y) z`	`\x -> x z`
`x y`	`x y`

Vanilla Normalization by Evaluation

Let me first lay out the foundation upon which I will demonstrate fueled evaluation. We are going to use an approach of normalizing higher-order terms called Normalization by Evaluation (NbE). Simply speaking, we are going to have two functions, eval and quote: the former evaluates the input term up to lambdas (i.e., it does look into lambda bodies), whereas the latter function pushes evaluation under lambdas, thereby obtaining the normal form. By combining the two functions, roughly quote . eval, we would obtain a full lambda calculus normalizer.

The following is the definition of a lambda term, in OCaml:

type term = Lam of term | Var of int | Appl of term * term
[@@deriving show { with_path = false }]

We use the first-order representation of lambda abstractions, forcing Var to contain a De Bruijn index (starting at 0) following the well-known scheme.

To be able to write a sequence of applications conveniently, we define the following shortcut function:

let appl (f, list) = List.fold_left (fun m n -> Appl (m, n)) f list

Next, here is the representation of values:

type value = VClosure of value list * term | VNt of neutral
and neutral = NVar of int | NAppl of neutral * value

Here we use VClosure to represent a closure function; it contains a lambda body that closes over the environment of values. We also use VNt to represent computations blocked on a value of some unknown variable. This is an essential piece of NbE: while ordinary computation requires variables to hold concrete values at any point, NbE permits variables to be unknown during evaluation.

We define some shortcut functions for convenience:

let vvar lvl = VNt (NVar lvl)
let vappl (m, n) = VNt (NAppl (m, n))

Now we define the evaluator itself, which throws term under rho:value list into value:

let rec eval ~rho = function
  | Lam m -> VClosure (rho, m)
  | Var idx -> List.nth rho idx
  | Appl (m, n) -> (
      let m_val = eval ~rho m in
      let n_val = eval ~rho n in
      match m_val with
      | VClosure (rho, m) -> eval ~rho:(n_val :: rho) m
      | VNt neut -> vappl (neut, n_val))

The code is pretty self-explanatory. The only thing to note is that values in the environment rho (pronounced as “row” ²) are indexed via De Bruijn indices (see the case of Var idx), making access as easy as possible.

The next piece of code is the quote function, which throws value under lvl:int back into term:

let rec quote ~lvl = function
  | VClosure (rho, m) ->
      let m_nf = normalize_at ~lvl ~rho m in
      Lam m_nf
  | VNt neut -> quote_neut ~lvl neut

and quote_neut ~lvl = function
  | NVar var -> Var (lvl - var - 1)
  | NAppl (neut, n_val) ->
      let m_nf = quote_neut ~lvl neut in
      let n_nf = quote ~lvl n_val in
      Appl (m_nf, n_nf)

In the case of NVar var, we convert a De Bruijn level var into a corresponding De Bruijn index ³; this is what the formula lvl - var - 1 does. In the case of VClosure (rho, m), we explicitly push evaluation under binders by calling normalize_at ~lvl ~rho m. The function normalize_at is defined as follows:

and normalize ~lvl ~rho term = quote ~lvl (eval ~rho term)

and normalize_at ~lvl ~rho term =
  normalize ~lvl:(lvl + 1) ~rho:(vvar lvl :: rho) term

Hooray, we have our vanilla NbE ready! Let us test it with some trivial examples.

Applying id to itself:

(* (Lam (Var 0)) *)
let id = Lam (Var 0) in
normalize ~lvl:0 ~rho:[] @@ Appl (id, id)

Computing a boolean expression if true then false else true:

(* (Lam (Lam (Var 0))) *)
let t = Lam (Lam (Var 1)) in
let f = Lam (Lam (Var 0)) in
let if_then_else = Lam (Lam (Lam (Appl (Appl (Var 2, Var 1), Var 0)))) in
normalize ~lvl:0 ~rho:[] @@ appl (if_then_else, [ t; f; t ])

Computing SKSK (from the SKI calculus):

(* (Lam (Lam (Var 1))) *)
let k = Lam (Lam (Var 1)) in
let s = Lam (Lam (Lam (Appl (Appl (Var 2, Var 0), Appl (Var 1, Var 0))))) in
normalize ~lvl:0 ~rho:[] @@ appl (s, [ k; s; k ])

All the results are correct. Now, let us extend the code to support fueled evaluation.

Fueled evaluation

Before expounding on the idea, let me show the actual code ⁴.

As per term representation, we need to augment every term (including all of its subterms) with a mutable integer called fuel. We could do something like this:

type term = term_def * int ref

and term_def = Lam of term | Var of int | Appl of term * term
[@@deriving show { with_path = false }]

However, this approach involves a few maintenance problems. First, besides the type checker and the evaluator, there may be more moving parts in the compiler that manipulate terms – and they do not care a little about fuel! Second, if we already have these moving parts, and we want to extend our evaluation mechanism to support fueled evaluation, we need to scrupulously modify all of them. Third, in a real implementation, we may want to write a lot of unit tests, and manually inserting fuel into every term and subterm is going to be a very tedious task.

The solution is to use ML modules – they really shine in this case. We begin by defining the following functor ⁵:

module Make_term (S : sig
  type 'a t [@@deriving show]
end) =
struct
  type t = def S.t

  and def = Lam of t | Var of int | Appl of t * t
  [@@deriving show { with_path = false }]

  let _ = S.show
end

(The let _ = S.show line just suppresses a compiler warning, ignore it.)

The functor Make_term takes a module with a polymorphic type component t and outputs a new module that contains a (different) type t that specializes S.t to def, which is the term definition itself. Inside def, we refer to the previously defined t so that not only the upper level of a term is annotated but also all of its subterms.

We can now define two versions of terms, the first being called T:

module T = Make_term (struct
  type 'a t = 'a [@@deriving show]
end)

And the second being called Nbe_term:

module Nbe_term = Make_term (struct
  type 'a t = 'a * int ref [@@deriving show]
end)

The difference between these two should be pretty clear: while T is the regular term representation we had in the previous section, Nbe_term possesses the additional fuel component. Besides the evaluator, code that used to deal with terms can be left untouched – save a few lines of auxiliary code.

Moving further, we need a way to turn T.def into Nbe_term.def. This can be accomplished quite straightforwardly:

let limit = 1000

let tank =
  let open Nbe_term in
  let rec go term = (go_def term, ref limit)
  and go_def = function
    | T.Lam m -> Lam (go m)
    | T.Var idx -> Var idx
    | T.Appl (m, n) -> Appl (go m, go n)
  in
  go

The limit constant is the maximum number of times a certain term can be passed to the evaluator, as we will see shortly. The tank function just recurses over the structure of an input term, annotating every subterm with ref limit.

The key change lies in the evaluator code:

exception Out_of_fuel of Nbe_term.def

let check_limit_exn (term, fuel) =
  if !fuel > 0 then decr fuel else raise (Out_of_fuel term)

let rec eval ~rho (term, fuel) =
  let open Nbe_term in
  check_limit_exn (term, fuel);
  match term with
  | Lam m -> VClosure (rho, m)
  | Var idx -> List.nth rho idx
  | Appl (m, n) -> (
      let m_val = eval ~rho m in
      let n_val = eval ~rho n in
      match m_val with
      | VClosure (rho, m) -> eval ~rho:(n_val :: rho) m
      | VNt neut -> vappl (neut, n_val))

The only needed change is to add the extra check_limit_exn (term, fuel) line to the beginning of the function. If the fuel variable is zero, we raise an exception called Out_of_fuel; otherwise, we merely decrement the fuel and proceed with the evaluation.

The rest of the code is concerned with quotation and full normalization. Unlike eval, it need not be changed at all, but let me include it for reference:

let rec quote ~lvl = function
  | VClosure (rho, m) ->
      let m_nf = normalize_at ~lvl ~rho m in
      T.Lam m_nf
  | VNt neut -> quote_neut ~lvl neut

and quote_neut ~lvl = function
  | NVar var -> T.Var (lvl - var - 1)
  | NAppl (neut, n_val) ->
      let m_nf = quote_neut ~lvl neut in
      let n_nf = quote ~lvl n_val in
      T.Appl (m_nf, n_nf)

and normalize ~lvl ~rho term = quote ~lvl (eval ~rho term)

and normalize_at ~lvl ~rho term =
  normalize ~lvl:(lvl + 1) ~rho:(vvar lvl :: rho) term

One thing to note is that the inferred types are going to be different. Previously, normalize and normalize_at both had the following function type:

lvl:int -> rho:value list -> term -> term

But now, the type of normalize and normalize_at is:

lvl:int -> rho:value list -> Nbe_term.t -> T.def

As you can observe, the new input term type is Nbe_term.t instead of T.t. This is quite correct because, in quote, we still need to call normalize_at with m : Nbe_term.t coming from VClosure (rho, m); however, for the sake of convenience of usage, we should be able to redefine normalize as follows ⁶:

let normalize term = normalize ~lvl:0 ~rho:[] (tank term)

As expected, the final type of normalize is T.def -> T.def.

Testing termination

The new implementation is able to detect potentially non-terminating computation. The following function tests a term:

let test term =
  try
    let _nf = normalize term in
    print_endline "Ok."
  with Out_of_fuel term ->
    Printf.printf "The computational limit is reached for %s.\n"
      (Nbe_term.show_def term)

Let us define some Church numerals for testing:

let zero = T.(Lam (Lam (Var 0)))
let succ = T.(Lam (Lam (Lam (Appl (Var 1, Appl (Appl (Var 2, Var 1), Var 0))))))

let mul =
  T.(Lam (Lam (Lam (Lam (Appl (Appl (Var 3, Appl (Var 2, Var 1)), Var 0))))))

let appl (f, list) = List.fold_left (fun m n -> T.Appl (m, n)) f list
let one = appl (succ, [ zero ])
let two = appl (succ, [ one ])
let three = appl (succ, [ two ])
let four = appl (succ, [ three ])
let five = appl (succ, [ four ])
let ten = appl (mul, [ five; two ])
let hundred = appl (mul, [ ten; ten ])
let thousand = appl (mul, [ hundred; ten ])

If we apply id to itself, the console output will be Ok.:

let id_appl = T.(Appl (Lam (Var 0), Lam (Var 0))) in
test id_appl

If we try to compute 5000 using Church numerals, the output will be Ok. as well:

let n5k = appl (mul, [ thousand; five ]) in
test n5k

However, normalization of the omega combinator yields a failure:

let self_appl = T.(Lam (Appl (Var 0, Var 0))) in
let omega = T.Appl (self_appl, self_appl) in
test omega

The output is The computational limit is reached for (Appl (((Var 0), ref (0)), ((Var 0), ref (0)))), as expected.

Informal reasoning

Why is fueled evaluation implemented exactly as it is?

The informal reasoning behind this approach is as follows. Usually, time complexity of an algorithm depends on the size of input (the so-called “Big O notation”); in this case, the bigger the input, the more computation steps the algorithm will perform. For example, if the time complexity is O(n^2), the number of steps for some input of size 5 is 25, whereas if the input size is 10, the number of steps increases to 100.

Of course, for a language to be Turing-complete, it must permit terms whose time complexity cannot be described as a function of an input size. Therefore, we cannot determine the complexity of eval, but we can instead approximate the number of invocations of eval by augmenting each input term with some suitable integer: the bigger the term, the more times eval can be (recursively) invoked ⁷.

Termination of eval ~rho term for any rho and term : Nbe_term.t follows from the following observations:

All terms passed to eval are drawn from the finite set of all subterms of term.
The number of times a certain term can be passed to eval is limited.

Since quote, normalize, and normalize_at only call eval and reduce their arguments structurally, they terminate as well.

However, this does not guarantee termination (i.e., decidability ⁸) of type checking. For example, if you do some sophisticated machinery during type checking, such as certain forms of unification, you may still end up with an undecidable type system. Thus, the only thing that the aforestated property guarantees is that the type system will not be undecidable due to the evaluation mechanism.

There are several advantages of fueled evaluation over more traditional approaches:

In contrast to the Calculus of Inductive Constructions (CIC), which only allows arguments to be reduced structurally to ensure termination, the advantage of our approach is that it does not exclude any programming techniques – it is always possible to adjust the fuel constant for specific needs.
In contrast to termination checking with homeomorphic embedding ⁹ ¹⁰, which requires accumulating a history of computation and performing a structural check for every previous term in the history, our checking conditions possess very little performance penalty. This is very crucial for any dependently typed language. In addition, the homeomorphic embedding relation requires a form of small-step operational semantics in order to have a traceable history of computation, whereas our NbE algorithm more resembles big-step semantics.
We can extend the language with more constructions without worrying about introducing potential non-termination. This is because fueled evaluation considers only the size metric of terms, not their actual contents.

Limiting potentially infinite computations with a configurable constant is widely used in practical compiler implementations:

GCC and Clang have the -ftemplate-depth option (resp. GCC and Clang) to “set the maximum instantiation depth for template classes”.
Rust has the recursion_limit attribute to “set the maximum depth for potentially infinitely-recursive compile-time operations like macro expansion or auto-dereference”.
Scala 3 has the -Xmax-inlines option described as the “maximal number of successive inlines”.
Lean 4 has the maxRecDepth option described as the “maximum recursion depth” (for macros and regular computation).

This provides empirical evidence that the suggested approach should work well in practice. Note that we could just as well fix globally the limit for eval, but such a solution would be strictly less flexible. The key idea behind fueled evaluation is to expand and shrink the freedom of evaluation according to the given input – the bigger the term, the more freedom is granted to eval ¹¹.

The type checker sketch

Although the type checking algorithm is out of the scope of the current writing, I briefly sketch how it could be adapted to fueled evaluation.

It is reasonable to give the end user the right to configure the limit constant at will. Therefore, the Typing module should contain the Make functor that accepts limit as a run-time parameter. This situation can roughly look as follows ¹²:

module type Opts = sig
  val fuel : int
end

module Make (_ : Opts) : sig
  val infer
    :  rho:Value.t list
    -> gamma:Gamma.t
    -> Raw_term.t
    -> (Term.t * Value.t, Report.t) result

  val check
    :  rho:Value.t list
    -> gamma:Gamma.t
    -> Raw_term.t * Value.t
    -> (Term.t, Report.t) result
end

The implementation of Typing would then look as:

module type Opts = sig
  val fuel : int
end

module Make (Opts : Opts) = struct
  let eval ~rho term = Nbe.eval ~rho (Value.Term'.tank ~fuel:Opts.fuel term)

  let rec infer ~(rho : Value.t list) ~(gamma : Gamma.t) = function (* ... *)
  and check ~(rho : Value.t list) ~(gamma : Gamma.t) = function (* ... *)
end

Whenever we elaborate a new term, we can submit it to the inner eval function (from within Make) to obtain a value. The Term' submodule of the module Value is a functor application of the following form:

module Term' = struct
  include Term.Make (struct
      type 'a t = 'a * int ref [@@deriving show]
    end)

  let tank ~fuel = (* ... *)
end

With the following signature:

module Term' : sig
  include module type of Term.Make (struct
      type 'a t = 'a * int ref [@@deriving show]
    end)

  val tank : fuel:int -> Term.t -> t
end

In turn, the Term module signature must include the Make functor that defines the term:

module Make (S : sig
    type 'a t [@@deriving show]
  end) : sig
  type t = def S.t

  and def = (* ... *)
  [@@deriving show]
end

And apply it as follows (in the same signature):

include module type of Make (struct
    type 'a t = 'a [@@deriving show]
  end)

The implementation of Term would be:

module Make (S : sig
    type 'a t [@@deriving show]
  end) =
struct
  type t = def S.t

  and def = (* ... *)
  [@@deriving show]

  let _ = S.show
end

include Make (struct
    type 'a t = 'a [@@deriving show]
  end)

To use the Typing module, all we need is to instantiate the functor Typing.Make with the constant cli_fuel coming as a run-time CLI parameter ¹³:

let module Typing =
  Typing.Make (struct
    let fuel = cli_fuel
  end)
in
(* ... *)

Then, writing Typing.infer ~rho ~gamma prog (same with check) would launch the type checking process for prog : Raw_term.t.

Finally, it would make practical sense to do these two things:

Exception handling. Do not forget to handle the Out_of_fuel exception and properly signal the error to the end user.
Source code tracking. Add the SrcPos variant to Term.Make.def. Include a term and a source segment for it (can be a pair of Lexing.position). If Out_of_fuel is caught up, highlight the source text that raised the exception.

Final words

Although the demonstrated approach is concerned with pure untyped lambda calculus, the same reasoning applies to various typed lambda calculi, which can be non-terminating for many reasons, starting from unrestricted recursion using a built-in fixed-point combinator and ending with Girard’s paradox.

There is a paper from Neil D. Jones called “Call-by-value Termination in the Untyped lambda-calculus” ¹⁴. The paper suggests an offline approach to termination checking, i.e., checking termination before actually executing the program. In contrast, fueled evaluation is an online checking algorithm – it checks termination during execution. The approach from the paper rests on the fact from pure lambda calculus that the set of subexpressions of the evaluated term is a subset of subexpressions of the input program (see lemma 3.6, definition 3.7, and lemma 3.8). Unfortunately, this surprising property breaks after adding built-ins to the language: consider an expression !true resulting in false, which is not a subexpression of !true (same for integers, strings, etc.) Whether the approach demonstrated in the paper can be generalized to a more practical language than pure lambda calculus is a matter of further research.

What is left to do is to apply the technique to a real-world implementation of a dependently typed programming language. I am currently working on one, which is not made public yet. As far as things go, the technique works well enough – it is sufficiently simple, stupid to reason about, test, and maintain.

If I do not forget, I will link an open-sourced repository someday here.

References

Note that it is typically not implemented in such a brutal way. A more practical approach is to write a beta conversion checker that gradually evaluates a pair of terms until either there is no more left to evaluate or equality fails. However, full normalization is still used when inferring dependent function types or printing terms. Update: As people have pointed out, full normalization should be avoided in production-ready implementations. For example, see Agda’s opaque definitions for controlling unfolding.↩︎
The naming is borrowed from Thierry Coquand’s paper “An algorithm for type-checking dependent types” (1996).↩︎
Contrary to a De Bruijn index, a De Bruijn level is a natural number indicating the number of binders between the variable’s binder and the term’s root. For example, the term \x -> \y -> x is represented as \_ -> \_ -> 0, where 0 is the De Bruijn level of x. The De Bruijn level of y would be 1 if we used it instead of x.↩︎
The complete code for fueled evaluation can be accessed as a GitHub gist.↩︎
You can think of a functor as a module-level function: a function that maps a module expression to another module expression. A more thorough explanation can be found in the corresponding chapter of “Real World OCaml”.↩︎
The redefinition of normalize is a good example of name shadowing. We could name the first version of normalize as normalize_nbe, but 1) the code would need to be changed, 2) the longer name would be less convenient to use, and 3) it would pollute the whole namespace, causing confusion between the two versions.↩︎
According to the halting problem, it is generally impossible to tell if an arbitrary program will finish running – so we are doomed to use one or another approximation in any case. However, it is not that bad as it may seem to be: type systems are already approximations of operationally sound programs.↩︎
Update: As people have pointed out, the word “decidability” is for an algorithmic problem, whereas termination is a property of a concrete algorithm. Therefore, it is more correct to refer to a type system as decidable or undecidable, and refer to a type checker as terminating or non-terminating. By restricting evaluation with fuel, we make the (hypothetical) type system decidable and the type checker terminating.↩︎
Robert Glück and Morten Heine Sørensen. 1996. A Roadmap to Metacomputation by Supercompilation. In Selected Papers from the International Seminar on Partial Evaluation. Springer-Verlag, Berlin, Heidelberg, 137–160.↩︎
Sørensen, M.H.B. (1998). Convergence of program transformers in the metric space of trees. In: Jeuring, J. (eds) Mathematics of Program Construction. MPC 1998. Lecture Notes in Computer Science, vol 1422. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054297↩︎
As I have mentioned earlier, this is only an approximation. For example, omega from the previous section is pretty small, but it requires an infinite number of steps to normalize. On the other hand, a huge term can be quickly normalizable if most of its subterms are never executed (i.e., it contains many “dead paths”).↩︎
The infer and check functions manifest a bidirectional typing algorithm – a de facto standard for dependent type checking.↩︎
The instantiation of Typing.Make on cli_fuel provides a good example of run-time configuration of ML modules. Rudimentary module systems of most programming languages make this pattern impossible. The best we could do in Haskell, Rust, Java, or C# is to have a sort of record or class that accepts fuel as a parameter, together with the methods infer and check. With ML modules, we did not need to alter the structure of our code – the transition from static modules to configurable functors happened naturally.↩︎
Jones, N., & Bohr, N. (2008). Call-by-value Termination in the Untyped lambda-calculus. Logical Methods in Computer Science, Volume 4, Issue 1.↩︎