Designing a Theorem Prover, Part 4

Terms Link to heading

One of the first design steps is choosing how to represent terms and formulas.
We have 3 kinds of variables – meta-variables, parameters and bound variables.

Meta-variables, which we’ll now simply call “variables,” are represented by names like \(x\), \(y\).
Parameters have a name and a list of variables that represent quantifier rule provisos.
For bound variables we’ll use De Bruijn indexes that eliminate all bound variable names. This makes formulas like \(\forall x. \forall y. P(x)\) and \(\forall y. \forall x. P(y)\) equivalent.

What Are De Bruijn Indexes?

de Bruijn indexes are a way to represent variables in expressions without using explicit names. The key Idea is that instead of naming variables like \(x\), \(y\), or \(z\), we number them based on their distance (or “depth”) from the binding site. The binding site is where a variable is introduced, such as in a lambda abstraction (\(\lambda\)) or a quantifier (\(\forall\), \(\exists\)). Variables are numbered starting from 0, with 0 referring to the most recently introduced binding. The next binding is 1, and so on.

Examples of equivalent formulas:

\(\forall x. \forall y. P(x)\) ~ \(\forall \forall P(1)\) (Because \(y\) has depth \(0\) and \(x\) has depth \(1\))

\(\forall y. \forall x. P(y)\) ~ \(\forall \forall P(1)\)

\(\forall x. \forall y. P(y)\) ~ \(\forall \forall P(0)\)

\(\forall x. \forall x. P(x)\) ~ \(\forall \forall P(0)\)

\(\exists x. ((\exists y. Q(y)) \land S(x))\) ~ \(\exists((\exists Q(0)) \land S(0))\) - Different variables, same index.

\(\forall x. (Q(x) \lor \exists z. R(z, x))\) ~ \(\forall(Q(0) \lor \exists R(0, 1))\) - Same variables, different indices.

We also have a function application, which represented by the name of the function paired with the list of its arguments.

Here is how type definition in OCaml:

(** term.mli *)

type t =
  | Var of string  (** Meta-variable. *)
  | Param of string * string list
      (** Parameters like [b[?a1, ..., ?an]].
      Each parameter has a name and a list of variables that represent provisos of quantifier rules. *)
  | Bound of int  (** Bound variable. *)
  | Function of string * t list  (** Function application. *)
[@@deriving eq, show]

Formulas Link to heading

There are 3 kinds of formulas:

Predicate / Atomic Formula

\(P(t_1, \dots, t_n)\)
Represented by the name and the list of arguments (terms).

Connective

\(A \land B, A \lor B, A \rightarrow B, A \leftrightarrow B\), \(\neg A\)
In the examples above \(A\) and \(B\) are formulas.
We’re going to represent this by the connective and the list of subformulas.

Quantified formula

\(\forall x.A\), \(\exists x. A\)
Represented by the type of quantifier, the name of bound variable and the body.
Note that we keep the name of a bound variable only for showing results, since we use De Bruijn notation.

This leads to the following type definition in OCaml:

(** formula.mli *)

type t =
  | Pred of string * Term.t list
      (** Atomic formulas (predicates): [P(t1, ..., tn)].
      Represented by the name and the list of arguments (terms). *)
  | Conn of string * t list
      (** Connectives like [A & B], [A | B], [A -> B], [A <-> B], [!B] and so on.
      Represented by the connetive name and the list of its subformulas. *)
  | Quant of string * string * t
      (** Quantified formulae like [\exists x.A] and [\forall x.A].
      Represented by the type of quantifier,
      the name of bound variable (used only for printing) and the body. *)
[@@deriving eq, show]

Abstraction and Substitution Link to heading

We need an abstraction because, in de Bruijn notation, no formula should contain free variables. If \(x\) is a free variable in \(A\), we cannot simply make it quantified by attaching \(\forall x\). Instead, we use an abstraction. Abstraction replaces occurrences of \(t\) by a bound variable. if \(A\) is a formula, \(t\) is a term, and \(x\) is a bound variable not present in \(A\), then \(\forall x. A [x/t]\) and \(\exists x. A [x/t]\) are formulas.

Example

Abstract \(\forall z. R(z, a) \leftrightarrow P(a)\) over \(a\) to get \(\exists x. (\forall z. R(z, x)) \leftrightarrow P(x)\).

Step 1 (abstract): \[ abs(0)[\forall R(0, a) \leftrightarrow P(a)] = \\ abs(0)[\forall R(0, a) \leftrightarrow abs(0)[P(a)]] = \\ (\forall abs(1)[R(0, a)] \leftrightarrow P(0)) = \\ (\forall R(0, 1)) \leftrightarrow P(0) \]

Step 2 (attach \(\exists\) quantifier): \[ \exists (\forall R(0, 1)) \leftrightarrow P(0) \]

We need substitution as well to replace occurrences of the bound variable by \(t\).
So basically it does the opposite thing.

Implementation Link to heading

For now my dune file have only a few dependencies:

(library
 (name folderol)
 (inline_tests)
 (libraries core)
 (preprocess (pps ppx_deriving.show ppx_deriving.eq ppx_inline_test)))

(env (dev (flags (:standard -warn-error -A))))

I added one extra dependency ppx_inline_test which is used for testing.

In OCaml the type signatures for abstract and subst_bound_var functions might look like :

(** formula.mli *)

val abstract : Term.t -> t -> t
(** Replaces occurences of [Term.t] by a bound variable. *)

val subst_bound_var : Term.t -> t -> t
(** Replaces occurences of a bound variable by [Term.t]. *)

And the implementation:

(** formula.ml *)

let abstract term formula =
  let rec abs ix = function
    | Pred (name, args) ->
        Pred (name, List.map args ~f:(Term.replace (term, Term.Bound ix)))
    | Conn (name, subformulas) -> Conn (name, List.map subformulas ~f:(abs ix))
    | Quant (quantifier, var_name, body) ->
        Quant (quantifier, var_name, abs (ix + 1) body)
  in
  abs 0 formula

let subst_bound_var term formula =
  let rec subst ix = function
    | Pred (name, args) ->
        Pred (name, List.map args ~f:(Term.replace (Term.Bound ix, term)))
    | Conn (name, subformulas) -> Conn (name, List.map subformulas ~f:(subst ix))
    | Quant (quantifier, var_name, body) ->
        Quant (quantifier, var_name, subst (ix + 1) body)
  in
  subst 0 formula

Notice that we use an extra function to recursively traverse the given term and replace all occurrences of the specified term with a new one:

(** term.mli *)

val replace : t * t -> t -> t
(** Traverses the given term recursively
    and replaces all occurrences of [old_term] with [new_term].

    @param old_term The term to be replaced.
    @param new_term The term to replace [old_term] with.
    @param term The term to traverse and modify.

    If [old_term] is not found in [term],
    the original term is returned unmodified. *)

(** term.ml *)

let rec replace (old_term, new_term) term =
  if term = old_term then new_term
  else
    match term with
    | Function (name, args) ->
        Function (name, List.map args ~f:(replace (old_term, new_term)))
    | _ -> term

I added 8 basic “inline” tests to ensure everything functions as expected. Here’s an example test:

let%test "abstract with nested formula" =
let term = Term.Var "x" in
let formula =
  Conn
    ( "&",
      [
        Pred ("P", [ Term.Var "x" ]);
        Quant ("forall", "y", Pred ("Q", [ Term.Var "x" ]));
      ] )
in
let expected =
  Conn
    ( "&",
      [
        Pred ("P", [ Term.Bound 0 ]);
        Quant ("forall", "y", Pred ("Q", [ Term.Bound 1 ]));
      ] )
in
expected = abstract term formula

That’s it for now! Next time, we’ll focus on lexing and parsing formulas.