Intro

In this series, I’ll be working through Lawrence C. Paulson’s folderol paper (1990) and trying to use its insights to design a theorem prover in OCaml. Paulson calls his prover folderol, and I will do the same. The prover follows an automatic strategy but is interactive, and it prints a trace of the rules it applies as evidence.

To start off, let’s see it in action. Here is what a proof might look like when using folderol:

> read_goal "(P | Q) & (P | R) --> P | (Q & R)";
empty |- (P | Q) & (P | R) --> P | Q & R

> step();
-->:right
(P | Q) & (P | R) |- P | (Q & R)

> step();
|:right
(P | Q) & (P | R) |- Q & R, P

> step();
&:left
P | R, P | Q |- Q & R, P

> step();
|:left
P | Q, R |- Q & R, P

> step();
|:left
Q, R |- Q & R, P

> step();
&:right
No more goals: proof finished

Looks cool, right? So let’s go – it’s going to be a long journey ahead!

Automated Theorem Proving

When we think about theorem provers, we need to distinguish between two key concepts:

  1. Proof Assistant / Interactive Theorem Prover (wiki) – These systems assist with the development of formal proofs through human-machine collaboration, usually by keeping track of what you still need to prove and what you have already proved.
    For example: Coq, Lean, Isabelle, Agda (?), F* (?).
  2. Automated Theorem Prover – Uses AI (or other similarly opaque algorithms) to come up with the steps of the proof on its own.
There is no practical automatic proof procedure, even for most complete theories. Even if there were, interaction with the machine would still be necessary for exploring mathematics.

Framework

A formal system is a framework for deriving theorems from axioms using specific rules.

Paulson’s work relies on First-Order Logic (ru) and uses Gentzen’s Sequent Calculus (ru) (LK) as its formal system. I don’t know anything about this shit, so let’s begin by familiarizing ourselves with Sequent Calculus.

Gentzen’s Sequent Calculus (LK)

There are two main styles of proof calculus for expressing line-by-line logical arguments:

Hilbert Style

  • Uses unconditional tautologies
  • A rich axiom set
  • A minimal number of inference rules

Hilbert systems are known for their elegance but can be less intuitive due to the extensive use of axioms.

Gentzen Style

  • Uses conditional tautologies (more on that below)
  • A rich set of inference rules and a minimal number of axioms
  • Represents proofs as derivation trees
  • Preferred for proof theory research

A proof calculus is like a blueprint for formal reasoning. It defines a general way to structure inferences, and by customizing its rules, you can create specific formal systems.

In sequent calculus, the focus shifts from proving absolute truths to demonstrating how one statement leads to another. This method centers on the logical flow from assumptions to conclusions.

As I mentioned above, in sequent calculus every line of a proof is a conditional tautology, or a sequent.
Yeah, there are 2 types of tautologies:

  1. Unconditional tautology – A statement that is true in all possible cases, regardless of the values of its variables:

    • \(A \lor \neg A\) (law of excluded middle)
    • \(A \rightarrow A\) (self-implication)

    Both of these tautologies are true without any assumptions.
    So they can be written as \(\vdash A \lor \neg A\) and \(\vdash A \rightarrow A\) respectively.

  2. Conditional tautology (aka Sequent) – A statement of the form “If (certain assumptions), then (a certain conclusion)”. We write these conditions as \(\Gamma \vdash \Delta\) ("\(\Delta\) is derivable from \(\Gamma\)"), where

    • \(\vdash\) – Turnstile – \(P \vdash Q\) means “\(Q\) is provable from assumption \(P\)”.
    • \(\Gamma\) – Set of assumptions (premises) \(\displaystyle {A_{1},\ldots ,A_n}\) – “Antecedent”.
    • \(\Delta\) – Set of conclusions \(B_{1},\ldots,B_{k}\) – “Succedent” or “Consequent”.

    Both \(\Gamma\) and \(\Delta\) are unordered lists of formulae.

    \(\Gamma \vdash \Delta\) is equivalent to \(\displaystyle {A_{1},\ldots ,A_{n}\vdash B_{1},\ldots,B_{k}}\) or \({\displaystyle \vdash (A_{1}\land \cdots \land A_{n})\rightarrow (B_{1}\lor \cdots \lor B_{k})}\)
    (Whenever every \({A_i}\) is true, at least one \(B_i\) will also be true).

    If it’s still not clear, here’s a simple example of a sequent: \(A, B \vdash A \land B\).

So again, our Gentzen-style formal system deals with “sequents” of the form \(\Gamma \vdash \Delta\).

  1. \(\Gamma \vdash \Delta\) is true – If every formula in \(\Gamma\) is true then some formula in \(\Delta\) is true: \({\displaystyle \vdash (A_{1}\land \cdots \land A_{n})\rightarrow (B_{1}\lor \cdots \lor B_{k})}\).
  2. In terms of refutation: \(\Gamma \vdash \Delta\) is false – there is some case in which every formula in \(\Gamma\) is true and every formula in \(\Delta\) is false.

If the situation in condition 2 is impossible (assuming it leads to a contradiction), then the sequent \(\Gamma \vdash \Delta\) is true.
The basic sequent \(A \vdash A\) is trivially true.
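To make these truth conditions concrete, here is a minimal OCaml sketch (my own toy types, not folderol’s) that checks a propositional sequent by brute force over all valuations:

```ocaml
(* A toy propositional fragment; folderol's real language is first-order. *)
type formula =
  | Var of string
  | Not of formula
  | And of formula * formula
  | Or of formula * formula
  | Imp of formula * formula

(* A sequent Gamma |- Delta: two unordered lists of formulae. *)
type sequent = formula list * formula list

let rec vars = function
  | Var v -> [ v ]
  | Not a -> vars a
  | And (a, b) | Or (a, b) | Imp (a, b) -> vars a @ vars b

let rec eval env = function
  | Var v -> List.assoc v env
  | Not a -> not (eval env a)
  | And (a, b) -> eval env a && eval env b
  | Or (a, b) -> eval env a || eval env b
  | Imp (a, b) -> (not (eval env a)) || eval env b

(* Gamma |- Delta is true iff in every case where all of Gamma holds,
   at least one formula of Delta holds. *)
let valid ((gamma, delta) : sequent) =
  let vs = List.sort_uniq compare (List.concat_map vars (gamma @ delta)) in
  let rec envs = function
    | [] -> [ [] ]
    | v :: rest ->
      List.concat_map (fun e -> [ (v, true) :: e; (v, false) :: e ]) (envs rest)
  in
  List.for_all
    (fun e -> not (List.for_all (eval e) gamma) || List.exists (eval e) delta)
    (envs vs)

let () =
  assert (valid ([ Var "A" ], [ Var "A" ]));                         (* basic sequent *)
  assert (valid ([ Var "A"; Var "B" ], [ And (Var "A", Var "B") ])); (* A, B |- A /\ B *)
  assert (not (valid ([ Var "A" ], [ Var "B" ])))
```

This enumerates every valuation of the variables, so it only works for small propositional examples; the point is just to pin down what “true sequent” means.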

Rules of inference

In sequent form, rules of inference are written as:

\(\displaystyle \frac{S_1}{T}\) and \(\displaystyle \frac{S_1 \quad S_2}{T}\)

They are interpreted as statements about deriving the lower sequent \(T\) from the upper sequents \(S_1, S_2\).
A proof is written in a tree-like form from top to bottom:

\(\displaystyle \cfrac {\cfrac {S_{1} \quad S_{2}}{T_{1}} \quad \cfrac {S_{3}}{T_{2}}}{U}\)

Each line represents a direct inference — the transition from the upper sequents to the lower one according to one of the accepted rules of inference in the system.
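Such derivation trees map naturally onto an algebraic data type. A hypothetical OCaml sketch (the rule names and string-based sequents are just for illustration):

```ocaml
(* A sequent as two lists of formulae; plain strings keep the sketch short. *)
type sequent = string list * string list

(* A derivation: a rule name, the lower (concluding) sequent, and the
   sub-derivations of the upper sequents; axioms have no sub-derivations. *)
type derivation = Infer of string * sequent * derivation list

(* The tree from above: S1 and S2 yield T1; S3 yields T2; T1 and T2 yield U. *)
let example =
  Infer ("rule3", ([], [ "U" ]),
    [ Infer ("rule1", ([], [ "T1" ]),
        [ Infer ("axiom", ([], [ "S1" ]), []);
          Infer ("axiom", ([], [ "S2" ]), []) ]);
      Infer ("rule2", ([], [ "T2" ]),
        [ Infer ("axiom", ([], [ "S3" ]), []) ]) ])

(* Number of sequents in a derivation. *)
let rec size (Infer (_, _, subs)) =
  1 + List.fold_left (fun n d -> n + size d) 0 subs

let () = assert (size example = 6)
```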

The two most popular sequent calculi are the Gentzen-style systems LK and LJ. We’re interested in LK.

LK

The LK system has one axiom and a rich set of inference rules.

Axiom

\(A \vdash A\)

Structural inference rules

Let’s start with a quick review of the structural inference rules:

Rule L Rule R Name
\(\displaystyle \frac{\Gamma \vdash \Delta}{A, \Gamma \vdash \Delta} \, (\text{WL})\) \(\displaystyle \frac{\Gamma \vdash \Delta}{\Gamma \vdash \Delta, A} \, (\text{WR})\) Weakening (Thinning)
\(\displaystyle \frac{A, A, \Gamma \vdash \Delta}{A, \Gamma \vdash \Delta} \, (\text{CL})\) \(\displaystyle \frac{\Gamma \vdash \Delta, A, A}{\Gamma \vdash \Delta, A} \, (\text{CR})\) Contraction
\(\displaystyle \frac{\Gamma, A, B, \Theta \vdash \Delta}{\Gamma, B, A, \Theta \vdash \Delta} \, (\text{PL})\) \(\displaystyle \frac{\Gamma \vdash \Delta, A, B, \Lambda}{\Gamma \vdash \Delta, B, A, \Lambda} \, (\text{PR})\) Exchange (Permutation)

And finally the cut rule which allows you to remove (“cut off”) an intermediate statement \(A\):

\(\displaystyle \frac{\Gamma \vdash \Delta, A \quad A, \Theta \vdash \Lambda}{\Gamma, \Theta \vdash \Delta, \Lambda} \, (\text{C})\)

Structural rules, namely exchange/permutation, thinning, and contraction, make sequents behave like a consequence relation.
The weakening (thinning) rules like \(\displaystyle \frac{\Gamma \vdash \Delta}{A, \Gamma \vdash \Delta} \, (\text{WL})\), read backwards, delete the formula \(A\) from the goal. Since a formula should not be deleted unless it is known to be redundant, we will postpone all deletions.

A goal is trivially true if it has the form \(\Gamma \vdash \Delta\) where there exists a common formula \(A \in \Gamma \cap \Delta\).
Accepting this as a basic sequent makes thinning rules like \(\displaystyle \frac{\Gamma \vdash \Delta}{A, \Gamma \vdash \Delta} \, (\text{WL})\) unnecessary.
We also don’t need exchange rules like \(\displaystyle \frac{\Gamma, A, B, \Theta \vdash \Delta}{\Gamma, B, A, \Theta \vdash \Delta}\), since we can simply ignore the order of formulae.
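Both simplifications are easy to express in code. A sketch in OCaml (my own helper, not folderol’s): a goal counts as basic when some formula occurs on both sides, and membership tests let us ignore formula order entirely.

```ocaml
(* A goal Gamma |- Delta is a basic sequent (trivially true) when some
   formula occurs on both sides; order never matters, so a plain
   membership test over the two lists is enough. *)
let is_basic (gamma, delta) = List.exists (fun a -> List.mem a delta) gamma

let () =
  assert (is_basic ([ "P"; "Q" ], [ "R"; "P" ]));
  assert (is_basic ([ "Q"; "P" ], [ "P"; "R" ]));  (* order is irrelevant *)
  assert (not (is_basic ([ "P" ], [ "Q" ])))
```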

I wrote a quick explanation of forwards reasoning versus backwards reasoning.

In backwards reasoning we read derivation trees from bottom to top \(\displaystyle \uparrow\)

Logical inference rules

In Gentzen’s LK we have the following logical inference rules:

Rule L Rule R Name
\(\displaystyle \cfrac{\Gamma, A \vdash \Delta}{\Gamma, A \land B \vdash \Delta} \quad (\land L_1)\) \(\displaystyle \cfrac{\Gamma \vdash A, \Delta}{\Gamma \vdash A \lor B, \Delta} \quad (\lor R_1)\) Left introduction for conjunction, Right introduction for disjunction
\(\displaystyle \cfrac{\Gamma, A \vdash \Delta \qquad \Gamma, B \vdash \Delta}{\Gamma, A \lor B \vdash \Delta} \quad (\lor L)\) \(\displaystyle \cfrac{\Gamma \vdash A, \Delta \qquad \Gamma \vdash B, \Delta}{\Gamma \vdash A \land B, \Delta} \quad (\land R)\) Left introduction for disjunction, Right introduction for conjunction
\(\displaystyle \cfrac{\Gamma \vdash A, \Delta \qquad \Sigma, B \vdash \Pi}{\Gamma, \Sigma, A \rightarrow B \vdash \Delta, \Pi} \quad (\rightarrow L)\) \(\displaystyle \cfrac{\Gamma, A \vdash B, \Delta}{\Gamma \vdash A \rightarrow B, \Delta} \quad (\rightarrow R)\) Introduction for implication
\(\displaystyle \cfrac{\Gamma \vdash A, \Delta}{\Gamma, \lnot A \vdash \Delta} \quad (\lnot L)\) \(\displaystyle \cfrac{\Gamma, A \vdash \Delta}{\Gamma \vdash \lnot A, \Delta} \quad (\lnot R)\) Introduction for negation
\(\displaystyle {\cfrac {\Gamma ,A[t/x]\vdash \Delta }{\Gamma ,\forall x.A\vdash \Delta }}\quad (\forall L)\) \(\displaystyle {\cfrac {\Gamma \vdash A[y/x],\Delta }{\Gamma \vdash \forall x. A,\Delta }}\quad (\forall R)\) Introduction for universal quantifier
\(\displaystyle {\cfrac {\Gamma ,A[y/x]\vdash \Delta }{\Gamma ,\exists x. A\vdash \Delta }}\quad (\exists L)\) \(\displaystyle {\cfrac {\Gamma \vdash A[t/x],\Delta }{\Gamma \vdash \exists x. A,\Delta }}\quad (\exists R)\) Introduction for existential quantifier

The introduction rule for the universal quantifier (\(\forall R\)) tells us that if we can show that a statement is true for an arbitrary value \(y\), then it is true for all \(x\). Crucially, \(y\) must be fresh (it must not occur free elsewhere in the goal): since nothing is assumed about \(y\), proving the statement for it covers all possible values of the variable.

Conjunction

In the table above, we have the right introduction for conjunction rule:
\(\displaystyle \cfrac{\Gamma \vdash A, \Delta \qquad \Gamma \vdash B, \Delta}{\Gamma \vdash A \land B, \Delta} \quad (\land R)\)

For backwards proof, if the goal contains \(A \land B\) anywhere in the right part, then we can reduce it to two subgoals, one with \(A\) and one with \(B\) in its place.
A more concise way to write this rule is \(\displaystyle \frac{\vdash A \quad \vdash B}{\vdash A \land B}\).

The contraction rules like \(\displaystyle \frac{\Gamma \vdash \Delta, A, A}{\Gamma \vdash \Delta, A}\), in the simplified form \(\displaystyle \frac{\vdash A, A}{\vdash A}\), duplicate a formula when read backwards.

If we duplicate \(A \land B\) and apply our new \(\displaystyle \frac{\vdash A \quad \vdash B}{\vdash A \land B}\) we get:

  1. Duplicate by applying the contraction rule:
    \(\displaystyle \frac{\vdash A \land B, A \land B}{\vdash A \land B}\)
  2. Apply the right introduction for conjunction rule \(\displaystyle \cfrac{\vdash A \quad \vdash B}{\vdash A \land B}\) to one of the copies:
    Applying the rule we get
    \(\vdash A, A \land B \quad \vdash B, A \land B\)
    which is equivalent to
    \(\vdash A \quad \vdash B\)
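The split step above can be sketched as a single backwards rule application in OCaml (a toy, with my own names): find a conjunction on the right-hand side and produce two subgoals, each keeping the rest of the formulae.

```ocaml
type formula = Var of string | And of formula * formula

(* Find the first conjunction in a list, returning its two halves and
   the remaining formulae. *)
let rec find_conj acc = function
  | [] -> None
  | And (a, b) :: rest -> Some (a, b, List.rev_append acc rest)
  | f :: rest -> find_conj (f :: acc) rest

(* Backwards &:right — a goal  |- Delta, A /\ B  becomes the two
   subgoals  |- Delta, A  and  |- Delta, B. *)
let and_right delta =
  match find_conj [] delta with
  | Some (a, b, rest) -> Some (a :: rest, b :: rest)
  | None -> None

let () =
  (* |- A /\ B, C  splits into  |- A, C  and  |- B, C *)
  match and_right [ And (Var "A", Var "B"); Var "C" ] with
  | Some ([ Var "A"; Var "C" ], [ Var "B"; Var "C" ]) -> ()
  | _ -> assert false
```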

Quantifiers

In logic, we use quantifiers to talk about things like “for all” and “there exists”.

For example:

  • \(\forall x. P(x)\) means \(P\) is true for every \(x\).
  • \(\exists y. P(y)\) means there’s at least one \(y\) that makes \(P\) true.

The only formulae worth duplicating are:

  • \(\displaystyle \forall x. A\) on the left side (rule \(\forall L\))
  • \(\displaystyle \exists x. A\) on the right side (rule \(\exists R\))

\(\displaystyle {\cfrac {\Gamma ,A[t/x]\vdash \Delta }{\Gamma ,\forall x. A\vdash \Delta}}\quad (\forall L)\), \(\displaystyle {\cfrac {\Gamma \vdash A[t/x],\Delta }{\Gamma \vdash \exists x. A,\Delta }}\quad (\exists R)\)

This is because using these formulas may require generating multiple different instances by substituting different terms. Paulson suggests adding contraction to the rules for both \(\forall L\) and \(\exists R\). This allows the formula \(\forall x. A\) or \(\exists x. A\) to be retained during each step, so that the rule can be applied again for different terms.
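Here is a sketch of \(\forall L\) with contraction built in, in OCaml (a toy representation with my own names; the substitution is naive, with no capture-avoidance):

```ocaml
type term = Const of string | Fun of string * term list

type formula =
  | Pred of string * term list
  | Forall of string * formula  (* bound variable name and body *)

(* Naively substitute term t for the variable named x (a Const here). *)
let rec subst_term x t = function
  | Const c -> if c = x then t else Const c
  | Fun (f, args) -> Fun (f, List.map (subst_term x t) args)

let rec subst x t = function
  | Pred (p, args) -> Pred (p, List.map (subst_term x t) args)
  | Forall (y, body) ->
    if y = x then Forall (y, body) else Forall (y, subst x t body)

(* forall-left with contraction: instantiate the first assumption with t,
   but keep the quantified formula (moved to the end, "in rotation") so
   it can be instantiated again with a different term later. *)
let forall_left t = function
  | Forall (x, body) :: rest ->
    Some ((subst x t body :: rest) @ [ Forall (x, body) ])
  | _ -> None

let () =
  let all = Forall ("x", Pred ("P", [ Const "x" ])) in
  (* Instantiating with a yields P(a), and the quantified copy survives. *)
  match forall_left (Const "a") [ all ] with
  | Some [ Pred ("P", [ Const "a" ]); Forall ("x", Pred ("P", [ Const "x" ])) ] -> ()
  | _ -> assert false
```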

The \(\forall L\) rule

This rule is applied when the formula \(\forall x. A\) is on the left side of the sequent (in the assumptions).

It works as follows:

  • From a sequent containing \(\forall x. A\), a subgoal is generated where the formula \(A[t/x]\) is added to the left side of the sequent for some term \(t\).
  • A copy of \(\forall x. A\) is retained in the subgoal so that the rule can be applied again with different terms.

Example

To illustrate how \(\forall L\) can be used repeatedly, let’s say we have the formula \(\forall x.\, (P(x) \rightarrow P(f(x)))\).

To prove \(P(a) \rightarrow P(f(f(\dots f(a) \dots)))\), the \(\forall L\) rule would be applied \(n\) times, substituting first \(a\), then \(f(a)\), then \(f(f(a))\), and so on.

Explanation

We know: \(\forall x.\, (P(x) \rightarrow P(f(x)))\)
The goal: \(P(a) \rightarrow P(f(f(\dots f(a) \dots)))\)

  1. Apply \(\displaystyle {\cfrac {\Gamma ,A[t/x]\vdash \Delta }{\Gamma ,\forall x. A\vdash \Delta }}\quad (\forall L)\):
    Substituting \(a\) for \(x\) gives the instance \(P(a) \rightarrow P(f(a))\) as a new assumption.
  2. Repeat the \(\forall L\) rule:
    Now we have \(P(a) \rightarrow P(f(a))\), but we still need \(P(f(a)) \rightarrow P(f(f(a)))\), so we apply \(\forall L\) again.
  3. Continue:
    Each application of the \(\forall L\) rule substitutes \(x\) with the result of applying \(f\) to the previous term. For instance, after substituting \(f(a)\), you get \(P(f(a)) \rightarrow P(f(f(a)))\). Applying the rule again gives \(P(f(f(a))) \rightarrow P(f(f(f(a))))\), and so forth.
  4. Repeat \(n\) times:
    By continuing this process \(n\) times, you generate a chain of implications, starting from \(P(a)\) and ending with \(P(f(f(\dots f(a) \dots)))\), where \(f\) is applied \(n\) times.
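The chain of instances can be generated mechanically; here is a tiny OCaml helper (illustrative only, with terms as strings) that lists the terms \(a, f(a), f(f(a)), \ldots\) substituted by successive \(\forall L\) steps:

```ocaml
(* Build the term f(t) as a string, purely for illustration. *)
let f t = "f(" ^ t ^ ")"

(* The terms substituted for x by n applications of forall-left:
   a, f(a), ..., f^(n-1)(a), giving the implications
   P(a) -> P(f(a)), ..., P(f^(n-1)(a)) -> P(f^n(a)). *)
let instances n =
  let rec go t k = if k = 0 then [] else t :: go (f t) (k - 1) in
  go "a" n

let () = assert (instances 3 = [ "a"; "f(a)"; "f(f(a))" ])
```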

The \(\exists R\) rule

Similarly, this rule is applied to existential quantifiers on the right side of the sequent. The formula \(\exists x. A\) is retained in the subgoal so that it can be reused.

The problem of reusing quantifiers

Our theorem prover uses these quantified formulae in rotation. It never throws them away. Reusing quantified formulas can lead to infinite loops.

For example, the sequent \(\forall x.\, P(x) \vdash Q\) could cause the prover to run forever, because the universal quantifier can be instantiated with different terms again and again. Since our prover keeps reusing \(\forall x.\, P(x)\), the search space grows without bound: every application of the rule substitutes a new term for \(x\) and generates a new instance \(P(t)\), without ever reaching a conclusion. If there’s no clear stopping condition, this process continues indefinitely.

Now that we have a foundational framework, in the next post, we’ll dive into propositional logic.
Stay tuned 😉