Designing a Theorem Prover, Part 3

Quantifiers and Unification Link to heading

In this post, we’ll dive into the mechanics of quantifiers and unification in theorem proving.

Meta-Variables Link to heading

In the backward rules \(\forall L\) and \(\exists R\), the term \(t\) is left unspecified, meaning it can represent any expression. We will use special meta-variables like \(?a\), \(?b\), \(?c\), etc., which act as placeholders for such terms. The backward rules will insert a fresh meta-variable, such as \(?b\), in place of \(t\) (term) in the subgoal.

\(?a\), \(?b\), \(?c\) – meta-variables.

Unification Link to heading

Folderol attempts to unify the left formula \(A_1\) and the right formula \(A_2\) for each new subgoal. Unification works by replacing the meta-variables in \(A_1\) and \(A_2\) to both sides such that both sides become identical, resulting in a single unified formula \(A\).

Example 1

\(A_1 := R(?a, f(x))\)
\(A_2 := R(g(?b), b)\)

Both become:
\(A := R(g(f(x)), f(x))\)

By replacing:
\(?a \mapsto g(f(x))\)
\(?b \mapsto f(x)\)

Example 2

\(A_1 := R(?a, f(?c))\)
\(A_2 := R(g(?d), ?b)\)

Become:
\(A := R(g(?d), f(?c))\)

Parameters and Quantifier Rules Link to heading

We need to avoid variable capture during substitution, which occurs when a free variable unintentionally becomes bound. This can lead to invalid inferences, like:

\(\displaystyle {\cfrac {\vdash \forall x. \exists y. y \neq x}{\vdash \exists y. y \neq y}}\quad (\forall R)\)

When using the rule \(\forall R\), we say: To prove \(\forall x. A(x)\), the formula \(A(x)\) must hold for any arbitrary \(x\). However, during substitution, if we choose \(x\) as the arbitrary term, variable capture occurs:

In \(\forall x. \exists y. y \neq x\) variable \(y\) is free.
Substituting \(x = y\) causes \(y \neq x\) to turn into \(y \neq y\).

And we end up \(\vdash \exists y. y \neq y\). This is clearly a contradiction, as \(y \neq y\) is always false.
To prevent this, we define two distinct sets: parameters (\(a, b, c, \dots\)) and variables (\(x, y, z, \dots\)).

\(a, b, c, \dots\) – parameters
\(x, y, z, \dots\) – variables

Parameters remain unbound, while variables must always be bound.

When applying the \(\forall R\) or \(\exists L\) rule we should replace the bound variables like \(x\), \(y\), \(z\) with fresh parameters like \(a\), \(b\), \(c\).

Btw under this distinction subformula may not necessarily be a valid formula by itself.
For a example in \(\forall x. \exists y. y \neq x\), the subformula \(\exists y. y \neq x\) has variable \(x\) unbound.

When applying the rule \((\forall R)\) we replace the bound variable \(x\) with a fresh parameter, say \(a\), yielding the conclusion: \(\displaystyle \vdash \exists y. y \neq a\). But we’ll soon see that it’s more complicated than it appears…

Enforcement of Provisos of Quantifier Rules Link to heading

In this chapter Paulson uses the term “Proviso” a lot.

Proviso is a statement in an agreement, saying that a particular thing must happen before another can.

When we’re using quantifier rules, we have to be careful and make sure that we don’t accidentally mess things up by reusing or misusing the variables. Even though we try to introduce fresh new variables each time we apply a rule, sometimes things get tricky. When we replace meta-variables, it might accidentally change the thing we’re trying to prove.

The danger of instantiation of meta-variables is that it can change the conclusion.

Let’s consider the following formula:
\(\displaystyle \forall x. R(x, x) \vdash \exists y. \forall x. R(x, y)\)

To see why it is not valid think of \(R\) as some kind of relation. For example, let \(R(x, y)\) mean \(x = y\).
Why it is not valid? If \(R(x, y)\) means \(x\) equals \(y\), then you can’t make that work for all \(x\) and one \(y\). That’s like saying everyone is equal to the same person, which doesn’t make sense.

A derivation using meta-variables (and deleting some quantifiers for a better readability) might look like:

\(\displaystyle {\cfrac {\cfrac {\cfrac {R(?c, ?c) \vdash R(b, ?a)}{\forall x. R(x, x) \vdash R(b, ?a)} \quad (\forall L)}{\forall x. R(x, x) \vdash \forall x. R(x, ?a)} \quad (\forall R)}{\forall x. R(x, x) \vdash \exists y. \forall x. R(x, y)} \quad (\exists R)}\)

When applying rules \((\forall L)\) and \((\exists R)\) we replace terms \(t\) with meta-variables \(?a\), \(?b\), \(?c\).
When applying rules \((\forall R)\) and \((\exists L)\) we replace variables \(x\), \(y\), \(z\) with parameters \(a\), \(b\), \(c\).

From the first sight it may seem like replacing \(?c\) and \(?a\) by \(b\) completes the proof: \(\displaystyle R(b, b) \vdash R(b, b)\)

However, this introduces a circularity: assigning \(b\) to \(?a\) means that \(b\) appears in the conclusion of the rule \((\forall R)\), which violates the proviso that \(b\) must not appear in the conclusion \(R(b, ?a)\).
Hence the second inference is a nonsense: \(\displaystyle \cfrac {R(b, b) \vdash R(b, b)}{\forall x. R(x, x) \vdash R(b, ?a)}\)

We’re using meta-variables as placeholders so we could swap them out later. If we’re not careful, when we swap out these placeholders, we might accidentally create a loop — where the thing we’re trying to avoid ends up in the conclusion anyway.

To address this issue, we introduce the concept of dependency. If a parameter \(b\) depends on a meta-variable \(?a\), we write this as \(b[?a]\). This means that \(b\) must be distinct from any parameters in terms substituted for \(?a\).

Variable \(?a\) is unifiable with term \(t\) only if \(?a\) does not occur in \(t\). If a parameter \(b\) depends on a meta-variable \(?a\), we write this as \(b[?a]\). Assigning \(b[?a]\) to \(?a\) creates a clear loop. If some term \(t\) includes \(b[?a]\), the system will see that \(?a\) depends on itself, the occurs check will prevent this circularity. The assignment \(?a \mapsto t\), which would cause \(b\) to show up in the conclusion, isn’t allowed.

To apply \((\exists L)\) or \((\forall R)\) we choose a fresh parameter \(b\) that depends on all meta-variables in the goal, say \(?a_1, \dots, ?a_n\). The emphasize the dependency we write \(b[?a_1, \dots, ?a_n]\).

For comparison here is an example of a valid sequent:

\(\displaystyle \forall x. R(x, x) \vdash \forall x. \exists y. R(x, y)\)

The derivation tree looks like this:

\(\displaystyle {\cfrac {\cfrac {\cfrac {R(?c, ?c) \vdash R(a, ?b)}{\forall x. R(x, x) \vdash R(a, ?b)} \quad (\forall L)}{\forall x. R(x, x) \vdash \exists y. R(a, y)} \quad (\exists R)}{\forall x. R(x, x) \vdash \forall x. \exists y. R(x, y)} \quad (\forall R)}\)

Here the parameter \(a\) does not depend on meta-variables \(?b\) and \(?c\) since these appear above it.
By assigning \(?c \mapsto a\) and \(?b \mapsto a\), we get a valid proof:

\(\displaystyle {\cfrac {\cfrac {\cfrac {R(a, a) \vdash R(a, a)}{\forall x. R(x, x) \vdash R(a, a)} \quad (\forall L)}{\forall x. R(x, x) \vdash \exists y. R(a, y)} \quad (\exists R)}{\forall x. R(x, x) \vdash \forall x. \exists y. R(x, y)} \quad (\forall R)}\)

No Second-Order Dependence Link to heading

Let’s start with an example:

\(\displaystyle {\cfrac {\cfrac {\vdash P(?a, b[?a]) \quad \cfrac {\vdash Q(b[?a], c)}{\vdash \forall y. Q(b[?a], y)} \quad (\forall R)}{\vdash P(?a, b[?a]) \land (\forall y. Q(b[?a], y))} \quad (\land R)}{\vdash \forall x. P(?a, x) \land (\forall y. Q(x, y))} \quad (\forall R)}\)

Even though \(b[?a]\) depends on \(?a\), it doesn’t mean that \(?a\) should directly appear in the proof. In the derivation tree above, the formula \(\forall y.\, Q(b[?a], y)\) shows that \(b\) depends on \(?a\), but we don’t see \(?a\) in the formula itself. The dependency is there, but it’s just not always explicit.

We can do \(?a \mapsto c\) without breaking any rules. As long as \(c\) isn’t present the conclusion we shouldn’t make \(c\) depend on \(?a\).

Summary Link to heading

When using these rules, choose a new parameter that hasn’t been used anywhere else in the proof. This new parameter should depend on all the meta-variables in the conclusion.
We can only replace \(?a\) with \(t\) if \(t\) doesn’t depend on \(?a\). This way, every parameter \(b\) in \(t\) also doesn’t depend on \(?a\), which keeps the rule’s conditions intact.
To replace \(?a\) with \(t\), change every instance of \(?a\) to \(t\) in the proof. Also, change any parameter that depends on \(?a\) to depend on the meta-variables in \(t\).

Even though we don’t need to separate meta-variables from parameters logically, doing so helps us and Folderol know which variables can be substituted.

Alright, now that we have a solid foundation in the next post we’ll start translating these theoretical concepts into implementation. Stay tuned!