Quelle AB.thy

Sprache: Isabelle

(*<*)theory AB imports Main begin(*>*)

section‹Case Study: A Context Free Grammar›

text‹\label{sec:CFG}
index{grammars!defining inductively|(}%
are nothing but shorthands for inductive definitions of nonterminals
represent sets of strings. For example, the production
A \to B c$ is short for
[ w \in B \Longrightarrow wc \in A \]
section demonstrates this idea with an example
to Hopcroft and Ullman, a grammar for generating all words with an
number of $a$'s and~$b$'s:
begin{eqnarray}
&\to& \epsilon \mid b A \mid a B \nonumber\\
&\to& a S \mid b A A \nonumber\\
&\to& b S \mid a B B \nonumber
end{eqnarray}
the end we say a few words about the relationship between
original proof cite‹‹p.\ts81› in HopcroftUllman› and our formal version.

start by fixing the alphabet, which consists only of term‹a›'s
~term‹b›'s:
›

datatype alfa = a | b

text‹\noindent
convenience we include the following easy lemmas as simplification rules:
›

lemma [simp]: "(x ≠ a) = (x = b) ∧ (x ≠ b) = (x = a)"
by (case_tac x, auto)

text‹\noindent
over this alphabet are of type 🍋‹alfa list›, and
three nonterminals are declared as sets of such words.
productions above are recast as a \emph{mutual} inductive
\index{inductive definition!simultaneous}
term‹S›, term‹A› and~term‹B›:
›

inductive_set
  S :: "alfa list set" and
  A :: "alfa list set" and
  B :: "alfa list set"
where
  "[] ∈ S"
| "w ∈ A ==> b#w ∈ S"
| "w ∈ B ==> a#w ∈ S"

| "w ∈ S ==> a#w ∈ A"
| "[ v∈A; w∈A ] ==> b#v@w ∈ A"

| "w ∈ S ==> b#w ∈ B"
| "[ v ∈ B; w ∈ B ] ==> a#v@w ∈ B"

text‹\noindent
we show that all words in term‹S› contain the same number of term‹a›'s and term‹b›'s. Since the definition of term‹S› is by mutual
, so is the proof: we show at the same time that all words in
term‹A› contain one more term‹a› than term‹b› and all words in term‹B› contain one more term‹b› than term‹a›.
›

lemma correctness:
  "(w ∈ S ⟶ size[x←w. x=a] = size[x←w. x=b]) ∧
   (w ∈ A ⟶ size[x←w. x=a] = size[x←w. x=b] + 1) ∧
   (w ∈ B ⟶ size[x←w. x=b] = size[x←w. x=a] + 1)"

txt‹\noindent
propositions are expressed with the help of the predefined term‹filter› function on lists, which has the convenient syntax ‹[x←xs. P
]›, the list of all elements term‹x› in term‹xs› such that prop‹P x›
. Remember that on lists ‹size› and ‹length› are synonymous.

proof itself is by rule induction and afterwards automatic:
›

by (rule S_A_B.induct, auto)

text‹\noindent
may seem surprising at first, and is indeed an indication of the power
inductive definitions. But it is also quite straightforward. For example,
the production $A \to b A A$: if $v,w \in A$ and the elements of $A$
one more $a$ than~$b$'s, then $bvw$ must again contain one more $a$
~$b$'s.

usual, the correctness of syntactic descriptions is easy, but completeness
hard: does term‹S› contain \emph{all} words with an equal number of
term‹a›'s and term‹b›'s? It turns out that this proof requires the
lemma: every string with two more term‹a›'s than term‹b›'s can be cut somewhere such that each half has one more term‹a› than
term‹b›. This is best seen by imagining counting the difference between the
of term‹a›'s and term‹b›'s starting at the left end of the
. We start with 0 and end (at the right end) with 2. Since each move to the
increases or decreases the difference by 1, we must have passed through
on our way from 0 to 2. Formally, we appeal to the following discrete
value theorem @{thm[source]nat0_intermed_int_val}
{thm[display,margin=60]nat0_intermed_int_val[no_vars]}
term‹f› is of type 🍋‹nat ==> int›, 🍋‹int› are the integers,
‹∣.∣› is the absolute value function\footnote{See
~\ref{tab:ascii} in the Appendix for the correct \textsc{ascii}
.}, and term‹1::int› is the integer 1 (see \S\ref{sec:numbers}).

we show that our specific function, the difference between the
of term‹a›'s and term‹b›'s, does indeed only change by 1 in every
to the right. At this point we also start generalizing from term‹a›'s
term‹b›'s to an arbitrary property term‹P›. Otherwise we would have
prove the desired lemma twice, once as stated above and once with the
of term‹a›'s and term‹b›'s interchanged.
›

lemma step1: "∀i < size w.
  ∣(int(size[x←take (i+1) w. P x])-int(size[x←take (i+1) w. ¬P x]))
   - (int(size[x←take i w. P x])-int(size[x←take i w. ¬P x]))∣ ≤ 1"

txt‹\noindent
lemma is a bit hard to read because of the coercion function
‹int :: nat ==> int›. It is required because term‹size› returns
natural number, but subtraction on type~🍋‹nat› will do the wrong thing.
term‹take› is predefined and term‹take i xs› is the prefix of
term‹i› of term‹xs›; below we also need term‹drop i xs›, which
what remains after that prefix has been dropped from term‹xs›.

proof is by induction on term‹w›, with a trivial base case, and a not
trivial induction step. Since it is essentially just arithmetic, we do not
it.
›

apply(induct_tac w)
apply(auto simp add: abs_if take_Cons split: nat.split)
done

text‹
we come to the above-mentioned lemma about cutting in half a word with two more elements of one sort than of the other sort:
›

lemma part1:
"size[x←w. P x] = size[x←w. ¬P x]+2 ==>
  ∃i≤size w. size[x←take i w. P x] = size[x←take i w. ¬P x]+1"

txt‹\noindent
is proved by ‹force› with the help of the intermediate value theorem,
appropriately and with its first premise disposed of by lemma
{thm[source]step1}:
›

apply(insert nat0_intermed_int_val[OF step1, of "P" "w" "1"])
by force

text‹\noindent

@{thm[source]part1} tells us only about the prefix term‹take i w›.
easy lemma deals with the suffix term‹drop i w›:
›

lemma part2:
  "[size[x←take i w @ drop i w. P x] =
    size[x←take i w @ drop i w. ¬P x]+2;
    size[x←take i w. P x] = size[x←take i w. ¬P x]+1]
   ==> size[x←drop i w. P x] = size[x←drop i w. ¬P x]+1"
by(simp del: append_take_drop_id)

text‹\noindent
the proof we have disabled the normally useful lemma
begin{isabelle}
{thm append_take_drop_id[no_vars]}
rulename{append_take_drop_id}
end{isabelle}
allow the simplifier to apply the following lemma instead:
{text[display]"[x∈xs@ys. P x] = [x∈xs. P x] @ [x∈ys. P x]"}

dispose of trivial cases automatically, the rules of the inductive
are declared simplification rules:
›

declare S_A_B.intros[simp]

text‹\noindent
could have been done earlier but was not necessary so far.

completeness theorem tells us that if a word has the same number of
term‹a›'s and term‹b›'s, then it is in term‹S›, and similarly
term‹A› and term‹B›:
›

theorem completeness:
  "(size[x←w. x=a] = size[x←w. x=b] ⟶ w ∈ S) ∧
   (size[x←w. x=a] = size[x←w. x=b] + 1 ⟶ w ∈ A) ∧
   (size[x←w. x=b] = size[x←w. x=a] + 1 ⟶ w ∈ B)"

txt‹\noindent
proof is by induction on term‹w›. Structural induction would fail here
, as we can see from the grammar, we need to make bigger steps than
appending a single letter at the front. Hence we induct on the length
term‹w›, using the induction rule @{thm[source]length_induct}:
›

apply(induct_tac w rule: length_induct)
apply(rename_tac w)

txt‹\noindent
‹rule› parameter tells ‹induct_tac› explicitly which induction
to use. For details see \S\ref{sec:complete-ind} below.
this case the result is that we may assume the lemma already
for all words shorter than term‹w›. Because the induction step renames
induction variable we rename it back to ‹w›.

proof continues with a case distinction on term‹w›,
whether term‹w› is empty or not.
›

apply(case_tac w)
apply(simp_all)
(*<*)apply(rename_tac x v)(*>*)

txt‹\noindent
disposes of the base case and leaves only a conjunction
two step cases to be proved:
prop‹w = a#v› and @{prop[display]"size[x∈v. x=a] = size[x∈v. x=b]+2"} then
prop‹b#v ∈ A›, and similarly for prop‹w = b#v›.
only consider the first case in detail.

breaking the conjunction up into two cases, we can apply
{thm[source]part1} to the assumption that term‹w› contains two more term‹a›'s than term‹b›'s.
›

apply(rule conjI)
apply(clarify)
apply(frule part1[of "λx. x=a", simplified])
apply(clarify)
txt‹\noindent
yields an index prop‹i ≤ length v› such that
{prop[display]"length [x←take i v . x = a] = length [x←take i v . x = b] + 1"}
the help of @{thm[source]part2} it follows that
{prop[display]"length [x←drop i v . x = a] = length [x←drop i v . x = b] + 1"}
›

apply(drule part2[of "λx. x=a", simplified])
  apply(assumption)

txt‹\noindent
it is time to decompose term‹v› in the conclusion prop‹b#v ∈ A›
term‹take i v @ drop i v›,
›

apply(rule_tac n1=i and t=v in subst[OF append_take_drop_id])

txt‹\noindent
the variables term‹n1› and term‹t› are the result of composing the
@{thm[source]subst} and @{thm[source]append_take_drop_id})
which the appropriate rule of the grammar reduces the goal
the two subgoals prop‹take i v ∈ A› and prop‹drop i v ∈ A›:
›

apply(rule S_A_B.intros)

txt‹
subgoals follow from the induction hypothesis because both term‹take i
› and term‹drop i v› are shorter than term‹w›:
›

  apply(force simp add: min_less_iff_disj)
apply(force split: nat_diff_split)

txt‹
case prop‹w = b#v› is proved analogously:
›

apply(clarify)
apply(frule part1[of "λx. x=b", simplified])
apply(clarify)
apply(drule part2[of "λx. x=b", simplified])
apply(assumption)
apply(rule_tac n1=i and t=v in subst[OF append_take_drop_id])
apply(rule S_A_B.intros)
apply(force simp add: min_less_iff_disj)
by(force simp add: min_less_iff_disj split: nat_diff_split)

text‹
conclude this section with a comparison of our proof with
\index{Hopcroft, J. E.} and Ullman's\index{Ullman, J. D.}
cite‹‹p.\ts81› in HopcroftUllman›.
a start, the textbook
, for no good reason, excludes the empty word, thus complicating
just a little bit: they have 8 instead of our 7 productions.

importantly, the proof itself is different: rather than
the two directions, they perform one induction on the
of a word. This deprives them of the beauty of rule induction,
in the easy direction (correctness) their reasoning is more
than our ‹auto›. For the hard part (completeness), they
just one of the cases that our ‹simp_all› disposes of
. Then they conclude the proof by saying about the
cases: ``We do this in a manner similar to our method of
for part (1); this part is left to the reader''. But this is
the part that requires the intermediate value theorem and
is not at all similar to the other cases (which are automatic in
). The authors are at least cavalier about this point and may
have overlooked the slight difficulty lurking in the omitted
. Such errors are found in many pen-and-paper proofs when they
scrutinized formally.%
index{grammars!defining inductively|)}
›

(*<*)end(*>*)

Messung V0.5 in Prozent

¤ Dauer der Verarbeitung: 0.12 Sekunden (vorverarbeitet am 2026-06-10) ¤

Wurzel

Suchen

Beweissystem der NASA

Beweissystem Isabelle

NIST Cobol Testsuite

Cephes Mathematical Library

Wiener Entwicklungsmethode

Haftungshinweis

Die Informationen auf dieser Webseite wurden nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit, noch Qualität der bereit gestellten Informationen zugesichert.

Bemerkung:

Die farbliche Syntaxdarstellung und die Messung sind noch experimentell.