(* Title: An Isabelle/HOL Formalization of the Textbook Proof of Huffman's Algorithm Author:JasminChristianBlanchette<blanchetteatin.tum.de>,2008 Maintainer:JasminChristianBlanchette<blanchetteatin.tum.de>
*)
(*<*) theory Huffman imports Main begin (*>*)
section‹Introduction›
subsection‹Binary Codes \label{binary-codes}›
text‹
we want to encode strings over a finite source alphabet to sequences
bits. The approach used by ASCII and most other charsets is to map each
symbol to a distinct $k$-bit code word, where $k$ is fixed and is
8 or 16. To encode a string of symbols, we simply encode each symbol
turn. Decoding involves mapping each $k$-bit block back to the symbol it
.
-length codes are simple and fast, but they generally waste space. If
know the frequency $w_a$ of each source symbol $a$, we can save space
using shorter code words for the most frequent symbols. We
that a (variable-length) code is {\sl optimum\/} if it minimizes the sum \sum_a w_a \vthinspace\delta_a$, where $\delta_a$ is the length of the binary
word for $a$. Information theory tells us that a code is optimum if
each source symbol $c$ the code word representing $c$ has length
$\textstyle\delta_c = \log_2 {1 \over p_c}, \qquad \hbox{where}\enskip p_c = {w_c \over\sum_a w_a}.$$
number is generally not an integer, so we cannot use it directly.
, the above criterion is a useful yardstick and paves the way for
coding \cite{rissanen-1976}, a generalization of the method
here.
def\xabacabad{\xa\xb\xa\xc\xa\xb\xa\xd}%
an example, consider the source string `$\xabacabad$'. We have
$p_{\xa} = \tfrac{1}{2},\,\; p_{\xb} = \tfrac{1}{4},\,\;
p_{\xc} = \tfrac{1}{8},\,\; p_{\xd} = \tfrac{1}{8}.$$
optimum lengths for the binary code words are all integers, namely
$\delta_{\xa} = 1,\,\;\delta_{\xb} = 2,\,\;\delta_{\xc} = 3,\,\; \delta_{\xd} = 3,$$
they are realized by the code
$C_1 = \{\xa\mapsto 0,\,\xb\mapsto 10,\,\xc\mapsto 110,\, \xd\mapsto 111 \}.$$
`$\xabacabad$' produces the 14-bit code word 01001100100111. The code
C_1$ is optimum: No code that unambiguously encodes source symbols one at a
could do better than $C_1$ on the input `$\xa\xb\xa\xc\xa\xb\xa\xd$'. In
, with a fixed-length code such as
$C_2 = \{\xa\mapsto 00,\,\xb\mapsto 01,\,\xc\mapsto 10,\, \xd\mapsto 11 \}$$
need at least 16~bits to encode `$\xabacabad$'. ›
subsection‹Binary Trees›
text‹
a program, binary codes can be represented by binary trees. For example,
trees\strut
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-unbalanced.pdf}}} \qquad\hbox{and} \qquad \vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-balanced.pdf}}}$$
to $C_1$ and $C_2$. The code word for a given
can be obtained as follows: Start at the root and descend toward the leaf
associated with the symbol one node at a time; generate a 0 whenever the
child of the current node is chosen and a 1 whenever the right child is
. The generated sequence of 0s and 1s is the code word.
avoid ambiguities, we require that only leaf nodes are labeled with symbols.
ensures that no code word is a prefix of another, thereby eliminating the
of all ambiguities.%
footnote{Strictly speaking, there is another potential source of ambiguity.
the alphabet consists of a single symbol $a$, that symbol could be mapped
the empty code word, and then any string $aa\ldots a$ would map to the
bit sequence, giving the decoder no way to recover the original string's
. This scenario can be ruled out by requiring that the alphabet has
2 or more.}
that have this property are called {\sl prefix codes}. As an example of a
that does not have this property, consider the code associated with the \strut
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-non-prefix.pdf}}}$$
observe that `$\xb\xb\xb$', `$\xb\xd$', and `$\xd\xb$' all map to the
word 111.
node in a code tree is assigned a {\sl weight}. For a leaf node, the
is the frequency of its symbol; for an inner node, it is the sum of the
of its subtrees. Code trees can be annotated with their weights:\strut
$\vcenter{\hbox{\includegraphics[scale=1.25]%
{tree-abcd-unbalanced-weighted.pdf}}} \qquad\qquad \vcenter{\hbox{\includegraphics[scale=1.25]%
{tree-abcd-balanced-weighted.pdf}}}$$
our purposes, it is sufficient to consider only full binary trees (trees
inner nodes all have two children). This is because any inner node with
one child can advantageously be eliminated; for example,\strut
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abc-non-full.pdf}}} \qquad\hbox{becomes} \qquad \vcenter{\hbox{\includegraphics[scale=1.25]{tree-abc-full.pdf}}}$$ ›
subsection‹Huffman's Algorithm›
text‹
Huffman \cite{huffman-1952} discovered a simple algorithm for
an optimum code tree for specified symbol frequencies:
a forest consisting of only leaf nodes, one for each symbol in the
, taking the given symbol frequencies as initial weights for the nodes.
pick the two trees
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1.pdf}}} \qquad\hbox{and} \qquad \vcenter{\hbox{\includegraphics[scale=1.25]{tree-w2.pdf}}}$$
noindent\strut
the lowest weights and replace them with the tree
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2.pdf}}}$$
this process until only one tree is left.
an illustration, executing the algorithm for the frequencies
$f_{\xd} = 3,\,\; f_{\xe} = 11,\,\; f_{\xf} = 5,\,\; f_{\xs} = 7,\,\;
f_{\xz} = 2$$
rise to the following sequence of states:\strut
smallskip
noindent
~(5) is an optimum tree for the given frequencies. ›
subsection‹The Textbook Proof›
text‹
does the algorithm work? In his article, Huffman gave some motivation but
real proof. For a proof sketch, we turn to Donald Knuth
cite[p.~403--404]{knuth-1997}:
begin{quote}
is not hard to prove that this method does in fact minimize the weighted
length (i.e., $\sum_a w_a \vthinspace\delta_a$), by induction on $m$.
we have $w_1 \le w_2 \le w_3 \le\cdots\le w_m$, where $m \ge 2$, and
that we are given a tree that minimizes the weighted path length.
Such a tree certainly exists, since only finitely many binary trees with $m$
nodes are possible.) Let $V$ be an internal node of maximum distance
the root. If $w_1$ and $w_2$ are not the weights already attached to the
of $V$, we can interchange them with the values that are already
; such an interchange does not increase the weighted path length. Thus
is a tree that minimizes the weighted path length and contains the \strut
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}$$
it is easy to prove that the weighted path length of such a tree is
if and only if the tree with
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}} \qquad\hbox{replaced by} \qquad \vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-plus-w2.pdf}}}$$
minimum path length for the weights $w_1 + w_2$, $w_3$, $\ldots\,$, $w_m$.
end{quote}
noindent
is, however, a small oddity in this proof: It is not clear why we must
the existence of an optimum tree that contains the subtree
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}$$
, the formalization works without it.
et al.\ \cite[p.~385--391]{cormen-et-al-2001} provide a very similar
, articulated around the following propositions:
begin{quote}
textsl{\textbf{Lemma 16.2}} \\
$C$ be an alphabet in which each character $c \in C$ has frequency $f[c]$.
$x$ and $y$ be two characters in $C$ having the lowest frequencies. Then
exists an optimal prefix code for $C$ in which the codewords for $x$ and
y$ have the same length and differ only in the last bit.
medskip
textsl{\textbf{Lemma 16.3}} \\
$C$ be a given alphabet with frequency $f[c]$ defined for each character
c \in C$. Let $x$ and $y$ be two characters in $C$ with minimum frequency. Let
C'$ be the alphabet $C$ with characters $x$, $y$ removed and (new) character
z$ added, so that $C' = C - \{x, y\}\cup {\{z\}}$; define $f$ for $C'$ as for
C$, except that $f[z] = f[x] + f[y]$. Let $T'$ be any tree representing an
prefix code for the alphabet $C'$. Then the tree $T$, obtained from
T'$ by replacing the leaf node for $z$ with an internal node having $x$ and
y$ as children, represents an optimal prefix code for the alphabet $C$.
text‹
document presents a formalization of the proof of Huffman's algorithm
using Isabelle/HOL \cite{nipkow-et-al-2008}. Our proof is based on the
proofs given by Knuth and Cormen et al. The development was done
of Laurent Th\'ery's Coq proof \cite{thery-2003,thery-2004},
through its ``cover'' concept represents a considerable departure from
textbook proof.
development consists of a little under 100 lemmas and theorems. Most of
have very short proofs thanks to the extensive use of simplification
and custom induction rules. The remaining proofs are written using the
proof format Isar \cite{wenzel-2008}. ›
section‹Definition of Prefix Code Trees and Forests \label{trees-and-forests}›
subsection‹Tree Type›
text‹
{\sl prefix code tree\/} is a full binary tree in which leaf nodes are of the
@{term "Leaf w a"}, where @{term a} is a symbol and @{term w} is the
associated with @{term a}, and inner nodes are of the form
{term "Node w t1 t2"}, where @{term t1} and @{term t2} are the left and
subtrees and @{term w} caches the sum of the weights of @{term t1} and
{term t2}. Prefix code trees are polymorphic on the symbol datatype~@{typ 'a}. ›
datatype 'a tree =
Leaf nat 'a
| Node nat "('a tree)""('a tree)"
subsection‹Forest Type›
text‹
intermediate steps of Huffman's algorithm involve a list of prefix code
, or {\sl prefix code forest}. ›
type_synonym 'a forest = "'a tree list"
subsection‹Alphabet›
text‹
{\sl alphabet\/} of a code tree is the set of symbols appearing in the
's leaf nodes. ›
primrec alphabet :: "'a tree ==> 'a set"where "alphabet (Leaf w a) = {a}" | "alphabet (Node w t1 t2) = alphabet t1∪ alphabet t2"
text‹
sets and predicates, Isabelle gives us the choice between inductive
(\isakeyword{inductive\_set} and \isakeyword{inductive}) and
functions (\isakeyword{primrec}, \isakeyword{fun}, and
isakeyword{function}). In this development, we consistently favor recursion
induction, for two reasons:
begin{myitemize}
item Recursion gives rise to simplification rules that greatly help automatic
tactics. In contrast, reasoning about inductively defined sets and
involves introduction and elimination rules, which are more clumsy
simplification rules.
item Isabelle's counterexample generator \isakeyword{quickcheck}
cite{berghofer-nipkow-2004}, which we used extensively during the top-down
of the proof (together with \isakeyword{sorry}), has better support
recursive definitions.
end{myitemize}
alphabet of a forest is defined as the union of the alphabets of the trees
compose it. Although Isabelle supports overloading for non-overlapping
, we avoid many type inference problems by attaching an \raise.3ex\hbox{‹F›}' subscript to the forest generalizations of
defined on trees. ›
text‹
are central to our proofs, and we need the following basic facts
them. ›
lemma finite_alphabet[simp]: "finite (alphabet t)" by (induct t) auto
lemma exists_in_alphabet: "∃a. a ∈ alphabet t" by (induct t) auto
subsection‹Consistency›
text‹
tree is {\sl consistent\/} if for each inner node the alphabets of the two
are disjoint. Intuitively, this means that every symbol in the
occurs in exactly one leaf node. Consistency is a sufficient condition
$\delta_a$ (the length of the {\sl unique\/} code word for $a$) to be
. Although this well\-formed\-ness property is not mentioned in algorithms \cite{aho-et-al-1983,cormen-et-al-2001,knuth-1997}, it is essential
appears as an assumption in many of our lemmas. ›
primrec consistent :: "'a tree ==> bool"where "consistent (Leaf w a) = True" | "consistent (Node w t1 t2) = (alphabet t1∩ alphabet t2 = {} ∧ consistent t1∧ consistent t2)"
text‹
of our proofs are by structural induction on consistent trees $t$ and
one symbol $a$. These proofs typically distinguish the following cases.
begin{myitemize}
item[] {\sc Base case:}\enspace $t = @{term "Leaf w b"}$.
item[] {\sc Induction step:}\enspace $t = @{term "Node w t1 t2"}$.
item[] \noindent\kern\leftmargin {\sc Subcase 1:}\enspace $a$ belongs to
@{term t1} but not to @{term t2}.
item[] \noindent\kern\leftmargin {\sc Subcase 2:}\enspace $a$ belongs to
@{term t2} but not to @{term t1}.
item[] \noindent\kern\leftmargin {\sc Subcase 3:}\enspace $a$ belongs to
neither @{term t1} nor @{term t2}.
end{myitemize}
noindent
to the consistency assumption, we can rule out the subcase where $a$
to both subtrees.
of performing the above case distinction manually, we encode it in a
induction rule. This saves us from writing repetitive proof scripts and
Isabelle's automatic proof tactics. ›
lemma tree_induct_consistent[consumes 1, case_names base step1 step2 step3]: "[consistent t; ∧wb b a. P (Leaf wb b) a; ∧w t1 t2 a. [consistent t1; consistent t2; alphabet t1∩ alphabet t2 = {}; a ∈ alphabet t1; a ∉ alphabet t2; P t1 a; P t2 a]==> P (Node w t1 t2) a; ∧w t1 t2 a. [consistent t1; consistent t2; alphabet t1∩ alphabet t2 = {}; a ∉ alphabet t1; a ∈ alphabet t2; P t1 a; P t2 a]==> P (Node w t1 t2) a; ∧w t1 t2 a. [consistent t1; consistent t2; alphabet t1∩ alphabet t2 = {}; a ∉ alphabet t1; a ∉ alphabet t2; P t1 a; P t2 a]==> P (Node w t1 t2) a]==> P t a"
txt‹
proof relies on the \textit{induction\_schema} and
textit{lexicographic\_order} tactics, which automate the most tedious
of deriving induction rules. The alternative would have been to perform
standard structural induction on @{term t} and proceed by cases, which is
but long-winded. ›
text‹ \textit{induction\_schema} tactic reduces the putative induction rule to
proof obligations.
, it reuses the machinery that constructs the default induction
. The resulting proof obligations concern (a)~case completeness,
b)~invariant preservation (in our case, tree consistency), and
c)~wellfoundedness. For @{thm [source] tree_induct_consistent}, the obligations
a)~and (b) can be discharged using
's simplifier and classical reasoner, whereas (c) requires a single
of \textit{lexicographic\_order}, a tactic that was originally
to prove termination of recursive functions
cite{bulwahn-et-al-2007,krauss-2007,krauss-2009}. ›
subsection‹Symbol Depths›
text‹
{\sl depth\/} of a symbol (which we denoted by $\delta_a$ in
~\ref{binary-codes}) is the length of the path from the root to the
node labeled with that symbol, or equivalently the length of the code word
the symbol. Symbols that do not occur in the tree or that occur at the root
a one-node tree have depth 0. If a symbol occurs in several leaf nodes (which
happen with inconsistent trees), the depth is arbitrarily defined in terms
the leftmost node labeled with that symbol. ›
primrec depth :: "'a tree ==> 'a ==> nat"where "depth (Leaf w b) a = 0" | "depth (Node w t1 t2) a = (if a ∈ alphabet t1 then depth t1 a + 1 else if a ∈ alphabet t2 then depth t2 a + 1 else 0)"
text‹
definition may seem very inefficient from a functional programming
of view, but it does not matter, because unlike Huffman's algorithm, the
{const depth} function is merely a reasoning tool and is never actually
. ›
subsection‹Height›
text‹
{\sl height\/} of a tree is the length of the longest path from the root to
leaf node, or equivalently the length of the longest code word. This is
generalized to forests by taking the maximum of the trees' heights. Note
a tree has height 0 if and only if it is a leaf node, and that a forest has
0 if and only if all its trees are leaf nodes. ›
primrec height :: "'a tree ==> nat"where "height (Leaf w a) = 0" | "height (Node w t1 t2) = max (height t1) (height t2) + 1"
text‹
depth of any symbol in the tree is bounded by the tree's height, and there
a symbol with a depth equal to the height. ›
lemma depth_le_height: "depth t a ≤ height t" by (induct t) auto
lemma exists_at_height: "consistent t ==>∃a ∈ alphabet t. depth t a = height t" proof (induct t) case Leaf thus ?caseby simp next case (Node w t1 t2) note hyps = Node let ?t = "Node w t1 t2" from hyps obtain b where b: "b ∈ alphabet t1""depth t1 b = height t1"by auto from hyps obtain c where c: "c ∈ alphabet t2""depth t2 c = height t2"by auto let ?a = "if height t1≥ height t2 then b else c" from b c have"?a ∈ alphabet ?t""depth ?t ?a = height ?t" using‹consistent ?t›by auto thus"∃a ∈ alphabet ?t. depth ?t a = height ?t" .. qed
text‹
following elimination rules help Isabelle's classical prover, notably the
textit{auto} tactic. They are easy consequences of the inequation
{thm depth_le_height[no_vars]}. ›
lemma depth_max_heightE_left[elim!]: "[depth t1 a = max (height t1) (height t2); [depth t1 a = height t1; height t1≥ height t2]==> P]==> P" by (cut_tac t = t1and a = a in depth_le_height) simp
lemma depth_max_heightE_right[elim!]: "[depth t2 a = max (height t1) (height t2); [depth t2 a = height t2; height t2≥ height t1]==> P]==> P" by (cut_tac t = t2and a = a in depth_le_height) simp
text‹
also need the following lemma. ›
lemma height_gt_0_alphabet_eq_imp_height_gt_0: assumes"height t > 0""consistent t""alphabet t = alphabet u" shows"height u > 0" proof (cases t) case Leaf thus ?thesis using assms by simp next case (Node w t1 t2) note t = Node from exists_in_alphabet obtain b where b: "b ∈ alphabet t1" .. from exists_in_alphabet obtain c where c: "c ∈ alphabet t2" .. from b c have bc: "b ≠ c"using t ‹consistent t›by fastforce show ?thesis proof (cases u) case Leaf thus ?thesis using b c bc t assms by auto next case Node thus ?thesis by simp qed qed
subsection‹Symbol Frequencies›
text‹
{\sl frequency\/} of a symbol (which we denoted by $w_a$ in
~\ref{binary-codes}) is the sum of the weights attached to the
nodes labeled with that symbol. If the tree is consistent, the sum
at most one nonzero term. The frequency is then the weight of the leaf
labeled with the symbol, or 0 if there is no such node. The generalization
forests is straightforward. ›
primrec freq :: "'a tree ==> 'a ==> nat"where "freq (Leaf w a) b = (if b = a then w else 0)" | "freq (Node w t1 t2) b = freq t1 b + freq t2 b"
primrec freqF :: "'a forest ==> 'a ==> nat"where "freqF [] b = 0" | "freqF (t # ts) b = freq t b + freqF ts b"
text‹
and symbol frequencies are intimately related. Simplification rules
that sums of the form @{term "freq t1 a + freq t2 a"} collapse to a
term when we know which tree @{term a} belongs to. ›
lemma notin_alphabet_imp_freq_0[simp]: "a ∉ alphabet t ==> freq t a = 0" by (induct t) simp+
lemma notin_alphabetF_imp_freqF_0[simp]: "a ∉ alphabetF ts ==> freqF ts a = 0" by (induct ts) simp+
lemma freq_0_right[simp]: "[alphabet t1∩ alphabet t2 = {}; a ∈ alphabet t1]==> freq t2 a = 0" by (auto intro: notin_alphabet_imp_freq_0 simp: disjoint_iff_not_equal)
lemma freq_0_left[simp]: "[alphabet t1∩ alphabet t2 = {}; a ∈ alphabet t2]==> freq t1 a = 0" by (auto simp: disjoint_iff_not_equal)
text‹
trees are {\em comparable} if they have the same alphabet and symbol
. This is an important concept, because it allows us to state not
that the tree constructed by Huffman's algorithm is optimal but also that
has the expected alphabet and frequencies.
close this section with a more technical lemma. ›
lemma heightF_0_imp_Leaf_freqF_in_set: "[consistentF ts; heightF ts = 0; a ∈ alphabetF ts]==> Leaf (freqF ts a) a ∈ set ts" proof (induct ts) case Nil thus ?caseby simp next case (Cons t ts) show ?caseusing Cons by (cases t) auto qed
subsection‹Weight›
text‹
@{term weight} function returns the weight of a tree. In the
{const Node} case, we ignore the weight cached in the node and instead
the tree's weight recursively. This makes reasoning simpler because we
then avoid specifying cache correctness as an assumption in our lemmas. ›
primrec weight :: "'a tree ==> nat"where "weight (Leaf w a) = w" | "weight (Node w t1 t2) = weight t1 + weight t2"
text‹
weight of a tree is the sum of the frequencies of its symbols.
myskip
noindent
isacommand{lemma} ‹weight_eq_Sum_freq›: \\ \isachardoublequoteopen}$\displaystyle‹consistent t ==> weight t› =
!\!\sum_{a\in @{term "alphabet t"}}^{\phantom{.}}\!\! @{term "freq t a"}$% \isachardoublequoteclose}
vskip-\myskipamount ›
(*<*) lemma weight_eq_Sum_freq: "consistent t ==> weight t = (∑a ∈ alphabet t. freq t a)" (*>*) by (induct t) (auto simp: sum.union_disjoint)
text‹
assumption @{term "consistent t"} is not necessary, but it simplifies the
by letting us invoke the lemma @{thm [source] sum.union_disjoint}:
$‹[finite A; finite B; A ∩ B = {}]==>›~\!\sum_{x\in A} @{term "g x"}
vthinspace \mathrel{+} \sum_{x\in B} @{term "g x"}\vthinspace = % \!\!\sum_{x\in A \cup B}\! @{term "g x"}.$$ ›
subsection‹Cost›
text‹
{\sl cost\/} of a consistent tree, sometimes called the {\sl weighted path
}, is given by the sum $\sum_{a \in @{term "alphabet t"}\,}
{term "freq t a"} \mathbin{‹*›} @{term "depth t a"}$
which we denoted by $\sum_a w_a \vthinspace\delta_a$ in
~\ref{binary-codes}). It obeys a simple recursive law. ›
primrec cost :: "'a tree ==> nat"where "cost (Leaf w a) = 0" | "cost (Node w t1 t2) = weight t1 + cost t1 + weight t2 + cost t2"
text‹
interpretation of this recursive law is that the cost of a tree is the sum
the weights of its inner nodes \cite[p.~405]{knuth-1997}. (Recall that
@{term "weight (Node w t1 t2)"} = @{term "weight t1 + weight t2"}$.) Since
cost of a tree is such a fundamental concept, it seems necessary to prove
the above function definition is correct.
myskip
noindent
isacommand{theorem} ‹cost_eq_Sum_freq_mult_depth›: \\ \isachardoublequoteopen}$\displaystyle‹consistent t ==> cost t› =
!\!\sum_{a\in @{term "alphabet t"}}^{\phantom{.}}\!\!
{term "freq t a * depth t a"}$% \isachardoublequoteclose}
myskip
noindent
proof is by structural induction on $t$. If $t = @{term "Leaf w b"}$, both
of the equation simplify to 0. This leaves the case $@{term t} =
{term "Node w t1 t2"}$. Let $A$, $A_1$, and $A_2$ stand for
{term "alphabet t"}, @{term "alphabet t1"}, and @{term "alphabet t2"},
. We have
$\begin{tabularx}{\textwidth}{@% \hskip\leftmargin}cX@%
}}
& @{term "cost t"} \\
eq & \justif{definition of @{const cost}} \\
& $@{term "weight t1 + cost t1 + weight t2 + cost t2"}$ \\
eq & \justif{induction hypothesis} \\
& $@{term "weight t1"} \mathrel{+} \sum_{a\in A_1\,} @{term "freq t1 a * depth t1 a"} \mathrel{+} {}$ \\
& $@{term "weight t2"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t2 a * depth t2 a"}$ \\
eq & \justif{definition of @{const depth}, consistency} \\[\extrah]
& $@{term "weight t1"} \mathrel{+} \sum_{a\in A_1\,} @{term "freq t1 a * (depth t a - 1)"} \mathrel{+}
{}$ \\
& $@{term "weight t2"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t2 a * (depth t a - 1)"}$ \\[\extrah]
eq & \justif{distributivity of ‹*› and $\sum$ over $-$} \\[\extrah]
& $@{term "weight t1"} \mathrel{+} \sum_{a\in A_1\,} @{term "freq t1 a * depth t a"} \mathrel{-} \sum_{a\in A_1\,} @{term "freq t1 a"} \mathrel{+} {}$ \\
& $@{term "weight t2"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t2 a * depth t a"} \mathrel{-} \sum_{a\in A_2\,} @{term "freq t2 a"}$ \\[\extrah]
eq & \justif{@{thm [source] weight_eq_Sum_freq}} \\[\extrah]
& $\sum_{a\in A_1\,} @{term "freq t1 a * depth t a"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t2 a * depth t a"}$ \\[\extrah]
eq & \justif{definition of @{const freq}, consistency} \\[\extrah]
& $\sum_{a\in A_1\,} @{term "freq t a * depth t a"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t a * depth t a"}$ \\[\extrah]
eq & \justif{@{thm [source] sum.union_disjoint}, consistency} \\
& $\sum_{a\in A_1\cup A_2\,} @{term "freq t a * depth t a"}$ \\
eq & \justif{definition of @{const alphabet}} \\
& $\sum_{a\in A\,} @{term "freq t a * depth t a"}$.
end{tabularx}$$
noindent
structured proof closely follows this argument. ›
(*<*) theorem cost_eq_Sum_freq_mult_depth: "consistent t ==> cost t = (∑a ∈ alphabet t. freq t a * depth t a)" (*>*) proof (induct t) case Leaf thus ?caseby simp next case (Node w t1 t2) let ?t = "Node w t1 t2" let ?A = "alphabet ?t"and ?A1 = "alphabet t1"and ?A2 = "alphabet t2" note c = ‹consistent ?t› note hyps = Node have d2: "∧a. [?A1∩ ?A2 = {}; a ∈ ?A2]==> depth ?t a = depth t2 a + 1" by auto have"cost ?t = weight t1 + cost t1 + weight t2 + cost t2"by simp alsohave"… = weight t1 + (∑a ∈ ?A1. freq t1 a * depth t1 a) + weight t2 + (∑a ∈ ?A2. freq t2 a * depth t2 a)" using hyps by simp alsohave"… = weight t1 + (∑a ∈ ?A1. freq t1 a * (depth ?t a - 1)) + weight t2 + (∑a ∈ ?A2. freq t2 a * (depth ?t a - 1))" using c d2by simp alsohave"… = weight t1 + (∑a ∈ ?A1. freq t1 a * depth ?t a) - (∑a ∈ ?A1. freq t1 a) + weight t2 + (∑a ∈ ?A2. freq t2 a * depth ?t a) - (∑a ∈ ?A2. freq t2 a)" using c d2by (simp add: sum.distrib) alsohave"… = (∑a ∈ ?A1. freq t1 a * depth ?t a) + (∑a ∈ ?A2. freq t2 a * depth ?t a)" using c by (simp add: weight_eq_Sum_freq) alsohave"… = (∑a ∈ ?A1. freq ?t a * depth ?t a) + (∑a ∈ ?A2. freq ?t a * depth ?t a)" using c by auto alsohave"… = (∑a ∈ ?A1∪ ?A2. freq ?t a * depth ?t a)" using c by (simp add: sum.union_disjoint) alsohave"… = (∑a ∈ ?A. freq ?t a * depth ?t a)"by simp finallyshow ?case . qed
text‹
, it should come as no surprise that trees with height 0 have cost 0. ›
lemma height_0_imp_cost_0[simp]: "height t = 0 ==> cost t = 0" by (case_tac t) simp+
subsection‹Optimality›
text‹
tree is optimum if and only if its cost is not greater than that of any
tree. We can ignore inconsistent trees without loss of generality. ›
definition optimum :: "'a tree ==> bool"where "optimum t = (∀u. consistent u ⟶ alphabet t = alphabet u ⟶ freq t = freq u ⟶ cost t ≤ cost u)"
section‹Functional Implementation of Huffman's Algorithm \label{implementation}›
subsection‹Cached Weight›
text‹
{\sl cached weight\/} of a node is the weight stored directly in the node.
arguments rely on the computed weight (embodied by the @{const weight}
) rather than the cached weight, but the implementation of Huffman's
uses the cached weight for performance reasons. ›
primrec cachedWeight :: "'a tree ==> nat"where "cachedWeight (Leaf w a) = w" | "cachedWeight (Node w t1 t2) = w"
text‹
cached weight of a leaf node is identical to its computed weight. ›
lemma height_0_imp_cachedWeight_eq_weight[simp]: "height t = 0 ==> cachedWeight t = weight t" by (case_tac t) simp+
subsection‹Tree Union›
text‹
implementation of Huffman's algorithm builds on two additional auxiliary
. The first one, ‹uniteTrees›, takes two trees
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1.pdf}}} \qquad\hbox{and} \qquad \vcenter{\hbox{\includegraphics[scale=1.25]{tree-w2.pdf}}}$$
noindent
returns the tree\strut
$\includegraphics[scale=1.25]{tree-w1-w2.pdf}$$
vskip-.5\myskipamount ›
definition uniteTrees :: "'a tree ==> 'a tree ==> 'a tree"where "uniteTrees t1 t2 = Node (cachedWeight t1 + cachedWeight t2) t1 t2"
text‹
alphabet, consistency, and symbol frequencies of a united tree are easy to
to the homologous properties of the subtrees. ›
lemma freq_uniteTrees[simp]: "freq (uniteTrees t1 t2) a = freq t1 a + freq t2 a" by (simp add: uniteTrees_def)
subsection‹Ordered Tree Insertion›
text‹
auxiliary function ‹insortTree› inserts a tree into a forest sorted
cached weight, preserving the sort order. ›
primrec insortTree :: "'a tree ==> 'a forest ==> 'a forest"where "insortTree u [] = [u]" | "insortTree u (t # ts) = (if cachedWeight u ≤ cachedWeight t then u # t # ts else t # insortTree u ts)"
text‹
resulting forest contains one more tree than the original forest. Clearly,
cannot be empty. ›
lemma length_insortTree[simp]: "length (insortTree t ts) = length ts + 1" by (induct ts) simp+
lemma insortTree_ne_Nil[simp]: "insortTree t ts ≠ []" by (case_tac ts) simp+
text‹
alphabet, consistency, symbol frequencies, and height of a forest after
are easy to relate to the homologous properties of the original
and the inserted tree. ›
lemma alphabetF_insortTree[simp]: "alphabetF (insortTree t ts) = alphabet t ∪ alphabetF ts" by (induct ts) auto
lemma consistentF_insortTree[simp]: "consistentF (insortTree t ts) = consistentF (t # ts)" by (induct ts) auto
lemma freqF_insortTree[simp]: "freqF (insortTree t ts) = (λa. freq t a + freqF ts a)" by (induct ts) (simp add: ext)+
lemma heightF_insortTree[simp]: "heightF (insortTree t ts) = max (height t) (heightF ts)" by (induct ts) auto
subsection‹The Main Algorithm›
text‹
's algorithm repeatedly unites the first two trees of the forest it
as argument until a single tree is left. It should initially be
with a list of leaf nodes sorted by weight. Note that it is not defined
the empty list. ›
text‹
time complexity of the algorithm is quadratic in the size of the forest.
we eliminated the inner node's cached weight component, and instead
the weight each time it is needed, the complexity would remain
, but with a larger constant. Using a binary search in @{const
}, the corresponding imperative algorithm is $O(n \log n)$ if we keep
weight cache and $O(n^2)$ if we drop it. An $O(n)$ imperative implementation
possible by maintaining two queues, one containing the unprocessed leaf nodes
the other containing the combined trees \cite[p.~404]{knuth-1997}.
tree returned by the algorithm preserves the alphabet, consistency, and
frequencies of the original forest. ›
theorem alphabet_huffman[simp]: "ts ≠ [] ==> alphabet (huffman ts) = alphabetF ts" by (induct ts rule: huffman.induct) auto
theorem freq_huffman[simp]: "ts ≠ [] ==> freq (huffman ts) a = freqF ts a" by (induct ts rule: huffman.induct) auto
section‹Definition of Auxiliary Functions Used in the Proof \label{auxiliary}›
subsection‹Sibling of a Symbol›
text‹
{\sl sibling\/} of a symbol $a$ in a tree $t$ is the label of the node that
the (left or right) sibling of the node labeled with $a$ in $t$. If the
$a$ is not in $t$'s alphabet or it occurs in a node with no sibling
, we define the sibling as being $a$ itself; this gives us the nice property
if $t$ is consistent, then $@{term "sibling t a"} \not= a$ if and only if
a$ has a sibling. As an illustration, we have
@{term "sibling t a"} = b$,\vthinspace{} $@{term "sibling t b"} = a$,
$@{term "sibling t c"} = c$ for the tree\strut
$t \,= \vcenter{\hbox{\includegraphics[scale=1.25]{tree-sibling.pdf}}}$$ ›
fun sibling :: "'a tree ==> 'a ==> 'a"where "sibling (Leaf wb b) a = a" | "sibling (Node w (Leaf wb b) (Leaf wc c)) a = (if a = b then c else if a = c then b else a)" | "sibling (Node w t1 t2) a = (if a ∈ alphabet t1 then sibling t1 a else if a ∈ alphabet t2 then sibling t2 a else a)"
text‹
@{const sibling} is defined using sequential pattern matching
cite{krauss-2007,krauss-2009}, reasoning about it can become tedious.
rules therefore play an important role. ›
lemma notin_alphabet_imp_sibling_id[simp]: "a ∉ alphabet t ==> sibling t a = a" by (cases rule: sibling.cases[where x = "(t, a)"]) simp+
lemma height_0_imp_sibling_id[simp]: "height t = 0 ==> sibling t a = a" by (case_tac t) simp+
lemma height_gt_0_in_alphabet_imp_sibling_left[simp]: "[height t1 > 0; a ∈ alphabet t1]==> sibling (Node w t1 t2) a = sibling t1 a" by (case_tac t1) simp+
lemma height_gt_0_in_alphabet_imp_sibling_right[simp]: "[height t2 > 0; a ∈ alphabet t1]==> sibling (Node w t1 t2) a = sibling t1 a" by (case_tac t2) simp+
lemma height_gt_0_notin_alphabet_imp_sibling_left[simp]: "[height t1 > 0; a ∉ alphabet t1]==> sibling (Node w t1 t2) a = sibling t2 a" by (case_tac t1) simp+
lemma height_gt_0_notin_alphabet_imp_sibling_right[simp]: "[height t2 > 0; a ∉ alphabet t1]==> sibling (Node w t1 t2) a = sibling t2 a" by (case_tac t2) simp+
lemma either_height_gt_0_imp_sibling[simp]: "height t1 > 0 ∨ height t2 > 0 ==> sibling (Node w t1 t2) a = (if a ∈ alphabet t1 then sibling t1 a else sibling t2 a)" by auto
text‹
following rules are also useful for reasoning about siblings and alphabets. ›
lemma in_alphabet_imp_sibling_in_alphabet: "a ∈ alphabet t ==> sibling t a ∈ alphabet t" by (induct t a rule: sibling.induct) auto
lemma sibling_ne_imp_sibling_in_alphabet: "sibling t a ≠ a ==> sibling t a ∈ alphabet t" using in_alphabet_imp_sibling_in_alphabet by force
text‹
default induction rule for @{const sibling} distinguishes four cases.
begin{myitemize}
item[] {\sc Base case:}\enskip $t = @{term "Leaf w b"}$.
item[] {\sc Induction step 1:}\enskip
$t = @{term "Node w (Leaf wb b) (Leaf wc c)"}$.
item[] {\sc Induction step 2:}\enskip
$t = @{term "Node w (Node w1 t11 t12) t2"}$.
item[] {\sc Induction step 3:}\enskip
$t = @{term "Node w t1 (Node w2 t21 t22)"}$.
end{myitemize}
noindent
rule leaves much to be desired. First, the last two cases overlap and
normally be handled the same way, so they should be combined. Second, the ‹Node› constructors in the last two cases reduce readability.
, under the assumption that $t$ is consistent, we would like to perform
same case distinction on $a$ as we did for
{thm [source] tree_induct_consistent}, with the same benefits for automation.
observations lead us to develop a custom induction rule that
the following cases.
begin{myitemize}
item[] {\sc Base case:}\enskip $t = @{term "Leaf w b"}$.
item[] {\sc Induction step 1:}\enskip
$t = @{term "Node w (Leaf wb b) (Leaf wc c)"}$ with
@{prop "b ≠ c"}.
item[] \begin{flushleft}
{\sc Induction step 2:}\enskip
$t = @{term "Node w t1 t2"}$ and either @{term t1} or @{term t2}
has nonzero height. \end{flushleft}
item[] \noindent\kern\leftmargin {\sc Subcase 1:}\enspace $a$ belongs to
@{term t1} but not to @{term t2}.
item[] \noindent\kern\leftmargin {\sc Subcase 2:}\enspace $a$ belongs to
@{term t2} but not to @{term t1}.
item[] \noindent\kern\leftmargin {\sc Subcase 3:}\enspace $a$ belongs to
neither @{term t1} nor @{term t2}.
end{myitemize}
statement of the rule and its proof are similar to what we did for
trees, the main difference being that we now have two induction
instead of one. ›
lemma sibling_induct_consistent[consumes 1,
case_names base step1 step21 step22 step23]: "[consistent t; ∧w b a. P (Leaf w b) a; ∧w wb b wc c a. b ≠ c ==> P (Node w (Leaf wb b) (Leaf wc c)) a; ∧w t1 t2 a. [consistent t1; consistent t2; alphabet t1∩ alphabet t2 = {}; height t1 > 0 ∨ height t2 > 0; a ∈ alphabet t1; sibling t1 a ∈ alphabet t1; a ∉ alphabet t2; sibling t1 a ∉ alphabet t2; P t1 a]==> P (Node w t1 t2) a; ∧w t1 t2 a. [consistent t1; consistent t2; alphabet t1∩ alphabet t2 = {}; height t1 > 0 ∨ height t2 > 0; a ∉ alphabet t1; sibling t2 a ∉ alphabet t1; a ∈ alphabet t2; sibling t2 a ∈ alphabet t2; P t2 a]==> P (Node w t1 t2) a; ∧w t1 t2 a. [consistent t1; consistent t2; alphabet t1∩ alphabet t2 = {}; height t1 > 0 ∨ height t2 > 0; a ∉ alphabet t1; a ∉ alphabet t2]==> P (Node w t1 t2) a]==> P t a" apply rotate_tac apply induction_schema apply atomize_elim apply (case_tac t, simp) apply clarsimp apply (metis One_nat_def add_is_0 alphabet.simps(1) bot_nat_0.not_eq_extremum disjoint_iff
exists_at_height height.simps(2) in_alphabet_imp_sibling_in_alphabet nat.simps(3)
tree.exhaust) by lexicographic_order
text‹
custom induction rule allows us to prove new properties of @{const sibling}
little effort. ›
lemma sibling_sibling_id[simp]: "consistent t ==> sibling t (sibling t a) = a" by (induct t a rule: sibling_induct_consistent) simp+
lemma sibling_reciprocal: "[consistent t; sibling t a = b]==> sibling t b = a" by auto
lemma depth_height_imp_sibling_ne: "[consistent t; depth t a = height t; height t > 0; a ∈ alphabet t]==> sibling t a ≠ a" by (induct t a rule: sibling_induct_consistent) auto
lemma depth_sibling[simp]: "consistent t ==> depth t (sibling t a) = depth t a" by (induct t a rule: sibling_induct_consistent) simp+
subsection‹Leaf Interchange›
text‹ ‹swapLeaves› function takes a tree $t$ together with two symbols
a$, $b$ and their frequencies $@{term wa}$, $@{term wb}$, and returns the tree
t$ in which the leaf nodes labeled with $a$ and $b$ are exchanged. When ‹swapLeaves›, we normally pass @{term "freq t a"} and
{term "freq t b"} for @{term wa} and @{term wb}.
that we do not bother updating the cached weight of the ancestor nodes
performing the interchange. The cached weight is used only in the
of Huffman's algorithm, which does not invoke ‹swapLeaves›. ›
primrec swapLeaves :: "'a tree ==> nat ==> 'a ==> nat ==> 'a ==> 'a tree"where "swapLeaves (Leaf wc c) wa a wb b = (if c = a then Leaf wb b else if c = b then Leaf wa a else Leaf wc c)" | "swapLeaves (Node w t1 t2) wa a wb b = Node w (swapLeaves t1 wa a wb b) (swapLeaves t2 wa a wb b)"
text‹
a symbol~$a$ with itself leaves the tree $t$ unchanged if $a$ does not
to it or if the specified frequencies @{term wa} and @{term wb} equal
{term "freq t a"}. ›
lemma swapLeaves_id_when_notin_alphabet[simp]: "a ∉ alphabet t ==> swapLeaves t w a w' a = t" by (induct t) simp+
lemma swapLeaves_id[simp]: "consistent t ==> swapLeaves t (freq t a) a (freq t a) a = t" by (induct t a rule: tree_induct_consistent) simp+
text‹
alphabet, consistency, symbol depths, height, and symbol frequencies of the
@{term "swapLeaves t wa a wb b"} can be related to the homologous
of $t$. ›
lemma alphabet_swapLeaves: "alphabet (swapLeaves t wa a wb b) = (if a ∈ alphabet t then if b ∈ alphabet t then alphabet t else (alphabet t - {a}) ∪ {b} else if b ∈ alphabet t then (alphabet t - {b}) ∪ {a} else alphabet t)" by (induct t) auto
lemma consistent_swapLeaves[simp]: "consistent t ==> consistent (swapLeaves t wa a wb b)" by (induct t) (auto simp: alphabet_swapLeaves)
lemma depth_swapLeaves_neither[simp]: "[consistent t; c ≠ a; c ≠ b]==> depth (swapLeaves t wa a wb b) c = depth t c" by (induct t a rule: tree_induct_consistent) (auto simp: alphabet_swapLeaves)
lemma height_swapLeaves[simp]: "height (swapLeaves t wa a wb b) = height t" by (induct t) simp+
lemma freq_swapLeaves[simp]: "[consistent t; a ≠ b]==> freq (swapLeaves t wa a wb b) = (λc. if c = a then if b ∈ alphabet t then wa else 0 else if c = b then if a ∈ alphabet t then wb else 0 else freq t c)" apply (rule ext) apply (induct t) by auto
text‹
the lemmas concerned with the resulting tree's weight and cost, we avoid
on natural numbers by rearranging terms. For example, we write
$@{prop "weight (swapLeaves t wa a wb b) + freq t a = weight t + wb"}$$
noindent
than the more conventional
$@{prop "weight (swapLeaves t wa a wb b) = weight t + wb - freq t a"}.$$
Isabelle/HOL, these two equations are not equivalent, because by definition
m - n = 0$ if $n > m$. We could use the second equation and additionally
that @{prop "weight t ≥ freq t a"} (an easy consequence of
{thm [source] weight_eq_Sum_freq}), and then apply the \textit{arith}
, but it is much simpler to use the first equation and stay with
textit{simp} and \textit{auto}. Another option would be to use
instead of natural numbers. ›
lemma weight_swapLeaves: "[consistent t; a ≠ b]==> if a ∈ alphabet t then if b ∈ alphabet t then weight (swapLeaves t wa a wb b) + freq t a + freq t b = weight t + wa + wb else weight (swapLeaves t wa a wb b) + freq t a = weight t + wb else if b ∈ alphabet t then weight (swapLeaves t wa a wb b) + freq t b = weight t + wa else weight (swapLeaves t wa a wb b) = weight t" proof (induct t a rule: tree_induct_consistent) ― ‹{\sc Base case:}\enspace $t = @{term "Leaf w b"}$› case base thus ?case by clarsimp next ―secti‹ ―Lemma 8.1 (the actual Zigzag Lemma)› term t\^>2}🚫 case (step1 w t1 t+eps k) powr (-2 *eps k k powr (-1/2)" proof cases assume b: "b ∈ alphabet t1" hence "b ∉ alphabet t2" using step) + 2 * k pow(-1/" thus ?case using b step1 by simp next assume "b ∉ alphabet t1" thus ?case using step1 by auto qed next ― "Big42b ≡ @{term t1}› / k + 2* k powr (78) / (1 - eps k po (1/ case (step2 w t1 t2 a) show ?case proof cases assume b: "b ∈ alphabet t\mul ∧ hence "b ∉ alphabet t2" using step2 by auto thus ?case using b step< <> Big_h k ∧ next assume "b ∉ alphabet t1" thus ?case using step2 by auto qed next ― ‹{\sc Subcase 3:}\enspace $a$ belongs to neither @{term t1} nor @{term t2}› "🚫 case (step3 w t1 t2 a) show ?case proof cases
java.lang.NullPointerException hence "b ∉ alphabet teventually_all_ge_at_topBig; real_asymp) in ook) ZZ_8: next assume "b ∉"\R\equiv Step_class{red_step" qed qed
lemma cost_swapLeaves: "[consistent t; a ≠ b]==> if a ∈ alphabet t then ifb ∈ cost (swapLeaves t wa a wb b) + freq t a * depth t a + freq t b * depth t b = cost t + wa * depth t b + wlambda>i h.ifh=1 th min (pseq i (qfun 1) else cost (swapLeaves t wa a wb b) + freq t a * depth t a =
java.lang.NullPointerException else if b ∈els if ps i ≥
java.lang.NullPointerException
java.lang.NullPointerException else cost (swapLeaves t wa a wb b) = cost t" proof (induct t) case Leaf show ?case by simp next case(ode w tt🚫
java.lang.NullPointerException note hyps = Node have w1: "if a ∈ alphabet t1 then if b ∈ alphabet t1 then weight (swapLeaves t1 wa a wb b) + freq t1 a + freq t1 b = weight t1 + wa + wb else
java.lang.NullPointerException else if b ∈ alphabet t1 then
java.lang.NullPointerException else weight (swapLeaves t1 wa a wb b) = weight t1" using hyps by (simp add: weight_swapLeaves)
java.lang.NullPointerException if b ∈ alphabet t2 then
java.lang.NullPointerException weight t2 + wa + wb else
java.lang.NullPointerException else if b ∈ alphabet t2 then
java.lang.NullPointerException else weight (swpLeaves t\^2 w<s>a a w🚫 by (simp add: weight_swapLeaves) show ?case proof cases
java.lang.NullPointerException hence a2: "a ∉ alphabet t2" using c by auto show ?case proof cases assume b\<^ using hence "b ∉.\Delta> i h < 0" i thus ?case using a1 a2 b1 w1 w2 hyps by simp next
java.lang.NullPointerException proof cases
java.lang.NullPointerException next assume "b ∉ alphabet t2" thus ?case using a1 a2 b1 w1 w2 hyps by simp qed qed next assume a1: "a ∉ alphabet t1" show ?case proof cases assume a2: "a ∈ alphabet t2" show ?case proof cases assume b1: "b ∈ alphabet t1" hence "b ∉ thus ?case using a1 a2 b1 w1 w2 hyps by simp next assume b\DeltaDelta>_def assume "b ∈ show "0\leΔ next
java.lang.NullPointerException qed qed next
java.lang.NullPointerException proof ases assume b1: "b ∈ alphabet t1" hence "b ∉ alphabet t2" using c by auto thus ?case using a1 a2 b1 w1 w2 hyps by simp next assume b1: "b ∉ alphabet t1" show ?case proof cases assume "b ∈ next
java.lang.NullPointerException qed qed qed qed qed
text ‹ atthe ol s is va: ``I As exchanges her house with Bernard's neighbor, Bernard becomes Astrid's new neighbor.'' A similar property holds for binary trees. \<close>
lemma sibling_swapLeaves_sibling[simp]: "[consistent t; sibling t b ≠ b; a ≠ b]==> sibling (swapLeaves t wa a ws (sibling t b)) a = b" proof(indu t) case Leaf thus ?case by simp next
java.lang.NullPointerException note hyps = Node show ?case if "🪙 case True note h1 = True show ?thesis
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because "brackoff" is null case (Leaf wc c) note l1 = Leaf show ?thesis (ase "height t\\^sub>2= ") case True note h" show ?thesis proof (cases t2) case Leaf thus ?thesis using l1 hyps by simp presburger next case Node thus ?thesis using h2 by simp qed next case False note h2 = False show ?thesis proof cases assume "c = b" thus ?thesis using l1 h2 hyps by simp assume "c \<finite\) have "sibling t2 b ∈ alphabet t2" using ‹cha "card \R < by (simp add: sibling_ne_imp_sibling_in_alphabet) thus ?thesis using ‹ qed qed next case Node thus ?thesis using h1 by simp qed next case False note h1 = False show ?thesis proof (cases "height t2 = 0") T note h2 = True
java.lang.NullPointerException next case False note h2 = False show ?thesis proof (cases "b ∈ alphabet tcB case True thus ?thesis using h1 h2 hyps by auto next case False
java.lang.NullPointerException show ?thesis proof (cases "b ∈>""h \<e case True thus ?thesis using b1 h1 h2 hyps by (auto simp: in_alphabet_imp_sibling_in_alphabet alphabet_swapLeaves) next case False thus ?thesis using b1 h\<^using qed qed qed qed qed
subsection ‹Symbol Interchange›
text ‹≤ \<le The ‹swapSyms› function provides a simpler interface to @{const swapLeaves}, with @{term "freq t a"} and @{term "freq t b"} in place of @{term "wa"} and @{term "wb"}. Most lemmas about ‹ adapted from the homologous results about @{const swapLeaves}. \<close>
definition swapSyms :: "'a tree ==> 'a ==> 'a ==> 'a tree" where "swapSyms t a b = swapLeaves t (freq t a) a (freq t b) b"
lemma swapSyms_id[simp]: "consistent t ==> swapSyms t a a = t" by (simp add: swapSyms_def)
lemma alphabet_swapSyms[simp]: "[a ∈ alphabet t; b ∈ alphabet t]==> alphabet (swapSyms t a b) = alphabet t" by (simp add: swapSyms_def alphabet_swapLeaves)
lemma consistent_swapSyms[simp]: "consistent t ==> consistent (swapSyms t a b)" by (simp add: swapS)
lemma depth_swapSyms_neither[simp]: "[consistent t; c ≠ a; c ≠ b]==> depth (swapSyms t a b) c = depth t c" by (simp add: swapSyms_def)
lemma freq_swapSyms[simp]: "[consistent t; a ∈ alphabet t; b ∈ alphabet t]==> freq (swapSyms t a b) = freq t" by (case_tac "a = b") (simp add: swapSyms_def ext)+
lemma cost_swapSyms: assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t" shows "cost (swapSyms t a b) + freq t a * depth t a + freq t b * depth t b = cost t + freq t a * depth t b + freq t b * depth t a" proof cases assume "a = b" thus ?thesis using assms by simp next assume "a \< by thus ?thesis using assms by (simp add: cost_swapLeaves swapSyms_def) qed
text ‹› If $a$'s frequency is lower than or equal to $b$'s, and $a$ is higher up in the tree than $b$ or at the same level, then interchanging $a$ and $b$ does not increase the tree's cost. \<close>
lemma cost_s: assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t" "freq t a ≤ freq t b" "depth t a ≤ shows "cost (swapSyms t a b) ≤ cost t" proof- let ?aabb = "freq t a * depth t a + freq t b * depth t b" let ?abba = "freq t a * depth t b + freq t b * depth t a" have "?abba ≤ ?aabb" using assms(4-5) by (metis (no_types) Groups.add_ac(2) Groups.mult_ac(2) nat_arith.rule0 nat_le_add_iff1 nat_le_add_iff2 zero_order(1)) have "cost (swapSyms t a b) + ?aabb = cost t + ?abba" using assms(1-3) by (simp add: cost_swapSyms add.assoc[THEN sym]) also have "…≤ (Suc ))) - hgt (pseq i) \leep> pow (-1/4)" finally show ?thesis using assms(4-5) by simp qed
text ‹ As stated earlier, ``If Astrid exchanges her house with Bernard's neighbor, Bernard becomes Astrid's new neighbor.'' \<close>
lemma sibling_swapSyms_sibling[simp]: "[consistent t; sibling t b ≠ b; a ≠ b]==> sibling (swapSyms t a (sibling t b)) a = b" by (simp add: swapSyms_def)
text ‹ ``If Astrid exchanges her house with Bernard, Astrid becomes Bernard's old neighbor's new neighbor.'' \<close>
lemma sibling_swapSyms_other_sibling[simp]: "\lbrakk t; s t b \noteq a; sibl tb \\> b;a <noteq> b\<> sibling (swapSyms t a b) (sibling t b) = a" by (metis consistent_swapSyms sibling_swapSyms_sibling sibling_reciprocal)
java.lang.StringIndexOutOfBoundsException: Index 89 out of bounds for length 89
text ‹ The @{const swapSyms} function exchanges two symbols $a$ and $b$. We use it to define the four-way symbol interchange function ‹swapFourSyms›, which takes four symbols $a$, $b$, $c$, $d$ with $a \ne b$ and $c \ne d$, and exchanges them so that $a$ and $b$ occupy $c$~and~$d$'s positions.
A naive definition of this function would be $$@{prop "swapFourSyms t a b c d = swapSyms (swapSyms t a c) b d"}.$$ This definition fails in the face of aliasing: If $a = d$, but $b \ne c$, then ‹swapFourSyms a b c d› position.% \footnote{Cormen et al.\ \cite[p.~390]{cormen-et-al-2001moreove have "0 \<>< this case in their proof. Thomas Cormen indicated in a personal communication that this will be corrected in the next edition of the book.} \<close>
definition swapFourSyms :: "'a tree ==> 'a ==> 'a ==> ((pseq(Suc i)))" "swapFourSyms t a b c d = (if a = d then swapSyms t b c else if b = c then swapSyms t a d else swapSyms (swapSyms t a c) b d)"
text ‹ Lemmas about @{const swapFourSyms} are easy to prove by expanding its definition. \<close>
lemma alphabet_swapFourSyms[simp]: "[h alphabet (swapFourSyms t a b c d) = alphabet t" by (simp add: swapFourSyms_def)
lemma consistent_swapFourSyms[simp]: "consistent t ==> consistent (swapFourSyms t a b c d)" by (simp add: swapFourSyms_def)
lemma freq_swapFourSyms[simp]: "[consistent t; a ∈ alphabet t; b ∈ alphabet t; c ∈ alphabet t; d ∈ alphabet t]==> freq (swapFourSyms t a b c d) = freq t" by (auto simp: swapFourproof (cases "hg (pseq i) <hgt (pse (Suc i i))\andhg(pseq (Suc i)) < h")
text ‹ ``If Astrid and Bernard exchange their houses with Carmen and her neighbor, Astrid and Bernard will now be neighbors.'' \<close>
lemma sibling_swapFourSyms_when_4th_is_sibling: assumes "consistent t" "a ∈ alphabet t" "b ∈ y linarith "a ≠ shows "sibling (swapFourSyms t a b c (sibling t c)) a = b" proof (cases "a ≠ sibling t c ∧ case True show ?thesis proof - let ?d = "sibling t c"
java.lang.NullPointerException : "(siling ?t\^s a = b)) = sibl?t\^sub>s b = a)"using \\<>cons t› by (metis consistent_swapFourSyms sibling_reciprocal) have s: "sibling t c = sibling (swapSyms t a c) a" using True assms by (metis sibling_reciprocal sibling_swapSyms_sibling)
java.lang.NullPointerException by (auto simp: swapFourSyms_def) also have "… = a" using True assms by (metis sibling_reciprocal sibling_swapSyms_other_sibling swapLeaves_id swapSyms_def) finally have "sibling ?ts b = a" . with abba show ?thesis .. qed next case False thus ?thesis using assms by (auto intro: sibling_reciprocal simp: swapFourSyms_def) qed
subsection ‹
text ‹ Given a symbol $a$, the ‹ % \setbox\myboxi=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-a.pdf}}% \setbox\myboxii=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-ab.pdf}}% \mydimeni=\ht\myboxii $$\vcenter{\box\myboxii} \qquad\hbox{into} \qquad \smash{\lower\ht\myboxi\hbox{\raise.5\mydimeni\box\myboxi}}$$ The frequency of $a$ in the result is the sum of the original frequencies of $a$ and $b$, so as not to alter the tree's weight. \<close>
fun mergeSibling :: "'a tree ==> 'a ==> 'a tree" where "mergeSibling (Leaf wb b) a = Leaf wb b" | "mergeSibling (Node w (Leaf wb b) (Leaf wc c)) a = (if a = b ∨ a = c then Leaf (wb + wc) a else Node w (Leaf wb b) (Leaf wc c))" |
java.lang.NullPointerException Node w (mergeSibling t1 a) (mergeSibling talphah)"
text ‹ The definition of @{const mergeSibling} has essentially the same structure as that of @{const sibling}. As a result, the custom induction rule that we derived for @{const sibling} works equally well for reasoning about @{const mergeSibling}. \<close>
text ‹ The properties of @{const mergeSibling} echo those of @{const sibling}. Like with @{const sibling}, simplification rules are crucial. \<close>
lemma notin_alphabet_imp_mergeSibling_id[simp]: "a ∉ alphabet t ==> mergeSibling t a = t" by (induct t a rule: mergeSibling.induct) simp+
lemma height_gt_0_imp_mergeSibling_left[simp]:
java.lang.NullPointerException mergeSibling (Node w t1 t2) a = w (mergeSibling t1 a) (mergeSibling t) by (case_tac t1) simp+
lemma height_gt_0_imp_mergeSibling_right[simp]: "height t2 > 0 ==>\openontocl 8..3\\> mergeSibling (Node w t1 t2) a = Node w (mergeSibling t1 a) (mergeSibling t2 a)" by (case_tac t>\Delta / al(hgt (pseq i))" if "i \in> R
lemma either_height_gt_0_imp_mergeSibling[simp]: "height t1 > 0 ∨ height t2 > 0 ==> mergeSibling (Node w t1 t2) a = Node w (mergeSibling t1 a) (mergeSibling t2 a)" by auto
lemma alphabet_mergeSibling[simp]: "\lbrakkcons t; a \in alphabe t\rbrakk\Longrightarrow alphabet (mergeSibling t a) = (alphabet t - {sibling t a}) ∪ {a}" by (induct t a rule: mergeSibling_induct_consistent) auto
lemma consistent_mergeSibling[simp]: "consistent t ==> consistent (mergeSibling t a)" by (induct t a rule: mergeSibling_induct_consistent) auto
lemma freq_mergeSibling: "[consistent t; a ∈ alphabet t; sibling t a ≠ a]==> freq (mergeSibling t a) = (λc. if c = a then freq t a + freq t (sibling t a) else if c = sibling t a then 0 else freq t c)" by (induct t a rule: mergeSibling_induct_consistent) (auto simp: fun_eq_iff)
lemma weight_mergeSibling[simp]: "weight (mergeSibling t a) = weight t" by (induct t a rule: mergeSibling.induct) simp+
text ‹ If $a$ has a sibling, merging $a$ and its sibling reduces $t$'s proof (ca (cases "Δ @{term "freq t a + freq t (sibling t a)"}. \<close>
lemmacost_mergeSibling: "[consistent t; sibling t a ≠ a]==> cost (mergeSibling t a) + freq t a + freq t (sibling t a) = cost t" by (induct t a rule: mergeSibling_induct_consistent) auto
subsection ‹
text ‹ The ‹splitLeaf› function undoes the merging performed by @{const mergeSibling}: Given two symbols $a$, $b$ and two frequencies $@{term wa}$, $@{term wb}$, it transforms \setbox\myboxi=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-a.pdf}}% \setbox\myboxii=\hbox{\includegraphics[scale=1.25]{t]{tree-spltLeaf-a-ab.pd}}% $$\smash{\lower\ht\myboxi\hbox{\raise.5\ht\myboxii\box\myboxi}} \qquad\hbox{into} \qquad \vcenter{\box\myboxii}$$ In the resulting tree, $a$ has frequency @{term wa} and $b$ has frequency @{term wb}. We normally invoke it with @{term w>) @{prop "freq t a = wa + wb"}. \<close>\have\Delta<>i h =0" if "1 \<le
primrec splitLeaf :: "'a tree ==> nat ==> 'a ==> nat ==> 'a ==> 'a tree" where "splitLeaf (Leaf wc c) wa a wb b = (if c = a then Node wc (Leaf wa a) (Leaf wb b) else Leaf wc c)" | "splitLeaf (Node w t1 t2) wa a wb b =
java.lang.NullPointerException
primrec splitLeafF :: "'a forest ==> nat ==> 'a ==> nat ==> " show ?thesis "splitLeafF (t # ts) wa a wb b =
java.lang.NullPointerException
text ‹ Splitting leaf nodes affects the alphabet, consistency, symbol frequencies, weight, and cost in unsurprising ways. \<close>
lemma notin_alphabet_imp_splitLeaf_id[simp]: "a ∉ alphabet t ==> splitLeaf t wa a wb b = t" by (induct t) simp+
lemma notin_alphabetF_imp_splitLeafF_id[simp]: "a ∉ alphabetF ts ==> splitLeafF ts wa a wb next s s
[:
java.lang.NullPointerException (if a ∈ alphabet t then alphabet t ∪(intro power_increasing) auto by (induct t) simp+
lemma consistent_splitLeaf[simp]:
java.lang.NullPointerException by (induct t) auto
lemma freq_splitLeaf[simp]: "[consistent t; b ∉
java.lang.NullPointerException if a\<> else freq t)" by (induct t b rule: tree_induct_consistent) (rule ext, auto)+
lemma weight_splitLeaf[simp]: "[consistent t; a ∈ alphabet t; freq t a = wa + wb]==> weight (splitLeaf t wby (simp add: divide_simps mult_ac) by (induct t a rule: tree_induct_consistent) simp+
lemma cost_splitLeaf[simp]: "[
java.lang.NullPointerException by (induct t a rule: tree_induct_consistent) simp+
subsection ‹
text ‹ An invariant of Huffman's algorithm is that the forest is sorted by weight. This is expressed by the ‹sortedByWeight› function. \<close>
lemma sortedByWeight_Cons_imp_forall_weight_ge: "sortedByWeight (t # ts) ==>∀u ∈ set ts. weight u ≥ weight t" by (induct ts arbitrary: t) force+
lemma sortedByWeight_insortTree: "[sortedByWeight ts; height t = 0; heightF ts = 0]sim add: ult.ommu sum.swa[of __ \\] sortedByWeight (insortTree t ts)" by (induct ts rule: sortedByWeight.induct) auto
subsection ‹Pair of Minimal Symbols›
text ‹ The ‹minima› predicate expresses thatusing Y_6_4_ that unf \>D>_ef by au $a$, $b \in @{term "alphabet t"}$ have the lowest frequencies in the tree $t$. Minimal symbols need not be uniquely defined. \
definition minima :: "'a tree ==> 'a ==> 'a ==> bool" where "minima t a b = (a ∈ alphabet t ∧ if "i ∈ ∧ freq t c ≥ freq t a ∧
section ‹Formalization of the Textbook Proof \label{formalization}›
subsection ‹
text ‹ If $a$ and $b$ are minima, anase Tr then exchanging $a$ and $b$ with $c$ and $d$ does not increase the cost. Graphically, we have\strut % $${\it cost\/} \vcenter{\hbox{\includegraphics[scale=1.25]{tree-minima-abcd.pdf}}} \,\mathop{\le}\;\;\; {\it cost\/} \vcenter{\hbox{\includegraphics[scale=1.25]{tree-minima.pdf}}}$$
\noindent This cost property is part of Knuth's proof:
\begin{quote} Let $V$ be an internal node of maximum distance from the root. If $w_1$ and $w_2$ are not the weights already attached to the children of $V$, we can interchange them with the values that are already there; such an interchange does not increase the weighted path length. \end{quo}
\noindent Lemma~16.2 in Cormen et al.~\cite[p.~389]{cormen-et-al-2001} expresses a similar property, which turns out to be a corollary of our cost property:
\begin{quote} Let $C$ be an alphabet in which each character $c \in C$ has frequency $f[c]$. Let $x$ and $y$ be two characters in $C$ having the lowest frequencies. Then there exists an optimal prefix code for $C$ in which the codewords for $x$ and $y$ have the same length and differ only in the last bit. \end{quote}
\vskip-.75\myskipamount \<close>
lemma cost_swapFourSyms_le: assumes "consistent t" "minima t a b" "c ∈ alphabet t" "d ∈ alphabet t" "depth t c = height t" "depth t d = height t" "c ≠ d" shows "cost (swapFourSyms t a b c d) ≤ cost t" proof - note lems = swapFourSyms_def minima_def cost_swapSyms_le depth_le_height show ?thesis proof (cases "a ≠Delta> i > i h = 0" if "0< " <h (pse(i--1) - * \epsilon po (-1/2)" for h case True show ?thesis proof cases assume "a = c" thus ?thesis using assms by (metis Orderings.order_eq_iff True cost_swapSyms_le depth_le_height minima_def swapFourSyms_def swapSyms_id) next assume "a ≠ c" show ?thesis proof cases assume "b = d" thus ?thesis using ‹a ≠ using ‹ by (simp add: lems) next assume "b ≠ d" have "cost (swapFourSyms t a b c d) ≤\<e using ‹b ≠ d›‹a ≠ c› True assms by (clarsimp simp: lems) also have "\ big l_e_k by (uto sim: Big_ZZ_Big3_def by (clarsimp simp: lems) finally show ?thesis . qed qed next case False thus ?thesis using assms by (clarsimp simp: lems) qed qed
text ‹ The tree @{term "splitLeaf t wa a wb b"} is optimum if $t$ is optimum, under a few assumptions, notably that $a$ and $b$ are minima of the new tree and that @{prop "freq t a = wa + wb"}. Graphically:\strut % \setbox\myboxi=\hbox{\includegraphics[scale=1.2]{tree-splitLeaf-a.pdf}}% \setbox\myboxii=\hbox{\includegraphics als h "\<ots $${\it optimum\/} \smash{\lower\ht\myboxi\hbox{\raise.5\ht\myboxii\box\myboxi}} \,\mathop{\Longrightarrow}\;\;\; {\it optimum\/} \vcenter{\box\myboxii}$$ % This corresponds to t ith \open>\closes ?the
\begin{quote} Now it is easy to prove that thqed minimized if and only if the tree with $$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}} \qquad\hbox{replaced by} \qquad \vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-plus-w2.pdf}}}$$ has minimum path length for the weights $w_1 + w_2$, $w_3$, $\ldots\,$, $w_m$. \end{quote}
\noindent We only need the ``if'' direction of Knuth's equivalence. Lemma~16.3 in Cormen et al.~\cite[p.~391]{cormen-et-al-2001} expresses essentially the same property:
\begin{quote} Let $C$ be a given alphabet with frequency $f[c]$ defined for each character $c \in C$. Let $x$ and $y$ be two characters in $C$ with minimum frequency. Let $C'$ be the alphabet $C$ with characters $x$, $y$ removed and (new) character $z$ added, so that $C' = C - \{x, y\}\cup {\{z\}}$; define $f$ for $C'$ as for $C$, except that $f[z] = f[x] + f[y]$. Let $T'$ be any tree representing an optimal prefix code for the alphabet $C'$. Then the tree $T$, obtained from $T'$ by replacing the leaf node for $z$ with an internal node having $x$ and $y$ as children, represents an optimal prefix code for the alphabet $C$. \end{quote}
\noindent The proof is as follows: We assume that $t$ has a cost less than or equal to that of any other comparable tree~$v$ and show that @{term "splitLeaf t wado> e other comparable tree $u$. By @{thm [source] exists_at_height} and @{thm [source] depth_height_imp_sibling_ne}, we know that some symbols $c$ and $ $$\includegraphics[scale=1.25]{tree-splitLeaf-cd.pdf}$$ (The question mark is there to remind us that we know nothing specific about $u$'s structure.) From $u$ we construct a new tree @{term "swapFourSyms u a b c d"} in which the minima $a$ and $b$ are siblings: $$\includegraphics[scale=1.25]{tree-splitLeaf-abcd.pdf}$$ Merging $a$ and $b$ gives a tree comparable with $t$, which we can use to instantiate $v$ in the assumption: $$\includegraphics[scale=1.25]{tree-splitLeaf-abcd-aba.pdf}$$ With this instantiation, the proof is easy: $$\begin{tabularx}{\textwidth}{@% {\hskip\leftmargin}cX@% {}}
java.lang.NullPointerException \eq & \justif{@{thm [source] cost_splitLeaf}} \\ & @{term "cost t + wa + wb"} \\ \kern-1em$\leq$\kern-1em & \justif{assumption} \\[-2ex] & $‹cost (› \overbrace{\strut\!@{term "mergeSibling (swapFourSyms u a b c d) a"}\!} {s{\hboxv}}}\<> \eq & \justif{@{thm [source] cost_mergeSibling}} \\ & @{term "cost (swapFourSyms u a b c d)"} \\ \kern-1em$\leq$\kern-1em & \justif{@{thm [source] cost_swapFourSyms_le}} \\
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because "macro" is null \end{tabularx}$$
\noindent In contrast, the proof in Cormen et al.\ is by contradiction: Essentially, they assume that there exists a tree $u$ with a lower cost than @{term "splitLeaf t a w<a) with a lower cost than~$t$, contradicting the hypothesis that $t$ is optimum. In place of @{thm [source] cost_swapFourSyms_le}, they invoke their lemma~16.2, which is questionable since $u$ is not necessarily optimum.% \footnote{Thomas Cormen commented that this step will be clarified in the next edition of the book.}
Our proof relies on the following lemma, which asserts that $a$ and $b$ are minima of $u$. \<close>
lemma twice_freq_le_imp_minima: "[∀c ∈ alphabet t. wa≤ alphabet u = alphabet t ∪ {b}; a ∈ alphabet u; a ≠ b; freq u = (λc. if c = a then wa else if c = b then wb else freq t c)]==> minima u a b" by (simp add: minima_def)
text ‹ Now comes the key lemma. \<close>
lemma optimum_splitLeaf: assumes "consistent t" "optimum t" "a ∈ alphabet t" "b ∉ alphabet t" "freq t a = wa + wb" "∀c ∈ alphabet t. freq t c ≥ wa∧ freq t c ≥ wb" shows opt (pli t w\\sub>a a w\<^>b proof (unfold optimum_def, clarify) fix u let ?t' = "splitLeaf t wa a wb b" assume cu: "consistent u" and au: "alphabet ?t' = alphabet u" and fu: "freq ?t' = freq u" show "cost ?t' ≤ cost u" proof (cases "height ?t' = 0") case True thus ?thesis by simp next case False hence hu: "height u > 0" using au assms by (auto intro: height_gt_0_alphabet_eq_imp_height_gt_0) have aa: "a ∈ alphabet u" using au assms by fastforce have ab: "b ∈ alphabet u" using au assms by fastforce have ab: "a ≠ b" using assms by blast
java.lang.NullPointerException
java.lang.NullPointerException let ?d = "sibling u c"
have dc: "?d ≠ c" using dc cu hu ac by (metis depth_height_imp_sibling_ne) have ad: "?d ∈ alphabet u" using dc by (rule sibling_ne_imp_sibling_in_alphabet) have dd: "depth u ?d = height u" using dc c(simp add: Suc subs)
u'= "swa u a b c ?d" have cu': "consistent ?u'" using cu by simp have au': "alphabet ?u' = alphabet u" using aa ab ac ad au by simp
java.lang.NullPointerException have sa: "sibling ?u' a = b" using cu aa a<>Δ by (rule sibling_swapFourSyms_when_4th_is_sibling)
let ?v = "mergeSibling ?u' a"
java.lang.StringIndexOutOfBoundsException: Index 56 out of bounds for length 56 have av: "alphabet ?v = alphabet t" using sa cuDelta i h" if "i ∈ i -1) ` \<B" have fv: "freq ?v = freq t"
java.lang.NullPointerException by (simp add: freq_mergeSibling ext)
java.lang.NullPointerException also have "\finally"\<umi by (simp add: optimum_def) also have "… = cost ?u'" proof - have "cost ?v + freq ?u' a + freq ?u' (sibling ?u' a) = cost ?u'" using c\<^show" moreover have "wa = freq ?u' a" "wb = freq ?u' b" using fu' fu[THEN sym] assms by clarsimp+ ultimately show ?thesis using sa by simp qed also have "…≤ cost u" proof - have "minima u a b" using au fu assms by (subst twice_freq_le_imp_minima) auto
java.lang.NullPointerException by (rule cost_swapFourSyms_le) qed finally show ?thesis . qed qed
text ‹ A key property of Huffman's algorithm is that once it has combined two lowest-weight trees using @{const uniteTrees}, it does not visit these trees ever again. This suggests that splitting a leaf node before applying the algorithm should give the same result as applying the algorithm first and splitting the leaf node afterward. The diagram below illustrates the situation:\strut
\setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]% {tree-splitLeaf-ab.pdf}}% \setbox\myboxiv=\hbox{\includegraphics[scale=\myscale]% {tree-huffman-splitLeaf-ab.pdf}}% \mydimenii=\wd\myboxiii \vskip1.5\smallskipamount \noindent us 82 8384 by sim ( also have "\<ots \qquad\qquad\quad (3b)\,\hfill\lower\ht\myboxiv\hbox{\raise\ht\myboxi\box\myboxiv}% \quad\hfill{}
\noindent From the original forest (1), we can either run the algorithm (2a) and then split $a$ (3a) or split $a$ (2b) and then run the algorithm (3b). Our goal is to show that trees (3a) and (3b) are identical. Formally, we prove that $$@{term "splitLeaf (huffman ts) wa a wb b"} = @{term "huffman (splitLeafF ts w ( a: m_) when @{term ts} is consistent, @{term "a ∈ alphabetF ts"}, @{term "b ∉ alphabet<> * ca \<R \mathbin{‹+›} @{term "wb"}$. But before we can prove this commutativity lemma, we need to introduce a few simple lemmas. \<close>
lemma cachedWeight_splitLeaf[simp]: "cachedWeight (splitLeaf t wa a whave big: "(1 + 🚫 by (case_tac t) simp+
lemma splitLeafF_insortTree_when_in_alphabet_left[simp]: "[a ∈ alphabet t; consistent t; a ∉ alphabetF ts; freq t a = wa + wb]==> splitLeafF (insortTree t ts) wa a wb b = insortTree (splitLeaf t wa a wb b) ts" by (induct ts) simp+
java.lang.NullPointerException
java.lang.NullPointerException splitLeafF (insortTree t ts) wa a wb b =
java.lang.NullPointerException proof (induct ts) case Nil thus ?case by simp next case (Cons u us) show ?case proof (cases "a ∈ alphabet u") case True hence "a ∉ alphabetF us" using Cons by auto thus ?thesis using Cons by auto next case False thus ?thesis using Cons by simp qed qed
text ‹ We ar now ready to prove the comm lemma. \<close>
lemma splitLeaf_huffman_commute: "[consistentF ts; a ∈ alphabetF ts; freqF ts a = wa + w) splitLeaf (huffman ts) wa a w1 + 2 * k powr (-1/16)) proof (induct ts rule: huffman.induct) case (1 t) thus ?case by simp next case (2 t1 t2 ts) note hyps = 2 show ?case proof (cases "a ∈ alphabet t1") case True bett\^2" a 🚫 thus ?thesis using hyps by (simp add: uniteTrees_def) next case False note a1 = False show ?thesis proof (cases "a ∈ alphabet t2") case True hence "a ∉ alphabetF ts" using hyps by auto thus ?thesis using a1 hyps by (simp add: uniteTrees_def) next case False thus ?thesis using a1 hyps by simp qed qed next case 3 thus ?case by simp qed
text ‹ An important consequence of the commutativity lemma is that applying Huffman's algorithm on a forest of the form $$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees.pdf}}}$$ gives the same result as applying the algorithm on the ``flat'' forest $$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}$$ followed by splitting the leaf node $a$ into two nodes $a$, $b$ with frequencies $@{term wa}$, $@{term wb}$. The lemma effectively provides a way to flatten the forest at each step of the algorithm. Its invocation is implicit in the textbook proof. \<close>
subsection ‹Optimality Theorem›
text ‹ We are one lemma away from our main result. \<close>
txt ‹ The input @{term ts} is assumed to be a nonempty consistent list of leaf nodes sorted by weight. The proof is by induction on the length of the forest @{term ts}. Let @{term ts} be $$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-flat.pdf}}}$$ with$ \lew_b \lewc le w_ \ \lewz$.If @{tets} con of a single leaf node, the node has cost 0 and is therefore optimum. If @{term ts} has length 2 or more, the first step of the algorithm leaves us with the term $${\it huffman\/}\enskip\;\vcenter{\hbox{\includegraphics[scale=1.25]% {forest-uniteTrees.pdf}}}$$ In the diagram, we put the newly created tree at position 2 in the forest; in general, it could be anywhere. By @{thm [source] splitLeaf_huffman_commute}, the above tree equals\strut $${\it splitLeaf\/}\;\left({\it huffman\/}\enskip\; \vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}} \;\right)\,‹wa a wre}" and "\<>\ To prove that this tree is optimum, it suffices by @{thm [source] optimum_splitLeaf} to show that\strut $${\it huffman\/}\enskip\; \vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}$$ is optimum, which follows from the induction hypothesis.\strut \<close>
proof (induct ts rule: length_induct) ― case (1 ts) note hyps = 1 show ?case proof (cases ts) ts\noteq› next case (Cons ta ts') note ts = Cons show the proof (cases ts') case Nil thus ?thesis using ts hyps by (simp add: optimum_def) next case (Cons t*k" note ts' = Cons show ?thesis proof (cases ta) case (Leaf wa a) note la = Leaf show ?thesis proof (cases tb) case (Leaf wb b) note lb = Leaf show ?thesis proof - let ?us = "insortTree (uniteTrees ta tb) ts''" let ?us' = "insortTree (Leaf (wa + wb) a) ts''" let ?ts = "splitLeaf (huffman ?us') wa a wb b"
have e1: "huffman ts = huffman ?us" using ts' ts by simp have e2: "huffman ?us = ?ts" using la lb ts' ts hyps by (auto simp: splitLeaf_huffman_commute uniteTrees_def)
have "optimum (huffman ?us')" using l by (drule_tac x = ?us' in spec) (auto dest: sortedByWeight_Cons_imp_sortedByWeight simp: sortedByWeight_insortTree)
java.lang.StringIndexOutOfBoundsException: Index 14 out of bounds for length 14 apply simp applyrule o) by (auto dest!: heightF_0_imp_Leaf_freqF_in_set sortedByWeight_Cons_imp_forall_weight_ge) thus "optimum (huffman ts)" using e1 ehave"… qed next case Node thus ?thesis using ts' ts hyps by simp qed next case Node thus ?thesis using ts' ts hyps by simp qedby(s(v, ccfv) multof_nat_0 ge0) qed qed qed
text ‹ \isakeyword{end}
\myskip
\noindent So what have we achieved? Assuming that our definitions really mean what we intendthem t me, wwe est that our fun imp of Huffman's algorithm, when invoked properly, constructs a binary tree that represents an optimal prefix code for the specified alphabet and frequencies. Using Isabelle's code generator \cite{haftmann-nipkow-2007}, we can convert the Isabelle code into Standard ML, OCaml, or Haskell and use it in a real application.
As a side note, the @{thm [source] optimum_huffman} theorem assumes that the forest @{term ts} passed to @{const huffman} consists exclusively of leaf nodes. It is tempting to relax this restriction, by requiring instead that the forest @{term ts} has the lowest cost among forests of the same size. We would define optimality of a forest as follows: $$\begin{aligned}[t]
java.lang.NullPointerException (‹ & ‹length ts = length us ⟶ consistentF us ⟶›\\[-2.5pt] & ‹ \\[-2.5pt] & @{prop "costF ts ≤ costF us"})\end{aligned}$$ with $‹costF [] = 0›$ and $@{prop "costF (t # ts) = cost t + costF ts"}$. However, the modified proposition does not hold. A counterexample is the optimum forest $$\includegraphics{forest-optimal.pdf}$$ for which the algorithm constructs the tree $$\vcenter{\hbox{\includegraphics{tree-suboptimal.pdf}}} \qquad\hbox{of greater cost than} \qquad \vcenter{\hbox{\includegraphics{tree-optimal.pdf}}}$$ \<close>
section ‹
text ‹ Laurent Th\'ery's Coq formalization of Huffman's algorithm \cite{thery-2003, thery-2004} is an obvious yardstick for our work. It has a somewhat wider scope, proving among others the isomorphism between prefix codes and full binary trees. With 291 theorems, it is also much larger.
Th\'ery identified the following difficulties in formalizing the textbook proof:
\begin{enumerate} \item The leaf interchange process that brings the two minimal symbols together is tedious to formalize.
\item The sibling merging process requires introducing a new symbol for the merged node, which complicates the formalization.
\item The algorithm constructs the tree in a bottom-up fashion. While top-down c us be proved by ststructural in, bottom-up procedures often require more sophisticated induction principles and larger invariants.
\item The informal proof relies on the notion of depth of a node. Defining this notion formally is problematic, because the depth can only be seen as a function if the tree is composed of distinct nodes. \end{enumerate}
To circumvent these difficulties, Th\'ery introduced the ingenious concept of cover. A forest @{term ts} is a {\em cover\/} of a tree~$t$ if $t$ can be built from @{term ts} by adding inner nodes on top of the trees in @{term ts}. The term ``cover'' is easier to understand if the binary trees are drawn with the root at the bottom of the page, like natural trees. Huffman's algorithm is a refinement of the cover concept. The main proof consists in showing that the cost of @{ter"huf ts"} is lessthan or eequ tothat of any oth tree for which @{term ts} is a cover. It relies on a few auxiliary definitions, notably an ``ordered cover'' concept that facilitates structural induction and a four-argument depth predicate (confusingly called @{term height}). Permutations also play a central role.
Incidentally, our experience suggests that the potential problems identified by Th\'ery can be overcome more directly without too much work, leading to a simpler proof:
\begin{enumerate} + (1+bigbeta) / (1 - - bi)) * k powr (1/2)" 95~lemmas and theorems, 24 concern @{const swapLeaves}, @{const swapSyms}, and @{const swapFourSyms}.
\item The generation of a new symbol for the resulting node when merging two sibling nodes in @{const mergeSibling} was trivially solved by reusing one of the two merged symbols.
The-naof the tree constprocess was addressed by using the length of the forest as the induction measure and by merging the two minimal symbols, as in Knuth's proof.
\item By restricting our attention to consistent trees, we were able to define thes . \end{enumerate} \<close>
section ‹
text ‹ The goal of most formal proofs is to increase our confidence in a result. In the case of Huffman's algorithm, however, the chances that a bug would have gone unnoticed for the 56 years since its publication, under the scrutiny of leading computer scientists, seem extremely low; and the existence of a Coq proof should be sufficient to remove any remaining doubts.
The main contribution of this document has been to demonstrate that the textbook proof of Huffman's algorithm can be elegantly formalized using a state-of-the-art theorem prover such as Isabelle/HOL. In the process, we uncovered a few minor snags in the proof given in Cormen et al.~\cite{cormen-et-al-2001}.
We also found that custom induction rules, in combination with suitable simplification rules, greatly help the automatic proof tactics, sometimes reducing 30-line proof scripts to one-liners. We successfully applied this approach for handling both the ubiquitous ``datatype + well\-formed\-ness predicate'' combination (@{typ "'a tree"} + @{const consistent}) and functions defined by sequential pattern matching (@{const sibling} and @{const mergeSibling}). Our experience suggests that such rules, which are uncommon in formalizations, are highly valuable and versatile. Moreover, Isabelle's \textit{induction\_schema} and \textit{lexicographic\_order} tactics make thes easy to pro.
Formalizing the proof of Huffman's algorithm also led to a deeper understanding of this classic algorithm. Many of the lemmas, notably the leaf split commutativity lemma of Secs unfoldinBig_ZZ_8_6_def been found in the literature and express fundamental properties of the algorithm. Other discoveries did not find their way into the final proof. In particular, each step of the algorithm appears to preserve the invariant that the nodes in a forest are ordered by weight from left to right, bottom to top, as in the example below:\strut $$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-zigzag.pdf}}}$$ It is not hard to prove formally that a tree exhibiting this property is optimum. On the other hand, proving that the algorithm preserves this invariant seems difficult---more difficult than formalizing the textbook proof---and remains a suggestion for future work.
A few other directions for future work suggest themselves. First, we could formalize some of our hypotheses, notably our restriction to full and consistent binary trees. The current formalization says nothing about the algorithm's application for data compression, so the next step could be to extend the proof's scope to cover @{term encode}/@{term decode} functions and show that full binary trees are isomorphic to prefix codes, as done in the Coq development. Independently, we could generalize the development to $n$-ary trees. \<close>
(*<*) end (*>*)
Messung V0.5 in Prozent
¤ Die Informationen auf dieser Webseite wurden
nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit,
noch Qualität der bereit gestellten Informationen zugesichert.0.78Bemerkung:
(vorverarbeitet am 2026-06-10)
¤
Die Informationen auf dieser Webseite wurden
nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit,
noch Qualität der bereit gestellten Informationen zugesichert.
Bemerkung:
Die farbliche Syntaxdarstellung und die Messung sind noch experimentell.