Impressum Huffman.thy

Sprache: Isabelle

(*  Title:       An Isabelle/HOL Formalization of the Textbook Proof of Huffman's Algorithm
    Author:      Jasmin Christian Blanchette <blanchette at in.tum.de>, 2008
    Maintainer:  Jasmin Christian Blanchette <blanchette at in.tum.de>
*)

(*<*)
theory Huffman
  imports Main
begin
(*>*)

section ‹Introduction›

subsection ‹Binary Codes \label{binary-codes}›

text ‹
we want to encode strings over a finite source alphabet to sequences
bits. The approach used by ASCII and most other charsets is to map each
symbol to a distinct $k$-bit code word, where $k$ is fixed and is
8 or 16. To encode a string of symbols, we simply encode each symbol
turn. Decoding involves mapping each $k$-bit block back to the symbol it
.

-length codes are simple and fast, but they generally waste space. If
know the frequency $w_a$ of each source symbol $a$, we can save space
using shorter code words for the most frequent symbols. We
that a (variable-length) code is {\sl optimum\/} if it minimizes the sum
\sum_a w_a \vthinspace\delta_a$, where $\delta_a$ is the length of the binary
word for $a$. Information theory tells us that a code is optimum if
each source symbol $c$ the code word representing $c$ has length
$\textstyle \delta_c = \log_2 {1 \over p_c}, \qquad
\hbox{where}\enskip p_c = {w_c \over \sum_a w_a}.$$
number is generally not an integer, so we cannot use it directly.
, the above criterion is a useful yardstick and paves the way for
coding \cite{rissanen-1976}, a generalization of the method
here.

def\xabacabad{\xa\xb\xa\xc\xa\xb\xa\xd}%
an example, consider the source string `$\xabacabad$'. We have
$p_{\xa} = \tfrac{1}{2},\,\; p_{\xb} = \tfrac{1}{4},\,\;
p_{\xc} = \tfrac{1}{8},\,\; p_{\xd} = \tfrac{1}{8}.$$
optimum lengths for the binary code words are all integers, namely
$\delta_{\xa} = 1,\,\; \delta_{\xb} = 2,\,\; \delta_{\xc} = 3,\,\;
\delta_{\xd} = 3,$$
they are realized by the code
$C_1 = \{ \xa \mapsto 0,\, \xb \mapsto 10,\, \xc \mapsto 110,\,
\xd \mapsto 111 \}.$$
`$\xabacabad$' produces the 14-bit code word 01001100100111. The code
C_1$ is optimum: No code that unambiguously encodes source symbols one at a
could do better than $C_1$ on the input `$\xa\xb\xa\xc\xa\xb\xa\xd$'. In
, with a fixed-length code such as
$C_2 = \{ \xa \mapsto 00,\, \xb \mapsto 01,\, \xc \mapsto 10,\,
\xd \mapsto 11 \}$$
need at least 16~bits to encode `$\xabacabad$'.
›

subsection ‹Binary Trees›

text ‹
a program, binary codes can be represented by binary trees. For example,
trees\strut
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-unbalanced.pdf}}}
\qquad \hbox{and} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-balanced.pdf}}}$$
to $C_1$ and $C_2$. The code word for a given
can be obtained as follows: Start at the root and descend toward the leaf
associated with the symbol one node at a time; generate a 0 whenever the
child of the current node is chosen and a 1 whenever the right child is
. The generated sequence of 0s and 1s is the code word.

avoid ambiguities, we require that only leaf nodes are labeled with symbols.
ensures that no code word is a prefix of another, thereby eliminating the
of all ambiguities.%
footnote{Strictly speaking, there is another potential source of ambiguity.
the alphabet consists of a single symbol $a$, that symbol could be mapped
the empty code word, and then any string $aa\ldots a$ would map to the
bit sequence, giving the decoder no way to recover the original string's
. This scenario can be ruled out by requiring that the alphabet has
2 or more.}
that have this property are called {\sl prefix codes}. As an example of a
that does not have this property, consider the code associated with the
\strut
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-non-prefix.pdf}}}$$
observe that `$\xb\xb\xb$', `$\xb\xd$', and `$\xd\xb$' all map to the
word 111.

node in a code tree is assigned a {\sl weight}. For a leaf node, the
is the frequency of its symbol; for an inner node, it is the sum of the
of its subtrees. Code trees can be annotated with their weights:\strut
$\vcenter{\hbox{\includegraphics[scale=1.25]%
{tree-abcd-unbalanced-weighted.pdf}}}
\qquad\qquad
\vcenter{\hbox{\includegraphics[scale=1.25]%
{tree-abcd-balanced-weighted.pdf}}}$$
our purposes, it is sufficient to consider only full binary trees (trees
inner nodes all have two children). This is because any inner node with
one child can advantageously be eliminated; for example,\strut
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abc-non-full.pdf}}}
\qquad \hbox{becomes} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abc-full.pdf}}}$$
›

subsection ‹Huffman's Algorithm›

text ‹
Huffman \cite{huffman-1952} discovered a simple algorithm for
an optimum code tree for specified symbol frequencies:
a forest consisting of only leaf nodes, one for each symbol in the
, taking the given symbol frequencies as initial weights for the nodes.
pick the two trees
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1.pdf}}}
\qquad \hbox{and} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w2.pdf}}}$$

noindent\strut
the lowest weights and replace them with the tree
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2.pdf}}}$$
this process until only one tree is left.

an illustration, executing the algorithm for the frequencies
$f_{\xd} = 3,\,\; f_{\xe} = 11,\,\; f_{\xf} = 5,\,\; f_{\xs} = 7,\,\;
f_{\xz} = 2$$
rise to the following sequence of states:\strut

def\myscale{1}%
setbox\myboxi=\hbox{(9)\strut}%
setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step1.pdf}}%
setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step2.pdf}}%
$(1)\quad\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii} \qquad\qquad
(2)\enskip\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}$$

vskip.5\smallskipamount

noindent
setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step3.pdf}}%
setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step4.pdf}}%
setbox\myboxiv=\hbox{\includegraphics[scale=\myscale]{tree-prime-step5.pdf}}%
3)\quad\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}\hfill\quad
(4)\quad\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}\hfill
(5)\enskip\lower\ht\myboxiv\hbox{\raise\ht\myboxi\box\myboxiv\,}

smallskip
noindent
~(5) is an optimum tree for the given frequencies.
›

subsection ‹The Textbook Proof›

text ‹
does the algorithm work? In his article, Huffman gave some motivation but
real proof. For a proof sketch, we turn to Donald Knuth
cite[p.~403--404]{knuth-1997}:

begin{quote}
is not hard to prove that this method does in fact minimize the weighted
length (i.e., $\sum_a w_a \vthinspace\delta_a$), by induction on $m$.
we have $w_1 \le w_2 \le w_3 \le \cdots \le w_m$, where $m \ge 2$, and
that we are given a tree that minimizes the weighted path length.
Such a tree certainly exists, since only finitely many binary trees with $m$
nodes are possible.) Let $V$ be an internal node of maximum distance
the root. If $w_1$ and $w_2$ are not the weights already attached to the
of $V$, we can interchange them with the values that are already
; such an interchange does not increase the weighted path length. Thus
is a tree that minimizes the weighted path length and contains the
\strut
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}$$
it is easy to prove that the weighted path length of such a tree is
if and only if the tree with
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}
\qquad \hbox{replaced by} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-plus-w2.pdf}}}$$
minimum path length for the weights $w_1 + w_2$, $w_3$, $\ldots\,$, $w_m$.
end{quote}

noindent
is, however, a small oddity in this proof: It is not clear why we must
the existence of an optimum tree that contains the subtree
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}$$
, the formalization works without it.

et al.\ \cite[p.~385--391]{cormen-et-al-2001} provide a very similar
, articulated around the following propositions:

begin{quote}
textsl{\textbf{Lemma 16.2}} \\
$C$ be an alphabet in which each character $c \in C$ has frequency $f[c]$.
$x$ and $y$ be two characters in $C$ having the lowest frequencies. Then
exists an optimal prefix code for $C$ in which the codewords for $x$ and
y$ have the same length and differ only in the last bit.

medskip

textsl{\textbf{Lemma 16.3}} \\
$C$ be a given alphabet with frequency $f[c]$ defined for each character
c \in C$. Let $x$ and $y$ be two characters in $C$ with minimum frequency. Let
C'$ be the alphabet $C$ with characters $x$, $y$ removed and (new) character
z$ added, so that $C' = C - \{x, y\} \cup {\{z\}}$; define $f$ for $C'$ as for
C$, except that $f[z] = f[x] + f[y]$. Let $T'$ be any tree representing an
prefix code for the alphabet $C'$. Then the tree $T$, obtained from
T'$ by replacing the leaf node for $z$ with an internal node having $x$ and
y$ as children, represents an optimal prefix code for the alphabet $C$.

medskip

textsl{\textbf{Theorem 16.4}} \\
\textsc{Huffman} produces an optimal prefix code.
end{quote}
›

subsection ‹Overview of the Formalization›

text ‹
document presents a formalization of the proof of Huffman's algorithm
using Isabelle/HOL \cite{nipkow-et-al-2008}. Our proof is based on the
proofs given by Knuth and Cormen et al. The development was done
of Laurent Th\'ery's Coq proof \cite{thery-2003,thery-2004},
through its ``cover'' concept represents a considerable departure from
textbook proof.

development consists of a little under 100 lemmas and theorems. Most of
have very short proofs thanks to the extensive use of simplification
and custom induction rules. The remaining proofs are written using the
proof format Isar \cite{wenzel-2008}.
›

subsection ‹Head of the Theory File›

text ‹
Isabelle theory starts in the standard way.

myskip

noindent
isacommand{theory} ‹Huffman› \\
isacommand{imports} ‹Main› \\
isacommand{begin}

myskip

noindent
attach the ‹simp› attribute to some predefined lemmas to add them to
default set of simplification rules.
›

declare
  Int_Un_distrib [simp]
  Int_Un_distrib2 [simp]
  max.absorb1 [simp]
  max.absorb2 [simp]

section ‹Definition of Prefix Code Trees and Forests \label{trees-and-forests}›

subsection ‹Tree Type›

text ‹
{\sl prefix code tree\/} is a full binary tree in which leaf nodes are of the
@{term "Leaf w a"}, where @{term a} is a symbol and @{term w} is the
associated with @{term a}, and inner nodes are of the form
{term "Node w t₁ t₂"}, where @{term t₁} and @{term t₂} are the left and
subtrees and @{term w} caches the sum of the weights of @{term t₁} and
{term t₂}. Prefix code trees are polymorphic on the symbol datatype~@{typ 'a}.
›

datatype 'a tree =
  Leaf nat 'a
| Node nat "('a tree)" "('a tree)"

subsection ‹Forest Type›

text ‹
intermediate steps of Huffman's algorithm involve a list of prefix code
, or {\sl prefix code forest}.
›

type_synonym 'a forest = "'a tree list"

subsection ‹Alphabet›

text ‹
{\sl alphabet\/} of a code tree is the set of symbols appearing in the
's leaf nodes.
›

primrec alphabet :: "'a tree → 'a set" where
"alphabet (Leaf w a) = {a}" |
"alphabet (Node w t₁ t₂) = alphabet t₁ ∪ alphabet t₂"

text ‹
sets and predicates, Isabelle gives us the choice between inductive
(\isakeyword{inductive\_set} and \isakeyword{inductive}) and
functions (\isakeyword{primrec}, \isakeyword{fun}, and
isakeyword{function}). In this development, we consistently favor recursion
induction, for two reasons:

begin{myitemize}
item Recursion gives rise to simplification rules that greatly help automatic
tactics. In contrast, reasoning about inductively defined sets and
involves introduction and elimination rules, which are more clumsy
simplification rules.

item Isabelle's counterexample generator \isakeyword{quickcheck}
cite{berghofer-nipkow-2004}, which we used extensively during the top-down
of the proof (together with \isakeyword{sorry}), has better support
recursive definitions.
end{myitemize}

alphabet of a forest is defined as the union of the alphabets of the trees
compose it. Although Isabelle supports overloading for non-overlapping
, we avoid many type inference problems by attaching an
\raise.3ex\hbox{‹_F›}' subscript to the forest generalizations of
defined on trees.
›

primrec alphabet_F :: "'a forest → 'a set" where
"alphabet_F [] = {}" |
"alphabet_F (t # ts) = alphabet t ∪ alphabet_F ts"

text ‹
are central to our proofs, and we need the following basic facts
them.
›

lemma finite_alphabet[simp]:
"finite (alphabet t)"
by (induct t) auto

lemma exists_in_alphabet:
"∃a. a ∈ alphabet t"
by (induct t) auto

subsection ‹Consistency›

text ‹
tree is {\sl consistent\/} if for each inner node the alphabets of the two
are disjoint. Intuitively, this means that every symbol in the
occurs in exactly one leaf node. Consistency is a sufficient condition
$\delta_a$ (the length of the {\sl unique\/} code word for $a$) to be
. Although this well\-formed\-ness property is not mentioned in algorithms
\cite{aho-et-al-1983,cormen-et-al-2001,knuth-1997}, it is essential
appears as an assumption in many of our lemmas.
›

primrec consistent :: "'a tree → bool" where
"consistent (Leaf w a) = True" |
"consistent (Node w t₁ t₂) =
(alphabet t₁ ∩ alphabet t₂ = {} ∧ consistent t₁ ∧ consistent t₂)"

primrec consistent_F :: "'a forest → bool" where
"consistent_F [] = True" |
"consistent_F (t # ts) =
(alphabet t ∩ alphabet_F ts = {} ∧ consistent t ∧ consistent_F ts)"

text ‹
of our proofs are by structural induction on consistent trees $t$ and
one symbol $a$. These proofs typically distinguish the following cases.

begin{myitemize}
item[] {\sc Base case:}\enspace $t = @{term "Leaf w b"}$.
item[] {\sc Induction step:}\enspace $t = @{term "Node w t₁ t₂"}$.
item[] \noindent\kern\leftmargin {\sc Subcase 1:}\enspace $a$ belongs to
@{term t₁} but not to @{term t₂}.
item[] \noindent\kern\leftmargin {\sc Subcase 2:}\enspace $a$ belongs to
@{term t₂} but not to @{term t₁}.
item[] \noindent\kern\leftmargin {\sc Subcase 3:}\enspace $a$ belongs to
neither @{term t₁} nor @{term t₂}.
end{myitemize}

noindent
to the consistency assumption, we can rule out the subcase where $a$
to both subtrees.

of performing the above case distinction manually, we encode it in a
induction rule. This saves us from writing repetitive proof scripts and
Isabelle's automatic proof tactics.
›

lemma tree_induct_consistent[consumes 1, case_names base step₁ step₂ step₃]:
"[consistent t;
  ∧w_b b a. P (Leaf w_b b) a;
  ∧w t₁ t₂ a.
     [consistent t₁; consistent t₂; alphabet t₁ ∩ alphabet t₂ = {};
      a ∈ alphabet t₁; a ∉ alphabet t₂; P t₁ a; P t₂ a] ==>
     P (Node w t₁ t₂) a;
  ∧w t₁ t₂ a.
     [consistent t₁; consistent t₂; alphabet t₁ ∩ alphabet t₂ = {};
      a ∉ alphabet t₁; a ∈ alphabet t₂; P t₁ a; P t₂ a] ==>
     P (Node w t₁ t₂) a;
  ∧w t₁ t₂ a.
     [consistent t₁; consistent t₂; alphabet t₁ ∩ alphabet t₂ = {};
      a ∉ alphabet t₁; a ∉ alphabet t₂; P t₁ a; P t₂ a] ==>
     P (Node w t₁ t₂) a] ==>
P t a"

txt ‹
proof relies on the \textit{induction\_schema} and
textit{lexicographic\_order} tactics, which automate the most tedious
of deriving induction rules. The alternative would have been to perform
standard structural induction on @{term t} and proceed by cases, which is
but long-winded.
›

apply rotate_tac
apply induction_schema
       apply atomize_elim
       apply (metis consistent.simps(2) disjoint_iff_not_equal tree.exhaust)
by lexicographic_order

text ‹
\textit{induction\_schema} tactic reduces the putative induction rule to
proof obligations.
, it reuses the machinery that constructs the default induction
. The resulting proof obligations concern (a)~case completeness,
b)~invariant preservation (in our case, tree consistency), and
c)~wellfoundedness. For @{thm [source] tree_induct_consistent}, the obligations
a)~and (b) can be discharged using
's simplifier and classical reasoner, whereas (c) requires a single
of \textit{lexicographic\_order}, a tactic that was originally
to prove termination of recursive functions
cite{bulwahn-et-al-2007,krauss-2007,krauss-2009}.
›

subsection ‹Symbol Depths›

text ‹
{\sl depth\/} of a symbol (which we denoted by $\delta_a$ in
~\ref{binary-codes}) is the length of the path from the root to the
node labeled with that symbol, or equivalently the length of the code word
the symbol. Symbols that do not occur in the tree or that occur at the root
a one-node tree have depth 0. If a symbol occurs in several leaf nodes (which
happen with inconsistent trees), the depth is arbitrarily defined in terms
the leftmost node labeled with that symbol.
›

primrec depth :: "'a tree → 'a → nat" where
"depth (Leaf w b) a = 0" |
"depth (Node w t₁ t₂) a =
(if a ∈ alphabet t₁ then depth t₁ a + 1
  else if a ∈ alphabet t₂ then depth t₂ a + 1
  else 0)"

text ‹
definition may seem very inefficient from a functional programming
of view, but it does not matter, because unlike Huffman's algorithm, the
{const depth} function is merely a reasoning tool and is never actually
.
›

subsection ‹Height›

text ‹
{\sl height\/} of a tree is the length of the longest path from the root to
leaf node, or equivalently the length of the longest code word. This is
generalized to forests by taking the maximum of the trees' heights. Note
a tree has height 0 if and only if it is a leaf node, and that a forest has
0 if and only if all its trees are leaf nodes.
›

primrec height :: "'a tree → nat" where
"height (Leaf w a) = 0" |
"height (Node w t₁ t₂) = max (height t₁) (height t₂) + 1"

primrec height_F :: "'a forest → nat" where
"height_F [] = 0" |
"height_F (t # ts) = max (height t) (height_F ts)"

text ‹
depth of any symbol in the tree is bounded by the tree's height, and there
a symbol with a depth equal to the height.
›

lemma depth_le_height:
"depth t a ≤ height t"
by (induct t) auto

lemma exists_at_height:
"consistent t ==> ∃a ∈ alphabet t. depth t a = height t"
proof (induct t)
  case Leaf thus ?case by simp
next
  case (Node w t₁ t₂)
  note hyps = Node
  let ?t = "Node w t₁ t₂"
  from hyps obtain b where b: "b ∈ alphabet t₁" "depth t₁ b = height t₁" by auto
  from hyps obtain c where c: "c ∈ alphabet t₂" "depth t₂ c = height t₂" by auto
  let ?a = "if height t₁ ≥ height t₂ then b else c"
  from b c have "?a ∈ alphabet ?t" "depth ?t ?a = height ?t"
    using ‹consistent ?t› by auto
  thus "∃a ∈ alphabet ?t. depth ?t a = height ?t" ..
qed

text ‹
following elimination rules help Isabelle's classical prover, notably the
textit{auto} tactic. They are easy consequences of the inequation
{thm depth_le_height[no_vars]}.
›

lemma depth_max_heightE_left[elim!]:
"[depth t₁ a = max (height t₁) (height t₂);
  [depth t₁ a = height t₁; height t₁ ≥ height t₂] ==> P] ==>
P"
by (cut_tac t = t₁ and a = a in depth_le_height) simp

lemma depth_max_heightE_right[elim!]:
"[depth t₂ a = max (height t₁) (height t₂);
  [depth t₂ a = height t₂; height t₂ ≥ height t₁] ==> P] ==>
P"
by (cut_tac t = t₂ and a = a in depth_le_height) simp

text ‹
also need the following lemma.
›

lemma height_gt_0_alphabet_eq_imp_height_gt_0:
assumes "height t > 0" "consistent t" "alphabet t = alphabet u"
shows "height u > 0"
proof (cases t)
  case Leaf thus ?thesis using assms by simp
next
  case (Node w t₁ t₂)
  note t = Node
  from exists_in_alphabet obtain b where b: "b ∈ alphabet t₁" ..
  from exists_in_alphabet obtain c where c: "c ∈ alphabet t₂" ..
  from b c have bc: "b ≠ c" using t ‹consistent t› by fastforce
  show ?thesis
  proof (cases u)
    case Leaf thus ?thesis using b c bc t assms by auto
  next
    case Node thus ?thesis by simp
  qed
qed

subsection ‹Symbol Frequencies›

text ‹
{\sl frequency\/} of a symbol (which we denoted by $w_a$ in
~\ref{binary-codes}) is the sum of the weights attached to the
nodes labeled with that symbol. If the tree is consistent, the sum
at most one nonzero term. The frequency is then the weight of the leaf
labeled with the symbol, or 0 if there is no such node. The generalization
forests is straightforward.
›

primrec freq :: "'a tree → 'a → nat" where
"freq (Leaf w a) b = (if b = a then w else 0)" |
"freq (Node w t₁ t₂) b = freq t₁ b + freq t₂ b"

primrec freq_F :: "'a forest → 'a → nat" where
"freq_F [] b = 0" |
"freq_F (t # ts) b = freq t b + freq_F ts b"

text ‹
and symbol frequencies are intimately related. Simplification rules
that sums of the form @{term "freq t₁ a + freq t₂ a"} collapse to a
term when we know which tree @{term a} belongs to.
›

lemma notin_alphabet_imp_freq_0[simp]:
"a ∉ alphabet t ==> freq t a = 0"
by (induct t) simp+

lemma notin_alphabet_{F_imp_freq}_{F_0}[simp]:
"a ∉ alphabet_F ts ==> freq_F ts a = 0"
by (induct ts) simp+

lemma freq_0_right[simp]:
"[alphabet t₁ ∩ alphabet t₂ = {}; a ∈ alphabet t₁] ==> freq t₂ a = 0"
by (auto intro: notin_alphabet_imp_freq_0 simp: disjoint_iff_not_equal)

lemma freq_0_left[simp]:
"[alphabet t₁ ∩ alphabet t₂ = {}; a ∈ alphabet t₂] ==> freq t₁ a = 0"
by (auto simp: disjoint_iff_not_equal)

text ‹
trees are {\em comparable} if they have the same alphabet and symbol
. This is an important concept, because it allows us to state not
that the tree constructed by Huffman's algorithm is optimal but also that
has the expected alphabet and frequencies.

close this section with a more technical lemma.
›

lemma height_{F_0_imp_Leaf_freq}_{F_in_set}:
"[consistent_F ts; height_F ts = 0; a ∈ alphabet_F ts] ==>
Leaf (freq_F ts a) a ∈ set ts"
proof (induct ts)
  case Nil thus ?case by simp
next
  case (Cons t ts) show ?case using Cons by (cases t) auto
qed

subsection ‹Weight›

text ‹
@{term weight} function returns the weight of a tree. In the
{const Node} case, we ignore the weight cached in the node and instead
the tree's weight recursively. This makes reasoning simpler because we
then avoid specifying cache correctness as an assumption in our lemmas.
›

primrec weight :: "'a tree → nat" where
"weight (Leaf w a) = w" |
"weight (Node w t₁ t₂) = weight t₁ + weight t₂"

text ‹
weight of a tree is the sum of the frequencies of its symbols.

myskip

noindent
isacommand{lemma} ‹weight_eq_Sum_freq›: \\
\isachardoublequoteopen}$\displaystyle ‹consistent t ==> weight t› =
!\!\sum_{a\in @{term "alphabet t"}}^{\phantom{.}}\!\! @{term "freq t a"}$%
\isachardoublequoteclose}

vskip-\myskipamount
›

(*<*)
lemma weight_eq_Sum_freq:
"consistent t ==> weight t = (∑a ∈ alphabet t. freq t a)"
(*>*)
by (induct t) (auto simp: sum.union_disjoint)

text ‹
assumption @{term "consistent t"} is not necessary, but it simplifies the
by letting us invoke the lemma @{thm [source] sum.union_disjoint}:
$‹[finite A; finite B; A ∩ B = {}] ==>›~\!\sum_{x\in A} @{term "g x"}
vthinspace \mathrel{+} \sum_{x\in B} @{term "g x"}\vthinspace = %
\!\!\sum_{x\in A \cup B}\! @{term "g x"}.$$
›

subsection ‹Cost›

text ‹
{\sl cost\/} of a consistent tree, sometimes called the {\sl weighted path
}, is given by the sum $\sum_{a \in @{term "alphabet t"}\,}
{term "freq t a"} \mathbin{‹*›} @{term "depth t a"}$
which we denoted by $\sum_a w_a \vthinspace\delta_a$ in
~\ref{binary-codes}). It obeys a simple recursive law.
›

primrec cost :: "'a tree → nat" where
"cost (Leaf w a) = 0" |
"cost (Node w t₁ t₂) = weight t₁ + cost t₁ + weight t₂ + cost t₂"

text ‹
interpretation of this recursive law is that the cost of a tree is the sum
the weights of its inner nodes \cite[p.~405]{knuth-1997}. (Recall that
@{term "weight (Node w t₁ t₂)"} = @{term "weight t₁ + weight t₂"}$.) Since
cost of a tree is such a fundamental concept, it seems necessary to prove
the above function definition is correct.

myskip

noindent
isacommand{theorem} ‹cost_eq_Sum_freq_mult_depth›: \\
\isachardoublequoteopen}$\displaystyle ‹consistent t ==> cost t› =
!\!\sum_{a\in @{term "alphabet t"}}^{\phantom{.}}\!\!
{term "freq t a * depth t a"}$%
\isachardoublequoteclose}

myskip

noindent
proof is by structural induction on $t$. If $t = @{term "Leaf w b"}$, both
of the equation simplify to 0. This leaves the case $@{term t} =
{term "Node w t₁ t₂"}$. Let $A$, $A_1$, and $A_2$ stand for
{term "alphabet t"}, @{term "alphabet t₁"}, and @{term "alphabet t₂"},
. We have

$\begin{tabularx}{\textwidth}{@%
\hskip\leftmargin}cX@%
}}
& @{term "cost t"} \\
eq & \justif{definition of @{const cost}} \\
& $@{term "weight t₁ + cost t₁ + weight t₂ + cost t₂"}$ \\
eq & \justif{induction hypothesis} \\
& $@{term "weight t₁"} \mathrel{+}
\sum_{a\in A_1\,} @{term "freq t₁ a * depth t₁ a"} \mathrel{+} {}$ \\
& $@{term "weight t₂"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t₂ a * depth t₂ a"}$ \\
eq & \justif{definition of @{const depth}, consistency} \\[\extrah]
& $@{term "weight t₁"} \mathrel{+}
\sum_{a\in A_1\,} @{term "freq t₁ a * (depth t a - 1)"} \mathrel{+}
{}$ \\
& $@{term "weight t₂"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t₂ a * (depth t a - 1)"}$ \\[\extrah]
eq & \justif{distributivity of ‹*› and $\sum$ over $-$} \\[\extrah]
& $@{term "weight t₁"} \mathrel{+}
\sum_{a\in A_1\,} @{term "freq t₁ a * depth t a"} \mathrel{-}
\sum_{a\in A_1\,} @{term "freq t₁ a"} \mathrel{+} {}$ \\
& $@{term "weight t₂"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t₂ a * depth t a"} \mathrel{-}
\sum_{a\in A_2\,} @{term "freq t₂ a"}$ \\[\extrah]
eq & \justif{@{thm [source] weight_eq_Sum_freq}} \\[\extrah]
& $\sum_{a\in A_1\,} @{term "freq t₁ a * depth t a"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t₂ a * depth t a"}$ \\[\extrah]
eq & \justif{definition of @{const freq}, consistency} \\[\extrah]
& $\sum_{a\in A_1\,} @{term "freq t a * depth t a"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t a * depth t a"}$ \\[\extrah]
eq & \justif{@{thm [source] sum.union_disjoint}, consistency} \\
& $\sum_{a\in A_1\cup A_2\,} @{term "freq t a * depth t a"}$ \\
eq & \justif{definition of @{const alphabet}} \\
& $\sum_{a\in A\,} @{term "freq t a * depth t a"}$.
end{tabularx}$$

noindent
structured proof closely follows this argument.
›

(*<*)
theorem cost_eq_Sum_freq_mult_depth:
"consistent t ==> cost t = (∑a ∈ alphabet t. freq t a * depth t a)"
(*>*)
proof (induct t)
  case Leaf thus ?case by simp
next
  case (Node w t₁ t₂)
  let ?t = "Node w t₁ t₂"
  let ?A = "alphabet ?t" and ?A₁ = "alphabet t₁" and ?A₂ = "alphabet t₂"
  note c = ‹consistent ?t›
  note hyps = Node
  have d₂: "∧a. [?A₁ ∩ ?A₂ = {}; a ∈ ?A₂] ==> depth ?t a = depth t₂ a + 1"
    by auto
  have "cost ?t = weight t₁ + cost t₁ + weight t₂ + cost t₂" by simp
  also have "… = weight t₁ + (∑a ∈ ?A₁. freq t₁ a * depth t₁ a)
    + weight t₂ + (∑a ∈ ?A₂. freq t₂ a * depth t₂ a)"
    using hyps by simp
  also have "… = weight t₁ + (∑a ∈ ?A₁. freq t₁ a * (depth ?t a - 1))
    + weight t₂ + (∑a ∈ ?A₂. freq t₂ a * (depth ?t a - 1))"
    using c d₂ by simp
  also have "… = weight t₁ + (∑a ∈ ?A₁. freq t₁ a * depth ?t a)
    - (∑a ∈ ?A₁. freq t₁ a)
    + weight t₂ + (∑a ∈ ?A₂. freq t₂ a * depth ?t a)
    - (∑a ∈ ?A₂. freq t₂ a)"
    using c d₂ by (simp add: sum.distrib)
  also have "… = (∑a ∈ ?A₁. freq t₁ a * depth ?t a)
    + (∑a ∈ ?A₂. freq t₂ a * depth ?t a)"
    using c by (simp add: weight_eq_Sum_freq)
  also have "… = (∑a ∈ ?A₁. freq ?t a * depth ?t a)
    + (∑a ∈ ?A₂. freq ?t a * depth ?t a)"
    using c by auto
  also have "… = (∑a ∈ ?A₁ ∪ ?A₂. freq ?t a * depth ?t a)"
    using c by (simp add: sum.union_disjoint)
  also have "… = (∑a ∈ ?A. freq ?t a * depth ?t a)" by simp
  finally show ?case .
qed

text ‹
, it should come as no surprise that trees with height 0 have cost 0.
›

lemma height_0_imp_cost_0[simp]:
"height t = 0 ==> cost t = 0"
by (case_tac t) simp+

subsection ‹Optimality›

text ‹
tree is optimum if and only if its cost is not greater than that of any
tree. We can ignore inconsistent trees without loss of generality.
›

definition optimum :: "'a tree → bool" where
"optimum t =
(∀u. consistent u ⟶ alphabet t = alphabet u ⟶ freq t = freq u ⟶
  cost t ≤ cost u)"

section ‹Functional Implementation of Huffman's Algorithm \label{implementation}›

subsection ‹Cached Weight›

text ‹
{\sl cached weight\/} of a node is the weight stored directly in the node.
arguments rely on the computed weight (embodied by the @{const weight}
) rather than the cached weight, but the implementation of Huffman's
uses the cached weight for performance reasons.
›

primrec cachedWeight :: "'a tree → nat" where
"cachedWeight (Leaf w a) = w" |
"cachedWeight (Node w t₁ t₂) = w"

text ‹
cached weight of a leaf node is identical to its computed weight.
›

lemma height_0_imp_cachedWeight_eq_weight[simp]:
"height t = 0 ==> cachedWeight t = weight t"
by (case_tac t) simp+

subsection ‹Tree Union›

text ‹
implementation of Huffman's algorithm builds on two additional auxiliary
. The first one, ‹uniteTrees›, takes two trees
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1.pdf}}}
\qquad \hbox{and} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w2.pdf}}}$$

noindent
returns the tree\strut
$\includegraphics[scale=1.25]{tree-w1-w2.pdf}$$

vskip-.5\myskipamount
›

definition uniteTrees :: "'a tree → 'a tree → 'a tree" where
"uniteTrees t₁ t₂ = Node (cachedWeight t₁ + cachedWeight t₂) t₁ t₂"

text ‹
alphabet, consistency, and symbol frequencies of a united tree are easy to
to the homologous properties of the subtrees.
›

lemma alphabet_uniteTrees[simp]:
"alphabet (uniteTrees t₁ t₂) = alphabet t₁ ∪ alphabet t₂"
by (simp add: uniteTrees_def)

lemma consistent_uniteTrees[simp]:
"[consistent t₁; consistent t₂; alphabet t₁ ∩ alphabet t₂ = {}] ==>
consistent (uniteTrees t₁ t₂)"
by (simp add: uniteTrees_def)

lemma freq_uniteTrees[simp]:
"freq (uniteTrees t₁ t₂) a = freq t₁ a + freq t₂ a"
by (simp add: uniteTrees_def)

subsection ‹Ordered Tree Insertion›

text ‹
auxiliary function ‹insortTree› inserts a tree into a forest sorted
cached weight, preserving the sort order.
›

primrec insortTree :: "'a tree → 'a forest → 'a forest" where
"insortTree u [] = [u]" |
"insortTree u (t # ts) =
(if cachedWeight u ≤ cachedWeight t then u # t # ts else t # insortTree u ts)"

text ‹
resulting forest contains one more tree than the original forest. Clearly,
cannot be empty.
›

lemma length_insortTree[simp]:
"length (insortTree t ts) = length ts + 1"
by (induct ts) simp+

lemma insortTree_ne_Nil[simp]:
"insortTree t ts ≠ []"
by (case_tac ts) simp+

text ‹
alphabet, consistency, symbol frequencies, and height of a forest after
are easy to relate to the homologous properties of the original
and the inserted tree.
›

lemma alphabet_{F_insortTree}[simp]:
"alphabet_F (insortTree t ts) = alphabet t ∪ alphabet_F ts"
by (induct ts) auto

lemma consistent_{F_insortTree}[simp]:
"consistent_F (insortTree t ts) = consistent_F (t # ts)"
by (induct ts) auto

lemma freq_{F_insortTree}[simp]:
"freq_F (insortTree t ts) = (λa. freq t a + freq_F ts a)"
by (induct ts) (simp add: ext)+

lemma height_{F_insortTree}[simp]:
"height_F (insortTree t ts) = max (height t) (height_F ts)"
by (induct ts) auto

subsection ‹The Main Algorithm›

text ‹
's algorithm repeatedly unites the first two trees of the forest it
as argument until a single tree is left. It should initially be
with a list of leaf nodes sorted by weight. Note that it is not defined
the empty list.
›

fun huffman :: "'a forest → 'a tree" where
"huffman [t] = t" |
"huffman (t₁ # t₂ # ts) = huffman (insortTree (uniteTrees t₁ t₂) ts)"

text ‹
time complexity of the algorithm is quadratic in the size of the forest.
we eliminated the inner node's cached weight component, and instead
the weight each time it is needed, the complexity would remain
, but with a larger constant. Using a binary search in @{const
}, the corresponding imperative algorithm is $O(n \log n)$ if we keep
weight cache and $O(n^2)$ if we drop it. An $O(n)$ imperative implementation
possible by maintaining two queues, one containing the unprocessed leaf nodes
the other containing the combined trees \cite[p.~404]{knuth-1997}.

tree returned by the algorithm preserves the alphabet, consistency, and
frequencies of the original forest.
›

theorem alphabet_huffman[simp]:
"ts ≠ [] ==> alphabet (huffman ts) = alphabet_F ts"
by (induct ts rule: huffman.induct) auto

theorem consistent_huffman[simp]:
"[consistent_F ts; ts ≠ []] ==> consistent (huffman ts)"
by (induct ts rule: huffman.induct) simp+

theorem freq_huffman[simp]:
"ts ≠ [] ==> freq (huffman ts) a = freq_F ts a"
by (induct ts rule: huffman.induct) auto

section ‹Definition of Auxiliary Functions Used in the Proof \label{auxiliary}›

subsection ‹Sibling of a Symbol›

text ‹
{\sl sibling\/} of a symbol $a$ in a tree $t$ is the label of the node that
the (left or right) sibling of the node labeled with $a$ in $t$. If the
$a$ is not in $t$'s alphabet or it occurs in a node with no sibling
, we define the sibling as being $a$ itself; this gives us the nice property
if $t$ is consistent, then $@{term "sibling t a"} \not= a$ if and only if
a$ has a sibling. As an illustration, we have
@{term "sibling t a"} = b$,\vthinspace{} $@{term "sibling t b"} = a$,
$@{term "sibling t c"} = c$ for the tree\strut
$t \,= \vcenter{\hbox{\includegraphics[scale=1.25]{tree-sibling.pdf}}}$$
›

fun sibling :: "'a tree → 'a → 'a" where
"sibling (Leaf w_b b) a = a" |
"sibling (Node w (Leaf w_b b) (Leaf w_c c)) a =
     (if a = b then c else if a = c then b else a)" |
"sibling (Node w t₁ t₂) a =
     (if a ∈ alphabet t₁ then sibling t₁ a
      else if a ∈ alphabet t₂ then sibling t₂ a
      else a)"

text ‹
@{const sibling} is defined using sequential pattern matching
cite{krauss-2007,krauss-2009}, reasoning about it can become tedious.
rules therefore play an important role.
›

lemma notin_alphabet_imp_sibling_id[simp]:
"a ∉ alphabet t ==> sibling t a = a"
by (cases rule: sibling.cases[where x = "(t, a)"]) simp+

lemma height_0_imp_sibling_id[simp]:
"height t = 0 ==> sibling t a = a"
by (case_tac t) simp+

lemma height_gt_0_in_alphabet_imp_sibling_left[simp]:
"[height t₁ > 0; a ∈ alphabet t₁] ==>
sibling (Node w t₁ t₂) a = sibling t₁ a"
by (case_tac t₁) simp+

lemma height_gt_0_in_alphabet_imp_sibling_right[simp]:
"[height t₂ > 0; a ∈ alphabet t₁] ==>
sibling (Node w t₁ t₂) a = sibling t₁ a"
by (case_tac t₂) simp+

lemma height_gt_0_notin_alphabet_imp_sibling_left[simp]:
"[height t₁ > 0; a ∉ alphabet t₁] ==>
sibling (Node w t₁ t₂) a = sibling t₂ a"
by (case_tac t₁) simp+

lemma height_gt_0_notin_alphabet_imp_sibling_right[simp]:
"[height t₂ > 0; a ∉ alphabet t₁] ==>
sibling (Node w t₁ t₂) a = sibling t₂ a"
by (case_tac t₂) simp+

lemma either_height_gt_0_imp_sibling[simp]:
"height t₁ > 0 ∨ height t₂ > 0 ==>
sibling (Node w t₁ t₂) a =
     (if a ∈ alphabet t₁ then sibling t₁ a else sibling t₂ a)"
by auto

text ‹
following rules are also useful for reasoning about siblings and alphabets.
›

lemma in_alphabet_imp_sibling_in_alphabet:
"a ∈ alphabet t ==> sibling t a ∈ alphabet t"
by (induct t a rule: sibling.induct) auto

lemma sibling_ne_imp_sibling_in_alphabet:
"sibling t a ≠ a ==> sibling t a ∈ alphabet t"
using in_alphabet_imp_sibling_in_alphabet by force

text ‹
default induction rule for @{const sibling} distinguishes four cases.

begin{myitemize}
item[] {\sc Base case:}\enskip $t = @{term "Leaf w b"}$.
item[] {\sc Induction step 1:}\enskip
$t = @{term "Node w (Leaf w_b b) (Leaf w_c c)"}$.
item[] {\sc Induction step 2:}\enskip
$t = @{term "Node w (Node w₁ t₁₁ t₁₂) t₂"}$.
item[] {\sc Induction step 3:}\enskip
$t = @{term "Node w t₁ (Node w₂ t₂₁ t₂₂)"}$.
end{myitemize}

noindent
rule leaves much to be desired. First, the last two cases overlap and
normally be handled the same way, so they should be combined. Second, the
‹Node› constructors in the last two cases reduce readability.
, under the assumption that $t$ is consistent, we would like to perform
same case distinction on $a$ as we did for
{thm [source] tree_induct_consistent}, with the same benefits for automation.
observations lead us to develop a custom induction rule that
the following cases.

begin{myitemize}
item[] {\sc Base case:}\enskip $t = @{term "Leaf w b"}$.
item[] {\sc Induction step 1:}\enskip
$t = @{term "Node w (Leaf w_b b) (Leaf w_c c)"}$ with
@{prop "b ≠ c"}.
item[] \begin{flushleft}
{\sc Induction step 2:}\enskip
$t = @{term "Node w t₁ t₂"}$ and either @{term t₁} or @{term t₂}
has nonzero height.
\end{flushleft}
item[] \noindent\kern\leftmargin {\sc Subcase 1:}\enspace $a$ belongs to
@{term t₁} but not to @{term t₂}.
item[] \noindent\kern\leftmargin {\sc Subcase 2:}\enspace $a$ belongs to
@{term t₂} but not to @{term t₁}.
item[] \noindent\kern\leftmargin {\sc Subcase 3:}\enspace $a$ belongs to
neither @{term t₁} nor @{term t₂}.
end{myitemize}

statement of the rule and its proof are similar to what we did for
trees, the main difference being that we now have two induction
instead of one.
›

lemma sibling_induct_consistent[consumes 1,
  case_names base step₁ step₂₁ step₂₂ step₂₃]:
"[consistent t;
  ∧w b a. P (Leaf w b) a;
  ∧w w_b b w_c c a. b ≠ c ==> P (Node w (Leaf w_b b) (Leaf w_c c)) a;
  ∧w t₁ t₂ a.
     [consistent t₁; consistent t₂; alphabet t₁ ∩ alphabet t₂ = {};
      height t₁ > 0 ∨ height t₂ > 0; a ∈ alphabet t₁;
      sibling t₁ a ∈ alphabet t₁; a ∉ alphabet t₂;
      sibling t₁ a ∉ alphabet t₂; P t₁ a] ==>
     P (Node w t₁ t₂) a;
  ∧w t₁ t₂ a.
     [consistent t₁; consistent t₂; alphabet t₁ ∩ alphabet t₂ = {};
      height t₁ > 0 ∨ height t₂ > 0; a ∉ alphabet t₁;
      sibling t₂ a ∉ alphabet t₁; a ∈ alphabet t₂;
      sibling t₂ a ∈ alphabet t₂; P t₂ a] ==>
     P (Node w t₁ t₂) a;
  ∧w t₁ t₂ a.
     [consistent t₁; consistent t₂; alphabet t₁ ∩ alphabet t₂ = {};
      height t₁ > 0 ∨ height t₂ > 0; a ∉ alphabet t₁; a ∉ alphabet t₂] ==>
     P (Node w t₁ t₂) a] ==>
P t a"
apply rotate_tac
apply induction_schema
   apply atomize_elim
   apply (case_tac t, simp)
   apply clarsimp
   apply (metis One_nat_def add_is_0 alphabet.simps(1) bot_nat_0.not_eq_extremum disjoint_iff
                exists_at_height height.simps(2) in_alphabet_imp_sibling_in_alphabet nat.simps(3)
                tree.exhaust)
by lexicographic_order

text ‹
custom induction rule allows us to prove new properties of @{const sibling}
little effort.
›

lemma sibling_sibling_id[simp]:
"consistent t ==> sibling t (sibling t a) = a"
by (induct t a rule: sibling_induct_consistent) simp+

lemma sibling_reciprocal:
"[consistent t; sibling t a = b] ==> sibling t b = a"
by auto

lemma depth_height_imp_sibling_ne:
"[consistent t; depth t a = height t; height t > 0; a ∈ alphabet t] ==>
sibling t a ≠ a"
by (induct t a rule: sibling_induct_consistent) auto

lemma depth_sibling[simp]:
"consistent t ==> depth t (sibling t a) = depth t a"
by (induct t a rule: sibling_induct_consistent) simp+

subsection ‹Leaf Interchange›

text ‹
‹swapLeaves› function takes a tree $t$ together with two symbols
a$, $b$ and their frequencies $@{term w_a}$, $@{term w_b}$, and returns the tree
t$ in which the leaf nodes labeled with $a$ and $b$ are exchanged. When
‹swapLeaves›, we normally pass @{term "freq t a"} and
{term "freq t b"} for @{term w_a} and @{term w_b}.

that we do not bother updating the cached weight of the ancestor nodes
performing the interchange. The cached weight is used only in the
of Huffman's algorithm, which does not invoke ‹swapLeaves›.
›

primrec swapLeaves :: "'a tree → nat → 'a → nat → 'a → 'a tree" where
"swapLeaves (Leaf w_c c) w_a a w_b b =
     (if c = a then Leaf w_b b else if c = b then Leaf w_a a else Leaf w_c c)" |
"swapLeaves (Node w t₁ t₂) w_a a w_b b =
     Node w (swapLeaves t₁ w_a a w_b b) (swapLeaves t₂ w_a a w_b b)"

text ‹
a symbol~$a$ with itself leaves the tree $t$ unchanged if $a$ does not
to it or if the specified frequencies @{term w_a} and @{term w_b} equal
{term "freq t a"}.
›

lemma swapLeaves_id_when_notin_alphabet[simp]:
"a ∉ alphabet t ==> swapLeaves t w a w' a = t"
by (induct t) simp+

lemma swapLeaves_id[simp]:
"consistent t ==> swapLeaves t (freq t a) a (freq t a) a = t"
by (induct t a rule: tree_induct_consistent) simp+

text ‹
alphabet, consistency, symbol depths, height, and symbol frequencies of the
@{term "swapLeaves t w_a a w_b b"} can be related to the homologous
of $t$.
›

lemma alphabet_swapLeaves:
"alphabet (swapLeaves t w_a a w_b b) =
     (if a ∈ alphabet t then
        if b ∈ alphabet t then alphabet t else (alphabet t - {a}) ∪ {b}
      else
        if b ∈ alphabet t then (alphabet t - {b}) ∪ {a} else alphabet t)"
by (induct t) auto

lemma consistent_swapLeaves[simp]:
"consistent t ==> consistent (swapLeaves t w_a a w_b b)"
by (induct t) (auto simp: alphabet_swapLeaves)

lemma depth_swapLeaves_neither[simp]:
"[consistent t; c ≠ a; c ≠ b] ==> depth (swapLeaves t w_a a w_b b) c = depth t c"
by (induct t a rule: tree_induct_consistent) (auto simp: alphabet_swapLeaves)

lemma height_swapLeaves[simp]:
"height (swapLeaves t w_a a w_b b) = height t"
by (induct t) simp+

lemma freq_swapLeaves[simp]:
"[consistent t; a ≠ b] ==>
freq (swapLeaves t w_a a w_b b) =
     (λc. if c = a then if b ∈ alphabet t then w_a else 0
          else if c = b then if a ∈ alphabet t then w_b else 0
          else freq t c)"
apply (rule ext)
apply (induct t)
by auto

text ‹
the lemmas concerned with the resulting tree's weight and cost, we avoid
on natural numbers by rearranging terms. For example, we write
$@{prop "weight (swapLeaves t w_a a w_b b) + freq t a = weight t + w_b"}$$
noindent
than the more conventional
$@{prop "weight (swapLeaves t w_a a w_b b) = weight t + w_b - freq t a"}.$$
Isabelle/HOL, these two equations are not equivalent, because by definition
m - n = 0$ if $n > m$. We could use the second equation and additionally
that @{prop "weight t ≥ freq t a"} (an easy consequence of
{thm [source] weight_eq_Sum_freq}), and then apply the \textit{arith}
, but it is much simpler to use the first equation and stay with
textit{simp} and \textit{auto}. Another option would be to use
instead of natural numbers.
›

lemma weight_swapLeaves:
"[consistent t; a ≠ b] ==>
if a ∈ alphabet t then
   if b ∈ alphabet t then
     weight (swapLeaves t w_a a w_b b) + freq t a + freq t b =
         weight t + w_a + w_b
   else
     weight (swapLeaves t w_a a w_b b) + freq t a = weight t + w_b
else
   if b ∈ alphabet t then
     weight (swapLeaves t w_a a w_b b) + freq t b = weight t + w_a
   else
     weight (swapLeaves t w_a a w_b b) = weight t"
proof (induct t a rule: tree_induct_consistent)
  ― ‹{\sc Base case:}\enspace $t = @{term "Leaf w b"}$›
  case base thus ?case by clarsimp
next
  ― ‹{\sc Induction step:}\enspace $t = @{term "Node w t₁ t₂"}$›
  ― ‹{\sc Subcase 1:}\enspace $a$ belongs to @{term t₁} but not to
@{term t₂}›
  case (step₁ w t₁ t₂ a) show ?case
  proof cases
    assume b: "b ∈ alphabet t₁"
    hence "b ∉ alphabet t₂" using step₁ by auto
    thus ?case using b step₁ by simp
  next
    assume "b ∉ alphabet t₁" thus ?case using step₁ by auto
  qed
next
  ― ‹{\sc Subcase 2:}\enspace $a$ belongs to @{term t₂} but not to
@{term t₁}›
  case (step₂ w t₁ t₂ a) show ?case
  proof cases
    assume b: "b ∈ alphabet t₁"
    hence "b ∉ alphabet t₂" using step₂ by auto
    thus ?case using b step₂ by simp
  next
    assume "b ∉ alphabet t₁" thus ?case using step₂ by auto
  qed
next
  ― ‹{\sc Subcase 3:}\enspace $a$ belongs to neither @{term t₁} nor
@{term t₂}›
  case (step₃ w t₁ t₂ a) show ?case
  proof cases
    assume b: "b ∈ alphabet t₁"
    hence "b ∉ alphabet t₂" using step₃ by auto
    thus ?case using b step₃ by simp
  next
    assume "b ∉ alphabet t₁" thus ?case using step₃ by auto
  qed
qed

lemma cost_swapLeaves:
"[consistent t; a ≠ b] ==>
if a ∈ alphabet t then
   if b ∈ alphabet t then
     cost (swapLeaves t w_a a w_b b) + freq t a * depth t a
     + freq t b * depth t b =
         cost t + w_a * depth t b + w_b * depth t a
   else
     cost (swapLeaves t w_a a w_b b) + freq t a * depth t a =
         cost t + w_b * depth t a
else
   if b ∈ alphabet t then
     cost (swapLeaves t w_a a w_b b) + freq t b * depth t b =
         cost t + w_a * depth t b
   else
     cost (swapLeaves t w_a a w_b b) = cost t"
proof (induct t)
  case Leaf show ?case by simp
next
  case (Node w t₁ t₂)
  note c = ‹consistent (Node w t₁ t₂)›
  note hyps = Node
  have w₁: "if a ∈ alphabet t₁ then
              if b ∈ alphabet t₁ then
                weight (swapLeaves t₁ w_a a w_b b) + freq t₁ a + freq t₁ b =
                    weight t₁ + w_a + w_b
                  else
                weight (swapLeaves t₁ w_a a w_b b) + freq t₁ a = weight t₁ + w_b
            else
              if b ∈ alphabet t₁ then
                weight (swapLeaves t₁ w_a a w_b b) + freq t₁ b = weight t₁ + w_a
              else
                weight (swapLeaves t₁ w_a a w_b b) = weight t₁" using hyps
    by (simp add: weight_swapLeaves)
  have w₂: "if a ∈ alphabet t₂ then
              if b ∈ alphabet t₂ then
                weight (swapLeaves t₂ w_a a w_b b) + freq t₂ a + freq t₂ b =
                    weight t₂ + w_a + w_b
              else
                weight (swapLeaves t₂ w_a a w_b b) + freq t₂ a = weight t₂ + w_b
            else
              if b ∈ alphabet t₂ then
                weight (swapLeaves t₂ w_a a w_b b) + freq t₂ b = weight t₂ + w_a
              else
                weight (swapLeaves t₂ w_a a w_b b) = weight t₂" using hyps
    by (simp add: weight_swapLeaves)
  show ?case
  proof cases
    assume a₁: "a ∈ alphabet t₁"
    hence a₂: "a ∉ alphabet t₂" using c by auto
    show ?case
    proof cases
      assume b₁: "b ∈ alphabet t₁"
      hence "b ∉ alphabet t₂" using c by auto
      thus ?case using a₁ a₂ b₁ w₁ w₂ hyps by simp
    next
      assume b₁: "b ∉ alphabet t₁" show ?case
      proof cases
        assume "b ∈ alphabet t₂" thus ?case using a₁ a₂ b₁ w₁ w₂ hyps by simp
      next
        assume "b ∉ alphabet t₂" thus ?case using a₁ a₂ b₁ w₁ w₂ hyps by simp
      qed
    qed
  next
    assume a₁: "a ∉ alphabet t₁" show ?case
    proof cases
      assume a₂: "a ∈ alphabet t₂" show ?case
      proof cases
        assume b₁: "b ∈ alphabet t₁"
        hence "b ∉ alphabet t₂" using c by auto
        thus ?case using a₁ a₂ b₁ w₁ w₂ hyps by simp
      next
        assume b₁: "b ∉ alphabet t₁" show ?case
        proof cases
          assume "b ∈ alphabet t₂" thus ?case using a₁ a₂ b₁ w₁ w₂ hyps by simp
        next
          assume "b ∉ alphabet t₂" thus ?case using a₁ a₂ b₁ w₁ w₂ hyps by simp
        qed
      qed
    next
      assume a₂: "a ∉ alphabet t₂" show ?case
      proof cases
        assume b₁: "b ∈ alphabet t₁"
        hence "b ∉ alphabet t₂" using c by auto
        thus ?case using a₁ a₂ b₁ w₁ w₂ hyps by simp
      next
        assume b₁: "b ∉ alphabet t₁" show ?case
        proof cases
          assume "b ∈ alphabet t₂" thus ?case using a₁ a₂ b₁ w₁ w₂ hyps by simp
        next
          assume "b ∉ alphabet t₂" thus ?case using a₁ a₂ b₁ w₁ w₂ hyps by simp
        qed
      qed
    qed
  qed
qed

text ‹
sense tells us that the following statement is valid: ``If Astrid
her house with Bernard's neighbor, Bernard becomes Astrid's new
.'' A similar property holds for binary trees.
›

lemma sibling_swapLeaves_sibling[simp]:
"[consistent t; sibling t b ≠ b; a ≠ b] ==>
sibling (swapLeaves t w_a a w_s (sibling t b)) a = b"
proof (induct t)
  case Leaf thus ?case by simp
next
  case (Node w t₁ t₂)
  note hyps = Node
  show ?case
  proof (cases "height t₁ = 0")
    case True
    note h₁ = True
    show ?thesis
    proof (cases t₁)
      case (Leaf w_c c)
      note l₁ = Leaf
      show ?thesis
      proof (cases "height t₂ = 0")
        case True
        note h₂ = True
        show ?thesis
        proof (cases t₂)
          case Leaf thus ?thesis using l₁ hyps by simp presburger
        next
          case Node thus ?thesis using h₂ by simp
        qed
      next
        case False
        note h₂ = False
        show ?thesis
        proof cases
          assume "c = b" thus ?thesis using l₁ h₂ hyps by simp
        next
          assume "c ≠ b"
          have "sibling t₂ b ∈ alphabet t₂" using ‹c ≠ b› l₁ h₂ hyps
            by (simp add: sibling_ne_imp_sibling_in_alphabet)
          thus ?thesis using ‹c ≠ b› l₁ h₂ hyps by auto
        qed
      qed
    next
      case Node thus ?thesis using h₁ by simp
    qed
  next
    case False
    note h₁ = False
    show ?thesis
    proof (cases "height t₂ = 0")
      case True
      note h₂ = True
      show ?thesis using h₁ h₂ hyps(1,3-5) by auto
    next
      case False
      note h₂ = False
      show ?thesis
      proof (cases "b ∈ alphabet t₁")
        case True thus ?thesis using h₁ h₂ hyps by auto
      next
        case False
        note b₁ = False
        show ?thesis
        proof (cases "b ∈ alphabet t₂")
          case True thus ?thesis using b₁ h₁ h₂ hyps
            by (auto simp: in_alphabet_imp_sibling_in_alphabet
                           alphabet_swapLeaves)
        next
          case False thus ?thesis using b₁ h₁ h₂ hyps by simp
        qed
      qed
    qed
  qed
qed

subsection ‹Symbol Interchange›

text ‹
‹swapSyms› function provides a simpler interface to
{const swapLeaves}, with @{term "freq t a"} and @{term "freq t b"} in place of
{term "w_a"} and @{term "w_b"}. Most lemmas about ‹swapSyms› are directly
from the homologous results about @{const swapLeaves}.
›

definition swapSyms :: "'a tree → 'a → 'a → 'a tree" where
"swapSyms t a b = swapLeaves t (freq t a) a (freq t b) b"

lemma swapSyms_id[simp]:
"consistent t ==> swapSyms t a a = t"
by (simp add: swapSyms_def)

lemma alphabet_swapSyms[simp]:
"[a ∈ alphabet t; b ∈ alphabet t] ==> alphabet (swapSyms t a b) = alphabet t"
by (simp add: swapSyms_def alphabet_swapLeaves)

lemma consistent_swapSyms[simp]:
"consistent t ==> consistent (swapSyms t a b)"
by (simp add: swapSyms_def)

lemma depth_swapSyms_neither[simp]:
"[consistent t; c ≠ a; c ≠ b] ==>
depth (swapSyms t a b) c = depth t c"
by (simp add: swapSyms_def)

lemma freq_swapSyms[simp]:
"[consistent t; a ∈ alphabet t; b ∈ alphabet t] ==>
freq (swapSyms t a b) = freq t"
by (case_tac "a = b") (simp add: swapSyms_def ext)+

lemma cost_swapSyms:
assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t"
shows "cost (swapSyms t a b) + freq t a * depth t a + freq t b * depth t b =
           cost t + freq t a * depth t b + freq t b * depth t a"
proof cases
  assume "a = b" thus ?thesis using assms by simp
next
  assume "a ≠ b"
  thus ?thesis using assms by (simp add: cost_swapLeaves swapSyms_def)
qed

text ‹
$a$'s frequency is lower than or equal to $b$'s, and $a$ is higher up in the
than $b$ or at the same level, then interchanging $a$ and $b$ does not
the tree's cost.
›

lemma cost_swapSyms_le:
assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t" "freq t a ≤ freq t b"
        "depth t a ≤ depth t b"
shows "cost (swapSyms t a b) ≤ cost t"
proof -
  let ?aabb = "freq t a * depth t a + freq t b * depth t b"
  let ?abba = "freq t a * depth t b + freq t b * depth t a"
  have "?abba ≤ ?aabb" using assms(4-5)
    by (metis (no_types) Groups.add_ac(2) Groups.mult_ac(2) nat_arith.rule0 nat_le_add_iff1
        nat_le_add_iff2 zero_order(1))
  have "cost (swapSyms t a b) + ?aabb = cost t + ?abba" using assms(1-3)
    by (simp add: cost_swapSyms add.assoc[THEN sym])
  also have "… ≤ cost t + ?aabb" using ‹?abba ≤ ?aabb› by simp
  finally show ?thesis using assms(4-5) by simp
qed

text ‹
stated earlier, ``If Astrid exchanges her house with Bernard's neighbor,
becomes Astrid's new neighbor.''
›

lemma sibling_swapSyms_sibling[simp]:
"[consistent t; sibling t b ≠ b; a ≠ b] ==>
sibling (swapSyms t a (sibling t b)) a = b"
by (simp add: swapSyms_def)

text ‹
`If Astrid exchanges her house with Bernard, Astrid becomes Bernard's old
's new neighbor.''
›

lemma sibling_swapSyms_other_sibling[simp]:
"[consistent t; sibling t b ≠ a; sibling t b ≠ b; a ≠ b] ==>
sibling (swapSyms t a b) (sibling t b) = a"
by (metis consistent_swapSyms sibling_swapSyms_sibling sibling_reciprocal)

subsection ‹Four-Way Symbol Interchange \label{four-way-symbol-interchange}›

text ‹
@{const swapSyms} function exchanges two symbols $a$ and $b$. We use it
define the four-way symbol interchange function ‹swapFourSyms›, which
four symbols $a$, $b$, $c$, $d$ with $a \ne b$ and $c \ne d$, and
them so that $a$ and $b$ occupy $c$~and~$d$'s positions.

naive definition of this function would be
$@{prop "swapFourSyms t a b c d = swapSyms (swapSyms t a c) b d"}.$$
definition fails in the face of aliasing: If $a = d$, but
b \ne c$, then ‹swapFourSyms a b c d› would leave $a$ in $b$'s
.%
footnote{Cormen et al.\ \cite[p.~390]{cormen-et-al-2001} forgot to consider
case in their proof. Thomas Cormen indicated in a personal communication
this will be corrected in the next edition of the book.}
›

definition swapFourSyms :: "'a tree → 'a → 'a → 'a → 'a → 'a tree" where
"swapFourSyms t a b c d =
(if a = d then swapSyms t b c
  else if b = c then swapSyms t a d
  else swapSyms (swapSyms t a c) b d)"

text ‹
about @{const swapFourSyms} are easy to prove by expanding its
.
›

lemma alphabet_swapFourSyms[simp]:
"[a ∈ alphabet t; b ∈ alphabet t; c ∈ alphabet t; d ∈ alphabet t] ==>
alphabet (swapFourSyms t a b c d) = alphabet t"
by (simp add: swapFourSyms_def)

lemma consistent_swapFourSyms[simp]:
"consistent t ==> consistent (swapFourSyms t a b c d)"
by (simp add: swapFourSyms_def)

lemma freq_swapFourSyms[simp]:
"[consistent t; a ∈ alphabet t; b ∈ alphabet t; c ∈ alphabet t;
  d ∈ alphabet t] ==>
freq (swapFourSyms t a b c d) = freq t"
by (auto simp: swapFourSyms_def)

text ‹
`If Astrid and Bernard exchange their houses with Carmen and her neighbor,
and Bernard will now be neighbors.''
›

lemma sibling_swapFourSyms_when_4th_is_sibling:
assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t" "c ∈ alphabet t"
        "a ≠ b" "sibling t c ≠ c"
shows "sibling (swapFourSyms t a b c (sibling t c)) a = b"
proof (cases "a ≠ sibling t c ∧ b ≠ c")
  case True show ?thesis
  proof -
    let ?d = "sibling t c"
    let ?t_s = "swapFourSyms t a b c ?d"
    have abba: "(sibling ?t_s a = b) = (sibling ?t_s b = a)" using ‹consistent t›
      by (metis consistent_swapFourSyms sibling_reciprocal)
    have s: "sibling t c = sibling (swapSyms t a c) a" using True assms
      by (metis sibling_reciprocal sibling_swapSyms_sibling)
    have "sibling ?t_s b = sibling (swapSyms t a c) ?d" using s True assms
      by (auto simp: swapFourSyms_def)
    also have "… = a" using True assms
      by (metis sibling_reciprocal sibling_swapSyms_other_sibling swapLeaves_id swapSyms_def)
    finally have "sibling ?t_s b = a" .
    with abba show ?thesis ..
  qed
next
  case False thus ?thesis using assms
    by (auto intro: sibling_reciprocal simp: swapFourSyms_def)
qed

subsection ‹Sibling Merge›

text ‹
a symbol $a$, the ‹mergeSibling› function transforms the tree

setbox\myboxi=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-a.pdf}}%
setbox\myboxii=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-ab.pdf}}%
mydimeni=\ht\myboxii
$\vcenter{\box\myboxii}
\qquad \hbox{into} \qquad
\smash{\lower\ht\myboxi\hbox{\raise.5\mydimeni\box\myboxi}}$$
frequency of $a$ in the result is the sum of the original frequencies of
a$ and $b$, so as not to alter the tree's weight.
›

fun mergeSibling :: "'a tree → 'a → 'a tree" where
"mergeSibling (Leaf w_b b) a = Leaf w_b b" |
"mergeSibling (Node w (Leaf w_b b) (Leaf w_c c)) a =
     (if a = b ∨ a = c then Leaf (w_b + w_c) a
      else Node w (Leaf w_b b) (Leaf w_c c))" |
"mergeSibling (Node w t₁ t₂) a =
     Node w (mergeSibling t₁ a) (mergeSibling t₂ a)"

text ‹
definition of @{const mergeSibling} has essentially the same structure as
of @{const sibling}. As a result, the custom induction rule that we
for @{const sibling} works equally well for reasoning about
{const mergeSibling}.
›

lemmas mergeSibling_induct_consistent = sibling_induct_consistent

text ‹
properties of @{const mergeSibling} echo those of @{const sibling}. Like
@{const sibling}, simplification rules are crucial.
›

lemma notin_alphabet_imp_mergeSibling_id[simp]:
"a ∉ alphabet t ==> mergeSibling t a = t"
by (induct t a rule: mergeSibling.induct) simp+

lemma height_gt_0_imp_mergeSibling_left[simp]:
"height t₁ > 0 ==>
mergeSibling (Node w t₁ t₂) a =
     Node w (mergeSibling t₁ a) (mergeSibling t₂ a)"
by (case_tac t₁) simp+

lemma height_gt_0_imp_mergeSibling_right[simp]:
"height t₂ > 0 ==>
mergeSibling (Node w t₁ t₂) a =
     Node w (mergeSibling t₁ a) (mergeSibling t₂ a)"
by (case_tac t₂) simp+

lemma either_height_gt_0_imp_mergeSibling[simp]:
"height t₁ > 0 ∨ height t₂ > 0 ==>
mergeSibling (Node w t₁ t₂) a =
     Node w (mergeSibling t₁ a) (mergeSibling t₂ a)"
by auto

lemma alphabet_mergeSibling[simp]:
"[consistent t; a ∈ alphabet t] ==>
alphabet (mergeSibling t a) = (alphabet t - {sibling t a}) ∪ {a}"
by (induct t a rule: mergeSibling_induct_consistent) auto

lemma consistent_mergeSibling[simp]:
"consistent t ==> consistent (mergeSibling t a)"
by (induct t a rule: mergeSibling_induct_consistent) auto

lemma freq_mergeSibling:
"[consistent t; a ∈ alphabet t; sibling t a ≠ a] ==>
freq (mergeSibling t a) =
     (λc. if c = a then freq t a + freq t (sibling t a)
          else if c = sibling t a then 0
          else freq t c)"
by (induct t a rule: mergeSibling_induct_consistent) (auto simp: fun_eq_iff)

lemma weight_mergeSibling[simp]:
"weight (mergeSibling t a) = weight t"
by (induct t a rule: mergeSibling.induct) simp+

text ‹
$a$ has a sibling, merging $a$ and its sibling reduces $t$'s cost by
{term "freq t a + freq t (sibling t a)"}.
›

lemma cost_mergeSibling:
"[consistent t; sibling t a ≠ a] ==>
cost (mergeSibling t a) + freq t a + freq t (sibling t a) = cost t"
by (induct t a rule: mergeSibling_induct_consistent) auto

subsection ‹Leaf Split›

text ‹
‹splitLeaf› function undoes the merging performed by
{const mergeSibling}: Given two symbols $a$, $b$ and two frequencies
@{term w_a}$, $@{term w_b}$, it transforms
setbox\myboxi=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-a.pdf}}%
setbox\myboxii=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-ab.pdf}}%
$\smash{\lower\ht\myboxi\hbox{\raise.5\ht\myboxii\box\myboxi}}
\qquad \hbox{into} \qquad
\vcenter{\box\myboxii}$$
the resulting tree, $a$ has frequency @{term w_a} and $b$ has frequency
{term w_b}. We normally invoke it with @{term w_a}~and @{term w_b} such that
{prop "freq t a = w_a + w_b"}.
›

primrec splitLeaf :: "'a tree → nat → 'a → nat → 'a → 'a tree" where
"splitLeaf (Leaf w_c c) w_a a w_b b =
(if c = a then Node w_c (Leaf w_a a) (Leaf w_b b) else Leaf w_c c)" |
"splitLeaf (Node w t₁ t₂) w_a a w_b b =
Node w (splitLeaf t₁ w_a a w_b b) (splitLeaf t₂ w_a a w_b b)"

primrec splitLeaf_F :: "'a forest → nat → 'a → nat → 'a → 'a forest" where
"splitLeaf_F [] w_a a w_b b = []" |
"splitLeaf_F (t # ts) w_a a w_b b =
     splitLeaf t w_a a w_b b # splitLeaf_F ts w_a a w_b b"

text ‹
leaf nodes affects the alphabet, consistency, symbol frequencies,
, and cost in unsurprising ways.
›

lemma notin_alphabet_imp_splitLeaf_id[simp]:
"a ∉ alphabet t ==> splitLeaf t w_a a w_b b = t"
by (induct t) simp+

lemma notin_alphabet_{F_imp_splitLeaf}_{F_id}[simp]:
"a ∉ alphabet_F ts ==> splitLeaf_F ts w_a a w_b b = ts"
by (induct ts) simp+

lemma alphabet_splitLeaf[simp]:
"alphabet (splitLeaf t w_a a w_b b) =
(if a ∈ alphabet t then alphabet t ∪ {b} else alphabet t)"
by (induct t) simp+

lemma consistent_splitLeaf[simp]:
"[consistent t; b ∉ alphabet t] ==> consistent (splitLeaf t w_a a w_b b)"
by (induct t) auto

lemma freq_splitLeaf[simp]:
"[consistent t; b ∉ alphabet t] ==>
freq (splitLeaf t w_a a w_b b) =
(if a ∈ alphabet t then (λc. if c = a then w_a else if c = b then w_b else freq t c)
  else freq t)"
by (induct t b rule: tree_induct_consistent) (rule ext, auto)+

lemma weight_splitLeaf[simp]:
"[consistent t; a ∈ alphabet t; freq t a = w_a + w_b] ==>
weight (splitLeaf t w_a a w_b b) = weight t"
by (induct t a rule: tree_induct_consistent) simp+

lemma cost_splitLeaf[simp]:
"[consistent t; a ∈ alphabet t; freq t a = w_a + w_b] ==>
cost (splitLeaf t w_a a w_b b) = cost t + w_a + w_b"
by (induct t a rule: tree_induct_consistent) simp+

subsection ‹Weight Sort Order›

text ‹
invariant of Huffman's algorithm is that the forest is sorted by weight.
is expressed by the ‹sortedByWeight› function.
›

fun sortedByWeight :: "'a forest → bool" where
"sortedByWeight [] = True" |
"sortedByWeight [t] = True" |
"sortedByWeight (t₁ # t₂ # ts) =
(weight t₁ ≤ weight t₂ ∧ sortedByWeight (t₂ # ts))"

text ‹
function obeys the following fairly obvious laws.
›

lemma sortedByWeight_Cons_imp_sortedByWeight:
"sortedByWeight (t # ts) ==> sortedByWeight ts"
by (case_tac ts) simp+

lemma sortedByWeight_Cons_imp_forall_weight_ge:
"sortedByWeight (t # ts) ==> ∀u ∈ set ts. weight u ≥ weight t"
by (induct ts arbitrary: t) force+

lemma sortedByWeight_insortTree:
"[sortedByWeight ts; height t = 0; height_F ts = 0] ==>
sortedByWeight (insortTree t ts)"
by (induct ts rule: sortedByWeight.induct) auto

subsection ‹Pair of Minimal Symbols›

text ‹
‹minima› predicate expresses that two symbols
a$, $b \in @{term "alphabet t"}$ have the lowest frequencies in the tree $t$.
symbols need not be uniquely defined.
›

definition minima :: "'a tree → 'a → 'a → bool" where
"minima t a b =
(a ∈ alphabet t ∧ b ∈ alphabet t ∧ a ≠ b
  ∧ (∀c ∈ alphabet t. c ≠ a ⟶ c ≠ b ⟶
    freq t c ≥ freq t a ∧ freq t c ≥ freq t b))"

section ‹Formalization of the Textbook Proof \label{formalization}›

subsection ‹Four-Way Symbol Interchange Cost Lemma›

text ‹
$a$ and $b$ are minima, and $c$ and $d$ are at the very bottom of the tree,
exchanging $a$ and $b$ with $c$ and $d$ does not increase the cost.
, we have\strut

${\it cost\/}
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-minima-abcd.pdf}}}
\,\mathop{\le}\;\;\;
{\it cost\/}
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-minima.pdf}}}$$

noindent
cost property is part of Knuth's proof:

begin{quote}
$V$ be an internal node of maximum distance from the root. If $w_1$ and
w_2$ are not the weights already attached to the children of $V$, we can
them with the values that are already there; such an interchange
not increase the weighted path length.
end{quote}

noindent
~16.2 in Cormen et al.~\cite[p.~389]{cormen-et-al-2001} expresses a
property, which turns out to be a corollary of our cost property:

begin{quote}
$C$ be an alphabet in which each character $c \in C$ has frequency $f[c]$.
$x$ and $y$ be two characters in $C$ having the lowest frequencies. Then
exists an optimal prefix code for $C$ in which the codewords for $x$ and
y$ have the same length and differ only in the last bit.
end{quote}

vskip-.75\myskipamount
›

lemma cost_swapFourSyms_le:
assumes
  "consistent t" "minima t a b" "c ∈ alphabet t" "d ∈ alphabet t"
  "depth t c = height t" "depth t d = height t" "c ≠ d"
shows "cost (swapFourSyms t a b c d) ≤ cost t"
proof -
  note lems = swapFourSyms_def minima_def cost_swapSyms_le depth_le_height
  show ?thesis
  proof (cases "a ≠ d ∧ b ≠ c")
    case True show ?thesis
    proof cases
      assume "a = c" thus ?thesis
      using assms by (metis Orderings.order_eq_iff True cost_swapSyms_le depth_le_height minima_def
                            swapFourSyms_def swapSyms_id)
    next
      assume "a ≠ c" show ?thesis
      proof cases
        assume "b = d" thus ?thesis using ‹a ≠ c› True assms
          by (simp add: lems)
      next
        assume "b ≠ d"
        have "cost (swapFourSyms t a b c d) ≤ cost (swapSyms t a c)"
          using ‹b ≠ d› ‹a ≠ c› True assms by (clarsimp simp: lems)
        also have "… ≤ cost t" using ‹b ≠ d› ‹a ≠ c› True assms
          by (clarsimp simp: lems)
        finally show ?thesis .
      qed
    qed
  next
    case False thus ?thesis using assms by (clarsimp simp: lems)
  qed
qed

subsection ‹Leaf Split Optimality Lemma \label{leaf-split-optimality}›

text ‹
tree @{term "splitLeaf t w_a a w_b b"} is optimum if $t$ is optimum, under a
assumptions, notably that $a$ and $b$ are minima of the new tree and
@{prop "freq t a = w_a + w_b"}.
:\strut

setbox\myboxi=\hbox{\includegraphics[scale=1.2]{tree-splitLeaf-a.pdf}}%
setbox\myboxii=\hbox{\includegraphics[scale=1.2]{tree-splitLeaf-ab.pdf}}%
${\it optimum\/} \smash{\lower\ht\myboxi\hbox{\raise.5\ht\myboxii\box\myboxi}}
\,\mathop{\Longrightarrow}\;\;\;
{\it optimum\/} \vcenter{\box\myboxii}$$

corresponds to the following fragment of Knuth's proof:

begin{quote}
it is easy to prove that the weighted path length of such a tree is
if and only if the tree with
$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}
\qquad \hbox{replaced by} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-plus-w2.pdf}}}$$
minimum path length for the weights $w_1 + w_2$, $w_3$, $\ldots\,$, $w_m$.
end{quote}

noindent
only need the ``if'' direction of Knuth's equivalence. Lemma~16.3 in
et al.~\cite[p.~391]{cormen-et-al-2001} expresses essentially the same
:

begin{quote}
$C$ be a given alphabet with frequency $f[c]$ defined for each character
c \in C$. Let $x$ and $y$ be two characters in $C$ with minimum frequency. Let
C'$ be the alphabet $C$ with characters $x$, $y$ removed and (new) character
z$ added, so that $C' = C - \{x, y\} \cup {\{z\}}$; define $f$ for $C'$ as for
C$, except that $f[z] = f[x] + f[y]$. Let $T'$ be any tree representing an
prefix code for the alphabet $C'$. Then the tree $T$, obtained from
T'$ by replacing the leaf node for $z$ with an internal node having $x$ and
y$ as children, represents an optimal prefix code for the alphabet $C$.
end{quote}

noindent
proof is as follows: We assume that $t$ has a cost less than or equal to
of any other comparable tree~$v$ and show that
{term "splitLeaf t w_a a w_b b"} has a cost less than or equal to that of any
comparable tree $u$. By @{thm [source] exists_at_height} and
{thm [source] depth_height_imp_sibling_ne}, we know that some symbols $c$ and
d$ appear in sibling nodes at the very bottom of~$u$:
$\includegraphics[scale=1.25]{tree-splitLeaf-cd.pdf}$$
The question mark is there to remind us that we know nothing specific about
u$'s structure.) From $u$ we construct a new tree
{term "swapFourSyms u a b c d"} in which the minima $a$ and $b$ are siblings:
$\includegraphics[scale=1.25]{tree-splitLeaf-abcd.pdf}$$
$a$ and $b$ gives a tree comparable with $t$, which we can use to
$v$ in the assumption:
$\includegraphics[scale=1.25]{tree-splitLeaf-abcd-aba.pdf}$$
this instantiation, the proof is easy:
$\begin{tabularx}{\textwidth}{@%
\hskip\leftmargin}cX@%
}}
& @{term "cost (splitLeaf t a w_a b w_b)"} \\
eq & \justif{@{thm [source] cost_splitLeaf}} \\
& @{term "cost t + w_a + w_b"} \\
kern-1em$\leq$\kern-1em & \justif{assumption} \\[-2ex]
& $‹cost (›
\overbrace{\strut\!@{term "mergeSibling (swapFourSyms u a b c d) a"}\!}
^{\smash{\hbox{$v$}}}‹) + w_a + w_b›$ \\[\extrah]
eq & \justif{@{thm [source] cost_mergeSibling}} \\
& @{term "cost (swapFourSyms u a b c d)"} \\
kern-1em$\leq$\kern-1em & \justif{@{thm [source] cost_swapFourSyms_le}} \\
& @{term "cost u"}. \\
end{tabularx}$$

noindent
contrast, the proof in Cormen et al.\ is by contradiction: Essentially, they
that there exists a tree $u$ with a lower cost than
{term "splitLeaf t a w_a b w_b"} and show that there exists a tree~$v$
a lower cost than~$t$, contradicting the hypothesis that $t$ is optimum. In
of @{thm [source] cost_swapFourSyms_le}, they invoke their lemma~16.2,
is questionable since $u$ is not necessarily optimum.%
footnote{Thomas Cormen commented that this step will be clarified in the
edition of the book.}

proof relies on the following lemma, which asserts that $a$ and $b$ are
of $u$.
›

lemma twice_freq_le_imp_minima:
"[∀c ∈ alphabet t. w_a ≤ freq t c ∧ w_b ≤ freq t c;
  alphabet u = alphabet t ∪ {b}; a ∈ alphabet u; a ≠ b;
  freq u = (λc. if c = a then w_a else if c = b then w_b else freq t c)] ==>
minima u a b"
by (simp add: minima_def)

text ‹
comes the key lemma.
›

lemma optimum_splitLeaf:
assumes "consistent t" "optimum t" "a ∈ alphabet t" "b ∉ alphabet t"
        "freq t a = w_a + w_b" "∀c ∈ alphabet t. freq t c ≥ w_a ∧ freq t c ≥ w_b"
shows "optimum (splitLeaf t w_a a w_b b)"
proof (unfold optimum_def, clarify)
  fix u
  let ?t' = "splitLeaf t w_a a w_b b"
  assume c_u: "consistent u" and a_u: "alphabet ?t' = alphabet u" and
    f_u: "freq ?t' = freq u"
  show "cost ?t' ≤ cost u"
  proof (cases "height ?t' = 0")
    case True thus ?thesis by simp
  next
    case False
    hence h_u: "height u > 0" using a_u assms
      by (auto intro: height_gt_0_alphabet_eq_imp_height_gt_0)
    have a_a: "a ∈ alphabet u" using a_u assms by fastforce
    have a_b: "b ∈ alphabet u" using a_u assms by fastforce
    have ab: "a ≠ b" using assms by blast

    from exists_at_height[OF c_u]
    obtain c where a_c: "c ∈ alphabet u" and d_c: "depth u c = height u" ..
    let ?d = "sibling u c"

    have dc: "?d ≠ c" using d_c c_u h_u a_c by (metis depth_height_imp_sibling_ne)
    have a_d: "?d ∈ alphabet u" using dc
      by (rule sibling_ne_imp_sibling_in_alphabet)
    have d_d: "depth u ?d = height u" using d_c c_u by simp

    let ?u' = "swapFourSyms u a b c ?d"
    have c_u_': "consistent ?u'" using c_u by simp
    have a_u_': "alphabet ?u' = alphabet u" using a_a a_b a_c a_d a_u by simp
    have f_u_': "freq ?u' = freq u" using a_a a_b a_c a_d c_u f_u by simp
    have s_a: "sibling ?u' a = b" using c_u a_a a_b a_c ab dc
      by (rule sibling_swapFourSyms_when_4th_is_sibling)

    let ?v = "mergeSibling ?u' a"
    have c_v: "consistent ?v" using c_u_' by simp
    have a_v: "alphabet ?v = alphabet t" using s_a c_u_' a_u_' a_a a_u assms by auto
    have f_v: "freq ?v = freq t"
      using s_a c_u_' a_u_' f_u_' f_u[THEN sym] ab a_u[THEN sym] assms
      by (simp add: freq_mergeSibling ext)

    have "cost ?t' = cost t + w_a + w_b" using assms by simp
    also have "… ≤ cost ?v + w_a + w_b" using c_v a_v f_v ‹optimum t›
      by (simp add: optimum_def)
    also have "… = cost ?u'"
      proof -
        have "cost ?v + freq ?u' a + freq ?u' (sibling ?u' a) = cost ?u'"
          using c_u_' s_a assms by (subst cost_mergeSibling) auto
        moreover have "w_a = freq ?u' a" "w_b = freq ?u' b"
          using f_u_' f_u[THEN sym] assms by clarsimp+
        ultimately show ?thesis using s_a by simp
      qed
    also have "… ≤ cost u"
      proof -
        have "minima u a b" using a_u f_u assms
          by (subst twice_freq_le_imp_minima) auto
        with c_u show ?thesis using a_c a_d d_c d_d dc[THEN not_sym]
          by (rule cost_swapFourSyms_le)
      qed
    finally show ?thesis .
  qed
qed

subsection ‹Leaf Split Commutativity Lemma \label{leaf-split-commutativity}›

text ‹
key property of Huffman's algorithm is that once it has combined two
-weight trees using @{const uniteTrees}, it does not visit these trees
again. This suggests that splitting a leaf node before applying the
should give the same result as applying the algorithm first and
the leaf node afterward. The diagram below illustrates the
:\strut

def\myscale{1.05}%
setbox\myboxi=\hbox{(9)\strut}%
setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{forest-a.pdf}}%
$(1)\,\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}$$

smallskip

setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{tree-splitLeaf-a.pdf}}%
setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]%
{forest-splitLeaf-ab.pdf}}%
mydimeni=\wd\myboxii

noindent
2a)\,\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}%
\qquad\qquad\quad
(2b)\,\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}\quad{}

setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]%
{tree-splitLeaf-ab.pdf}}%
setbox\myboxiv=\hbox{\includegraphics[scale=\myscale]%
{tree-huffman-splitLeaf-ab.pdf}}%
mydimenii=\wd\myboxiii
vskip1.5\smallskipamount
noindent
3a)\,\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}%
\qquad\qquad\quad
(3b)\,\hfill\lower\ht\myboxiv\hbox{\raise\ht\myboxi\box\myboxiv}%
\quad\hfill{}

noindent
the original forest (1), we can either run the algorithm (2a) and then
$a$ (3a) or split $a$ (2b) and then run the algorithm (3b). Our goal is
show that trees (3a) and (3b) are identical. Formally, we prove that
$@{term "splitLeaf (huffman ts) w_a a w_b b"} =
@{term "huffman (splitLeaf_F ts w_a a w_b b)"}$$
@{term ts} is consistent, @{term "a ∈ alphabet_F ts"}, @{term
b ∉ alphabet_F ts"}, and $@{term "freq_F ts a"} = @{term w_a}
mathbin{‹+›} @{term "w_b"}$. But before we can prove this
lemma, we need to introduce a few simple lemmas.
›

lemma cachedWeight_splitLeaf[simp]:
"cachedWeight (splitLeaf t w_a a w_b b) = cachedWeight t"
by (case_tac t) simp+

lemma splitLeaf_{F_insortTree_when_in_alphabet_left}[simp]:
"[a ∈ alphabet t; consistent t; a ∉ alphabet_F ts; freq t a = w_a + w_b] ==>
splitLeaf_F (insortTree t ts) w_a a w_b b = insortTree (splitLeaf t w_a a w_b b) ts"
by (induct ts) simp+

lemma splitLeaf_{F_insortTree_when_in_alphabet}_{F_tail}[simp]:
"[a ∈ alphabet_F ts; consistent_F ts; a ∉ alphabet t; freq_F ts a = w_a + w_b] ==>
splitLeaf_F (insortTree t ts) w_a a w_b b =
insortTree t (splitLeaf_F ts w_a a w_b b)"
proof (induct ts)
  case Nil thus ?case by simp
next
  case (Cons u us) show ?case
  proof (cases "a ∈ alphabet u")
    case True
    hence "a ∉ alphabet_F us" using Cons by auto
    thus ?thesis using Cons by auto
  next
    case False thus ?thesis using Cons by simp
  qed
qed

text ‹
are now ready to prove the commutativity lemma.
›

lemma splitLeaf_huffman_commute:
"[consistent_F ts; a ∈ alphabet_F ts; freq_F ts a = w_a + w_b] ==>
splitLeaf (huffman ts) w_a a w_b b = huffman (splitLeaf_F ts w_a a w_b b)"
proof (induct ts rule: huffman.induct)
  case (1 t) thus ?case by simp
next
  case (2 t₁ t₂ ts)
  note hyps = 2
  show ?case
  proof (cases "a ∈ alphabet t₁")
    case True
    hence "a ∉ alphabet t₂" "a ∉ alphabet_F ts" using hyps by auto
    thus ?thesis using hyps by (simp add: uniteTrees_def)
  next
    case False
    note a₁ = False
    show ?thesis
    proof (cases "a ∈ alphabet t₂")
      case True
      hence "a ∉ alphabet_F ts" using hyps by auto
      thus ?thesis using a₁ hyps by (simp add: uniteTrees_def)
    next
      case False
      thus ?thesis using a₁ hyps by simp
    qed
  qed
next
  case 3 thus ?case by simp
qed

text ‹
important consequence of the commutativity lemma is that applying Huffman's
on a forest of the form
$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees.pdf}}}$$
the same result as applying the algorithm on the ``flat'' forest
$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}$$
by splitting the leaf node $a$ into two nodes $a$, $b$ with
$@{term w_a}$, $@{term w_b}$. The lemma effectively
a way to flatten the forest at each step of the algorithm. Its
is implicit in the textbook proof.
›

subsection ‹Optimality Theorem›

text ‹
are one lemma away from our main result.
›

theorem optimum_huffman:
"[consistent_F ts; height_F ts = 0; sortedByWeight ts; ts ≠ []] ==>
optimum (huffman ts)"

txt ‹
input @{term ts} is assumed to be a nonempty consistent list of leaf nodes
by weight. The proof is by induction on the length of the forest
{term ts}. Let @{term ts} be
$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-flat.pdf}}}$$
$w_a \le w_b \le w_c \le w_d \le \cdots \le w_z$. If @{term ts} consists
a single leaf node, the node has cost 0 and is therefore optimum. If
{term ts} has length 2 or more, the first step of the algorithm leaves us with
term
${\it huffman\/}\enskip\; \vcenter{\hbox{\includegraphics[scale=1.25]%
{forest-uniteTrees.pdf}}}$$
the diagram, we put the newly created tree at position 2 in the forest; in
, it could be anywhere. By @{thm [source] splitLeaf_huffman_commute},
above tree equals\strut
${\it splitLeaf\/}\;\left({\it huffman\/}\enskip\;
\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}
\;\right)\,‹w_a a w_b b›.$$
prove that this tree is optimum, it suffices by
{thm [source] optimum_splitLeaf} to show that\strut
${\it huffman\/}\enskip\;
\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}$$
optimum, which follows from the induction hypothesis.\strut
›

proof (induct ts rule: length_induct)
  ― ‹\sc Complete induction step›
  case (1 ts)
  note hyps = 1
  show ?case
  proof (cases ts)
    case Nil thus ?thesis using ‹ts ≠ []› by fast
  next
    case (Cons t_a ts')
    note ts = Cons
    show ?thesis
    proof (cases ts')
      case Nil thus ?thesis using ts hyps by (simp add: optimum_def)
    next
      case (Cons t_b ts'')
      note ts' = Cons
      show ?thesis
      proof (cases t_a)
        case (Leaf w_a a)
        note l_a = Leaf
        show ?thesis
        proof (cases t_b)
          case (Leaf w_b b)
          note l_b = Leaf
          show ?thesis
          proof -
            let ?us = "insortTree (uniteTrees t_a t_b) ts''"
            let ?us' = "insortTree (Leaf (w_a + w_b) a) ts''"
            let ?t_s = "splitLeaf (huffman ?us') w_a a w_b b"

            have e₁: "huffman ts = huffman ?us" using ts' ts by simp
            have e₂: "huffman ?us = ?t_s" using l_a l_b ts' ts hyps
              by (auto simp: splitLeaf_huffman_commute uniteTrees_def)

            have "optimum (huffman ?us')" using l_a ts' ts hyps
              by (drule_tac x = ?us' in spec)
                 (auto dest: sortedByWeight_Cons_imp_sortedByWeight
                  simp: sortedByWeight_insortTree)
            hence "optimum ?t_s" using l_a l_b ts' ts hyps
              apply simp
              apply (rule optimum_splitLeaf)
              by (auto dest!: height_{F_0_imp_Leaf_freq}_{F_in_set}
                  sortedByWeight_Cons_imp_forall_weight_ge)
            thus "optimum (huffman ts)" using e₁ e₂ by simp
          qed
        next
          case Node thus ?thesis using ts' ts hyps by simp
        qed
      next
        case Node thus ?thesis using ts' ts hyps by simp
      qed
    qed
  qed
qed

text ‹
isakeyword{end}

myskip

noindent
what have we achieved? Assuming that our definitions really mean what we
them to mean, we established that our functional implementation of
's algorithm, when invoked properly, constructs a binary tree that
an optimal prefix code for the specified alphabet and frequencies.
Isabelle's code generator \cite{haftmann-nipkow-2007}, we can convert the
code into Standard ML, OCaml, or Haskell and use it in a real
.

a side note, the @{thm [source] optimum_huffman} theorem assumes that the
@{term ts} passed to @{const huffman} consists exclusively of leaf nodes.
is tempting to relax this restriction, by requiring instead that the forest
{term ts} has the lowest cost among forests of the same size. We would define
of a forest as follows:
$\begin{aligned}[t]
@{prop "optimum_F ts"}\,\;‹=›\;\,
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because "macro" is null
& ‹length ts = length us ⟶ consistent_F us ⟶› \\[-2.5pt]
& ‹alphabet_F ts = alphabet_F us ⟶ freq_F ts = freq_F us ⟶›
\[-2.5pt]
& @{prop "cost_F ts ≤ cost_F us"})\end{aligned}$$
$‹cost_F [] = 0›$ and
@{prop "cost_F (t # ts) = cost t + cost_F ts"}$. However, the modified
does not hold. A counterexample is the optimum forest
$\includegraphics{forest-optimal.pdf}$$
which the algorithm constructs the tree
$\vcenter{\hbox{\includegraphics{tree-suboptimal.pdf}}}
\qquad \hbox{of greater cost than} \qquad
\vcenter{\hbox{\includegraphics{tree-optimal.pdf}}}$$
›

section ‹Related Work \label{related-work}›

text ‹
Th\'ery's Coq formalization of Huffman's algorithm \cite{thery-2003,
-2004} is an obvious yardstick for our work. It has a somewhat wider
, proving among others the isomorphism between prefix codes and full binary
. With 291 theorems, it is also much larger.

\'ery identified the following difficulties in formalizing the textbook
:

begin{enumerate}
item The leaf interchange process that brings the two minimal symbols together
is tedious to formalize.

item The sibling merging process requires introducing a new symbol for the
merged node, which complicates the formalization.

item The algorithm constructs the tree in a bottom-up fashion. While top-down
procedures can usually be proved by structural induction, bottom-up
procedures often require more sophisticated induction principles and
larger invariants.

item The informal proof relies on the notion of depth of a node. Defining this
notion formally is problematic, because the depth can only be seen as a
function if the tree is composed of distinct nodes.
end{enumerate}

circumvent these difficulties, Th\'ery introduced the ingenious concept of
. A forest @{term ts} is a {\em cover\/} of a tree~$t$ if $t$ can be built
@{term ts} by adding inner nodes on top of the trees in @{term ts}. The
``cover'' is easier to understand if the binary trees are drawn with the
at the bottom of the page, like natural trees. Huffman's algorithm is
refinement of the cover concept. The main proof consists in showing that
cost of @{term "huffman ts"} is less than or equal to that of any other
for which @{term ts} is a cover. It relies on a few auxiliary definitions,
an ``ordered cover'' concept that facilitates structural induction
a four-argument depth predicate (confusingly called @{term height}).
also play a central role.

, our experience suggests that the potential problems identified
Th\'ery can be overcome more directly without too much work, leading to a
proof:

begin{enumerate}
item Formalizing the leaf interchange did not prove overly tedious. Among our
95~lemmas and theorems, 24 concern @{const swapLeaves},
@{const swapSyms}, and @{const swapFourSyms}.

item The generation of a new symbol for the resulting node when merging two
sibling nodes in @{const mergeSibling} was trivially solved by reusing
one of the two merged symbols.

item The bottom-up nature of the tree construction process was addressed by
using the length of the forest as the induction measure and by merging
the two minimal symbols, as in Knuth's proof.

item By restricting our attention to consistent trees, we were able to define
the @{const depth} function simply and meaningfully.
end{enumerate}
›

section ‹Conclusion \label{conclusion}›

text ‹
goal of most formal proofs is to increase our confidence in a result. In
case of Huffman's algorithm, however, the chances that a bug would have
unnoticed for the 56 years since its publication, under the scrutiny of
computer scientists, seem extremely low; and the existence of a Coq
should be sufficient to remove any remaining doubts.

main contribution of this document has been to demonstrate that the
proof of Huffman's algorithm can be elegantly formalized using
state-of-the-art theorem prover such as Isabelle/HOL. In the process, we
a few minor snags in the proof given in Cormen et
.~\cite{cormen-et-al-2001}.

also found that custom induction rules, in combination with suitable
rules, greatly help the automatic proof tactics, sometimes
30-line proof scripts to one-liners. We successfully applied this
for handling both the ubiquitous ``datatype + well\-formed\-ness
'' combination (@{typ "'a tree"} + @{const consistent}) and functions
by sequential pattern matching (@{const sibling} and
{const mergeSibling}). Our experience suggests that such rules, which are
in formalizations, are highly valuable and versatile. Moreover,
's \textit{induction\_schema} and \textit{lexicographic\_order} tactics
these easy to prove.

the proof of Huffman's algorithm also led to a deeper
of this classic algorithm. Many of the lemmas, notably the leaf
commutativity lemma of Section~\ref{leaf-split-commutativity}, have not
found in the literature and express fundamental properties of the
. Other discoveries did not find their way into the final proof. In
, each step of the algorithm appears to preserve the invariant that
nodes in a forest are ordered by weight from left to right, bottom to top,
in the example below:\strut
$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-zigzag.pdf}}}$$
is not hard to prove formally that a tree exhibiting this property is
. On the other hand, proving that the algorithm preserves this invariant
difficult---more difficult than formalizing the textbook proof---and
a suggestion for future work.

few other directions for future work suggest themselves. First, we could
some of our hypotheses, notably our restriction to full and
binary trees. The current formalization says nothing about the
's application for data compression, so the next step could be to
the proof's scope to cover @{term encode}/@{term decode} functions
show that full binary trees are isomorphic to prefix codes, as done in the
development. Independently, we could generalize the development to $n$-ary
.
›

(*<*)
end
(*>*)

Messung V0.5 in Prozent

¤ Dauer der Verarbeitung: 0.120 Sekunden (vorverarbeitet am 2026-07-03) ¤

Wurzel

Suchen

PVS Prover

Isabelle Prover

NIST Cobol Testsuite

Cephes Mathematical Library

Vienna Development Method

Haftungshinweis

Die Informationen auf dieser Webseite wurden nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit, noch Qualität der bereit gestellten Informationen zugesichert.

Bemerkung:

Die farbliche Syntaxdarstellung und die Messung sind noch experimentell.