begin with the specification of string matching given by 🍋‹‹Chapter~16› in
Bird:PearlsofFAD:2010"›. (References to ``Bird'' in the following are to this text.) Note that
assume @{const ‹eq›} has some nice properties (see \S\ref{sec:equality}) and
strict lists.
's strategy for deriving KMP from this specification is encoded in the following lemmas:
we can rewrite @{const ‹endswith›} as a composition of a predicate with a
{const ‹sfoldl›}, then we can rewrite @{const ‹matches›} into a @{const ‹sscanl›}.
›
lemma fork_sfoldl: shows"sfoldl⋅f1⋅z1 && sfoldl⋅f2⋅z2 = sfoldl⋅(Λ (a, b) z. (f1⋅a⋅z, f2⋅b⋅z))⋅(z1, z2)" (is"?lhs = ?rhs") proof(rule cfun_eqI) fix xs show"?lhs⋅xs = ?rhs⋅xs" by (induct xs arbitrary: z1 z2) simp_all qed
proceeds by reworking @{const ‹endswith›} into the form required by @{thm [source] "Bird_strategy"}.
is eased by an alternative definition of @{const ‹endswith›}.
›
lemma sfilter_supto: assumes"0 ≤ d" shows"sfilter⋅(Λ x. le⋅(MkI⋅n - x)⋅(MkI⋅d))⋅(supto⋅(MkI⋅m)⋅(MkI⋅n)) = supto⋅(MkI⋅(if m ≤ n - d then n - d else m))⋅(MkI⋅n)" (is"?sfilterp⋅?suptomn = _") proof(cases "m ≤ n - d") case True moreover from True assms have"?sfilterp⋅?suptomn = ?sfilterp⋅(supto⋅(MkI⋅m)⋅(MkI⋅(n - d - 1)) :@ supto⋅(MkI⋅(n - d))⋅(MkI⋅n))" using supto_split1 by auto moreoverfrom True assms have"?sfilterp⋅(supto⋅(MkI⋅m)⋅(MkI⋅(n - d - 1))) = [::]"by auto ultimatelyshow ?thesis by (clarsimp intro!: trans[OF sfilter_cong[OF refl] sfilter_const_TT]) next case False thenshow ?thesis by (clarsimp intro!: trans[OF sfilter_cong[OF refl] sfilter_const_TT]) qed
lemma endswith_eq_sdrop: "endswith⋅pat⋅xs = eq⋅pat⋅(sdrop⋅(slength⋅xs - slength⋅pat)⋅xs)" proof(cases "pat = ⊥""xs = ⊥" rule: bool.exhaust[case_product bool.exhaust]) case False_False thenshow ?thesis by (cases "endswith⋅pat⋅xs";
simp only: endswith.simps cfcomp2 stails_sdrop';
force simp: zero_Integer_def one_Integer_def elim: slengthE) qed simp_all
lemma endswith_def2: ―‹ Bird p127 › shows"endswith⋅pat⋅xs = eq⋅pat⋅(shead⋅(sfilter⋅(Λ x. prefix⋅x⋅pat)⋅(stails⋅xs)))"(is"?lhs = ?rhs") proof(cases "pat = ⊥""xs = ⊥" rule: bool.exhaust[case_product bool.exhaust]) case False_False from False_False obtain patl xsl where *: "slength⋅xs = MkI⋅xsl""0 ≤ xsl""slength⋅pat = MkI⋅patl""0 ≤ patl" by (meson Integer.exhaust slength_bottom_iff slength_ge_0) let ?patl_xsl = "if patl ≤ xsl then xsl - patl else 0" have"?rhs = eq⋅pat⋅(shead⋅(sfilter⋅(Λ x. le⋅(slength⋅x)⋅(slength⋅pat) andalso prefix⋅x⋅pat)⋅(stails⋅xs)))" by (subst prefix_slength_strengthen) simp alsohave"… = eq⋅pat⋅(shead⋅(sfilter⋅(Λ x. prefix⋅x⋅pat)⋅(sfilter⋅(Λ x. le⋅(slength⋅x)⋅(slength⋅pat))⋅(stails⋅xs))))" by (simp add: sfilter_sfilter') alsohave"… = eq⋅pat⋅(shead⋅(smap⋅(Λ k. sdrop⋅k⋅xs)⋅(sfilter⋅(Λ k. prefix⋅(sdrop⋅k⋅xs)⋅pat)⋅(sfilter⋅(Λ k. le⋅(slength⋅(sdrop⋅k⋅xs))⋅(MkI⋅patl))⋅(supto⋅(MkI⋅0)⋅(MkI⋅xsl))))))" using‹slength⋅xs = MkI⋅xsl›‹slength⋅pat = MkI⋅patl› by (simp add: stails_sdrop' sfilter_smap' cfcomp1 zero_Integer_def) alsohave"… = eq⋅pat⋅(shead⋅(smap⋅(Λ k. sdrop⋅k⋅xs)⋅(sfilter⋅(Λ k. prefix⋅(sdrop⋅k⋅xs)⋅pat)⋅(sfilter⋅(Λ k. le⋅(MkI⋅xsl - k)⋅(MkI⋅patl))⋅(supto⋅(MkI⋅0)⋅(MkI⋅xsl))))))" using‹slength⋅xs = MkI⋅xsl› by (subst (2) sfilter_cong[where p'="Λ x. le⋅(MkI⋅xsl - x)⋅(MkI⋅patl)"]; fastforce simp: zero_Integer_def) alsohave"… = If prefix⋅(sdrop⋅(MkI⋅?patl_xsl)⋅xs)⋅pat then eq⋅pat⋅(sdrop⋅(MkI⋅?patl_xsl)⋅xs) else eq⋅pat⋅(shead⋅(smap⋅(Λ k. sdrop⋅k⋅xs)⋅(sfilter⋅(Λ x. prefix⋅(sdrop⋅x⋅xs)⋅pat)⋅(supto⋅(MkI⋅(?patl_xsl + 1))⋅(MkI⋅xsl)))))" using False_False ‹0 ≤ xsl›‹0 ≤ patl› by (subst sfilter_supto) (auto simp: If_distr one_Integer_def) alsohave"… = ?lhs" (is"If ?c then _ else ?else = _") proof(cases ?c) case TT with‹slength⋅xs = MkI⋅xsl›‹slength⋅pat = MkI⋅patl› show ?thesis by (simp add: endswith_eq_sdrop sdrop_neg zero_Integer_def) next case FF ―‹ Recursive case: the lists generated by ‹supto› are too short › have"?else = shead⋅(smap⋅(Λ x. eq⋅pat⋅(sdrop⋅x⋅xs))⋅(sfilter⋅(Λ x. prefix⋅(sdrop⋅x⋅xs)⋅pat)⋅(supto⋅(MkI⋅(?patl_xsl + 1))⋅(MkI⋅xsl))))" using FF by (subst shead_smap_distr[where f="eq⋅pat", symmetric]) (auto simp: cfcomp1) alsohave"… = shead⋅(smap⋅(Λ x. seq⋅x⋅FF)⋅(sfilter⋅(Λ x. prefix⋅(sdrop⋅x⋅xs)⋅pat)⋅(supto⋅(MkI⋅(?patl_xsl + 1))⋅(MkI⋅xsl))))" using False_False * by (subst smap_cong[OF refl, where f'="Λ x. seq⋅x⋅FF"]) (auto simp: zero_Integer_def split: if_splits) alsofrom * FF have"… = ?lhs" apply (auto 00 simp: shead_smap_distr seq_conv_if endswith_eq_sdrop zero_Integer_def dest!: prefix_FF_not_snilD) apply (metis (full_types) le_MkI_MkI linorder_not_less order_refl prefix_FF_not_snilD sdrop_all zless_imp_add1_zle) using FF apply fastforce apply (metis add.left_neutral le_MkI_MkI linorder_not_less order_refl prefix_FF_not_snilD sdrop_0(1) sdrop_all zero_Integer_def zless_imp_add1_zle) done finallyshow ?thesis using FF by simp qed (simp add: False_False) finallyshow ?thesis .. qed simp_all
text‹
then generalizes @{term ‹sfilter⋅(Λ x. prefix⋅x⋅pat) oo stails›} to @{term ‹split›},
``‹split⋅pat⋅xs› splits ‹pat› into two lists ‹us› and ‹vs› so that
{prop ‹us :@ vs = pat›} and ‹us› is the longest suffix of ‹xs› that is a prefix of ‹pat›.''
demonstrates that @{const ‹op›} is partially correct wrt @{const ‹split›}, i.e.,
{prop "op⋅pat⋅(split⋅pat⋅xs)⋅x ⊑ split⋅pat⋅(xs :@ [:x:])"}. For total correctness we
prove that @{const ‹op›} terminates on well-defined arguments with an inductive argument.
@{thm [source] "split_sfoldl_op"} we can rewrite @{const ‹op›} into a more perspicuous form
exhibits how KMP handles the failure of the text to continue matching the pattern:
›
fixrec
op' :: "[:'a::Eq_def:] → [:'a:] × [:'a:] → 'a → [:'a:] × [:'a:]" where
[simp del]: "op'⋅pat⋅(us, vs)⋅x = ( If prefix⋅[:x:]⋅vs then (us :@ [:x:], stail⋅vs) ― ‹ continue matching › else If snull⋅us then ([::], pat) ― ‹ fail at the start of the pattern: discard ‹x›› else sfoldl⋅(op'⋅pat)⋅([::], pat)⋅(stail⋅us :@ [:x:]) ― ‹ fail later: discard ‹shead⋅us› and determine where to restart › )"
text‹
if ‹x› continues the pattern match then we
the @{const ‹split›} of ‹pat›
in ‹us› and ‹vs›. Otherwise we
to find a prefix of ‹pat› to continue matching
. If we have yet to make any progress (i.e., ‹us =
::]›) we restart with the entire ‹pat› (aka ‹z›) and discard ‹x›. Otherwise, because a
cannot begin with @{term ‹us :@ [:x:]›}, we @{const ‹split›} ‹pat› (aka ‹z›) by
@{const ‹op'›} over @{term ‹stail⋅us :@ [:x:]›}. The remainder of the
is about memoising this last computation.
is not yet the full KMP algorithm as it lacks what we call the
K' optimisation, which we add in \S\ref{sec:KMP:data_refinement}.
that a termination proof for @{const "op'"} in HOL is tricky due
its use of higher-order nested recursion via @{const ‹sfoldl›}.
lemma sfoldl_op'_strict[simp]: "op'⋅pat⋅(sfoldl⋅(op'⋅pat)⋅(us, ⊥)⋅xs)⋅x = ⊥" by (induct xs arbitrary: x rule: srev_induct) simp_all
lemma op'_op: shows"op'⋅pat⋅usvs⋅x = op⋅pat⋅usvs⋅x" proof(cases "pat = ⊥""x = ⊥" rule: bool.exhaust[case_product bool.exhaust]) case True_False thenshow ?thesis apply (subst op'.unfold) apply (subst op.unfold) apply simp done next case False_False thenshow ?thesis proof(induct usvs arbitrary: x rule: op_induct) case (step usvs x) have *: "sfoldl⋅(op'⋅pat)⋅([::], pat)⋅xs = sfoldl⋅(op⋅pat)⋅([::], pat)⋅xs" if"lt⋅(slength⋅xs)⋅(slength⋅(cfst⋅usvs)) = TT"for xs using that proof(induct xs rule: srev_induct) case (ssnoc x' xs') from ssnoc(1,2,4) have"lt⋅(slength⋅xs')⋅(slength⋅(cfst⋅usvs)) = TT" using lt_slength_0(2) lt_trans by auto moreover from step(2) ssnoc(1,2,4) have"lt⋅(slength⋅(cfst⋅(split⋅pat⋅xs')))⋅(slength⋅(cfst⋅usvs)) = TT" using lt_trans split_length_lt by (auto 100) ultimatelyshow ?caseby (simp add: ssnoc.hyps split_sfoldl_op split_snoc_op step) qed simp_all from step.prems show ?case apply (subst op'.unfold) apply (subst op.unfold) apply (clarsimp simp: If2_def[symmetric] snull_FF_conv split_sfoldl_op[symmetric] * split: If2_splits) apply (clarsimp simp: split_sfoldl_op step split_length_lt) done qed qed simp_all
subsection‹ Step 2: Data refinement and the `K' optimisation \label{sec:KMP:data_refinement} ›
text‹
memoises the restart computation in @{const ‹op'›} in two steps. The first reifies
control structure of @{const ‹op'›} into a non-wellfounded tree, which we discuss here. The
increases the sharing in this tree; see \S\ref{sec:KMP:increase_sharing}.
, we cache the @{term ‹sfoldl⋅(op'⋅pat)⋅([::], pat)⋅(stail⋅us :@ [:x:])›}
in @{const ‹op'›} by finding a ``representation'' type @{typ "'t"}
the ``abstract'' type @{typ ‹[:'a::Eq_def:] × [:'a:]›}, a
of functions @{term ‹rep :: [:'a::Eq_def:] × [:'a:] → 't›},
{term ‹abs :: 't → [:'a::Eq_def:] × [:'a:]›} where @{prop ‹abs oo rep = ID›}, and then
a derived form of @{const ‹op'›} that works on @{typ "'t"} rather
@{typ "[:'a::Eq_def:] × [:'a:]"}. We also take the opportunity to add the `K' optimisation in the form of the @{term ‹next›}
.
such steps are essentially @{emph ‹deus ex machina›}, we try to provide some intuition
showing the new definitions.
›
domain 'a tree ―‹ Bird p130 ›
= Null
| Node (label :: 'a) (lazy left :: "'a tree") (lazy right :: "'a tree") ―‹ Strict in the label @{typ "'a"} ›
(*<*)
lemma tree_injects'[simp]: ―‹ An unconditional form of @{thm [source] tree.injects}. › "(Node⋅a⋅l⋅r = Node⋅a'⋅l'⋅r') ⟷ (a = a' ∧ (a ≠⊥⟶ l = l' ∧ r = r'))" by (cases "a = ⊥"; clarsimp)
lemma match_Node_mplus_match_Node: "match_Node⋅x⋅k1 +++ match_Node⋅x⋅k2 = match_Node⋅x⋅(Λ v l r. k1⋅v⋅l⋅r +++ k2⋅v⋅l⋅r)" by (cases x; clarsimp)
lemma tree_case_distr: "f⋅⊥ = ⊥==> f⋅(tree_case⋅g⋅h⋅t) = tree_case⋅(f⋅g)⋅(Λ x l r. f⋅(h⋅x⋅l⋅r))⋅t" "(tree_case⋅g'⋅h'⋅t)⋅z = tree_case⋅(g'⋅z)⋅(Λ x l r. h'⋅x⋅l⋅r⋅z)⋅t" by (case_tac [!] t) simp_all
lemma tree_case_cong: assumes"t = t'" assumes"t' = Null ==> n = n'" assumes"∧v l r. [t' = Node⋅v⋅l⋅r; v ≠⊥]==> c v l r = c' v l r" assumes"cont (λ(x, y, z). c x y z)" assumes"cont (λ(x, y, z). c' x y z)" shows"tree_case⋅n⋅(Λ v l r. c v l r)⋅t = tree_case⋅n'⋅(Λ v l r. c' v l r)⋅t'" using assms by (cases t; cases t'; clarsimp simp: prod_cont_iff)
\path[->] (q_0) edge [bend right] (q_0i)
(q_1) edge [bend left] (q_0)
(q_2) edge [bend left] (q_0) % MP
(q_2) edge [bend left,color=red] (q_0i) % K opt
(q_3) edge [bend left] (q_1)
(q_4) edge [bend left] (q_1) % MP
(q_4) edge [bend left,color=red] (q_0) % K opt
(q_5) edge [bend left] (q_2); \end{tikzpicture} \caption{An example from 🍋‹‹\S2.1› in "CrochemoreRytter:2002"›. The MP tree for the
pattern $01001$ is drawn in black: right transitions are labelled
with a symbol, whereas left transitions are unlabelled. The two
`K'-optimised left transitions are shown in red. The boxes denote
@{const ‹Null›}. The root node is $q_0$.} \label{fig:example_tree}
end{figure}
tree can be interpreted as a sort of automaton\footnote{🍋‹‹\S3.1› in "Bird:2012"› suggests it can
thought of as a doubly-linked list, following 🍋‹"TakeichiAkama:1990"›.)}, where @{const ‹op2›} goes @{const ‹right›} if the pattern
with the next element of the text, and @{const ‹left›} otherwise, to determine how much of a prefix of
pattern could still be in play. Figure~\ref{fig:example_tree}
such an automaton for the pattern $01001$, used by 🍋‹‹\S2.1› in "CrochemoreRytter:2002"› to
the difference between Morris-Pratt (MP) and
-Morris-Pratt (KMP) preprocessing as we discuss below. Note that
are not the classical Mealy machines that correspond to regular
, where all outgoing transitions are labelled with symbols.
following lemma shows how our sample automaton is encoded as a non-wellfounded tree.
sharing that we expect from a lazy (call-by-need) evaluator is here implied by the use of
fixed points.
KMP preprocessor is expressed by the @{const ‹left2›} function, where @{const ‹op2›} is used
match the pattern against itself; the use of @{const ‹op2›} in @{const ‹matches2›} (``the driver'')
responsible for matching the (preprocessed) pattern against the text. This formally cashes in
observation by 🍋‹‹\S5› in "vanderWoude:1989"›, that these two algorithms
essentially the same, which has eluded other presentations\footnote{For instance, contrast
shared use of @{const ‹op2›} with the separated \texttt{match} \texttt{rematch} functions of 🍋‹‹Figure~1› in "AgerDanvyRohde:2006"›.}.
uses @{const ‹Null›} on a left path to signal to the driver that it should discard the
element of the text and restart matching from the beginning of the pattern (i.e,
{const ‹root2›}). This is a step towards the removal of @{term ‹us›} in \S\ref{sec:KMP:step8}.
that the @{const ‹Null›} at the end of the rightmost path is unreachable: the rightmost
{const ‹Node›} has @{term "vs = [::]"} and therefore @{const ‹op2›} always takes the left branch.
`K' optimisation is perhaps best understood by example. Consider
automaton in Figure~\ref{fig:example_tree}, and a text beginning \texttt{011}. Using the MP (black) transitions we take the path \rightarrow q_0 \stackrel{{\mathtt{0}}}{\rightarrow} q_1
stackrel{\mathtt{1}}{\rightarrow} \overbrace{q_2 \rightarrow q_0
rightarrow \Box}$. Now, due to the failure of the comparison of the
element of the text (\texttt{1}) at $q_2$, we can predict that
(identical) comparison at node $q_0$ will fail as well, and
have $q_2$ left-branch directly to $\Box$. This saves a
in the driver at the cost of another in the preprocessor
in @{const ‹next›}). These optimisations are the red
in the diagram, and can in general save an arbitrary number of
comparisons; consider the pattern $\mathtt{1}^n$ for instance.
formally, @{const ‹next›} ensures that the heads of
suffixes of the pattern (@{term ‹vs›}) on consecutive
on left paths are distinct; see below for a proof of this fact
our setting, and 🍋‹‹\S3.3.4› in
Gusfield:1997"› for a classical account. Unlike Bird's suggestion
p134), our @{const ‹next›} function is not recursive.
note in passing that while MP only allows ‹Null› on
left of the root node, ‹Null› can be on the left of
KMP node except for the rightmost
i.e., the one that signals a complete pattern match) where no optimisation is possible.
proceed with the formalities of the data refinement.
›
schematic_goal root2_op2_rep2_left2_right2_def: ― ‹ Obtain the definition of these functions as a single fixed point › "( root2 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree , op2 :: [:'a:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree , rep2 :: [:'a:] → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree , left2 :: [:'a:] → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree , right2 :: [:'a:] → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree ) = fix⋅?F" unfolding op2_def root2_def rep2_def left2_def right2_def by simp
lemma abs2_strict[simp]: "abs2⋅⊥ = ⊥" "abs2⋅Null = ⊥" by fixrec_simp+
lemma not_prefix_op2_next: assumes"prefix⋅[:x:]⋅xs = FF" shows"op2⋅pat⋅(next⋅xs⋅(rep2⋅pat⋅usvs))⋅x = op2⋅pat⋅(rep2⋅pat⋅usvs)⋅x" proof - obtain us vs where"usvs = (us, vs)"by force with assms show ?thesis by (cases xs; cases us; clarsimp; cases vs;
clarsimp simp: rep2.simps prefix_singleton_FF If2_def[symmetric] split: If2_splits) qed
text‹
's appeal to ‹foldl_fusion› (p130) is too weak to
this data refinement as his condition (iii) requires the
functions to coincide on all representation values. Concretely
asks that:
worker/wrapper fusion 🍋‹"GillHutton:2009" and "Gammie:2011"› specialised to @{const ‹sscanl›} (@{thm [source] "sscanl_ww_fusion"}) we only
to establish this identity for valid representations, i.e., when ‹t› lies under the image of ‹rep2›. In
, we show that this diagram commutes:
this result self-composes: after an initial @{term ‹rep2⋅pat›} step, we can repeatedly simulate
{const ‹op›} steps with @{const ‹op2›} steps.
›
lemma op_op2_refinement: assumes"pat ≠⊥" shows"rep2⋅pat⋅(op⋅pat⋅usvs⋅x) = op2⋅pat⋅(rep2⋅pat⋅usvs)⋅x" proof(cases "x = ⊥""usvs = ⊥" rule: bool.exhaust[case_product bool.exhaust]) case False_False thenhave"x ≠⊥""usvs ≠⊥"by simp_all thenshow ?thesis proof(induct usvs arbitrary: x rule: op_induct) case (step usvs) obtain us vs where usvs: "usvs = (us, vs)"by fastforce have *: "sfoldl⋅(op2⋅pat)⋅(root2⋅pat)⋅xs = rep2⋅pat⋅(split⋅pat⋅xs)"if"lt⋅(slength⋅xs)⋅(slength⋅us) = TT"for xs using that proof(induct xs rule: srev_induct) case (ssnoc x xs) from ssnoc(1,2,4) have IH: "sfoldl⋅(op2⋅pat)⋅(root2⋅pat)⋅xs = rep2⋅pat⋅(split⋅pat⋅xs)" by - (rule ssnoc; auto intro: lt_trans dest: lt_slength_0) obtain us' vs' where us'vs': "split⋅pat⋅xs = (us', vs')"by fastforce from‹pat ≠⊥› ssnoc(1,2,4) usvs show ?case apply (clarsimp simp: split_sfoldl_op[symmetric] IH) apply (rule step(1)[simplified abs_rep2_ID', simplified, symmetric]) using lt_trans split_length_lt split_sfoldl_op apply fastforce+ done qed (fastforce simp: ‹pat ≠⊥› root2.unfold)+ have **: "If snull⋅us then rep2⋅pat⋅([::], pat) else rep2⋅pat⋅(op⋅pat⋅(split⋅pat⋅(stail⋅us))⋅x) = op2⋅pat⋅(left2⋅pat⋅(us, vs))⋅x"if"prefix⋅[:x:]⋅vs = FF" proof(cases us) case snil with that show ?thesis by simp (metis next_Null op2.simps(1) prefix.simps(1) prefix_FF_not_snilD root2.simps) next case (scons u' us') from‹pat ≠⊥› scons have"lt⋅(slength⋅(cfst⋅(split⋅pat⋅us')))⋅(slength⋅us) = TT" using split_length_lt by fastforce from‹pat ≠⊥›‹x ≠⊥› usvs that scons this show ?thesis by (clarsimp simp: * step(1)[simplified abs_rep2_ID'] not_prefix_op2_next) qed simp from‹usvs ≠⊥› usvs show ?case apply (subst (2) rep2.unfold) apply (subst op2.unfold) apply (subst op.unfold) apply (clarsimp simp: If_distr rep2_snoc_right2 ** cong: If_cong) done qed qed (simp_all add: rep2.unfold)
text‹
the result of this data refinement is extensionally equal to
specification:
›
lemma data_refinement: shows"matches = matches2" proof(intro cfun_eqI) fix pat xs :: "[:'a:]"show"matches⋅pat⋅xs = matches2⋅pat⋅xs" proof(cases "pat = ⊥") case True thenshow ?thesis by (cases xs; clarsimp simp: matches2.simps) next case False thenshow ?thesis unfolding matches2.simps apply (subst matches_op) ―‹ Continue with previous derivation. › apply (subst sscanl_ww_fusion[where wrap="ID ** abs2"and unwrap="ID ** rep2⋅pat"and f'="Λ (n, x) y. (n + 1, op2⋅pat⋅x⋅y)"]) apply (simp add: abs_rep2_ID) apply (simp add: op_op2_refinement) apply (simp add: oo_assoc sfilter_smap root2.unfold) apply (simp add: oo_assoc[symmetric]) done qed qed
text‹
computation can be thought of as a pair coroutines with a
(@{const ‹root2›}/@{const ‹rep2›})
consumer (@{const ‹op2›}) structure. It turns out that
is not essential (see \S\ref{sec:implementations}), though it
depend on being able to traverse incompletely defined trees.
key difficulty in defining this computation in HOL using present
is that @{const ‹op2›} is neither terminating
@{emph ‹friendly›} in the terminology of 🍋‹"BlanchetteEtAl:2017"›.
this representation works for automata with this sort of
, it is unclear how general it is; in particular it may not
so well if @{const ‹left›} branches can go forward
well as back. See also the commentary in 🍋‹"HinzeJeuring:2001"›, who observe that sharing is easily lost, and so
is probably only useful in ``closed'' settings like the present
, unless the language is extended in unusual ways 🍋‹"JeanninEtAl:2017"›.
label{thm:k_property}
conclude by proving that @{const ‹rep2›} produces
that have the `K' property, viz that labels on consecutive nodes
a left path do not start with the same symbol. This also
the productivity of @{const ‹root2›}. The
of proof used here -- induction nested in coinduction --
in \S\ref{sec:KMP:increase_sharing}.
›
coinductive K :: "([:'a::Eq:] × [:'a:]) tree ==> bool"where "K Null"
| "[ usvs ≠⊥; K l; K r; ∧v vs. csnd⋅usvs = v :# vs ==> l = Null ∨ (∃v' vs'. csnd⋅(label⋅l) = v' :# vs' ∧ eq⋅v⋅v' = FF) ]==> K (Node⋅usvs⋅l⋅r)"
declare K.intros[intro!, simp]
lemma sfoldl_op2_root2_rep2_split: assumes"pat ≠⊥" shows"sfoldl⋅(op2⋅pat)⋅(root2⋅pat)⋅xs = rep2⋅pat⋅(split⋅pat⋅xs)" proof(induct xs rule: srev_induct) case (ssnoc x xs) with‹pat ≠⊥› ssnoc show ?caseby (clarsimp simp: split_sfoldl_op[symmetric] op_op2_refinement) qed (simp_all add: ‹pat ≠⊥› root2.unfold)
lemma K_rep2: assumes"pat ≠⊥" assumes"us :@ vs = pat" shows"K (rep2⋅pat⋅(us, vs))" using assms proof(coinduction arbitrary: us vs) case (K us vs) thenshow ?case proof(induct us arbitrary: vs rule: op_induct') case (step us) from step.prems have"us ≠⊥""vs ≠⊥"by auto show ?case proof(cases us) case bottom with‹us ≠⊥›show ?thesis by simp next case snil with step.prems show ?thesis by (cases vs; force simp: rep2.simps) next case (scons u' us') from‹pat ≠⊥› scons ‹us ≠⊥›‹vs ≠⊥› obtain usl vsl where splitl: "split⋅pat⋅us' = (usl, vsl)""usl ≠⊥""vsl ≠⊥""usl :@ vsl = pat" by (metis (no_types, opaque_lifting) Rep_cfun_strict1 prod.collapse sappend_strict sappend_strict2 split_pattern) from scons obtain l r where r: "rep2⋅pat⋅(us, vs) = Node⋅(us, vs)⋅l⋅r"by (simp add: rep2.simps) moreover have"(∃us vs. l = rep2⋅pat⋅(us, vs) ∧ us :@ vs = pat) ∨ K l" proof(cases vs) case snil with scons splitl r show ?thesis by (clarsimp simp: rep2.simps sfoldl_op2_root2_rep2_split) next case scons with‹pat ≠⊥›‹us = u' :# us'›‹u' ≠⊥›‹us' ≠⊥›‹vs ≠⊥› r splitl show ?thesis apply (clarsimp simp: rep2.simps sfoldl_op2_root2_rep2_split) apply (cases vsl; cases usl; clarsimp simp: If2_def[symmetric] sfoldl_op2_root2_rep2_split split: If2_splits) apply (rename_tac ul' usl') apply (cut_tac us'="prod.fst (split⋅pat⋅usl')"and vs="prod.snd (split⋅pat⋅usl')"in step(1); clarsimp simp: split_pattern) apply (metis fst_conv lt_trans slength.simps(2) split_length_lt step.prems(1)) apply (erule disjE; clarsimp simp: sfoldl_op2_root2_rep2_split) apply (rename_tac b l r) apply (case_tac b; clarsimp simp: rep2.simps) apply (auto simp: If2_def[symmetric] rep2.simps dest: split_pattern[rotated] split: If2_splits) done qed (simp add: ‹vs ≠⊥›) moreover from‹us :@ vs = pat›‹us ≠⊥›‹vs ≠⊥› r have"(∃usr vsr. r = rep2⋅pat⋅(usr, vsr) ∧ usr :@ vsr = pat) ∨ K r" by (cases vs; clarsimp simp: rep2.simps) moreover have"l = Null ∨ (∃v' vs'. csnd⋅(label⋅l) = v' :# vs' ∧ eq⋅v⋅v' = FF)"if"vs = v :# vs'"for v vs' proof(cases vsl) case snil with‹us :@ vs = pat›‹us = u' :# us'› splitl show ?thesis using split_length_lt[where pat=pat and xs=us'] by (force elim: slengthE simp: one_Integer_def split: if_splits) next case scons from splitl have"lt⋅(slength⋅usl)⋅(slength⋅us' + 1) = TT" by (metis fst_conv fst_strict split_bottom_iff split_length_lt) with scons ‹pat ≠⊥›‹us = u' :# us'›‹u' ≠⊥›‹us' ≠⊥›‹vs ≠⊥› r splitl ‹vs = v :# vs'›show ?thesis using step(1)[OF _ ‹pat ≠⊥›, where us'="prod.fst (split⋅pat⋅us')"and vs="prod.snd (split⋅pat⋅us')"] by (clarsimp simp: rep2.simps sfoldl_op2_root2_rep2_split If2_def[symmetric] split: If2_splits) qed (simp add: ‹vsl ≠⊥›) moreovernote‹pat ≠⊥›‹us ≠⊥›‹vs ≠⊥› ultimatelyshow ?thesis by auto qed qed qed
theorem K_root2: assumes"pat ≠⊥" shows"K (root2⋅pat)" using assms unfolding root2.unfold by (simp add: K_rep2)
text‹
remaining steps are as follows:
▪ 3. introduce an accumulating parameter (‹grep›).
▪ 4. inline ‹rep› and simplify.
▪ 5. simplify to Bird's ``simpler forms.''
▪ 6. memoise ‹left›.
▪ 7. simplify, unfold ‹prefix›.
▪ 8. discard ‹us›.
▪ 9. factor out ‹pat›.
›
subsection‹ Step 3: Introduce an accumulating parameter (grep) ›
text‹
we prepare for the second memoization step (\S\ref{sec:KMP:increase_sharing})
introducing an accumulating parameter to @{const ‹rep2›} that supplies the value of the left
.
retain @{const ‹rep2›} as a wrapper for now, and inline @{const ‹right2›} to speed up
.
›
fixrec ―‹ Bird p131 / p132 ›
root3 :: "[:'a::Eq_def:] → ([:'a:] × [:'a:]) tree" and op3 :: "[:'a:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree" and rep3 :: "[:'a:] → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree" and grep3 :: "[:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree" where
[simp del]: "root3⋅pat = rep3⋅pat⋅([::], pat)"
| "op3⋅pat⋅Null⋅x = root3⋅pat"
| "usvs ≠⊥==> op3⋅pat⋅(Node⋅usvs⋅l⋅r)⋅x = If prefix⋅[:x:]⋅(csnd⋅usvs) then r else op3⋅pat⋅l⋅x"
| [simp del]: ―‹ Inline @{const ‹left2›}, factor out @{const ‹next›}. › "rep3⋅pat⋅usvs = grep3⋅pat⋅(case cfst⋅usvs of [::] ==> Null | u :# us ==> sfoldl⋅(op3⋅pat)⋅(root3⋅pat)⋅us)⋅usvs"
| [simp del]: ―‹ @{const ‹rep2›} with @{const ‹left2›} abstracted, @{const ‹right2›} inlined. › "grep3⋅pat⋅l⋅usvs = Node⋅usvs⋅(next⋅(csnd⋅usvs)⋅l)⋅(case csnd⋅usvs of [::] ==> Null | v :# vs ==> rep3⋅pat⋅(cfst⋅usvs :@ [:v:], vs))"
schematic_goal root3_op3_rep3_grep3_def: "( root3 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree , op3 :: [:'a:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree , rep3 :: [:'a:] → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree , grep3 :: [:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree ) = fix⋅?F" unfolding root3_def op3_def rep3_def grep3_def by simp
lemma r3_2: "(Λ (root, op, rep, grep). (root, op, rep))⋅ ( root3 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree , op3 :: [:'a:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree , rep3 :: [:'a:] → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree , grep3 :: [:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree ) = (Λ (root, op, rep, left, right). (root, op, rep))⋅ ( root2 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree , op2 :: [:'a:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree , rep2 :: [:'a::Eq_def:] → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree , left2 :: [:'a::Eq_def:] → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree , right2 :: [:'a::Eq_def:] → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree )" unfolding root2_op2_rep2_left2_right2_def root3_op3_rep3_grep3_def apply (simp add: match_snil_match_scons_slist_case match_Null_match_Node_tree_case slist_case_distr tree_case_distr) apply (simp add: fix_cprod fix_const) ―‹ Very slow. Sensitive to tuple order due to the asymmetry of ‹fix_cprod›. › apply (simp add: slist_case_distr) done
subsection‹ Step 4: Inline rep ›
text‹
further simplify by inlining @{const ‹rep3›} into @{const ‹root3›} and @{const ‹grep3›}.
›
fixrec
root4 :: "[:'a::Eq_def:] → ([:'a:] × [:'a:]) tree" and op4 :: "[:'a:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree" and grep4 :: "[:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree" where
[simp del]: "root4⋅pat = grep4⋅pat⋅Null⋅([::], pat)"
| "op4⋅pat⋅Null⋅x = root4⋅pat"
| "usvs ≠⊥==> op4⋅pat⋅(Node⋅usvs⋅l⋅r)⋅x = If prefix⋅[:x:]⋅(csnd⋅usvs) then r else op4⋅pat⋅l⋅x"
| [simp del]: "grep4⋅pat⋅l⋅usvs = Node⋅usvs⋅(next⋅(csnd⋅usvs)⋅l)⋅(case csnd⋅usvs of [::] ==> Null | v :# vs ==> grep4⋅pat⋅(case cfst⋅usvs :@ [:v:] of [::] ==> Null ―‹ unreachable › | u :# us ==> sfoldl⋅(op4⋅pat)⋅(root4⋅pat)⋅us)⋅(cfst⋅usvs :@ [:v:], vs))"
schematic_goal root4_op4_grep4_def: "( root4 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree , op4 :: [:'a:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree , grep4 :: [:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree ) = fix⋅?F" unfolding root4_def op4_def grep4_def by simp
subsection‹ Step 5: Simplify to Bird's ``simpler forms'' ›
text‹
remainder of @{const ‹left2›} in @{const ‹grep4›} can be simplified by transforming the
{text "case"} scrutinee from @{term "cfst⋅usvs :@ [:v:]"} into @{term "cfst⋅usvs"}.
›
fixrec
root5 :: "[:'a::Eq_def:] → ([:'a:] × [:'a:]) tree" and op5 :: "[:'a::Eq_def:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree" and grep5 :: "[:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree" where
[simp del]: "root5⋅pat = grep5⋅pat⋅Null⋅([::], pat)"
| "op5⋅pat⋅Null⋅x = root5⋅pat"
| "usvs ≠⊥==> op5⋅pat⋅(Node⋅usvs⋅l⋅r)⋅x = If prefix⋅[:x:]⋅(csnd⋅usvs) then r else op5⋅pat⋅l⋅x"
| [simp del]: "grep5⋅pat⋅l⋅usvs = Node⋅usvs⋅(next⋅(csnd⋅usvs)⋅l)⋅(case csnd⋅usvs of [::] ==> Null | v :# vs ==> grep5⋅pat⋅(case cfst⋅usvs of ― ‹ was @{term ‹cfst⋅usvs :@ [:v:]›} › [::] ==> root5⋅pat | u :# us ==> sfoldl⋅(op5⋅pat)⋅(root5⋅pat)⋅(us :@ [:v:]))⋅(cfst⋅usvs :@ [:v:], vs))"
schematic_goal root5_op5_grep5_def: "( root5 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree , op5 :: [:'a:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree , grep5 :: [:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree ) = fix⋅?F" unfolding root5_def op5_def grep5_def by simp
this step cashes in the fact that, in the context of
{term ‹grep6⋅pat⋅l⋅usvs›}, @{term
sfoldl⋅(op6⋅pat)⋅(root6⋅pat)⋅us"} is
to @{term ‹l›}.
this step with the previous one is not simply a matter of
reasoning; we can see this by observing that the right
of @{term ‹grep5⋅pat⋅l⋅usvs›}
not depend on @{term ‹l›} whereas that of @{term ‹grep6⋅pat⋅l⋅usvs›} does, and therefore
cannot be extensionally equal. Furthermore the computations of
corresponding @{term ‹root›}s do not proceed in
: consider the computation of the left subtree.
our purposes it is enough to show that the trees @{const ‹root5›} and @{const ‹root6›} are equal,
which it follows that @{prop "op6 = op5"} by induction on its
argument. The equality is established by exhibiting a @{emph ‹tree bisimulation›} (@{const tree_bisim}) that relates
corresponding ``producer'' ‹grep› functions. Such a ‹R› must satisfy:
▪ ‹R ⊥⊥›;
▪ ‹R Null Null›; and
▪ if ‹R (Node⋅x⋅l⋅r) (Node⋅x'⋅l'⋅r')› then ‹x = x'›, ‹R l l'›, and ‹R r r'›.
› text‹
following pair of ‹left› functions define suitable left
from the corresponding @{term ‹root›}s.
inductive ―‹ This relation is not inductive. ›
root_bisim :: "[:'a::Eq_def:] ==> ([:'a:] × [:'a:]) tree ==> ([:'a:] × [:'a:]) tree ==> bool" for pat :: "[:'a:]" where
bottom: "root_bisim pat ⊥⊥"
| Null: "root_bisim pat Null Null"
| gl: "[pat ≠⊥; us ≠⊥; vs ≠⊥] ==> root_bisim pat (grep6⋅pat⋅(left6⋅pat⋅us)⋅(us, vs)) (grep5⋅pat⋅(left5⋅pat⋅us)⋅(us, vs))"
declare root_bisim.intros[intro!, simp]
lemma left6_left5_strict[simp]: "left6⋅pat⋅⊥ = ⊥" "left5⋅pat⋅⊥ = ⊥" by fixrec_simp+
lemma op6_left6: "[us ≠⊥; v ≠⊥]==> op6⋅pat⋅(left6⋅pat⋅us)⋅v = left6⋅pat⋅(us :@ [:v:])" by (cases us) simp_all
lemma op5_left5: "[us ≠⊥; v ≠⊥]==> op5⋅pat⋅(left5⋅pat⋅us)⋅v = left5⋅pat⋅(us :@ [:v:])" by (cases us) simp_all
lemma root5_left5: "v ≠⊥==> root5⋅pat = left5⋅pat⋅[:v:]" by simp
lemma op5_sfoldl_left5: "[us ≠⊥; u ≠⊥; v ≠⊥]==> op5⋅pat⋅(sfoldl⋅(op5⋅pat)⋅(root5⋅pat)⋅us)⋅v = left5⋅pat⋅(u :# us :@ [:v:])" by simp
lemma root_bisim_root: assumes"pat ≠⊥" shows"root_bisim pat (root6⋅pat) (root5⋅pat)" unfolding root6.unfold root5.unfold using assms by simp (metis (no_types, lifting) left5.simps(1) left6.simps(1) root_bisim.simps slist.con_rews(3))
lemma root_bisim_op_next56: assumes"root_bisim pat t6 t5" assumes"prefix⋅[:x:]⋅xs = FF" shows"op6⋅pat⋅(next⋅xs⋅t6)⋅x = op6⋅pat⋅t6⋅x ∧ op5⋅pat⋅(next⋅xs⋅t5)⋅x = op5⋅pat⋅t5⋅x" using‹root_bisim pat t6 t5› proof cases case Null with assms(2) show ?thesis by (cases xs) simp_all next case (gl us vs) with assms(2) show ?thesis apply (cases "x = ⊥", simp) apply (cases xs; clarsimp) apply (subst (12) grep6.simps) apply (subst (12) grep5.simps) apply (cases vs; clarsimp simp: If2_def[symmetric] split: If2_splits) done qed simp
text‹
main part of establishing that @{const ‹root_bisim›}
a @{const ‹tree_bisim›} is in showing that the left
constructed by the ‹grep›s are @{const ‹root_bisim›}-related. We do this by inducting over the
of the pattern so far matched (‹us›), as we did
proving that this tree has the `K' property in
S\ref{thm:k_property}.
›
lemma assumes"pat ≠⊥" shows root_bisim_op: "root_bisim pat t6 t5 ==> root_bisim pat (op6⋅pat⋅t6⋅x) (op5⋅pat⋅t5⋅x)" ― ‹ unused › and root_bisim_next_left: "root_bisim pat (next⋅vs⋅(left6⋅pat⋅us)) (next⋅vs⋅(left5⋅pat⋅us))" (is"?rbnl us vs") proof - let ?ogl5 = "λus vs. op5⋅pat⋅(grep5⋅pat⋅(left5⋅pat⋅us)⋅(us, vs))⋅x" let ?ogl6 = "λus vs. op6⋅pat⋅(grep6⋅pat⋅(left6⋅pat⋅us)⋅(us, vs))⋅x" let ?for5 = "λus. sfoldl⋅(op5⋅pat)⋅(root5⋅pat)⋅us" let ?for6 = "λus. sfoldl⋅(op6⋅pat)⋅(root6⋅pat)⋅us" have"[?ogl6 us vs = Node⋅usvs⋅l⋅r; cfst⋅usvs ≠⊥; x ≠⊥]==> le⋅(slength⋅(cfst⋅usvs))⋅(slength⋅us + 1) = TT" and"[op6⋅pat⋅(next⋅xs⋅(grep6⋅pat⋅(left6⋅pat⋅us)⋅(us, vs)))⋅x = Node⋅usvs⋅l⋅r; cfst⋅usvs ≠⊥; x ≠⊥; us ≠⊥; vs ≠⊥]==> le⋅(slength⋅(cfst⋅usvs))⋅(slength⋅us + 1) = TT" and"[?for6 us = Node⋅usvs⋅l⋅r; cfst⋅usvs ≠⊥]==> lt⋅(slength⋅(cfst⋅usvs))⋅(slength⋅us + 1) = TT" and"[us ≠⊥; vs ≠⊥]==> root_bisim pat (?ogl6 us vs) (?ogl5 us vs)" and"root_bisim pat (?for6 us) (?for5 us)" and"?rbnl us vs" for usvs l r xs us vs proof(induct us arbitrary: usvs l r vs x xs rule: op_induct') case (step us) have rbl: "root_bisim pat (left6⋅pat⋅us) (left5⋅pat⋅us)" by (cases us; fastforce intro: step(5) simp: left6.unfold left5.unfold)
{ case (1 usvs l r vs x) from rbl have L: "le⋅(slength⋅(prod.fst usvs'))⋅(slength⋅us + 1) = TT" if"op6⋅pat⋅(next⋅vs⋅(left6⋅pat⋅us))⋅x = Node⋅usvs'⋅l⋅r" and"cfst⋅usvs' ≠⊥" and"vs ≠⊥" for usvs' l r proof cases case bottom with that show ?thesis by simp next case Null with that show ?thesis apply simp apply (subst (asm) root6.unfold) apply (subst (asm) grep6.unfold) apply (fastforce intro: le_plus_1) done next case (gl us'' vs'') show ?thesis proof(cases us) case bottom with that show ?thesis by simp next case snil with that show ?thesis apply simp apply (subst (asm) root6.unfold) apply (subst (asm) grep6.unfold) apply clarsimp done next case (scons ush ust) moreoverfrom that gl scons ‹x ≠⊥›have"le⋅(slength⋅(cfst⋅usvs'))⋅(slength⋅us'' + 1) = TT" apply simp apply (subst (asm) (2) grep6.unfold) apply (fastforce dest: step(2, 3)[rotated]) done moreoverfrom gl scons have"lt⋅(slength⋅us'')⋅(slength⋅(stail⋅us) + 1) = TT" apply simp apply (subst (asm) grep6.unfold) apply (fastforce dest: step(3)[rotated]) done ultimatelyshow ?thesis apply clarsimp apply (metis Integer_le_both_plus_1 Ord_linear_class.le_trans le_iff_lt_or_eq) done qed qed from1show ?case apply (subst (asm) grep6.unfold) apply (cases vs; clarsimp simp: If2_def[symmetric] split: If2_splits) apply (rule L; fastforce) apply (metis (no_types, lifting) ab_semigroup_add_class.add_ac(1) fst_conv grep6.simps le_refl_Integer sappend_snil_id_right slength.simps(2) slength_bottom_iff slength_sappend slist.con_rews(3) tree_injects') apply (rule L; fastforce) done } note slength_ogl = this
{ case (2 usvs l r vs x xs) from2have"xs ≠⊥"by clarsimp from‹vs ≠⊥›‹xs ≠⊥›2(1) show ?case proof(cases rule: next_grep6_cases) case gl with‹cfst⋅usvs ≠⊥›‹x ≠⊥›show ?thesis using slength_ogl by blast next case nl from rbl show ?thesis proof cases case bottom with nl ‹cfst⋅usvs ≠⊥›show ?thesis by simp next case Null with nl ‹us ≠⊥›‹vs ≠⊥›show ?thesis apply simp apply (subst (asm) root6.unfold) apply (subst (asm) grep6.unfold) apply (clarsimp simp: zero_Integer_def one_Integer_def elim!: slengthE) done next case (gl us'' vs'') show ?thesis proof(cases us) case bottom with‹us ≠⊥›show ?thesis by simp next case snil with gl show ?thesis by (subst (asm) grep6.unfold) simp next case (scons u' us') with2 nl gl show ?thesis apply clarsimp apply (subst (asm) (4) grep6.unfold) apply clarsimp apply (drule step(3)[rotated]; clarsimp) apply (drule step(2)[rotated]; clarsimp) apply (clarsimp simp: lt_le) apply (metis Integer_le_both_plus_1 Ord_linear_class.le_trans) done qed qed qed }
{ case (3 usvs l r) show ?case proof(cases us rule: srev_cases) case snil with3show ?thesis apply (subst (asm) root6.unfold) apply (subst (asm) grep6.unfold) apply fastforce done next case (ssnoc u' us') thenhave"root_bisim pat (?for6 us') (?for5 us')"by (fastforce intro: step(5)) thenshow ?thesis proof cases case bottom with3 ssnoc show ?thesis by simp next case Null with3 ssnoc show ?thesis apply simp apply (subst (asm) root6.unfold) apply (subst (asm) grep6.unfold) apply (clarsimp simp: zero_Integer_def one_Integer_def elim!: slengthE) done next case (gl us'' vs'') with3 ssnoc show ?thesis apply clarsimp apply (subst (asm) (2) grep6.unfold) apply (fastforce simp: zero_Integer_def one_Integer_def split: if_splits dest!: step(1)[rotated] step(3)[rotated] elim!: slengthE) done qed qed (use3in simp) }
{ case (4 vs x) show ?case proof(cases "prefix⋅[:x:]⋅vs") case bottom thenshow ?thesis apply (subst grep6.unfold) apply (subst grep5.unfold) apply auto done next case TT with‹pat ≠⊥›‹us ≠⊥›show ?thesis apply (subst grep6.unfold) apply (subst grep5.unfold) apply (cases vs; clarsimp simp: op6_left6) apply (cases us; clarsimp simp del: left6.simps left5.simps simp add: root5_left5) apply (metis (no_types, lifting) op5_sfoldl_left5 root5_left5 root_bisim.simps sappend_bottom_iff slist.con_rews(3) slist.con_rews(4)) done next case FF with‹pat ≠⊥›‹us ≠⊥›show ?thesis apply (subst grep6.unfold) apply (subst grep5.unfold) using rbl apply (auto simp: root_bisim_op_next56 elim!: root_bisim.cases intro: root_bisim_root) apply (subst (asm) grep6.unfold) apply (cases us; fastforce dest: step(3)[rotated] intro: step(4)) done qed }
{ case5show ?case proof(cases us rule: srev_cases) case (ssnoc u' us') thenhave"root_bisim pat (?for6 us') (?for5 us')"by (fastforce intro: step(5)) thenshow ?thesis proof cases case (gl us'' vs'') with ssnoc show ?thesis apply clarsimp apply (subst (asm) grep6.unfold) apply (fastforce dest: step(3)[rotated] intro: step(4)) done qed (use‹pat ≠⊥› ssnoc root_bisim_root in auto) qed (use‹pat ≠⊥› root_bisim_root in auto) }
{ case (6 vs) from rbl ‹pat ≠⊥›show rbnl: "?rbnl us vs" proof cases case bottom thenshow ?thesis by fastforce next case Null thenshow ?thesis by (cases vs) auto next case (gl us'' vs'') thenshow ?thesis apply clarsimp apply (subst grep5.unfold) apply (subst grep6.unfold) apply (subst (asm) grep5.unfold) apply (subst (asm) grep6.unfold) apply (cases us; clarsimp; cases vs''; clarsimp) apply (metis Rep_cfun_strict1 bottom left5.simps(2) left6.simps(2) next_snil next_strict(1) rbl) apply (cases vs; clarsimp) using rbl apply (fastforce dest: step(3)[rotated] intro: step(6) simp: If2_def[symmetric] simp del: eq_FF split: If2_splits)+ done qed } qed from‹pat ≠⊥› this(4) show"root_bisim pat t6 t5 ==> root_bisim pat (op6⋅pat⋅t6⋅x) (op5⋅pat⋅t5⋅x)" by (auto elim!: root_bisim.cases intro: root_bisim_root) show‹root_bisim pat (next⋅vs⋅(left6⋅pat⋅us)) (next⋅vs⋅(left5⋅pat⋅us))›by fact qed
text‹
this result in hand the remainder is technically fiddly but straightforward.
lemma r6_5: shows"(root6⋅pat, op6⋅pat) = (root5⋅pat, op5⋅pat)" proof(cases "pat = ⊥") case True from True have"root6⋅pat = root5⋅pat" apply (subst root6.unfold) apply (subst grep6.unfold) apply (subst root5.unfold) apply (subst grep5.unfold) apply simp done moreover from True ‹root6⋅pat = root5⋅pat›have"op6⋅pat⋅t⋅x = op5⋅pat⋅t⋅x"for t x by (induct t) simp_all ultimatelyshow ?thesis by (simp add: cfun_eq_iff) next case False thenhave root: "root6⋅pat = root5⋅pat" by (rule tree.coinduct[OF tree_bisim_root_bisim root_bisim_root]) moreover from root have"op6⋅pat⋅t⋅x = op5⋅pat⋅t⋅x"for t x by (induct t) simp_all ultimatelyshow ?thesis by (simp add: cfun_eq_iff) qed
text‹
conclude this section by observing that accumulator-introduction is a well known technique
see, for instance, 🍋‹‹\S13.6› in "Hutton:2016"›), but the examples in the
assume that the type involved is defined inductively. Bird adopts this strategy without
what the mixed inductive/coinductive rule is that justifies the preservation of total
.
difficulty of this step is why we wired in the `K' opt earlier: it allows us to preserve the
of the tree all the way from the data refinement to the final version.
›
subsection‹ Step 7: Simplify, unfold prefix ›
text‹
next step (Bird, bottom of p132) is to move the case split in @{const ‹grep6›} on ‹vs› above the ‹Node› constructor, which makes @{term ‹grep7›} strict in that parameter and therefore not
equal to @{const ‹grep6›}. We establish a weaker correspondence using fixed-point induction.
also unfold @{const ‹prefix›} in @{const ‹op6›}.
›
fixrec
root7 :: "[:'a::Eq_def:] → ([:'a:] × [:'a:]) tree" and op7 :: "[:'a::Eq_def:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree" and grep7 :: "[:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree" where
[simp del]: "root7⋅pat = grep7⋅pat⋅Null⋅([::], pat)"
| "op7⋅pat⋅Null⋅x = root7⋅pat"
| "op7⋅pat⋅(Node⋅(us, [::])⋅l⋅r)⋅x = op7⋅pat⋅l⋅x" ―‹ Unfold ‹prefix››
| "[v ≠⊥; vs ≠⊥]==> op7⋅pat⋅(Node⋅(us, v :# vs)⋅l⋅r)⋅x = If eq⋅x⋅v then r else op7⋅pat⋅l⋅x"
| [simp del]: "grep7⋅pat⋅l⋅(us, [::]) = Node⋅(us, [::])⋅l⋅Null" ―‹ Case split on ‹vs› hoisted above ‹Node›. ›
| "[v ≠⊥; vs ≠⊥]==> grep7⋅pat⋅l⋅(us, v :# vs) = Node⋅(us, v :# vs)⋅(next⋅(v :# vs)⋅l)⋅(grep7⋅pat⋅(op7⋅pat⋅l⋅v)⋅(us :@ [:v:], vs))"
schematic_goal root7_op7_grep7_def: "( root7 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree , op7 :: [:'a:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree , grep7 :: [:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree ) = fix⋅?F" unfolding root7_def op7_def grep7_def by simp
lemma r7_6_aux: assumes"pat ≠⊥" shows "(Λ (root, op, grep). (root⋅pat, seq⋅x⋅(op⋅pat⋅t⋅x), grep⋅pat⋅l⋅(us, vs)))⋅ ( root7 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree , op7 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree , grep7 :: [:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree ) = (Λ (root, op, grep). (root⋅pat, seq⋅x⋅(op⋅pat⋅t⋅x), seq⋅vs⋅(grep⋅pat⋅l⋅(us, vs))))⋅ ( root6 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree , op6 :: [:'a::Eq_def:] → ([:'a:] × [:'a:]) tree → 'a → ([:'a:] × [:'a:]) tree , grep6 :: [:'a:] → ([:'a:] × [:'a:]) tree → [:'a:] × [:'a:] → ([:'a:] × [:'a:]) tree )" unfolding root6_op6_grep6_def root7_op7_grep7_def proof(induct arbitrary: t x l us vs rule: parallel_fix_ind[case_names adm bottom step]) case (step X Y t x l us vs) thenshow ?case apply - apply (cases X, cases Y) apply (rename_tac r10 o10 g10 r9 o9 g9) apply (clarsimp simp: cfun_eq_iff assms match_Node_mplus_match_Node match_Null_match_Node_tree_case tree_case_distr match_snil_match_scons_slist_case slist_case_distr If_distr) apply (intro allI conjI) apply (case_tac t; clarsimp) apply (rename_tac us vs l r) apply (case_tac "x = ⊥"; clarsimp) apply (case_tac vs; clarsimp; fail) apply (case_tac vs; clarsimp) apply (metis ID1 seq_simps(3)) done qed simp_all
lemma ok8_strict[simp]: "ok8⋅⊥ = ⊥" "ok8⋅Null = ⊥" by fixrec_simp+
text‹
cannot readily relate @{const ‹next›} and @{const ‹next'›} using worker/wrapper as the obvious abstraction
not invertible. Conversely the desired result is easily shown by
-point induction.
›
lemmanext'_next: assumes"v ≠⊥" assumes"vs ≠⊥" shows"next'⋅v⋅(tree_map'⋅csnd⋅t) = tree_map'⋅csnd⋅(next⋅(v :# vs)⋅t)" proof(induct t) case (Node usvs' l r) with assms show ?case apply (cases usvs'; clarsimp) apply (rename_tac us'' vs'') apply (case_tac vs''; clarsimp simp: If_distr) done qed (use assms in simp_all)
lemma r8_7[simplified]: shows"(Λ (root, op, grep). ( root⋅pat , op⋅pat⋅(tree_map'⋅csnd⋅t)⋅x , grep⋅pat⋅(tree_map'⋅csnd⋅l)⋅(csnd⋅usvs)))⋅(root8, op8, grep8) = (Λ (root, op, grep). ( tree_map'⋅csnd⋅(root⋅pat) , tree_map'⋅csnd⋅(op⋅pat⋅t⋅x) , tree_map'⋅csnd⋅(grep⋅pat⋅l⋅usvs)))⋅(root7, op7, grep7)" unfolding root7_op7_grep7_def root8_op8_grep8_def proof(induct arbitrary: l t usvs x rule: parallel_fix_ind[case_names adm bottom step]) case (step X Y l t usvs x) thenshow ?case apply - apply (cases X; cases Y) apply (clarsimp simp: cfun_eq_iff next'_next
match_snil_match_scons_slist_case slist_case_distr
match_Node_mplus_match_Node match_Null_match_Node_tree_case tree_case_distr
cong: slist_case_cong) apply (cases t; clarsimp simp: If_distr) apply (rename_tac us vs l r) apply (case_tac vs; fastforce) done qed simp_all
text‹
-level equivalence follows from ‹lfp_fusion› specialized to ‹sscanl› (@{thm [source]
sscanl_lfp_fusion"}), which states that
begin{center}
@{prop ‹smap⋅g oo sscanl⋅f⋅z = sscanl⋅f'⋅(g⋅z)›}
end{center}
that ‹g› is strict and the following diagram commutes for @{prop ‹x ≠⊥›}:
begin{center} \begin{tikzcd}[column sep=6em]
a \arrow[r, "‹Λ a. f⋅a⋅x›"] \arrow[d, "‹g›"] & a' \arrow[d, "‹g›"] \\
b \arrow[r, "‹Λ a. f'⋅a⋅x›"] & b' \end{tikzcd}
end{center}
subsection‹ Step 9: Factor out pat (final version) \label{sec:KMP:final_version}›
text‹
we factor @{term ‹pat›} from these definitions and arrive
Bird's cyclic data structure, which when executed using lazy
actually memoises the computation of @{const ‹grep8›}.
‹Letrec› syntax groups recursive bindings with ‹,› and separates these with ‹;›. Its lack
support for clausal definitions, and that \texttt{HOLCF} ‹case› does not support nested patterns, leads to some
.
›
fixrec matchesf :: "[:'a::Eq_def:] → [:'a:] → [:Integer:]"where
[simp del]: "matchesf⋅pat = (Letrec okf = (Λ (Node⋅vs⋅l⋅r). snull⋅vs); nextf = (Λ x t. case t of Null ==> Null | Node⋅vs⋅l⋅r ==> (case vs of [::] ==> t | v :# vs ==> If eq⋅x⋅v then l else t)); rootf = grepf⋅Null⋅pat, opf = (Λ t x. case t of Null ==> rootf | Node⋅vs⋅l⋅r ==> (case vs of [::] ==> opf⋅l⋅x | v :# vs ==> If eq⋅x⋅v then r else opf⋅l⋅x)), grepf = (Λ l vs. case vs of [::] ==> Node⋅[::]⋅l⋅Null | v :# vs ==> Node⋅(v :# vs)⋅(nextf⋅v⋅l)⋅(grepf⋅(opf⋅l⋅v)⋅vs)); stepf = (Λ (n, t) x. (n + 1, opf⋅t⋅x)) in smap⋅cfst oo sfilter⋅(okf oo csnd) oo sscanl⋅stepf⋅(0, rootf))"
lemma matchesf_next': "(Λ x t. case t of Null ==> Null | Node⋅vs⋅l⋅r ==> (case vs of [::] ==> t | v :# vs ==> If eq⋅x⋅v then l else t)) = next'" apply (clarsimp simp: cfun_eq_iff next'.unfold
match_snil_match_scons_slist_case slist_case_distr
match_Node_mplus_match_Node match_Null_match_Node_tree_case tree_case_distr) apply (simp cong: tree_case_cong) apply (simp cong: slist_case_cong) done
lemma matchesf_8: "fix⋅(Λ (Rootf, Opf, Grepf). ( Grepf⋅Null⋅pat , Λ t x. case t of Null ==> Rootf | Node⋅vs⋅l⋅r ==> (case vs of [::] ==> Opf⋅l⋅x | v :# vs ==> If eq⋅x⋅v then r else Opf⋅l⋅x) , Λ l vs. case vs of [::] ==> Node⋅[::]⋅l⋅Null | v :# vs ==> Node⋅(v :# vs)⋅(next'⋅v⋅l)⋅(Grepf⋅(Opf⋅l⋅v)⋅vs)) ) = (Λ (root, op, grep). (root⋅pat, op⋅pat, grep⋅pat))⋅(root8, op8, grep8)" unfolding root8_op8_grep8_def by (rule lfp_fusion[symmetric])
(fastforce simp: cfun_eq_iff
match_snil_match_scons_slist_case slist_case_distr
match_Node_mplus_match_Node match_Null_match_Node_tree_case tree_case_distr)+
final program above is easily syntactically translated into the
shown in Figure~\ref{fig:haskell-kmp}, and one can expect
's list fusion machinery to compile the top-level driver into an
loop. 🍋‹"LochbihlerMaximova:2015"›
mechanised this optimisation for Isabelle/HOL's code generator
and see also 🍋‹"Huffman:2009"›).
we lack both pieces of infrastructure we show such a fusion is sound
hand.
›
lemma fused_driver': assumes"g⋅⊥ = ⊥" assumes"p⋅⊥ = ⊥" shows"smap⋅g oo sfilter⋅p oo sscanl⋅f⋅z = (μ R. Λ z xxs. case xxs of [::] ==> If p⋅z then [:g⋅z:] else [::] | x :# xs ==> let z' = f⋅z⋅x in If p⋅z then g⋅z :# R⋅z'⋅xs else R⋅z'⋅xs)⋅z"
(is"?lhs = ?rhs") proof(rule cfun_eqI) fix xs from assms show"?lhs⋅xs = ?rhs⋅xs" by (induct xs arbitrary: z) (subst fix_eq; fastforce simp: If_distr Let_def)+ qed
(*<*)
end (*>*)
Messung V0.5 in Prozent
¤ Die Informationen auf dieser Webseite wurden
nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit,
noch Qualität der bereit gestellten Informationen zugesichert.0.94Bemerkung:
(vorverarbeitet am 2026-06-10)
¤
Die Informationen auf dieser Webseite wurden
nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit,
noch Qualität der bereit gestellten Informationen zugesichert.
Bemerkung:
Die farbliche Syntaxdarstellung und die Messung sind noch experimentell.