Quellcodebibliothek Statistik Leitseite products/Sources/formale Sprachen/Isabelle/Doc/Codegen/   (Beweissystem Isabelle Version 2025-1©)  Datei vom 16.11.2025 mit Größe 15 kB image not shown  

Quelle  Foundations.thy

  Sprache: Isabelle
 

theory Foundations
imports Introduction
begin

section Code generation foundations \label{sec:foundations}

subsection Code generator architecture \label{sec:architecture}

text 
 The code generator is actually a framework consisting of different
 components which can be customised individually.

 Conceptually all components operate on Isabelle's logic framework
 Pure. Practically, the object logic HOL
 provides the necessary facilities to make use of the code generator,
 mainly since it is an extension of Isabelle/Pure.

 The constellation of the different components is visualized in the
 following picture.

 \begin{figure}[h]
 \def\sys#1{\emph{#1}}
 \begin{tikzpicture}[x = 4cm, y = 1cm]
 \tikzstyle positive=[color = black, fill = white];
 \tikzstyle negative=[color = white, fill = black];
 \tikzstyle entity=[rounded corners, draw, thick];
 \tikzstyle process=[ellipse, draw, thick];
 \tikzstyle arrow=[-stealth, semithick];
 \node (spec) at (0, 3) [entity, positive] {specification tools};
 \node (user) at (1, 3) [entity, positive] {user proofs};
 \node (spec_user_join) at (0.5, 3) [shape=coordinate] {};
 \node (raw) at (0.5, 4) [entity, positive] {raw code equations};
 \node (pre) at (1.5, 4) [process, positive] {preprocessing};
 \node (eqn) at (2.5, 4) [entity, positive] {code equations};
 \node (iml) at (0.5, 0) [entity, positive] {intermediate program};
 \node (seri) at (1.5, 0) [process, positive] {serialisation};
 \node (SML) at (2.5, 3) [entity, positive] {\sys{SML}};
 \node (OCaml) at (2.5, 2) [entity, positive] {\sys{OCaml}};
 \node (Haskell) at (2.5, 1) [entity, positive] {\sys{Haskell}};
 \node (Scala) at (2.5, 0) [entity, positive] {\sys{Scala}};
 \draw [semithick] (spec) -- (spec_user_join);
 \draw [semithick] (user) -- (spec_user_join);
 \draw [-diamond, semithick] (spec_user_join) -- (raw);
 \draw [arrow] (raw) -- (pre);
 \draw [arrow] (pre) -- (eqn);
 \draw [arrow] (eqn) -- node (transl) [process, positive] {translation} (iml);
 \draw [arrow] (iml) -- (seri);
 \draw [arrow] (seri) -- (SML);
 \draw [arrow] (seri) -- (OCaml);
 \draw [arrow] (seri) -- (Haskell);
 \draw [arrow] (seri) -- (Scala);
 \end{tikzpicture}
 \caption{Code generator architecture}
 \label{fig:arch}
 \end{figure}

 \noindent Central to code generation is the notion of \emph{code
 equations}. A code equation as a first approximation is a theorem
 of the form f t1 t2 tn t (an equation headed by a
 constant f with arguments t1 t2 tn and right
 hand side t).

 \begin{itemize}

 \item Starting point of code generation is a collection of (raw)
 code equations in a theory. It is not relevant where they stem
 from, but typically they were either produced by specification
 tools or proved explicitly by the user.
 
 \item These raw code equations can be subjected to theorem
 transformations. This \qn{preprocessor} (see
 \secref{sec:preproc}) can apply the full expressiveness of
 ML-based theorem transformations to code generation. The result
 of preprocessing is a structured collection of code equations.

 \item These code equations are \qn{translated} to a program in an
 abstract intermediate language. Think of it as a kind of
 \qt{Mini-Haskell} with four \qn{statements}: data (for
 datatypes), fun (stemming from code equations), also
 class and inst (for type classes).

 \item Finally, the abstract program is \qn{serialised} into
 concrete source code of a target language. This step only
 produces concrete syntax but does not change the program in
 essence; all conceptual transformations occur in the translation
 step.

 \end{itemize}

 \noindent From these steps, only the last two are carried out
 outside the logic; by keeping this layer as thin as possible, the
 amount of code to trust is kept to a minimum.
 



subsection The pre- and postprocessor \label{sec:preproc}

text 
 Before selected function theorems are turned into abstract code, a
 chain of definitional transformation steps is carried out:
 \emph{preprocessing}. The preprocessor consists of two
 components: a \emph{simpset} and \emph{function transformers}.

 The preprocessor simpset has a disparate brother, the
 \emph{postprocessor simpset}.
 In the theory-to-code scenario depicted in the picture
 above, it plays no role. But if generated code is used
 to evaluate expressions (cf.~\secref{sec:evaluation}), the
 postprocessor simpset is applied to the resulting expression before this
 is turned back.

 The pre- and postprocessor \emph{simpsets} can apply the
 full generality of the Isabelle
 simplifier. Due to the interpretation of theorems as code
 equations, rewrites are applied to the right hand side and the
 arguments of the left hand side of an equation, but never to the
 constant heading the left hand side.

 Pre- and postprocessor can be setup to transfer between
 expressions suitable for logical reasoning and expressions
 suitable for execution. As example, take list membership; logically
 it is expressed as termx set xs. But for execution
 the intermediate set is not desirable. Hence the following
 specification:
 


definition %quote member :: "'a list ==> 'a ==> bool"
where
  [code_abbrev]: "member xs x x set xs"

text 
 \noindent The \emph{@{attribute code_abbrev} attribute} declares
 its theorem a rewrite rule for the postprocessor and the symmetric
 of its theorem as rewrite rule for the preprocessor. Together,
 this has the effect that expressions @{thm (rhs) member_def [no_vars]}
 are replaced by @{thm (lhs) member_def [no_vars]} in generated code, but
 are turned back into @{thm (rhs) member_def [no_vars]} if generated
 code is used for evaluation.

 Rewrite rules for pre- or postprocessor may be
 declared independently using \emph{@{attribute code_unfold}}
 or \emph{@{attribute code_post}} respectively.

 \emph{Function transformers} provide a very general
 interface, transforming a list of function theorems to another list
 of function theorems, provided that neither the heading constant nor
 its type change. The term0::nat / constSuc pattern
 used in theory Code_Abstract_Nat (see \secref{abstract_nat})
 uses this interface.

 \noindent The current setup of the pre- and postprocessor may be inspected
 using the @{command_def print_codeproc} command. @{command_def
 code_thms} (see \secref{sec:equations}) provides a convenient
 mechanism to inspect the impact of a preprocessor setup on code
 equations. Attribute @{attribute code_preproc_trace} allows
 for low-level tracing:
 


declare %quote [[code_preproc_trace]]

declare %quote [[code_preproc_trace only: distinct remdups]]

declare %quote [[code_preproc_trace off]]


subsection Understanding code equations \label{sec:equations}

text 
 As told in \secref{sec:principle}, the notion of code equations is
 vital to code generation. Indeed most problems which occur in
 practice can be resolved by an inspection of the underlying code
 equations.

 It is possible to exchange the default code equations for constants
 by explicitly proving alternative ones:
 


lemma %quote [code]:
  "dequeue (AQueue xs []) =
     (if xs = [] then (None, AQueue [] [])
       else dequeue (AQueue [] (rev xs)))"
  "dequeue (AQueue xs (y # ys)) =
     (Some y, AQueue xs ys)"
  by (cases xs, simp_all) (cases "rev xs", simp_all)

text 
 \noindent The annotation [code] is an attribute
 which states that the given theorems should be considered as code
 equations for a fun statement -- the corresponding constant
 is determined syntactically. The resulting code:
 


text %quote 
 @{code_stmts dequeue constant: dequeue (Haskell)}
 


text 
 \noindent You may note that the equality test termxs = [] has
 been replaced by the predicate termList.null xs. This is due
 to the default setup of the \qn{preprocessor}.

 This possibility to select arbitrary code equations is the key
 technique for program and datatype refinement (see
 \secref{sec:refinement}).

 Due to the preprocessor, there is the distinction of raw code
 equations (before preprocessing) and code equations (after
 preprocessing).

 The first can be listed (among other data) using the @{command_def
 print_codesetup} command.

 The code equations after preprocessing are already are blueprint of
 the generated program and can be inspected using the @{command
 code_thms} command:
 


code_thms %quote dequeue

text 
 \noindent This prints a table with the code equations for constdequeue, including \emph{all} code equations those equations depend
 on recursively. These dependencies themselves can be visualized using
 the @{command_def code_deps} command.
 



subsection Equality

text 
 Implementation of equality deserves some attention. Here an example
 function involving polymorphic equality:
 


primrec %quote collect_duplicates :: "'a list ==> 'a list ==> 'a list ==> 'a list" where
  "collect_duplicates xs ys [] = xs"
"collect_duplicates xs ys (z#zs) = (if z set xs
    then if z set ys
      then collect_duplicates xs ys zs
      else collect_duplicates xs (z#ys) zs
    else collect_duplicates (z#xs) (z#ys) zs)"

text 
 \noindent During preprocessing, the membership test is rewritten,
 resulting in constList.member, which itself performs an explicit
 equality check, as can be seen in the corresponding SML code:
 


text %quote 
 @{code_stmts collect_duplicates (SML)}
 


text 
 \noindent Obviously, polymorphic equality is implemented the Haskell
 way using a type class. How is this achieved? HOL introduces an
 explicit class 🍋equal with a corresponding operation constHOL.equal such that @{thm equal [no_vars]}. The preprocessing
 framework does the rest by propagating the 🍋equal constraints
 through all dependent code equations. For datatypes, instances of
 🍋equal are implicitly derived when possible. For other types,
 you may instantiate equal manually like any other type class.
 



subsection Explicit partiality \label{sec:partiality}

text 
 Explicit partiality is caused by missing patterns
 (in contrast to partiality caused by nontermination, which is covered in Section~\ref{sec:partial}).
 Here is an example, again for amortised queues:
 


definition %quote strict_dequeue :: "'a queue ==> 'a × 'a queue" where
  "strict_dequeue q = (case dequeue q
    of (Some x, q') ==> (x, q'))"

lemma %quote strict_dequeue_AQueue [code]:
  "strict_dequeue (AQueue xs []) =
    (case rev xs of y # ys ==> (y, AQueue [] ys))"
  "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"
  by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split)

text 
 \noindent In the corresponding code, there is no equation
 for the pattern termAQueue [] []:
 


text %quote 
 @{code_stmts strict_dequeue constant: strict_dequeue (Haskell)}
 


text 
 \noindent In some cases it is desirable to state this
 pseudo-\qt{partiality} more explicitly, e.g.~as follows:
 


axiomatization %quote empty_queue :: 'a

definition %quote strict_dequeue' :: "'a queue ==> 'a × 'a queue" where
  "strict_dequeue' q = (case dequeue q of (Some x, q') ==> (x, q')
    | _ ==> empty_queue)"

lemma %quote strict_dequeue'_AQueue [code]:
  "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue
     else strict_dequeue' (AQueue [] (rev xs)))"
  "strict_dequeue' (AQueue xs (y # ys)) =
     (y, AQueue xs ys)"
  by (simp_all add: strict_dequeue'_def split: list.splits)

text 
 Observe that on the right hand side of the definition of conststrict_dequeue', the unspecified constant constempty_queue occurs.
 An attempt to generate code for conststrict_dequeue' would
 make the code generator complain that constempty_queue has
 no associated code equations. In most situations unimplemented
 constants indeed indicated a broken program; however such
 constants can also be thought of as function definitions which always fail,
 since there is never a successful pattern match on the left hand
 side. In order to categorise a constant into that category
 explicitly, use the @{attribute code} attribute with
 abort:
 


declare %quote [[code abort: empty_queue]]

text 
 \noindent Then the code generator will just insert an error or
 exception at the appropriate position:
 


text %quote 
 @{code_stmts strict_dequeue' constant: empty_queue strict_dequeue' (Haskell)}
 


text 
 \noindent This feature however is rarely needed in practice. Note
 that the HOL default setup already includes
 


declare %quote [[code abort: undefined]]

text 
 \noindent -- hence constundefined can always be used in such
 situations.
 



subsection If something goes utterly wrong \label{sec:utterly_wrong}

text 
 Under certain circumstances, the code generator fails to produce
 code entirely. To debug these, the following hints may prove
 helpful:

 \begin{description}

 \ditem{\emph{Check with a different target language}.} Sometimes
 the situation gets more clear if you switch to another target
 language; the code generated there might give some hints what
 prevents the code generator to produce code for the desired
 language.

 \ditem{\emph{Inspect code equations}.} Code equations are the central
 carrier of code generation. Most problems occurring while generating
 code can be traced to single equations which are printed as part of
 the error message. A closer inspection of those may offer the key
 for solving issues (cf.~\secref{sec:equations}).

 \ditem{\emph{Inspect preprocessor setup}.} The preprocessor might
 transform code equations unexpectedly; to understand an
 inspection of its setup is necessary (cf.~\secref{sec:preproc}).

 \ditem{\emph{Generate exceptions}.} If the code generator
 complains about missing code equations, in can be helpful to
 implement the offending constants as exceptions
 (cf.~\secref{sec:partiality}); this allows at least for a formal
 generation of code, whose inspection may then give clues what is
 wrong.

 \ditem{\emph{Remove offending code equations}.} If code
 generation is prevented by just a single equation, this can be
 removed (cf.~\secref{sec:equations}) to allow formal code
 generation, whose result in turn can be used to trace the
 problem. The most prominent case here are mismatches in type
 class signatures (\qt{wellsortedness error}).

 \end{description}
 


end

Messung V0.5 in Prozent
C=24 H=90 G=65

¤ Dauer der Verarbeitung: 0.11 Sekunden  (vorverarbeitet am  2026-06-09) ¤

*© Formatika GbR, Deutschland






Wurzel

Suchen

Beweissystem der NASA

Beweissystem Isabelle

NIST Cobol Testsuite

Cephes Mathematical Library

Wiener Entwicklungsmethode

Haftungshinweis

Die Informationen auf dieser Webseite wurden nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit, noch Qualität der bereit gestellten Informationen zugesichert.

Bemerkung:

Die farbliche Syntaxdarstellung und die Messung sind noch experimentell.