### by Robert Boyer and J. Strother Moore

There have been two main goals in the development of automated theorem provers. The first was the development of complete procedures for uncovering proofs in various logical systems. Completeness is, of course, a very important and interesting property of a mechanical procedure since such a procedure's structure and properties can be used to uncover very deep aspects of the logic it implements. A second goal, however, was to develop computer systems that could be used by mathematicians developing proofs and by computer scientists studying the formal properties of programs. While many researchers would motivate their theorem proving efforts by mentioning both of these goals, it was often hard to believe their sincerity in the latter, more practical goal since they frequently would not supply convincing and ``real'' examples of their methods or would conclude their papers with decision procedures that would require the Avogadro number of seconds to discover simple proofs.

Even from the early days of mechanical theorem proving, it was clear that complete procedures were often not useful procedures and vice versa (of course, ``useful'' is not a technical term like ``completeness'' so there is room for debating this conclusion). During the past 15 years, Boyer and Moore have been approaching theorem proving by placing the ``sound and useful'' criterion in front of the ``sound and complete'' criterion. Their sincere orientation to utility was manifest in several ways: they gave up on finding only complete procedures (although a few decision procedures are employed in their system); they reasoned that induction must be incorporated since most of the interesting structures in computer science (and many in mathematics) are constructed recursively; they developed a computer system that incorporated their ideas so they could run experiments and see how useful those ideas were; and finally, they put a great deal of effort into developing by themselves and often with the help of students and colleagues a large number of interesting and significant examples. The result of their efforts is one of the most mature and useful theorem proving projects to-date. The two books under review, the first published 10 years ago, provides a complete picture of their ideas, the computer system that implements them, and examples of their use.

Chapters I -- III of the first text provides an informal and then formal presentation of the logic underlying their system. This logic is a first-order logic based on the rules of propositional calculus, equality reasoning, the rules of instantiation and induction, and three methods to add new formulas to the logic: axioms, shell definitions (the introduction of a new concrete data type and axioms of concerning its behavior), and recursive definitions satisfying certain admissibility conditions (checked by invoking the theorem prover). The authors suggest that new concepts be explained to their system by explicit definitions instead of axioms. Although providing explicit definitions is more difficult, in general, than providing axioms, the consistency of the theorem prover is preserved by the introduction of well-founded definitions while the additions of axioms may introduce inconsistences. The authors are very careful to argue that the simple and low-level axioms supplied by the system's initial environment are consistent, that accepted definitions preserve consistency, and that any formula proclaimed to be a theorem by their various procedures and heuristics is indeed a logical consequence of the axioms of the system.

Chapters IV and V provide examples of how theorems can be stated and proved. Chapters VI -- XV provide the details of the five processes that are used to reduce a given goal (a conjecture) to subgoals until either all subgoals have been removed (the conjecture is proved) or a failure has been found. In the latter case, the prover draws no conclusion about the status of the conjecture.

Chapter XVI is interesting since 42 out of its 43 pages were written by the theorem prover. Those pages present the result of its successful verification of a sequence of definitions and lemmas that leads to the theorem: for every pair of integers x =< 0 and y > 0, there exists non-negative integers i and j such that j < y and x = i * y + j. This chapter illustrates a couple important aspects of the how this theorem prover is used. First, the definition of integers and of plus and times must be entered. Second, since only outermost universal quantification is allowed, the existential quantifiers in the theorem proposed above must be removed. These quantifiers can be instantiated by the explicitly defined (Skolem) functions for quotient and remainder and then showing that x= quotient(x,y) * y + remainder(x,y). It is only this very explicit statement of the theorem that can be verified by the prover. Finally, the prover must be ``programmed'' via a proof outline. If the prover is given our proposed theorem, it will quickly fail since no immediate application of induction will yield a proof: many simple facts need to be entered first. For example, the associativity of both plus and times is needed (as are many other lemmas) to help in rewriting a possible inductive hypothesis into a conclusion. Such simpler lemmas need to be first entered, verified, and committed to the database of theorems so that they can be accessed by subsequent attempts at proving new conjectures. Once enough such lemmas are established, our proposed theorem will be provable by some inductive argument: the lemmas accumulated to this point will make the system's rewriting simplifier and inductive hypothesis generator able to incorporate new facts that were not available on its first attempt. A common mode of using the system on low-level theorems is to watch it fail on attempting a proof, to try to infer from the printed transcript of that attempt what information was lacking, to formalize that missing information as a lemma, and then to prove and accumulate that lemma. Finally, this process is repeated until the prover no longer fails. On deep theorems it is important for the user to have a proof outline that breaks the desired proof into smaller pieces so that the preceding explorative method can be used on the smaller pieces. Thus on deep theorems, the prover is really acting as a proof checker: it is merely filling in the details to the given outline. The activity of supplying lemmas is the way one programs this theorem prover. Fortunately, this programming can be done entirely declaratively by supplying formulas of the logic and not by coding directly in the implementation language. For such programming efforts to be successful, it is important that the user be aware of how the prover's automatic processes work and what they look for in the database of theorems. Chapters VI -- XV provide the details of that processing. While the skills one has developed from writing inductive proofs by hand can be used here, more involved and ``hacky'' programming skills are also required in various verification efforts.

Chapter XVII shows how correctness of an optimizing compiler can be established and Chapter XVIII verifies a fast search searching algorithm. Chapter XIX describes the formulation and verification of the unique prime factorization theorem. Appendix A lists 286 theorems and 110 definitions that the prover was able to accept and verify (back in 1978). Appendixes B and C contain all the axioms that were built into the prover.

The Handbook is a continuation of this first book. Chapters 1 -- 5 address the logic of the most recent version of their prover. The extensions to the logic involve adding a quote facility so that the logic now contains names for all of its own terms and adding axioms describing a partial function for evaluating terms of the logic. With these extensions, metatheoretic aspects of the language can be explored and exploited. For example, if a proof procedure is shown to be sound by the prover, it can then be incorporated into the prover to increase its power. This extension also permitted simple forms of bounded quantification and higher-order programming in the logic.

The rest of the book, Chapters 6 -- 14 and Appendixes I -- III, contain details about the prover as a collection of Common Lisp functions. Information on how to obtain the source files, install a working system on several kinds of computer systems, manage the database of theorems and definitions, and how to start, stop, trace, and interact with the prover. Several examples that show how the various features of the computer system can be used as well as several hints and suggestions on how to organize proofs and formalize problems are presented.

The second part of this text, starting with Chapter 6, is intended mostly for people familiar with basic Lisp programming and who are interested in using this prover. While the first five chapters focus more on the underlying logic, a background in Lisp programming is still useful, although not necessary, for appreciating the subtleties and difficulties of providing a usable encoding of some of the logic into itself. Lisp has always been able to manipulate its own expressions: Chapters 2, 3, and 4 provide a first-order formalization of certain aspects of the Lisp approach to such meta-manipulations.

The Boyer and Moore theorem prover has been used by numerous people during the past 15 years. The list of theorems that have been successfully verified using this system is impressive and is outlined in the introduction to the Handbook. In the area of elementary number theory, it has been used to verify Wilson's theorem and the law of quadratic reciprocity; in the area of metamathematics, the halting problem for pure Lisp, the Church-Rosser theorem, and Goedel's incompleteness theorem. In computing systems and languages, the prover has been used to help verify formal properties of various communication protocols, concurrent programs, and real time control systems. A small processor, an assembly language, a simple operating system, and a high-level programming language have also been formally analyzed and the relationships between these components have been investigated.

There are a few things, however, I would wished to have seen discussed with respect to the Boyer and Moore theorem prover. First, there is very little discussion about the relationship their efforts have with those in the literature. Their references are appropriate for what they discuss, but there is not a chapter or section on related work.

Second, while it is clearly important to know that a given formula is a theorem, in many settings it is also important to know why a formula is a theorem. That is, a proof of the theorem should be available to examine and learn from. The English language transcript of the theorem prover's attempt to establish a conjecture provides the key to a proof, although it is in a very operational and non-declarative form. It is easy to imagine modifications of their prover so that it would build completed proofs (in the sense used by logicians) and the inclusion of software for exploring and presenting the details of such a proof.

Finally, and returning to the issue of completeness, it would be useful to know on which syntactically characterized subsets of formulas are the various automatic processing components complete. Similarly, it would be good to know under what circumstances the system's heuristics behave monotonically with respect to a growing database of theorems. In other words, if providing proof outlines is a kind of programming, what formal properties does such programming satisfy? While partial answers to these questions are possible, no general answer seems to be available since using any proof system on serious problems is an art that cannot easily be formalized. Recently, after listening to a beautiful organ recital, Boyer was moved to say: ``The next time someone asks me how long it takes to be proficient at using our theorem prover, I'll tell them `just as long as it takes someone to be proficient at the organ.'.'' Indeed, after considering the evidence in these two books, it seems clear that the best theorem provers of the future will require many years of study and practice in order to be used to verify deep pieces of mathematics and large computer systems. The demand for human interaction illustrated by the Boyer and Moore prover, although not part of the dream of the early theorem proving researchers, does seem inevitable.