
Fall 98, CSE 468: Lecture 34 (Dec 2)

The theorem of Rice-Shapiro

The theorem of Rice-Shapiro characterizes the semidecidable extensional properties of programs. "Extensional" properties are those that concern the input-output relation of the program. In contrast, an "intensional property" is a property of the code.

There is a very general formulation of the result of Rice-Shapiro in terms of effective domains, but we will see only the instance which is relevant for this course. We will not see the proof because it involves topological techniques that are beyond the scope of this course.

Theorem (Rice-Shapiro) Let p be a property of recursively enumerable languages, i.e. p: RE -> {true,false}. Let X = {e(M) | M is a TM and p(L(M)) holds}. Then X is RE iff there exists a recursively enumerable set of indexes I, and a sequence of finite sets indexed on I, {L_i | i in I} such that

{L in RE | p(L) holds} = union_{i in I} UC(L_i).

where UC(L) (the upward closure of L) is the set of all recursively enumerable supersets of L, i.e.

UC(L) = {L' in RE | L is a subset of L'}.
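
The "if" direction of the theorem can be made concrete: when p holds exactly on the languages that contain some L_i, the set X can be semidecided by dovetailing over the finite sets and over the number of simulation steps. The Python sketch below is only an illustration under assumptions that are not part of the lecture: a machine M is modeled by a hypothetical function machine(x, n) that tells whether M accepts x within n steps, the family {L_i | i in I} is modeled by a function returning an enumeration of finite sets, and the parameter max_rounds exists only so that the sketch can be stopped.

```python
from itertools import count, islice
from typing import Callable, FrozenSet, Iterator, Optional

# Assumed model of a TM M: machine(x, n) is True iff M accepts the
# string x within n steps (so it is monotone in n).
Machine = Callable[[str, int], bool]
# Assumed model of the family {L_i | i in I}: a function returning a
# fresh enumeration of finite sets of strings.
Family = Callable[[], Iterator[FrozenSet[str]]]


def semidecide(machine: Machine, family: Family,
               max_rounds: Optional[int] = None) -> Optional[bool]:
    """Return True if L(M) contains some L_i; otherwise run forever
    (or return None once the optional round budget is exhausted)."""
    n = 0
    while max_rounds is None or n < max_rounds:
        n += 1
        # Round n: try the first n finite sets, n steps per string.
        for finite_set in islice(family(), n):
            if all(machine(x, n) for x in finite_set):
                return True
    return None


def even_blocks() -> Iterator[FrozenSet[str]]:
    """Example family (hypothetical): L_k = {a^(2k), b^(2k)}, k = 1, 2, ..."""
    for k in count(1):
        yield frozenset({"a" * (2 * k), "b" * (2 * k)})


def only_long(x: str, n: int) -> bool:
    """Example machine (hypothetical): accepts strings of length >= 4,
    using len(x) steps."""
    return 4 <= len(x) <= n
```

Without a round budget, semidecide(only_long, even_blocks) answers "yes" after finitely many rounds, while on a machine whose language contains none of the L_i it never returns. This is the behaviour expected of a semidecision procedure.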

One part of this theorem is the so-called "Lemma of effective discontinuity" (it's called "lemma" because it is used to prove the theorem).

Lemma (Effective discontinuity) Let p: RE -> {true,false}. Let X = {e(M) | M is a TM and p(L(M)) holds}. If X is RE then

1. p is upward-closed, i.e. if p(L) holds, and L is a subset of L', then p(L') holds too.
2. p is finitely provable, i.e. if p(L) holds, then there is a finite subset L' of L such that p(L') holds too.

Again, we will not give a formal proof of this lemma, since it requires notions from topology. The intuitive explanation, however, is the following. Suppose that X is RE. Then there is a machine M_X that, given as input the encoding of another machine M, terminates with answer "yes" iff L(M) satisfies p. Since the decision to say "yes" is taken after a finite number of steps, it must be based only on a finite subset of L(M) (in finite time we can test only a finite number of strings). This justifies Point 2 in the lemma above. As for Point 1, note that if the machine says "yes" on M, then it must also say "yes" on any other machine M' whose language L(M') is a superset of L(M). Indeed, in general we have no way of knowing that a string does not belong to L(M) (M might loop on the strings that are not in L(M)). Hence the strings that are in L(M') but not in L(M) cannot change the decision of M_X to say "yes".

It should be clear that the Theorem of Rice (Lecture 36) is an immediate consequence of the theorem of Rice-Shapiro. In fact, if the set X defined above is recursive, then both X and the complement of X are RE. By the lemma of effective discontinuity, both p and the negation of p must then be upward-closed. We have two possibilities:

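p holds on the empty set: then, since every language is a superset of the empty set, p holds on every language (p is always true).
p does not hold on the empty set: then the negation of p holds on the empty set, and therefore on every language (p is always false).

In both cases p is trivial, which is what the Theorem of Rice states for a recursive X.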

Some consequences of the theorem of Rice-Shapiro

We give here a series of examples of sets which can be proved RE / not RE by using the theorem of Rice-Shapiro or the lemma of effective discontinuity.

L₁ = {e(M) | L(M) is finite} is not RE (contradicts Point 1 in the lemma)
L₂ = {e(M) | L(M) is infinite} is not RE (contradicts Point 2 in the lemma)
L₃ = {e(M) | L(M) contains x₀} (where x₀ is a given string) is RE. In fact, p holds exactly on the set UC({x₀}).
L₄ = {e(M) | L(M) does not contain x₀} is not RE (contradicts Point 1 in the lemma).
L₅ = {e(M) | L(M) is recursive} is not RE (contradicts Point 1 in the lemma, because the empty set is recursive, while some RE languages, all of which are supersets of it, are not).
L₆ = {e(M) | L(M) is context-free} is not RE (same reason as for L₅).
L₇ = {e(M) | L(M) is regular} is not RE (same reason as for L₅).
L₈ = {e(M) | L(M) is not recursive} is not RE (contradicts Point 2 in the lemma, because all the finite sets are recursive).
L₉ = {e(M) | L(M) is not context-free} is not RE (same reason as for L₈).
L₁₀ = {e(M) | L(M) is not regular} is not RE (same reason as for L₈).
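
Violations of Point 1 such as the one used for L₄ can be found mechanically among finite languages. The following sketch is a toy illustration, not a method from the lecture: the names subsets and upward_closure_counterexample, and the two-string universe, are hypothetical. It searches by brute force for a pair L, L' with L a subset of L', p(L) true and p(L') false. Since finite languages are RE, any pair it finds is a genuine counterexample. Finding none proves nothing, because only finitely many languages are inspected.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

Language = FrozenSet[str]
Property = Callable[[Language], bool]


def subsets(universe: Iterable[str]) -> List[Language]:
    """All subsets of a finite universe, smallest first."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def upward_closure_counterexample(
        p: Property, universe: Iterable[str]
) -> Optional[Tuple[Language, Language]]:
    """Return (L, L') with L a subset of L', p(L) true and p(L') false,
    or None if no such pair exists inside the given finite universe."""
    langs = subsets(universe)
    for small in langs:
        if not p(small):
            continue
        for big in langs:
            if small <= big and not p(big):
                return small, big
    return None


X0 = "a"  # plays the role of the fixed string x0


def contains_x0(lang: Language) -> bool:   # the property behind L3
    return X0 in lang


def avoids_x0(lang: Language) -> bool:     # the property behind L4
    return X0 not in lang
```

On the universe {"a", "b"}, the search finds the pair (empty set, {"a"}) for avoids_x0 and finds no pair for contains_x0, in agreement with the claims for L₄ and L₃.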

We have formulated these results in terms of RE/not RE, but we could have formulated them in terms of semidecidability/non-semidecidability. For instance, the cases of L₆ and L₉ can be reformulated as follows:

In general it is not possible to semidecide whether a given language is context-free or not. In other words, there exists no general method to construct a CF grammar for every CF language, and there exists no general method able to prove that a language is not CF for every non-CF language.

The above negative result depends critically, of course, on the fact that we allow here the most general kind of definitions for languages (Turing machines). If we fixed the format of the specification (for instance, if we allowed only certain kinds of recursive definitions), then the problem "is L CF?" might become semidecidable or even decidable.

Some consequences of the theorem of Rice-Shapiro for programming languages

In the previous section we have seen several negative results about the possibility of semideciding (and hence deciding) properties of TMs. By Church's thesis, the same kind of results apply to any programming language, including for instance C, C++, Java, Pascal, ML, Prolog etc.

It should be remarked that these results regard "extensional properties" of programs (i.e. properties of the input-output relation computed by a program), and not "intensional properties" (i.e. properties of the code). The latter are in general decidable.

Let us consider in detail two main negative results for programming languages related to the theorem of Rice-Shapiro. In the following, we assume the programming language to be fixed, for instance C.

Termination. The problem "given a program P, does P terminate on every input?" is not semidecidable. In fact, this is equivalent to saying that the language {e(P) | L(P) = Sigma^*} is not RE. (The proof is left as an exercise.)
The problem "given a program P, does P terminate on input x₀?" (where x₀ is a given string) is semidecidable, but not decidable (proof: from the results for L₃ and L₄ above). This is the so-called "halting problem".
Correctness. The problem "given a program P, does P compute the relation r₀?" (where r₀ - the specification - is a given relation on strings) is not semidecidable in general. In fact, this is equivalent to saying that the language {e(P) | {x#y | x,y in Sigma^* and x r₀ y} is a subset of L(P)} is not RE. The symbol "#" here is a special symbol not contained in Sigma which serves to separate the input from the output. The proof is left as an exercise.
The problem becomes semidecidable (but not decidable) if r₀ is finite, i.e. if the correctness has to be tested only on a finite number of inputs.
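
The finite case can be sketched as follows: run P on every input mentioned in the specification, interleaving the computations one step at a time, and answer "yes" once all of them have produced the required output. The Python code below is a minimal sketch under assumptions that are not in the lecture: programs are deterministic and are modeled as hypothetical generator functions that yield once per computation step and return their output, the finite specification r₀ is a dictionary from inputs to expected outputs, and max_steps exists only so that the sketch can be stopped.

```python
from typing import Callable, Dict, Generator, Optional

# Assumed model of a program P: P(x) is a generator that yields once
# per computation step and returns the output when (and if) it halts.
Program = Callable[[str], Generator[None, None, str]]


def semicheck(program: Program, spec: Dict[str, str],
              max_steps: Optional[int] = None) -> Optional[bool]:
    """Return True once P has produced spec[x] on every input x of the
    finite specification. Without a budget this may run forever; with
    a budget it returns None when the budget is exhausted."""
    pending = {x: program(x) for x in spec}
    steps = 0
    while pending:
        if max_steps is not None and steps >= max_steps:
            return None
        steps += 1
        for x in list(pending):
            try:
                next(pending[x])          # one more step of P on x
            except StopIteration as stop:
                if stop.value != spec[x]:
                    # P is deterministic here, so a wrong output is final.
                    return False
                del pending[x]
    return True


def reverse(x: str) -> Generator[None, None, str]:
    """Example program (hypothetical): reverses its input."""
    out = ""
    for ch in x:
        out = ch + out
        yield
    return out


def loops_on_b(x: str) -> Generator[None, None, str]:
    """Example program (hypothetical): never halts on inputs starting with b."""
    while x.startswith("b"):
        yield
    return x
```

On reverse with the specification {"ab": "ba", "": ""} the check answers "yes" after a few steps. On loops_on_b with {"b": "b"} it would run forever without a budget, which is why the procedure only semidecides correctness.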

These negative results should not, of course, discourage us from trying to write programs that are correct and terminating. Nor should they discourage us from trying to develop formal methods for proving correctness and termination. They only state that, however refined these methods may be, they will never be complete, i.e. they will never be guaranteed to prove the correctness of all correct programs, nor the termination of all terminating programs. In other words, for any given method there will always be a correct program that cannot be proved correct with it, and a terminating program whose termination (on all inputs) cannot be proved.