Fall 98, CSE 468: Lectures

Fall 98, CSE 468: Lecture 4 (Sep 2)

Languages

Alphabet

An alphabet A is any set of symbols. Examples: {a,b,c}, the English alphabet {a,b,c,...,z}, the alphabet of digits {0,1,2,...,9}, etc.

String

A finite sequence of symbols of the given alphabet. Examples: lambda (the empty string), a, ab, aba, etc.

String concatenation

If x and y are strings, then xy is the string obtained by concatenating x and y (first x then y). Example: if x = ab and y = bc, then xy = abbc.

Length of a string

The length of a string x, which we will denote by |x|, is the number of symbols occurring in x, counting the repetitions of the same symbol. Example: |abbc| = 4.

Language

Any set of strings on the given alphabet. Example: L = {lambda, a, ab, abcaa}.

Operations on languages

Language concatenation

If L₁ and L₂ are languages, then we define

L₁L₂ = {x₁x₂ | x₁ is in L₁, x₂ is in L₂}
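As a rough illustration of this definition, finite languages can be modeled as Python sets of strings, with the empty string lambda written as "". The function name concat and the sample languages below are hypothetical choices made only for this sketch.

```python
def concat(l1, l2):
    """Return L1L2 = {x1x2 | x1 is in L1, x2 is in L2} for finite languages."""
    return {x1 + x2 for x1 in l1 for x2 in l2}

# Sample languages (assumed for illustration); "" stands for lambda.
L1 = {"", "a"}
L2 = {"b", "bc"}
result = concat(L1, L2)   # the four strings b, bc, ab, abc
```

Note that concatenating with the empty language gives the empty language, while concatenating with {lambda} leaves a language unchanged.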

Exponential and Kleene's star

The exponential is defined inductively (or recursively) in the following way

L⁰ = {lambda}
Lⁿ⁺¹ = LⁿL

Note that concatenation is associative, hence we could equivalently have defined Lⁿ⁺¹ = LLⁿ. The star is defined as follows:

L^* = the union of all Lⁿ for n greater than or equal to 0

Sometimes we want to exclude L⁰ from this construction, therefore we define also

L⁺ = the union of all Lⁿ for n greater than or equal to 1

Note that the set of all strings over an alphabet A is A^*.
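The inductive definition of the exponential translates almost directly into a loop. The sketch below again models finite languages as Python sets. Since L^* is usually infinite, it can only be approximated by taking the union of Lⁿ up to some bound; the names power, star_up_to and plus_up_to and the bound max_n are assumptions of this sketch, not notation from the course.

```python
def concat(l1, l2):
    """Language concatenation on finite sets of strings."""
    return {x + y for x in l1 for y in l2}

def power(lang, n):
    """Compute L^n from L^0 = {lambda} and L^(k+1) = L^k L."""
    result = {""}                      # L^0 = {lambda}
    for _ in range(n):
        result = concat(result, lang)  # L^(k+1) = L^k L
    return result

def star_up_to(lang, max_n):
    """Union of L^n for 0 <= n <= max_n: a finite approximation of L^*."""
    result = set()
    for n in range(max_n + 1):
        result |= power(lang, n)
    return result

def plus_up_to(lang, max_n):
    """Union of L^n for 1 <= n <= max_n: a finite approximation of L^+."""
    result = set()
    for n in range(1, max_n + 1):
        result |= power(lang, n)
    return result
```

For instance, with A = {"a", "b"}, star_up_to(A, 3) contains every string over A of length at most 3, which matches the remark that A^* is the set of all strings over A.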

Definition of languages

The first part of the course will focus on the formal definition of languages (i.e. the formal specification of a particular language as a particular set of strings), the study of properties of a language, and the recognition of the strings of a language.

One very general method for defining a language is to give an inductive definition (also called recursive definition in Martin's book).

Inductive definition

Given a universe U, an inductive definition is a specification of the form:

(1) e₁,...,e_m are in S
(2) e is in S -> op₁(e) is in S,
    ...
    e is in S -> op_m(e) is in S

where e₁,...,e_m are elements of U and op₁,...,op_m are operations on U. This definition can be extended in the obvious way to operations with more than one argument.

The set S specified by this definition is the smallest subset of U which satisfies points (1) and (2) above (sometimes this qualification of "smallest" is added explicitly to the definition). Note that there might be more than one set satisfying (1) and (2), hence it is important to remember that we intend the smallest one.
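One way to picture the "smallest" set is to build it from below: start with the base elements and keep applying the operations until nothing new appears. The sketch below does this for unary operations. Because S is often infinite, it takes a hypothetical predicate in_bound that cuts the universe down to a finite part; this is only faithful when the operations never lead back into the bounded part from outside it. The function name closure and the example (even numbers up to 10) are assumptions made for illustration.

```python
def closure(base, ops, in_bound):
    """Elements of the inductively defined set that satisfy in_bound.

    base     -- iterable of base elements
    ops      -- list of one-argument operations on the universe
    in_bound -- predicate selecting a finite part of the universe
    """
    s = set()
    frontier = {e for e in base if in_bound(e)}
    while frontier:
        s |= frontier
        # Apply every operation to every newly added element.
        produced = {op(e) for e in frontier for op in ops}
        frontier = {e for e in produced if in_bound(e)} - s
    return s

# Universe: the natural numbers. Base element 0, operation n -> n + 2.
evens = closure({0}, [lambda n: n + 2], lambda n: n <= 10)
```

In this example the set of all natural numbers also contains 0 and is closed under n -> n + 2, but it is not the smallest such set; the inductive definition picks out only the even numbers.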

Structural induction

Inductive definitions are associated with the important principle of structural induction. Given S defined as above and a property P, the induction principle says that

In order to prove the formula
for all x in S. P(x)
it is sufficient to prove

Base case: P(e₁),...,P(e_m), and
Inductive step: P(e) -> P(op₁(e)),..., P(e) -> P(op_m(e)).

The correctness of this principle can be explained as follows: Let S' = {x in U | P(x)}. If we prove the base case and the inductive step for P, then we have proved that S' satisfies (1) and (2) above. But S is the smallest such set. Hence S is contained in (or equal to) S'. Therefore for all x in S, P(x) holds.

Example

Consider the alphabet A = {a,b}, the universe A^*, and the inductive specification of a language L as follows:

a is in L
x is in L -> axb is in L

Suppose we want to prove that for every x in L, the number of a's in x is equal to the number of b's in x plus 1. It is easy to prove this property by structural induction.
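The proof can be sketched as follows. The notation n_a(x) and n_b(x), for the number of a's and b's in x, is introduced here only for this sketch and does not come from the course.

```latex
\begin{align*}
\textbf{Base case: } & n_a(a) = 1 = 0 + 1 = n_b(a) + 1. \\
\textbf{Inductive step: } & \text{assume } n_a(x) = n_b(x) + 1. \text{ Then} \\
& n_a(axb) = n_a(x) + 1 = (n_b(x) + 1) + 1 = n_b(axb) + 1,
\end{align*}
```

where the last equality uses n_b(axb) = n_b(x) + 1. Both cases of the definition of L preserve the property, so by structural induction it holds for every x in L.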

Exercise: Prove that the language L defined above is equal to the language L' = {aⁿ⁺¹bⁿ | n >= 0}. Hint: prove that L is contained in L' by structural induction, and that L' is contained in L by mathematical induction on n.
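Before attempting the proof, the claim can be sanity-checked on short strings. The sketch below enumerates both languages up to a length bound and compares them. The function names and the bound are hypothetical, and agreement on finitely many strings is of course not a proof.

```python
def L_up_to(max_len):
    """Strings of L of length <= max_len, built from the inductive definition."""
    out = set()
    x = "a"                    # base element: a is in L
    while len(x) <= max_len:
        out.add(x)
        x = "a" + x + "b"      # rule: x is in L -> axb is in L
    return out

def L_prime_up_to(max_len):
    """Strings a^(n+1) b^n of length <= max_len, for n >= 0."""
    out = set()
    n = 0
    while 2 * n + 1 <= max_len:
        out.add("a" * (n + 1) + "b" * n)
        n += 1
    return out

# Compare the two languages on all strings of length at most 25.
agree = L_up_to(25) == L_prime_up_to(25)
```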