Fall 98, CSE 468:
Lecture 4 (Sep 2)
Languages
Alphabet
An alphabet A is any set of symbols. Examples: {a,b,c},
the English alphabet {a,b,c,...,z}, the alphabet of digits
{0,1,2,...,9}, etc.
String
A finite sequence of symbols of the given alphabet.
Examples: lambda (the empty string), a, ab, aba, etc.
String concatenation
If x and y are strings, then xy is the string obtained
by concatenating x and y (first x then y). Example:
if x = ab and y = bc, then
xy = abbc.
Length of a string
The length of a string x, which we will denote by |x|, is the
number of symbols occurring in x, counting the repetitions of the same symbol.
Example: |abbc| = 4.
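The definitions of string, concatenation and length can be tried out directly. The sketch below assumes strings are modelled as Python str values and the empty string lambda as ""; the names LAMBDA, concat and length are illustrative and do not come from the notes.

```python
# Strings over an alphabet, modelled here as Python str values (an assumption).
LAMBDA = ""            # the empty string (lambda in the notes)

def concat(x: str, y: str) -> str:
    """Return xy: the symbols of x followed by the symbols of y."""
    return x + y

def length(x: str) -> int:
    """Return |x|, counting every occurrence of every symbol."""
    return len(x)

x, y = "ab", "bc"
xy = concat(x, y)
print(xy, length(xy))          # abbc 4
print(length(LAMBDA))          # 0
print(concat(x, LAMBDA) == x)  # True
```

The last line illustrates that concatenating with lambda leaves a string unchanged.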
Language
Any set of strings over the given alphabet. Example:
L = {lambda, a, ab, abcaa}.
Operations on languages
Language concatenation
If L_1 and L_2 are languages, then we define
L_1 L_2 = {x_1 x_2 | x_1 is in L_1, x_2 is in L_2}
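For finite languages, the concatenation defined above can be computed by pairing every string of the first language with every string of the second. This is a minimal sketch, assuming languages are represented as Python sets of str; the name concat_languages is hypothetical.

```python
def concat_languages(l1: set, l2: set) -> set:
    """Return every string x1 + x2 with x1 in l1 and x2 in l2."""
    return {x1 + x2 for x1 in l1 for x2 in l2}

first = {"a", "ab"}
second = {"", "b"}          # "" stands for lambda
result = concat_languages(first, second)
print(sorted(result))       # ['a', 'ab', 'abb']
```

The four pairs yield only three strings, because a followed by b and ab followed by lambda are the same string.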
Exponential and Kleene's star
The exponential is defined inductively (or recursively) as follows:
L^0 = {lambda}
L^(n+1) = L^n L
Note that concatenation is associative, hence we could equivalently have
defined L^(n+1) = L L^n.
The star is defined as follows:
L* = the union of all L^n for n greater than or equal to 0
Sometimes we want to exclude L^0 from this construction,
so we also define
L+ = the union of all L^n for n greater than or equal to 1
Note that the set of all strings over an alphabet A is A*.
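The exponential can be computed for a finite language by repeated concatenation. The star is in general an infinite set, so the sketch below only builds the finite union of the powers up to a chosen bound k. All function names are assumptions made for illustration, and languages are again modelled as Python sets of str.

```python
def concat_languages(l1, l2):
    """Language concatenation on finite sets of strings."""
    return {x1 + x2 for x1 in l1 for x2 in l2}

def power(lang, n):
    """n-th power of lang: start from {lambda}, concatenate with lang n times."""
    result = {""}
    for _ in range(n):
        result = concat_languages(result, lang)
    return result

def star_up_to(lang, k):
    """Union of the powers 0..k: a finite approximation of the star."""
    out = set()
    for n in range(k + 1):
        out |= power(lang, n)
    return out

def plus_up_to(lang, k):
    """Union of the powers 1..k: a finite approximation of the plus."""
    out = set()
    for n in range(1, k + 1):
        out |= power(lang, n)
    return out

ab = {"a", "b"}
print(sorted(power(ab, 2)))            # ['aa', 'ab', 'ba', 'bb']
print(sorted(star_up_to({"ab"}, 2)))   # ['', 'ab', 'abab']
print("" in plus_up_to({"ab"}, 2))     # False
```

With the alphabet {a,b} taken as a language of one-symbol strings, star_up_to(ab, k) is exactly the set of strings of length at most k.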
Definition of languages
The first part of the course will focus
on the formal definition of languages (i.e. the formal specification of
a particular language as a particular set of strings), the
study of properties of a language, and the recognition of
the strings of a language.
One very general method for defining a language is to give an
inductive definition (also called recursive definition in Martin's book).
Inductive definition
Given a universe U, an inductive definition of a set S is a specification of
the form:
(1) e1,...,em are in S
(2) e is in S -> op1(e) is in S,
...
e is in S -> opm(e) is in S
where e1,...,em are elements of U and
op1,...,opm are operations on U.
This definition can be extended in the obvious way to operations with
more than one argument.
The set S specified by this definition is the smallest
subset of U which satisfies points (1) and (2) above (sometimes this
qualification of "smallest" is added explicitly to the definition).
Note that there might be more than one set satisfying (1) and (2), hence it
is important to remember that we intend the smallest one.
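One way to picture the smallest set is to start from the base elements and keep applying the operations until nothing new appears. The sketch below does this with a worklist. Because S may be infinite, it assumes a caller-supplied predicate in_universe that cuts the universe down to a finite part; this is adequate only when the operations never lead back from outside the cut-off to inside it. The function name and its parameters are hypothetical.

```python
def inductive_closure(base, ops, in_universe):
    """Smallest set containing `base` and closed under every op in `ops`,
    restricted to the elements accepted by `in_universe`."""
    s = set()
    frontier = [e for e in base if in_universe(e)]
    while frontier:
        e = frontier.pop()
        if e in s:
            continue
        s.add(e)
        for op in ops:
            new = op(e)
            if in_universe(new) and new not in s:
                frontier.append(new)
    return s

# Universe: natural numbers below 20. Base element 0, one operation n -> n + 2.
evens = inductive_closure({0}, [lambda n: n + 2], lambda n: 0 <= n < 20)
print(sorted(evens))   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The set of all natural numbers also contains 0 and is closed under n -> n + 2, so it satisfies both clauses as well; the definition picks out the even numbers only because it asks for the smallest such set.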
Structural induction
Inductive definitions are associated with the important principle
of structural induction. Given S defined as above and a property P,
the induction principle says the following.
In order to prove the formula
for all x in S. P(x)
it is sufficient to prove
- Base case: P(e1),...,P(em), and
- Inductive step: P(e) -> P(op1(e)),...,
P(e) -> P(opm(e)).
The correctness of this principle can be explained as follows:
Let S' = {x in U | P(x)}. If we prove the base case and
the inductive step for P, then we have proved that S'
satisfies (1) and (2) above. But S is the smallest such set. Hence
S is contained in (or equal to) S'. Therefore for all x in S, P(x) holds.
Example
Consider the alphabet A = {a,b}, the universe A*, and the
inductive specification of a language L as follows:
- a is in L
- x is in L -> axb is in L
Suppose we want to prove that for every x in L, the number of
a's in x is equal to the number of b's in x plus 1. It is easy to
prove this property by structural induction.
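The proof follows the two clauses of the definition. Base case: the string a has one a and no b. Inductive step: passing from x to axb adds exactly one a and one b, so the difference between the two counts is preserved. As a sanity check (not a substitute for the proof), the sketch below generates the first few strings of L and tests the property; the name generate_L is an assumption.

```python
def generate_L(steps):
    """Strings of L obtained from 'a' with at most `steps` applications
    of the rule x -> axb, in the order they are produced."""
    x = "a"
    out = [x]
    for _ in range(steps):
        x = "a" + x + "b"
        out.append(x)
    return out

print(generate_L(2))   # ['a', 'aab', 'aaabb']
print(all(s.count("a") == s.count("b") + 1 for s in generate_L(10)))   # True
```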
Exercise: Prove that the language L defined above is
equal to the language L' = {a^(n+1) b^n | n >= 0}.
Hint: prove that L is contained in L' by structural induction,
and that L' is contained in L by mathematical induction on n.
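Before attempting the proof, the claim can be tested on short strings. The sketch below assumes two hypothetical membership tests: in_L reads the inductive definition backwards (a string is in L if it is a, or if it has the form axb with x in L), and in_L_prime checks the closed form directly. Agreement on all strings up to some length is only evidence and does not replace the two inductions.

```python
from itertools import product

def in_L(s):
    """Membership in L, following the inductive definition backwards."""
    if s == "a":
        return True
    return len(s) >= 3 and s[0] == "a" and s[-1] == "b" and in_L(s[1:-1])

def in_L_prime(s):
    """Membership in L': s consists of n+1 a's followed by n b's."""
    if len(s) % 2 == 0:
        return False
    n = (len(s) - 1) // 2
    return s == "a" * (n + 1) + "b" * n

def agree_up_to(max_len):
    """True if both tests agree on every string over {a,b} of length <= max_len."""
    for k in range(max_len + 1):
        for symbols in product("ab", repeat=k):
            s = "".join(symbols)
            if in_L(s) != in_L_prime(s):
                return False
    return True

print(agree_up_to(8))   # True
```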