## *Fall 98, CSE 468: Lecture 4 (Sep 2)*

## Languages

#### Alphabet

An alphabet A is any set of symbols. Examples: {a,b,c},
the English alphabet {a,b,c,...,z}, the alphabet of digits
{0,1,2,...,9}, etc.
#### String

A finite sequence of symbols from the given alphabet.
Examples: lambda (the empty string), a, ab, aba, etc.
#### String concatenation

If x and y are strings, then xy is the string obtained
by concatenating x and y (first x then y). Example:
if x = ab and y = bc, then
xy = abbc.
#### Length of a string

The length of a string x, which we will denote by |x|, is the
number of symbols occurring in x, counting the repetitions of the same symbol.
Example: |abbc| = 4.
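
As a small illustration of the definitions so far, strings over an alphabet can be modelled with Python's built-in `str` type. This is only a sketch; the name `LAMBDA` is introduced here for the empty string and is not part of the notes.

```python
# Strings over the alphabet {a, b, c}, modelled as Python str values.
LAMBDA = ""   # the empty string, written lambda in the notes (name assumed here)
x = "ab"
y = "bc"

xy = x + y    # concatenation: first x, then y
print(xy)                              # abbc
print(len(xy))                         # 4: both occurrences of b are counted
print(len(LAMBDA))                     # 0
print(len(x + y) == len(x) + len(y))   # True
```

Concatenating with the empty string leaves a string unchanged, so `LAMBDA + x == x` holds for every `x`.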
#### Language

A language is any set of strings over the given alphabet. Example:
L = {lambda, a, ab, abcaa}.
### Operations on languages

#### Language concatenation

If L_{1} and L_{2} are languages, then we define

L_{1}L_{2} = {x_{1}x_{2} | x_{1} is in L_{1}, x_{2} is in L_{2}}
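
For finite languages, the definition of concatenation translates almost literally into a set comprehension. The following is a minimal sketch; the function name `concat` and the sample languages are assumptions chosen for illustration.

```python
def concat(l1, l2):
    """Return L1L2 = {x1x2 | x1 in L1, x2 in L2} for finite languages."""
    return {x1 + x2 for x1 in l1 for x2 in l2}

L1 = {"", "a", "ab"}   # "" plays the role of lambda
L2 = {"b", "bb"}

print(sorted(concat(L1, L2)))   # ['ab', 'abb', 'abbb', 'b', 'bb']
```

The result has five strings rather than six, because abb arises in two ways (a followed by bb, and ab followed by b) and a set keeps it only once.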

#### Exponential and Kleene's star

The exponential is defined inductively (or recursively) as follows:

L^{0} = {lambda}

L^{n+1} = L^{n}L

Note that concatenation is associative, hence we could
equivalently have defined L^{n+1} = LL^{n}.

The star is defined as follows:

L^{*} = the union of all L^{n} for n greater than or equal to 0

Sometimes we want to exclude L^{0} from this construction,
so we also define

L^{+} = the union of all L^{n} for n greater than or equal to 1

Note that the set of all strings over an alphabet A is A^{*}.
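
The exponential can be computed directly from its inductive definition. The star and plus are infinite unions in general, so the sketch below only approximates them by stopping at a chosen exponent. The function names and the bound `max_n` are assumptions made for this example.

```python
def concat(l1, l2):
    """L1L2 for finite languages."""
    return {x + y for x in l1 for y in l2}

def power(lang, n):
    """L^0 = {lambda}; L^(n+1) = L^n L."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result

def star_up_to(lang, max_n):
    """Union of L^n for 0 <= n <= max_n (a finite part of L*)."""
    result = set()
    for n in range(max_n + 1):
        result |= power(lang, n)
    return result

def plus_up_to(lang, max_n):
    """Union of L^n for 1 <= n <= max_n (a finite part of L+)."""
    result = set()
    for n in range(1, max_n + 1):
        result |= power(lang, n)
    return result

L = {"a", "bb"}
print(sorted(power(L, 2)))         # ['aa', 'abb', 'bba', 'bbbb']
print("" in star_up_to(L, 2))      # True
print("" in plus_up_to(L, 2))      # False, since lambda is not in L
```

Taking the alphabet itself as the language, `star_up_to({"a", "b"}, 2)` yields the 7 strings over {a,b} of length at most 2, matching the remark that A^{*} is the set of all strings over A.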
### Definition of languages

The first part of the course will focus
on the formal definition of languages (i.e. the formal specification of
a particular language as a particular set of strings), the
study of properties of a language, and the recognition of
the strings of a language.
One very general method for defining a language is to give an
inductive definition (also called recursive definition in Martin's book).

#### Inductive definition

Given a universe U, an inductive definition is a specification of
the form:

1. e_{1},...,e_{m} are in S
2. e is in S -> op_{1}(e) is in S,
   ...,
   e is in S -> op_{m}(e) is in S

where e_{1},...,e_{m} are elements of U and
op_{1},...,op_{m} are operations on U.
This definition can be extended in the obvious way to operations with
more than one argument.
The set S specified by this definition is *the smallest*
subset of U which satisfies points (1) and (2) above (sometimes this
qualification of "smallest" is added explicitly to the definition).
Note that there might be more than one set satisfying (1) and (2), hence it
is important to remember that we intend the smallest one.
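
One way to picture "the smallest set" is to start from the base elements and keep applying the operations until nothing new appears. The sketch below does this for unary operations. The name `inductive_closure` and the `keep` predicate are assumptions of this example: the bound is needed only because S may be infinite, and it is safe when no operation can lead from an excluded element back to an included one (as with the increasing operation used here).

```python
def inductive_closure(base, ops, keep):
    """Elements e of the inductively defined set S with keep(e) true.

    base: the elements e_1, ..., e_m
    ops:  the unary operations op_1, ..., op_m
    keep: a bound on the universe, so that the search terminates
    """
    s = set()
    frontier = [e for e in base if keep(e)]
    while frontier:
        e = frontier.pop()
        if e in s:
            continue
        s.add(e)
        for op in ops:
            new = op(e)
            if keep(new) and new not in s:
                frontier.append(new)
    return s

# Universe: the natural numbers. Rules: 0 is in S; n in S -> n + 2 in S.
evens = inductive_closure([0], [lambda n: n + 2], lambda n: n <= 10)
print(sorted(evens))   # [0, 2, 4, 6, 8, 10]
```

The set of all natural numbers also contains 0 and is closed under adding 2, so it satisfies the same two conditions. It is not the smallest such set, which is why the inductive definition picks out the even numbers only.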

#### Structural induction

Inductive definitions are associated with the important principle
of *structural induction*. Given S defined as above and a property P,
the principle says that in order to prove the formula

for all x in S. P(x)

it is sufficient to prove:
- Base case: P(e_{1}),...,P(e_{m}), and
- Inductive step: P(e) -> P(op_{1}(e)),..., P(e) -> P(op_{m}(e)).

The correctness of this principle can be explained as follows:
Let S' = {x in U | P(x)}. If we prove the base case and
the inductive step for P, then we have proved that S'
satisfies (1) and (2) above. But S is the smallest such set. Hence
S is contained in (or equal to) S'. Therefore for all x in S, P(x) holds.
#### Example

Consider the alphabet A = {a,b}, the universe A^{*}, and the
inductive specification of a language L as follows:
- a is in L
- x is in L -> axb is in L

Suppose we want to prove that for every x in L, the number of
a's in x is equal to the number of b's in x plus 1. It is easy to
prove this property by structural induction.
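
The proof follows the two rules: the base element a has one a and no b's, and the step from x to axb adds exactly one a and one b, so the difference stays 1. The following sketch checks the property on the first members of L. It is a sanity check rather than a proof, and the names `step`, `prop` and `members` are assumptions of this example.

```python
def step(x):
    """The operation of the inductive rule: x -> axb."""
    return "a" + x + "b"

def prop(x):
    """P(x): the number of a's in x equals the number of b's plus 1."""
    return x.count("a") == x.count("b") + 1

def members(max_steps):
    """The base element a, followed by max_steps applications of step."""
    x = "a"
    for _ in range(max_steps + 1):
        yield x
        x = step(x)

print(list(members(3)))                    # ['a', 'aab', 'aaabb', 'aaaabbb']
print(prop("a"))                           # True (base case)
print(all(prop(x) for x in members(50)))   # True
```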

**Exercise** Prove that the language L defined above is
equal to the language L' = {a^{n+1}b^{n} | n >= 0}.
Hint: prove that L is contained in L' by structural induction,
and that L' is contained in L by mathematical induction on n.
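
Before attempting the proof, the claim can be tested on short strings. The sketch below builds both languages up to a length bound and compares them. It does not replace the proof, and the function names and the bound `max_len` are assumptions of this example.

```python
def l_inductive(max_len):
    """Members of L of length <= max_len, built from the inductive rules."""
    result = set()
    x = "a"                    # rule 1: a is in L
    while len(x) <= max_len:
        result.add(x)
        x = "a" + x + "b"      # rule 2: x in L -> axb in L
    return result

def l_prime(max_len):
    """Members of L' = {a^(n+1) b^n | n >= 0} of length <= max_len."""
    result = set()
    n = 0
    while 2 * n + 1 <= max_len:
        result.add("a" * (n + 1) + "b" * n)
        n += 1
    return result

print(l_inductive(20) == l_prime(20))   # True
```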