## Languages

#### Alphabet

An alphabet A is any set of symbols. Examples: {a,b,c}, the English alphabet {a,b,c,...,z}, the alphabet of digits {0,1,2,...,9}, etc.

#### String

A finite sequence of symbols of the given alphabet. Examples: lambda (the empty string), a, ab, aba, etc.

#### String concatenation

If x and y are strings, then xy is the string obtained by concatenating x and y (first x then y). Example: if x = ab and y = bc, then xy = abbc.

#### Length of a string

The length of a string x, which we will denote by |x|, is the number of symbols occurring in x, counting the repetitions of the same symbol. Example: |abbc| = 4.

#### Language

Any set of strings over the given alphabet. Example: L = {lambda, a, ab, abcaa}.
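
The definitions above map closely onto built-in Python types. The following is a minimal sketch, assuming symbols are single characters, so that a string is an ordinary `str`, lambda is the empty `str`, and a finite language is a `set` of strings. The names `alphabet`, `empty`, `language` and `is_string_over` are chosen here for illustration only.

```python
# Assumption: symbols are single characters, so a string over the
# alphabet is an ordinary Python str and lambda is the empty str.
alphabet = {"a", "b", "c"}
empty = ""  # lambda, the empty string

x = "ab"
y = "bc"
xy = x + y  # concatenation: first x, then y


def is_string_over(s, alphabet):
    """Return True if every symbol of s belongs to the alphabet."""
    return all(symbol in alphabet for symbol in s)


# A (finite) language is a set of strings over the alphabet.
language = {empty, "a", "ab", "abcaa"}

print(xy)          # abbc
print(len(xy))     # 4, the two occurrences of b are both counted
print(len(empty))  # 0
print(all(is_string_over(s, alphabet) for s in language))  # True
```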

### Operations on languages

#### Language concatenation

If $L_1$ and $L_2$ are languages, then we define

$$L_1 L_2 = \{x_1 x_2 \mid x_1 \in L_1,\ x_2 \in L_2\}$$
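
For finite languages, the definition of concatenation can be sketched as a set comprehension. This assumes languages are represented as Python sets of `str`; the function name `concat` is hypothetical.

```python
def concat(l1, l2):
    """Return the concatenation of two finite languages:
    every string of l1 followed by every string of l2."""
    return {x1 + x2 for x1 in l1 for x2 in l2}


l1 = {"", "a"}
l2 = {"b", "ab"}

# "" + "ab" and "a" + "b" give the same string, so the result has
# three elements rather than four.
print(sorted(concat(l1, l2)))  # ['aab', 'ab', 'b']

# Concatenating with the empty language gives the empty language.
print(concat(l1, set()))  # set()
```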

#### Exponential and Kleene's star

The exponential is defined inductively (or recursively) in the following way:

$$L^0 = \{\lambda\}$$

$$L^{n+1} = L^n L$$

Note that the concatenation is associative, hence we could have equivalently defined $L^{n+1} = L L^n$. The star is defined as follows:

$$L^* = \bigcup_{n \ge 0} L^n$$

Sometimes we want to exclude $L^0$ from this construction, therefore we define also

$$L^+ = \bigcup_{n \ge 1} L^n$$

Note that the set of all strings over an alphabet A is A*.
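
The star of a nonempty language other than {lambda} is infinite, so a program can only build a finite part of it. The following sketch follows the inductive definition of the exponential and takes the union of the first few powers. It assumes finite languages stored as Python sets of `str`; the helper names `concat`, `power` and `star_up_to` are hypothetical.

```python
def concat(l1, l2):
    """Concatenation of two finite languages."""
    return {x + y for x in l1 for y in l2}


def power(lang, n):
    """Return the n-th power of lang: start from {lambda} and
    concatenate with lang on the right n times."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result


def star_up_to(lang, max_n):
    """Return the union of the powers 0, 1, ..., max_n of lang,
    which is only a finite part of the star."""
    result = set()
    for n in range(max_n + 1):
        result |= power(lang, n)
    return result


print(sorted(power({"a", "bb"}, 2)))  # ['aa', 'abb', 'bba', 'bbbb']

# All strings over {a, b} of length at most 2, shortest first.
part = star_up_to({"a", "b"}, 2)
print(sorted(part, key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Starting the loop in `star_up_to` at 1 instead of 0 would give the corresponding finite part of the plus construction.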

### Definition of languages

The first part of the course will focus on the formal definition of languages (i.e. the formal specification of a particular language as a particular set of strings), the study of properties of a language, and the recognition of the strings of a language.

One very general method for defining a language is to give an inductive definition (also called recursive definition in Martin's book).

#### Inductive definition

Given a universe U, an inductive definition of a set S is a specification of the form:
1. $e_1, \ldots, e_m$ are in S
2. e is in S -> $op_1(e)$ is in S,
   ...
   e is in S -> $op_m(e)$ is in S

where $e_1, \ldots, e_m$ are elements of U and $op_1, \ldots, op_m$ are operations on U. This definition can be extended in the obvious way to operations with more than one argument.

The set S specified by this definition is the smallest subset of U which satisfies points (1) and (2) above (sometimes this qualification of "smallest" is added explicitly to the definition). Note that there might be more than one set satisfying (1) and (2), hence it is important to remember that we intend the smallest one.
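
One way to picture the smallest set is to build it in stages: start from the base elements and repeatedly apply the operations. The sketch below does this for unary operations and a fixed number of rounds, since S is usually infinite. The function `generate` and the example (the natural numbers as universe, 0 as the only base element, adding 2 as the only operation) are assumptions made for illustration, not part of the definition above.

```python
def generate(base, operations, rounds):
    """Return the elements obtainable from the base elements by
    applying the given unary operations at most `rounds` times."""
    current = set(base)
    for _ in range(rounds):
        # Apply every operation to every element found so far.
        current |= {op(e) for e in current for op in operations}
    return current


# Base element 0, operation n -> n + 2.
stage = generate({0}, [lambda n: n + 2], 4)
print(sorted(stage))  # [0, 2, 4, 6, 8]
```

In this example the set of all natural numbers also contains 0 and is closed under adding 2, so it satisfies (1) and (2) as well. The inductively defined set is the smaller one, the even numbers, and the stages computed by `generate` only ever produce elements of it.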

#### Structural induction

Inductive definitions are associated with the important principle of structural induction. Given S defined as above and a property P, the principle says that, in order to prove the formula

for all x in S. P(x)

it is sufficient to prove

- Base case: $P(e_1), \ldots, P(e_m)$, and
- Inductive step: $P(e) \to P(op_1(e)), \ldots, P(e) \to P(op_m(e))$.

The correctness of this principle can be explained as follows: Let S' = {x in U | P(x)}. If we prove the base case and the inductive step for P, then we have proved that S' satisfies (1) and (2) above. But S is the smallest such set. Hence S is contained in (or equal to) S'. Therefore for all x in S, P(x) holds.

#### Example

Consider the alphabet A = {a,b}, the universe A*, and the inductive specification of a language L as follows:
1. a is in L
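2. x is in L -> axb is in L

Suppose we want to prove that for every x in L, the number of a's in x is equal to the number of b's in x plus 1. This follows by structural induction: the base string a contains one a and no b, and passing from x to axb adds exactly one a and one b, so the difference between the two counts is preserved.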

Exercise: Prove that the language L defined above is equal to the language $L' = \{a^{n+1} b^n \mid n \ge 0\}$. Hint: prove that L is contained in L' by structural induction, and that L' is contained in L by mathematical induction on n.
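
Before attempting the proofs, it can help to look at the first few strings that the two rules produce. The sketch below is only a finite sanity check under the stated representation (strings as Python `str`), not a proof and not a solution to the exercise. The function names `members_of_l` and `members_of_l_prime` are hypothetical.

```python
def members_of_l(rounds):
    """Strings obtained from the base string 'a' by applying the
    rule x -> axb at most `rounds` times, in order of construction."""
    result = ["a"]
    for _ in range(rounds):
        result.append("a" + result[-1] + "b")
    return result


def members_of_l_prime(max_n):
    """Strings made of n + 1 copies of a followed by n copies of b,
    for n = 0, ..., max_n."""
    return ["a" * (n + 1) + "b" * n for n in range(max_n + 1)]


strings = members_of_l(5)
print(strings[:3])  # ['a', 'aab', 'aaabb']

# The property from the example: one more a than b.
print(all(s.count("a") == s.count("b") + 1 for s in strings))  # True

# The two descriptions agree on these first six strings.
print(strings == members_of_l_prime(5))  # True
```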