Fall 2000, CSE 468: Lecture 5 (Sep 6)


The notions of Alphabet, string, and language

Alphabet

An alphabet A is any (finite) set of symbols. Examples: {a,b,c}, the English alphabet {a,b,c,...,z}, the alphabet of digits {0,1,2,...,9}, etc.

String

A string is a finite sequence of symbols of the given alphabet. Examples: lambda (the empty string), a, ab, aba, etc.

String concatenation

If x and y are strings, then xy is the string obtained by concatenating x and y (first x then y). Example: if x = ab and y = bc, then xy = abbc.

Length of a string

The length of a string x, which we will denote by |x|, is the number of symbols occurring in x, counting the repetitions of the same symbol. Example: |abbc| = 4.
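As a small illustration, the two examples above can be reproduced with Python strings. This is only a sketch: the name LAMBDA is our own choice for the empty string, and Python's + operator plays the role of juxtaposition.

```python
LAMBDA = ""  # the empty string, written lambda in these notes

x = "ab"
y = "bc"
xy = x + y  # concatenation: first x, then y

print(xy)           # abbc
print(len(xy))      # 4, repetitions of b are counted
print(len(LAMBDA))  # 0
```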

Language

A language is any set of strings on the given alphabet. Example: L = {lambda, a, ab, abcaa}.

Language concatenation

If L1 and L2 are languages, then we define
L1L2 = {x1x2 | x1 is in L1, x2 is in L2}
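For finite languages the definition can be tried out directly. The following is a minimal sketch, assuming languages are represented as Python sets of strings; the function name concat and the sample languages are ours.

```python
def concat(l1, l2):
    """Return L1L2 = {x1x2 | x1 is in L1, x2 is in L2}."""
    return {x1 + x2 for x1 in l1 for x2 in l2}


L1 = {"", "a"}    # "" is the empty string lambda
L2 = {"b", "ab"}

# The string ab arises twice (lambda followed by ab, and a followed by b),
# so the result has three elements rather than four.
print(sorted(concat(L1, L2)))  # ['aab', 'ab', 'b']
```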

Exponential and Kleene's star

The exponential is defined inductively (or recursively) in the following way
L^0 = {lambda}
L^{n+1} = L^n L
Note that the concatenation is associative, hence we could have equivalently defined L^{n+1} = L L^n. The star is defined as follows
L* = the union of all L^n for n greater than or equal to 0
Sometimes we want to exclude L^0 from this construction, therefore we define also
L+ = the union of all L^n for n greater than or equal to 1
Note that the set of all strings over an alphabet A is A*.
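The inductive definition translates almost literally into code. Since the star is in general an infinite union, the sketch below only computes the union of the first few powers. The names power, star_up_to and plus_up_to, and the bound max_n, are assumptions made for this illustration.

```python
def concat(l1, l2):
    """Language concatenation on sets of strings."""
    return {x + y for x in l1 for y in l2}


def power(lang, n):
    """n-th power of lang: start from {lambda} and concatenate lang n times."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result


def star_up_to(lang, max_n):
    """Union of the powers of lang for 0 <= n <= max_n (a finite part of the star)."""
    result = set()
    for n in range(max_n + 1):
        result |= power(lang, n)
    return result


def plus_up_to(lang, max_n):
    """Same union, but starting from n = 1."""
    result = set()
    for n in range(1, max_n + 1):
        result |= power(lang, n)
    return result


print(sorted(power({"a", "bb"}, 2)))      # ['aa', 'abb', 'bba', 'bbbb']
print(sorted(star_up_to({"a", "b"}, 2)))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```

With the alphabet {a,b} as the language, star_up_to returns exactly the strings of length at most max_n, in agreement with the remark that A* is the set of all strings over A.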

Definition of languages

The first part of the course will focus on the formal definition of languages (i.e. the formal specification of a particular language as a particular set of strings), the study of properties of a language, and the recognition of the strings of a language.

We will start with the class of regular languages, which are rather simple to define and to recognize, yet interesting in the sense that they can be infinite and can contain strings of arbitrary length.

Regular Expressions and Regular Languages

Regular expressions

The regular expressions are all those expressions that can be constructed from the empty string lambda and the symbols of the alphabet by means of the operators +, concatenation, and *. We will use parentheses to represent the structure of an expression, and we will assume that * has precedence over concatenation, concatenation has precedence over +, and that concatenation and + are associative.

Language represented by a regular expression

The empty string lambda stands for {lambda}, an alphabet symbol a stands for {a}, + stands for union, and concatenation and * stand for the homonymous operations on languages.
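The correspondence between expressions and languages can be sketched as a small recursive function. Everything below is an assumption made for illustration: expressions are encoded as nested tuples with the tags "lambda", "sym", "union", "concat" and "star", and, because a star may denote an infinite language, only the strings of length at most max_len are computed.

```python
def language(expr, max_len):
    """Strings of length <= max_len in the language represented by expr."""
    kind = expr[0]
    if kind == "lambda":                      # lambda stands for {lambda}
        return {""}
    if kind == "sym":                         # a stands for {a}
        return {expr[1]} if max_len >= 1 else set()
    if kind == "union":                       # + stands for union
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "concat":                      # language concatenation
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                        # Kleene's star, cut off at max_len
        base = language(expr[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:
            # Extend the newest strings by one more factor; keep only new ones.
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError("unknown expression: %r" % (expr,))


# ((a + b)(a + b))* in the tuple encoding
a_or_b = ("union", ("sym", "a"), ("sym", "b"))
even = ("star", ("concat", a_or_b, a_or_b))
print(sorted(language(even, 2)))  # ['', 'aa', 'ab', 'ba', 'bb']
```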

Example: The set of strings on {a,b} with even length can be represented by the regular expression (aa + ab + ba + bb)*, or equivalently by the regular expression ((a + b)(a + b))*.
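The claimed equivalence can be checked experimentally on short strings. The sketch below uses Python's re module, where union is written | instead of +, and fullmatch requires the whole string to match. The helper names and the length bound are ours, and a check up to a fixed length is of course not a proof.

```python
import re
from itertools import product

R1 = re.compile(r"(aa|ab|ba|bb)*")   # (aa + ab + ba + bb)*
R2 = re.compile(r"((a|b)(a|b))*")    # ((a + b)(a + b))*


def strings_up_to(alphabet, max_len):
    """Yield every string over alphabet of length 0, 1, ..., max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def agree_up_to(max_len):
    """True if both expressions accept exactly the even-length strings tested."""
    return all(
        (R1.fullmatch(s) is not None)
        == (R2.fullmatch(s) is not None)
        == (len(s) % 2 == 0)
        for s in strings_up_to("ab", max_len)
    )


print(agree_up_to(6))  # True
```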

Regular Languages

The class of Regular Languages consists of all (and only) the languages that can be represented by regular expressions.