Fall 2000, CSE 468:
Lecture 5 (Sep 6)
The notions of alphabet, string, and language
Alphabet
An alphabet A is any (finite) set of symbols. Examples: {a,b,c},
the English alphabet {a,b,c,...,z}, the alphabet of digits
{0,1,2,...,9}, etc.
String
A string is a finite sequence of symbols of the given alphabet.
Examples: lambda (the empty string), a, ab, aba, etc.
String concatenation
If x and y are strings, then xy is the string obtained
by concatenating x and y (first x then y). Example:
if x = ab and y = bc, then
xy = abbc.
Length of a string
The length of a string x, which we will denote by |x|, is the
number of symbols occurring in x, counting the repetitions of the same symbol.
Example: |abbc| = 4.
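The definitions of string, concatenation, and length can be tried out directly. The following is a minimal sketch, assuming that strings over the alphabet are modelled as Python str values and that lambda is modelled as the empty str (the name LAMBDA is chosen here for illustration only).

```python
# Strings over the alphabet {a, b, c}, modelled as Python str values.
LAMBDA = ""          # the empty string (lambda in the notes); name is illustrative

x = "ab"
y = "bc"
xy = x + y           # concatenation: first x, then y

print(xy)            # abbc
print(len(xy))       # 4: the two occurrences of b are both counted
print(len(LAMBDA))   # 0

# Concatenating with lambda changes nothing, and lengths add up.
print(x + LAMBDA == x and LAMBDA + x == x)   # True
print(len(xy) == len(x) + len(y))            # True
```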
Language
A language is any set of strings on the given alphabet. Example:
L = {lambda, a, ab, abcaa}.
Language concatenation
If L1 and L2 are languages, then we define
L1L2 = {x1x2 | x1 is in L1, x2 is in L2}
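For finite languages the definition of concatenation can be computed literally. The sketch below assumes that a language is represented as a Python set of str values; the function name concat is an illustrative choice, not part of the notes.

```python
def concat(l1, l2):
    """Return L1L2 = {x1x2 | x1 in L1, x2 in L2} for finite languages."""
    return {x1 + x2 for x1 in l1 for x2 in l2}

L1 = {"a", "ab"}
L2 = {"b", "bb"}

# The pairs (a, bb) and (ab, b) both give abb, so the result has
# three strings rather than four.
print(sorted(concat(L1, L2)))   # ['ab', 'abb', 'abbb']
```

Since the result is a set, a string that can be obtained from several pairs (x1, x2) is still counted only once.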
Exponential and Kleene's star
The exponential is defined inductively (or recursively) in the following way:
L^0 = {lambda}
L^(n+1) = L^n L
Note that concatenation is associative, hence we could have
equivalently defined L^(n+1) = L L^n.
The star is defined as follows:
L* = the union of all L^n for n greater than or equal to 0
Sometimes we want to exclude L^0 from this construction,
therefore we also define
L+ = the union of all L^n for n greater than or equal to 1
Note that the set of all strings over an alphabet A is A*.
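The inductive definition of the exponential translates into a short loop. Since L* is in general infinite, the sketch below only computes a finite approximation: the union of the powers of L up to a chosen bound. It assumes finite languages represented as Python sets of str; the names concat, power, and star_up_to are illustrative.

```python
def concat(l1, l2):
    """Language concatenation for finite languages."""
    return {x + y for x in l1 for y in l2}

def power(lang, n):
    """n-th power of lang: start from {lambda}, then concatenate lang n times."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result

def star_up_to(lang, max_power):
    """Finite approximation of the star: union of the powers 0..max_power."""
    result = set()
    for n in range(max_power + 1):
        result |= power(lang, n)
    return result

L = {"a", "bb"}
print(sorted(power(L, 0)))   # ['']
print(sorted(power(L, 2)))   # ['aa', 'abb', 'bba', 'bbbb']

# For the alphabet A = {a, b}, the powers 0..2 give every string of length <= 2.
A = {"a", "b"}
print(sorted(star_up_to(A, 2)))   # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```

Dropping the power 0 from the union gives the corresponding approximation of L+. Note that lambda still belongs to L+ whenever lambda is in L.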
Definition of languages
The first part of the course will focus
on the formal definition of languages (i.e. the formal specification of
a particular language as a particular set of strings), the
study of properties of a language, and the recognition of
the strings of a language.
We will start with the class of regular languages, which are rather simple
from the point of view of definition and recognition,
yet interesting, in the sense that they can be infinite
and can contain strings of arbitrary length.
Regular Expressions and Regular Languages
Regular expressions
The regular expressions are all those expressions that can
be constructed from
- lambda
- symbols of the alphabet
- + (binary)
- concatenation (binary)
- * (Kleene's star, unary)
We will use parentheses to represent the structure
of an expression, and we will assume that * has precedence
over concatenation, concatenation has precedence over +, and
that concatenation and + are associative.
Language represented by a regular expression
The symbol lambda stands for {lambda}, an alphabet symbol a stands for {a},
+ stands for union, and concatenation and * stand for the operations
of the same name on languages.
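The mapping from expressions to languages can be sketched as a recursive function over the structure of the expression. The representation below is an assumption made for illustration: an expression is a nested tuple such as ("union", r, s), and because a starred expression usually denotes an infinite language, the function only returns the strings of length at most max_len.

```python
def language(expr, max_len):
    """Strings of length <= max_len in the language represented by expr."""
    kind = expr[0]
    if kind == "lambda":
        return {""}
    if kind == "sym":
        return {expr[1]} if max_len >= 1 else set()
    if kind == "union":                       # + in the notes
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "concat":
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":
        base = language(expr[1], max_len)
        result = {""}                         # power 0
        frontier = {""}
        while frontier:                       # stops once no new string fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

# a + bc* is read as a + (b(c*)) under the stated precedence rules.
expr = ("union", ("sym", "a"),
                 ("concat", ("sym", "b"), ("star", ("sym", "c"))))
print(sorted(language(expr, 3)))   # ['a', 'b', 'bc', 'bcc']
```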
Example: The set of strings on {a,b}
with even length can be represented by the regular expression
(aa + ab + ba + bb)*, or equivalently by the regular expression
((a + b)(a + b))*.
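The claim that both expressions represent the even-length strings can be checked experimentally on short strings. The sketch below uses Python's re module, where union is written | rather than +. The helper names all_strings and agrees_with_even_length are illustrative, and a check up to a fixed length is only a sanity test, not a proof of equivalence.

```python
import re
from itertools import product

# The two expressions from the example, in Python's regex syntax.
r1 = re.compile(r"(aa|ab|ba|bb)*")
r2 = re.compile(r"((a|b)(a|b))*")

def all_strings(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def agrees_with_even_length(pattern, max_len):
    """True if pattern matches exactly the even-length strings on {a,b}."""
    return all(
        (pattern.fullmatch(s) is not None) == (len(s) % 2 == 0)
        for s in all_strings("ab", max_len)
    )

print(agrees_with_even_length(r1, 6))   # True
print(agrees_with_even_length(r2, 6))   # True
```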
Regular Languages
The class of Regular Languages consists of all (and only) the
languages represented by regular expressions.