Fall 2000, CSE 468:
Lecture 6 (Sep 8)
Regular Languages
In the previous lecture we saw that the regular languages are exactly those
languages that can be represented by regular expressions.
Regular languages
are important in computer science because they are particularly
simple to define and to reason about (using the algebraic properties of
regular expressions), are easy
to recognize (as we will see later), and describe interesting
parts of programming languages.
Typically, the tokens of a programming language form
a regular language. The component of the system that
recognizes them is called the "scanner".
Regular expressions are also used in search engines and
are accepted by many systems (Unix, for instance)
as parameters of commands.
Examples
- The identifiers in a programming language
(strings of letters and digits starting with a letter).
The corresponding expression is
(a + b + ... + z)(a + b + ... + z + 0 + 1 + ... + 9)*
- The representation of natural numbers
(0, or a string of digits starting with a symbol different from 0).
The corresponding expression is
0 + (1 + ... + 9)(0 + 1 + ... + 9)*
- The representation of non-negative (approximations of) real numbers
(integer part, followed by a dot, followed by a sequence of digits).
The corresponding expression is
(0 + (1 + ... + 9)(0 + 1 + ... + 9)*).(0 + 1 + ... + 9)*
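As a rough illustration, the three expressions above can be tried out with
Python's re module. This is only a sketch under some assumptions: the union
"+" of the lecture notation is written as "|" or as a character class, the
dot is escaped because it stands for the literal symbol, and the names
IDENTIFIER, NATURAL, REAL and matches are made up for this example. Python
patterns can express more than regular expressions in the formal sense; only
the regular subset is used here.

```python
import re

# Hypothetical translations of the lecture's expressions into Python syntax.
# (a + ... + z)(a + ... + z + 0 + ... + 9)*
IDENTIFIER = re.compile(r"[a-z][a-z0-9]*")
# 0 + (1 + ... + 9)(0 + ... + 9)*
NATURAL = re.compile(r"0|[1-9][0-9]*")
# (0 + (1 + ... + 9)(0 + ... + 9)*).(0 + ... + 9)*   (the dot is a literal symbol)
REAL = re.compile(r"(0|[1-9][0-9]*)\.[0-9]*")


def matches(pattern, s):
    """Return True if the WHOLE string s belongs to the language of pattern."""
    return pattern.fullmatch(s) is not None


if __name__ == "__main__":
    for pattern, s in [(IDENTIFIER, "x1"), (IDENTIFIER, "1x"),
                       (NATURAL, "42"), (NATURAL, "007"),
                       (REAL, "3.14"), (REAL, ".5")]:
        print(pattern.pattern, repr(s), matches(pattern, s))
```

fullmatch is used instead of search because membership in the language is
a property of the entire string, not of some substring.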
Language recognition
Given the formal specification of a language,
one of the main questions that
arises is how to tell whether or not a certain string
belongs to the language. And of course,
we want to do it automatically.
This concept is fundamental in computer science:
when we write a program, the first thing we want
the computer to check is whether the program is syntactically
correct. This is nothing other than the problem of recognizing
whether our program (a string of symbols) belongs to the
language specified by the syntax.
Abstract machine for language recognition
A device for recognizing strings in a given language
will, in general, take
as input one symbol of the string at a time, and will make a decision
(reject/accept/go on) depending on
- the current symbol, and
- the symbols previously examined.
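The device described above can be sketched as a small loop. This is an
illustrative sketch, not a definition from the lecture: the names run and
decide_a_then_b are hypothetical, the end of the input is signalled to the
decision function by passing None (an assumption of this sketch), and the
sample language (all a's before all b's, over {a,b}) is chosen only to show
a decision that depends on the symbols previously examined.

```python
ACCEPT, REJECT, GO_ON = "accept", "reject", "go on"


def run(decide, string):
    """Feed the string to the device one symbol at a time.

    decide(symbol, seen) returns ACCEPT, REJECT or GO_ON, based on the
    current symbol and on the list of symbols previously examined.
    """
    seen = []
    for symbol in string:
        verdict = decide(symbol, seen)
        if verdict != GO_ON:
            return verdict
        seen.append(symbol)
    # Assumption of this sketch: None marks the end of the input.
    return decide(None, seen)


def decide_a_then_b(symbol, seen):
    """Hypothetical example: strings over {a,b} with no a after a b."""
    if symbol is None:
        return ACCEPT          # input finished without a violation
    if symbol not in {"a", "b"}:
        return REJECT          # symbol outside the alphabet
    if symbol == "a" and "b" in seen:
        return REJECT          # an a after a b
    return GO_ON


if __name__ == "__main__":
    for s in ["aabb", "aba", ""]:
        print(repr(s), run(decide_a_then_b, s))
```

Note that this naive version stores every symbol it has seen, so its memory
grows with the length of the input string.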
Finite Automata
In the above description of an abstract machine for
language recognition,
it turns out that, if the language is regular,
the amount of memory necessary for the second point
(the symbols previously examined) is
bounded, i.e. the size of the memory does not depend on the string
to be recognized.
(It does depend, of course, on the language. Note that
each machine corresponds to a particular language.)
Bounded memory is the defining characteristic of a
class of machines called
Finite Automata.
Example. Consider an automaton for recognizing
the strings over {a,b} of even length.
At each step, the only information we need to remember about the
portion of the string already examined is whether its length is odd or even.
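A minimal sketch of this example in Python follows. The function name
accepts_even_length and the state names "even" and "odd" are chosen here for
illustration, and rejecting symbols outside {a,b} is an assumption of the
sketch. The point is that the whole memory of the machine is a single
variable with two possible values, whatever the length of the input.

```python
def accepts_even_length(string):
    """Recognize the strings over {a, b} of even length."""
    state = "even"  # the empty prefix has length 0, which is even
    for symbol in string:
        if symbol not in {"a", "b"}:
            return False  # assumption: symbols outside the alphabet are rejected
        # Reading one more symbol flips the parity of the length.
        state = "odd" if state == "even" else "even"
    return state == "even"


if __name__ == "__main__":
    for s in ["", "ab", "aba", "abba"]:
        print(repr(s), accepts_even_length(s))
```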