CSE 428: Lecture 3
Derivation Tree
Derivation trees (also called "parse trees" in Sethi's book) are a way to
represent the generation of strings in a grammar. They also give information
about the structure of the strings, i.e. the way they are organized into
syntactic categories.
-
Definition
-
Given a grammar G = < T , N , s , P > ,
a derivation tree t
for G is a tree such that:
- the root is labeled by s
- the leaves are labeled by terminal symbols
- each intermediate node is labeled by a non-terminal symbol and, if its label is A, then its children are labeled by symbols s1, s2, ..., sn such that there exists a production A ::= s1 s2 ... sn in P
The labels of the leaves (the fringe), read from left to right, form the
string generated by t. We will denote it by string(t).
It is easy to see that a derivation tree represents a set of
derivations
(usually more than one) for the same string, and that for each derivation
there is a derivation tree for the same string. Hence L(G)
coincides with the set of strings generated by all possible derivation
trees for G. More formally, if
we denote by DT(G) the set of all derivation
trees for G, we have the following result:
-
Proposition
-
L(G) = { alpha in T* | alpha = string(t) for some t in DT(G) }
-
Example
-
Let us consider again the language of numerical expressions, with productions
-
Exp ::= Num | Exp + Exp | Exp * Exp
-
We have that a possible derivation tree for the string
2 + 3 * 5 is the following:
        Exp
        /|\
       / | \
      /  |  \
   Exp   +   Exp
    |        /|\
    |       / | \
    |      /  |  \
   Num  Exp   *   Exp
    |    |         |
    2   Num       Num
         |         |
         3         5
-
This tree corresponds to several derivations of the same string, which
differ only in the choice of the non-terminal to expand at each derivation
step.
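The definition and the example can be made concrete with a small sketch. It assumes a tree is a (label, children) pair and, since the notes never spell out the productions for Num, it assumes Num ::= 2 | 3 | 5 purely for illustration. The names `node`, `string` and `is_derivation_tree` are hypothetical helpers, not part of the course material.

```python
# Productions of the example grammar, as a map from a non-terminal to its
# alternatives. Num ::= 2 | 3 | 5 is an assumption made only for this sketch.
PRODUCTIONS = {
    "Exp": [["Num"], ["Exp", "+", "Exp"], ["Exp", "*", "Exp"]],
    "Num": [["2"], ["3"], ["5"]],
}

def node(label, *children):
    """A tree node is a (label, children) pair; a leaf has no children."""
    return (label, list(children))

def string(t):
    """Return the fringe of t: the leaf labels read from left to right."""
    label, children = t
    if not children:
        return [label]
    return [sym for child in children for sym in string(child)]

def is_derivation_tree(t, start="Exp"):
    """Check the three conditions of the definition for the tree t."""
    def ok(n):
        label, children = n
        if not children:
            # A leaf must be labeled by a terminal symbol.
            return label not in PRODUCTIONS
        # An intermediate node must match one production of its label.
        child_labels = [c[0] for c in children]
        return (child_labels in PRODUCTIONS.get(label, [])
                and all(ok(c) for c in children))
    return t[0] == start and ok(t)

# The tree drawn above for 2 + 3 * 5.
tree = node("Exp",
            node("Exp", node("Num", node("2"))),
            node("+"),
            node("Exp",
                 node("Exp", node("Num", node("3"))),
                 node("*"),
                 node("Exp", node("Num", node("5")))))

print(" ".join(string(tree)))     # 2 + 3 * 5
print(is_derivation_tree(tree))   # True
```

The check is purely local: each intermediate node is compared against the productions of its own label, which mirrors the third condition of the definition.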
Ambiguity
The structure of an expression is usually essential to interpret its meaning.
The expression 2 + 3 * 5, for example, has two different values depending on
its intended structure: if we assume it to be 2 + (3 * 5) (i.e. 3 and 5
grouped together by *) then the result is 17. If, on the other hand, we assume
it to be (2 + 3) * 5, then the result is 25. In order to
avoid this kind of ambiguity, it is essential that the grammar
generates only one possible structure for each string in the language.
Since the structure is represented by the derivation tree, we have the
following definition:
-
Definition
-
A grammar G is ambiguous
if there exists a string in L(G) which
can be derived by two (or more) different derivation trees.
-
Example
-
The grammar in the example above is ambiguous: in fact, the string
2 + 3 * 5 can also be generated by the following tree:
             Exp
             /|\
            / | \
           /  |  \
        Exp   *   Exp
        /|\        |
       / | \      Num
      /  |  \      |
   Exp   +   Exp   5
    |         |
   Num       Num
    |         |
    2         3
This tree corresponds to the grouping (2 + 3) * 5, while the tree in the
previous example corresponds to 2 + (3 * 5).
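The ambiguity can also be observed mechanically. The following sketch enumerates every derivation tree of a token list under Exp ::= Num | Exp + Exp | Exp * Exp and evaluates each one. As a simplification, trees are written as nested tuples (left, op, right), a Python integer stands for a Num leaf, and the chain Exp - Num - digit is collapsed into that integer. The function names `trees` and `value` are hypothetical.

```python
def trees(tokens):
    """All derivation trees for a token list under
    Exp ::= Num | Exp + Exp | Exp * Exp (integers stand for Num)."""
    if len(tokens) == 1:
        return [tokens[0]] if isinstance(tokens[0], int) else []
    result = []
    # Try every operator as the one introduced at the root of the tree.
    for k in range(1, len(tokens) - 1):
        if tokens[k] in ("+", "*"):
            for left in trees(tokens[:k]):
                for right in trees(tokens[k + 1:]):
                    result.append((left, tokens[k], right))
    return result

def value(t):
    """Evaluate a tree by following its structure."""
    if isinstance(t, int):
        return t
    left, op, right = t
    if op == "+":
        return value(left) + value(right)
    return value(left) * value(right)

all_trees = trees([2, "+", 3, "*", 5])
for t in all_trees:
    print(t, "=", value(t))
# (2, '+', (3, '*', 5)) = 17
# ((2, '+', 3), '*', 5) = 25
```

Two trees are found for the same string, with the two values 17 and 25 discussed above, which is exactly what the definition of ambiguity requires.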
There are languages which are intrinsically ambiguous,
i.e. it is not possible to eliminate their ambiguities without changing
the language.
-
Definition
-
A language L is intrinsically ambiguous
if it can be generated only by ambiguous grammars, i.e. for every grammar
G such that L = L(G),
we have that G is ambiguous.
Luckily, languages which are interesting from the point of view of programming
usually are not intrinsically ambiguous, and therefore we can find non-ambiguous
grammars which generate them. When a (non-intrinsically ambiguous) language
L is presented by an ambiguous grammar G, "to eliminate the ambiguities of G"
means to find another grammar G', which is non-ambiguous and which generates
the same language L.
We will consider three common examples of ambiguities, and the way
to eliminate them:
- Precedence
- Associativity
- Dangling-else
Precedence
In the examples above, the ambiguity in the interpretation of
2 + 3 * 5
can be eliminated by imposing the
precedence of one operator over the other.
We say that op has precedence over op' if an expression of the form
e1 op e2 op' e3 (respectively e1 op' e2 op e3) is interpreted only as
(e1 op e2) op' e3 (respectively e1 op' (e2 op e3)).
In other words, the
grouping power of op
is greater than the grouping power of op'.
From the point of view of derivation trees, the fact that
e1 op e2 op' e3 is interpreted as (e1 op e2) op' e3 means that
op must be introduced at a level strictly lower than op', i.e. in a sub-tree
whose root is a child of the node that introduces op'.
In order to modify the grammar so that it generates only this kind of tree,
a possible solution is to introduce a new syntactic category producing
expressions of the form e1 op e2, and to force a hierarchical order with
respect to the main category of expressions of the form e1 op' e2.
-
Example
-
We can eliminate the ambiguities from the grammar in the example
above by introducing a new syntactic category Term
producing expressions of the form
-
e1 * e2
-
where e1 and e2 may contain * again, but not +. This can be
done by organizing the productions hierarchically as follows:
-
Exp ::= Exp + Term | Term
-
Term ::= Term * Num | Num
-
This modification corresponds to assigning * a higher priority than +
(following the mathematical convention). Consider again the string
2 + 3 * 5. It is easy to see that in the new grammar there
is only one tree which can generate it:
         Exp
         /|\
        / | \
       /  |  \
    Exp   +   Term
     |        /|\
     |       / | \
     |      /  |  \
   Term  Term  *  Num
     |     |       |
    Num   Num      5
     |     |
     2     3
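As an illustration of how the layered grammar fixes the structure, here is a minimal parser sketch for Exp ::= Exp + Term | Term and Term ::= Term * Num | Num. It is only one possible way to implement the grammar: the left-recursive productions are realized as loops (a direct recursive call would never terminate), which yields the same left-nested trees. Tokens are assumed to be given as a Python list, integers stand for Num, and the name `parse_exp` is hypothetical.

```python
def parse_exp(tokens):
    """Parse a token list with the layered grammar; return the unique
    tree as nested tuples (left, op, right)."""
    pos = 0

    def num():
        nonlocal pos
        if pos >= len(tokens) or not isinstance(tokens[pos], int):
            raise SyntaxError("number expected")
        tok = tokens[pos]
        pos += 1
        return tok

    def term():
        # Term ::= Term * Num | Num, with the left recursion written as a loop.
        nonlocal pos
        t = num()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            t = (t, "*", num())
        return t

    def exp():
        # Exp ::= Exp + Term | Term, using the same loop technique.
        nonlocal pos
        t = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            t = (t, "+", term())
        return t

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_exp([2, "+", 3, "*", 5]))   # (2, '+', (3, '*', 5))
print(parse_exp([2, "*", 3, "+", 5]))   # ((2, '*', 3), '+', 5)
```

Because `exp` can only combine complete results of `term`, a * is always introduced below a +, whichever of the two appears first in the string.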
Associativity
Consider again the grammar for numerical expressions in the previous
example, and consider the grammar obtained by modifying the productions
for Exp in the following way:
Exp ::= Exp + Exp | Term
This new grammar is ambiguous. In fact, it allows two different derivation
trees for the string 2 + 3 + 5:
one corresponding to the structure (2 + 3) + 5 and one corresponding to the
structure 2 + (3 + 5). In the case of the + operator, this kind of ambiguity
is not a problem, because of its algebraic properties: + is associative,
i.e. (2 + 3) + 5 and 2 + (3 + 5) have the same value.
In general, however, an operator might not be associative. This
is for instance the case for the - and ^ (exponentiation) operators:
(5 - 3) - 2 and 5 - (3 - 2) have different values,
as do (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2).
In order to eliminate this kind of ambiguity, we must establish whether
the operator is left-associative or right-associative.
Left-associative means that e1 op e2 op e3 is interpreted as
(e1 op e2) op e3 (op associates to the left). Vice versa, right-associative
means that it is interpreted as e1 op (e2 op e3) (op associates to the right).
We can impose left-associativity (resp. right-associativity) by using
the following technique: in the production introducing op, we let the
syntactic category being defined occur recursively only to the left
(resp. to the right) of op.
Note that in the previous example (Exp ::= Exp + Term | Term and
Term ::= Term * Num | Num) this is done for both + and *: they are forced
to be left-associative.
-
Example
-
Consider the following grammar (productions) for numerical expressions
constructed with the - operation:
-
Exp ::= Num | Exp - Exp
-
This grammar is ambiguous since it allows both the interpretations
(5 - 3) - 2 and 5 - (3 - 2).
If we want to impose left-associativity (following the mathematical
convention), it is sufficient to modify the productions in the following way:
-
Exp ::= Num | Exp - Num
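A small sketch of the effect of this grammar, assuming a well-formed token list in which integers stand for Num: since the right operand of - can only be a Num, the tree can only grow to the left. The names `parse_left` and `value` are hypothetical.

```python
def parse_left(tokens):
    """Build the unique tree allowed by Exp ::= Num | Exp - Num.
    Assumes tokens alternate number, "-", number, ..."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        assert tokens[i] == "-"
        # The tree built so far becomes the left operand of the next "-".
        tree = (tree, "-", tokens[i + 1])
    return tree

def value(t):
    """Evaluate a tree of subtractions by following its structure."""
    if isinstance(t, int):
        return t
    left, _, right = t
    return value(left) - value(right)

tree = parse_left([5, "-", 3, "-", 2])
print(tree, "=", value(tree))          # ((5, '-', 3), '-', 2) = 0
print(value((5, "-", (3, "-", 2))))    # 4, the grouping the grammar excludes
```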
-
Example
-
Consider the following grammar (productions) for numerical expressions
constructed with the ^ operation:
-
Exp ::= Num | Exp ^ Exp
-
This grammar is ambiguous since it allows both the interpretations
(5 ^ 3) ^ 2 and 5 ^ (3 ^ 2).
If we want to impose right-associativity (following the mathematical
convention), it is sufficient to modify the productions in the following way:
-
Exp ::= Num | Num ^ Exp
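The symmetric sketch for the right-associative case, again assuming a well-formed token list in which integers stand for Num. Here the left operand of ^ can only be a Num, so the rest of the string is parsed recursively as the right operand. The names `parse_right` and `value` are hypothetical.

```python
def parse_right(tokens):
    """Build the unique tree allowed by Exp ::= Num | Num ^ Exp.
    Assumes tokens alternate number, "^", number, ..."""
    head = tokens[0]
    if len(tokens) == 1:
        return head
    assert tokens[1] == "^"
    # Everything after the first "^" forms the right operand.
    return (head, "^", parse_right(tokens[2:]))

def value(t):
    """Evaluate a tree of exponentiations by following its structure."""
    if isinstance(t, int):
        return t
    left, _, right = t
    return value(left) ** value(right)

tree = parse_right([5, "^", 3, "^", 2])
print(tree, "=", value(tree))          # (5, '^', (3, '^', 2)) = 1953125
print(value(((5, "^", 3), "^", 2)))    # 15625, the grouping the grammar excludes
```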