CSE 428: Lecture 3
Derivation Tree
Derivation trees (also called "parse
trees" in Sethi's book) are a way to represent the generation of
strings in a grammar. They also give information about the structure
of the strings, i.e. the way they are organized in syntactical categories.

Definition

Given a grammar G = < T , N , s , P > ,
a derivation tree t
for G is a tree such that:

the root is labeled by s

the leaves are labeled by terminal symbols

each intermediate node is labeled by a nonterminal symbol, and,
if its label is A, then its children
are labeled by symbols s_{1} , s_{2}
, ... , s_{n} such that there exists a production
A ::= s_{1} s_{2} ... s_{n}
in P
The labels of the leaves (the fringe), read from left to right, form the string generated
by t.
We will denote it by string(t).
It is easy to see that a derivation tree represents a set of
derivations
(usually more than one) for the same string, and that for each derivation
there is a derivation tree for the same string. Hence L(G)
coincides with the set of strings generated by all possible derivation
trees for G. More formally, if
we denote by DT(G) the set of all derivation
trees for G, we have the following result:

Proposition

L(G) = { alpha in T^{*} | alpha = string(t) for some t in DT(G) }

Example

Let us consider again the language of numerical expressions, with productions

Exp ::= Num | Exp + Exp | Exp * Exp

We have that a possible derivation tree for the string
2 + 3 * 5 is the following:
          Exp
          /|\
         / | \
        /  |  \
     Exp   +   Exp
      |        /|\
      |       / | \
      |      /  |  \
     Num   Exp  *  Exp
      |     |       |
      2    Num     Num
            |       |
            3       5

This tree corresponds to several derivations for the same string, which
differ only in the choice of the nonterminal to expand at each derivation
step.
Ambiguity
The structure of an expression is usually essential to interpret its meaning.
The expression 2 + 3 * 5, for example, has two different values depending
on its intended structure. If we assume it to be 2 + (3 * 5) (i.e. 3 and 5
grouped together by *), then the result is 17. If, on the other hand, we
assume it to be (2 + 3) * 5, then the result is 25. In order to
avoid this kind of ambiguity, it is essential that the grammar
generates only one possible structure for each string in the language.
Since the structure is represented by the derivation tree, we have the
following definition:

Definition

A grammar G is ambiguous
if there exists a string in L(G) which
can be generated by two (or more) different derivation trees.

Example

The grammar in the example above is ambiguous:
in fact, the string 2 + 3 * 5
can also be generated by the following tree:
          Exp
          /|\
         / | \
        /  |  \
     Exp   *   Exp
     /|\        |
    / | \      Num
   /  |  \      |
 Exp  +  Exp    5
  |       |
 Num     Num
  |       |
  2       3
This tree corresponds to the grouping (2 + 3) * 5, while the tree in the
previous example corresponds to 2 + (3 * 5).
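The two trees can also be found mechanically. The sketch below is an illustration under assumptions, not the notes' own method: it enumerates every derivation tree of the grammar Exp ::= Num | Exp + Exp | Exp * Exp for a list of tokens by trying each operator as the topmost one. The function name `trees` is hypothetical, Num is assumed to be a string of digits, and each tree is written as a fully parenthesised string paired with its value.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def trees(tokens):
    """All derivation trees for Exp ::= Num | Exp + Exp | Exp * Exp,
    returned as (parenthesised text, value) pairs."""
    if len(tokens) == 1:
        # Exp ::= Num (assumption: Num is a string of digits)
        return [(tokens[0], int(tokens[0]))] if tokens[0].isdigit() else []
    result = []
    for i in range(1, len(tokens) - 1):
        op = tokens[i]
        if op in OPS:
            # Exp ::= Exp op Exp with tokens[i] as the operator at the root
            for ltext, lval in trees(tokens[:i]):
                for rtext, rval in trees(tokens[i + 1:]):
                    result.append((f"({ltext} {op} {rtext})", OPS[op](lval, rval)))
    return result


found = trees("2 + 3 * 5".split())
for text, value in found:
    print(text, "=", value)
# (2 + (3 * 5)) = 17
# ((2 + 3) * 5) = 25
```

Two trees for one string is exactly the situation the definition of an ambiguous grammar describes.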
There are languages which are intrinsically ambiguous,
i.e. it is not possible to eliminate their ambiguities without changing
the language.

Definition

A language L is intrinsically ambiguous
if it can be generated only by ambiguous grammars, i.e. for every grammar
G such that L = L(G),
we have that G is ambiguous.
Luckily, languages which are interesting from the point of view of programming
usually are not intrinsically ambiguous, and therefore we can find non-ambiguous
grammars which generate them. When a (non-intrinsically ambiguous) language
L is presented by an ambiguous grammar G, "to eliminate the ambiguities of
G" means to find another grammar G', which is non-ambiguous, and which
generates the same language L.
We will consider three common examples of ambiguities, and the way
to eliminate them:

Precedence

Associativity

Dangling-else
Precedence
In the examples above, the ambiguity in the interpretation of 2 + 3 * 5
can be eliminated by imposing the precedence of one operator over the other.
We say that op has precedence over op' if an expression of the form
e_{1} op e_{2} op' e_{3} (respectively e_{1} op' e_{2} op e_{3})
is interpreted only as (e_{1} op e_{2}) op' e_{3}
(respectively e_{1} op' (e_{2} op e_{3})).
In other words, the grouping power of op
is greater than the grouping power of op'.
From the point of view of derivation trees, the fact that
e_{1} op e_{2} op' e_{3} is interpreted as (e_{1} op e_{2}) op' e_{3}
means that op must be introduced at a level strictly
lower than op', i.e. in a subtree whose root is a child of the node
that introduces op'.
In order to modify the grammar so that it generates only this kind of tree,
a possible solution is to introduce a new syntactic category producing
expressions of the form e_{1} op e_{2},
and to force a hierarchical order w.r.t.
the main category of expressions of the form e_{1} op' e_{2}.

Example

We can eliminate the ambiguities from the grammar in the example
above by introducing a new syntactic category Term
producing expressions of the form

e_{1} * e_{2}

where e_{1} and e_{2} may contain * again, but not +. This can be
done by organizing the productions hierarchically as follows:

Exp ::= Exp + Term | Term

Term ::= Term * Num | Num

This modification corresponds to assigning * a higher priority than +
(following the mathematical convention). Consider again the string
2 + 3 * 5. It is easy to see that in the new grammar there
is only one tree which can generate it:
          Exp
          /|\
         / | \
        /  |  \
     Exp   +   Term
      |        /|\
      |       / | \
      |      /  |  \
    Term  Term  *  Num
      |     |       |
     Num   Num      5
      |     |
      2     3
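The uniqueness claim can be checked for this particular string with a small enumerator, given here as a sketch under assumptions rather than a general proof. It lists all derivation trees of the grammar Exp ::= Exp + Term | Term, Term ::= Term * Num | Num, written as fully parenthesised strings. The function names `exp_trees` and `term_trees` are hypothetical, and Num is assumed to be a string of digits.

```python
def term_trees(tokens):
    """All derivation trees for Term ::= Term * Num | Num."""
    result = []
    if len(tokens) == 1 and tokens[0].isdigit():
        result.append(tokens[0])                      # Term ::= Num
    if len(tokens) >= 3 and tokens[-2] == "*" and tokens[-1].isdigit():
        # Term ::= Term * Num: the right operand must be a single Num
        result += [f"({t} * {tokens[-1]})" for t in term_trees(tokens[:-2])]
    return result


def exp_trees(tokens):
    """All derivation trees for Exp ::= Exp + Term | Term."""
    result = list(term_trees(tokens))                 # Exp ::= Term
    for i in range(1, len(tokens) - 1):
        if tokens[i] == "+":
            # Exp ::= Exp + Term: the right operand cannot contain +
            for left in exp_trees(tokens[:i]):
                for right in term_trees(tokens[i + 1:]):
                    result.append(f"({left} + {right})")
    return result


print(exp_trees("2 + 3 * 5".split()))   # ['(2 + (3 * 5))']
print(exp_trees("2 + 3 + 5".split()))   # ['((2 + 3) + 5)']
```

The second call shows a side effect of the same construction: because Exp occurs only to the left of +, the string 2 + 3 + 5 also has a single tree, grouped to the left.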
Associativity
Consider again the grammar for numerical expressions in the previous
example, and consider the grammar obtained by modifying the productions
for Exp in the following way:
Exp ::= Exp + Exp | Term
This new grammar is ambiguous. In fact, it allows two different derivation
trees for the string 2 + 3 + 5:
one corresponding to the structure (2 + 3) + 5
and one corresponding to the structure 2 + (3 + 5).
In the case of the + operator, this kind of ambiguity is not a problem,
because of its algebraic properties: + is associative, i.e.
(2 + 3) + 5 and 2 + (3 + 5) have the same value.
In general, however, an operator might not be associative. This
is for instance the case for the - (subtraction)
and ^ (exponentiation) operators:
(5 - 3) - 2 and 5 - (3 - 2) have different values,
as well as (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2).
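As a quick numeric check of the claim above, the following Python lines evaluate both groupings. Python writes exponentiation as `**` (its `^` is bitwise XOR), so `**` stands in for the ^ of the notes. The variable names are illustrative only.

```python
# Both groupings of 5 - 3 - 2
minus_left = (5 - 3) - 2        # grouping to the left
minus_right = 5 - (3 - 2)       # grouping to the right

# Both groupings of 5 ^ 3 ^ 2 (Python spells exponentiation **)
power_left = (5 ** 3) ** 2
power_right = 5 ** (3 ** 2)

print(minus_left, minus_right)   # 0 4
print(power_left, power_right)   # 15625 1953125
```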
In order to eliminate this kind of ambiguity, we must establish whether
the operator is left-associative or right-associative.
Left-associative means that e_{1} op e_{2} op e_{3} is interpreted as
(e_{1} op e_{2}) op e_{3} (op associates to the left).
Vice versa, right-associative means that it is interpreted as
e_{1} op (e_{2} op e_{3}) (op associates to the right).
We can impose left-associativity (resp. right-associativity) by using
the following technique: in the production introducing op,
we let the syntactic category producing op occur only
to the left (resp. only to the right) of op.
Note that in the previous example this is done for
both + and *: they are forced to be left-associative.

Example

Consider the following grammar (productions) for numerical expressions
constructed with the - operation:

Exp ::= Num | Exp - Exp

This grammar is ambiguous since it allows both the interpretations
(5 - 3) - 2 and 5 - (3 - 2).
If we want to impose left-associativity (following the mathematical
convention), it is sufficient to modify the productions in the following way:

Exp ::= Num | Exp - Num

Example

Consider the following grammar (productions) for numerical expressions
constructed with the ^ operation:

Exp ::= Num | Exp ^ Exp

This grammar is ambiguous since it allows both the interpretations
(5 ^ 3) ^ 2 and 5 ^ (3 ^ 2).
If we want to impose right-associativity (following the mathematical
convention), it is sufficient to modify the productions in the following way:

Exp ::= Num | Num ^ Exp
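To see how the shape of the two modified productions fixes the grouping, here is a minimal Python sketch of evaluators that follow each production literally. The function names `eval_minus` and `eval_power` are hypothetical, Num is assumed to be a string of digits, and the input is assumed to be already split into tokens.

```python
def eval_minus(tokens):
    """Evaluate along Exp ::= Num | Exp - Num (left-associative)."""
    if len(tokens) == 1:
        return int(tokens[0])                       # Exp ::= Num
    if len(tokens) < 3 or tokens[-2] != "-":
        raise ValueError("not generated by Exp ::= Num | Exp - Num")
    # Exp ::= Exp - Num: the last Num is split off, the rest is the left Exp
    return eval_minus(tokens[:-2]) - int(tokens[-1])


def eval_power(tokens):
    """Evaluate along Exp ::= Num | Num ^ Exp (right-associative)."""
    if len(tokens) == 1:
        return int(tokens[0])                       # Exp ::= Num
    if len(tokens) < 3 or tokens[1] != "^":
        raise ValueError("not generated by Exp ::= Num | Num ^ Exp")
    # Exp ::= Num ^ Exp: the first Num is split off, the rest is the right Exp
    return int(tokens[0]) ** eval_power(tokens[2:])


print(eval_minus("5 - 3 - 2".split()))   # 0, i.e. (5 - 3) - 2
print(eval_power("5 ^ 3 ^ 2".split()))   # 1953125, i.e. 5 ^ (3 ^ 2)
```

In both functions there is only one way to split the token list, which mirrors the fact that each modified grammar admits a single derivation tree per string.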