CSE 428: Lecture notes 2  

Derivation Tree

Derivation trees (also called "parse trees" in Sethi's book) are a way to
represent the generation of strings in a grammar. They also give
information about the structure of the strings, i.e. the way they are
organized into syntactic categories. 

Definition 
      Given a grammar G = < T , N , s , P >, a derivation tree t for G is
      a tree such that: 
            the root is labeled by s 
            the leaves are labeled by terminal symbols 
            each intermediate node is labeled by a non-terminal symbol,
            and, if its label is A, then its children are labeled by symbols
            s_1 , s_2 , ... , s_n such that there exists a production 
            A ::= s_1 s_2 ... s_n in P 
      The labels of the leaves (the fringe), read from left to right,
      represent the string generated by t. We will denote it by string(t).
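The definition can be turned into a small checker. The sketch below is one possible encoding, not part of the notes: the grammar is a Python dictionary from non-terminals to their right-hand sides, the digit productions Num ::= 0 | 1 | ... | 9 are an assumption made only for illustration, and the names Tree, is_derivation_tree and string are hypothetical. It checks the three conditions of the definition and reads the fringe from left to right.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed encoding of the expression grammar used in these notes:
# each non-terminal maps to the list of its right-hand sides.
# Num ::= 0 | 1 | ... | 9 is an illustrative assumption.
GRAMMAR = {
    "Exp": [("Num",), ("Exp", "+", "Exp"), ("Exp", "*", "Exp")],
    "Num": [(d,) for d in "0123456789"],
}
START = "Exp"


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple[Tree, ...] = ()


def node(label: str, *children: Tree) -> Tree:
    return Tree(label, children)


def well_formed(t: Tree) -> bool:
    if not t.children:
        # A leaf must be labeled by a terminal symbol.
        return t.label not in GRAMMAR
    # An intermediate node labeled A must have children s_1 ... s_n
    # such that A ::= s_1 ... s_n is a production.
    rhs = tuple(c.label for c in t.children)
    if rhs not in GRAMMAR.get(t.label, []):
        return False
    return all(well_formed(c) for c in t.children)


def is_derivation_tree(t: Tree) -> bool:
    # The root must be labeled by the start symbol s.
    return t.label == START and well_formed(t)


def string(t: Tree) -> str:
    # The fringe: leaf labels read from left to right.
    if not t.children:
        return t.label
    return " ".join(string(c) for c in t.children)


def num(d: str) -> Tree:
    # The chain Exp -> Num -> d
    return node("Exp", node("Num", node(d)))


t = node("Exp", num("2"), node("+"),
         node("Exp", num("3"), node("*"), num("5")))
print(is_derivation_tree(t), string(t))   # True 2 + 3 * 5
```

A tree that uses a right-hand side not listed in GRAMMAR, or that leaves a non-terminal as a leaf, is rejected by well_formed.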

It is easy to see that a derivation tree represents a set of derivations
(usually more than one) of the same string, and that, conversely, for each
derivation there is a derivation tree for the same string. Hence L(G)
coincides with the set of strings generated by all possible derivation
trees for G. More formally, if we denote by DT(G) the set of all derivation
trees for G, we have the following result: 

Proposition 
      L(G) = { alpha in T* | alpha = string(t) for some t in DT(G) } 
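
Since L(G) is usually infinite, the proposition cannot be checked exhaustively, but it suggests a naive way to list strings of L(G): enumerate derivation trees up to some height and collect their fringes. A minimal sketch, assuming a two-symbol alphabet Num ::= 2 | 3 to keep the sets small; trees are encoded as (label, children) pairs, and trees and language_up_to are hypothetical names.

```python
from itertools import product

# Illustrative grammar; Num ::= 2 | 3 is an assumption to keep sets small.
GRAMMAR = {
    "Exp": [("Num",), ("Exp", "+", "Exp"), ("Exp", "*", "Exp")],
    "Num": [("2",), ("3",)],
}


def trees(symbol, height):
    """Yield every tree rooted at symbol whose non-terminal levels fit in height."""
    if symbol not in GRAMMAR:          # terminal: a single leaf
        yield (symbol, ())
        return
    if height == 0:                    # no room left to expand a non-terminal
        return
    for rhs in GRAMMAR[symbol]:
        subtrees = [list(trees(s, height - 1)) for s in rhs]
        for children in product(*subtrees):
            yield (symbol, children)


def string(t):
    label, children = t
    return label if not children else " ".join(string(c) for c in children)


def language_up_to(height):
    """Strings generated by derivation trees for Exp of bounded height."""
    return {string(t) for t in trees("Exp", height)}


print(sorted(language_up_to(3)))
```

With height 3 only trees with at most one operator fit, giving ten strings; each increase of the bound adds longer strings, and the union over all heights is L(G).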

Example 
      Let us consider again the language of numerical expressions, with
      productions 

            Exp ::= Num | Exp + Exp | Exp * Exp 

      A possible derivation tree for the string 2 + 3 * 5 is the
      following: 


                Exp 
                /|\ 
               / | \ 
              /  |  \ 
            Exp  +  Exp 
             |      /|\ 
             |     / | \ 
             |    /  |  \ 
            Num Exp  *  Exp 
             |   |       | 
             2  Num     Num 
                 |       | 
                 3       5 


      This tree corresponds to several derivations of the same string,
      which differ only in the choice of the non-terminal to expand at
      each derivation step. 
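
      The derivations represented by a tree can be read off it by
      repeatedly expanding one of the nodes that have not been expanded
      yet. The sketch below is an illustration with a (label, children)
      tuple encoding of our own; the name derivation is hypothetical. It
      produces the leftmost derivation (always expand the leftmost pending
      non-terminal) and the rightmost one from the tree above.

```python
def leaf(symbol):
    return (symbol, ())


def num(digit):
    # The chain Exp -> Num -> digit
    return ("Exp", (("Num", (leaf(digit),)),))


# The derivation tree above for 2 + 3 * 5, encoded as (label, children) pairs.
TREE = ("Exp", (num("2"), leaf("+"),
                ("Exp", (num("3"), leaf("*"), num("5")))))


def derivation(tree, leftmost=True):
    """Return the sentential forms of the leftmost (or rightmost) derivation
    represented by tree, each as a list of symbols."""
    form = [tree]                      # sentential form as a list of subtrees
    steps = [[tree[0]]]
    while True:
        # Nodes with children are exactly the non-terminals still to expand.
        pending = [i for i, (_, children) in enumerate(form) if children]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form = form[:i] + list(form[i][1]) + form[i + 1:]
        steps.append([label for label, _ in form])


for step in derivation(TREE):
    print(" ".join(step))
```

      Both derivations end in 2 + 3 * 5 and take eight steps, one per
      intermediate node of the tree; they differ only in the order in which
      those nodes are expanded.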

Ambiguity

The structure of an expression is usually essential to interpret its meaning.
The expression 2 + 3 * 5 for example has two different values depending
on its intended structure: If we assume it to be 2 + ( 3 * 5 ) (i.e. 3 and 5
grouped together by *) then the result is 17. If, on the other hand, we
assume it to be ( 2 + 3 ) * 5, then the result is 25. In order to avoid this
kind of ambiguity, it is essential that the grammar generates only one
possible structure for each string in the language. Since the structure is
represented by the derivation tree, we have the following definition: 

Definition 
      A grammar G is ambiguous if there exists a string in L(G) which can
      be generated by two (or more) different derivation trees. 
Example 
      The grammar in the example above is ambiguous: the string
      2 + 3 * 5 can also be generated by the following tree: 


                   Exp 
                   /|\ 
                  / | \ 
                 /  |  \ 
               Exp  *  Exp 
               /|\      | 
              / | \    Num 
             /  |  \    | 
           Exp  +  Exp  5 
            |       | 
           Num     Num 
            |       | 
            2       3 

      This tree corresponds to the grouping ( 2 + 3 ) * 5, while the tree in
      the example above corresponds to 2 + ( 3 * 5 ). 
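
      For small inputs, ambiguity can be observed directly by enumerating
      all derivation trees of a string. The sketch below rests on
      assumptions of our own: trees are abbreviated to nested tuples
      (op, left, right), omitting the Exp and Num chains above the numbers,
      and the hypothetical function all_trees tries every operator of the
      string as the root, as the productions Exp ::= Exp + Exp | Exp * Exp
      allow.

```python
from functools import lru_cache


def all_trees(tokens):
    """Every derivation tree of Exp ::= Num | Exp + Exp | Exp * Exp for tokens,
    abbreviated as nested tuples (op, left, right) with numbers as leaves."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def exp(i, j):                         # all trees for toks[i:j]
        result = []
        if j - i == 1 and toks[i].isdigit():
            result.append(int(toks[i]))    # Exp ::= Num
        for k in range(i + 1, j - 1):      # each operator may be the root
            if toks[k] in ("+", "*"):
                for left in exp(i, k):
                    for right in exp(k + 1, j):
                        result.append((toks[k], left, right))
        return tuple(result)

    return list(exp(0, len(toks)))


def value(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    if op == "+":
        return value(left) + value(right)
    return value(left) * value(right)


trees = all_trees("2 + 3 * 5".split())
for t in trees:
    print(t, "=", value(t))
```

      For 2 + 3 * 5 it finds exactly two trees, with values 17 and 25,
      matching the two groupings discussed above. With n operators this
      grammar allows every bracketing, so the number of trees is the n-th
      Catalan number (5 trees for 1 + 2 + 3 + 4).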

There are languages which are intrinsically ambiguous, i.e. it is not possible
to eliminate their ambiguities without changing the language. 

Definition 
      A language L is intrinsically ambiguous if it can be generated only by
      ambiguous grammars, i.e. for every grammar G such that L = L(G),
      G is ambiguous. 

Luckily, the languages that are interesting from the point of view of
programming are usually not intrinsically ambiguous, and therefore we can
find non-ambiguous grammars that generate them. When a
(non-intrinsically ambiguous) language L is presented by an ambiguous
grammar G, "to eliminate the ambiguities of G" means to find another
grammar G', which is non-ambiguous and which generates the same
language L. 

We will consider three common examples of ambiguities, and the way to
eliminate them: 

   1. Precedence 
   2. Associativity 
   3. Dangling-else 

Precedence

In the examples above, the ambiguity in the interpretation of
2 + 3 * 5 can be eliminated by imposing the precedence of one operator
over the other. We say that op has precedence over op' if an expression
of the form 

      e_1 op e_2 op' e_3 (respectively e_1 op' e_2 op e_3 ) 

is interpreted only as 

      (e_1 op e_2) op' e_3 (respectively e_1 op' (e_2 op e_3) ) 

In other words, the grouping power of op is greater than the grouping
power of op'. 

From the point of view of derivation trees, the fact that e_1 op e_2 op' e_3
is interpreted as (e_1 op e_2) op' e_3 means that op must be introduced at
a level strictly lower than op', i.e. in a sub-tree whose root is a child
of the node that introduces op'. In order to modify the grammar so that it
generates only this kind of tree, a possible solution is to introduce a new
syntactic category producing expressions of the form e_1 op e_2, and to
place it hierarchically below the main category, which produces
expressions of the form e_1 op' e_2. 

Example 
      We can eliminate the ambiguities from the grammar in the example
      above by introducing a new syntactic category Term producing
      expressions of the form 

            e_1 * e_2 

      where e_1 and e_2 may contain * again, but not +. This can be done by
      organizing the productions hierarchically as follows: 

            Exp ::= Exp + Term | Term 
            Term ::= Term * Num | Num 

      This modification corresponds to giving * a higher precedence than
      + (following the mathematical convention). Consider again the
      string 2 + 3 * 5. It is easy to see that in the new grammar there is
      only one tree which can generate it: 

               Exp 
               /|\ 
              / | \ 
             /  |  \ 
           Exp  +  Term 
            |      /|\ 
            |     / | \ 
            |    /  |  \ 
          Term Term *  Num 
            |    |      | 
           Num  Num     5 
            |    | 
            2    3 
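
      Because the new grammar is unambiguous, a parser never has to choose
      between trees. The following is a minimal recursive-descent sketch,
      not taken from the notes: the names parse, exp, term, num and
      evaluate are hypothetical, trees are abbreviated to nested tuples
      (op, left, right) with numbers as leaves, and the left-recursive
      productions are implemented as loops, since a naive recursive-descent
      function for Exp ::= Exp + Term would call itself forever.

```python
import re


def parse(text):
    """Parse text with Exp ::= Exp + Term | Term and Term ::= Term * Num | Num.
    Left recursion is replaced by loops; the trees still lean to the left."""
    tokens = re.findall(r"\d+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def num():                         # Num: an integer literal
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"number expected at token {pos}")
        pos += 1
        return int(tok)

    def term():                        # Term ::= Term * Num | Num
        nonlocal pos
        tree = num()
        while peek() == "*":
            pos += 1
            tree = ("*", tree, num())
        return tree

    def exp():                         # Exp ::= Exp + Term | Term
        nonlocal pos
        tree = term()
        while peek() == "+":
            pos += 1
            tree = ("+", tree, term())
        return tree

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)


t = parse("2 + 3 * 5")
print(t, evaluate(t))   # ('+', 2, ('*', 3, 5)) 17
```

      Since term is called from inside exp, every * is grouped before the
      surrounding +, which is exactly the hierarchy Exp over Term in the
      productions.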

Associativity

Consider again the grammar for numerical expressions in the previous
example, and consider the grammar obtained by modifying the productions
for Exp in the following way: 

      Exp ::= Exp + Exp | Term

This new grammar is ambiguous. In fact, it allows two different derivation
trees for the string 2 + 3 + 5: one corresponding to the structure (2 + 3) + 5
and one corresponding to the structure 2 + (3 + 5). 

In the case of the + operator, this kind of ambiguity is not a problem,
because of its algebraic properties: + is associative, i.e. (2 + 3) + 5
and 2 + (3 + 5) have the same value. 

In general, however, an operator might not be associative. This is, for
instance, the case for the - and ^ (exponentiation) operators: (5 - 3) - 2
and 5 - (3 - 2) have different values, as do (5 ^ 3) ^ 2 and
5 ^ (3 ^ 2). 
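
The values of the two groupings can be checked directly. In this Python sketch exponentiation is written ** because ^ in Python is bitwise XOR; the explicit parentheses fix each grouping, so the results do not depend on Python's own associativity rules.

```python
# Each key is a grouping in the notation of these notes; each value is
# computed with explicit parentheses (Python writes exponentiation as **).
groupings = {
    "(5 - 3) - 2": (5 - 3) - 2,
    "5 - (3 - 2)": 5 - (3 - 2),
    "(5 ^ 3) ^ 2": (5 ** 3) ** 2,
    "5 ^ (3 ^ 2)": 5 ** (3 ** 2),
}
for expr, val in groupings.items():
    print(f"{expr} = {val}")
```

The two groupings of - give 0 and 4, and the two groupings of ^ give 15625 and 1953125.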

In order to eliminate this kind of ambiguity, we must establish whether the
operator is left-associative or right-associative. Left-associative means
that e_1 op e_2 op e_3 is interpreted as (e_1 op e_2) op e_3 (op associates
to the left). Vice versa, right-associative means that it is interpreted as
e_1 op (e_2 op e_3) (op associates to the right). 

We can impose left-associativity (resp. right-associativity) by using the
following technique: in the production introducing op, we place the
recursive occurrence of the syntactic category producing op to the left
(resp. to the right) of op, and a category that cannot produce op on the
other side. Note that in the previous example this is done for both + and
*: they are forced to be left-associative. 

Example 
      Consider the following grammar (productions) for numerical
      expressions constructed with the - operation: 

            Exp ::= Num | Exp - Exp 

      This grammar is ambiguous since it allows both the interpretations
      (5 - 3) - 2 and 5 - (3 - 2). If we want to impose
      left-associativity (following the mathematical convention), it is
      sufficient to modify the productions in the following way: 

            Exp ::= Num | Exp - Num 
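
      The left-recursive production Exp ::= Exp - Num corresponds to a loop
      that keeps extending the tree built so far on its left. A minimal
      sketch, assuming whitespace-separated tokens; parse_minus and
      evaluate are hypothetical names, and trees are abbreviated to tuples
      ('-', left, right).

```python
def parse_minus(tokens):
    """Exp ::= Num | Exp - Num, with the left recursion turned into a loop."""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("number expected")
    tree = int(tokens[0])
    i = 1
    while i < len(tokens):
        if tokens[i] != "-" or i + 1 >= len(tokens) or not tokens[i + 1].isdigit():
            raise SyntaxError(f"'- Num' expected at token {i}")
        tree = ("-", tree, int(tokens[i + 1]))   # the old tree becomes the left child
        i += 2
    return tree


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    _, left, right = tree
    return evaluate(left) - evaluate(right)


t = parse_minus("5 - 3 - 2".split())
print(t, evaluate(t))   # ('-', ('-', 5, 3), 2) 0
```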

Example 
      Consider the following grammar (productions) for numerical
      expressions constructed with the ^ operation: 

            Exp ::= Num | Exp ^ Exp 

      This grammar is ambiguous since it allows both the interpretations
      (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2). If we want to impose
      right-associativity (following the mathematical convention), it is
      sufficient to modify the productions in the following way: 

            Exp ::= Num | Num ^ Exp
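
      Symmetrically, the right-recursive production Exp ::= Num | Num ^ Exp
      maps onto a function that reads one number and then calls itself on
      the rest of the input, so the nesting accumulates on the right. Again
      a minimal sketch with hypothetical names parse_pow and evaluate,
      assuming whitespace-separated tokens; Python's ** computes the value.

```python
def parse_pow(tokens):
    """Exp ::= Num | Num ^ Exp: the recursive Exp is to the right of ^."""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("number expected")
    head = int(tokens[0])
    if len(tokens) == 1:
        return head                                  # Exp ::= Num
    if tokens[1] != "^":
        raise SyntaxError(f"'^' expected, got {tokens[1]!r}")
    return ("^", head, parse_pow(tokens[2:]))        # Exp ::= Num ^ Exp


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    _, base, exponent = tree
    return evaluate(base) ** evaluate(exponent)


t = parse_pow("5 ^ 3 ^ 2".split())
print(t, evaluate(t))   # ('^', 5, ('^', 3, 2)) 1953125
```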