CSE 428: Lecture notes 2

Derivation Tree

Derivation trees (also called "parse trees" in Sethi's book) are a way to represent the generation of strings in a grammar. They also give information about the structure of the strings, i.e. the way they are organized in syntactical categories.

Definition: Given a grammar G = < T , N , s , P > , a derivation tree t for G is a tree such that:

It is easy to see that a derivation tree represents a set of derivations (usually more than one) for the same string, and that for each derivation there is a derivation tree for the same string. Hence L(G) coincides with the set of strings generated by all possible derivation trees for G. More formally, if we denote by DT(G) the set of all derivation trees for G, we have the following result:

Proposition: L(G) = { alpha in T^* | alpha = string(t) for some t in DT(G) }

Example: Let us consider again the language of numerical expressions, with productions

Exp ::= Num | Exp + Exp | Exp * Exp

A possible derivation tree for the string 2 + 3 * 5 is the following:

This tree corresponds to several derivations for the same string, which differ only in the choice of the non-terminal to expand at each derivation step.
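As a minimal sketch of how a derivation tree and its string can be represented, the Python fragment below encodes a tree as nested tuples and reads off the leaves from left to right. The tuple encoding, the helper names leaf, node, num and string_of, and the assumption that Num rewrites directly to a numeral are all illustrative choices, not definitions taken from these notes.

```python
def leaf(symbol):
    """A leaf carries a terminal symbol and has no children."""
    return (symbol,)

def node(label, *children):
    """An internal node carries a non-terminal and its ordered children."""
    return (label, *children)

def string_of(tree):
    """Concatenate the leaf labels from left to right."""
    label, *children = tree
    if not children:
        return [label]
    symbols = []
    for child in children:
        symbols.extend(string_of(child))
    return symbols

def num(digits):
    # Assumption: Exp ::= Num, and Num rewrites directly to a numeral.
    return node("Exp", node("Num", leaf(digits)))

# One possible tree for 2 + 3 * 5: the root applies Exp ::= Exp + Exp,
# and its right child applies Exp ::= Exp * Exp.
tree = node("Exp",
            num("2"), leaf("+"),
            node("Exp", num("3"), leaf("*"), num("5")))

print(" ".join(string_of(tree)))  # 2 + 3 * 5
```

The tree fixes the structure of the string, while the order in which its internal nodes are expanded is left open; this is why one tree stands for several derivations.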

Ambiguity

The structure of an expression is usually essential to interpret its meaning. The expression 2 + 3 * 5 for example has two different values depending on its intended structure: If we assume it to be 2 + ( 3 * 5 ) (i.e. 3 and 5 grouped together by *) then the result is 17. If, on the other hand, we assume it to be ( 2 + 3 ) * 5, then the result is 25. In order to avoid this kind of ambiguity, it is essential that the grammar generates only one possible structure for each string in the language. Since the structure is represented by the derivation tree, we have the following definition:

Definition: A grammar G is ambiguous if there exists a string in L(G) which can be derived by two (or more) different derivation trees.

Example: The grammar in the example above is ambiguous: in fact, the string 2 + 3 * 5 can also be generated by the following tree:
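The ambiguity can be checked mechanically on small inputs. The sketch below enumerates every derivation tree of a token list under Exp ::= Num | Exp + Exp | Exp * Exp and evaluates each one. It assumes the input is well formed (numbers and operators alternate), abbreviates a tree as a tuple (left, op, right) with plain integers as leaves, and uses the hypothetical function names trees and value.

```python
def trees(tokens):
    """Return every derivation tree for tokens under
    Exp ::= Num | Exp + Exp | Exp * Exp, as nested tuples."""
    if len(tokens) == 1:
        return [tokens[0]]                    # Exp ::= Num
    found = []
    for i in range(1, len(tokens) - 1, 2):    # operators sit at odd positions
        op = tokens[i]
        # Choose tokens[i] as the operator introduced at the root.
        for left in trees(tokens[:i]):
            for right in trees(tokens[i + 1:]):
                found.append((left, op, right))
    return found

def value(tree):
    """Evaluate a tree according to its structure."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    if op == "+":
        return value(left) + value(right)
    return value(left) * value(right)

for t in trees([2, "+", 3, "*", 5]):
    print(t, "=", value(t))
# (2, '+', (3, '*', 5)) = 17
# ((2, '+', 3), '*', 5) = 25
```

Two trees are found for the same string, and they evaluate to different results, which is exactly the situation the definition of ambiguity describes.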

There are languages which are intrinsically ambiguous, i.e. it is not possible to eliminate their ambiguities without changing the language.

Definition: A language L is intrinsically ambiguous if it can be generated only by ambiguous grammars, i.e. for every grammar G such that L = L(G), we have that G is ambiguous.

Luckily, languages which are interesting from the point of view of programming usually are not intrinsically ambiguous, and therefore we can find non-ambiguous grammars which generate them. When a (non-intrinsically ambiguous) language L is presented by an ambiguous grammar G, "to eliminate the ambiguities of G" means to find another grammar G', which is non-ambiguous, and which generates the same language L.

We will consider three common examples of ambiguities, and the way to eliminate them:

Precedence
Associativity
Dangling-else

Precedence

In the examples above, the ambiguity in the interpretation of 2 + 3 * 5 can be eliminated by imposing the precedence of one operator over the other. We say that op has precedence over op' if an expression of the form

e₁ op e₂ op' e₃ (respectively e₁ op' e₂ op e₃)

is interpreted only as

(e₁ op e₂) op' e₃ (respectively e₁ op' (e₂ op e₃) )

In other words, the grouping power of op is greater than the grouping power of op'.

From the point of view of derivation trees, the fact that e₁ op e₂ op' e₃ is interpreted as (e₁ op e₂) op' e₃ means that the introduction of op must be done at a level strictly lower than op', i.e. in a sub-tree whose root is a child of op'. In order to modify the grammar so that it generates only this kind of tree, a possible solution is to introduce a new syntactic category producing expressions of the form e₁ op e₂, and to force a hierarchical order w.r.t. the main category of expressions of the form e₁ op' e₂.

Example: We can eliminate the ambiguities from the grammar in the example above by introducing a new syntactic category Term producing expressions of the form

e₁ * e₂

where e₁ and e₂ may contain * again, but not +. This can be done by organizing the productions hierarchically as follows:

Exp ::= Exp + Term | Term
Term ::= Term * Num | Num

This modification corresponds to assigning * a higher priority than + (following the mathematical convention). Consider again the string 2 + 3 * 5. It is easy to see that in the new grammar there is only one tree which can generate it:

              Exp
            /  |  \
         Exp   +   Term
          |       /  |  \
        Term   Term  *   Num
          |      |        |
         Num    Num       5
          |      |
          2      3
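The stratified grammar can be turned into a small parser, sketched below under some assumptions. Left-recursive productions such as Exp ::= Exp + Term cannot be followed literally by a recursive procedure, so the sketch realises each of them as a loop that folds operands to the left; this is a common implementation technique, not something prescribed by these notes. The function names parse_exp and parse_term and the tuple encoding of trees are hypothetical, and the input is assumed to be a well-formed token list.

```python
def parse_term(tokens):
    """Parse Term ::= Term * Num | Num.
    Returns the tree and the unconsumed tokens."""
    tree, rest = tokens[0], tokens[1:]
    while rest and rest[0] == "*":
        tree = (tree, "*", rest[1])   # fold to the left
        rest = rest[2:]
    return tree, rest

def parse_exp(tokens):
    """Parse Exp ::= Exp + Term | Term.
    Returns the tree and the unconsumed tokens."""
    tree, rest = parse_term(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        tree = (tree, "+", right)     # fold to the left
    return tree, rest

tree, rest = parse_exp([2, "+", 3, "*", 5])
print(tree)  # (2, '+', (3, '*', 5))
```

Because a + can only be introduced by Exp and a * only by Term, the multiplication always ends up below the addition, whichever order the operators appear in.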

Associativity

Consider again the grammar for numerical expressions in the previous example, and consider the grammar obtained by modifying the productions for Exp in the following way:

Exp ::= Exp + Exp | Term

This new grammar is ambiguous. In fact, it allows two different derivation trees for the string 2 + 3 + 5: one corresponding to the structure (2 + 3) + 5 and one corresponding to the structure 2 + (3 + 5).

In the case of the + operator, this kind of ambiguity is not a problem, because of its algebraic properties: + is associative, i.e. (2 + 3) + 5 and 2 + (3 + 5) have the same value.

In general, however, an operator might not be associative. This is for instance the case for the - and ^ (exponentiation) operators: (5 - 3) - 2 = 0 and 5 - (3 - 2) = 4 have different values, as do (5 ^ 3) ^ 2 = 15625 and 5 ^ (3 ^ 2) = 1953125.
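The difference between associative and non-associative operators is easy to confirm numerically. The helper both_groupings below is a hypothetical name introduced for this sketch; it simply computes the two possible groupings of a op b op c using functions from Python's standard operator module.

```python
import operator

def both_groupings(op, a, b, c):
    """Return the values of (a op b) op c and a op (b op c)."""
    return op(op(a, b), c), op(a, op(b, c))

print(both_groupings(operator.add, 2, 3, 5))  # (10, 10)
print(both_groupings(operator.sub, 5, 3, 2))  # (0, 4)
print(both_groupings(operator.pow, 5, 3, 2))  # (15625, 1953125)
```

Only for + do the two groupings agree, so only there is the choice of derivation tree harmless.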

In order to eliminate this kind of ambiguity, we must establish whether the operator is left-associative or right-associative. Left-associative means that e₁ op e₂ op e₃ is interpreted as (e₁ op e₂) op e₃ (op associates to the left). Vice versa, right-associative means that it is interpreted as e₁ op (e₂ op e₃) (op associates to the right).

We can impose left-associativity (resp. right-associativity) by using the following technique: in the production introducing op, we place the syntactic category producing op to the left (resp. to the right) of op. Note that in the previous example this is done for both + and *: they are forced to be left-associative.

Example: Consider the following grammar (productions) for numerical expressions constructed with the - operation:

Exp ::= Num | Exp - Exp

This grammar is ambiguous since it allows both the interpretations (5 - 3) - 2 and 5 - (3 - 2). If we want to impose the left-associativity (following the mathematical convention), it is sufficient to modify the productions in the following way:

Exp ::= Num | Exp - Num
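A possible way to see the effect of the left-associative productions is the sketch below, which builds the tree for Exp ::= Num | Exp - Num by folding the operands from the left. The names parse_minus and evaluate, the tuple encoding of trees, and the assumption that the token list alternates integers and "-" are all choices made for illustration.

```python
def parse_minus(tokens):
    """Build the tree for Exp ::= Num | Exp - Num.
    Each new Num becomes the right child of a node whose
    left child is everything parsed so far."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        if tokens[i] != "-":
            raise ValueError("expected '-'")
        tree = (tree, "-", tokens[i + 1])
    return tree

def evaluate(tree):
    """Evaluate a tree according to its structure."""
    if isinstance(tree, int):
        return tree
    left, _, right = tree
    return evaluate(left) - evaluate(right)

tree = parse_minus([5, "-", 3, "-", 2])
print(tree, evaluate(tree))  # ((5, '-', 3), '-', 2) 0
```

Since the right operand of - is always a single Num, the grouping 5 - (3 - 2) can no longer be produced.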

Example: Consider the following grammar (productions) for numerical expressions constructed with the ^ operation:

Exp ::= Num | Exp ^ Exp

This grammar is ambiguous since it allows both the interpretations (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2). If we want to impose the right-associativity (following the mathematical convention), it is sufficient to modify the productions in the following way:

Exp ::= Num | Num ^ Exp
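The right-associative productions map directly onto a recursive procedure, since the recursion is on the right of the operator. The sketch below assumes a well-formed token list of integers separated by "^"; the names parse_power and evaluate and the tuple encoding of trees are hypothetical.

```python
def parse_power(tokens):
    """Build the tree for Exp ::= Num | Num ^ Exp.
    The left operand is a single Num; the rest of the
    input is parsed recursively as the right operand."""
    if len(tokens) == 1:
        return tokens[0]
    if tokens[1] != "^":
        raise ValueError("expected '^'")
    return (tokens[0], "^", parse_power(tokens[2:]))

def evaluate(tree):
    """Evaluate a tree according to its structure."""
    if isinstance(tree, int):
        return tree
    base, _, exponent = tree
    return evaluate(base) ** evaluate(exponent)

tree = parse_power([5, "^", 3, "^", 2])
print(tree, evaluate(tree))  # (5, '^', (3, '^', 2)) 1953125
```

Python's own ** operator also groups to the right, so 5 ** 3 ** 2 yields the same value.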