CSE 428: Lecture notes

CSE 428: Lecture 3

Elimination of ambiguities from a grammar

We will consider three common examples of ambiguities, and we will see how to eliminate them:

Precedence
Associativity
Dangling-else

Precedence

In the previous example about arithmetic expressions, the ambiguity in the interpretation of 2 + 3 * 5 can be eliminated by imposing the precedence of one operator over the other. We say that op has precedence over op' if an expression of the form

e₁ op e₂ op' e₃ (respectively e₁ op' e₂ op e₃)

is interpreted only as

(e₁ op e₂) op' e₃ (respectively e₁ op' (e₂ op e₃))

In other words, op binds tighter than op'.

From the point of view of derivation trees, the fact that e₁ op e₂ op' e₃ is interpreted as (e₁ op e₂) op' e₃ means that op must be introduced at a level strictly lower than op', i.e. in a sub-tree whose root is introduced by the same production that introduced op'. In order to modify the grammar so that it generates only this kind of tree, a possible solution is to introduce a new syntactic category producing expressions of the form e₁ op e₂, and to force a hierarchical order with respect to the main category of expressions of the form e₁ op' e₂.

Example

We can eliminate the ambiguities from the grammar in the example of the arithmetic expressions by introducing a new syntactic category Term producing expressions of the form

e₁ * e₂

where e₁ and e₂ may contain * again, but not +. This can be done by organizing the productions hierarchically as follows:

Exp ::= Exp + Exp | Term
Term ::= Term * Term | Num

This modification corresponds to giving * higher precedence than + (following the mathematical convention). Consider again the string 2 + 3 * 5. It is easy to see that in the new grammar there is only one derivation tree for it:

         Exp 
         /|\ 
        / | \ 
       /  |  \ 
     Exp  +  Exp  
      |       | 
    Term     Term  
      |      /|\    
      |     / | \
      |    /  |  \
     Num Term * Term         
      |   |       |  
      2  Num     Num
          |       |
          3       5
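The uniqueness claim can be checked mechanically for small inputs. The following Python sketch counts the derivation trees of a token list under a grammar given as a dictionary. The function name count_trees and the grammar encoding are assumptions of this sketch, and FLAT is only a guess at the ambiguous grammar of the previous example, which is not reproduced in these notes. The counter assumes there are no empty productions and no cycles of unit productions.

```python
from functools import lru_cache


def count_trees(grammar, start, tokens):
    """Count the derivation trees of `tokens` rooted at `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols); any symbol that is not a key of `grammar` is a terminal.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # Number of trees rooted at `symbol` that cover tokens[i:j].
        if symbol not in grammar:  # terminal: must match exactly one token
            return int(j == i + 1 and tokens[i] == symbol)
        return sum(seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` covers tokens[i:j].
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        # Each remaining symbol needs at least one token of its own.
        return sum(sym(rhs[0], i, k) * seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return sym(start, 0, len(tokens))


NUM = [("2",), ("3",), ("5",)]

# Assumed form of the ambiguous grammar from the previous example.
FLAT = {"Exp": [("Exp", "+", "Exp"), ("Exp", "*", "Exp"), ("Num",)],
        "Num": NUM}

# The hierarchical grammar given above.
STRATIFIED = {"Exp": [("Exp", "+", "Exp"), ("Term",)],
              "Term": [("Term", "*", "Term"), ("Num",)],
              "Num": NUM}

print(count_trees(FLAT, "Exp", "2 + 3 * 5".split()))        # 2
print(count_trees(STRATIFIED, "Exp", "2 + 3 * 5".split()))  # 1
```

Counting trees for a handful of strings does not prove that a grammar is unambiguous; it only confirms the claim for the strings tried.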

Associativity

The previous grammar is unambiguous with respect to the precedence of * over +, but it still has ambiguities of another kind. In fact, it allows two different derivation trees for the string 2 + 3 + 5, one corresponding to the structure (2 + 3) + 5 and one corresponding to the structure 2 + (3 + 5). An analogous example can be shown with the operator *.

In the particular case of the + and * operators, this kind of ambiguity does not cause problems semantically, because both are associative, i.e. (2 + 3) + 5 and 2 + (3 + 5) have the same value, and analogously for *. In general, however, an operator might not be associative. This is for instance the case for the - and ^ (exponentiation) operators: (5 - 3) - 2 and 5 - (3 - 2) have different values, and so do (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2).

In order to eliminate this kind of ambiguity, we must establish whether the operator is left-associative or right-associative. Left-associative means that e₁ op e₂ op e₃ is interpreted as (e₁ op e₂) op e₃ (op associates to the left). Conversely, right-associative means that it is interpreted as e₁ op (e₂ op e₃) (op associates to the right).

We can impose left-associativity (resp. right-associativity) by using a left-recursive (resp. right-recursive) production for op. For instance, in the example of arithmetic expressions, we can enforce left-associativity of + and * in the following way:

Exp ::= Exp + Term | Term
Term ::= Term * Num | Num

This grammar is now unambiguous.
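To see which structure this grammar assigns to a string, here is a small hand-written parser in Python. A recursive-descent parser cannot call itself on a left-recursive production without looping forever, so this sketch uses a common substitute: the left recursion is replaced by a loop that keeps extending the tree built so far on its left. The name parse_exp, the whitespace-separated tokens, and the parenthesized-string output are all assumptions of this sketch.

```python
def parse_exp(tokens):
    """Parse tokens with  Exp ::= Exp + Term | Term,  Term ::= Term * Num | Num.

    Returns a fully parenthesized string that shows the derived structure.
    """
    pos = 0

    def num():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"number expected at position {pos}")
        pos += 1
        return tokens[pos - 1]

    def term():
        nonlocal pos
        tree = num()
        # Term ::= Term * Num : the tree built so far becomes the left operand.
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            tree = f"({tree} * {num()})"
        return tree

    def exp():
        nonlocal pos
        tree = term()
        # Exp ::= Exp + Term : the tree built so far becomes the left operand.
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            tree = f"({tree} + {term()})"
        return tree

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


print(parse_exp("2 + 3 + 5".split()))  # ((2 + 3) + 5)
print(parse_exp("2 + 3 * 5".split()))  # (2 + (3 * 5))
```

The first output shows the left-associative grouping of +, the second shows that * still binds tighter than +.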

Example

Consider the following grammar (productions) for numerical expressions constructed with the - operation:

Exp ::= Num | Exp - Exp

This grammar is ambiguous since it allows both interpretations, (5 - 3) - 2 and 5 - (3 - 2). To impose left-associativity (following the mathematical convention), it is sufficient to modify the productions in the following way:

Exp ::= Num | Exp - Num

Example

Consider the following grammar (productions) for numerical expressions constructed with the ^ operation:

Exp ::= Num | Exp ^ Exp

This grammar is ambiguous since it allows both interpretations, (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2). To impose right-associativity (following the mathematical convention), it is sufficient to modify the productions in the following way:

Exp ::= Num | Num ^ Exp
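The two modified grammars of the last two examples can be read directly as evaluation rules. In the Python sketch below, the left-recursive production Exp ::= Exp - Num peels off the last number and evaluates everything before it first, while the right-recursive production Exp ::= Num ^ Exp peels off the first number and evaluates everything after it first. The function names eval_minus and eval_pow and the whitespace-separated token format are assumptions of this sketch.

```python
def eval_minus(tokens):
    """Evaluate tokens using  Exp ::= Num | Exp - Num  (left-recursive)."""
    if not tokens:
        raise SyntaxError("number expected")
    last = int(tokens[-1])                 # the trailing Num
    if len(tokens) == 1:
        return last                        # Exp ::= Num
    if tokens[-2] != "-":
        raise SyntaxError("'-' expected")
    return eval_minus(tokens[:-2]) - last  # Exp ::= Exp - Num


def eval_pow(tokens):
    """Evaluate tokens using  Exp ::= Num | Num ^ Exp  (right-recursive)."""
    if not tokens:
        raise SyntaxError("number expected")
    first = int(tokens[0])                 # the leading Num
    if len(tokens) == 1:
        return first                       # Exp ::= Num
    if tokens[1] != "^":
        raise SyntaxError("'^' expected")
    return first ** eval_pow(tokens[2:])   # Exp ::= Num ^ Exp


print(eval_minus("5 - 3 - 2".split()))  # 0, i.e. (5 - 3) - 2
print(eval_pow("5 ^ 3 ^ 2".split()))    # 1953125, i.e. 5 ^ (3 ^ 2)
```

With the opposite grouping the results would be 5 - (3 - 2) = 4 and (5 ^ 3) ^ 2 = 15625, so the choice of recursive production really does change the meaning.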

Dangling-else

Imperative languages often allow two kinds of conditional statements: the if-then and the if-then-else. Let us consider a possible grammar generating these statements:

Stm ::= if Exp then Stm | if Exp then Stm else Stm | ... (other stms)

This grammar is ambiguous: the statement

if x > 0 then if x = 1 then print(1) else print(2)

can be interpreted both as

if x > 0 then ( if x = 1 then print(1) else print(2) )

and as

if x > 0 then ( if x = 1 then print(1) ) else print(2)

This ambiguity is clearly relevant for the semantics: if the value of x is 2, for example, in the first case the machine should print 2, while in the second case it should do nothing.

This ambiguity arises whenever a statement contains an unbalanced number of then and else keywords (i.e. more occurrences of then than of else). In order to eliminate it, we must establish a rule which determines, for each else, its matching then. Usually the convention is the following:

Each else matches the last unmatched then that precedes it (reading from left to right)

In order to impose this rule, one possibility is to modify the productions in the following way:

Stm ::= Bal_Stm | Unbal_Stm
Bal_Stm ::= if Exp then Bal_Stm else Bal_Stm | ... (other stms)
Unbal_Stm ::= if Exp then Stm | if Exp then Bal_Stm else Unbal_Stm

In this new grammar, the statement above can only have the first kind of structure.
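Hand-written parsers usually implement the same convention without the Bal_Stm / Unbal_Stm split: after parsing the then-branch of an if, the parser immediately consumes an else if one follows, so that the else attaches to the innermost if still open. The Python sketch below uses this greedy strategy, which is a substitute for the grammar above rather than a transcription of it. The name parse_stm, the tuple representation of trees, and the simplification that every other statement is a single token are assumptions of this sketch.

```python
def parse_stm(tokens, pos=0):
    """Parse one statement starting at tokens[pos].

    Returns (tree, next_pos). An if-statement is represented as
    ("if", condition, then_branch, else_branch), with else_branch set to
    None when there is no else. Any other statement is a single token.
    """
    if tokens[pos] != "if":
        return tokens[pos], pos + 1

    then_at = tokens.index("then", pos)          # condition ends at the next 'then'
    cond = " ".join(tokens[pos + 1:then_at])
    then_branch, pos = parse_stm(tokens, then_at + 1)

    else_branch = None
    # Greedy step: an 'else' seen here belongs to this (innermost open) if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stm(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos


source = "if x > 0 then if x = 1 then print(1) else print(2)"
tree, _ = parse_stm(source.split())
print(tree)
# ('if', 'x > 0', ('if', 'x = 1', 'print(1)', 'print(2)'), None)
```

The else ends up inside the inner if, and the outer if has no else branch, which is the first of the two structures shown earlier.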