## CSE 428: Lecture 3

### Elimination of ambiguities from a grammar

We will consider three common kinds of ambiguity, and we will see how to eliminate each of them.

### Precedence

In the previous example about arithmetic expressions, the ambiguity in the interpretation of 2 + 3 * 5 can be eliminated by imposing the precedence of one operator over the other. We say that op has precedence over op' if an expression of the form

e1 op e2 op' e3 (respectively e1 op' e2 op e3)

is interpreted only as

(e1 op e2) op' e3 (respectively e1 op' (e2 op e3))

In other words, op binds tighter than op'.

From the point of view of derivation trees, the fact that e1 op e2 op' e3 is interpreted as (e1 op e2) op' e3 means that op must be introduced at a level strictly lower than op', i.e. in a sub-tree of the node created by the production that introduces op'. In order to modify the grammar so that it generates only this kind of tree, a possible solution is to introduce a new syntactic category producing expressions of the form e1 op e2, and to force a hierarchical order with respect to the main category of expressions of the form e1 op' e2.
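To make the effect of precedence concrete, the following minimal sketch enumerates the value of every derivation tree of a token list. It assumes the ambiguous grammar Exp ::= Exp + Exp | Exp * Exp | Num for the previous example (that grammar is not shown in this section), and the function name `parse_values` and its flag are hypothetical. With the flag set, * may appear at the root only when no + is present, which is one way of saying that * must be introduced strictly lower than +.

```python
def parse_values(tokens, use_precedence=False):
    """Return the value of every derivation tree of tokens.

    Assumed grammar: Exp ::= Exp + Exp | Exp * Exp | Num.
    tokens alternates numbers and operators, e.g. [2, "+", 3, "*", 5].
    """
    if len(tokens) == 1:
        return [tokens[0]]
    values = []
    # Try every operator position as the root of the derivation tree.
    for i in range(1, len(tokens), 2):
        op = tokens[i]
        # With precedence, '*' cannot be the root while a '+' is available.
        if use_precedence and op == "*" and "+" in tokens:
            continue
        for left in parse_values(tokens[:i], use_precedence):
            for right in parse_values(tokens[i + 1:], use_precedence):
                values.append(left + right if op == "+" else left * right)
    return values


print(parse_values([2, "+", 3, "*", 5]))                       # [17, 25]
print(parse_values([2, "+", 3, "*", 5], use_precedence=True))  # [17]
print(parse_values([2, "+", 3, "+", 5], use_precedence=True))  # [10, 10]
```

The last call still reports two trees for 2 + 3 + 5: precedence alone does not decide how repeated occurrences of the same operator are grouped.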

### Associativity

A grammar stratified in this way (for instance Exp ::= Exp + Exp | Term together with Term ::= Term * Term | Num) is unambiguous regarding the precedence of * with respect to +, but still has ambiguities of another kind. In fact, it allows two different derivation trees for the string 2 + 3 + 5, one corresponding to the structure (2 + 3) + 5 and one corresponding to the structure 2 + (3 + 5). An analogous example can be shown with the operator *.

In the particular case of the + and the * operators, this kind of ambiguity does not cause problems semantically, because they are both associative, i.e. (2 + 3) + 5 and 2 + (3 + 5) have the same value, and analogously for *. In general, however, an operator might not be associative. This is for instance the case for the - and ^ (exponentiation) operators: (5 - 3) - 2 and 5 - (3 - 2) have different values, as do (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2).
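The four values can be checked directly. The snippet below is only an illustration in Python, where exponentiation is written `**` rather than ^.

```python
# Subtraction: the two groupings give different results.
left_minus = (5 - 3) - 2
right_minus = 5 - (3 - 2)
print(left_minus, right_minus)   # 0 4

# Exponentiation: the two groupings give different results as well.
left_power = (5 ** 3) ** 2
right_power = 5 ** (3 ** 2)
print(left_power, right_power)   # 15625 1953125
```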

In order to eliminate this kind of ambiguity, we must establish whether the operator is left-associative or right-associative. Left-associative means that e1 op e2 op e3 is interpreted as (e1 op e2) op e3 (op associates to the left). Vice versa, right-associative means that it is interpreted as e1 op (e2 op e3) (op associates to the right).

We can impose left-associativity (resp. right-associativity) by using a left-recursive (resp. right-recursive) production for op. For instance, in the example of arithmetic expressions, we can enforce left-associativity of + and * in the following way:

Exp ::= Exp + Term | Term
Term ::= Term * Num | Num

This grammar is now unambiguous.
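As an illustration, here is a minimal sketch of a hand-written parser for the two productions above. A recursive-descent parser cannot call itself on a left-recursive production without looping forever, so the left recursion is replaced here by a loop that keeps extending the tree on the left; this is a common implementation technique, not something prescribed by the grammar itself. The tuple representation of trees and the function names are assumptions made for the example.

```python
def parse(tokens):
    """Build the tree for Exp ::= Exp + Term | Term, Term ::= Term * Num | Num.

    tokens alternates integers and the strings "+" and "*".
    Trees are integers (leaves) or tuples (operator, left, right).
    """
    pos = 0

    def num():
        nonlocal pos
        if pos >= len(tokens) or not isinstance(tokens[pos], int):
            raise SyntaxError(f"expected a number at position {pos}")
        value = tokens[pos]
        pos += 1
        return value

    def term():
        nonlocal pos
        tree = num()
        # Term ::= Term * Num: the tree built so far becomes the left operand.
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            tree = ("*", tree, num())
        return tree

    def exp():
        nonlocal pos
        tree = term()
        # Exp ::= Exp + Term: the tree built so far becomes the left operand.
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            tree = ("+", tree, term())
        return tree

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


print(parse([2, "+", 3, "+", 5]))  # ('+', ('+', 2, 3), 5)
print(parse([2, "+", 3, "*", 5]))  # ('+', 2, ('*', 3, 5))
```

Each input yields exactly one tree: repeated + groups to the left, and * always ends up below +.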

#### Example

Consider the following grammar (productions) for numerical expressions constructed with the - operation:

Exp ::= Num | Exp - Exp

This grammar is ambiguous, since it allows both the interpretations (5 - 3) - 2 and 5 - (3 - 2). If we want to impose left-associativity (following the mathematical convention), it is sufficient to modify the productions in the following way:

Exp ::= Num | Exp - Num
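Under the production Exp ::= Num | Exp - Num, the left operand of each - is the whole expression built so far, which amounts to a left fold over the numbers. A small sketch, where the function name `eval_minus` and the list-of-numbers input are assumptions for the example:

```python
from functools import reduce


def eval_minus(nums):
    """Evaluate n1 - n2 - ... - nk as grouped by Exp ::= Num | Exp - Num."""
    # acc holds the value of the Exp on the left, n is the Num on the right.
    return reduce(lambda acc, n: acc - n, nums)


print(eval_minus([5, 3, 2]))  # 0, i.e. (5 - 3) - 2
```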

#### Example

Consider the following grammar (productions) for numerical expressions constructed with the ^ operation:

Exp ::= Num | Exp ^ Exp

This grammar is ambiguous, since it allows both the interpretations (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2). If we want to impose right-associativity (following the mathematical convention), it is sufficient to modify the productions in the following way:

Exp ::= Num | Num ^ Exp
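A right-recursive production translates directly into a recursive function: the right operand of ^ is the value of the rest of the expression. A small sketch, where the function name `eval_power` and the list-of-numbers input are assumptions for the example:

```python
def eval_power(nums):
    """Evaluate n1 ^ n2 ^ ... ^ nk as grouped by Exp ::= Num | Num ^ Exp."""
    head, rest = nums[0], nums[1:]
    if not rest:
        return head                     # Exp ::= Num
    return head ** eval_power(rest)     # Exp ::= Num ^ Exp


print(eval_power([5, 3, 2]))  # 1953125, i.e. 5 ^ (3 ^ 2)
```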

### Dangling-else

Imperative languages often allow two kinds of conditional statements: the if-then and the if-then-else. Let us consider a possible grammar generating these statements:

Stm ::= if Exp then Stm | if Exp then Stm else Stm | ... (other stms)

This grammar is ambiguous. In fact, the statement

if x > 0 then if x = 1 then print(1) else print(2)

can be interpreted both as

if x > 0 then ( if x = 1 then print(1) else print(2) )

and as

if x > 0 then ( if x = 1 then print(1) ) else print(2)

This ambiguity is clearly relevant for the semantics: if the value of x is 2, for example, in the first case the machine should print 2, while in the second case it should do nothing.
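The difference between the two readings can be observed by writing each of them out explicitly. In the sketch below, Python's indentation plays the role of the parentheses; the function names are hypothetical, and the printed values are collected in a list so that the two results can be compared.

```python
def first_reading(x):
    """if x > 0 then ( if x = 1 then print(1) else print(2) )"""
    printed = []
    if x > 0:
        if x == 1:
            printed.append(1)
        else:
            printed.append(2)
    return printed


def second_reading(x):
    """if x > 0 then ( if x = 1 then print(1) ) else print(2)"""
    printed = []
    if x > 0:
        if x == 1:
            printed.append(1)
    else:
        printed.append(2)
    return printed


print(first_reading(2))   # [2]
print(second_reading(2))  # []
```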

This ambiguity originates whenever a statement contains an unbalanced number of then and else (i.e. more then than else). In order to eliminate it, we must establish a rule which determines, for each else, its matching then. Usually the convention is the following:

Each else matches the last (from left-to-right) unmatched then

In order to impose this rule, one possibility is to modify the productions in the following way:

Stm ::= Bal_Stm | Unbal_Stm
Bal_Stm ::= if Exp then Bal_Stm else Bal_Stm | ... (other stms)
Unbal_Stm ::= if Exp then Stm | if Exp then Bal_Stm else Unbal_Stm

In this new grammar, the statement above can only have the first kind of structure.
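The matching rule can also be sketched as a small parser. Note that the code below does not implement the Bal_Stm / Unbal_Stm productions literally: it uses the usual greedy strategy, in which an else is consumed by the innermost if that is still open, and this gives each else the same matching then as the convention above. For simplicity it assumes that every condition and every other statement is a single token; the function name `parse_stm` and the tuple representation of trees are likewise assumptions.

```python
def parse_stm(tokens, pos=0):
    """Return (tree, next_pos) for the statement starting at tokens[pos].

    Trees are a token (any other statement), ("if", cond, then_branch),
    or ("if", cond, then_branch, else_branch).
    """
    if tokens[pos] != "if":
        return tokens[pos], pos + 1      # other statement: one token
    cond = tokens[pos + 1]               # condition: one token (assumption)
    if tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then'")
    then_branch, pos = parse_stm(tokens, pos + 3)
    # Greedy step: an 'else' found here goes to this innermost open 'if'.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stm(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return ("if", cond, then_branch), pos


tokens = ["if", "x>0", "then", "if", "x=1", "then", "print(1)", "else", "print(2)"]
tree, _ = parse_stm(tokens)
print(tree)  # ('if', 'x>0', ('if', 'x=1', 'print(1)', 'print(2)'))
```

The resulting tree attaches the else to the inner if, which is the first of the two structures discussed above.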