Our running example will be a grammar for generating the language of the strings with the same number of a's and b's, namely the language

L = { x in {a,b}^{*} | #_{a}(x) = #_{b}(x) }

where #_{a}(x) and #_{b}(x) denote the number of occurrences of a and of b in x. We want to define a non-ambiguous grammar G for such a language, and prove formally that indeed L(G) = L and that G is non-ambiguous.

Note that the grammar

S -> lambda | a S b | b S a | S S

generates (intuitively) the given language, but it is ambiguous. For instance, the string abab has two different derivation trees:

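Writing N(...) for a node labeled N whose children are listed from left to right, the two trees are:

- S(a S(b S(lambda) a) b), obtained by applying S -> a S b at the root, S -> b S a below it, and S -> lambda at the bottom;
- S(S(a S(lambda) b) S(a S(lambda) b)), obtained by applying S -> S S at the root, and S -> a S b followed by S -> lambda in each branch.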
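The two trees can be made concrete with a short Python sketch. It assumes an encoding of our own choosing, in which a derivation tree is a nested tuple whose first element is the node label, and it uses a hypothetical helper `tree_yield` to read off the leaves. It only checks that both trees yield abab while being different trees.

```python
# Illustrative encoding (our assumption): a derivation tree is a tuple
# (label, child_1, ..., child_n); terminals are one-character strings,
# and a node ("S",) with no children stands for an application of S -> lambda.

def tree_yield(tree) -> str:
    """Concatenate the terminal leaves of a derivation tree from left to right."""
    if isinstance(tree, str):
        return tree  # a terminal symbol
    _label, *children = tree
    return "".join(tree_yield(child) for child in children)

EPS = ("S",)  # S -> lambda

# First tree: S -> a S b, inner S -> b S a, innermost S -> lambda
t1 = ("S", "a", ("S", "b", EPS, "a"), "b")
# Second tree: S -> S S, and each S -> a S b with S -> lambda below it
t2 = ("S", ("S", "a", EPS, "b"), ("S", "a", EPS, "b"))

print(tree_yield(t1), tree_yield(t2), t1 != t2)  # abab abab True
```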

Consider now the grammar

S -> lambda | a S b S | b S a S

This grammar also generates the given language, but it is still ambiguous: the string abab still has two different derivation trees:

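Writing N(...) for a node labeled N whose children are listed from left to right, the two trees are:

- S(a S(lambda) b S(a S(lambda) b S(lambda))), in which the first a is paired with the first b;
- S(a S(b S(lambda) a S(lambda)) b S(lambda)), in which the first a is paired with the last b.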
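For this grammar the ambiguity can even be measured: every production other than S -> lambda generates two terminals, so each string has finitely many derivation trees, and a memoized recursion over the position of the closing terminal can count them. The Python sketch below is one way to do this; the function name `count_trees` is ours, and inputs are assumed to be strings over {a,b}.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(x: str) -> int:
    """Number of derivation trees of x for S -> lambda | a S b S | b S a S."""
    if x == "":
        return 1  # only S -> lambda; the other productions generate non-empty strings
    opener = x[0]
    closer = "b" if opener == "a" else "a"  # a S b S starts with a, b S a S with b
    total = 0
    # Try every occurrence of `closer` as the terminal between the two S's:
    # x = opener + x[1:k] + closer + x[k+1:]
    for k in range(1, len(x)):
        if x[k] == closer:
            total += count_trees(x[1:k]) * count_trees(x[k + 1:])
    return total

print(count_trees("abab"))  # 2
print(count_trees("aabb"))  # 1
```

The same counting idea does not apply to the first grammar: there, combining S -> S S with S -> lambda gives every string of the language infinitely many derivation trees.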

Intuitively, the reason why the second grammar is ambiguous is that the production S -> a S b S does not force the b to be the "matching b" of the a. An analogous problem affects the production S -> b S a S. We could eliminate the ambiguity by forcing the first S in a S b S to generate "the shortest string" with the same number of a's and b's, and analogously for b S a S. This can be done by introducing new syntactic categories (non-terminal symbols) T and U, and by stratifying the productions as follows:

S -> lambda | a T b S | b U a S

T -> lambda | a T b T

U -> lambda | b U a U

We will call G the above grammar consisting of all the productions for S, T and U. We will also call G_{1} the grammar consisting of the productions

T -> lambda | a T b T

and we will call G_{2} the grammar consisting of the productions

U -> lambda | b U a U

Intuitively, G_{1} generates strings such as

ab, abab, aabb, aabbab, ... etc.

This language can be formally defined as the language

L_{1} = { x in {a,b}^{*} | #_{a}(x) = #_{b}(x) and for all x_{1}, x_{2} s.t. x = x_{1} b x_{2}, #_{a}(x_{1}) >= 1 + #_{b}(x_{1}) }

Note that if we replaced "a" by "(" and "b" by ")", then G_{1} would generate exactly the strings of balanced parentheses:

(), ()(), (()), (())(), ... etc.

Symmetrically, the grammar G_{2} generates the language

L_{2} = { x in {a,b}^{*} | #_{a}(x) = #_{b}(x) and for all x_{1}, x_{2} s.t. x = x_{1} a x_{2}, #_{b}(x_{1}) >= 1 + #_{a}(x_{1}) }
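The definitions of L, L_{1} and L_{2} translate directly into membership tests. The following Python sketch (the function names are our own) checks the prefix condition at each occurrence of the relevant symbol, and applies the renaming of "a" to "(" and "b" to ")" with `str.translate`.

```python
def in_L(x: str) -> bool:
    """x is in L iff #_a(x) = #_b(x)."""
    return x.count("a") == x.count("b")

def in_L1(x: str) -> bool:
    """x is in L and every prefix x1 followed by a b has #_a(x1) >= 1 + #_b(x1)."""
    return in_L(x) and all(
        x[:i].count("a") >= 1 + x[:i].count("b")
        for i, symbol in enumerate(x) if symbol == "b"
    )

def in_L2(x: str) -> bool:
    """Symmetric to in_L1, with the roles of a and b exchanged."""
    return in_L(x) and all(
        x[:i].count("b") >= 1 + x[:i].count("a")
        for i, symbol in enumerate(x) if symbol == "a"
    )

print([w for w in ["ab", "abab", "aabbab", "ba", "abba"] if in_L1(w)])  # ['ab', 'abab', 'aabbab']
print("aabbab".translate(str.maketrans("ab", "()")))  # (())()
```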

We are now going to prove, by mathematical induction and by structural induction, that G generates the required language and that it is non-ambiguous, namely:

**Proposition 1**
L(G) = L

**Proposition 2**
For all x in L(G), there exists only one derivation tree in G for x.

In order to prove Propositions 1 and 2,
we need to prove analogous properties for the subgrammars G_{1}
and G_{2}. Specifically, we need to prove the following lemmata:

**Lemma 1**
L(G_{1}) = L_{1}

**Lemma 2**
For all x in L(G_{1}), there exists only one derivation tree in G_{1} for x.

**Lemma 3**
L(G_{2}) = L_{2}

**Lemma 4**
For all x in L(G_{2}), there exists only one derivation tree in G_{2} for x.

**Proof of Lemma 1.**

**Part 1:** L(G_{1}) is contained in L_{1}.

We need to prove that, for every x in L(G_{1}), x enjoys the following properties:
- #_{a}(x) = #_{b}(x);
- for all x_{1}, x_{2} s.t. x = x_{1} b x_{2}, #_{a}(x_{1}) >= 1 + #_{b}(x_{1}).

We are going to prove these properties by structural induction on x (L(G_{1}) can be seen as defined inductively).
- base case: x = lambda. We have:
  - #_{a}(lambda) = 0 = #_{b}(lambda);
  - there exist no x_{1}, x_{2} s.t. x = x_{1} b x_{2}, hence the second property is trivially satisfied.

- inductive step: x = a y b z, where y and z are in L(G_{1}) as well. We have:
  - #_{a}(x) = #_{a}(a y b z) = 1 + #_{a}(y) + #_{a}(z)

    #_{b}(x) = #_{b}(a y b z) = #_{b}(y) + 1 + #_{b}(z).

    By inductive hypothesis #_{a}(y) = #_{b}(y) and #_{a}(z) = #_{b}(z), hence #_{a}(x) = #_{b}(x).
  - Let x = x_{1} b x_{2}. We have three cases, depending on which occurrence of b in a y b z is the one displayed:
    - x_{1} = a y. In this case we have:

      #_{a}(x_{1}) = 1 + #_{a}(y) = (by inductive hypothesis) 1 + #_{b}(y) = 1 + #_{b}(x_{1})
    - x_{1} = a y_{1}, where y_{1} is a prefix of y, namely y = y_{1} b y_{2}. In this case we have:

      #_{a}(x_{1}) = 1 + #_{a}(y_{1}) >= (by inductive hypothesis) 1 + 1 + #_{b}(y_{1}) = 2 + #_{b}(x_{1}) > 1 + #_{b}(x_{1})
    - x_{1} = a y b z_{1}, where z_{1} is a prefix of z, namely z = z_{1} b z_{2}. In this case we have:

      #_{a}(x_{1}) = 1 + #_{a}(y) + #_{a}(z_{1}) >= (by inductive hypothesis) 1 + #_{b}(y) + 1 + #_{b}(z_{1}) = 1 + #_{b}(x_{1})
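Part 1 can also be sanity-checked, though not proved, on short strings. This Python sketch, with hypothetical helper names and an arbitrary length bound, computes every string of length at most 10 derivable from T by applying T -> a T b T to the strings found so far until nothing new appears, then tests each one against the definition of L_{1}.

```python
def lang_G1(max_len: int) -> set:
    """All strings of length <= max_len derivable from T -> lambda | a T b T."""
    lang = {""}  # T -> lambda
    while True:
        # One more round of T -> a T b T, using the strings found so far for both T's
        new = {"a" + y + "b" + z
               for y in lang for z in lang
               if len(y) + len(z) + 2 <= max_len}
        if new <= lang:  # fixpoint reached: nothing new can be derived
            return lang
        lang |= new

def in_L1(x: str) -> bool:
    """Membership in L_1, following its definition literally."""
    return x.count("a") == x.count("b") and all(
        x[:i].count("a") >= 1 + x[:i].count("b")
        for i, symbol in enumerate(x) if symbol == "b"
    )

generated = lang_G1(10)
print(len(generated), all(in_L1(x) for x in generated))  # 65 True
```

The count 65 is the number of balanced strings of length at most 10 (1 + 1 + 2 + 5 + 14 + 42), consistent with the parentheses reading of G_{1}.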
**Part 2:** L_{1} is contained in L(G_{1}).

We are going to prove this part by strong mathematical induction on the length of the strings, where P(n) is the statement "every string of L_{1} of length n is in L(G_{1})". Remember that the principle of strong mathematical induction has the following schema:

If, for all n, (P(k) holds for all k < n) implies P(n), then we can deduce that P(n) holds for all n.

In practice, the principle of strong mathematical induction allows us to use the inductive hypothesis on all k < n, instead of just on n-1.

Let x be a string in L_{1}.
- if |x| = 0, then x = lambda and therefore x can be generated by using the production T -> lambda.
- if |x| > 0, then x must start with "a" (by definition of L_{1}: if x started with "b", the empty prefix would violate the condition #_{a}(x_{1}) >= 1 + #_{b}(x_{1})), and, since #_{a}(x) = #_{b}(x), x must contain a matching "b" somewhere. Let us consider the shortest y such that x = a y b z for some z and #_{a}(y) = #_{b}(y). By a case analysis similar to the one in the previous part, we can show that y and z are also in L_{1} (to prove that for all y_{1}, y_{2} s.t. y = y_{1} b y_{2}, #_{a}(y_{1}) >= 1 + #_{b}(y_{1}) holds, we must use the fact that y is the shortest string satisfying the properties x = a y b z and #_{a}(y) = #_{b}(y)).

By inductive hypothesis we have that y and z are in L(G_{1}), namely T ->^{*}y and T ->^{*}z.

Hence we can obtain a derivation

T -> a T b T ->^{*} a y b T ->^{*} a y b z

which shows that the string a y b z, namely x, is in L(G_{1}).
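The proof of Part 2 is constructive: it splits x right after the shortest y with #_{a}(y) = #_{b}(y). A minimal Python sketch of that construction follows; it assumes derivation trees encoded as nested tuples (label first, then children), and the names `parse_T`, `tree_yield` and `in_L1` are hypothetical. The variable `depth` tracks #_{a} minus #_{b} of the part of y read so far, so the first b met at depth 0 ends the shortest such y.

```python
from itertools import product

def parse_T(x: str):
    """Derivation tree of x from T, built as in the proof of Part 2.

    Trees are nested tuples (label, child, ...); raises ValueError if x is not in L_1.
    """
    if x == "":
        return ("T",)  # T -> lambda
    if x[0] != "a":
        raise ValueError(f"{x!r} is not in L_1")
    depth = 0  # #_a minus #_b of x[1:k], the candidate y read so far
    for k in range(1, len(x)):
        if x[k] == "b" and depth == 0:
            # x = a y b z, with y = x[1:k] the shortest balanced y followed by a b
            return ("T", "a", parse_T(x[1:k]), "b", parse_T(x[k + 1:]))
        depth += 1 if x[k] == "a" else -1
    raise ValueError(f"{x!r} is not in L_1")

def tree_yield(tree) -> str:
    """Concatenate the terminal leaves of a tree from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(child) for child in tree[1:])

def in_L1(x: str) -> bool:
    """Membership in L_1, following its definition literally."""
    return x.count("a") == x.count("b") and all(
        x[:i].count("a") >= 1 + x[:i].count("b")
        for i, symbol in enumerate(x) if symbol == "b")

# Bounded check: every string of L_1 of length <= 10 is parsed back to itself.
for n in range(11):
    for letters in product("ab", repeat=n):
        x = "".join(letters)
        if in_L1(x):
            assert tree_yield(parse_T(x)) == x
```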

**Proof of Lemma 2.** We proceed by structural induction on x in L(G_{1}).
- base case: x = lambda. In this case, there can be only one derivation tree for x: the one obtained by applying the production T -> lambda (the production T -> a T b T always generates a non-empty string).
- inductive step: x = a y b z, with y, z in L(G_{1}). Observe that y and z are uniquely determined by these properties. Namely, if x = a v b w, with v, w in L(G_{1}), then y = v and z = w.

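In fact, assume by contradiction that y and v are different. Since both start right after the first a of x, one of them is a proper prefix of the other; consider the case in which y is a proper prefix of v. Since the symbol following y in x is b, we must have v = y b v' for some v'. By Lemma 1, both y and v are in L_{1}. By definition of L_{1} applied to v, #_{a}(y) >= 1 + #_{b}(y) holds, which contradicts #_{a}(y) = #_{b}(y), i.e. the fact that y is in L_{1}.

The case in which v is a proper prefix of y is analogous.

Since y and v are equal, z and w must be equal as well.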

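Now, by inductive hypothesis, each of y and z has a unique derivation tree in G_{1}; let us call these trees ty and tz respectively. Since x is not empty, any derivation tree for x must apply T -> a T b T at the root, and, by the uniqueness of y and z shown above, its two T subtrees must be derivation trees for y and z, namely ty and tz. Hence x admits only one derivation tree: the one whose root T has the children a, T, b, T, where the first T is the root of ty and the second T is the root of tz.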
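Lemmas 1 and 2 together imply that a string has exactly one derivation tree from T if it is in L_{1}, and none otherwise. The Python sketch below, a bounded check rather than a proof, counts trees for T -> lambda | a T b T with a memoized recursion over the b that separates the two T's (the name `count_T` is ours) and compares the count with the definition of L_{1} for all strings of length at most 12.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def count_T(x: str) -> int:
    """Number of derivation trees of x from T in G_1."""
    if x == "":
        return 1  # only T -> lambda; T -> a T b T generates non-empty strings
    if x[0] != "a":
        return 0  # the only non-lambda production, T -> a T b T, starts with a
    # T -> a T b T: try every b as the terminal separating the two T's
    return sum(count_T(x[1:k]) * count_T(x[k + 1:])
               for k in range(1, len(x)) if x[k] == "b")

def in_L1(x: str) -> bool:
    """Membership in L_1, following its definition literally."""
    return x.count("a") == x.count("b") and all(
        x[:i].count("a") >= 1 + x[:i].count("b")
        for i, symbol in enumerate(x) if symbol == "b")

for n in range(13):
    for letters in product("ab", repeat=n):
        x = "".join(letters)
        assert count_T(x) == (1 if in_L1(x) else 0), x
```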

**Proof of Proposition 1.**

**Part 1:** L(G) is contained in L.

We need to prove that, for every x in L(G), #_{a}(x) = #_{b}(x). We are going to prove this property by structural induction.
- base case: x = lambda. We have: #_{a}(lambda) = 0 = #_{b}(lambda).
- inductive step, case 1: x = a y b z, where y is in L(G_{1}) and z is in L(G). We have:

#_{a}(x) = #_{a}(a y b z) = 1 + #_{a}(y) + #_{a}(z)

#_{b}(x) = #_{b}(a y b z) = #_{b}(y) + 1 + #_{b}(z).

By Lemma 1, #_{a}(y) = #_{b}(y), and by inductive hypothesis, #_{a}(z) = #_{b}(z); hence #_{a}(x) = #_{b}(x).
- inductive step, case 2: x = b y a z, where y is in L(G_{2}) and z is in L(G). We have:

#_{a}(x) = #_{a}(b y a z) = #_{a}(y) + 1 + #_{a}(z)

#_{b}(x) = #_{b}(b y a z) = 1 + #_{b}(y) + #_{b}(z).

By Lemma 3, #_{a}(y) = #_{b}(y), and by inductive hypothesis, #_{a}(z) = #_{b}(z); hence #_{a}(x) = #_{b}(x).
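As a bounded sanity check of Part 1, the following Python sketch computes the strings of length at most 8 derivable from S, growing the sets for S, T and U together until none of them changes, and checks that each has as many a's as b's. The helper names and the bound are our own choices.

```python
def lang_G(max_len: int) -> set:
    """Strings of length <= max_len derivable from S in G (fixpoint iteration)."""
    S, T, U = {""}, {""}, {""}  # S -> lambda, T -> lambda, U -> lambda

    def combine(opener, left, closer, right):
        # All strings opener y closer z with y in left and z in right, within the bound
        return {opener + y + closer + z
                for y in left for z in right
                if len(y) + len(z) + 2 <= max_len}

    while True:
        new_T = T | combine("a", T, "b", T)  # T -> a T b T
        new_U = U | combine("b", U, "a", U)  # U -> b U a U
        new_S = S | combine("a", T, "b", S) | combine("b", U, "a", S)  # S -> a T b S | b U a S
        if (new_S, new_T, new_U) == (S, T, U):  # fixpoint reached
            return S
        S, T, U = new_S, new_T, new_U

derived = lang_G(8)
print(len(derived), all(x.count("a") == x.count("b") for x in derived))  # 99 True
```

The count 99 is also the number of strings of length at most 8 with as many a's as b's (1 + 2 + 6 + 20 + 70), which is what Part 2 predicts.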

**Part 2:** L is contained in L(G).

We are going to prove this part by strong mathematical induction on the length of the strings. Let x be a string in L.
- if |x| = 0, then x = lambda and therefore x can be generated by using the production S -> lambda.
- if |x| > 0 and x starts with "a": since #_{a}(x) = #_{b}(x), x must contain a matching "b" somewhere. Let us consider the shortest y such that x = a y b z for some z and #_{a}(y) = #_{b}(y). By a case analysis similar to the one in the proof of Lemma 1, we can show that y is in L_{1} and that z is in L (to prove that for all y_{1}, y_{2} s.t. y = y_{1} b y_{2}, #_{a}(y_{1}) >= 1 + #_{b}(y_{1}) holds, we must use the fact that y is the shortest string satisfying the properties x = a y b z and #_{a}(y) = #_{b}(y)).

By Lemma 1 we have that y is in L(G_{1}), namely T ->^{*}y.

By inductive hypothesis we have that z is in L(G), namely S ->^{*}z.

Hence we can obtain a derivation

S -> a T b S ->^{*} a y b S ->^{*} a y b z

which shows that the string a y b z, namely x, is in L(G).
- if |x| > 0 and x starts with "b": the proof is analogous to the previous case, using the production S -> b U a S and Lemma 3 in place of S -> a T b S and Lemma 1.
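Part 2 also suggests a parser. For x starting with a, the shortest y with #_{a}(y) = #_{b}(y) ends just before the first b met when the count of a's minus b's in the part of y read so far is zero, and symmetrically for x starting with b. The Python sketch below implements this under our own assumptions: trees are nested tuples, and `split_at_match`, `parse_aux` and `parse_S` are hypothetical names.

```python
from itertools import product

def split_at_match(x: str, opener: str, closer: str) -> int:
    """Index k such that x = opener y closer z, with y the shortest string having
    as many openers as closers; raises ValueError if there is none."""
    depth = 0  # (#opener - #closer) of the part of y read so far
    for k in range(1, len(x)):
        if x[k] == closer and depth == 0:
            return k
        depth += 1 if x[k] == opener else -1
    raise ValueError(f"no matching {closer!r} in {x!r}")

def parse_aux(x: str, nt: str, opener: str, closer: str):
    """Tree of x from nt -> lambda | opener nt closer nt (T uses a/b, U uses b/a)."""
    if x == "":
        return (nt,)
    if x[0] != opener:
        raise ValueError(f"{x!r} cannot be derived from {nt}")
    k = split_at_match(x, opener, closer)
    return (nt, opener, parse_aux(x[1:k], nt, opener, closer),
            closer, parse_aux(x[k + 1:], nt, opener, closer))

def parse_S(x: str):
    """Tree of x from S -> lambda | a T b S | b U a S; raises ValueError if x is not in L."""
    if x == "":
        return ("S",)
    opener, closer, nt = ("a", "b", "T") if x[0] == "a" else ("b", "a", "U")
    k = split_at_match(x, opener, closer)
    return ("S", opener, parse_aux(x[1:k], nt, opener, closer),
            closer, parse_S(x[k + 1:]))

def tree_yield(tree) -> str:
    """Concatenate the terminal leaves of a tree from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(child) for child in tree[1:])

# Bounded check: every string of L of length <= 10 is parsed back to itself.
for n in range(11):
    for letters in product("ab", repeat=n):
        x = "".join(letters)
        if x.count("a") == x.count("b"):
            assert tree_yield(parse_S(x)) == x
```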

**Proof of Proposition 2.** We proceed by structural induction on x in L(G).
- base case: x = lambda. In this case, there can be only one derivation tree for x: the one obtained by applying the production S -> lambda (the other productions for S always generate a non-empty string).
- inductive step: x = a y b z, with y in L(G_{1}) and z in L(G). Observe that y and z are uniquely determined by these properties. Namely, if x = a v b w, with v in L(G_{1}) and w in L(G), then y = v and z = w.

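In fact, assume by contradiction that y and v are different. Since both start right after the first a of x, one of them is a proper prefix of the other; consider the case in which y is a proper prefix of v. Since the symbol following y in x is b, we must have v = y b v' for some v'. By Lemma 1, both y and v are in L_{1}. By definition of L_{1} applied to v, #_{a}(y) >= 1 + #_{b}(y) holds, which contradicts #_{a}(y) = #_{b}(y), i.e. the fact that y is in L_{1}.

The case in which v is a proper prefix of y is analogous.

Since y and v are equal, z and w must be equal as well.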

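Now, by Lemma 2, y has a unique derivation tree with root T, and by inductive hypothesis, z has a unique derivation tree with root S; let us call these trees ty and tz respectively. Since x starts with a, any derivation tree for x must apply S -> a T b S at the root, and, by the uniqueness of y and z shown above, its subtrees rooted at T and S must be ty and tz. Hence x admits only one derivation tree: the one whose root S has the children a, T, b, S, where T is the root of ty and S is the root of tz.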

- inductive step: x = b y a z, with y in L(G_{2}) and z in L(G). The proof is analogous to the previous case, using Lemmas 3 and 4 in place of Lemmas 1 and 2.
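Propositions 1 and 2 together imply that a string has exactly one derivation tree from S if it is in L, and none otherwise. The following Python sketch counts derivation trees of G by memoized recursion, using a production table that is our own encoding of G (the name `trees_in_G` is hypothetical), and checks that claim for all strings of length at most 10; it is a bounded check, not a proof.

```python
from functools import lru_cache
from itertools import product

# Non-lambda productions of G, as (opener, first nonterminal, closer, second nonterminal)
PRODUCTIONS = {
    "S": [("a", "T", "b", "S"), ("b", "U", "a", "S")],
    "T": [("a", "T", "b", "T")],
    "U": [("b", "U", "a", "U")],
}

@lru_cache(maxsize=None)
def trees_in_G(nt: str, x: str) -> int:
    """Number of derivation trees of x from nonterminal nt in G."""
    if x == "":
        return 1  # nt -> lambda; every other production generates a non-empty string
    total = 0
    for opener, left, closer, right in PRODUCTIONS[nt]:
        if x[0] != opener:
            continue
        # x = opener y closer z: try every occurrence of closer after position 0
        for k in range(1, len(x)):
            if x[k] == closer:
                total += trees_in_G(left, x[1:k]) * trees_in_G(right, x[k + 1:])
    return total

for n in range(11):
    for letters in product("ab", repeat=n):
        x = "".join(letters)
        expected = 1 if x.count("a") == x.count("b") else 0
        assert trees_in_G("S", x) == expected, x

print(trees_in_G("S", "abab"), trees_in_G("S", "abba"))  # 1 1
```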