CSE 428: Exercises on grammars


Notes:

The superscript "(d)" stands for "difficult". Exercises similar to those marked with "(d)" might appear in candidacy exams, but not in the standard exams of CSE 428.

The superscript "(s)" stands for "solution provided". You can find the solution of these exercises here.


  1. (s) Give unambiguous context-free grammars for each of the following languages:
    1. The language of identifiers, whose elements are sequences of letters or digits, starting with a letter
    2. The language of natural numbers, whose elements are sequences of digits, starting with a digit different from 0
    3. The language of real numbers, in which either the integer part or the fractional part can be empty, but not both. Thus the grammar must allow 12., 1.2, .12, but not a decimal point by itself.
  2. (s) The set of formulas (First-Order Logic Formulas) can be defined inductively as follows:
    1. An atomic formula is a formula
    2. if "A" and "B" are formulas, then also "not A", "A and B", "A or B", and "A implies B" are formulas
    3. If "A" is a formula and "x" is an identifier, then also "exists x. A" and "forall x. A" are formulas
    We want the following properties to hold:
    1. The precedence order is as follows (from highest to lowest priority): (1) not, (2) and, (3) or, (4) implies, (5) exists, forall.
    2. "and" and "or" are left-associative, "implies" is right-associative.
    3. Formulas can be precedeed by an arbitrary number of negations (i.e. "not not A", "not not not A", etc. are allowed)

    Give an unambiguous context-free grammar generating the language of formulas and respecting these properties. You may assume given productions for generating identifiers and atomic formulas starting from the non-terminals Ide and Atom respectively.

  3. (s) Consider the following productions for commands
       Cmd ::= print(Exp) | Cmd ; Cmd | if Exp then Cmd else Cmd
    
    Show that this grammar is ambiguous and explain all possible causes of ambiguity, distinguishing between those which are not relevant at the semantical level and those which are relevant. Note: the sequentialization operator ";" is associative.
  4. (s) Exercise 2.6 in Sethi's book. Note: by "parse tree" Sethi means "derivation tree"
  5. Consider the following grammar defining the language of regular expression on the alphabet {a,b,c}
       E ::= \ | 0 | a | b | c | E + E | EE | E* | (E)
    
    Where "\", "0", "a", "b", "c", "+", "*", "(" and ")" are the terminal symbols. (Remark: "\" represent the regular expression "lambda" and "0" represents the regular expression "emptyset". Here in this grammar they are just terminals, they should not be confused with the production of an empty string.)
    1. Show that the grammar is ambiguous.
    2. Give an unambiguous grammar which generates exactly the same set of strings as this one.
    3. State what associativity and precedence rules your grammar enforces.
  6. Consider the following grammar generating the language of the untyped lambda-calculus:
       M ::= Var | M M | \Var. M  | (M)
    
    in which "\", ".", "(", and ")" are terminal symbols and Var is a nonterminal which unambiguously generates identifiers. The productions for Var are not represented here.
    1. Show that the grammar is ambiguous, by giving two distinct parse trees for a single string.
    2. Give an unambiguous grammar which generates exactly the same set of strings as the one above.
  7. The following grammar generates the language of Types of mini ML:
  8.    Type ::= TVar | Type -> Type | Type * Type
    
    Note: Type and TVar are non-terminals. "->" and "*" are terminals. TVar is assumed to generate a set of names (the Type variables) "alpha", "beta", "gamma", etc. The productions for TVar are not represented here.

    Show that this grammar is ambiguous by giving two derivation trees for each of the following type expressions:

       alpha -> beta -> gamma
       alpha -> beta * gamma
       alpha * beta * gamma
    

  9. (s) Modify the grammar of previous exercise so to obtain an unambiguous grammar, generating the same language, by enforcing the following properties:
    1. -> is right associative
    2. * has higher priority than ->
    3. * is left associative

  10. Consider the following grammar, generating a subset of the language of types of a generic imperative language:
       Type ::= int | bool | ^Type | Type * Type | (Type)
    
    where "int", "bool", "^", "(" and ")" are terminal symbols.
    1. Show that the grammar is ambiguous.
    2. Give an unambiguous grammar which generates exactly the same set of strings as this one.
    3. State what associativity and precedence rules your grammar enforces.
  11. (s) Exercise 2.15 in Sethi's book. Remember that by "parse tree" Sethi means "derivation tree".
    Note1: in this exercise, the tokens are the words in boldface and the symbols between quotes. The words in italics are the syntactical categories.
    Note 2: In Question (c), the sentence "Suppose that the first production for Declarator is changed so to make * a postfix operator" means: "suppose that the first production for Declarator is changed into the following production:
    Declarator ::= Declarator *
  12. Consider the following grammar, which generates all the non-empty strings of balanced parentheses:
       P ::= () | (P) | PP
    
    1. Prove that this grammar is ambiguous.
    2. Modify the grammar so to obtain a non-ambiguous grammar generating the same language.
    3. (d) Show that the new grammar is not ambiguous.
  13. (s) Define a grammar which generates all palindromes over the English alphabet. Note: a palindrome is any string which reads the same backward as forward. For example, the empty string, "a", "aa", "aba", and "abba" are palindromes, whereas "ab", "abb", and "abc" are not.
  14. Define a grammar which generates all the strings over the alphabet {a,b} which have the same number of a's and b's. For instance, the empty string, "ab", "ba", "aabb" and "abab" belong to the language, while "a", "bba", and "aaa" don't.
  15. Define a grammar over the alphabet {a,b,c} generating all the strings of the form akbmcn with n = m + k. For instance, the empty string, "ac", "bc", "abcc" and "aabccc" belong to the language, while "a", "bacc", and "aac" don't.