Spring 2000, CSE 428

Evaluation of expressions

We illustrate an example of an interpreter for a simple language: the language of expressions seen in previous lectures, extended with identifiers and declarations.

The language

The language is specified by the following grammar:

   Exp ::=  Num | Ide | Exp Op Exp | let Ide = Num in Exp end
   Op  ::=  + | * | - | /

Num generates the natural numbers, which can be represented as sequences of digits. The first digit cannot be 0, except for the number 0 itself. A possible grammar for Num is the following:

   Num ::= 0 | Non_Zero_Digit Seq_Digit
   Non_Zero_Digit ::= 1 | 2 | 3 | ... | 9
   Digit ::= 0 | Non_Zero_Digit
   Seq_Digit ::= lambda | Digit Seq_Digit
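The grammar for Num translates directly into a recognizer. As a minimal C++ sketch (the function name is_num is hypothetical and not part of the interpreter), the following accepts exactly the strings generated by Num:

```cpp
#include <string>

// Hypothetical helper: true iff s is generated by
//   Num ::= 0 | Non_Zero_Digit Seq_Digit
bool is_num(const std::string& s) {
    if (s == "0") return true;                   // Num ::= 0
    if (s.empty() || s[0] < '1' || s[0] > '9')
        return false;                            // must start with Non_Zero_Digit
    for (std::string::size_type i = 1; i < s.size(); ++i)
        if (s[i] < '0' || s[i] > '9')            // Seq_Digit: zero or more Digit
            return false;
    return true;
}
```

For example, is_num("105") is true, while is_num("007") and is_num("") are false.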

Ide generates the identifiers, which we can choose to be simply sequences of letters. A possible grammar for Ide is therefore:

   Ide    ::=  Letter | Letter Ide
   Letter ::=  a | b | c | ... | z
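Similarly, a recognizer for Ide only has to accept the non-empty sequences of lowercase letters. This is a sketch with a hypothetical name (is_ide); it assumes a character set, such as ASCII, in which the letters 'a' to 'z' are contiguous:

```cpp
#include <string>

// Hypothetical helper: true iff s is generated by
//   Ide ::= Letter | Letter Ide,   Letter ::= a | b | ... | z
bool is_ide(const std::string& s) {
    if (s.empty()) return false;          // Ide needs at least one Letter
    for (char c : s)
        if (c < 'a' || c > 'z') return false;
    return true;
}
```

For example, is_ide("abc") is true, while is_ide("x1") is false.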

Note that the grammar is ambiguous (for instance, 2 + 3 * 4 can be read both as (2 + 3) * 4 and as 2 + (3 * 4)), but we will not worry about that: the input of the interpreter will be the parse tree, hence there can be no ambiguity about the structure. A grammar used to describe the parse trees directly, rather than the strings of the language, is called an "abstract syntax".

We will also not worry about the static correctness of our expressions: we will assume that it has already been checked when the parse tree is given as input to the interpreter. Remember the scheme of an interpretation-based implementation:

           _________      ________              __________      _____________
          |         |    |        |   parse    | static   |    |             |
source -> | scanner | -> | parser | - tree  -> | analyzer | -> | interpreter | -> result
          |_________|    |________|            |__________|    |_____________|

We will assume that a correct expression contains a value declaration (Ide = Num) for each of its identifiers. In this way we do not need to supply any input data during execution. This concept will be made clearer in the following section.

Correct expressions

The language contains expressions like

x + 2,
let x = 2 in 3 end + x,
let x = 2 in 3 + x end

Of these, only the last can be considered a correct expression (and its value is 5). The other two contain an undeclared identifier, i.e. an identifier occurrence that is not in the scope of any declaration.

Remember that the scope of the declaration x = n in the expression

   let x = n in e end

is the expression e, except for the parts of e in which x is redeclared, if any.

This rule determines how scopes are structured in an expression: they can be seen as nested blocks. Each identifier occurrence is associated with the innermost block that contains the occurrence and declares the identifier. Hence:

   let x = 2
    in let x = 3   
        in x + 1
       end
   end

has value 4 (the occurrence of x refers to the inner declaration x = 3), while

   let x = 2
    in let x = 3   
        in x + 1
       end
       + x
   end

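has value 6: the inner block evaluates to 3 + 1 = 4, and the last occurrence of x lies outside the inner block, so it refers to the outer declaration x = 2.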

The requirement that every occurrence of an identifier be in the scope of some declaration should be checked by the static analyzer. It cannot be specified by the context-free grammar, because it is typical context-dependent information.
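The check described above amounts to a recursive traversal that carries the set of identifiers declared by the enclosing blocks. The following is a sketch only: the struct Tree and the helpers num, ide, op and dec are hypothetical, and merely mirror the parse-tree representation introduced in the next section.

```cpp
#include <memory>
#include <set>
#include <string>

// Hypothetical minimal parse tree: type is "num", "ide", "op" or "dec";
// st holds the identifier or operator, value holds the number.
struct Tree {
    std::string type, st;
    int value;
    std::shared_ptr<Tree> left, right;
};
using TreePtr = std::shared_ptr<Tree>;

// Hypothetical helpers to build trees.
TreePtr num(int v) { return std::make_shared<Tree>(Tree{"num", "", v, nullptr, nullptr}); }
TreePtr ide(const std::string& x) { return std::make_shared<Tree>(Tree{"ide", x, 0, nullptr, nullptr}); }
TreePtr op(const std::string& o, TreePtr a, TreePtr b) { return std::make_shared<Tree>(Tree{"op", o, 0, a, b}); }
TreePtr dec(const std::string& x, int n, TreePtr e) { return std::make_shared<Tree>(Tree{"dec", x, 0, num(n), e}); }

// True iff every identifier occurrence in t is in the scope of a declaration.
// `declared` holds the identifiers declared by the enclosing let blocks.
bool all_declared(const Tree& t, std::set<std::string> declared = {}) {
    if (t.type == "num") return true;
    if (t.type == "ide") return declared.count(t.st) > 0;
    if (t.type == "op")
        return all_declared(*t.left, declared) && all_declared(*t.right, declared);
    if (t.type == "dec") {
        // x is in scope only in the body (right subtree); the left subtree is a Num.
        declared.insert(t.st);
        return all_declared(*t.right, declared);
    }
    return false;  // unknown node type
}
```

Since `declared` is passed by value, an identifier added when entering a let block is automatically forgotten when the traversal leaves that block. Applied to the three expressions above, only let x = 2 in 3 + x end passes the check.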

Structures necessary for the interpreter

We will specify our interpreter in a C++-like language.

Parse trees

First of all, we need to represent parse trees. Remember that parse trees are simplified representations of derivation trees. Examples of parse trees are:

     +
    / \
   2   *      2 + ( 3 * 4 )
      / \
     3   4 


     x
    / \
   2   +       let x = 2 in x + 3 end
      / \
     x   3

All the expressions can be represented by binary trees: numbers and identifiers are leaves, while operations and declarations have exactly two subtrees. Hence we will declare a structure of the following kind:

   class tree{
      node* root;
      tree* left;
      tree* right;
      ...
   };

It will be convenient to store the following fields in the node:

- a string representing the type of node, which can be:
  - "num": the node represents an expression consisting of a number only.
  - "ide": the node represents an expression consisting of an identifier only.
  - "op": the node represents an expression of the form Exp Op Exp. In this case the operation symbol ("+", "*", "-" or "/") is stored in the node itself, the first expression is represented in the left subtree, and the second expression in the right subtree.
  - "dec" (for "declaration"): the node is the root of an expression of the form let Ide = Num in Exp end. In this case the Ide is stored in the node itself, Num in the left subtree, and Exp in the right subtree.
- one or more fields representing the value in the case "num", the identifier in the cases "ide" and "dec", and the actual operation ("+", "*", "-" or "/") in the case "op".

In summary, we have

   class node{
      string type;
      string st; // identifier or operation in case type is "ide", "dec" or "op"
      int    value;   // value in case type is "num"
      ...
   };
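The node and tree classes above leave their methods unspecified. One possible way to flesh them out in real C++ is sketched below; the getter names match those used by eval later on (get_root, get_left, get_right, get_type, get_value, get_ide, get_op), while the constructors, the mk_* helpers and the function unparse are assumptions added for illustration. unparse rebuilds a fully parenthesized expression from a tree, which shows that the tree, unlike the ambiguous grammar, fixes the structure of the expression.

```cpp
#include <memory>
#include <string>
#include <utility>

class node {
public:
    node(std::string type, std::string st, int value)
        : type(std::move(type)), st(std::move(st)), value(value) {}
    const std::string& get_type() const { return type; }
    const std::string& get_ide() const { return st; }  // types "ide" and "dec"
    const std::string& get_op() const { return st; }   // type "op"
    int get_value() const { return value; }            // type "num"
private:
    std::string type;
    std::string st;
    int value;
};

class tree {
public:
    tree(node n, std::unique_ptr<tree> l = nullptr, std::unique_ptr<tree> r = nullptr)
        : root(std::move(n)), left(std::move(l)), right(std::move(r)) {}
    const node* get_root() const { return &root; }
    const tree* get_left() const { return left.get(); }
    const tree* get_right() const { return right.get(); }
private:
    node root;                          // stored by value here, instead of node*
    std::unique_ptr<tree> left, right;  // each tree owns its subtrees
};

// Hypothetical constructors for the four kinds of node.
std::unique_ptr<tree> mk_num(int v) {
    return std::make_unique<tree>(node("num", "", v));
}
std::unique_ptr<tree> mk_ide(const std::string& x) {
    return std::make_unique<tree>(node("ide", x, 0));
}
std::unique_ptr<tree> mk_op(const std::string& op, std::unique_ptr<tree> a, std::unique_ptr<tree> b) {
    return std::make_unique<tree>(node("op", op, 0), std::move(a), std::move(b));
}
// let x = v in body end: x in the node, Num on the left, Exp on the right.
std::unique_ptr<tree> mk_dec(const std::string& x, int v, std::unique_ptr<tree> body) {
    return std::make_unique<tree>(node("dec", x, 0), mk_num(v), std::move(body));
}

// Rebuilds a fully parenthesized expression from a parse tree.
std::string unparse(const tree* t) {
    const node* n = t->get_root();
    if (n->get_type() == "num") return std::to_string(n->get_value());
    if (n->get_type() == "ide") return n->get_ide();
    if (n->get_type() == "op")
        return "(" + unparse(t->get_left()) + " " + n->get_op() + " " + unparse(t->get_right()) + ")";
    // "dec"
    return "let " + n->get_ide() + " = " + unparse(t->get_left()) + " in " + unparse(t->get_right()) + " end";
}
```

For the two example trees shown earlier, unparse yields "(2 + (3 * 4))" and "let x = 2 in (x + 3) end". Owning the subtrees through std::unique_ptr (C++14 for std::make_unique) releases the whole tree when its root goes out of scope.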

Environments

When evaluating the parse tree, we must evaluate each identifier occurrence according to its corresponding declaration. For this purpose, we need to keep a list of (identifier, value) associations. Such a list is usually called an environment. Each time we encounter a declaration, we add an association to the environment. Each time we need the value of an identifier, we look in the environment for the most recent association inserted for that identifier (LIFO discipline). Each time we exit a block, we go back to the previous environment (i.e. we discard the association made when entering the block). This treatment of the environment implements the scoping rule seen before.

Environments can be represented as objects of the following class:

   class environment{
      class association{
          string ide;
          int    value;
      };
      association  assoc;
      environment* next;
      ...
   };
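A sketch of how the environment class might be completed in real C++, under some assumptions of ours: the association is flattened into two fields, the empty environment is represented by a null pointer, and add and lookup are static functions (instead of the member calls r->add(x,k) and r->lookup(...) used by eval below), so that they also work on the empty environment. The key design choice is that add never modifies an existing environment: it returns a new one whose first cell is the new association, so going back to the previous environment on leaving a block is simply a matter of using the old pointer again.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

class environment {
public:
    using ptr = std::shared_ptr<const environment>;

    // Returns r extended with (x, k); r itself is left unchanged.
    // r == nullptr stands for the empty environment.
    static ptr add(ptr r, std::string x, int k) {
        return ptr(new environment(std::move(x), k, std::move(r)));
    }

    // Most recent association for x (LIFO discipline). Throwing on an
    // unbound identifier is an assumption: a statically correct
    // expression never reaches that point.
    static int lookup(const environment* r, const std::string& x) {
        for (; r != nullptr; r = r->next.get())
            if (r->ide == x) return r->value;
        throw std::runtime_error("undeclared identifier: " + x);
    }

private:
    environment(std::string x, int k, ptr n)
        : ide(std::move(x)), value(k), next(std::move(n)) {}

    std::string ide;  // the association (ide, value)
    int value;
    ptr next;         // the rest of the environment
};
```

For example, after r1 = add(nullptr, "x", 2) and r2 = add(r1, "x", 3), looking up x gives 3 in r2 and still gives 2 in r1.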

The interpreter

We are now ready to outline the function representing the interpreter. We will call such a function "eval" (for "evaluation"). We will define it recursively. We need to pass the environment as a parameter of eval. In fact, when evaluating a block like

   let x = n in e end

the recursive call of eval on e will need to be executed in an environment enriched with the association (x,n). In the following, we use various methods to access the information in the parse tree and in the environment; we use descriptive names in the hope that their meaning will be clear.

Note: the program is written in a C++-like language, meaning that we use features that we find convenient even if they are not allowed in real C++ programs (for instance, strings in a switch statement). Translating the program into real C++ should not be difficult.

   int eval(tree* t, environment* r){
          node* n = t->get_root();
          string ty = n->get_type();
          switch (ty) {                      // not legal C++: switch on a string
            case "num":                      // a number evaluates to itself
               return n->get_value();
            case "ide":                      // most recent association for the identifier
               return r->lookup(n->get_ide());
            case "op" : {                    // Exp Op Exp: both operands evaluated in r
               int k1 = eval(t->get_left(), r);
               int k2 = eval(t->get_right(), r);
               switch (n->get_op()) {
                  case "+":
                     return k1 + k2;
                  case "*":
                     return k1 * k2;
                  case "-":
                     return k1 - k2;
                  case "/":                  // integer division
                     return k1 / k2;
               }
            }
            case "dec": {                    // let x = k in e end
               string x = n->get_ide();
               int k = t->get_left()->get_root()->get_value();
               environment* r1 = r->add(x,k);     // r extended with (x,k)
               return eval(t->get_right(), r1);   // e is evaluated in r1
            }
          }
   }
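As a sketch of the translation into real C++ (not the original program), the version below replaces the switch statements on strings with if/else chains and makes the function total: an unknown operator or node type, or a division by zero (which the static analyzer cannot rule out in general), throws an exception instead of falling through or invoking undefined behaviour. The types Tree and Env and the helpers num, ide, op and let are hypothetical; the environment is an immutable linked list in which nullptr is the empty environment.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Parse tree: type is "num", "ide", "op" or "dec", as in the notes;
// st holds the identifier or the operator, value holds the number.
struct Tree {
    std::string type, st;
    int value;
    std::shared_ptr<Tree> left, right;
};
using TreePtr = std::shared_ptr<Tree>;

// Environment: immutable list of (identifier, value); nullptr is empty.
struct Env {
    std::string ide;
    int value;
    std::shared_ptr<const Env> next;
};
using EnvPtr = std::shared_ptr<const Env>;

// Returns r extended with (x, k); r itself is not modified.
EnvPtr add(EnvPtr r, const std::string& x, int k) {
    return std::make_shared<Env>(Env{x, k, std::move(r)});
}

// The most recently added association for x wins.
int lookup(const Env* r, const std::string& x) {
    for (; r != nullptr; r = r->next.get())
        if (r->ide == x) return r->value;
    throw std::runtime_error("undeclared identifier: " + x);
}

int eval(const Tree* t, const EnvPtr& r) {
    if (t->type == "num") return t->value;
    if (t->type == "ide") return lookup(r.get(), t->st);
    if (t->type == "op") {
        int k1 = eval(t->left.get(), r);
        int k2 = eval(t->right.get(), r);
        if (t->st == "+") return k1 + k2;
        if (t->st == "*") return k1 * k2;
        if (t->st == "-") return k1 - k2;
        if (t->st == "/") {
            if (k2 == 0) throw std::runtime_error("division by zero");
            return k1 / k2;  // integer division, truncating toward zero
        }
        throw std::runtime_error("unknown operator: " + t->st);
    }
    if (t->type == "dec") {
        // let x = n in e end: e is evaluated in r extended with (x, n).
        // The caller still holds r, so the association is discarded
        // as soon as this call returns.
        EnvPtr r1 = add(r, t->st, t->left->value);
        return eval(t->right.get(), r1);
    }
    throw std::runtime_error("unknown node type: " + t->type);
}

// Hypothetical helpers for building parse trees.
TreePtr num(int v) { return std::make_shared<Tree>(Tree{"num", "", v, nullptr, nullptr}); }
TreePtr ide(const std::string& x) { return std::make_shared<Tree>(Tree{"ide", x, 0, nullptr, nullptr}); }
TreePtr op(const std::string& o, TreePtr a, TreePtr b) { return std::make_shared<Tree>(Tree{"op", o, 0, a, b}); }
TreePtr let(const std::string& x, int n, TreePtr e) { return std::make_shared<Tree>(Tree{"dec", x, 0, num(n), e}); }
```

For example, eval(let("x", 2, op("+", num(3), ide("x"))).get(), nullptr) returns 5, the value given earlier for let x = 2 in 3 + x end; the two nested-block examples evaluate to 4 and 6 in the same way.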