The language is specified by the following grammar:
    Exp ::= Num | Ide | Exp Op Exp | let Ide = Num in Exp end
    Op  ::= + | * | - | /

Num generates the natural numbers, which can be represented as sequences of digits. The first digit cannot be 0, except for the number 0 itself. A possible grammar for Num is the following:
    Num            ::= 0 | Non_Zero_Digit Seq_Digit
    Non_Zero_Digit ::= 1 | 2 | 3 | ... | 9
    Digit          ::= 0 | Non_Zero_Digit
    Seq_Digit      ::= lambda | Digit Seq_Digit

Here lambda denotes the empty string. Ide generates the identifiers, which can be chosen to simply be sequences of letters. A possible grammar for Ide is therefore:
    Ide    ::= Letter | Letter Ide
    Letter ::= a | b | c | ... | z

Note that the grammar is ambiguous, but we will not worry about that: the input of the interpreter will be the parse tree, hence there will be no ambiguity about the structure. A grammar which is used to describe the parse trees directly is called "abstract syntax".
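The rules for Num and Ide are simple enough to be checked by two small predicates. The following is only a sketch in real C++; the names is_num and is_ide are ours and are not part of the language definition.

```cpp
#include <string>

// Num ::= 0 | Non_Zero_Digit Seq_Digit
// Accepts "0", or a sequence of digits whose first digit is not 0.
bool is_num(const std::string& s) {
    if (s.empty()) return false;
    if (s == "0") return true;
    if (s[0] < '1' || s[0] > '9') return false;
    for (char c : s)
        if (c < '0' || c > '9') return false;
    return true;
}

// Ide ::= Letter | Letter Ide, with Letter ::= a | b | ... | z
// Accepts any non-empty sequence of lowercase letters.
bool is_ide(const std::string& s) {
    if (s.empty()) return false;
    for (char c : s)
        if (c < 'a' || c > 'z') return false;
    return true;
}
```

For instance, is_num accepts "0" and "120" but rejects "012". Note that, with the grammar exactly as written, is_ide also accepts the words "let", "in" and "end"; a scanner would presumably recognize these as keywords before classifying a word as an identifier.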
In order to define the parse tree, we have to decide what the tokens of our language are. We will assume that the tokens are the numbers, the identifiers, the operators, and the keywords "let", "in", and "end". The characteristic of tokens is that they do not generate a complex (parse tree) structure: all the information about a token is contained in one node of the parse tree.
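To make the notion of token concrete, here is a sketch of a scanner that splits a source string into tokens. The Token structure, the kind names and the function scan are our own assumptions. In particular, we treat the symbol "=" as a token of its own kind ("eq"), which the list above does not mention, and we do not reject numbers with leading zeros.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token representation: a kind plus the token's text.
struct Token {
    std::string kind;  // "num", "ide", "op", "key" or "eq"
    std::string text;
};

std::vector<Token> scan(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            // A maximal run of digits is one number token.
            std::size_t j = i;
            while (j < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[j]))) ++j;
            out.push_back({"num", src.substr(i, j - i)});
            i = j;
        } else if (c >= 'a' && c <= 'z') {
            // A maximal run of letters is a keyword or an identifier.
            std::size_t j = i;
            while (j < src.size() && src[j] >= 'a' && src[j] <= 'z') ++j;
            std::string word = src.substr(i, j - i);
            bool key = (word == "let" || word == "in" || word == "end");
            out.push_back({key ? "key" : "ide", word});
            i = j;
        } else if (c == '+' || c == '*' || c == '-' || c == '/') {
            out.push_back({"op", std::string(1, src[i])});
            ++i;
        } else if (c == '=') {
            out.push_back({"eq", "="});
            ++i;
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    return out;
}
```

Applied to "let x = 2 in x + 3 end", scan produces nine tokens. Each token carries only a kind and a text, which is the information that ends up in a single node of the parse tree.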
We will not worry about the static correctness of our expressions. We will assume that the static correctness has already been checked when the parse tree is given as input to the interpreter. Remember the scheme of an interpretation-based implementation:
            _________      ________             __________      _____________
           |         |    |        |   parse   |  static  |    |             |
 source -> | scanner | -> | parser | - tree -> | analyzer | -> | interpreter | -> result
           |_________|    |________|           |__________|    |_____________|
We will assume that a correct expression contains the value declarations (Ide = Num) for all its identifiers. In this way we do not need to supply any input data during execution. This concept will be made clearer in the following section.
Remember that the scope of the declaration x = n in the expression
    let x = n in e end

is the expression e, except for the parts of e in which x is redeclared, if any.
This rule tells how the "scopes" are structured in an expression. They can be seen as nested blocks. Each identifier occurrence is associated with the innermost block (containing the occurrence) where the identifier is declared. Hence:
    let x = 2 in let x = 3 in x + 1 end end

has value 4, while

    let x = 2 in let x = 3 in x + 1 end + x end

has value 6.
The requirement that every occurrence of an identifier be in the scope of some declaration should be checked by the static analyzer. It cannot be specified by the context-free grammar, because it is typical context-dependent information.
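The check performed by the static analyzer can be sketched as a recursive visit that keeps a stack of the identifiers currently in scope. The code below is only an illustration: the type Exp, the helper mk and the function well_scoped are hypothetical and are defined just for this sketch. We assume that a declaration node stores the declared identifier in st, its number as left child and the body as right child.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified tree type used only in this sketch.
struct Exp {
    std::string type;  // "num", "ide", "op" or "dec"
    std::string st;    // identifier or operator
    int value = 0;     // value when type is "num"
    std::unique_ptr<Exp> left, right;
};
using ExpPtr = std::unique_ptr<Exp>;

// Helper that builds one node with the given children.
ExpPtr mk(std::string type, std::string st, int value,
          ExpPtr left = nullptr, ExpPtr right = nullptr) {
    ExpPtr e(new Exp);
    e->type = std::move(type);
    e->st = std::move(st);
    e->value = value;
    e->left = std::move(left);
    e->right = std::move(right);
    return e;
}

// True if every identifier occurrence in e is in the scope of a declaration.
// 'declared' is the stack of identifiers declared by the enclosing blocks.
bool well_scoped(const Exp* e, std::vector<std::string>& declared) {
    if (e == nullptr || e->type == "num") return true;
    if (e->type == "ide")
        return std::find(declared.begin(), declared.end(), e->st)
               != declared.end();
    if (e->type == "op") {
        // Visit both operands, so that the stack is handled uniformly.
        bool l = well_scoped(e->left.get(), declared);
        bool r = well_scoped(e->right.get(), declared);
        return l && r;
    }
    // "dec": the identifier is visible only inside the body (right child).
    declared.push_back(e->st);
    bool ok = well_scoped(e->right.get(), declared);
    declared.pop_back();
    return ok;
}
```

With an initially empty stack, the tree for let x = 2 in x + 3 end passes the check, while the tree for (let x = 2 in x end) + x does not, because the last occurrence of x lies outside the scope of the declaration.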
First of all, we need to represent parse trees. Remember that parse trees are simplified representations of derivation trees. Examples of parse trees are:
       +
      / \
     2   *           2 + ( 3 * 4 )
        / \
       3   4

       x
      / \
     2   +           let x = 2 in x + 3 end
        / \
       x   3

We can easily see that all the expressions can be represented by using binary trees. Hence we will declare a structure of the following kind:
    class tree{
       node* root;
       tree* left;
       tree* right;
       ...
    }

It will be convenient to allocate in the node the following fields:
    class node{
       string type;
       string st;    // identifier or operation in case type is "ide", "dec" or "op"
       int value;    // value in case type is "num"
       ...
    }
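As an illustration, the following sketch shows how the parse tree of let x = 2 in x + 3 end could be built with structures of this kind. It is written in real C++ and makes several assumptions of ours: the fields are public, the destructor of tree owns its node and its subtrees, and the helper functions leaf_num, leaf_ide, make_op, make_dec and example are hypothetical names.

```cpp
#include <string>

struct node {
    std::string type;  // "num", "ide", "op" or "dec"
    std::string st;    // identifier or operation
    int value;         // value in case type is "num"
};

struct tree {
    node* root;
    tree* left;
    tree* right;
    tree(node* n, tree* l = nullptr, tree* r = nullptr)
        : root(n), left(l), right(r) {}
    // Assumption: a tree owns its node and its subtrees.
    ~tree() { delete root; delete left; delete right; }
    tree(const tree&) = delete;
    tree& operator=(const tree&) = delete;
};

tree* leaf_num(int k) { return new tree(new node{"num", "", k}); }
tree* leaf_ide(const std::string& x) { return new tree(new node{"ide", x, 0}); }
tree* make_op(const std::string& op, tree* l, tree* r) {
    return new tree(new node{"op", op, 0}, l, r);
}
// Declaration node: the number is the left child, the body the right child.
tree* make_dec(const std::string& x, int k, tree* body) {
    return new tree(new node{"dec", x, 0}, leaf_num(k), body);
}

// Parse tree of: let x = 2 in x + 3 end
tree* example() {
    return make_dec("x", 2, make_op("+", leaf_ide("x"), leaf_num(3)));
}
```

The root of the resulting tree is the declaration node for x, its left child is the leaf 2, and its right child is the tree of x + 3, as in the second picture above.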
Environments can be represented as objects of the following class:
    class environment{
       class association{
          string ide;
          int value;
       }
       association assoc;
       environment* next;
       ...
    }
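A possible concrete version of this class is sketched below. It keeps the idea of a linked list of associations, but, as an assumption of ours, it hides the list cells in an inner structure (cell) and keeps a pointer to the first one (head), so that an empty environment can be represented. The methods add, pop and lookup are our guess at the operations an interpreter needs.

```cpp
#include <stdexcept>
#include <string>

class environment {
    struct association { std::string ide; int value; };
    struct cell { association assoc; cell* next; };
    cell* head = nullptr;  // most recent association first

public:
    environment() = default;
    environment(const environment&) = delete;
    environment& operator=(const environment&) = delete;
    ~environment() { while (head) pop(); }

    // Add (x, k) in front, hiding any previous association for x.
    void add(const std::string& x, int k) {
        head = new cell{{x, k}, head};
    }

    // Remove the most recently added association.
    void pop() {
        if (!head) throw std::logic_error("pop on empty environment");
        cell* old = head;
        head = head->next;
        delete old;
    }

    // Value of the most recent association for x.
    int lookup(const std::string& x) const {
        for (const cell* c = head; c; c = c->next)
            if (c->assoc.ide == x) return c->assoc.value;
        throw std::runtime_error("undeclared identifier: " + x);
    }
};
```

Since lookup scans the list from the most recent association, adding (x,2) and then (x,3) makes lookup("x") return 3; after one pop it returns 2 again. This is the behaviour required by the scoping rule for nested declarations.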
When evaluating an expression of the form

    let x = n in e end

the recursive call of eval on e will need to be executed in an environment enriched with the association (x,n). In the following, we use various methods to access the information in the parse tree and in the environment. We use significant names in the hope that their meaning will be clear.
Note: the program is written in a C++-like language, meaning that we use features that we find convenient, even if they are not allowed in real C++ programs (for instance, the type string in the switch statement). Translating the program to a real C++ program should not be difficult.
    int eval(tree* t, environment* r){
       node* n = t->get_root();
       string ty = n->get_type();
       switch (ty) {
          case "num": return n->get_value();
          case "ide": return r->lookup(n->get_ide());
          case "op" : {
             int k1 = eval(t->get_left(), r);
             int k2 = eval(t->get_right(), r);
             switch (n->get_op()) {
                case "+": return k1 + k2;
                case "*": return k1 * k2;
                case "-": return k1 - k2;
                case "/": return k1 / k2;
             }
          }
          case "dec": {
             string x = n->get_ide();
             int k = t->get_left()->get_root()->get_value();
             r->add(x,k);
             int result = eval(t->get_right(), r);
             r->pop();
             return result;
          }
       }
    }
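One possible translation into real C++ is sketched below. It is not the only way to do it, and it departs from the program above in several points that are our own choices: the fields are public instead of being accessed through methods, the node is stored by value in the tree, the environment is a stack built on std::vector and is passed by reference, the switch on strings is replaced by a chain of if statements, and errors (unknown operator, division by zero, undeclared identifier) raise exceptions. The helpers num, ide, op and let are hypothetical names used to build trees.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct node {
    std::string type;  // "num", "ide", "op" or "dec"
    std::string st;    // identifier or operation
    int value;         // value in case type is "num"
};

struct tree {
    node root;
    tree* left;
    tree* right;
    tree(node n, tree* l = nullptr, tree* r = nullptr)
        : root(std::move(n)), left(l), right(r) {}
    ~tree() { delete left; delete right; }
    tree(const tree&) = delete;
    tree& operator=(const tree&) = delete;
};

// Environment as a stack of (identifier, value) pairs, newest last.
class environment {
    std::vector<std::pair<std::string, int>> assocs;
public:
    void add(const std::string& x, int k) { assocs.emplace_back(x, k); }
    void pop() { assocs.pop_back(); }
    int lookup(const std::string& x) const {
        for (auto it = assocs.rbegin(); it != assocs.rend(); ++it)
            if (it->first == x) return it->second;
        throw std::runtime_error("undeclared identifier: " + x);
    }
};

int eval(const tree* t, environment& r) {
    const node& n = t->root;
    if (n.type == "num") return n.value;
    if (n.type == "ide") return r.lookup(n.st);
    if (n.type == "op") {
        int k1 = eval(t->left, r);
        int k2 = eval(t->right, r);
        if (n.st == "+") return k1 + k2;
        if (n.st == "*") return k1 * k2;
        if (n.st == "-") return k1 - k2;
        if (n.st == "/") {
            if (k2 == 0) throw std::runtime_error("division by zero");
            return k1 / k2;
        }
        throw std::runtime_error("unknown operator: " + n.st);
    }
    if (n.type == "dec") {
        // Evaluate the body with (x, k) added, then restore the environment.
        r.add(n.st, t->left->root.value);
        int result = eval(t->right, r);
        r.pop();
        return result;
    }
    throw std::runtime_error("unknown node type: " + n.type);
}

// Hypothetical helpers to build parse trees by hand.
tree* num(int k) { return new tree(node{"num", "", k}); }
tree* ide(const std::string& x) { return new tree(node{"ide", x, 0}); }
tree* op(const std::string& o, tree* l, tree* r) {
    return new tree(node{"op", o, 0}, l, r);
}
tree* let(const std::string& x, int k, tree* body) {
    return new tree(node{"dec", x, 0}, num(k), body);
}
```

For example, with an empty environment r, the tree let("x", 2, let("x", 3, op("+", ide("x"), num(1)))) evaluates to 4, and let("x", 2, op("+", let("x", 3, op("+", ide("x"), num(1))), ide("x"))) evaluates to 6, in agreement with the two scoping examples given earlier. Note that if eval raises an exception inside a declaration, the association just added is not removed; a more careful version would have to take care of this.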