CSE 428: Lecture Notes 9

Memory management and activation records

In the compilation-based implementation of a language, the source code is translated into a program in the language of the machine. Usually one instruction in the source corresponds to several instructions at the machine level.

The variables of the source language are mapped into memory addresses.

Example

Source instruction

   x = x + 1;

Translated code (in some assembly language). Assume that x is mapped into the memory location 1010 and that operations can only be performed on registers. Let R1 be a register.

   LOAD 1010,R1  //copy content of location 1010 into R1
   ADD  R1,#1    //add constant 1 to R1
   STO  R1,1010  //copy content of R1 into location 1010
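
The three-instruction translation above can be traced with a small simulation. This is only a sketch: the dictionary-based memory and register file, the function names, and the initial value 41 stored at location 1010 are assumptions made for illustration, not part of any real instruction set.

```python
# Hypothetical model of the machine: one memory cell for x and one register.
memory = {1010: 41}       # location 1010 holds x (initial value chosen arbitrarily)
registers = {"R1": 0}

def load(addr, reg):
    """LOAD addr,reg: copy the content of a memory location into a register."""
    registers[reg] = memory[addr]

def add_const(reg, const):
    """ADD reg,#const: add a constant to a register."""
    registers[reg] += const

def store(reg, addr):
    """STO reg,addr: copy the content of a register into a memory location."""
    memory[addr] = registers[reg]

# Source instruction: x = x + 1;
load(1010, "R1")
add_const("R1", 1)
store("R1", 1010)

print(memory[1010])   # 42
```

The point of the sketch is that the single source instruction becomes three machine-level steps, and that the name x has disappeared: only the address 1010 remains.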

One of the main issues is the association between variables and locations: in general it is impossible to know at compile time how many variables are going to be created at run time (because of recursive procedures containing local variables and because of dynamic variables). Hence we cannot resolve all the allocations at compile time. Additionally, we want to manage memory efficiently: we want to allocate storage for a variable only when it is needed, and to deallocate it as soon as possible.

We call the lifetime of a variable the period of time during execution in which the variable has storage allocated for it.

We want the lifetime to correspond to the need for the variable; in any case it must be at least as long as the need. Typically, we compromise and choose standard times for allocation and deallocation:

Global variables: Lifetime is entire runtime of program
Local variables (variables declared in a block/procedure): Lifetime is during activation of procedure
User-allocated variables (aka dynamic variables - variables created with new and destroyed with delete): Lifetime is from user allocation to user deallocation

Correspondingly, we have the following storage allocation policies:

Static Allocation (for globals only)
- Done at compile time
- Lifetime = entire runtime of program
- Advantage: efficient execution time
Dynamic Allocation of local variables
- Done at run time
- Lifetimes = duration of procedure activation
- Advantage: efficient storage use
- Two methods:
  1. Stack Allocation (requires language restrictions; in particular, it cannot be used for languages like ML, which support higher-order functions)
  2. Heap Allocation (requires garbage collection)
Dynamic Allocation of user-allocated variables
- Done at run time
- Lifetimes = until the user deletes it (or until it is garbage-collected)
- Advantage: permits creation of dynamic structures, like lists, trees, etc.
- Heap Allocation

Stack-based allocation

This is the most widely used technique for imperative languages (although some imperative languages, like Pascal, also have implementations based on heap allocation).

In this kind of implementation, the memory of the machine is divided into three parts:

1. A space for the globals and the code of the program
2. The stack, for the locals
3. The heap, for the dynamic variables

The division between 1 and 2 is only logical: physically the globals and the code are (usually) placed at the base of the stack.

We will discuss here how the stack is handled. Let us consider the various operations that take place when a procedure is called:

Caller processes actual parameters (evaluation, address calculation) and stores them
Caller stores some control information (e.g., return address, dynamic link)
Control transferred from Caller to Callee
Callee allocates storage for locals
Callee executes
Callee deallocates storage for locals
Callee stores return value (if function)
Control transferred back to Caller
Caller deallocates storage used for control information and actual parameters

The memory space in which the parameters, control information, locals, etc. are stored is called an activation record (AR) or stack frame. At run time, a procedure call causes the procedure object to be bound to an AR, which is then stored on the stack:

When a procedure is called, an AR for it is pushed onto the stack
When a procedure returns, its AR (topmost) is popped from the stack
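
The push/pop discipline can be sketched with an explicit stack. The following is a minimal simulation, assuming a recursive factorial as the procedure being called; the dictionary layout of the AR and the names call_fact and max_depth are hypothetical and chosen only for illustration.

```python
stack = []        # the run-time stack of activation records
max_depth = 0     # deepest stack observed, recorded only for illustration

def call_fact(n):
    """Simulate one call of a recursive factorial with an explicit AR."""
    global max_depth
    # Call: an AR holding the actual parameter is created and pushed.
    ar = {"proc": "fact", "params": {"n": n}, "result": None}
    stack.append(ar)
    max_depth = max(max_depth, len(stack))
    # Body of the callee.
    if n <= 1:
        ar["result"] = 1
    else:
        ar["result"] = n * call_fact(n - 1)
    # Return: the topmost AR is popped; by LIFO it is the callee's own AR.
    popped = stack.pop()
    assert popped is ar
    return popped["result"]

result = call_fact(5)
print(result, max_depth, len(stack))   # 120 5 0
```

Note that the number of ARs alive at the same time (5 here) depends on the argument, which is why storage for the locals of a recursive procedure cannot be fixed at compile time.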

Let us summarize the information that must be present in an AR:

local bindings (local variables, temporaries)
parameter bindings
return value (for functions)
return address: the address in the code of the caller where execution must continue when the callee returns, namely the value to which the program counter must be set when the callee returns.
dynamic link: the address in the stack of the previous AR, which in the stack-like (i.e. LIFO) discipline is always the AR of the caller. We need to assume, of course, that the language is designed so that procedures can only return to their calling point.
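
The list above can be read as a record type. Below is a minimal sketch, assuming object references in place of stack addresses; the class name, the field names, and the procedures main, f and g are hypothetical. It also shows that following the dynamic links from the topmost AR recovers the chain of callers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class ActivationRecord:
    proc: str                                          # procedure name (for display only)
    params: Dict[str, Any] = field(default_factory=dict)       # parameter bindings
    local_vars: Dict[str, Any] = field(default_factory=dict)   # local bindings
    return_value: Any = None                           # used by functions only
    return_address: Optional[int] = None               # code address in the caller
    dynamic_link: Optional["ActivationRecord"] = None  # AR of the caller

def call_chain(ar):
    """Follow the dynamic links from ar back to the outermost caller."""
    names = []
    while ar is not None:
        names.append(ar.proc)
        ar = ar.dynamic_link
    return names

# Hypothetical situation: main has called f, and f has called g.
main_ar = ActivationRecord("main")
f_ar = ActivationRecord("f", params={"a": 3}, return_address=120, dynamic_link=main_ar)
g_ar = ActivationRecord("g", local_vars={"t": 0}, return_address=245, dynamic_link=f_ar)

print(call_chain(g_ar))   # ['g', 'f', 'main']
```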

For languages like C and C++, which have no nested procedure declarations, the above information is all we need in the AR.

In languages with nested procedures, however, the situation is more complicated. The next section is dedicated to discussing this issue.

Nested procedures, static and dynamic scope

In a language that allows nested declarations of procedures, like Pascal, a procedure p might contain occurrences of variables which are neither local to p, nor global: they are local to some other procedure. Such occurrences are called non-local (in p).

There are two possible scoping rules that determine which declaration should be associated with an occurrence of a non-local variable x in p:

Static scope (aka lexical scope): the declaration valid for x is the one valid for x where p is declared
Dynamic scope: the declaration valid for x is the one valid for x where p is called

Almost all languages choose static scope, because it makes programs clearer and easier to understand. It is however more complicated to implement. In particular, we need an additional piece of information in the activation record: the so-called static link (aka access link).

Static link

The static link of an activation record of a procedure p contains the address of the most recent activation record on the stack of the procedure in which p is declared.

In this way, we can always find the address of a non-local variable x: we just follow the chain of the static links until we "find" a declaration for x. (Actually, in real implementations the number of static links we need to traverse is determined statically, and the address of x within the AR is determined by a fixed offset.)
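
The remark in parentheses can be sketched as follows: at compile time each variable occurrence is turned into a pair (hops, offset), and at run time the access just follows that many static links and indexes into the AR reached. The Frame class, the slot layout, and the stored values below are assumptions made for illustration.

```python
class Frame:
    """Hypothetical AR reduced to a static link plus a list of variable slots."""
    def __init__(self, static_link, slots):
        self.static_link = static_link
        self.slots = slots

def frame_at(frame, hops):
    """Follow the static link `hops` times, starting from `frame`."""
    for _ in range(hops):
        frame = frame.static_link
    return frame

def load_var(frame, hops, offset):
    return frame_at(frame, hops).slots[offset]

def store_var(frame, hops, offset, value):
    frame_at(frame, hops).slots[offset] = value

# p declares x (slot 0) and y (slot 1); r, nested in p, declares its own x (slot 0).
p_frame = Frame(static_link=None, slots=[10, 20])
r_frame = Frame(static_link=p_frame, slots=[30])

# Inside r: x is local, so (0, 0); y belongs to p, so (1, 1).
print(load_var(r_frame, 0, 0), load_var(r_frame, 1, 1))   # 30 20

# Inside r, an assignment to p's x would use the pair (1, 0).
store_var(r_frame, 1, 0, 11)
print(p_frame.slots)   # [11, 20]
```

No search by name takes place at run time; the names x and y appear only in the comments.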

In dynamically-scoped languages we don't need a static link: the declaration valid for a variable x can be found by following the chain of the dynamic links.

Determining the static link

In languages with nested procedure declarations, there is usually the following restriction on a procedure call: in the tree representing the hierarchy of procedure declarations, the callee cannot be at a lower level than the caller, unless it is a child of the caller:

Example

procedure p                         p
   procedure q                     / \
      <body of q>;                /   \
   procedure r                   q     r
       procedure s                     |
          <body of s>;                 |
       <body of r>;                    s
   <body of p>;

We have that:

p can call q and r, but not s
q can call p and r, but not s
r and s can call any of the procedures

In order to determine the static link at run time, it is sufficient to associate with each procedure call, at compile time, the difference in level between the caller and the callee, plus 1. For instance, if s calls q, the number is 2. If s calls p, the number is 3. If r calls s, the number is 0. At run time, this number indicates how many ARs we have to traverse, starting from the AR of the caller and following the static links, in order to find the AR to which the static link of the callee must point.
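
The rule can be checked on the tree of the previous example. This is a sketch under the assumption that p is at level 0, q and r at level 1, and s at level 2; the function names, the dictionary representation of an AR, and the extra record env_ar (standing for the AR where p is declared) are hypothetical.

```python
# Nesting level of each procedure in the example tree (p is the root).
LEVEL = {"p": 0, "q": 1, "r": 1, "s": 2}

def hops(caller, callee):
    """Compile time: number attached to a call (level difference plus 1)."""
    return LEVEL[caller] - LEVEL[callee] + 1

def static_link_for(caller_ar, n):
    """Run time: follow n static links, starting from the caller's AR."""
    target = caller_ar
    for _ in range(n):
        target = target["static_link"]
    return target

# Hypothetical stack in which p has called r and r has called s.
env_ar = {"proc": "env", "static_link": None}   # AR where p is declared
p_ar = {"proc": "p", "static_link": env_ar}
r_ar = {"proc": "r", "static_link": p_ar}
s_ar = {"proc": "s", "static_link": r_ar}

print(hops("s", "q"), hops("s", "p"), hops("r", "s"))    # 2 3 0
print(static_link_for(s_ar, hops("s", "q"))["proc"])     # p
print(static_link_for(r_ar, hops("r", "s"))["proc"])     # r
```

When s calls q, two hops from the AR of s reach the AR of p, which is where q is declared. When r calls s, zero hops are needed because s is declared in the caller itself.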

Example

   procedure p; 
      var x,y: integer;   // variables x,y local to p
      procedure q(y: integer);   // procedure q local to p 
         begin if x = y then r else write(x) end; // body of q 
      procedure r;        // procedure r local to p 
         var x : integer;    // variable x local to r
         begin x := 2; if y = 2 then q(x) else write(x) end;  // body of r 
      begin    // begin body of p
      x:= 1; 
      y:= 2; 
      q(x)
      end;     // end body of p

We have that an activation of p prints:

2, under dynamic scope

1, under static scope. The snapshot (i.e. the configuration of the activation records on the stack) at the moment in which the write instruction is executed is the following:



caller of p        p I           A.R. where p is declared
        ^   ------------------      ^ 
     CL |___|_               |      | 
            |                |      |
            ------------------      |
            |               _|______| SL
            |                |
            ------------------
        |-->|        -----   |<-|---|----| 
        |   |      x | 1 |   |  |   |    |           
        |   |        -----   |  |   |    |
        |   |      y | 2 |   |  |   |    |
        |   |        -----   |  |   |    |
     CL |   ------------------  |   |    |
        |                       |   |    |
        |          q I          |   |    |
        |   ------------------  |   |    |
        |___|_               |  |   |    | 
            |                |  |   |    |
            ------------------  |   |    |
            |               _|__|SL |    |
            |                |      |    |
            ------------------      |    |
        |-->|        -----   |      |    |
        |   |      y | 1 |   |      |    |
        |   |        -----   |      |    |
     CL |   ------------------      |    |
        |                           |    |
        |          r I              |    |
        |   ------------------      |    | 
        |___|_               |      |    |
            |                |      |    |
            ------------------      |    | 
            |               _|______| SL |
            |                |           |
            ------------------           |
        |-->|        -----   |           |
        |   |      x | 2 |   |           |
        |   |        -----   |           |
     CL |   ------------------           |
        |                                | 
        |          q II                  |
        |   ------------------           |
        |___|_               |           |
            |                |           |
            ------------------           |
            |               _|___________| SL 
            |                |
            ------------------
            |        -----   |
            |      y | 2 |   |
            |        -----   |
            ------------------      

    CL = Control Link (the dynamic link)
    SL = Static  Link
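
The two results claimed for the example (1 under static scope, 2 under dynamic scope) can be reproduced with a small simulation. This is only a sketch: the AR class, the run function, and the by-name lookup are assumptions made for illustration, and a real implementation would use fixed offsets rather than searching by name. Each Python function mirrors one procedure of the example and receives the AR of its caller and the AR its static link must point to.

```python
class AR:
    def __init__(self, name, variables, control_link, static_link):
        self.name = name
        self.vars = variables
        self.control_link = control_link   # AR of the caller
        self.static_link = static_link     # latest AR of the enclosing procedure

def run(scope):
    """Run the example; scope is 'static' or 'dynamic'. Returns what is written."""
    output = []

    def find(ar, name):
        # Walk the chain selected by the scoping rule until `name` is bound.
        while name not in ar.vars:
            ar = ar.static_link if scope == "static" else ar.control_link
        return ar

    def get(ar, name):
        return find(ar, name).vars[name]

    def p(caller):
        ar = AR("p", {"x": None, "y": None}, caller, None)
        ar.vars["x"] = 1
        ar.vars["y"] = 2
        q(ar, ar, get(ar, "x"))                  # q is declared in p itself

    def q(caller, static_link, y):
        ar = AR("q", {"y": y}, caller, static_link)
        if get(ar, "x") == get(ar, "y"):
            r(ar, ar.static_link)                # r is a sibling: one hop
        else:
            output.append(get(ar, "x"))          # write(x)

    def r(caller, static_link):
        ar = AR("r", {"x": None}, caller, static_link)
        ar.vars["x"] = 2
        if get(ar, "y") == 2:
            q(ar, ar.static_link, get(ar, "x"))  # q is a sibling: one hop
        else:
            output.append(get(ar, "x"))          # write(x)

    p(None)
    return output

print(run("static"), run("dynamic"))   # [1] [2]
```

Under static scope the run builds the four ARs of the snapshot (p I, q I, r I, q II) and the write happens in q II. Under dynamic scope r finds the y of q I, whose value is 1, so the write happens in r and q is not called a second time.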