CSE 428: Lecture Notes 9
Memory management and activation records
In the compilation-based implementation of a language,
the source code is translated into a program in the language of the
machine. Usually one instruction in the source corresponds to several
instructions at the machine level.
The variables of the source language are mapped into memory addresses.
Example
Source instruction
x = x + 1;
Translated code (in some assembly language). Assume that x is mapped into the
memory location 1010 and that operations can only be performed on registers.
Let R1 be a register.
LOAD 1010,R1 //copy content of location 1010 in R1
ADD R1,#1 //add constant 1 to R1
STO R1,1010 //copy content of R1 in location 1010
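The effect of the three instructions can be checked with a small simulation. This is only a sketch of a hypothetical toy machine: the function run, the tuple encoding of instructions, and the initial value 41 are assumptions made for illustration, not part of any real assembler.

```python
def run(program, memory):
    """Execute (opcode, operand, operand) tuples on a toy register machine."""
    registers = {}
    for op, a, b in program:
        if op == "LOAD":        # LOAD addr,reg : copy memory[addr] into reg
            registers[b] = memory[a]
        elif op == "ADD":       # ADD reg,#const : add a constant to reg
            registers[a] += b
        elif op == "STO":       # STO reg,addr : copy reg into memory[addr]
            memory[b] = registers[a]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


# Assumption: x is mapped to address 1010 and currently holds 41.
memory = {1010: 41}
program = [
    ("LOAD", 1010, "R1"),
    ("ADD", "R1", 1),
    ("STO", "R1", 1010),
]
run(program, memory)
print(memory[1010])  # 42
```

The source-level variable x never appears in the translated program; only its address does.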
One of the main issues is the association between variables and locations:
in general it is impossible to know at compile time
how many variables will be created at run time
(because of recursive procedures containing local variables, and because of
dynamic variables). Hence we cannot resolve all allocations at compile time.
Additionally, we want to be efficient in the management of the memory:
We only want to allocate storage for a variable when needed, and we
want to deallocate when possible.
We call the lifetime of a variable
the period of time during execution in which the variable
has storage allocated for it.
Ideally the lifetime matches the period in which the variable is needed; it must be at least as long as
that period. In practice we compromise and choose standard times for allocation and deallocation:
- Global variables: Lifetime is entire runtime of program
- Local variables (variables declared in a block/procedure):
Lifetime is during activation of procedure
- User-allocated variables (aka dynamic variables - variables created with new and destroyed with delete): Lifetime is from user allocation to user deallocation
Correspondingly, we have the following storage allocation policies:
- Static Allocation (for globals only)
  - Done at compile time
  - Lifetime = entire runtime of program
  - Advantage: efficient execution time
- Dynamic Allocation of local variables
  - Done at run time
  - Lifetimes = duration of procedure activation
  - Advantage: efficient storage use
  - Two Methods:
    - Stack Allocation (requires language restrictions; in particular, it
      cannot be used for languages like ML, which support higher-order functions)
    - Heap Allocation (requires garbage collection)
- Dynamic Allocation of user-allocated variables
  - Done at run time
  - Lifetimes = until the user deletes it (or until it is garbage-collected)
  - Advantage: permits creation of dynamic structures, like lists, trees, etc.
  - Method: Heap Allocation
Stack-based allocation
This is the most widely used technique for imperative languages
(although some imperative languages, like Pascal, also have
implementations based on heap allocation).
In this kind of implementation, the memory of the machine is divided into three parts:
- A space for the globals and the code of the program
- The stack, for the locals
- The heap, for the dynamic variables
The division between the first two parts is only logical: physically the globals and the code are
(usually) placed at the base of the stack.
We will discuss here how the stack is handled.
Let us consider the various operations that take place when
a procedure is called:
- Caller processes actual parameters (evaluation, address calculation)
and stores them
- Caller stores some control information (e.g., return address, dynamic link)
- Control transferred from Caller to Callee
- Callee allocates storage for locals
- Callee executes
- Callee deallocates storage for locals
- Callee stores return value (if function)
- Control transferred back to Caller
- Caller deallocates storage used for control information and
actual parameters
The memory space in which the parameters, control information, locals, etc. are stored is called
an activation record (AR) or stack frame.
At runtime, a procedure call causes the procedure
object to be bound to an AR, which is then stored in the stack
- When a procedure is called, an AR
for it is pushed onto the stack
- When a procedure returns, its AR
(topmost) is popped from the stack
Let us summarize the information that must be present in an AR:
- local bindings (local variables, temporaries)
- parameter bindings
- return value (for functions)
- return address: the address in the code of the caller where
execution must continue when the callee returns, namely the value to which the
program counter must be set when the callee returns.
- dynamic link: the address in the stack of the previous AR, which in the stack-like
(i.e. LIFO) discipline is always
the AR of the caller. We need to assume, of course, that the language is designed so that
procedures can only return to their calling point.
For languages like C and C++, which have no nested procedure declarations, the information above is
all we need in the AR.
In languages with nested procedures, however, the situation is more complicated. The next section is
dedicated to this issue.
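The push/pop discipline and the AR fields listed above can be sketched as a small simulation. The names ActivationRecord, call, ret, chain_length and fact are hypothetical and chosen only for illustration; a real implementation lays these fields out in raw stack memory rather than in objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActivationRecord:
    procedure: str
    params: dict                                   # parameter bindings
    return_address: str                            # where the caller resumes
    dynamic_link: Optional["ActivationRecord"]     # AR of the caller
    locals: dict = field(default_factory=dict)     # local bindings
    return_value: object = None                    # for functions


stack = []       # the run-time stack of ARs
max_depth = 0    # longest dynamic chain observed so far


def chain_length(ar):
    """Count the ARs reachable from ar by following dynamic links."""
    n = 0
    while ar is not None:
        n += 1
        ar = ar.dynamic_link
    return n


def call(procedure, params, return_address):
    """Push a new AR whose dynamic link points to the current top AR."""
    global max_depth
    caller = stack[-1] if stack else None
    ar = ActivationRecord(procedure, params, return_address, caller)
    stack.append(ar)
    max_depth = max(max_depth, chain_length(ar))
    return ar


def ret(value=None):
    """Store the return value in the topmost AR and pop it."""
    ar = stack.pop()
    ar.return_value = value
    return value


def fact(n, return_address="main"):
    ar = call("fact", {"n": n}, return_address)
    if ar.params["n"] <= 1:
        result = 1
    else:
        ar.locals["rest"] = fact(n - 1, return_address="fact")
        result = ar.params["n"] * ar.locals["rest"]
    return ret(result)


print(fact(4))     # 24
print(max_depth)   # 4
print(len(stack))  # 0
```

Each recursive activation of fact gets its own AR, so four copies of the parameter n coexist at the deepest point, and the stack is empty again once the outermost call returns.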
Nested procedures, static and dynamic scope
In a language that allows nested declarations of procedures, like Pascal,
a procedure p might contain occurrences of variables which are neither local
to p, nor global: they are local to some other procedure. Such occurrences are
called non-local (in p).
There are two possible scoping rules which determine the declaration which should be
associated to the occurrence of a non-local variable x in p:
- Static scope (aka lexical scope): the declaration valid for x
is the one valid for x where p is declared
- Dynamic scope: the declaration valid for x
is the one valid for x where p is called
Almost all languages choose static scope, because it makes programs clearer
and easier to understand. It is, however, more complicated to implement.
In particular, we need an additional piece of information in the activation record:
the so-called static link (aka access link).
Static link
The static link of an activation record of a procedure p contains the address of the
last activation record on the stack of the procedure where p is declared.
In this way, we can always find the address of a non-local variable x:
we just follow the chain of static links until we "find" a declaration for x.
(Actually, in real implementations the number of static links we need to traverse is
determined statically, and the address of x within the AR is determined by a fixed offset.)
In dynamically-scoped languages we don't need a static link: the declaration valid for a
variable x can be found by following the chain of the dynamic links.
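The two lookup strategies differ only in which chain is followed. The sketch below assumes a hypothetical program in which main declares x = 1 together with two procedures, wrapper (which declares its own x = 2) and show (which reads x), and in which main calls wrapper and wrapper calls show. The class Frame and the function lookup are illustrative names only.

```python
class Frame:
    """A simplified AR holding only local bindings and the two links."""

    def __init__(self, name, local_vars, static_link=None, dynamic_link=None):
        self.name = name
        self.local_vars = local_vars
        self.static_link = static_link      # AR of the declaring procedure
        self.dynamic_link = dynamic_link    # AR of the caller


def lookup(frame, var, link):
    """Follow the chosen chain ('static_link' or 'dynamic_link') to find var."""
    while frame is not None:
        if var in frame.local_vars:
            return frame.local_vars[var]
        frame = getattr(frame, link)
    raise NameError(var)


# Stack at the moment show runs: main called wrapper, wrapper called show.
main = Frame("main", {"x": 1})
wrapper = Frame("wrapper", {"x": 2}, static_link=main, dynamic_link=main)
show = Frame("show", {}, static_link=main, dynamic_link=wrapper)

static_result = lookup(show, "x", "static_link")
dynamic_result = lookup(show, "x", "dynamic_link")
print(static_result)   # 1
print(dynamic_result)  # 2
```

Under static scope show sees the x of main, where show is declared; under dynamic scope it sees the x of wrapper, its caller.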
Determining the static link
In languages with nested procedure declarations, there is usually the following restriction on
a procedure call: in the tree representing the hierarchy of procedure declarations,
the callee cannot be at a lower level than the caller, unless it is a child of the caller:
Example
procedure p                         p
   procedure q                     / \
      <body of q>;                /   \
   procedure r                   q     r
      procedure s                      |
         <body of s>;                  |
      <body of r>;                     s
   <body of p>;
We have that:
- p can call q and r, but not s
- q can call p and r, but not s
- r and s can call everybody
In order to determine the static link at run time, it is sufficient to
associate with each procedure call, at compile time, the level of the caller
minus the level of the callee, plus 1. For instance, if s calls q, the number is 2.
If s calls p, the number is 3. If r calls s, the number is 0.
At run time, this number indicates how many ARs we have
to traverse, starting from the AR of the caller and following the static links,
in order to find the AR to which the static link of the callee must point.
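The computation can be sketched for the declaration tree p, q, r, s used above. The table parent and the helpers level, hops and static_link_for_callee are hypothetical names introduced only for this illustration; None stands for the scope enclosing p.

```python
# Declaration tree from the example: q and r are declared in p, s in r.
parent = {"p": None, "q": "p", "r": "p", "s": "r"}


def level(proc):
    """Depth of proc in the declaration tree (p is at level 0)."""
    depth = 0
    while parent[proc] is not None:
        proc = parent[proc]
        depth += 1
    return depth


def hops(caller, callee):
    """Compile-time number: level of caller minus level of callee, plus 1."""
    return level(caller) - level(callee) + 1


class Frame:
    def __init__(self, name, static_link=None):
        self.name = name
        self.static_link = static_link


def static_link_for_callee(caller_frame, callee):
    """Run time: traverse hops ARs along static links from the caller's AR."""
    target = caller_frame
    for _ in range(hops(caller_frame.name, callee)):
        target = target.static_link
    return target


# A stack in which p called r and r called s.
p_frame = Frame("p")
r_frame = Frame("r", static_link=p_frame)
s_frame = Frame("s", static_link=r_frame)

print(hops("s", "q"), hops("s", "p"), hops("r", "s"))   # 2 3 0
print(static_link_for_callee(s_frame, "q").name)        # p
print(static_link_for_callee(r_frame, "s").name)        # r
```

When r calls s the count is 0, so the static link of the new AR for s points to the AR of the caller itself.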
Example
procedure p;
   var x,y: integer;                                        // variables x,y local to p
   procedure q(y: integer);                                 // procedure q local to p
      begin if x = y then r else write(x) end;              // body of q
   procedure r;                                             // procedure r local to p
      var x : integer;                                      // variable x local to r
      begin x := 2; if y = 2 then q(x) else write(x) end;   // body of r
   begin                                                    // begin body of p
      x := 1;
      y := 2;
      q(x)
   end;                                                     // end body of p
We have that an activation of p prints: