CSE 428: Lecture Notes 7.2
Memory management and activation records
In the compilation-based implementation of a language,
the source code is translated into a program in the language of the
machine. Usually one instruction in the source corresponds to several
instructions at the machine level.
The variables of the source language are mapped to memory addresses.
Example
Source instruction
x = x + 1;
Translated code (in some assembly language). Assume that x is mapped into the
memory location 1010 and that operations can only be performed on registers.
Let R1 be a register.
LOAD 1010,R1   //copy the content of location 1010 into R1
ADD R1,#1      //add the constant 1 to R1
STO R1,1010    //copy the content of R1 into location 1010
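To make the effect of the three instructions concrete, here is a minimal sketch of an interpreter for them. The function `run`, the tuple encoding of instructions, and the dictionary used as memory are assumptions made for illustration; they are not part of any real assembler.

```python
# Hypothetical three-instruction machine: memory is a dict from address to value,
# and registers live in a second dict.
def run(program, memory):
    registers = {}
    for op, a, b in program:
        if op == "LOAD":      # LOAD addr,reg: copy memory[addr] into reg
            registers[b] = memory[a]
        elif op == "ADD":     # ADD reg,#const: add the constant to reg
            registers[a] += b
        elif op == "STO":     # STO reg,addr: copy reg into memory[addr]
            memory[b] = registers[a]
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory

# Translation of x = x + 1, with x mapped to location 1010
INCREMENT_X = [("LOAD", 1010, "R1"), ("ADD", "R1", 1), ("STO", "R1", 1010)]

memory = run(INCREMENT_X, {1010: 41})
print(memory[1010])  # prints 42
```

The single source statement becomes three machine steps because, under the stated assumption, arithmetic can only be performed on registers.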
One of the main issues is the association between variables and locations:
in general it is impossible to know at compile time
how many variables will be created at run time
(because of recursive procedures containing local variables, and because of
dynamic variables). Hence we cannot resolve all allocations at compile time.
Additionally, we want to be efficient in the management of the memory:
We only want to allocate storage for a variable when needed, and we
want to deallocate when possible.
We call the lifetime of a variable
the period of time during execution in which the variable
has storage allocated for it.
We want the lifetime to correspond to the need for the variable; it must be at least as long as
the need. Typically, we compromise and choose standard times for allocation and deallocation:
- Global variables: Lifetime is entire runtime of program
- Local variables (variables declared in a block/procedure):
Lifetime is during activation of procedure
- User-allocated variables (aka dynamic variables - variables created with new and destroyed with delete): Lifetime is from user allocation to user deallocation
Correspondingly, we have the following storage allocation policies:
- Static Allocation (for globals only)
  - Done at compile time
  - Lifetime = entire runtime of program
  - Advantage: efficient execution time
- Dynamic Allocation of local variables
  - Done at run time
  - Lifetime = duration of procedure activation
  - Advantage: efficient storage use
  - Two Methods:
    - Stack Allocation (requires language restrictions - in particular
      cannot be used for languages like ML, which support higher order functions)
    - Heap Allocation (requires garbage collection)
- Dynamic Allocation of user-allocated variables
  - Done at run time
  - Lifetime = until the user deletes it (or until it is garbage-collected)
  - Advantage: permits creation of dynamic structures, like lists, trees, etc.
  - Heap Allocation
Stack-based allocation
This is the most widely used technique for imperative languages
(although some imperative languages, like Pascal, also have
implementations based on heap allocation).
In this kind of implementation, the memory of the machine is divided into three parts:
- A space for the globals and the code of the program
- The stack, for the locals
- The heap, for the dynamic variables
The division between the first two parts is only logical: physically, the globals and the code are
(usually) placed at the base of the stack.
We will discuss here how the stack is handled.
Let us consider in detail the various operations that take place when
a procedure is called.
We describe here what happens in the standard implementation of C
and similar ones;
in other languages there may be some differences, mainly due to different
parameter-passing strategies. We consider here call by value and call by value-result.
- Caller allocates storage for formal parameters
- Caller evaluates the actual parameters
and stores their value in the locations of the formal parameters
- Caller allocates storage for some control information
(e.g., return address, dynamic link) and stores them.
- Control transfers from Caller to Callee
- Callee allocates storage for locals and temporaries
- Callee executes
- Callee deallocates storage for locals and temporaries
- Callee stores return value (if function)
- Callee stores result of value-result parameters (if any)
- Control transfers back to Caller
- Caller deallocates storage used for control information and
parameters and for return value (if function)
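The steps above can be traced on an explicit stack. The following is a minimal sketch, not the layout of any real compiler: the callee `double(n)` (which doubles its value-result parameter `n` and returns the old value), the slot order, and the choice to let the caller reserve the return-value slot are all assumptions made for illustration.

```python
def call_double(stack, caller_vars, actual_name):
    """Simulate r = double(actual) on a list used as the run-time stack."""
    base = len(stack)
    # Caller: allocate the formal parameter n and store the actual's value in it.
    stack.append(caller_vars[actual_name])   # slot base+0: formal n
    # Caller: allocate and store control information (placeholders here).
    stack.append("return address")           # slot base+1
    stack.append("dynamic link")             # slot base+2
    stack.append(None)                       # slot base+3: return value (assumed caller-reserved)
    # ---- control transfers from caller to callee ----
    # Callee: allocate storage for a local variable.
    stack.append(None)                       # slot base+4: local old
    # Callee executes: old := n; n := n * 2
    stack[base + 4] = stack[base]
    stack[base] = stack[base] * 2
    register = stack[base + 4]               # value to return, held in a "register"
    # Callee: deallocate locals.
    stack.pop()
    # Callee: store the return value.
    stack[base + 3] = register
    # Callee: store the result of the value-result parameter into the actual.
    caller_vars[actual_name] = stack[base]
    # ---- control transfers back to caller ----
    result = stack[base + 3]
    # Caller: deallocate control information, parameters and return value.
    del stack[base:]
    return result

stack = []
variables = {"x": 5}
r = call_double(stack, variables, "x")
print(r, variables["x"], stack)
```

After the call the stack is back to its original height, which is what makes stack allocation cheap: allocation and deallocation are just moves of the top of the stack.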
The memory space in which the parameters, control information, locals, etc.
are stored is called the
activation record (AR) or stack frame.
At runtime, a procedure call causes the procedure
object to be bound to a new AR, which is inserted on the stack. More precisely:
- When a procedure is called, an AR
for it is pushed onto the stack
- When a procedure returns, its AR
(topmost) is popped from the stack
Typically, an AR contains storage (locations) for:
- local variables and temporaries
- formal parameters (in call by value and call by value-result)
- return value (for functions)
- return address: the address in the code of the caller where
the execution must continue when the callee returns. Namely, the value which the
program counter must be set to when the callee returns.
- dynamic link (aka control link):
the address in the stack of the AR of the caller.
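As an illustration of the list above, here is a small sketch of an activation record and of the push/pop discipline for a recursive procedure. The class `ActivationRecord`, its field names, and the use of a list index as the dynamic link are hypothetical choices, not a description of an actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActivationRecord:
    procedure: str
    params: dict                        # formal parameters
    return_address: str                 # placeholder for a code address
    dynamic_link: Optional[int]         # stack index of the caller's AR
    local_vars: dict = field(default_factory=dict)
    return_value: object = None

def fact(stack, n, depths):
    """Factorial, keeping its own ARs on an explicit stack."""
    caller_index = len(stack) - 1 if stack else None
    ar = ActivationRecord("fact", {"n": n}, "after call", caller_index)
    stack.append(ar)                    # call: push a new AR
    depths.append(len(stack))           # record the stack height at each call
    if n <= 1:
        ar.return_value = 1
    else:
        ar.return_value = n * fact(stack, n - 1, depths)
    popped = stack.pop()                # return: pop the topmost AR
    return popped.return_value

stack, depths = [], []
print(fact(stack, 4, depths))   # prints 24
print(depths)                   # prints [1, 2, 3, 4]
```

Each recursive call gets its own AR, so the number of copies of `n` alive at once depends on the argument and cannot be fixed at compile time.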
We need to assume, of course, that the language is designed so that
procedures can only return to their calling point.
For languages like C and C++,
which have no nested procedure declarations, the above information is
all we need in the AR.
In languages with nested procedures, however,
the situation is more complicated. The next section is
dedicated to this issue.
Nested procedures, static and dynamic scope
In a language that allows nested declarations of procedures, like Pascal,
a procedure p might contain occurrences of variables which are neither local
to p, nor global: they are local to some other procedure. Such occurrences are
called non-local (in p).
There are two possible scoping rules that determine which declaration is
associated with an occurrence of a non-local variable x in p:
- Static scope (aka lexical scope): the declaration valid for x
is the one valid for x where p is declared
- Dynamic scope: the declaration valid for x
is the one valid for x where p is called
Almost all languages choose static scope, because it makes programs clearer
and easier to understand. It is, however, more complicated to implement.
In particular, we need an additional piece of information in the activation record:
the so-called static link (aka access link).
Static link
The static link of an activation record of a procedure p contains the address of the
most recent activation record on the stack of the procedure in which p is declared.
In this way, we can always find the address of a non-local variable x:
we just follow the chain of the static links until we "find" a declaration for x.
(Actually, in real implementations the number of static links we need to traverse is
determined statically, and the address for x in the AR is determined by a fixed offset.)
In dynamically-scoped languages we don't need a static link: the declaration valid for a
variable x can be found by following the chain of the dynamic links.
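The difference between the two lookups can be sketched as follows. The class `AR`, the function `lookup`, and the procedures `main`, `wrapper` and `show` are hypothetical names introduced only for this illustration: `main` declares x and the two procedures, `wrapper` declares its own x and calls `show`, and `show` refers to a non-local x.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AR:
    name: str
    variables: dict
    static_link: Optional["AR"] = None    # AR of the declaring procedure
    dynamic_link: Optional["AR"] = None   # AR of the caller

def lookup(ar, var, link):
    """Search for var starting at ar, following the chain named by link."""
    while ar is not None:
        if var in ar.variables:
            return ar.variables[var]
        ar = getattr(ar, link)
    raise NameError(var)

# Stack situation while show is running: main called wrapper, wrapper called show.
main = AR("main", {"x": 1})
wrapper = AR("wrapper", {"x": 2}, static_link=main, dynamic_link=main)
show = AR("show", {}, static_link=main, dynamic_link=wrapper)

print(lookup(show, "x", "static_link"))   # prints 1 (static scope)
print(lookup(show, "x", "dynamic_link"))  # prints 2 (dynamic scope)
```

This sketch searches by name at run time for simplicity; as noted above, a real statically-scoped implementation would instead use a hop count and an offset fixed at compile time.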
Determining the static link
In languages with nested procedure declarations, there is usually the following
restriction on
procedure calls: in the tree representing the hierarchy of procedure declarations,
the callee cannot be at a lower level than the caller, unless it is a child of the caller:
Example
procedure p                          p
   procedure q                      / \
      <body of q>;                 /   \
   procedure r                    q     r
      procedure s                       |
         <body of s>;                   |
      <body of r>;                      s
   <body of p>;
We have that:
- p can call q and r, but not s
- q can call p and r, but not s
- r and s can call everybody
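The three statements above can be checked mechanically. The sketch below assumes one particular reading of the visibility rule, namely that a procedure may be called whenever it is declared in the caller or in one of the procedures enclosing the caller (with p itself declared at the outermost level); the names `PARENT`, `enclosing_chain` and `can_call` are hypothetical.

```python
# Declaration tree of the example: each procedure maps to the procedure
# that declares it (None stands for the outermost level, where p is declared).
PARENT = {"p": None, "q": "p", "r": "p", "s": "r"}

def enclosing_chain(proc):
    """The procedure itself, then every procedure enclosing it, then the outermost level."""
    chain = []
    while proc is not None:
        chain.append(proc)
        proc = PARENT[proc]
    chain.append(None)
    return chain

def can_call(caller, callee):
    # Assumed rule: the callee must be declared in the caller
    # or in something that encloses the caller.
    return PARENT[callee] in enclosing_chain(caller)

for caller in "pqrs":
    print(caller, "can call", [c for c in "pqrs" if can_call(caller, c)])
```

Under this reading a procedure may also call itself, which is consistent with the example but is not stated explicitly in it.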
In order to determine the static link at run time, it is sufficient to
associate with each procedure call, at compile time, the difference in level
between the caller and the callee, plus 1. For instance, if s calls q, the number is 2.
If s calls p, the number is 3. If r calls s, the number is 0.
At run time, this number indicates how many ARs we have
to traverse, starting from the AR of the caller and following the static links,
in order to find the AR to which the static link of the callee must point.
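The computation just described can be sketched in a few lines. The level numbers (p at level 0), the dictionary representation of an AR, and the names `hops` and `static_link_for_callee` are assumptions for illustration; only the differences between levels matter.

```python
# Nesting levels in the example hierarchy (the absolute numbering is an assumption).
LEVEL = {"p": 0, "q": 1, "r": 1, "s": 2}

def hops(caller, callee):
    """Number fixed at compile time for a call: level difference plus 1."""
    return LEVEL[caller] - LEVEL[callee] + 1

def static_link_for_callee(caller_ar, n):
    """At run time: follow n static links starting from the caller's AR."""
    ar = caller_ar
    for _ in range(n):
        ar = ar["static_link"]
    return ar

# A possible stack: p called r, r called s (an outermost AR encloses p).
outer_ar = {"proc": "outermost", "static_link": None}
p_ar = {"proc": "p", "static_link": outer_ar}
r_ar = {"proc": "r", "static_link": p_ar}
s_ar = {"proc": "s", "static_link": r_ar}

print(hops("s", "q"), static_link_for_callee(s_ar, hops("s", "q"))["proc"])  # prints 2 p
print(hops("s", "p"), static_link_for_callee(s_ar, hops("s", "p"))["proc"])  # prints 3 outermost
print(hops("r", "s"), static_link_for_callee(r_ar, hops("r", "s"))["proc"])  # prints 0 r
```

In each case the AR reached is an activation of the procedure in which the callee is declared, which is exactly what the callee's static link must point to.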
Example
procedure p;
   var x,y: integer;                      // variables x,y local to p
   procedure q(y: integer);               // procedure q local to p
      begin if x = y then r else write(x) end;             // body of q
   procedure r;                           // procedure r local to p
      var x : integer;                    // variable x local to r
      begin x := 2; if y = 2 then q(x) else write(x) end;  // body of r
   begin                                  // begin body of p
      x := 1;
      y := 2;
      q(x)
   end;                                   // end body of p
We have that an activation of p prints: