CSE 428: Lecture Notes 8 and 9


Dynamic variables

Dynamic variables are those variables that are created by an instruction of dynamic memory allocation. Note that these are the only variables that are allocated by an instruction rather than by a declaration. As a consequence, they are not subject to scoping rules: the end of their lifetime is determined by an explicit delete instruction or by garbage collection (if there is garbage collection).

Dynamic variables are used for creating dynamic structures like lists and trees, namely structures which can grow or shrink at run time.

Pointers

Dynamic variables are variables without a name (anonymous variables). Namely, there is no name in the program which refers to them directly, because they are not created by a declaration. They are accessible only via variables of type pointer.

A variable x of type "pointer to type T" can be declared in C++ as follows

   T* x;

Creation of dynamic variables

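A dynamic variable can be allocated in C++ with the operation new T. The location is taken from the heap. At each moment of the execution the heap is divided into two parts: the locations that are currently allocated, and the locations that are free and available for allocation. The free locations form the free list (FL).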

Meaning of the operator "new T": Allocate a variable from the heap (i.e. remove a location from FL), and return its address. Actually, depending on the "size" of T there might be more than one location removed. However we will not be concerned with this issue for the time being.

Thus an instruction of the form

   x = new T;
in C++ assigns to the pointer x the address of the location taken from the FL. Note that there are two locations involved: the location l associated with x by the declaration T* x; and the location l' taken from the FL. The content of l is the address of l'.

In order to access the location l', we must use x, dereferenced. The dereferencing operator in C++ is the unary *. Thus an assignment to l', or the access to the value of l', can be done by using *x.
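
The two locations can be seen in a small sketch. It takes T to be int; the function names store_and_read and demo_new are illustrative only and are not part of the notes. The location l' is released with delete before returning so that the sketch itself does not waste memory.

```cpp
#include <cassert>

// Allocates an int on the heap, writes and reads it through the pointer,
// and returns the value read. (Hypothetical name, for illustration only.)
int store_and_read(int value) {
    int* x;            // location l: the pointer variable itself, on the stack
    x = new int;       // location l': an anonymous int taken from the heap
    *x = value;        // assignment to l' through the dereferenced pointer
    *x = *x + 1;       // reading and updating l' also goes through *x
    int result = *x;   // copy the content of l' before releasing it
    delete x;          // give l' back to the heap
    return result;
}

// Usage: the value stored in l' is incremented once and read back.
void demo_new() {
    assert(store_and_read(41) == 42);
}
```

Note that x by itself denotes the address stored in l, while *x denotes the content of l'.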

Memory leaks

If we change the value of x (i.e. of the location l), the link to l' is lost, so l' is not accessible anymore, and cannot be reallocated either. Such a situation is called a memory leak. More precisely:
A memory leak is a heap location that cannot be reached anymore from the active variables and yet is not in the FL.
Clearly a memory leak is undesirable, because it wastes memory. If too much leakage is generated, the execution might abort due to lack of free memory.

For instance, we have a memory leak if, after the instruction x = new T; we execute an instruction that overwrites x without first deallocating the location, such as x = NULL; or a second x = new T;

We also get a memory leak if the pointer goes out of scope, as in the following procedure:
   void p(){
      int* x;
      x = new int;
   }  // when p returns, x is deallocated and the location pointed to by x becomes a memory leak
Finally, memory leaks are also likely to arise when the pointer is itself in the heap and gets deallocated. This is typical of dynamic structures like lists and trees. We will discuss this situation later.
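
The first two situations can be made observable with a small sketch. It assumes a helper type Tracked (hypothetical, not part of the notes) whose constructor and destructor maintain a count of the instances currently alive, so that a location which is never released shows up as a count that never goes back down.

```cpp
#include <cassert>

// Hypothetical helper: counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Overwriting the only pointer to a heap location loses that location.
void leak_by_overwrite() {
    Tracked* x = new Tracked;  // heap location reachable only through x
    x = nullptr;               // the link is lost: the location is now a leak
    (void)x;                   // silences "unused variable" warnings
}

// Letting the only pointer go out of scope has the same effect.
void leak_by_scope() {
    Tracked* x = new Tracked;
    (void)x;
}   // x is popped from the stack; the heap location stays allocated

// Usage: each call leaves one instance behind that can never be released.
void demo_leaks() {
    assert(Tracked::live == 0);
    leak_by_overwrite();
    assert(Tracked::live == 1);
    leak_by_scope();
    assert(Tracked::live == 2);
}
```

The sketch leaks on purpose: after demo_leaks returns, no pointer to either instance exists anymore.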

Deallocation of dynamic variables

In order to avoid memory leaks, the programmer must deallocate each location that is not needed anymore. In C++ this is done with an instruction of the form
   delete x;
Meaning of the instruction "delete x;": put the location pointed to by x back in the FL.

For instance, in the procedure p above, we could avoid the memory leak by adding the instruction delete x; before p returns. Analogously, we could add such an instruction before x = NULL; etc.
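
A minimal sketch of both fixes follows. To make the effect checkable it assumes a helper type Tracked (hypothetical, not part of the notes) that counts the instances currently alive; the function names are also illustrative.

```cpp
#include <cassert>

// Hypothetical helper: counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// The procedure p with the missing deallocation added.
void p_fixed() {
    Tracked* x;
    x = new Tracked;
    // ... use *x here ...
    delete x;          // the location pointed to by x is released before p returns
}

// Deallocating before the pointer is overwritten.
void reassign_fixed() {
    Tracked* x = new Tracked;
    delete x;          // release the location while x still points to it
    x = nullptr;       // only now drop the address
    (void)x;           // silences "unused variable" warnings
}

// Usage: no instance is left alive after either call.
void demo_delete() {
    p_fixed();
    assert(Tracked::live == 0);
    reassign_fixed();
    assert(Tracked::live == 0);
}
```

In both cases the order matters: delete must be executed while some pointer to the location is still available.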

Deallocation of dynamic structures

Consider a class "tree" declared as follows:
   class tree{
        public:   // members are public so that code outside the class can reach left and right
          int info;
          tree* left;
          tree* right;
          tree(int n){info = n; left = right = NULL; }
          tree(int n, tree* l, tree* r){info = n; left = l; right = r; }
   };
Suppose that we create a tree with an instruction like the following:
   tree* t = new tree(2, new tree(1), new tree(3));        
Now, when we want to destroy the tree, we cannot simply use
   delete t;        
because this would deallocate only the node pointed by t, and would leave the nodes pointed by t->left and t->right as memory leaks. Note that the pointers t->left and t->right were living in the heap.

In order to deallocate the whole tree we should write

   delete (t->left);
   delete (t->right);
   delete t;        

A similar problem occurs when t is actually on the stack, and goes out of scope. For instance:

   void q(){
      tree t = tree(2, new tree(1), new tree(3));
   }  // when q returns the nodes pointed by t.left and t.right become memory leaks
In the latter case the pointers t.left and t.right were living in the stack.

In order to avoid memory leaks in this second case, we should write:

   void q(){
      tree t = tree(2, new tree(1), new tree(3));
      delete (t.left);
      delete (t.right);
   }  // no memory leaks when q returns

Destructor methods

C++ and other OO languages support destructor methods, which are methods invoked automatically when the corresponding objects are destroyed, either because they go out of scope (if they were local variables, living in the stack) or because of a delete operation (if they were dynamic variables, living in the heap). Destructors are primarily used to cope with memory deallocation.

Example

A destructor method for the class tree seen before can be defined in C++ as follows:
   ~tree(){ delete left; delete right; }
This destructor will deallocate all nodes in a tree of arbitrary size. The deletion of each node, in fact, recursively causes the application of the destructor to the left and right subtrees. The terminal case is when left and right are NULL (leaf): delete applied to NULL has no effect.
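
The recursive behaviour can be checked with a sketch like the following. It repeats the class tree with the destructor added; the static counter live, the accessor value and the function demo_destructor are assumptions introduced here only to observe how many nodes are alive.

```cpp
#include <cassert>

class tree {
    int info;
    tree* left;
    tree* right;
public:
    static int live;   // assumption: added only to observe deallocation
    tree(int n) : info(n), left(nullptr), right(nullptr) { ++live; }
    tree(int n, tree* l, tree* r) : info(n), left(l), right(r) { ++live; }
    ~tree() {
        delete left;   // runs ~tree on the left subtree; no effect if null
        delete right;  // same for the right subtree
        --live;
    }
    int value() const { return info; }
};
int tree::live = 0;

void demo_destructor() {
    // Root in the heap: one delete releases all five nodes.
    tree* t = new tree(4,
                       new tree(2, new tree(1), new tree(3)),
                       new tree(5));
    assert(tree::live == 5);
    delete t;
    assert(tree::live == 0);

    // Root on the stack: the destructor runs when s goes out of scope.
    {
        tree s(2, new tree(1), new tree(3));
        assert(s.value() == 2);
        assert(tree::live == 3);
    }
    assert(tree::live == 0);
}
```

One caveat about this sketch: copying a tree object would copy the two pointers, and both copies would then delete the same subtrees. A class with such a destructor normally also needs to define or disable copying.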

Dangling pointers

In C++, after the instruction delete x; is executed, x becomes a dangling pointer: it points to a location which is now in the FL (dangling pointer to the heap). More generally:
A dangling pointer is a pointer to a location which is considered free and may be reallocated later.
A dangling pointer is dangerous because it is a potential source of errors that are difficult to detect. Consider for instance the following situation:
   int* x = new int;
   delete x;
   int* y = new int;
   *y = 5;
   *x = *x + 1;
   cout << *y;  //  it may print 6 instead of 5. 
The cout instruction in the code above prints 6 in case the location allocated for the pointer y is the one which was returned to the FL by delete x;. (This will be the case if the FL is handled with a LIFO discipline.) In C++ delete x; does not erase from x the address of the location, hence *x still refers to the same location that it was referring to before delete was executed.
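
Since using a dangling pointer has undefined behaviour in real C++, the scenario is easier to study on a toy model. The sketch below assumes a heap of int cells addressed by index and a free list handled as a stack (LIFO); the class ToyHeap and its member names are hypothetical, and a real C++ allocator is not required to behave this way.

```cpp
#include <cassert>
#include <vector>

// Toy model of a heap of int cells managed with a LIFO free list (FL).
class ToyHeap {
    std::vector<int> cells;      // the heap locations
    std::vector<int> free_list;  // addresses of the free cells, used as a stack
public:
    explicit ToyHeap(int size) : cells(size, 0) {
        for (int i = size - 1; i >= 0; --i) free_list.push_back(i);
    }
    // Counterpart of "new": remove a cell from the FL and return its address.
    int allocate() {
        assert(!free_list.empty());
        int address = free_list.back();
        free_list.pop_back();
        return address;
    }
    // Counterpart of "delete": put the cell back in the FL.
    void release(int address) { free_list.push_back(address); }
    // Counterpart of dereferencing: access the cell at the given address.
    int& at(int address) { return cells[address]; }
};

// Replays the fragment above on the toy heap and returns what *y holds.
int demo_dangling() {
    ToyHeap heap(4);
    int x = heap.allocate();       // int* x = new int;
    heap.release(x);               // delete x;   (x still holds the old address)
    int y = heap.allocate();       // int* y = new int;   the same cell comes back
    assert(x == y);                // x and y now refer to the same location
    heap.at(y) = 5;                // *y = 5;
    heap.at(x) = heap.at(x) + 1;   // *x = *x + 1;   write through the stale address
    return heap.at(y);             // 6, not 5
}
```

In the model, release does not touch the variable x, exactly as delete x; leaves the address in x unchanged.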

Note that, even if delete x; erased the content of x, delete could still cause dangling pointers. Consider for instance the following fragment:

   int* x = new int;
   int* y = x;
   delete x; // y becomes a dangling pointer
In general, delete cannot erase the content of all the pointers that point to the location being returned to the FL: it would be too expensive.

As another example, consider the following program, which uses a list. In the main function, after the instruction delete L; the pointer L1 becomes a dangling reference and the instruction L2 = new list(2,L1) may create a circular list.

   #include <cstddef>    // NULL
   #include <iostream>   // std::cout

   class list {

   public:
      int info;
      list* next;

      list(int n, list* l){
         info = n;
         next = l;
      }

   };

   int main(){

      list* L1 = new list(1,NULL);
      list* L = L1;
      delete L;               // L1 becomes a dangling reference
      list* L2 = new list(2,L1);  // It may create a circular list

      while (L2 != NULL) {    // It will loop forever if L2 is a circular list
         std::cout << L2->info;
         L2 = L2->next;
      }

   }

Dangling pointers to the stack

In C and C++ it is possible to store in a pointer the address of a location in the stack. This can be done for instance by using the referencing operator (aka address operator) &.

Example

Consider the following declaration:
   void p(){
      int y;
      x = &y;  // x is a global variable of type int*
   } 
During a call to p, x is set to the address of y, which is on the stack. After p returns, y is deallocated and x therefore becomes a dangling pointer to the stack.

A dangling pointer to the stack is a very dangerous situation, which may lead to catastrophic and not easily detectable errors, such as changing the value of the control link or of the static link.
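
One possible way to avoid the problem, assuming the value really must outlive the call, is to allocate it in the heap instead of on the stack. The sketch below is a variant of p under that assumption; the names p_heap and demo_no_stack_dangling are hypothetical.

```cpp
#include <cassert>

int* x = nullptr;   // global pointer, as in the example above

// Variant of p that stores a heap address instead of a stack address.
// The location outlives the call and must later be released with delete.
void p_heap() {
    int* y = new int;   // y itself is on the stack, the int is in the heap
    *y = 7;
    x = y;              // x points to the heap, so it stays valid after the return
}

// Usage: read the value after the call, then release the location.
int demo_no_stack_dangling() {
    p_heap();
    assert(x != nullptr);
    int value = *x;     // safe: the location is still allocated
    delete x;           // release it once it is not needed anymore
    x = nullptr;        // do not keep the stale address around
    return value;
}
```

The price is that the deallocation becomes the responsibility of whoever uses x, with the risks of leaks and of dangling pointers to the heap discussed earlier.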

In Pascal the operations on pointers are restricted so that dangling pointers are confined to the heap: a dangling pointer to the stack can never occur. Pascal does not allow pointer arithmetic, and the referencing operator does not exist.

Garbage collection

Garbage collection is a mechanism that automatically (from time to time during the execution) reclaims memory leaks and puts them back in the FL. In languages where garbage collection is implemented, the programmer does not need to worry about deleting variables. In fact, delete is (usually) not even supported, so the programmer does not need to worry about dangling references either.

Garbage collection is a very convenient mechanism from the programmer's point of view. The obvious disadvantage is that it is expensive: in general it is costly to determine whether a location is a memory leak (garbage) or not, because we need to check whether it can be reached from the active variables. Typically, locations are linked in structures like trees or lists, and in order to determine whether they are leaks we need to trace the whole chain of links.
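
The reachability check can be sketched on a toy heap in which every cell records the cells it points to. This is only an illustration of the tracing idea under that assumption, not a description of how any particular collector is implemented; Cell, mark, find_garbage and demo_garbage are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap cell: holds the indices of the cells it points to.
struct Cell {
    std::vector<std::size_t> links;
    bool marked = false;
};

// Marks every cell reachable from `start` by following the chains of links.
void mark(std::vector<Cell>& heap, std::size_t start) {
    assert(start < heap.size());
    std::vector<std::size_t> pending{start};
    while (!pending.empty()) {
        std::size_t i = pending.back();
        pending.pop_back();
        if (heap[i].marked) continue;   // already visited (this also handles cycles)
        heap[i].marked = true;
        for (std::size_t next : heap[i].links) pending.push_back(next);
    }
}

// Returns the indices of the cells that cannot be reached from any root.
// The roots play the role of the active variables.
std::vector<std::size_t> find_garbage(std::vector<Cell>& heap,
                                      const std::vector<std::size_t>& roots) {
    for (Cell& c : heap) c.marked = false;
    for (std::size_t r : roots) mark(heap, r);
    std::vector<std::size_t> garbage;
    for (std::size_t i = 0; i < heap.size(); ++i)
        if (!heap[i].marked) garbage.push_back(i);
    return garbage;
}

// Usage: a list reachable from one root, plus a cycle nothing points to.
std::vector<std::size_t> demo_garbage() {
    std::vector<Cell> heap(5);
    heap[0].links = {1};   // 0 -> 1 -> 2 : reachable from the root 0
    heap[1].links = {2};
    heap[3].links = {4};   // 3 <-> 4 : linked to each other, but unreachable
    heap[4].links = {3};
    return find_garbage(heap, {0});   // cells 3 and 4 are garbage
}
```

The cost mentioned above is visible here: every reachable cell has to be visited before any cell can be declared garbage.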

Garbage collection is used in Java, in some implementations of Pascal, and in most functional and logic languages, where heap allocation and deallocation are transparent to the user. Additionally, it is used in the implementation of languages that (because of their features) cannot be implemented in a stack-based manner and need to allocate the activation records in the heap. One particular feature that makes stack-based allocation impossible is the presence of higher-order functions in combination with static scope. We will discuss this problem later in the course, when we introduce functional programming in ML.