Week 11 - Chapter 15 - Access Types and Dynamic Data Structures

Theory

Types of Storage

Basically, you have two types:

Stack - Normally, if you declare variables in a program, they are allocated from the stack, which is to say, within the programs own execution space in memory.
Heap - When you create dynamic structures, you cannot know their total size and organization at compile-time, therefore it is impossible to allocate memory for them on the stack. Instead, you must draw from the pool of reserve memory available on the machine.

Up until now, all your programs have taken memory off the stack. Tonight, we'll learn how to get stuff off the heap.

Pointer Principles

The need for Pointers

In order to create dynamic data structures, at some level, you need pointers or references. As explained above, this is because you cannot know the storage requirements, size, or location of your dynamic structures at compile-time.

Pointers Defined

All a pointer is, is a location in memory that contains the address of another location in memory.

Some pointer implementations, like C, are very low-level and afford the programmer a great deal of manipulation. Other implementations, like Java and Ada, have more rigid guidelines regarding pointer usage, hide much of the detail from the programmer, and don't allow for as much flexibility.

Dereferencing

To see what a pointer is referencing, you need to dereference the pointer. That is to say, you need to tell your program to look at the location in memory that the pointer is pointing to.

In C and C++, pointer dereferencing needs to be done explicitly. If you had a scalar value pointed to by a variable named pval, you would need to say *pval to view the value it pointed to. If you had a pointer to a struct named prec, you would need to say prec->field to access the fields of that struct. Contrast this to the normal way of field access in C done as rec.field.

Other languages dereference pointers implicitly, providing a more homogenous interface. For example, to access a record field you would say rec.field. To access the same field from a pointer that points to the record, you would also say ptr.field. Examples of languages which support implicit dereferencing include Java, and Ada.

Shallow vs. Deep Copy

We've mentioned these before, but it's time to spell out the difference as it is very apropos to pointers. To frame the discussion, consider a situation where we have a pointer pointing to a dynamically allocated object and a second pointer waiting to point to something. Java code example:

	Object p1 = new Object();
	Object p2;

A Shallow Copy means that a second pointer is set to point to the same object. In other words, an alias is created. This would be done by: p2 = p1;
A Deep Copy means that the object pointed to is cloned, property by property; new memory is allocated to hold the clone which is an exact duplicate of the first object, but not the same object. This would be done with: p2 = p1.clone();

We'll go over this in class to make sure it's crystal clear to everybody.

Approaches to Memory Allocation/Deallocation

Explicit Memory (De)Allocation

In this approach, you have to do all memory management by hand.

In C this is done with a malloc() + free() pair. When you call malloc(), the operating system attempts to find a large enough chunk of memory from its free list to satisfy the request. When you call free(), it returns the memory to the free list. In the object-oriented world, the corolaries to malloc() and free() is a constructor (invoked with new) and a destructor.

Doing memory allocation by hand would be the best approach if you had a program which was time-critical, couldn't support a seperate garbage collection thread, or had to run in a very small execution space.

Examples of explicit memory-management languages are C, C++, and Pascal.

Garbage Collection

Garbage collection means that memory management is done for you. Easy on the programmer, but may result in slower execution time. This is typically done in an interpreted language. Garbage collection is discussed in the middle of page 639.

There are many levels of sophistication in a garbage collection scheme:

When the Program Exits - This is unquestionably the least sophisticated approach, only useful for small programs which have a short execution time. On the plus side, you don't have a seperate GC thread slowing down execution time.
Simple Run-Time Deallocation - This is more sophisticated and usually requires a multi-threaded environment. Basically, you have a GC thread running in concurrent execution with your program which tracks pointer variables that have had memory dynamically allocated to them. When these variables are no longer being used (they are "unreferenced") the GC thread returns the allocated memory to the pool.
Sophisticated Run-Time Deallocation - The problem with the previous approach is that it can break down when you have dynamic data structures which might be self-referencing like a circular queue. Since all the nodes have a reference, they will never be deleted! Therefore, such a situation requires a more thorough examination of memory (usually implemented with "weak references" and when it is in use. Needless to say, this approach is the most process-intensive.

The occasions where a garbage-collection approach to memory management would be desirable are those situations where you have lots of available resources, execution of the program is not time-critical, and you want to be able to get something done quickly without worrying about all the nitty-gritty details.

The downside to garbage collection is that it can involve more overhead.

Examples of languages which support garbage collection are LISP, BASIC, Perl, Python, and Java.

Which Approach Is Best?

There has been a lot of debate about which memory allocation approach is better.

Proponents of Explicit Allocation typicall criticize garbage collection saying that it is too slow and requires additional system resources, making GC unacceptable for high-performance or embedded system computing.

Proponents of Garbage Collection typically criticize explicit allocation saying that freeing all memory used is inherently error-prone and often results in memory leaks. (Especially when a program is complicated by multi-threading, cross-process passing of variables, and distributed execution). They also will tout the benefits of GC saying that it ensures that all allocated memory is deallocated (resulting in no memory leaks) and encourages rapid development because programmers needn't worry about writing destructors all the time.

Since Ada was designed for real-time embedded systems, it chooses the Explicit approach. This means that (you guessed it) you have to write more code to make memory management happen.

The majority of emerging programming languages feature garbage collection. This might suggest it is "winning out" as the better of the two, but it probably also suggests that computers are getting more powerful, minimizing the "too much overhead" argument. Above all, it probably proves (once again) that the first cardinal virtue of a programmer is laziness.

Ada Stuff

Access Types (Pointers) - (p. 631-632) An "Access Type" in Ada is just a pointer which can refrence a certain type of object. A step-by-step example of how they're declared and used found on page 632. For some example code, look at the "Creating a Linked Structure" example that begins on page 634 (it goes on for quite awhile). For a synopsis, look at the "Summary of Operations on Access Values" on p. 638-639.

New - (p. 632) NEW is how you create new objects off the heap and assign them to a pointer. If you try to make a new object with NEW and there is not enough memory to satisfy the request, a Storage_Error exception will be thrown. See "Running Out of Dynamic Storage?" on p. 640 for details. Also, read the white box titled "Programs Should Clean Up after Themselves" on the same page.

Deallocators - (p. 639) There is no garbage collection in Ada. Just like with C++, Ada makes you free memory by hand. Look at the explanation and syntax on page 639 under the section "Returning Dynamic Storage to the Pool". Basically to create your own destructor, you will need to put

WITH Unchecked_Deallocation;

at the top of your program and a

PROCEDURE MyDestructor IS
	NEW Unchecked_Deallocation(Object => [record type], Name => [pointer type]

Also, note the bulleted list at the top of page 640. This gives the rules for deallocation.

Silly 'ALL' Keyword - (p. 635) About two-thirds of the way down the page on p. 635 it explains how the ptr.ALL syntax was originally used to dereference a pointer. That syntax is now obsolete. It's also very stupid. Don't use it for dereferencing.

Shallow / Deep Copies in Ada - Normal assignment between two pointers (nee 'ACCESS' types) in Ada is a shallow copy. That is to say, one pointer gets assigned the address of the other pointer. To do a deep copy, you can use that silly ALL keyword. The chapter summary on page 671 makes that clear.

Dereferencing NULL Pointers - (p. 669) Summary: Don't do it. Page 669 introduces the short circuit AND THEN operator which shows how you can avoid dereferencing null pointers. Worth a gander.

The book talks about 'Subunits' in this chapter too, but we are not going to worry about them at all.

Assignment 9

Remember your assignment # 5? That's the one where you made an array of records to hold fractional numbers. Copy it to assn9.adb by typing:

	cp assn5.adb assn9.adb

then modify it in the following ways:

Rather than declaring an array of Fraction records, declare an array of pointers to Fraction records. (Hint: Use that ACCESS keyword.)
Create new objects off the heap using the NEW keyword.
Keep all the other stuff from assignment 5 where you assign values to the members and print the members.
After you're done printing the values, destroy the objects with a hand-made destructor. This means deriving from that Unchecked_Deallocation function.

Call it "assn9.adb". You will not need more than one file for this assignment. I will be looking for the keywords: ACCESS, NEW, and Unchecked_Deallocation.

This assignment is due on December 5th, 2001, the night of the Final Exam!