CS 4536 Homework 4: Garbage Collection

Due: Monday, Feburary 20, 11:59pm via turnin (assignment name hwk4)
Collaboration Policy: Pairs Permitted (but see the warning two paragraphs down)

The Assignment

In this assignment, you will implement two garbage collectors: mark & sweep, and stop & copy. As we have seen, garbage collection involves starting from a root set of references, which reside in local variables and stack frames, and searching for reachable values on the heap. We are providing support code that gives your collector access to a running Scheme program's root set. You just have to implement the memory allocator and the garbage-collection algorithms.

Note that even though this is a pairs assignment, we do not want to see pairs that each write one collector and turn them in under both names. The contrasts behind the collectors are the point of this exercise. If the styles of your two collectors are sufficiently different so as to suggest that both partners didn't collaborate on both collectors, we reserve the right to deduct substantial points from each of you.

For this assignment, you will no longer be using the PLAI language levels. Instead, set the language level to (module ...).

Download the support code (as tar.gz or zip) for the garbage collector. This includes several files:

memory-core.ss

You will need to require this file into your allocator. It defines the following abstractions:

allocator.ss

The support code provides a trivial (sample) allocator that does not perform garbage collection (i.e., it just gives up after running out of memory). You will need to provide your own allocator modules—one for each collection algorithm—that implement the same interface. The interface is as follows:

gc:set-heap-size! : num -> () This procedure should set the size of the heap to the given number. This is the first thing any mutator must do, so no allocation should be possible until this call is made.
gc:alloc-flat : num U sym U bool U empty -> loc This procedure should allocate a flat Scheme value (number, symbol, boolean, or empty list) on the heap, returning its location (a number). The value should occupy a single heap cell, though you may use additional space to store a tag, etc. You are also welcome to pre-allocate common constants (e.g., the empty list). This procedure may need to perform a garbage-collection. If there is still insufficient space, it should raise an error.
gc:number? : loc -> boolean This procedure should return true iff the content of the given location is a number.
gc:symbol? : loc -> boolean This procedure should return true iff the content of the given location is a symbol.
gc:boolean? : loc -> boolean This procedure should return true iff the content of the given location is a boolean.
gc:empty? : loc -> boolean This procedure should return true iff the content of the given location is an empty list.
gc:deref : loc -> num U sym U bool U empty Given the location of a flat Scheme value, this procedure should return that value; if the location points instead to a cons cell, this function should raise an error. Note that this function may get called with the boolean value returned by a recognizer, instead of a location. Just return the same boolean value in that case.
gc:cons : loc loc -> loc This procedure should allocate a cons cell on the heap, whose first and rest are the given arguments. These fields must be stored in separate heap cells, and you may use additional space for a tag, etc. As with alloc-flat, this procedure may need to perform a garbage-collection, and if there is still insufficient space it should raise an error.
gc:cons? : loc -> boolean This should return true iff the location refers to a cons cell
gc:first : loc -> loc If the given loc refers to a cons cell, this should return the first field. Otherwise, it should raise an error.
gc:rest : loc -> loc If the given loc refers to a cons cell, this should return the rest field. Otherwise, it should raise an error.
gc:loc->scheme-val : loc -> any For debugging: this procedure consumes a location and returns a Scheme value corresponding to the contents of the location.

make-mutator.ss

This module defines a language in which you can implement test applications (mutators) for your collectors. A program that uses this language module will expose its roots through get-root-set, allocate its data on your allocator's heap, and use the versions of primitives defined in your allocator. (When the mutator refers to procedures like cons or symbol?, the calls are automatically redirected to the gc:cons and gc:symbol? defined in your allocator.)

sample-mutator.ss

This serves as an example of a mutator. It takes the form of a module defined in the make-mutator.ss language. It declares a heap size of 100 cells, then defines and applies some simple functions, printing out the results. You will want to test against this module, probably with smaller heap sizes, and also to write your own mutators. When you write your own mutator, make sure the first thing you do is to set the heap size by calling (heap-size size).

How to Use The Files

Allocator.ss and sample-mutator.ss are examples of the files you are expected to submit for this assignment. The mutator file requires (loads) the allocator file (thus letting you control which collector gets used for a set of tests). The others are support files that the other two reference as shown in the examples. You do not need to understand how the support files work to do this assignment.


Notes

During collection, you will need to use get-root-set to obtain the current set of roots. In gc:alloc-flat, you should not list any additional identifiers. However, in gc:cons, you must provide the arguments (the first and rest fields), since these are roots. (If you don't list them, you will need to treat them specially and be sure to update them with set!).

In the mark & sweep collector, you should maintain a free list in the heap. That is, you should not use any auxiliary data structure; instead, use the available space in the heap to keep track of the free list. You may use one extra box (“register”) to point to the start of the free list. You may need to adjust your allocation accordingly to have enough space to maintain free list pointers!

You will probably want to use numbers to represent locations, but you may use another representation if it eases your implementation. Also, to help see what is happening, you might want to print a representation of the heap before and after every garbage collection. You can use the Scheme printf function:

(printf "Heap is ~a~n" Heap)

Some final words of advice:


What to Turn In

Submit three files: (1) mark-sweep.ss, (2) stop-copy.ss, and (3) tests.ss. The first two contain the allocator modules that implement the corresponding collection algorithms. The third contains the mutator module with your test cases for the collectors. Document each test with a comment describing what behavior the test is demonstrating. Your tests file can load either of your collectorss (doesn't matter which one at the time of submission), but we will assume that you ran the test file against each collector before submitting.

In general, you will get more partial credit for one working collector rather than two barely working collectors. If you run out of time to write both, focus on one.


FAQ

get-root-set seems to return duplicate roots. Is this a bug in the support code? No, this is the intended behavior. Roots come from walking the stack, and several frames may refer to the same variable. Moreover, different variables may refer to the same heap location. You need to update all of them in your stop & copy collector.

It would be a lot easier if I could just make all objects consume the same amount of space. Would that be acceptable? No! We've been extremely easy-going in only making you handle two sizes. To reduce it to one size would make the problem trivial and unrealistic.

Can we create two vectors for the two spaces in stop and copy?
Yes, this is fine.