1 Saving Previously-Computed Nodes
2 Implementing NodeMaps via Hashtables
3 Testing
4 From Trees to Graphs
5 Statics
6 Summary

Memoization

Kathi Fisler

In the previous lecture, we showed how to create infinite trees through promises and force. While laziness saves us from creating nodes before we need them, our particular implementation wasn’t very smart because it could create the same node twice. Consider the node with the int 1 as its data: that node is the right child of the 0-node, but also the left child of the 2-node. As we keep expanding the tree, we will often create copies of nodes that already exist. This lecture addresses that problem.

1 Saving Previously-Computed Nodes

One way to keep from creating the same node multiple times is to maintain a data structure in which we can look up existing nodes. For example, imagine that we had a data structure that mapped each int to the node for that int (if such a node exists). For now, let’s call that data structure a "node map". The following generic interface would support node maps, where the "key" is the int (data), and the "value" is the node that we want to associate with that key.

  interface NodeMap<KEY,VALUE> {

    VALUE get(KEY forKey);

    void put(KEY forKey, VALUE newval);

    boolean containsKey(KEY forKey);

  }

  

  Axioms: NodeMap.put(k,v).containsKey(k) = true

          NodeMap.put(k,v).get(k) = v

          NodeMap.put(k,v1).put(k,v2).get(k) = v2

(The axioms don’t work directly as test cases, since put returns void rather than a NodeMap, but you could rewrite these into test cases with the same spirit without much effort.)
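As one way to capture that spirit, here is a sketch that turns the three axioms into executable checks. It runs against Java's built-in Hashtable (which supplies get, put, and containsKey); the class name is ours, and since put returns no new map, the calls are sequenced rather than chained. Run with `java -ea` so the assertions are enabled.

```java
import java.util.Hashtable;

// A sketch of the NodeMap axioms as executable checks, using a
// Hashtable<Integer,String> in place of a NodeMap of nodes.
class NodeMapAxiomTests {
    public static void main(String[] args) {
        Hashtable<Integer, String> map = new Hashtable<Integer, String>();

        // Axiom 1: after put(k, v), containsKey(k) is true
        map.put(1, "node-for-1");
        assert map.containsKey(1);

        // Axiom 2: after put(k, v), get(k) returns v
        assert map.get(1).equals("node-for-1");

        // Axiom 3: a second put on the same key overwrites the first value
        map.put(1, "new-node-for-1");
        assert map.get(1).equals("new-node-for-1");
    }
}
```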

Given a NodeMap, we could store the node corresponding to each int as we create it, and check to see whether a node exists in the map before creating a new one. The only place we create new nodes in our current code is inside of force. The following code augments force with the checks and stores against NodeMaps; it assumes that we have a NodeMap named memtab (we’ll explain the name momentarily):

  // force: if we have created this node before, return

  // the previously-created node.  Otherwise, create a new node.

  public INode force() {

    if (memtab.containsKey(this.data))

      return memtab.get(this.data);

    else {

      // create node, add to memtab, then return

      Node newnode =

        new Node(this.data,

                 new NodePromise(this.data - 1, this.memtab),

                 new NodePromise(this.data + 1, this.memtab));

      memtab.put(this.data, newnode);

      return newnode;

    }

  }

This technique of storing and looking up previously-computed values is called memoization. It is an important optimization in many CS algorithms that generate the same data over and over. Memoization is an operation on functions: we say that a function is memoized if we have a table that stores the output produced for each input previously given to the function. This concept only makes sense if the function is guaranteed to return the same answer every time on the same inputs. Functions that mutate data interfere with memoization, as we shall see in more detail next week. The name memtab is short for "memoization table".

2 Implementing NodeMaps via Hashtables

Now, we need actual data structures to implement node maps. Lists of pairs would be one way to do this, for example:

  List ((1, node-for-1), (-1, node-for-neg1), ...)

(with a class that creates a pair of an int and an INode). You could provide such an implementation against the NodeMap interface fairly easily based on the functions you’ve written earlier this term. Lists aren’t a great choice for this application, however, because we expect to do a lot of lookups (every time a node is forced). Traversing the entire list every time is too expensive. Ideally, we want a way to get directly to the node for each key without traversing the whole data structure.
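To make the comparison concrete, here is a hypothetical list-of-pairs implementation of the NodeMap interface (the Pair and ListNodeMap names are ours; the interface is repeated so the sketch compiles on its own). Notice that every operation walks the list, which is exactly why this representation is slow when lookups are frequent.

```java
import java.util.LinkedList;

// The NodeMap interface from above, repeated so this compiles standalone.
interface NodeMap<KEY, VALUE> {
    VALUE get(KEY forKey);
    void put(KEY forKey, VALUE newval);
    boolean containsKey(KEY forKey);
}

// A key/value pair; one list entry per stored key.
class Pair<K, V> {
    final K key;
    V value;
    Pair(K key, V value) { this.key = key; this.value = value; }
}

// List-of-pairs NodeMap: correct, but every lookup is a linear search.
class ListNodeMap<KEY, VALUE> implements NodeMap<KEY, VALUE> {
    private final LinkedList<Pair<KEY, VALUE>> pairs =
        new LinkedList<Pair<KEY, VALUE>>();

    public VALUE get(KEY forKey) {
        for (Pair<KEY, VALUE> p : pairs)      // traverse until the key matches
            if (p.key.equals(forKey)) return p.value;
        return null;                          // key absent
    }

    public void put(KEY forKey, VALUE newval) {
        for (Pair<KEY, VALUE> p : pairs) {    // overwrite an existing key
            if (p.key.equals(forKey)) { p.value = newval; return; }
        }
        pairs.add(new Pair<KEY, VALUE>(forKey, newval));
    }

    public boolean containsKey(KEY forKey) {
        for (Pair<KEY, VALUE> p : pairs)
            if (p.key.equals(forKey)) return true;
        return false;
    }
}
```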

To achieve this, we will use a common data structure called a hashtable (or hashmap). A hashtable maps keys to values without traversing a data structure over all the keys. The diagram at the top right of the Wikipedia entry on hashtables illustrates the concept nicely. Every hashtable has a fixed number of "buckets" to which it maps keys. With more buckets than actual keys (and a well-chosen hash function), each key tends to get its own bucket, so lookups go essentially straight to the data. With more keys than buckets, some keys inevitably collide; the table still returns the correct data, but resolving the collision costs extra time on lookup. We won’t talk about hashtables in depth here, as you’ll see them in detail in Algorithms.

Those of you who have seen arrays prior to this class might be wondering why we aren’t using those here. Two reasons: first, our keys include negative integers, which can’t serve as array indices. More importantly, however, we shouldn’t assume that our keys will always be integers. In yesterday’s motivating example, we made nodes for geometric coordinates. In other applications, keys can be complex values. Hashtables provide a level of indirection, in which keys are converted to non-negative integers, then used to access memory directly (as an array might).

Hashtables are built into Java. To use them, include the line

  import java.util.Hashtable;

at the top of your file. The operations listed in the NodeMap interface exist (with the given names and compatible types) in the Hashtable class; the one difference is that Hashtable’s put returns the value previously stored under the key, rather than void. Using hashtables, we define the memtab variable as follows:

  class NodePromise extends AbsNode {

    Hashtable<Integer,Node> memtab;

    ...

  }

For the memoization table to be useful, every call to force in every NodePromise class must use the same table. One way to achieve this is to create a single hashtable in the GameTree class, and pass it to every NodePromise constructor (if you hadn’t introduced a GameTree wrapper class before now, this could be a motivation to do so):

  

  class GameTree {

    INode root;

    private Hashtable<Integer,Node> memtab = new Hashtable<Integer,Node>();

  

    GameTree() {

      this.root = new NodePromise(0, memtab);

    }

  }

(where the NodePromise constructor sets its memtab field to the passed-in argument).

3 Testing

How would you check whether your memoizer is working? One quick check would be that two paths to the same logical node are getting the same object in memory. Here’s a test sequence that demonstrates that:

> GameTree g = new GameTree();

> ((NodePromise)g.root).force()

Node@1bde3d2

> ((NodePromise)g.root.getLeft().getRight()).force()

Node@1bde3d2

> ((NodePromise)g.root.getLeft().getRight().getLeft()).force()

Node@1562c67

The number after each @ sign is the identity hash code Java prints by default for an object (of the type given before the @), typically derived from where the object resides in memory. These examples show two paths to the same int yielding the same printed identity, and hence the same object, while the node for a different int is a distinct object.
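Rather than eyeballing printed identities, a test can check reference equality directly with ==. Since the full GameTree code isn’t reproduced here, the following self-contained sketch distills the idea into a tiny memoized node factory (the Node and MemoFactory names are ours): asking for the same key twice returns the identical object.

```java
import java.util.Hashtable;

// A bare-bones node: just the data, no children.
class Node {
    final int data;
    Node(int data) { this.data = data; }
}

// A minimal memoized factory in the style of force: check the table
// before creating, store after creating.
class MemoFactory {
    private final Hashtable<Integer, Node> memtab =
        new Hashtable<Integer, Node>();

    Node nodeFor(int data) {
        if (memtab.containsKey(data))
            return memtab.get(data);      // same object as before
        Node fresh = new Node(data);
        memtab.put(data, fresh);
        return fresh;
    }
}
```

A test would then assert `factory.nodeFor(0) == factory.nodeFor(0)` (true: one shared object) and `factory.nodeFor(0) != factory.nodeFor(1)` (different keys, different objects), which is exactly what the transcript above shows by hand.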

4 From Trees to Graphs

The test sequence above shows that our game tree is no longer a tree. By sharing references to nodes, we now have a data structure with cycles among the nodes. Such a data structure is called a graph. There’s a lot to say about graphs, and we will focus on them over the coming week.

This raises an important point about memoization, though: memoization turns tree-shaped computations into graph-shaped computations, but it is useful for computations over data other than trees. For example, consider the following definition of the Fibonacci sequence of numbers:

  fib(0) = 0

  fib(1) = 1

  fib(n) = fib(n-1) + fib(n-2)   when n > 1

If we were to compute fib(4), we get the following tree of computations:

                  fib(4)

                 /      \

           fib(3)        fib(2)

          /     \        /     \

    fib(2)   fib(1)   fib(1)   fib(0)

   /     \

fib(1)  fib(0)

Notice how the computation expands fib(2) twice? This is another good use of memoization: the memoization table would store the results of calls to fib. Doing so would turn this tree-shaped computation into a graph:

            fib(4)

           /      \

      fib(3)       |

      /    \       |

     |      fib(2)-'

     |      /    \

   fib(1)--'    fib(0)

The computation itself, however, is on numbers, not on trees. It is merely a coincidence that on the infinite-tree example, memoization turned tree-shaped data into graph-shaped data.
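This kind of memoization is easy to code directly. Here is a sketch of a memoized fib in Java, using a Hashtable as the memoization table in the style of memtab above (the Fib class name is ours):

```java
import java.util.Hashtable;

// Memoized Fibonacci: the table stores the result of every call, so
// each subproblem (like fib(2) above) is computed only once.
class Fib {
    private static final Hashtable<Integer, Long> memtab =
        new Hashtable<Integer, Long>();

    static long fib(int n) {
        if (n == 0) return 0;
        if (n == 1) return 1;
        if (memtab.containsKey(n))        // previously computed: look it up
            return memtab.get(n);
        long result = fib(n - 1) + fib(n - 2);
        memtab.put(n, result);            // store for future calls
        return result;
    }
}
```

With the table, fib makes a linear number of recursive calls; without it, the doubly-recursive definition repeats subproblems and takes exponential time.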

5 Statics

This is a good example on which to raise another Java-ism that you should know about: statics. In this example, we wanted all NodePromise objects to share the same memtab hashtable. By default, whenever you make an object for a class, Java creates fresh local versions of each field in that class. As a result, to share the same hashtable across all objects, we created the hashtable in the GameTree class and passed it around to all the constructors. This clutters up the code.

Wanting to share a single field value across all objects of the same class is a common pattern in OOP. To support this, Java provides a keyword called static. If you annotate a field with static, Java will share a single object for that field across all objects of that class. Here’s how we would use statics with NodePromise:

  class NodePromise extends AbsNode {

    static Hashtable<Integer,Node> memtab =

        new Hashtable<Integer,Node>();

  

    // note the constructor no longer takes memtab as input

    NodePromise(int data) {

      super(data);

    }

    ...

  }

With this version, we would no longer create memtab in the GameTree class.

The static modifier can be used on methods as well as fields. When you learned to create Main classes in lab two weeks ago, you saw static used in the declaration of the main method: because main is static, it belongs to the class itself rather than to any object, so Java can invoke it without first creating an object of that class.
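The sharing behavior of static fields is easy to see in miniature. In this illustration (the Counter class and its field names are ours), total is one copy shared by every Counter object, while mine is created fresh per object, just as memtab would be without static:

```java
// Contrast a static (class-wide) field with an ordinary per-object field.
class Counter {
    static int total = 0;  // one copy, shared across all Counter objects
    int mine = 0;          // a fresh copy in each Counter object

    void tick() {
        total++;           // every object increments the same total
        mine++;            // each object increments only its own mine
    }
}
```

If a ticks twice and b ticks once, then a.mine is 2 and b.mine is 1, but Counter.total is 3, because both objects updated the single shared field.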

6 Summary

The main points of this lecture are: