Memoization
In the previous lecture, we showed how to create infinite trees through promises and force. While laziness saves us from creating nodes before we need them, our particular implementation wasn’t very smart because it could create the same node twice. Consider the node with the int 1 as its data: that node is the right child of the 0-node, but also the left child of the 2-node. As we keep expanding the tree, we will often create copies of nodes that already exist. This lecture addresses that problem.
1 Saving Previously-Computed Nodes
One way to keep from creating the same node multiple times is to maintain a data structure in which we can look up existing nodes. For example, imagine that we had a data structure that mapped each int to the node for that int (if such a node exists). For now, let’s call that data structure a "node map". The following generic interface would support node maps, where the "key" is the int (data), and the "value" is the node that we want to associate with that key.
interface NodeMap<KEY,VALUE> {
  VALUE get(KEY forKey);
  void put(KEY forKey, VALUE newval);
  boolean containsKey(KEY forKey);
}

Axioms: NodeMap.put(k,v).containsKey(k) = true
        NodeMap.put(k,v).get(k) = v
        NodeMap.put(k,v1).put(k,v2).get(k) = v2
Given a NodeMap, we could store the node corresponding to each int as we create it, and check to see whether a node exists in the map before creating a new one. The only place we create new nodes in our current code is inside of force. The following code augments force with the checks and stores against NodeMaps; it assumes that we have a NodeMap named memtab (we’ll explain the name momentarily):
// force: if we have created this node before, return
//   the previously-created node.  Otherwise, create a new node.
public INode force() {
  if (memtab.containsKey(this.data))
    return memtab.get(this.data);
  else {
    // create node, add to memtab, then return
    Node newnode =
      new Node(this.data,
               new NodePromise(this.data - 1, this.memtab),
               new NodePromise(this.data + 1, this.memtab));
    memtab.put(this.data, newnode);
    return newnode;
  }
}
This technique of storing and looking up previously-computed values is called memoization. It is an important optimization in many CS algorithms that generate the same data over and over. Memoization is an operation on functions: we say that a function is memoized if we have a table that stores the output produced for each input previously given to the function. This concept only makes sense if the function is guaranteed to return the same answer every time on the same inputs. Functions that mutate data interfere with memoization, as we shall see in more detail next week. The name memtab is short for "memoization table".
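To make "memoization is an operation on functions" concrete, here is a minimal sketch of a wrapper that memoizes any one-argument function. The class name Memoizer and the computeCount field are hypothetical, introduced only for this illustration; the table is Java's built-in Hashtable, which Section 2 introduces. The sketch assumes the wrapped function always returns the same, non-null answer for the same input.

```java
import java.util.Hashtable;
import java.util.function.Function;

// Hypothetical helper (not part of the lecture code): wraps a function
// with a table of the outputs it has already produced.
class Memoizer<IN, OUT> {
    private final Function<IN, OUT> fn;
    private final Hashtable<IN, OUT> memtab = new Hashtable<IN, OUT>();
    int computeCount = 0; // how many times the wrapped function actually ran

    Memoizer(Function<IN, OUT> fn) {
        this.fn = fn;
    }

    OUT apply(IN input) {
        if (memtab.containsKey(input)) {
            return memtab.get(input);     // seen this input before: reuse the output
        }
        computeCount = computeCount + 1;
        OUT result = fn.apply(input);     // first time: compute, then remember
        memtab.put(input, result);
        return result;
    }
}

class MemoizerDemo {
    public static void main(String[] args) {
        Memoizer<Integer, Integer> square =
            new Memoizer<Integer, Integer>(n -> n * n);
        System.out.println(square.apply(12));     // computed: prints 144
        System.out.println(square.apply(12));     // looked up: prints 144
        System.out.println(square.computeCount);  // prints 1
    }
}
```

If the wrapped function depended on data that changes between calls, the second call would return a stale answer, which is why memoization only makes sense for functions whose output depends on their inputs alone.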
2 Implementing NodeMaps via Hashtables
Now, we need an actual data structure to implement node maps. A list of pairs would be one way to do this, for example:
List ((1, node-for-1), (-1, node-for-neg1), ...)
Finding a node in such a list, however, means traversing the list, which gets slower as we create more nodes. We would rather look up a key without searching through all the others. To achieve this, we will use a common data structure called a hashtable (or hashmap). A hashtable maps keys to values without traversing a data structure over all the keys. The diagram at the top right of the Wikipedia entry on hashtables illustrates the concept nicely. Every hashtable has a number of "buckets" to which it maps keys. When each key lands in its own bucket, a lookup goes straight to the data for that key. When two keys land in the same bucket (a collision, which is unavoidable once there are more keys than buckets), the table must do extra work to tell them apart, which costs time on lookups. We won’t talk about hashtables in depth here, as you’ll see them in detail in Algorithms.
Those of you who have seen arrays prior to this class might be wondering why we aren’t using those here. Two reasons: first, our keys include negative integers, which arrays don’t natively support (array indices start at 0). More importantly, however, we shouldn’t assume that our keys will always be integers. In yesterday’s motivating example, we made nodes for geometric coordinates. In other applications, keys can be complex values. Hashtables provide a level of indirection, in which keys are converted to non-negative integers, which are then used to access memory directly (as an array might).
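The indirection can be sketched in a few lines. This is an illustration of the idea only: the class BucketIndex and the method indexFor are hypothetical names, and the sketch does not claim to reproduce how Java's own Hashtable chooses buckets. It relies on the fact that every Java object has a hashCode() method returning an int.

```java
// Hypothetical helper illustrating the key-to-index indirection.
class BucketIndex {
    // Convert any key into an array index in the range 0 .. numBuckets-1.
    static int indexFor(Object key, int numBuckets) {
        // hashCode() may be negative; Math.floorMod returns a non-negative
        // result whenever numBuckets is positive.
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(-1, 16));        // negative Integer key: prints 15
        System.out.println(indexFor("corner", 16));  // String key: some index in 0..15
    }
}
```

Both a negative int and a String end up as a valid array index, which is what lets a hashtable accept keys that a plain array cannot.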
Hashtables are built into Java. To use them, include the line
import java.util.Hashtable;
at the top of your file. Each NodePromise then holds a reference to the memtab, while GameTree creates the single table and passes it to the root promise:
class NodePromise extends AbsNode {
  Hashtable<Integer,Node> memtab;
  ...
}

class GameTree {
  INode root;
  private Hashtable<Integer,Node> memtab = new Hashtable<Integer,Node>();

  GameTree() {
    this.root = new NodePromise(0, memtab);
  }
}
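The code above uses a Hashtable directly where Section 1 spoke of a NodeMap. That works because Hashtable offers the same three operations. As a sketch of the relationship, the following block implements the NodeMap interface twice: once as a list of pairs and once by delegating to Hashtable. The class names PairListMap and HashNodeMap are hypothetical, chosen for this illustration; the interface is repeated so the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Interface repeated from Section 1 so this sketch is self-contained.
interface NodeMap<KEY, VALUE> {
    VALUE get(KEY forKey);
    void put(KEY forKey, VALUE newval);
    boolean containsKey(KEY forKey);
}

// Hypothetical list-of-pairs version: every operation walks the list.
class PairListMap<KEY, VALUE> implements NodeMap<KEY, VALUE> {
    private static class Pair<K, V> {
        K key;
        V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<Pair<KEY, VALUE>> pairs = new ArrayList<Pair<KEY, VALUE>>();

    // Linear search for the pair with the given key; null if there is none.
    private Pair<KEY, VALUE> find(KEY forKey) {
        for (Pair<KEY, VALUE> p : pairs) {
            if (p.key.equals(forKey)) return p;
        }
        return null;
    }

    public VALUE get(KEY forKey) {
        Pair<KEY, VALUE> p = find(forKey);
        return (p == null) ? null : p.value;
    }

    public void put(KEY forKey, VALUE newval) {
        Pair<KEY, VALUE> p = find(forKey);
        if (p == null) pairs.add(new Pair<KEY, VALUE>(forKey, newval));
        else p.value = newval;  // a later put replaces the earlier value
    }

    public boolean containsKey(KEY forKey) {
        return find(forKey) != null;
    }
}

// Hypothetical hashtable version: each operation delegates to java.util.Hashtable.
class HashNodeMap<KEY, VALUE> implements NodeMap<KEY, VALUE> {
    private final Hashtable<KEY, VALUE> table = new Hashtable<KEY, VALUE>();

    public VALUE get(KEY forKey) { return table.get(forKey); }
    public void put(KEY forKey, VALUE newval) { table.put(forKey, newval); }
    public boolean containsKey(KEY forKey) { return table.containsKey(forKey); }
}
```

Both classes satisfy the three axioms from Section 1. They differ in cost: PairListMap searches through the stored pairs on every call, while HashNodeMap goes through the hashtable's buckets.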
3 Testing
How would you check whether your memoizer is working? One quick check would be that two paths to the same logical node are getting the same object in memory. Here’s a test sequence that demonstrates that:
> GameTree g = new GameTree();
> ((NodePromise)g.root).force()
Node@1bde3d2
> ((NodePromise)g.root.getLeft().getRight()).force()
Node@1bde3d2
> ((NodePromise)g.root.getLeft().getRight().getLeft()).force()
Node@1562c67
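The same check can be written as a program that compares object identity with == rather than reading addresses off the screen. The sketch below is self-contained, so it has to guess at the classes from the previous lecture: the shapes of INode, AbsNode and Node, and the assumption that asking a NodePromise for a child forces it first, are all hypothetical reconstructions rather than the course's actual code. MemoTest is likewise a name made up for this example.

```java
import java.util.Hashtable;

// Assumed shape of the node interface from the previous lecture.
interface INode {
    int getData();
    INode getLeft();
    INode getRight();
}

abstract class AbsNode implements INode {
    int data;
    AbsNode(int data) { this.data = data; }
    public int getData() { return this.data; }
}

class Node extends AbsNode {
    INode left;
    INode right;
    Node(int data, INode left, INode right) {
        super(data);
        this.left = left;
        this.right = right;
    }
    public INode getLeft() { return this.left; }
    public INode getRight() { return this.right; }
}

class NodePromise extends AbsNode {
    Hashtable<Integer, Node> memtab;

    NodePromise(int data, Hashtable<Integer, Node> memtab) {
        super(data);
        this.memtab = memtab;
    }

    // Memoized force, as in Section 1.
    public INode force() {
        if (memtab.containsKey(this.data)) {
            return memtab.get(this.data);
        }
        Node newnode = new Node(this.data,
                                new NodePromise(this.data - 1, this.memtab),
                                new NodePromise(this.data + 1, this.memtab));
        memtab.put(this.data, newnode);
        return newnode;
    }

    // Assumption: asking a promise for a child forces the promise first.
    public INode getLeft() { return this.force().getLeft(); }
    public INode getRight() { return this.force().getRight(); }
}

class GameTree {
    INode root;
    private Hashtable<Integer, Node> memtab = new Hashtable<Integer, Node>();
    GameTree() { this.root = new NodePromise(0, memtab); }
}

class MemoTest {
    public static void main(String[] args) {
        GameTree g = new GameTree();
        INode zero = ((NodePromise) g.root).force();
        INode zeroAgain = ((NodePromise) g.root.getLeft().getRight()).force();
        INode negOne = ((NodePromise) g.root.getLeft().getRight().getLeft()).force();
        System.out.println(zero == zeroAgain);  // true: two paths, one object
        System.out.println(zero == negOne);     // false: different logical nodes
    }
}
```

Using == here is deliberate: it asks whether two references point to the same object in memory, which is exactly the property memoization is supposed to guarantee.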
4 From Trees to Graphs
The test sequence above shows that our game tree is no longer a tree. By sharing references to nodes, we now have a data structure with cycles among the nodes. Such a data structure is called a graph. There’s a lot to say about graphs, and we will focus on them over the coming week.
This raises an important point about memoization though: memoization turns tree-shaped computations into graph-shaped computations, but it is useful for computations over data other than trees. For example, consider the following definition of the Fibonacci sequence of numbers:
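fib(0) = 0
fib(1) = 1
fib(n) = fib(n-1) + fib(n-2) when n > 1
Computing fib(4) directly from this definition produces the following tree of computations, in which fib(2) is computed twice and fib(1) three times:
              fib(4)
             /      \
        fib(3)      fib(2)
        /    \      /    \
   fib(2)  fib(1) fib(1)  fib(0)
   /    \
fib(1)  fib(0)
With memoization, each fib(n) is computed once and every later use shares the stored result. The tree collapses into a graph in which each value appears once, with an edge down to each value it uses:
fib(4)
  |   \
  |   fib(3)
  |   /   |
fib(2)    |
  |   \   |
  |   fib(1)
  |
fib(0)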
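The difference between the two computations can be measured by counting calls. The following sketch is one way to do so; the class Fib and its counter fields are hypothetical names introduced for this example, and the memoization table is again a Hashtable named memtab.

```java
import java.util.Hashtable;

// Hypothetical class comparing direct and memoized Fibonacci.
class Fib {
    int naiveCalls = 0;  // calls made by the direct version
    int memoCalls = 0;   // calls made by the memoized version
    private Hashtable<Integer, Long> memtab = new Hashtable<Integer, Long>();

    // Direct translation of the definition: shared subproblems are recomputed.
    long naive(int n) {
        naiveCalls = naiveCalls + 1;
        if (n <= 1) return n;
        return naive(n - 1) + naive(n - 2);
    }

    // Memoized version: each fib(n) with n > 1 is computed at most once.
    long memo(int n) {
        memoCalls = memoCalls + 1;
        if (n <= 1) return n;
        if (memtab.containsKey(n)) return memtab.get(n);
        long result = memo(n - 1) + memo(n - 2);
        memtab.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        Fib f = new Fib();
        System.out.println(f.naive(4));    // prints 3
        System.out.println(f.naiveCalls);  // prints 9, one per node of the tree
        System.out.println(f.memo(4));     // prints 3
        System.out.println(f.memoCalls);   // prints 7
    }
}
```

For fib(4) the saving is small, but the gap widens quickly: the number of calls in the direct version grows exponentially with n, while the memoized version makes a number of calls proportional to n.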
5 Statics
This is a good example on which to raise another Java-ism that you should know about: statics. In this example, we wanted all NodePromise objects to share the same memtab hashtable. By default, whenever you make an object for a class, Java creates fresh local versions of each field in that class. As a result, to share the same hashtable across all objects, we created the hashtable in the GameTree class and passed it around to all the constructors. This clutters up the code.
Wanting to share a single field value across all objects of the same class is a common pattern in OOP. To support this, Java provides a keyword called static. If you annotate a field with static, Java will share a single object for that field across all objects of that class. Here’s how we would use statics with NodePromise:
class NodePromise extends AbsNode {
  static Hashtable<Integer,Node> memtab =
    new Hashtable<Integer,Node>();

  // note the constructor no longer takes memtab as input
  NodePromise(int data) {
    super(data);
  }
  ...
}
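The static modifier can be used on methods as well as fields. When you learned to create Main classes in lab two weeks ago, you saw static used in the declaration of the main method. A static method belongs to the class itself rather than to any one object, so there is a single main method for the entire application, and Java can call it without first creating an object.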
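The difference between an ordinary field and a static field can be seen in a small experiment. The classes PerObjectTable, SharedTable and StaticDemo below are hypothetical, made up for this illustration and unrelated to the game tree code.

```java
import java.util.Hashtable;

// Ordinary field: each object gets its own table.
class PerObjectTable {
    Hashtable<Integer, String> table = new Hashtable<Integer, String>();
}

// Static field: one table shared by every object of the class.
class SharedTable {
    static Hashtable<Integer, String> table = new Hashtable<Integer, String>();
}

class StaticDemo {
    public static void main(String[] args) {
        PerObjectTable a = new PerObjectTable();
        PerObjectTable b = new PerObjectTable();
        a.table.put(1, "one");
        System.out.println(b.table.containsKey(1));  // false: b has its own table

        SharedTable c = new SharedTable();
        SharedTable d = new SharedTable();
        c.table.put(1, "one");
        System.out.println(d.table.containsKey(1));  // true: c and d share one table

        // A static field can also be reached through the class name,
        // with no object involved at all.
        System.out.println(SharedTable.table.containsKey(1));  // true
    }
}
```

One consequence worth keeping in mind: a static table is shared by every object of the class for the whole run of the program. With a static memtab, two GameTree objects would share nodes, whereas the version that passes memtab through constructors gives each GameTree its own table.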
6 Summary
The main points of this lecture are:
Memoization as a technique to reuse the results of previously-performed computations.
Hashtables as an efficient data structure for mapping keys to values.
Reusing nodes can turn data from a tree into something new, a graph.
Statics as a way to share a single instance of a data structure across all objects in the same class.