Terminating Programs on Graphs
Recall that yesterday, our graph program ran into an infinite loop. We attributed the problem to the cycle between the nodes for Boston and Worcester; specifically, the problem traces to the following sequence of calls to hasRoute that occurs when we try to find a route from manc to prov:
G.hasRoute(manc,prov)
=> manc.hasRoute(prov)
=> bost.hasRoute(prov)
=> worc.hasRoute(prov)
=> bost.hasRoute(prov)
=> worc.hasRoute(prov)
...
Looking at this sequence of calls, we have a sense of what should happen: the second time we call bost.hasRoute(prov), hasRoute should return false, since we know we have already tried to find a route using Boston. However, functions by definition should return the same outputs on the same inputs. So asking two identical calls to the same method to yield different results seems problematic.
One way to fix this problem is to add a parameter to hasRoute that would distinguish the two calls. In this case, we want to maintain a list of those nodes we have already tried searching from. If we ever try to search from a node we already tried, hasRoute should return false. In effect, we are aiming for the following sequence:
G.hasRoute(manc,prov)
=> manc.hasRoute(prov, List())
=> bost.hasRoute(prov, List(manc))
=> worc.hasRoute(prov, List(manc,bost))
=> bost.hasRoute(prov, List(manc,bost,worc)) :: returns false
=> prov.hasRoute(prov, List(manc,bost,worc)) :: continuing for loop
=> returns true
Here, we are using List(...) as a shorthand for a Java list containing the listed nodes (to keep this explanation readable). Notice that we do NOT make the same call to hasRoute more than once: each call differs from every earlier one in its node, its list argument, or both.
This example sequence motivates the implementation. hasRoute should now check whether we have already tried to find a route from the current node: if so, return false. Otherwise, save the current node in the list of nodes we have tried before we follow the edges out of the current node. The code in the Node class is as follows:
// Determine whether a route exists from this Node to the given node
boolean hasRoute(Node to, LinkedList<Node> visited) {
  if (this.equals(to))
    return true;
  else if (visited.contains(this))
    return false;
  else {
    visited.add(this);
    for (Node c : this.getsTo) {
      if (c.hasRoute(to, visited)) {
        return true;
      }
    }
    return false;
  }
}
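To try the method on the example from the trace, here is a minimal runnable sketch. The edges (manc to bost, bost to worc, worc back to bost, worc to prov) are inferred from the call sequence above. The addEdge helper, the one-method Graph class, and the RouteDemo class are assumptions made for illustration, not necessarily the classes used in the course.

```java
import java.util.LinkedList;

// A node in the graph: a city plus the cities it has direct edges to.
class Node {
    private final String cityname;
    private final LinkedList<Node> getsTo = new LinkedList<Node>();

    Node(String cityname) {
        this.cityname = cityname;
    }

    // Hypothetical helper: add a one-way edge from this node to another.
    void addEdge(Node to) {
        this.getsTo.add(to);
    }

    // Same logic as the hasRoute method shown in the text.
    boolean hasRoute(Node to, LinkedList<Node> visited) {
        if (this.equals(to)) {
            return true;
        } else if (visited.contains(this)) {
            return false;
        } else {
            visited.add(this);
            for (Node c : this.getsTo) {
                if (c.hasRoute(to, visited)) {
                    return true;
                }
            }
            return false;
        }
    }
}

// Assumed shape of the Graph class: each search starts with an empty list.
class Graph {
    boolean hasRoute(Node from, Node to) {
        return from.hasRoute(to, new LinkedList<Node>());
    }
}

class RouteDemo {
    public static void main(String[] args) {
        Node manc = new Node("Manchester");
        Node bost = new Node("Boston");
        Node worc = new Node("Worcester");
        Node prov = new Node("Providence");
        manc.addEdge(bost);
        bost.addEdge(worc);
        worc.addEdge(bost); // the cycle that caused the infinite loop
        worc.addEdge(prov);

        Graph g = new Graph();
        System.out.println(g.hasRoute(manc, prov)); // route exists despite the cycle
        System.out.println(g.hasRoute(prov, manc)); // no edges leave prov
    }
}
```

Because Graph.hasRoute creates a fresh list for every search, nodes visited in one search do not affect the next one.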
1 Understanding How the Code Works
To understand how hasRoute works, it helps a lot to draw out the sequence of calls that gets made while searching for a route. In particular, remember that the recursive calls to hasRoute lie within a for loop. When hasRoute returns false (because it has tried a node before, for example), it returns false only for the most recent call to hasRoute, not for the original call. If the for loop in the caller still has other edges to try, it goes on to try them after one recursive call returns false.
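If drawing the calls by hand is tedious, a tracing variant can record them. The sketch below is an illustration only: TracingNode, TraceDemo, and the extra depth and log parameters are hypothetical additions, and the edges are the ones inferred from the trace earlier in these notes.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical variant of Node that records every call and every return.
class TracingNode {
    private final String name;
    private final LinkedList<TracingNode> getsTo = new LinkedList<TracingNode>();

    TracingNode(String name) {
        this.name = name;
    }

    void addEdge(TracingNode to) {
        this.getsTo.add(to);
    }

    // Same search as hasRoute in the text; depth and log exist only for tracing.
    boolean hasRoute(TracingNode to, LinkedList<TracingNode> visited,
                     int depth, List<String> log) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        log.add(indent + this.name + ".hasRoute(" + to.name + ")");

        boolean result = false;
        if (this.equals(to)) {
            result = true;
        } else if (!visited.contains(this)) {
            visited.add(this);
            for (TracingNode c : this.getsTo) {
                if (c.hasRoute(to, visited, depth + 1, log)) {
                    result = true;
                    break;
                }
            }
        }
        // An already-visited node falls through with result still false.
        log.add(indent + "returns " + result);
        return result;
    }
}

class TraceDemo {
    // Builds the four-city example and returns the trace of a search from manc to prov.
    static List<String> traceExample() {
        TracingNode manc = new TracingNode("manc");
        TracingNode bost = new TracingNode("bost");
        TracingNode worc = new TracingNode("worc");
        TracingNode prov = new TracingNode("prov");
        manc.addEdge(bost);
        bost.addEdge(worc);
        worc.addEdge(bost);
        worc.addEdge(prov);

        List<String> log = new ArrayList<String>();
        manc.hasRoute(prov, new LinkedList<TracingNode>(), 0, log);
        return log;
    }

    public static void main(String[] args) {
        for (String line : traceExample()) {
            System.out.println(line);
        }
    }
}
```

In the printed trace, the second call on bost returns false at the deepest indentation level, and the very next entry is the call on prov at that same level: the false ended only that one call, and the for loop in worc moved on to its next edge.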
2 Does this Version Terminate?
Earlier, we argued that we could avoid an infinite loop by not making the same method call multiple times. How do you argue that you won’t make the same call again?
Looking at our example trace of method calls with the List arguments above gives a hint: the list grows on every subsequent method call while we are in a cycle. This means that no two calls on the same node can ever have the same list input, so we cannot make the same call twice. Furthermore, since there are only a finite number of nodes that could be in the list, the code is guaranteed to reach a point at which the current node is in visited if there is a cycle in the graph.
This is a careful, but still somewhat informal argument. How do we document it more rigorously?
2.1 Documenting Termination Arguments
A termination argument needs to explain why a method will eventually return an answer (note we haven’t claimed that it will return the right answer – that’s a separate argument). What makes a method return?
A method returns when its execution stops generating new method calls
Looking at the code, we can easily identify conditions under which a method makes no new method calls. hasRoute stops making additional calls when the this and to nodes are the same, or when this is in the visited list. To argue termination, we have to argue that any call to hasRoute from the Graph class eventually leads to one of these two conditions.
We cannot guarantee that the this and to nodes will eventually be the same (because there may be no route between them). We therefore must focus on termination through the visited list.
The essential features of the visited list are (1) each time hasRoute starts a search from a new node, that node gets added to visited and (2) visited does not lose elements as the computation progresses. Since there are only a finite number of nodes in the graph, the computation either stops trying new nodes or tries one in visited, either of which terminates the computation.
There are two components to this description: information about the contents of the visited list over the computation, and an argument about why that information guarantees termination. The description of visited is called its invariant. We document the invariant separately, then use it to argue termination:
/**
 * Determine whether a route exists from this Node to the given node
 *
 * INVARIANT: Node n is in visited iff n.hasRoute has been called
 * since the most recent call to hasRoute on the overall graph.
 *
 * TERMINATES because code checks visited list before recurring,
 * visited list grows every time hasRoute is called from a new node,
 * the invariant guarantees that nodes added to visited remain there
 * until computation completes, and there are a finite number of
 * possible nodes on which to call hasRoute.
 */
boolean hasRoute(Node to, LinkedList<Node> visited) {
  if (this.equals(to))
    return true;
  else if (visited.contains(this))
    return false;
  else {
    visited.add(this);
    for (Node c : this.getsTo) {
      if (c.hasRoute(to, visited)) {
        return true;
      }
    }
    return false;
  }
}
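The termination argument can also be observed at run time. The sketch below counts calls on a small graph with a cycle and an unreachable target. SearchStats, CountingNode, and CountingDemo are hypothetical names introduced only for this illustration. Each node is added to visited at most once, and each such addition triggers at most one call per outgoing edge, so the total number of calls is bounded by one plus the number of edges.

```java
import java.util.LinkedList;

// Hypothetical counters used only to observe the search.
class SearchStats {
    int calls = 0;      // every call to hasRoute
    int expansions = 0; // calls that added a new node to visited
}

class CountingNode {
    private final LinkedList<CountingNode> getsTo = new LinkedList<CountingNode>();

    void addEdge(CountingNode to) {
        this.getsTo.add(to);
    }

    // Same search as in the text, with counters added.
    boolean hasRoute(CountingNode to, LinkedList<CountingNode> visited,
                     SearchStats stats) {
        stats.calls++;
        if (this.equals(to)) {
            return true;
        } else if (visited.contains(this)) {
            return false;
        } else {
            visited.add(this);
            stats.expansions++;
            for (CountingNode c : this.getsTo) {
                if (c.hasRoute(to, visited, stats)) {
                    return true;
                }
            }
            return false;
        }
    }
}

class CountingDemo {
    // Cycle a -> b -> c -> a, with target d unreachable from a.
    static SearchStats searchUnreachable() {
        CountingNode a = new CountingNode();
        CountingNode b = new CountingNode();
        CountingNode c = new CountingNode();
        CountingNode d = new CountingNode();
        a.addEdge(b);
        b.addEdge(c);
        c.addEdge(a);

        SearchStats stats = new SearchStats();
        a.hasRoute(d, new LinkedList<CountingNode>(), stats);
        return stats;
    }

    public static void main(String[] args) {
        SearchStats stats = searchUnreachable();
        System.out.println("calls: " + stats.calls);
        System.out.println("expansions: " + stats.expansions);
    }
}
```

With 4 nodes and 3 edges, the search makes 4 calls: a, b, and c are each added to visited once, and the fourth call (on a again) returns false immediately because a is already in visited.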
We saw the term invariant earlier in the course. We use it in a different context here, but the meaning is the same: an invariant is a statement about data. Previously, we used an invariant to relate one piece of a data structure to other pieces. Here, we use an invariant to relate data to the progress of a computation. Either way, the invariant describes some property of data that is relevant to computing with it.
3 Maintaining Visited Information By Editing the Nodes
Our current approach suffers the problem of requiring both additional time (to search for visited nodes in a data structure) and space (to store the data structure). We could get more efficient lookup by using something like a hashtable, but that takes more space than a list. How might you maintain visited node information if you are on tight budgets for time AND space?
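For comparison, the hashtable option mentioned above might look like the following sketch, which swaps the LinkedList for a java.util.HashSet. SetNode, SetDemo, and addEdge are hypothetical names for this illustration. Nodes keep the default identity-based equals and hashCode, which is sufficient here because each city is represented by exactly one object.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

class SetNode {
    private final String cityname;
    private final LinkedList<SetNode> getsTo = new LinkedList<SetNode>();

    SetNode(String cityname) {
        this.cityname = cityname;
    }

    void addEdge(SetNode to) {
        this.getsTo.add(to);
    }

    // Same structure as the list version; only the type of visited changes.
    boolean hasRoute(SetNode to, Set<SetNode> visited) {
        if (this.equals(to)) {
            return true;
        } else if (visited.contains(this)) { // hash lookup instead of a list scan
            return false;
        } else {
            visited.add(this);
            for (SetNode c : this.getsTo) {
                if (c.hasRoute(to, visited)) {
                    return true;
                }
            }
            return false;
        }
    }
}

class SetDemo {
    // Starts a search with an empty visited set.
    static boolean hasRoute(SetNode from, SetNode to) {
        return from.hasRoute(to, new HashSet<SetNode>());
    }
}
```

The invariant and the termination argument carry over unchanged, since both depend only on visited gaining elements and never losing them.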
One common proposal here is to add a field to each node that stores whether or not the node has been visited:
class Node {
  private String cityname;
  private LinkedList<Node> getsTo;
  private boolean checked;

  Node(String cityname) {
    this.cityname = cityname;
    this.getsTo = new LinkedList<Node>();
    this.checked = false;
  }
}
Where our previous version checked and updated the visited list, this version would check and update the checked flag on each node:
/**
 * Determine whether a route exists from this Node to the given node
 *
 * INVARIANT: Node n is marked as checked iff n.hasRoute has been called
 * since the most recent call to hasRoute on the overall graph.
 *
 * TERMINATES because code checks checked flag before recurring,
 * a new node is marked as checked every time hasRoute is called
 * from a new node, the invariant guarantees that nodes remain
 * checked until computation completes, and there are a finite
 * number of possible nodes on which to call hasRoute.
 */
boolean hasRoute(Node to) {
  if (this.equals(to))
    return true;
  else if (this.checked)
    return false;
  else {
    this.checked = true;
    for (Node c : this.getsTo) {
      if (c.hasRoute(to)) {
        return true;
      }
    }
    return false;
  }
}
If you use this approach, you also need to reset all the checked flags to false before each call to hasRoute in the Graph class. We leave that as an exercise to the reader.
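One possible shape for that reset, offered as a sketch rather than the intended solution: the graph keeps a list of its nodes and clears every flag before starting a search. FlagNode, FlagGraph, addNode, addEdge, and resetChecked are hypothetical names, and the assumption that the graph holds a list of all its nodes is ours.

```java
import java.util.LinkedList;

class FlagNode {
    private final String cityname;
    private final LinkedList<FlagNode> getsTo = new LinkedList<FlagNode>();
    private boolean checked = false;

    FlagNode(String cityname) {
        this.cityname = cityname;
    }

    void addEdge(FlagNode to) {
        this.getsTo.add(to);
    }

    // Hypothetical helper so the graph can clear the flag between searches.
    void resetChecked() {
        this.checked = false;
    }

    // Same logic as the flag-based hasRoute in the text.
    boolean hasRoute(FlagNode to) {
        if (this.equals(to)) {
            return true;
        } else if (this.checked) {
            return false;
        } else {
            this.checked = true;
            for (FlagNode c : this.getsTo) {
                if (c.hasRoute(to)) {
                    return true;
                }
            }
            return false;
        }
    }
}

class FlagGraph {
    private final LinkedList<FlagNode> nodes = new LinkedList<FlagNode>();

    // Creates a node and registers it so that it can be reset later.
    FlagNode addNode(String cityname) {
        FlagNode n = new FlagNode(cityname);
        this.nodes.add(n);
        return n;
    }

    // Clears every flag, then searches; this re-establishes the invariant.
    boolean hasRoute(FlagNode from, FlagNode to) {
        for (FlagNode n : this.nodes) {
            n.resetChecked();
        }
        return from.hasRoute(to);
    }
}
```

Without the reset loop, a second search from manc would find manc already marked from the first search and return false immediately. The reset costs one pass over all nodes per search, which is part of the price of dropping the separate visited structure.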
Note that here, we got rid of the visited list. Last class, we argued that we needed that to make different method calls return different answers. Aren’t we back to assuming that the same method call must return different answers? Tune in later this week ...
4 Summary
Here are the key takeaways from this lecture:
To prevent infinite loops arising from performing the same computation multiple times, we have to add information that distinguishes one instance of the computation from another.
Document the property (invariant) of the additional data that should hold as the computation progresses. This is a reminder to yourself (or whoever modifies your code) that your code depends on this property for its correct execution. It also helps you think through where and how to maintain the data in the first place.
Document a careful argument as to why your computation will terminate. The argument should reference the new information and the properties of it that prevent an infinite sequence of method calls from being generated.