Terminating Programs on Graphs

Kathi Fisler

Recall that yesterday, our graph program ran into an infinite loop. We attributed the problem to the cycle between the nodes for Boston and Worcester; specifically, we noticed the following sequence of calls:

  worc.hasRoute("Manchester")
  => worc.connects.hasRouteConnects("Manchester")
  => bost.hasRoute("Manchester")
  => bost.connects.hasRouteConnects("Manchester")
  => worc.hasRoute("Manchester")
  ...

Our goal for today is to prevent the loop from happening.

Would memoization help? After all, that was all about tracking previous computations. The problem, however, is that memoization reuses previously computed answers. In this case, we went into a loop before we finished computing an answer, so there is nothing to memoize. We therefore need a different approach.

Effectively, we need a way to distinguish the first call to worc.hasRoute("Manchester") from the second. If the second call could behave differently from the first (for instance, by recognizing that Worcester has already been searched), we could avoid getting stuck in the loop. Broadly speaking, there are two ways to do this:

Add an explicit additional parameter to hasRoute that has different values on the two calls for worc.hasRoute("Manchester"). This would eliminate the repetition of identical expressions in the calling sequence.
Add some data structure or information under the hood that changes as hasRoute runs (as we did with the memoization table). If hasRoute produces different answers on different calls with the same inputs based on this information, we would eliminate the identical results in the calling sequence.

(You might have noticed that neither of these modifications guarantees termination, as we still have to argue that there will be a finite number of method calls – we’ll return to that later in the lecture.)

Regardless of which approach we take, the additional information we maintain must tell us whether we have already searched for the target city from a given node. Let’s consider maintaining this information under each high-level approach in turn.

1 Maintaining Visited Nodes through a Parameter

In this approach, we will add a parameter to hasRoute (and, by association, hasRouteConnects) to record which nodes we have already visited in our search for a route. We need to use this information to prevent a cycle in the expressions we evaluate. Let’s first demonstrate how that parameter might work on the program trace that led to the infinite loop. (To keep the trace both readable and agnostic about the list implementation, we show the list contents but not the exact syntax for building lists.)

  worc.hasRoute("Manchester", List())
  => worc.connects.hasRouteConnects("Manchester", List(worc))
  => bost.hasRoute("Manchester", List(worc))
  => bost.connects.hasRouteConnects("Manchester", List(bost,worc))
  => worc.hasRoute("Manchester", List(bost,worc)) --> false
  => prov.hasRoute("Manchester", List(bost,worc))
  => ...

Now, notice that we do NOT make the same call to hasRoute more than once. The first time we search for a route from Worcester, the list is empty and we keep searching. The second time we search for a route from Worcester, the list of previously-visited nodes contains Worcester, so we return false. This lets the search proceed to check for routes from Providence.

This example sequence motivates the implementation: when does a new Node appear in the list? On the call to hasRouteConnects just after a call to hasRoute. So our implementation should extend the visited list within hasRoute:

  boolean hasRoute(String tocity, LinkedList<Node> visited) {
    if (this.cityname.equals(tocity))
      return true;
    else {
      // record this node before searching its neighbors
      visited.add(this);
      return this.hasRouteConnects(tocity, visited);
    }
  }

[Side note: if you are new to Java, you may have noticed that we sometimes put a branch of an if/else statement in curly braces and sometimes not. If the branch contains more than one statement, you need the curly braces. If it contains only one statement, the curly braces are optional.]

We still need an edit that stops this method from processing nodes it has already visited. Our example sequence hints at where this should happen. Consider this step:

  => worc.hasRoute("Manchester", List(bost,worc)) --> false

The call to hasRoute itself returns false, so the check belongs inside hasRoute. That observation suggests the following code:

  boolean hasRoute(String tocity, LinkedList<Node> visited) {
    if (this.cityname.equals(tocity))
      return true;
    else if (visited.contains(this))
      // already searched from this node: give up on this path
      return false;
    else {
      visited.add(this);
      return this.hasRouteConnects(tocity, visited);
    }
  }

We could have reversed the order of the first two checks. That makes a bit more logical sense if we are storing the visited nodes in a way that makes the visited node check highly efficient (like a hashtable).
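As a minimal sketch of that reversed order, assuming the visited nodes are stored in a java.util.HashSet instead of a LinkedList, the visited check can come first. The Node class, its fields, and the addEdge helper below are assumptions made so the example compiles on its own; addEdge is hypothetical and not part of these notes.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

class Node {
  private final String cityname;
  private final LinkedList<Node> connects = new LinkedList<Node>();

  Node(String cityname) {
    this.cityname = cityname;
  }

  // Hypothetical helper (not from the notes): adds a one-way edge.
  void addEdge(Node to) {
    this.connects.add(to);
  }

  boolean hasRoute(String tocity) {
    return this.hasRouteVisit(tocity, new HashSet<Node>());
  }

  // Visited check first: HashSet.contains takes expected constant time,
  // whereas LinkedList.contains walks the whole list. Node does not
  // override equals/hashCode, so the set compares Node objects by identity.
  private boolean hasRouteVisit(String tocity, Set<Node> visited) {
    if (visited.contains(this))
      return false;
    else if (this.cityname.equals(tocity))
      return true;
    else {
      visited.add(this);
      return this.hasRouteConnects(tocity, visited);
    }
  }

  private boolean hasRouteConnects(String tocity, Set<Node> visited) {
    for (Node c : this.connects) {
      if (c.hasRouteVisit(tocity, visited))
        return true;
    }
    return false;
  }
}
```

The trade-off is the one mentioned above: faster membership checks in exchange for the extra memory a hash table uses compared with a list.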

Note that nothing in our example sequence suggested a modification to hasRouteConnects, so that remains as it was in the original implementation. Once again, this drives home the benefit of structuring your methods around the principle of "one method processes one datatype" (i.e., what templates do): it isolates edits, leaving you fewer tests to run when you have to modify your code.

While the current code runs (and terminates!), we really should make three modifications to turn it into good code.

We should document the role of the new visited parameter, since it is not part of the original method description.
We should document why this program will terminate, since it processes cyclic data.
We should provide a wrapper function that gives a client access to hasRoute without them having to initialize the visited parameter.

The following code shows all three modifications (explanation follows the code):

  boolean hasRoute(String tocity) {
    return this.hasRouteVisit(tocity, new LinkedList<Node>());
  }

  /**
   * INVARIANT: Node n is in visited iff previously called
   *    n.hasRouteVisit<br><br>
   *
   * TERMINATES because base case considers visited list, nodes added
   * to visited remain in visited until computation completes, and
   * there are a finite number of possible Nodes in visited.
   */
  private boolean hasRouteVisit(String tocity,
                                LinkedList<Node> visited) {
    if (this.cityname.equals(tocity))
      return true;
    else if (visited.contains(this))
      return false;
    else {
      visited.add(this);
      return this.hasRouteConnects(tocity, visited);
    }
  }

  /**
   * INVARIANT: Node n is in visited iff previously called
   *    n.hasRouteVisit
   */
  private boolean hasRouteConnects(String tocity,
                                   LinkedList<Node> visited) {
    for (Node c : this.connects) {
      if (c.hasRouteVisit(tocity, visited))
        return true;
    }
    return false;
  }

To get the wrapper, we keep the original hasRoute name and rename the version that takes the visited parameter to hasRouteVisit. Since we only want other classes to start from hasRoute, we make the other two methods private.
Technically, Java would allow us to reuse the hasRoute name for both versions (Java allows multiple methods with the same name, a practice called overloading, as long as they have different parameter lists). This works, but it makes the code harder to maintain, as someone could mistakenly call the wrong version when updating the code. Using a different name is just better software engineering, especially since we intend to make one of those versions private. Only overload names if you are trying to expose two different versions to a user of your class.
We add Javadoc comments to each of the private methods explaining the relationship between the visited list and the computation so far. This comment is labeled an invariant; an invariant on the arguments to a method relates the data in the arguments to the computation.
We add Javadoc comments to hasRouteVisit arguing why this method is guaranteed to terminate.
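To see the termination argument at work, here is a sketch of the code above wrapped in a self-contained Node class. The addEdge helper and the expansions counter are hypothetical instrumentation, not part of these notes: the counter records how many times a node has been added to a visited list. Within a single search, each node is added at most once, which is what bounds the number of calls.

```java
import java.util.LinkedList;

class Node {
  private final String cityname;
  private final LinkedList<Node> connects = new LinkedList<Node>();
  // Hypothetical instrumentation: times this node was added to a visited list.
  int expansions = 0;

  Node(String cityname) {
    this.cityname = cityname;
  }

  // Hypothetical helper (not from the notes): adds a one-way edge.
  void addEdge(Node to) {
    this.connects.add(to);
  }

  boolean hasRoute(String tocity) {
    return this.hasRouteVisit(tocity, new LinkedList<Node>());
  }

  private boolean hasRouteVisit(String tocity, LinkedList<Node> visited) {
    if (this.cityname.equals(tocity))
      return true;
    else if (visited.contains(this))
      return false;
    else {
      visited.add(this);
      this.expansions++;
      return this.hasRouteConnects(tocity, visited);
    }
  }

  private boolean hasRouteConnects(String tocity, LinkedList<Node> visited) {
    for (Node c : this.connects) {
      if (c.hasRouteVisit(tocity, visited))
        return true;
    }
    return false;
  }
}
```

For example, with edges Worcester→Boston, Boston→Worcester, Boston→Providence, and Providence→Manchester, asking worc.hasRoute("Nowhere") returns false and leaves each of the four nodes with expansions equal to 1, even though the graph contains a cycle.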

2 Maintaining Visited Nodes through an External List

An alternative to the preceding code would maintain the visited list as an external "global" list, rather than passing it as a parameter. We’ll return to this idea when we continue our discussion of mutation next week.
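One possible shape of that idea is sketched below, assuming the shared list is a static field of Node. The field placement and the addEdge helper are our own illustrative choices, not from these notes. The wrapper clears the list before each search, and hasRouteVisit reads and updates the field instead of a parameter.

```java
import java.util.LinkedList;

class Node {
  // Assumed "global" visited list, shared by every Node and every query.
  private static final LinkedList<Node> visited = new LinkedList<Node>();

  private final String cityname;
  private final LinkedList<Node> connects = new LinkedList<Node>();

  Node(String cityname) {
    this.cityname = cityname;
  }

  // Hypothetical helper (not from the notes): adds a one-way edge.
  void addEdge(Node to) {
    this.connects.add(to);
  }

  boolean hasRoute(String tocity) {
    visited.clear(); // forget nodes left over from any earlier query
    return this.hasRouteVisit(tocity);
  }

  private boolean hasRouteVisit(String tocity) {
    if (this.cityname.equals(tocity))
      return true;
    else if (visited.contains(this))
      return false;
    else {
      visited.add(this);
      return this.hasRouteConnects(tocity);
    }
  }

  private boolean hasRouteConnects(String tocity) {
    for (Node c : this.connects) {
      if (c.hasRouteVisit(tocity))
        return true;
    }
    return false;
  }
}
```

The clear() call does the job that creating a fresh list does in the parameter version. Forgetting it would let nodes from one query block the next. A single shared list also means two searches cannot safely run at the same time.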

3 Maintaining Visited Nodes by Editing the Nodes

Our current approach suffers from requiring both additional time (to search for visited nodes in a data structure) and additional space (to store the data structure). We could get more efficient lookup by using something like a hashtable, but that takes more space than a list. How might you maintain visited node information if you are on tight budgets for time AND space?

One common proposal here is to add a field to each node that stores whether or not the node has been visited:

  class Node {
    private String cityname;
    private LinkedList<Node> connects;
    private boolean visited;

    Node(String cityname) {
      this.cityname = cityname;
      this.connects = new LinkedList<Node>();
      this.visited = false;
    }
  }

Where our previous version checked and updated the visited list, this version checks and updates the visited flag on each node:

  /**
   * INVARIANT: Node n is marked visited iff previously called
   *    n.hasRouteVisit<br><br>
   *
   * TERMINATES because base case considers visited flag, nodes
   * marked as visited remain visited until computation completes,
   * and there are a finite number of possible Nodes to visit.
   */
  private boolean hasRouteVisit(String tocity) {
    if (this.cityname.equals(tocity))
      return true;
    else if (this.visited)
      return false;
    else {
      this.visited = true;
      return this.hasRouteConnects(tocity);
    }
  }

(Note that hasRouteConnects remains unchanged again. One datatype per method. Just do it.)

If you use this method, you also need to reset all the visited flags to false before calling hasRouteVisit from hasRoute. We leave that as an exercise to the reader.
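To see why the reset matters, here is a minimal sketch of the flag version that deliberately leaves it out, so the exercise stays open. It assumes hasRoute simply calls hasRouteVisit, and the addEdge helper is hypothetical. On a graph with edges Worcester→Boston, Boston→Worcester, and Boston→Providence, the first query leaves Worcester and Boston marked. A second query starting from Boston then wrongly reports that there is no route to Providence.

```java
import java.util.LinkedList;

class Node {
  private final String cityname;
  private final LinkedList<Node> connects = new LinkedList<Node>();
  private boolean visited = false;

  Node(String cityname) {
    this.cityname = cityname;
  }

  // Hypothetical helper (not from the notes): adds a one-way edge.
  void addEdge(Node to) {
    this.connects.add(to);
  }

  // Intentionally incomplete: the flags are NOT reset here, so marks left
  // by one query persist into the next (resetting them is the exercise).
  boolean hasRoute(String tocity) {
    return this.hasRouteVisit(tocity);
  }

  private boolean hasRouteVisit(String tocity) {
    if (this.cityname.equals(tocity))
      return true;
    else if (this.visited)
      return false;
    else {
      this.visited = true;
      return this.hasRouteConnects(tocity);
    }
  }

  private boolean hasRouteConnects(String tocity) {
    for (Node c : this.connects) {
      if (c.hasRouteVisit(tocity))
        return true;
    }
    return false;
  }
}
```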

4 Summary

Here are the key takeaways from this lecture:

To prevent cycles arising from performing the same computation multiple times, we have to add additional information that distinguishes the computation.
Provide a wrapper that initializes your additional information, so that a user of your code doesn’t have to know about it.
Document the property (invariant) of the additional data that should hold as the computation progresses. This is a reminder to yourself (or whoever modifies your code) that your code depends on this property for its correct execution. It also helps you think through where and how to maintain the data in the first place.
Document a careful argument as to why your computation will terminate. The argument should reference the new information and its properties that will prevent an infinite sequence of expressions from being generated.
Maintaining the coding principle that each method processes one datatype makes it easier to modify code that performs complex traversals.
