Introduction to Graphs

Kathi Fisler

We want to write programs that process connections between cities (say for airline connections, tracking distances, etc.). Our information consists of the names of several cities and which other cities have direct routes from them. For example,

There is a route from Boston to Worcester
There is a route from Boston to Providence
There is a route from Worcester to Boston
There is a route from Providence to Hartford
There is a route from Manchester to Boston

Data of this shape, in which we have a set of nodes and a set of edges that connect the nodes, is called a graph.

1 Data Structures for Graphs

What would be a reasonable data structure for our cities graph? There are three reasonable choices:

1. a list of edges, where an edge contains two strings
2. a list of cities, where a city contains its name and a list of names of cities to which there are direct routes
3. a list of cities, where a city contains its name and a list of other cities to which there are direct routes

We are going to write a program to find multi-segment routes between cities. Which option makes the most sense for this? To write this, we will need to chain together direct routes to make multi-segment ones. With options 1 or 2, each time we pick a direct route, we’ll need to search all of the edges or cities to find the struct/class corresponding to the next city. Option 3 saves this lookup, so it makes more sense for this problem.
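To make the lookup cost concrete, here is a minimal sketch of option 1, assuming a plain list of edges between city names. The names EdgeListSketch, Edge, and directRoutesFrom are hypothetical and chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Option 1 sketch: the graph is just a list of edges between city names.
// EdgeListSketch, Edge, and directRoutesFrom are hypothetical names.
class EdgeListSketch {
  static class Edge {
    final String from;
    final String to;

    Edge(String from, String to) {
      this.from = from;
      this.to = to;
    }
  }

  // The five routes from the introduction.
  static final List<Edge> EDGES = Arrays.asList(
      new Edge("Boston", "Worcester"),
      new Edge("Boston", "Providence"),
      new Edge("Worcester", "Boston"),
      new Edge("Providence", "Hartford"),
      new Edge("Manchester", "Boston"));

  // Finding the next hops from a city scans every edge in the graph.
  static List<String> directRoutesFrom(String city) {
    List<String> result = new ArrayList<String>();
    for (Edge e : EDGES) {
      if (e.from.equals(city)) {
        result.add(e.to);
      }
    }
    return result;
  }
}
```

Chaining two segments means calling directRoutesFrom once per hop, and each call walks the whole edge list. That repeated search is what option 3 avoids by storing references to the neighboring nodes directly.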

Let’s develop option 3 in Java. We need a class for nodes, each of which has the name of a city and list of other nodes representing the connections or edges of the graph:

  class NodeV1 {
    private String cityname;
    // the nodes to which this node has direct routes (the edges)
    private IList<NodeV1> connects;

    NodeV1(String cityname, IList<NodeV1> connects) {
      this.cityname = cityname;
      this.connects = connects;
    }
  }

Let’s use this definition to build our example set of cities and connections:

  NodeV1 H = new NodeV1("Hartford", new MtList<NodeV1>());
  NodeV1 W = new NodeV1("Worcester", new MtList<NodeV1>());
  NodeV1 P = new NodeV1("Providence", new Cons<NodeV1>(H, new MtList<NodeV1>()));
  NodeV1 B = new NodeV1("Boston", new Cons<NodeV1>(W, new Cons<NodeV1>(P, new MtList<NodeV1>())));
  NodeV1 M = new NodeV1("Manchester", new Cons<NodeV1>(B, new MtList<NodeV1>()));

Oops. Worcester needs a connection to Boston, which these examples don’t reflect. We could not add Boston as a connection when we made the node for Worcester, because the Boston node didn’t yet exist. Creating the Boston node before the Worcester node would not have helped either, as then Boston would not have its connection to Worcester (as it does in the example above).

In order to make the node for Worcester refer to the node for Boston, we have to be able to change the contents of the Worcester node after the Boston node is created (or vice-versa). This is outside of the style of programming we have used so far, in which we return new objects instead of changing the values inside existing objects. If you want to create cyclic data, you have no choice but to change values. Changing values within objects is called mutation.

To finish our example, we could change the contents of the Worcester node to connect to the Boston node as follows:

  W.connects = new Cons<NodeV1>(B, W.connects);

Well, not quite. This works if connects is public. Since connects is private, we need both a getter and a setter for the connects field in the NodeV1 class. With these, the correct code is:

  class NodeV1 {
    private String cityname;
    private IList<NodeV1> connects;

    NodeV1(String cityname, IList<NodeV1> connects) {
      this.cityname = cityname;
      this.connects = connects;
    }

    IList<NodeV1> getConnects() { return this.connects; }

    // mutation: replaces the list of connections of an existing node
    void setConnects(IList<NodeV1> newCon) {
      this.connects = newCon;
    }
  }

  W.setConnects(new Cons<NodeV1>(B, W.getConnects()));

Actually, as a public interface for a Node class, we would like a cleaner interface to connects that simply takes the new node and adds a connection (edge) for it:

  In NodeV1:
    // this could alternatively return the new list of connects by
    // changing the return type to IList<NodeV1> and adding the
    // statement return this.connects
    void addEdge(NodeV1 toNode) {
      this.connects = new Cons<NodeV1>(toNode, this.connects);
    }

  In Examples:
    W.addEdge(B);

Let us reiterate the key point demonstrated so far: in most languages, creating cyclic data requires mutation. In Java, you have no choice but to partly create one of the nodes at first, then modify its connections later if you want to build the graph.

What if we had used one of the other data structures for building graphs (creating a separate list of edges that lives outside the nodes, or referencing cities by name rather than by node)? In these representations, the cycles are implicit (hidden behind the names in the second option, separated from the nodes in the first). You would not need mutation to create our example graph in one of these representations.
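As an illustration of that last claim, here is a minimal sketch of the name-based representation (option 2), in which every city is complete as soon as its constructor returns. CityByName, NameGraphSketch, and find are hypothetical names introduced for this sketch only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Option 2 sketch: each city stores the NAMES of its neighbors.
// CityByName, NameGraphSketch, and find are hypothetical names.
class CityByName {
  final String name;
  final List<String> connects;  // names, not references to other cities

  CityByName(String name, String... connects) {
    this.name = name;
    this.connects = Collections.unmodifiableList(Arrays.asList(connects));
  }
}

class NameGraphSketch {
  // Every city is fully built by its constructor; no field is changed later.
  static final List<CityByName> CITIES = Arrays.asList(
      new CityByName("Boston", "Worcester", "Providence"),
      new CityByName("Worcester", "Boston"),  // the cycle lives in the names
      new CityByName("Providence", "Hartford"),
      new CityByName("Hartford"),
      new CityByName("Manchester", "Boston"));

  // The price: following an edge means searching the list for the name.
  static CityByName find(String name) {
    for (CityByName c : CITIES) {
      if (c.name.equals(name)) {
        return c;
      }
    }
    return null;  // no city with that name
  }
}
```

Boston and Worcester refer to each other only through strings, so neither object has to exist before the other is created. The trade-off is the find lookup on every hop.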

But hang on – didn’t we turn our infinite tree into a graph on Tuesday without using mutation? True indeed. Puzzle on that one a while.

2 Switching to Java Lists

Now that we know we need mutation to manage the list of connections, we might as well use Java’s LinkedLists (which do mutation under the hood). Here’s a second version of the Node class that uses a LinkedList for connects:

  import java.util.LinkedList;

  class Node {
    private String cityname;
    private LinkedList<Node> connects;

    // every node starts with no connections; edges are added later
    Node(String cityname) {
      this.cityname = cityname;
      this.connects = new LinkedList<Node>();
    }

    void addEdge(Node toNode) {
      this.connects.add(toNode);
    }
  }

  class Examples {
    Node bost = new Node("Boston");
    Node worc = new Node("Worcester");
    Node hart = new Node("Hartford");
    Node prov = new Node("Providence");
    Node manc = new Node("Manchester");

    // adds the five routes from the introduction
    void initGraph() {
      bost.addEdge(worc);
      bost.addEdge(prov);
      worc.addEdge(bost);
      prov.addEdge(hart);
      manc.addEdge(bost);
    }
  }

Since the LinkedList operators don’t return a list, there is no point trying to initialize the list contents when we create the nodes. Simply adding all the edges outside the Node constructors fits the LinkedList interface a bit better. We can’t do this directly in the Examples class, however, because Java allows only field definitions, field initializations, and method definitions to lie outside of methods. We therefore create a method (initGraph) that will finish building our graph.

Where should we call initGraph though? One idea is to call it in each test case, since the test cases will expect that the graph has been set up. But that will add all of the edges on every test case (so the graph has two copies of the edges after the second test, three copies after the third, and so on). Instead, we initialize the graph in the constructor for the Examples class, as follows:

  class Examples {
    Examples(){
      this.initGraph();
    }
    // rest of Examples class
  }
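The duplication problem can be seen in a small sketch. CountingNode and InitTwiceSketch are hypothetical, cut-down stand-ins for Node and Examples, and edgeCount is a helper added only so the effect can be observed.

```java
import java.util.LinkedList;

// Hypothetical, cut-down version of Node with just enough to count edges.
class CountingNode {
  private final String cityname;
  private final LinkedList<CountingNode> connects = new LinkedList<CountingNode>();

  CountingNode(String cityname) { this.cityname = cityname; }

  void addEdge(CountingNode toNode) { this.connects.add(toNode); }

  // Added helper (not in the notes): number of outgoing edges stored.
  int edgeCount() { return this.connects.size(); }
}

class InitTwiceSketch {
  CountingNode bost = new CountingNode("Boston");
  CountingNode worc = new CountingNode("Worcester");
  CountingNode prov = new CountingNode("Providence");

  void initGraph() {
    bost.addEdge(worc);
    bost.addEdge(prov);
    worc.addEdge(bost);
  }

  public static void main(String[] args) {
    InitTwiceSketch ex = new InitTwiceSketch();
    ex.initGraph();
    System.out.println(ex.bost.edgeCount());  // prints 2
    ex.initGraph();                           // as if a second test re-ran it
    System.out.println(ex.bost.edgeCount());  // prints 4: each edge appears twice
  }
}
```

Because addEdge mutates the existing nodes, every extra call to initGraph piles more edges onto the same objects. Calling it once from the constructor avoids that.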

3 Checking for Routes

Now we turn to traversing graphs. We start by developing a method hasRoute on Nodes that consumes a city name and determines whether there is a route from the node to a node for the named city.

For example, our current graph includes routes from Boston to Worcester, Boston to Hartford (via Providence), and Manchester to Hartford (via Boston and Providence), but does not have routes from Hartford to Providence or Providence to Boston. These suggest several test cases:

  boolean testbw (Tester t) {
    return t.checkExpect(bost.hasRoute("Worcester"), true);
  }

  boolean testhp (Tester t) {
    return t.checkExpect(hart.hasRoute("Providence"), false);
  }

  boolean testpb (Tester t) {
    return t.checkExpect(prov.hasRoute("Boston"), false);
  }

  boolean testbh (Tester t) {
    return t.checkExpect(bost.hasRoute("Hartford"), true);
  }

We should also consider the seemingly trivial case: what if we ask for a route from a city to itself? This case should also return true, as otherwise it suggests that you can’t get to where you are from where you are. We should therefore include a test case like:

  boolean testbb (Tester t) {
    return t.checkExpect(bost.hasRoute("Boston"), true);
  }

With the tests in hand, let’s write the code. hasRoute is a method on nodes. The method has to check whether the current node is for the city we are looking for; if so, it can return true. Otherwise we need to search from the nodes to which this node connects. That gets us as far as:

  boolean hasRoute(String tocity) {
    if (this.cityname.equals(tocity))
      return true;
    else
      // search the nodes in this.connects
      return ???
  }

Since this.connects is a LinkedList, you need to know how to traverse one of Java’s built-in lists (if you look at the documentation for LinkedList, there is no rest operator). We can do this using an iteration construct called a for loop. Let’s write the loop first, then explain what it says:

  boolean hasRoute(String tocity) {
    if (this.cityname.equals(tocity))
      return true;
    else
      return this.hasRouteConnects(tocity);
  }

  boolean hasRouteConnects(String tocity) {
    for (Node c : this.connects) {
      if (c.hasRoute(tocity)) {
        return true;
      }
    }
    return false;
  }

The loop, which is inside the hasRouteConnects method, says "take each Node from this.connects in turn, call it c, then do the computation in the body of the for-loop over c." Inside the body of the loop, we return true if we find a route to the desired city from c. The return false after the for-loop picks up those cases in which none of the c values yielded true (a Java method stops executing as soon as it encounters a return statement).
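For readers who want to see what the for loop does step by step, the sketch below writes the same kind of traversal twice: once with the for loop and once with an explicit java.util.Iterator, which is what the loop uses on a LinkedList. IterNode is a hypothetical stand-in for Node, and the two methods only check direct connections so that the sketch stays small and always terminates.

```java
import java.util.Iterator;
import java.util.LinkedList;

// IterNode is a hypothetical stand-in for Node, small enough to run alone.
class IterNode {
  private final String cityname;
  private final LinkedList<IterNode> connects = new LinkedList<IterNode>();

  IterNode(String cityname) { this.cityname = cityname; }

  void addEdge(IterNode toNode) { this.connects.add(toNode); }

  // for-loop version: c takes each element of connects in turn
  boolean hasDirectRoute(String tocity) {
    for (IterNode c : this.connects) {
      if (c.cityname.equals(tocity)) {
        return true;
      }
    }
    return false;
  }

  // the same traversal written with an explicit iterator
  boolean hasDirectRouteExplicit(String tocity) {
    Iterator<IterNode> it = this.connects.iterator();
    while (it.hasNext()) {
      IterNode c = it.next();
      if (c.cityname.equals(tocity)) {
        return true;
      }
    }
    return false;
  }
}
```

Both methods visit the elements in list order and stop at the first match, so they return the same answer for any input.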

Note that this pair of methods follows the template structure that we learned in 1101/1102 for processing structures containing lists: there is one method for the structure (hasRoute on Node) and one method for the list-of-structure (hasRouteConnects on this.connects).

You could have written this without the separate method for hasRouteConnects as follows:

  boolean hasRoute(String tocity) {
    if (this.cityname.equals(tocity))
      return true;
    else {
      for (Node c : this.connects) {
        if (c.hasRoute(tocity)) {
          return true;
        }
      }
      // reached only if no connected node has a route
      return false;
    }
  }

The advantage to the first version is that it separates processing of the list into its own method, which makes the code easier to edit if you change how you represent the connections (such as your own list implementation, a BST, or something else that would require the method to live in a different class). We strongly recommend the former code organization with the separate method for hasRouteConnects.

3.1 Termination

Let’s compile and run our tests. What happens? We seem to have an infinite loop somewhere (in Java, recursion that never stops eventually surfaces as a StackOverflowError). Consider the sequence of computations if we try to compute a route from Worcester to Manchester:

     worc.hasRoute("Manchester")
  => worc.hasRouteConnects("Manchester")
  => bost.hasRoute("Manchester")
  => bost.hasRouteConnects("Manchester")
  => worc.hasRoute("Manchester")
  ...

Note that we got a cycle. In the absence of mutation, functions by definition yield the same answer every time they are called on the same inputs. Once we hit the second use of worc.hasRoute("Manchester"), we will attempt to compute the same answer, which tries to compute this route a third time, and so on.
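The repetition can be observed without running forever by logging each call and cutting the search off after a fixed number of calls. TraceNode, LOG, and MAX_CALLS are hypothetical names for this sketch, and the cut-off is only a way to watch the cycle, not a fix for the termination problem.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// TraceNode is a hypothetical copy of Node whose hasRoute logs each call
// and gives up after MAX_CALLS calls. The cap is an artificial cut-off.
class TraceNode {
  static final List<String> LOG = new ArrayList<String>();
  static final int MAX_CALLS = 6;

  private final String cityname;
  private final LinkedList<TraceNode> connects = new LinkedList<TraceNode>();

  TraceNode(String cityname) { this.cityname = cityname; }

  void addEdge(TraceNode toNode) { this.connects.add(toNode); }

  boolean hasRoute(String tocity) {
    if (LOG.size() >= MAX_CALLS) {
      return false;  // artificial cut-off so the sketch stops
    }
    LOG.add(this.cityname);  // record which node this call is on
    if (this.cityname.equals(tocity)) {
      return true;
    }
    for (TraceNode c : this.connects) {
      if (c.hasRoute(tocity)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    TraceNode bost = new TraceNode("Boston");
    TraceNode worc = new TraceNode("Worcester");
    TraceNode prov = new TraceNode("Providence");
    bost.addEdge(worc);
    bost.addEdge(prov);
    worc.addEdge(bost);
    worc.hasRoute("Manchester");
    // prints [Worcester, Boston, Worcester, Boston, Worcester, Boston]
    System.out.println(LOG);
  }
}
```

The log alternates between Worcester and Boston because Worcester is the first connection out of Boston, so the search re-enters the cycle before it ever looks at Providence.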

But wait: we’ve never had this termination problem before. In the past, we’ve argued that following templates makes sure that you visit all the parts of a data structure and that you terminate. What changed? Mutation is what changed. If you create data only by following data definitions (i.e., no mutation), you cannot create cyclic data, so following the template can’t yield two identical calls to the same template function. Hence, functions terminate. Once we introduce mutation, however, we may end up calling template functions on the same data multiple times, and termination becomes an issue.

In general, functions that do not follow templates or that involve cyclic data require documentation arguing why they will terminate. Part of this argument involves adding data or code to guarantee termination. Next lecture, we’ll look at both how to achieve termination, and how to document it.
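As a preview of what "adding data to guarantee termination" can look like, here is a sketch of one common technique: remembering which nodes have already been explored. This is an assumption about one possible approach, not necessarily the one developed next lecture. VisitNode and the visited parameter are hypothetical names.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

// VisitNode is a hypothetical variant of Node. The visited-set technique
// is an assumed approach, shown only as one way to get termination.
class VisitNode {
  private final String cityname;
  private final LinkedList<VisitNode> connects = new LinkedList<VisitNode>();

  VisitNode(String cityname) { this.cityname = cityname; }

  void addEdge(VisitNode toNode) { this.connects.add(toNode); }

  boolean hasRoute(String tocity) {
    return this.hasRoute(tocity, new HashSet<VisitNode>());
  }

  // Termination: each call either returns at once or adds a new node to
  // visited. The graph has finitely many nodes, so the recursion stops.
  private boolean hasRoute(String tocity, Set<VisitNode> visited) {
    if (this.cityname.equals(tocity)) {
      return true;
    }
    if (!visited.add(this)) {
      return false;  // already explored from this node
    }
    for (VisitNode c : this.connects) {
      if (c.hasRoute(tocity, visited)) {
        return true;
      }
    }
    return false;
  }
}
```

The set compares nodes by object identity, since VisitNode does not override equals or hashCode. With this version, a search from Worcester for Manchester on the example graph visits Worcester, Boston, Providence, and Hartford once each and then returns false.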

4 Summary

What have we seen so far today?

To create data with cycles, we need to initially create some datum without references, then modify the references later to get the cycles.
Graphs are a data structure formed of nodes and edges. There are many ways to implement graphs.
We can iterate over LinkedLists using for loops.
Once data is cyclic, programs might not terminate.
