Introduction to Graphs

Kathi Fisler

We want to write programs that process travel options between cities (say for airline connections, tracking distances, etc.). Our information consists of the names of several cities and, for each city, which other cities can be reached directly from it. For example:

There is a route from Worcester to Boston
There is a route from Boston to Worcester
There is a route from Boston to Providence
There is a route from Providence to Hartford
There is a route from Manchester to Boston

Data of this shape, in which we have a set of data (cities) and connections between individual data (the direct routes), is called a graph. The individual pieces of data are called nodes. The connections between them are called edges. Here the edges are directed: a route from Manchester to Boston does not imply a route from Boston to Manchester. Graphs may be cyclic, meaning that one can follow the edges from some node back to itself. In our example, the nodes for Worcester and Boston form a cycle.

Two of the common computations over graphs include:

Is there a route in the graph from one place to another?
What are all of the places one can get to from a given starting point?

In this next series of lectures, we will look at data structures and functions for performing these computations.

1 Data Structures for Graphs

What would be a reasonable organization and data structure for our cities graph, in light of the two computations we want to support? Setting aside the computations, there are three reasonable organization options:

1. a set of edges, where an edge contains two strings
2. a set of cities, where a city contains its name and a set of names of cities to which there are direct routes
3. a set of cities, where a city contains its name and a set of other cities to which there are direct routes

Which option makes the most sense relative to our computations about routes? To write these, we will need to chain together direct routes to make multi-segment ones. With options 1 or 2, each time we pick a direct route, we’ll need to search all of the edges or cities to find the struct/class corresponding to the next city. Option 3 saves this lookup, so it makes more sense for this problem.
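To make the comparison concrete, here is a minimal sketch of what the three options might look like as Java classes. All of the names here (RepresentationOptions, Edge, NamedCity, LinkedCity, findByName) are hypothetical and chosen only for illustration; the notes develop their own Node and Graph classes below.

```java
import java.util.LinkedList;

// Hypothetical names throughout; a sketch of the three options, not the
// classes used in the rest of these notes.
class RepresentationOptions {
  // Option 1: the graph is a collection of edges, each holding two city names.
  static class Edge {
    final String from;
    final String to;
    Edge(String from, String to) { this.from = from; this.to = to; }
  }

  // Option 2: each city stores the *names* of the cities it reaches directly.
  static class NamedCity {
    final String name;
    final LinkedList<String> getsTo = new LinkedList<String>();
    NamedCity(String name) { this.name = name; }
  }

  // Option 3: each city stores references to the city objects it reaches.
  static class LinkedCity {
    final String name;
    final LinkedList<LinkedCity> getsTo = new LinkedList<LinkedCity>();
    LinkedCity(String name) { this.name = name; }
  }

  // Under option 2, following one direct route means searching the whole
  // collection for the city with the given name.
  static NamedCity findByName(LinkedList<NamedCity> cities, String name) {
    for (NamedCity c : cities) {
      if (c.name.equals(name)) {
        return c;
      }
    }
    return null; // no city with that name
  }
}
```

With LinkedCity, the next city on a route is already in hand (it is an element of getsTo), whereas NamedCity requires a call like findByName for every segment.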

1.1 Classes for Graphs and Nodes

Let’s develop option 3 in Java. We will need classes for graphs and nodes, as well as methods for adding nodes and edges to graphs. Let’s start with the Node class, and its usual constructor:

  class Node {
    private String cityname;          // name of city at this node
    private LinkedList<Node> getsTo;  // edges from this Node

    Node(String cityname, LinkedList<Node> getsTo) {
      this.cityname = cityname;
      this.getsTo = getsTo;
    }
  }

Next, we need a class and interface for graphs.

  interface IGraph {}

  class Graph implements IGraph {
    private LinkedList<Node> nodes;  // all the nodes in the graph

    Graph() {
      this.nodes = new LinkedList<Node>();
    }
  }

Why do we bother making an interface for the graph? Encapsulation. We may want to switch to a different internal data structure later, without affecting other programs built on graphs.

1.2 Building Cyclic Data

Now we want to use these classes to build our example graph over cities (the one described at the start of these notes). Observe that our graph needs nodes for Boston and Worcester, with an edge going from each to the other:

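  // assumes: import java.util.Arrays; import java.util.LinkedList;
  Node Worc = new Node("Worcester", new LinkedList<Node>());
  // add() returns a boolean, so build Boston's one-element list via asList
  Node Bost = new Node("Boston", new LinkedList<Node>(Arrays.asList(Worc)));
  ...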

Oops. Worcester needs a connection to Boston, which this code doesn’t reflect. We couldn’t add Boston as a connection when we made the node for Worcester, because the Boston node didn’t yet exist. Creating the Boston node before the Worcester node would not have helped either, as then Boston would not have its connection to Worcester (as it does in the example above).

In order to make the node for Worcester refer to the node for Boston, we have to be able to change the contents of the Worcester node after the Boston node is created (or vice-versa). This is outside of the style of programming we have used so far, in which we build all the data at once, creating new data instead of modifying existing data. If you want to create data with cycles (mutual references), you have no choice but to build it in stages and change initial data values. Changing values within objects is called mutation.

To finish our example, we could change the contents of the Worcester node to connect to the Boston node as follows:

  Node Worc = new Node("Worcester", new LinkedList<Node>());
  // add() returns a boolean, so build the list with java.util.Arrays.asList
  Node Bost = new Node("Boston", new LinkedList<Node>(Arrays.asList(Worc)));
  Worc.getsTo.add(Bost);

Right idea, but this code won’t compile: getsTo is private, so code outside the Node class cannot touch Worc.getsTo directly. Rather than make getsTo public, we add a method to Node for adding edges. While we are at it, we should add methods to our overall Graph class for adding nodes and edges as well. The final data structures are as follows:

  class Node {
    private String cityname;          // name of city at this node
    private LinkedList<Node> getsTo;  // edges from this Node

    // constructor only takes the cityname as an argument,
    //   initializing the getsTo list internally
    Node(String cityname) {
      this.cityname = cityname;
      this.getsTo = new LinkedList<Node>();
    }

    // adds an edge from this node to the given toNode
    public void addEdge(Node toNode) {
      this.getsTo.add(toNode);
    }
  }

  interface IGraph {
    // add a new node with the given string as the cityname
    Node newNode(String cityname);

    // add a directed edge from the "from" Node to the "to" Node
    void addDirectedEdge(Node from, Node to);
  }

  class Graph implements IGraph {
    private LinkedList<Node> nodes;  // all the nodes in the graph

    Graph() {
      this.nodes = new LinkedList<Node>();
    }

    // adds a new node to the graph with given string as cityname
    public Node newNode(String cityname) {
      Node newN = new Node(cityname);
      this.nodes.add(newN);
      return newN;
    }

    // adds a directed edge from the "from" node to the "to" node
    public void addDirectedEdge(Node from, Node to) {
      from.addEdge(to);
    }
  }

With these new classes in place, let’s build the objects for our example graph. Rather than build some nodes fully and some partially when we create the nodes, let’s create each node with an empty list of edges, then separately connect them. We might like to do this as follows:

  class Examples {
    Graph G = new Graph();
    Node bost = this.G.newNode("Boston");
    Node worc = this.G.newNode("Worcester");
    Node prov = this.G.newNode("Providence");

    // statements placed directly in the class body
    G.addDirectedEdge(bost,worc);
    G.addDirectedEdge(bost,prov);

    // constructor
    Examples(){}
  }

This code as written won’t work. At the top level of a class body, Java allows declarations (fields with their initializers, methods, and constructors) but not free-standing statements, so the calls to G.addDirectedEdge won’t compile. Instead, we will create a method (initGraph) to build our graph.

  class Examples {
    Graph G = new Graph();
    Node bost, worc, hart, prov, manc;

    void initGraph() {
      bost = this.G.newNode("Boston");
      worc = this.G.newNode("Worcester");
      hart = this.G.newNode("Hartford");
      prov = this.G.newNode("Providence");
      manc = this.G.newNode("Manchester");

      G.addDirectedEdge(bost,worc);
      G.addDirectedEdge(bost,prov);
      G.addDirectedEdge(worc,bost);
      G.addDirectedEdge(prov,hart);
      G.addDirectedEdge(manc,bost);
    }
  }

Where should we call initGraph, though? One idea is to call it in each test case, since the test cases expect that the graph has been set up. But every call to initGraph adds a fresh set of nodes and edges to the same Graph object (so the graph holds two copies of the cities and routes after the second test, three copies after the third, and so on). Instead, we initialize the graph once, in the constructor for the Examples class, as follows:

  class Examples {
    ...
    Examples() {
      this.initGraph();
    }
  }
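To see why calling initGraph more than once is a problem, here is a small self-contained sketch. It uses a cut-down two-city version of the classes above, wrapped in a hypothetical outer class InitTwiceDemo, and adds a nodeCount method that is not part of the notes' Graph class; it exists only so that we can observe how many nodes the graph holds.

```java
import java.util.LinkedList;

// Sketch only: InitTwiceDemo and nodeCount are assumptions for illustration.
class InitTwiceDemo {
  static class Node {
    private final String cityname;
    private final LinkedList<Node> getsTo = new LinkedList<Node>();
    Node(String cityname) { this.cityname = cityname; }
    void addEdge(Node toNode) { this.getsTo.add(toNode); }
  }

  static class Graph {
    private final LinkedList<Node> nodes = new LinkedList<Node>();

    Node newNode(String cityname) {
      Node newN = new Node(cityname);
      this.nodes.add(newN);
      return newN;
    }

    void addDirectedEdge(Node from, Node to) { from.addEdge(to); }

    // Hypothetical accessor, added only so the sketch can inspect the graph.
    int nodeCount() { return this.nodes.size(); }
  }

  static class Examples {
    Graph G = new Graph();
    Node bost, worc;

    void initGraph() {
      bost = this.G.newNode("Boston");
      worc = this.G.newNode("Worcester");
      G.addDirectedEdge(bost, worc);
      G.addDirectedEdge(worc, bost);
    }

    // The constructor runs initGraph exactly once per Examples object.
    Examples() { this.initGraph(); }
  }
}
```

After new Examples(), the graph in this sketch holds two nodes. A second call to initGraph on the same object would raise that to four, with bost and worc now pointing at the newer copies.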

2 Checking for Routes

Now we turn to our desired computations on graphs. We start by developing a method hasRoute on Graph that takes two nodes and determines whether the graph has a route from the first node to the second.

2.1 Test Cases for hasRoute

For example, our current graph includes routes from Boston to Worcester, Boston to Hartford (via Providence), and Manchester to Hartford (via Boston and Providence), but does not have routes from Hartford to Providence or Providence to Boston. These suggest several test cases:

  boolean testbw (Tester t) {
    return t.checkExpect(G.hasRoute(bost,worc), true);
  }

  boolean testhp (Tester t) {
    return t.checkExpect(G.hasRoute(hart,prov), false);
  }

  boolean testpb (Tester t) {
    return t.checkExpect(G.hasRoute(prov,bost), false);
  }

  boolean testbh (Tester t) {
    return t.checkExpect(G.hasRoute(bost,hart), true);
  }

We should also consider the seemingly trivial case: what if we ask for a route from a city to itself? This case should also return true, as otherwise it suggests that you can’t get to where you are from where you are. We should therefore include a test case like:

  boolean testbb (Tester t) {
    return t.checkExpect(G.hasRoute(bost,bost), true);
  }

2.2 Implementing hasRoute

With the tests in hand, let’s write the code. Intuitively, given the from and to nodes, our code should check whether the from and to nodes are the same. If yes, we are done. If not, we have to look for the to node in the getsTo nodes of the from node. This suggests code such as:

  boolean hasRoute(Node from, Node to) {
    if (from==to)
      return true;
    else {
      for (Node n : from.getsTo) {
        if (hasRoute(n,to))
          return true;
      }
      return false;
    }
  }

Stop and think: in which class does this code belong?

Think about encapsulation. Where is the data needed for this computation? This computation relies on getsTo, which is in the Node class. The very idea of what it means for two nodes to be the same is also a concept on the Node class. This needs to be a method in Node, with a version in the Graph class that calls the method in the Node class.

  class Graph implements IGraph {
   ...
    // determine whether graph contains a route from
    //    "from" node to "to" node
    public boolean hasRoute(Node from, Node to) {
      return from.hasRoute(to);
    }
  }

  class Node {
    // determines whether there is a route from
    //   this Node to the given node
    boolean hasRoute(Node to) {
      if (this.equals(to))
        return true;
      else {
        for (Node c : this.getsTo) {
          if (c.hasRoute(to)) {
            return true;
          }
        }
        return false;
      }
    }
  }

2.3 Testing hasRoute

Let’s compile and run our tests. If you are in DrJava, you can interact with the tests as follows:

  > Examples e = new Examples();
  > e.G.hasRoute(e.bost,e.bost)
  true
  > e.G.hasRoute(e.bost,e.worc)
  true
  > e.G.hasRoute(e.bost,e.prov)
  ...

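Oops: something goes wrong on the third example. The call never produces an answer, so we seem to have an infinite loop somewhere (in Java, unbounded recursion like this eventually ends in a StackOverflowError). Consider the sequence of computations if we try to compute a route from Boston to Providence: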

     bost.hasRoute(prov)
  => worc.hasRoute(prov) // because edge from bost to worc
  => bost.hasRoute(prov) // because edge from worc to bost
  ...

By definition, a function yields the same answer every time it is called on the same inputs. Once we hit the second use of bost.hasRoute(prov), we repeat exactly the same computation as the first time, which leads to a third use of bost.hasRoute(prov), and so on.
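We can watch this repetition happen without hanging the program. The sketch below uses the same recursion as hasRoute, but logs the city at each call and gives up after a fixed number of calls. The names TraceDemo, tracedHasRoute, BudgetExhausted, and traceBostonToProvidence are hypothetical helpers invented for this illustration, and the Node class is a stripped-down stand-in for the one in these notes.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch only: every name in this block is an assumption for illustration.
class TraceDemo {
  static class Node {
    final String cityname;
    final LinkedList<Node> getsTo = new LinkedList<Node>();
    Node(String cityname) { this.cityname = cityname; }
  }

  // Thrown when the call budget runs out.
  static class BudgetExhausted extends RuntimeException {}

  // Same logic as hasRoute, except that it records each call in log
  // and aborts once maxCalls calls have been recorded.
  static boolean tracedHasRoute(Node from, Node to, List<String> log, int maxCalls) {
    if (log.size() >= maxCalls) {
      throw new BudgetExhausted();
    }
    log.add(from.cityname);
    if (from.equals(to)) {
      return true;
    }
    for (Node c : from.getsTo) {
      if (tracedHasRoute(c, to, log, maxCalls)) {
        return true;
      }
    }
    return false;
  }

  // Builds the Boston/Worcester/Providence part of the example graph and
  // returns the cities visited by the first maxCalls calls.
  static List<String> traceBostonToProvidence(int maxCalls) {
    Node bost = new Node("Boston");
    Node worc = new Node("Worcester");
    Node prov = new Node("Providence");
    bost.getsTo.add(worc); // same edge order as in initGraph
    bost.getsTo.add(prov);
    worc.getsTo.add(bost);

    List<String> log = new ArrayList<String>();
    try {
      tracedHasRoute(bost, prov, log, maxCalls);
    } catch (BudgetExhausted e) {
      // the budget ran out before the search reached Providence
    }
    return log;
  }
}
```

With a budget of six calls, the log alternates between Boston and Worcester and never mentions Providence: the search keeps re-entering the cycle before it gets a chance to try Boston's second edge.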

But wait: we’ve never had this termination problem before. In the past, we’ve argued that following templates makes sure that you visit all the parts of a data structure and that you terminate. What changed? Mutation is what changed. If you create data only by following data definitions (i.e., with no mutation), you cannot create cyclic data, so following the template can’t yield two identical calls to the same template function. Hence, functions terminate. Once we introduce mutation, however, we may end up calling template functions on the same data multiple times, and termination becomes an issue.

In general, functions that do not follow templates or that involve cyclic data require documentation arguing why they will terminate. Part of this argument involves adding data or code to guarantee termination. In the coming days, we’ll look at both how to achieve termination, and how to document it.
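As a preview, one common way to add such data is to remember which nodes the search has already explored and to refuse to explore them again. The sketch below is an assumption about how that might look, not necessarily the approach the coming lectures will take. The outer class VisitedDemo and the helper hasRouteAvoiding are hypothetical names; Node is a cut-down copy of the class from these notes.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

// Sketch only: VisitedDemo and hasRouteAvoiding are assumptions.
class VisitedDemo {
  static class Node {
    private final String cityname;
    private final LinkedList<Node> getsTo = new LinkedList<Node>();

    Node(String cityname) { this.cityname = cityname; }

    void addEdge(Node toNode) { this.getsTo.add(toNode); }

    // determines whether there is a route from this Node to the given node
    boolean hasRoute(Node to) {
      return this.hasRouteAvoiding(to, new HashSet<Node>());
    }

    // Termination argument: each node is added to visited at most once,
    // and a node already in visited returns immediately, so the number of
    // calls that enter the loop is bounded by the number of nodes.
    private boolean hasRouteAvoiding(Node to, Set<Node> visited) {
      if (this.equals(to)) {
        return true;
      }
      if (!visited.add(this)) {
        return false; // already explored from here; don't go around again
      }
      for (Node c : this.getsTo) {
        if (c.hasRouteAvoiding(to, visited)) {
          return true;
        }
      }
      return false;
    }
  }
}
```

On the example graph, a search from Boston to Providence now tries Worcester, gets back to Boston, sees that Boston has already been explored, and moves on to Boston's second edge, which leads to Providence.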

3 Summary

What have we seen so far today?

To create data with cycles, we need to initially create some datum without references, then modify the references later to get the cycles.
Graphs are a data structure formed of nodes and edges. There are many ways to implement graphs.
Once data is cyclic, programs might not terminate.