Visitors

Kathi Fisler

In this lecture, we look at a common OO programming pattern called Visitors. Visitors are similar to passing functions as arguments, but they handle functions over class hierarchies instead of functions on single classes (as our functions-as-arguments lecture did). We’ll do this in the context of a concrete, real-world example.

1 Spreadsheet Formulas and Functions Over Them

Spreadsheets are formed of cells. Each cell can contain a concrete value or a formula that may reference other cells. References are specified by the row (labeled with numbers) and column (labeled with letters) of the desired cell. For example:

  | A     | B             | C
--+-------+---------------+---
1 | Hwk 1 | 120           |
2 | Hwk 2 | 100           |
3 | Hwk 3 | 115           |
4 | Total | =B1 + B2 + B3 |

The = in cell B4 indicates that the value of the cell is computed according to the given formula.

Let’s capture spreadsheet formulas through data structures. The following classes achieve this. The Examples class shows how to create formulas in this representation:

  interface IFormula {}

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }
  }

  class CellRef implements IFormula {
    String cellname;

    CellRef(String cellname) {
      this.cellname = cellname;
    }
  }

  class Plus implements IFormula {
    IFormula left;
    IFormula right;

    Plus(IFormula left, IFormula right) {
      this.left = left;
      this.right = right;
    }
  }

  class Examples {
    Examples(){}

    Num num2 = new Num(2);
    Num num5 = new Num(5);
    CellRef a10 = new CellRef("a10");
    Plus f1 = new Plus(num2, num5);
    Plus f2 = new Plus(a10, num5);
  }
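As a sketch of how the spreadsheet example maps onto these classes, the formula in cell B4 (=B1 + B2 + B3) can be built by nesting Plus objects. The left-nested grouping and the class name B4Example are assumptions made for illustration; the notes do not say how a three-way sum is grouped.

```java
interface IFormula {}

class Num implements IFormula {
  int value;
  Num(int value) {
    this.value = value;
  }
}

class CellRef implements IFormula {
  String cellname;
  CellRef(String cellname) {
    this.cellname = cellname;
  }
}

class Plus implements IFormula {
  IFormula left;
  IFormula right;
  Plus(IFormula left, IFormula right) {
    this.left = left;
    this.right = right;
  }
}

// Hypothetical example class (not from the notes)
class B4Example {
  // =B1 + B2 + B3, grouped as (B1 + B2) + B3 (the grouping is an assumption)
  static Plus b4() {
    return new Plus(new Plus(new CellRef("B1"), new CellRef("B2")),
                    new CellRef("B3"));
  }

  public static void main(String[] args) {
    Plus f = b4();
    CellRef last = (CellRef) f.right;
    System.out.println(last.cellname); // the right operand of the outer Plus
  }
}
```

Since Plus takes exactly two operands, any longer sum has to be expressed as a tree of Plus nodes like this one.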

For now, we are interested in three functions on spreadsheet formulas: one (noRefs) checks whether a formula is free of references to other cells; one (countNums) returns the number of Nums in a formula; one (valueOf) computes the value of a formula, assuming it has no references to other cells (a spreadsheet would have to compute values for all referenced cells before evaluating a formula). We add these to the IFormula interface as follows:

  interface IFormula {
    // is formula free of references to other cells?
    boolean noRefs();
    // compute value of formula; ASSUMES noRefs
    int valueOf() ;
    // how many Nums are in a formula?
    int countNums();
  }

Implementations of these operations are shown in the following classes:

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }

    public boolean noRefs() {
      return true;
    }

    public int valueOf() {
      return this.value;
    }

    public int countNums() {
      return 1;
    }
  }

  class CellRef implements IFormula {
    String cellname;

    CellRef(String cellname) {
      this.cellname = cellname;
    }

    public boolean noRefs() {
      return false;
    }

    public int valueOf() {
      throw new RuntimeException("Unresolved cell reference");
    }

    public int countNums() {
      return 0;
    }
  }

  class Plus implements IFormula {
    IFormula left;
    IFormula right;

    Plus(IFormula left, IFormula right) {
      this.left = left;
      this.right = right;
    }

    public boolean noRefs() {
      return this.left.noRefs() && this.right.noRefs();
    }

    public int valueOf() {
      return this.left.valueOf() + this.right.valueOf();
    }

    public int countNums() {
      return this.left.countNums() + this.right.countNums();
    }

  }

  class Examples {
    Examples(){}

    Num num2 = new Num(2);
    Num num5 = new Num(5);
    CellRef a10 = new CellRef("a10");
    Plus f1 = new Plus(num2, num5);
    Plus f2 = new Plus(a10, num5);

    boolean test1(Tester t) {
      return t.checkExpect(num2.noRefs(),true);
    }

    boolean test2(Tester t) {
      return t.checkExpect(num2.valueOf(),2);
    }

    boolean test3(Tester t) {
      return t.checkExpect(a10.noRefs(),false);
    }

    boolean test4(Tester t) {
      return t.checkExpect(f1.noRefs(), true);
    }

    boolean test5(Tester t) {
      return t.checkExpect(f2.noRefs(), false);
    }

    boolean test6(Tester t) {
      return t.checkExpect(f1.valueOf(), 7);
    }

    boolean test7(Tester t) {
      return t.checkExpect(f2.countNums(), 1);
    }

  }

Make sure you understand these classes and methods before you move on.
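Because valueOf assumes noRefs, a caller has to check that precondition itself. One possible way to do so is sketched below. SafeEval and valueOrDefault are hypothetical names that do not appear in the notes, and returning a fallback value is just one design choice among several.

```java
interface IFormula {
  boolean noRefs();
  int valueOf();
  int countNums();
}

class Num implements IFormula {
  int value;
  Num(int value) { this.value = value; }
  public boolean noRefs() { return true; }
  public int valueOf() { return this.value; }
  public int countNums() { return 1; }
}

class CellRef implements IFormula {
  String cellname;
  CellRef(String cellname) { this.cellname = cellname; }
  public boolean noRefs() { return false; }
  public int valueOf() {
    throw new RuntimeException("Unresolved cell reference");
  }
  public int countNums() { return 0; }
}

class Plus implements IFormula {
  IFormula left;
  IFormula right;
  Plus(IFormula left, IFormula right) {
    this.left = left;
    this.right = right;
  }
  public boolean noRefs() {
    return this.left.noRefs() && this.right.noRefs();
  }
  public int valueOf() {
    return this.left.valueOf() + this.right.valueOf();
  }
  public int countNums() {
    return this.left.countNums() + this.right.countNums();
  }
}

// Hypothetical helper class (not part of the notes)
class SafeEval {
  // Evaluate only when the "ASSUMES noRefs" precondition holds;
  // otherwise return the caller-supplied fallback.
  static int valueOrDefault(IFormula f, int fallback) {
    return f.noRefs() ? f.valueOf() : fallback;
  }

  public static void main(String[] args) {
    IFormula f1 = new Plus(new Num(2), new Num(5));
    IFormula f2 = new Plus(new CellRef("a10"), new Num(5));
    System.out.println(valueOrDefault(f1, -1)); // no references: evaluates
    System.out.println(valueOrDefault(f2, -1)); // has a reference: fallback
  }
}
```

Note that this check walks the formula twice, once for noRefs and once for valueOf, which is the kind of repeated traversal the next section abstracts over.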

2 Abstracting Over the Traversal

All three functions traverse the expression tree and combine the results at each node into the final answer. The computations done at each node differ considerably (valueOf and countNums return an int while noRefs returns a boolean, for starters), but the traversals themselves are the same: we visit every node, recursively process any child nodes, then combine the results from the children into the overall function result. We want to write a single abstraction function that we can customize with the computations per node in the tree.

Doing this requires addressing two issues: we have to write a traverse method over expressions that is sure to visit each part of the formula, and we have to parameterize traverse over the functions that process each type of formula. We have to address these simultaneously.

Recall that in the last lecture we saw how to pass functions as arguments: we dress them up as classes and agree on a method name to serve as an intermediary between the function-as-argument object and the function we want to call. The main difference here is that the functions we are trying to abstract over (noRefs, valueOf, and countNums) are spread across three classes rather than the one MenuList class from last lecture. But if we look at each of Num, CellRef, and Plus separately, we seem to be facing the same abstraction question as last lecture. So let’s start on a class-by-class basis and see where that gets us.

2.1 Abstracting Class by Class

For sanity’s sake, let’s start just by abstracting over the two methods that return int; we’ll extend what we do to also cover noRefs later. Start with the Num class:

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }

    public int valueOf() {
      return this.value;
    }

    public int countNums() {
      return 1;
    }
  }

We’re going to add a method called traverse that could perform the tasks of either of valueOf or countNums. Following the pattern of the last lecture, we will take an object as an input that contains the function to call. Since the functions we are abstracting over here have little in common, we’ll use a non-descriptive name, f, for the "function" argument:

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }

    public int traverse(??? f) {
      return f.process(this);
    }
  }

This is exactly the pattern from the last lecture: process is simply the method name we are using instead of shouldSelect from the last lecture.

What type do we need for f? As in the last lecture, let’s make an interface. We’ll call it IProc (Proc for "procedure"):

  interface IProc {
    int process(Num aNum);
  }

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }

    public int traverse(IProc f) {
      return f.process(this);
    }
  }

We could now create the valueOf method on the Num class as follows:

  class ValueOfNum implements IProc {
    public int process(Num aNum) {
      return aNum.value;
    }
  }
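By the same reasoning, countNums on Num could be packaged as a second IProc and passed to the same traverse method. The following is a minimal sketch; CountNumsNum and NumTraverseDemo are assumed names chosen to mirror ValueOfNum, and IFormula is left empty here because only Num is involved at this step.

```java
interface IFormula {}

interface IProc {
  int process(Num aNum);
}

class Num implements IFormula {
  int value;
  Num(int value) {
    this.value = value;
  }

  // One traversal method; the behavior comes from the IProc argument.
  public int traverse(IProc f) {
    return f.process(this);
  }
}

class ValueOfNum implements IProc {
  public int process(Num aNum) {
    return aNum.value;
  }
}

// Assumed name, mirroring ValueOfNum
class CountNumsNum implements IProc {
  public int process(Num aNum) {
    return 1; // a single Num always counts as one
  }
}

class NumTraverseDemo {
  public static void main(String[] args) {
    Num five = new Num(5);
    System.out.println(five.traverse(new ValueOfNum()));
    System.out.println(five.traverse(new CountNumsNum()));
  }
}
```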

Now let’s do the same for the Plus class. All methods in the Plus class call a function on each of the left and right formulas, then combine the results. This leads to the following code:

  class Plus implements IFormula {
    IFormula left;
    IFormula right;

    Plus(IFormula left, IFormula right) {
      this.left = left;
      this.right = right;
    }

    public int traverse(??? f) {
      return f.process(this.left.traverse(f), this.right.traverse(f));
    }
  }

What interface should we use for the type of f here? Note that this process method needs a different type from the one in our current IProc (this one takes two ints, not a Num). So perhaps we should create a new interface like the following (where "leftres" and "rightres" mean "left result" and "right result"):

  interface IProcForPlus {
    int process(int leftres, int rightres);
  }

Stop and think: what’s the downside to a separate interface?

Think back to our examples: the left value in a Plus object could be a Num. So the type of f has to be something we can pass to the traverse method on Num. Which means f also needs to be an IProc. But how do we deal with the two different types on the process method then? There are two options.

First, Java allows multiple methods with the same name as long as they have different input parameter types. So we could write

  interface IProc {
    int process(Num aNum);
    int process(int leftres, int rightres);
  }

We could also use different names for the different process methods, as follows:

  interface IProc {
    int processNum(Num aNum);
    int processPlus(int leftres, int rightres);
  }

The latter actually makes more sense, because if we later added another kind of formula (such as one for multiplication) that would have the same type, the different names would help us tell the process methods apart. So we will switch to the different process names for the rest of these notes.
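To make the naming argument concrete, here is a sketch with a hypothetical Times formula for multiplication. Times, processTimes, and TimesDemo are not part of the notes, and CellRef is omitted to keep the example short. With overloading, Plus and Times would both need a method process(int, int), and Java does not allow two methods with the same name and parameter types in one interface; distinct names avoid the clash.

```java
interface IFormula {
  int traverse(IProc f);
}

interface IProc {
  int processNum(Num aNum);
  int processPlus(int leftres, int rightres);
  // Hypothetical: same parameter types as processPlus, so only the
  // method name tells the two apart.
  int processTimes(int leftres, int rightres);
}

class Num implements IFormula {
  int value;
  Num(int value) { this.value = value; }
  public int traverse(IProc f) { return f.processNum(this); }
}

class Plus implements IFormula {
  IFormula left;
  IFormula right;
  Plus(IFormula left, IFormula right) {
    this.left = left;
    this.right = right;
  }
  public int traverse(IProc f) {
    return f.processPlus(this.left.traverse(f), this.right.traverse(f));
  }
}

// Hypothetical formula variant for multiplication
class Times implements IFormula {
  IFormula left;
  IFormula right;
  Times(IFormula left, IFormula right) {
    this.left = left;
    this.right = right;
  }
  public int traverse(IProc f) {
    return f.processTimes(this.left.traverse(f), this.right.traverse(f));
  }
}

class ValueOf implements IProc {
  public int processNum(Num aNum) { return aNum.value; }
  public int processPlus(int leftres, int rightres) { return leftres + rightres; }
  public int processTimes(int leftres, int rightres) { return leftres * rightres; }
}

class TimesDemo {
  public static void main(String[] args) {
    // (2 + 5) * 3
    IFormula f = new Times(new Plus(new Num(2), new Num(5)), new Num(3));
    System.out.println(f.traverse(new ValueOf()));
  }
}
```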

2.2 Summary So Far: Shared Traversals for Methods that Return int

If we do the same abstraction in the CellRef class, we end up with the following IFormula classes:

  interface IFormula {
    int traverse(IProc f);
  }

  interface IProc {
    int processNum(Num n);
    int processCellRef(CellRef c);
    int processPlus(int leftres, int rightres);
  }

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }

    public int traverse(IProc f) {
      return f.processNum(this);
    }
  }

  class CellRef implements IFormula {
    String cellname;

    CellRef(String cellname) {
      this.cellname = cellname;
    }

    public int traverse(IProc f) {
      return f.processCellRef(this);
    }
  }

  class Plus implements IFormula {
    IFormula left;
    IFormula right;

    Plus(IFormula left, IFormula right) {
      this.left = left;
      this.right = right;
    }

    public int traverse(IProc f) {
      return f.processPlus(this.left.traverse(f), this.right.traverse(f));
    }
  }

Note that we added a method to the IProc interface for the CellRef objects as well, and added the traverse method to the IFormula interface. The Java compiler would have complained had you forgotten either of these steps.

We would get back the original valueOf method by creating a class that implements IProc, as follows:
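The class is not reproduced at this point in the notes. A sketch consistent with the ValueOf class shown in Section 3 is given here, together with the Section 2.2 classes so that it runs on its own; ValueOfDemo is a hypothetical name.

```java
interface IFormula {
  int traverse(IProc f);
}

interface IProc {
  int processNum(Num n);
  int processCellRef(CellRef c);
  int processPlus(int leftres, int rightres);
}

class Num implements IFormula {
  int value;
  Num(int value) { this.value = value; }
  public int traverse(IProc f) { return f.processNum(this); }
}

class CellRef implements IFormula {
  String cellname;
  CellRef(String cellname) { this.cellname = cellname; }
  public int traverse(IProc f) { return f.processCellRef(this); }
}

class Plus implements IFormula {
  IFormula left;
  IFormula right;
  Plus(IFormula left, IFormula right) {
    this.left = left;
    this.right = right;
  }
  public int traverse(IProc f) {
    return f.processPlus(this.left.traverse(f), this.right.traverse(f));
  }
}

// One method per formula class, each holding the body of the
// original valueOf method from that class.
class ValueOf implements IProc {
  public int processNum(Num n) {
    return n.value;
  }

  public int processCellRef(CellRef c) {
    throw new RuntimeException("Unresolved cell reference");
  }

  public int processPlus(int leftres, int rightres) {
    return leftres + rightres;
  }
}

class ValueOfDemo {
  public static void main(String[] args) {
    Plus f1 = new Plus(new Num(2), new Num(5));
    System.out.println(f1.traverse(new ValueOf()));
  }
}
```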

We would then use this class in place of the original valueOf method in the test cases. So where the original test cases might have said f1.valueOf(), the corresponding new code says

  f1.traverse(new ValueOf());

The takeaway here is that abstracting over methods that span classes isn’t too much harder than abstracting over methods within a single class: the interface for the function-as-object parameter simply needs one method for each class that contains the function to abstract over. If you look at this problem class-by-class rather than all at once, it is a gentle step over the previous lecture.
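The other int-returning function, countNums, fits the same mold. The sketch below assumes the IFormula and IProc definitions from Section 2.2; CountNums and CountNumsDemo are assumed names that do not appear in the notes.

```java
interface IFormula {
  int traverse(IProc f);
}

interface IProc {
  int processNum(Num n);
  int processCellRef(CellRef c);
  int processPlus(int leftres, int rightres);
}

class Num implements IFormula {
  int value;
  Num(int value) { this.value = value; }
  public int traverse(IProc f) { return f.processNum(this); }
}

class CellRef implements IFormula {
  String cellname;
  CellRef(String cellname) { this.cellname = cellname; }
  public int traverse(IProc f) { return f.processCellRef(this); }
}

class Plus implements IFormula {
  IFormula left;
  IFormula right;
  Plus(IFormula left, IFormula right) {
    this.left = left;
    this.right = right;
  }
  public int traverse(IProc f) {
    return f.processPlus(this.left.traverse(f), this.right.traverse(f));
  }
}

// Assumed name: the countNums bodies from the three classes, gathered
// into a single function object.
class CountNums implements IProc {
  public int processNum(Num n) {
    return 1;
  }

  public int processCellRef(CellRef c) {
    return 0;
  }

  public int processPlus(int leftres, int rightres) {
    return leftres + rightres;
  }
}

class CountNumsDemo {
  public static void main(String[] args) {
    Plus f2 = new Plus(new CellRef("a10"), new Num(5));
    System.out.println(f2.traverse(new CountNums()));
  }
}
```

No formula class changes between valueOf and countNums here; only the object passed to traverse differs.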

3 Allowing Different Return Types

But we’re not quite done. Remember that we still have noRefs, which seems to fit traverse except for the output types (traverse returns int, whereas noRefs returns boolean). How do we handle code that is similar aside from types? Generics.

All we need to do to finish is add some generics that allow traverse to return types other than int.

Let’s look at the ValueOf class that implements IProc and what we would like to write as the corresponding class for noRefs:

  class ValueOf implements IProc {
    public int processNum(Num n) {
      return n.value;
    }

    public int processCellRef(CellRef c) {
      throw new RuntimeException("Unresolved cell reference");
    }

    public int processPlus(int leftres, int rightres) {
      return leftres + rightres;
    }
  }

  class NoRefs implements IProc {
    public Boolean processNum(Num n) {
      return true;
    }

    public Boolean processCellRef(CellRef c) {
      return false;
    }

    public Boolean processPlus(Boolean leftres, Boolean rightres) {
      return leftres && rightres;
    }
  }

See how similar these are aside from the types? We just need to let the process methods, which get their types from IProc, work with either int or Boolean. So let’s apply generics to the IProc interface in a way that captures all the differences in the process methods. We’re using R as the type name here to stand for "Return", since we are parameterizing over the return type of the abstracted methods:

  interface IProc<R> {
    R processNum(Num n);
    R processCellRef(CellRef c);
    R processPlus(R leftres, R rightres);
  }

Now that IProc takes a generic argument, we have to supply the generic value whenever we use IProc. This gives us:

  class ValueOf implements IProc<Integer> {
    public Integer processNum(Num n) {
      return n.value;
    }

    public Integer processCellRef(CellRef c) {
      throw new RuntimeException("Unresolved cell reference");
    }

    public Integer processPlus(Integer leftres, Integer rightres) {
      return leftres + rightres;
    }
  }

  class NoRefs implements IProc<Boolean> {
    public Boolean processNum(Num n) {
      return true;
    }

    public Boolean processCellRef(CellRef c) {
      return false;
    }

    public Boolean processPlus(Boolean leftres, Boolean rightres) {
      return leftres && rightres;
    }
  }

The one other place we reference IProc is in the method headers for the traverse methods in each of the Num, CellRef, and Plus classes (as well as the IFormula interface). Here’s how to add the generic to Num (the others are similar):

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }

    public <R> R traverse(IProc<R> f) {
      return f.processNum(this);
    }
  }

The extra <R> at the start of the method header declares R as a type parameter of the traverse method itself; Java infers its value from the type of the IProc argument at each call.

See the full code file for the version with these edits to all of the classes.
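The full code file is not included here. As a sketch of what applying the same edit to every class might look like, assuming nothing beyond the definitions above (GenericDemo is a hypothetical name):

```java
interface IProc<R> {
  R processNum(Num n);
  R processCellRef(CellRef c);
  R processPlus(R leftres, R rightres);
}

interface IFormula {
  <R> R traverse(IProc<R> f);
}

class Num implements IFormula {
  int value;
  Num(int value) { this.value = value; }
  public <R> R traverse(IProc<R> f) { return f.processNum(this); }
}

class CellRef implements IFormula {
  String cellname;
  CellRef(String cellname) { this.cellname = cellname; }
  public <R> R traverse(IProc<R> f) { return f.processCellRef(this); }
}

class Plus implements IFormula {
  IFormula left;
  IFormula right;
  Plus(IFormula left, IFormula right) {
    this.left = left;
    this.right = right;
  }
  public <R> R traverse(IProc<R> f) {
    return f.processPlus(this.left.traverse(f), this.right.traverse(f));
  }
}

class ValueOf implements IProc<Integer> {
  public Integer processNum(Num n) { return n.value; }
  public Integer processCellRef(CellRef c) {
    throw new RuntimeException("Unresolved cell reference");
  }
  public Integer processPlus(Integer leftres, Integer rightres) {
    return leftres + rightres;
  }
}

class NoRefs implements IProc<Boolean> {
  public Boolean processNum(Num n) { return true; }
  public Boolean processCellRef(CellRef c) { return false; }
  public Boolean processPlus(Boolean leftres, Boolean rightres) {
    return leftres && rightres;
  }
}

class GenericDemo {
  public static void main(String[] args) {
    IFormula f1 = new Plus(new Num(2), new Num(5));
    IFormula f2 = new Plus(new CellRef("a10"), new Num(5));
    int value = f1.traverse(new ValueOf());     // R is inferred as Integer
    boolean clean = f2.traverse(new NoRefs());  // R is inferred as Boolean
    System.out.println(value + " " + clean);
  }
}
```

The same traverse method now serves both return types, and the call sites need no explicit type arguments.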

4 Summary

This example makes the separation of traversal from processing clearer than our categorized-menu example. Both are examples of visitors though, in that the core code visits all the specific classes containing data, and the function-objects indicate how to process that data. Hopefully, the combination of the two gives you a clearer picture of how visitors work.

That may still leave you wondering why anyone would write code using visitors though. The reasoning is clearer on this example than on the categorized-menu one. Traversing data structures is a core operation. We traverse trees, for example, every time we check or restore an invariant, or do some other computation over trees. Visitors allow us to write the traversal once. This can be helpful for two reasons:

Providing code libraries. There isn’t much point providing a tree or list library without providing methods for traversing them. Visitors are how you customize traversals to different methods.
Avoiding traversal errors. It can be easy to forget to traverse part of a complex data structure (how often did you lose a recursive case in 1101/2 if you forgot to use the template?) Abstracting over the traversal simply reduces the chance that you’ll make a traversal error in your code.
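As an illustration of customizing a traversal that is already written, the sketch below adds a new operation without touching Num, CellRef, or Plus. It assumes the generic IProc and IFormula from Section 3; CellNames and CellNamesDemo are hypothetical names that do not appear in the notes.

```java
import java.util.ArrayList;
import java.util.List;

interface IProc<R> {
  R processNum(Num n);
  R processCellRef(CellRef c);
  R processPlus(R leftres, R rightres);
}

interface IFormula {
  <R> R traverse(IProc<R> f);
}

class Num implements IFormula {
  int value;
  Num(int value) { this.value = value; }
  public <R> R traverse(IProc<R> f) { return f.processNum(this); }
}

class CellRef implements IFormula {
  String cellname;
  CellRef(String cellname) { this.cellname = cellname; }
  public <R> R traverse(IProc<R> f) { return f.processCellRef(this); }
}

class Plus implements IFormula {
  IFormula left;
  IFormula right;
  Plus(IFormula left, IFormula right) {
    this.left = left;
    this.right = right;
  }
  public <R> R traverse(IProc<R> f) {
    return f.processPlus(this.left.traverse(f), this.right.traverse(f));
  }
}

// Hypothetical visitor: collects the names of all referenced cells,
// in left-to-right order.
class CellNames implements IProc<List<String>> {
  public List<String> processNum(Num n) {
    return new ArrayList<>();
  }

  public List<String> processCellRef(CellRef c) {
    List<String> names = new ArrayList<>();
    names.add(c.cellname);
    return names;
  }

  public List<String> processPlus(List<String> leftres, List<String> rightres) {
    List<String> all = new ArrayList<>(leftres);
    all.addAll(rightres);
    return all;
  }
}

class CellNamesDemo {
  public static void main(String[] args) {
    IFormula f = new Plus(new Plus(new CellRef("B1"), new Num(5)),
                          new CellRef("B2"));
    System.out.println(f.traverse(new CellNames()));
  }
}
```

The recursion over left and right lives only in Plus.traverse, so a new visitor like this one cannot accidentally skip a branch.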

4.1 An Interesting Observation

We started this course helping you transition from Racket to Java. Had we done this example in Racket, you’d have had a valueOf function like

  (define (valueOf aformula)
    (cond [(num? aformula) (num-value aformula)]
          [(cellRef? aformula) (throw ...)]
          [(plus? aformula) (+ (valueOf (plus-left aformula))
                               (valueOf (plus-right aformula)))]))

In the first week of this course, we eliminated the cond and put the answers from the cond clauses in the individual classes for Num, CellRef, and Plus. If we look at valueOf, however, we seem to be back to almost the Racket-code structure: we’ve replaced the cond with processNum, processCellRef, and processPlus, but otherwise all the code for the valueOf method is back in the same class.

This makes sense if you remember that we are passing functions as objects in Java. The function needs code to process each variant of formula. We’ve bundled those variants into a common class. But is this still OO, if the code for the variants isn’t in the individual classes? Very much so. The Visitor pattern is a core OO programming pattern, designed precisely for cases when you want to pass functions as arguments (for the reasons described over the last two lectures).
