1 Starting Out: The Initial BST Classes and Interface
2 Implementing remElt with BSTs
3 Turning the ALL-CAPS Questions into Code
3.1 Step 1: Reorganize the if statements
3.2 Step 2: Replace if on the LEFT child with a method
3.3 Step 3: Repeat process to remove if on right child
4 Other Notables in the Java BST Implementation

Implementing Binary Search Trees in Java

Kathi Fisler

Having covered how the binary search tree (BST) operations work conceptually, we turn to implementing actual Java classes for BSTs. Our examples illustrated that any operation that modifies the data structure (here, addElt and remElt) must maintain the invariant. Operations that merely inspect the data structure (such as hasElt) are free to exploit the invariant, though the invariant does not necessarily affect all operations (such as size).

Since BSTs are binary trees, we start with the code for binary trees (similar to what we wrote for family trees). From there, the implementations of size, hasElt, and addElt are straightforward. The implementation of remElt, however, has some interesting implications in Java, so the rest of these notes focus on remElt. The final BST implementation in Java shows the details of all four operations.

1 Starting Out: The Initial BST Classes and Interface

So we can agree on names, here are the initial (no methods) interface and classes for BSTs. Note there is nothing interesting here – these are the same as what we did for family trees, just with different names for the classes and interface.

  interface IBST {}

  class MtBST implements IBST {
    MtBST() {}
  }

  class DataBST implements IBST {
    int data;
    IBST left;
    IBST right;

    DataBST(int data, IBST left, IBST right) {
      this.data = data;
      this.left = left;
      this.right = right;
    }
  }
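As a rough illustration of why size, hasElt, and addElt are described as straightforward, here is a minimal sketch of how they might look on these classes. The method signatures, and the choice to have addElt return a new tree rather than modify the existing one, are assumptions made for this sketch; the final implementation may differ in its details.

```java
// Sketch only: signatures are assumed, not taken from the final implementation.
interface IBST {
  int size();               // number of elements in the set
  boolean hasElt(int elt);  // is elt in the set?
  IBST addElt(int elt);     // the set with elt added
}

class MtBST implements IBST {
  MtBST() {}

  public int size() { return 0; }

  public boolean hasElt(int elt) { return false; }

  // adding to an empty tree yields a one-node tree
  public IBST addElt(int elt) {
    return new DataBST(elt, new MtBST(), new MtBST());
  }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  // size is unaffected by the invariant: it visits every node
  public int size() {
    return 1 + this.left.size() + this.right.size();
  }

  // hasElt exploits the invariant: it searches only one subtree
  public boolean hasElt(int elt) {
    if (elt == this.data) {
      return true;
    } else if (elt < this.data) {
      return this.left.hasElt(elt);
    } else {
      return this.right.hasElt(elt);
    }
  }

  // addElt maintains the invariant: it inserts on the correct side
  public IBST addElt(int elt) {
    if (elt == this.data) {
      return this; // already present; sets have no duplicates
    } else if (elt < this.data) {
      return new DataBST(this.data, this.left.addElt(elt), this.right);
    } else {
      return new DataBST(this.data, this.left, this.right.addElt(elt));
    }
  }
}
```

For example, new MtBST().addElt(5).addElt(3).addElt(8) builds a three-element tree with 5 at the root, 3 on the left, and 8 on the right.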

2 Implementing remElt with BSTs

In the MtBST case, remElt will return the MtBST (since a set minus an element that isn’t there is the set itself). Since this case is straightforward, we focus instead on the DataBST.

First, let’s turn the general description of the remElt algorithm into Java code. For simplicity as we look at the subtleties of Java, we will always grab the largest element in the left child when we need to remove the root of a tree with two populated subtrees. The parts that raise interesting Java points are written in all capital letters between angle brackets (these are not valid Java code).

  public IBST remElt (int elt) {
    if (elt == this.data) {
      if <BOTH CHILDREN ARE MtBSTs> {
        return new MtBST();
      } else if <LEFT IS AN MtBST> {
        return this.right;
      } else if <RIGHT IS AN MtBST> {
        return this.left;
      } else { // both children are DataBSTs
        return new DataBST(this.left.largestElt(),
                           this.left.remElt(this.left.largestElt()),
                           this.right);
      }
    } else if (elt < this.data) {
      return new DataBST(this.data,
                         this.left.remElt(elt),
                         this.right);
    } else { // elt > this.data
      return new DataBST(this.data,
                         this.left,
                         this.right.remElt(elt));
    }
  }

Before you go on: make sure you see that this code skeleton would implement the BST remElt algorithm. You should be able to articulate why this code preserves the BST invariant.
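One way to make the invariant argument concrete is to write the invariant down as a check. The sketch below is only an aid for experimenting: isBSTWithin and the InvariantExample class are hypothetical names introduced here, not part of the course code. It checks that every element in a tree lies strictly between two bounds and that each node's left subtree holds smaller elements and its right subtree larger ones.

```java
// Sketch only: isBSTWithin and InvariantExample are hypothetical helpers.
interface IBST {
  // true if every element is strictly between lo and hi
  // and the BST invariant holds at every node
  boolean isBSTWithin(long lo, long hi);
}

class MtBST implements IBST {
  MtBST() {}

  // an empty tree satisfies the invariant trivially
  public boolean isBSTWithin(long lo, long hi) { return true; }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  // the root must fit the bounds; it then tightens the upper bound
  // for the left subtree and the lower bound for the right subtree
  public boolean isBSTWithin(long lo, long hi) {
    return lo < this.data && this.data < hi
        && this.left.isBSTWithin(lo, this.data)
        && this.right.isBSTWithin(this.data, hi);
  }
}

class InvariantExample {
  static IBST leaf(int n) {
    return new DataBST(n, new MtBST(), new MtBST());
  }

  // 5 at the root, {2, 3, 4} on the left, {8} on the right
  static IBST before() {
    return new DataBST(5, new DataBST(3, leaf(2), leaf(4)), leaf(8));
  }

  // removing 5 by promoting 4, the largest element on the left
  static IBST after() {
    return new DataBST(4, new DataBST(3, leaf(2), new MtBST()), leaf(8));
  }

  // promoting 2 instead leaves 3 and 4 to the left of a smaller root
  static IBST broken() {
    return new DataBST(2, new DataBST(3, new MtBST(), leaf(4)), leaf(8));
  }
}
```

Calling isBSTWithin(Long.MIN_VALUE, Long.MAX_VALUE) accepts before() and after() and rejects broken(). This is why the skeleton promotes the largest element of the left subtree: that element is larger than everything remaining on the left and smaller than everything on the right.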

3 Turning the ALL-CAPS Questions into Code

Now we need to capture the all-caps test questions in Java. To write these tests, we need a way to determine whether each child tree is an MtBST or a DataBST. Understanding how to do this properly is the point of this section of the presentation.

If you have had Java before, you may have been taught that you can check whether an object was created from a given class using an operator called instanceof. We have, however, discussed that good OO programs should not check the type of objects explicitly. Recall that one of the key points of OO languages is that they handle finding the right method based on the type of an object automatically (this is called dispatch). So we need a way to ask the all-caps questions without explicitly asking for the types of the left and right subtrees.
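For contrast only, here is a sketch of what the skeleton might look like if each all-caps question were filled in with instanceof. This is the style the rest of these notes work to avoid. The behavior of largestElt on an empty tree (throwing an exception) is an assumption made for this sketch, since the notes do not specify it.

```java
// Contrast sketch: explicit type checks, which these notes argue against.
interface IBST {
  IBST remElt(int elt);
  int largestElt();
}

class MtBST implements IBST {
  MtBST() {}

  // removing from the empty set yields the empty set
  public IBST remElt(int elt) { return this; }

  // Assumption: an empty tree has no largest element, so we throw
  public int largestElt() {
    throw new IllegalStateException("empty tree has no largest element");
  }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  // the largest element is the rightmost one
  public int largestElt() {
    if (this.right instanceof MtBST) {
      return this.data;
    } else {
      return this.right.largestElt();
    }
  }

  public IBST remElt(int elt) {
    if (elt == this.data) {
      if (this.left instanceof MtBST && this.right instanceof MtBST) {
        return new MtBST();
      } else if (this.left instanceof MtBST) {
        return this.right;
      } else if (this.right instanceof MtBST) {
        return this.left;
      } else { // both children are DataBSTs
        return new DataBST(this.left.largestElt(),
                           this.left.remElt(this.left.largestElt()),
                           this.right);
      }
    } else if (elt < this.data) {
      return new DataBST(this.data, this.left.remElt(elt), this.right);
    } else { // elt > this.data
      return new DataBST(this.data, this.left, this.right.remElt(elt));
    }
  }
}
```

This version works, but every type question is spelled out by hand inside DataBST. The steps that follow move each of those questions into a method call so that Java's dispatch answers it instead.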

3.1 Step 1: Reorganize the if statements

Recall that calling a method on an object calls the version of that method stored in the object. We have different classes for MtBST and DataBST. So if we can break our remElt code into fragments to run on different classes, Java will handle the if statements automatically. To help with that, let’s reorganize the conditional tests in the elt==this.data case around the types of the children. We start with a conditional based on the type of the left child:

  if <LEFT IS AN MtBST> {
    if <RIGHT IS AN MtBST> {
      return new MtBST();
    } else {
      return this.right;
    }
  } else { // left is a DataBST
    if <RIGHT IS AN MtBST> {
      return this.left;
    } else { // both are DataBSTs
      ...
    }
  }

Convince yourself that this version is indeed equivalent to the first version we sketched out. Now, note that in the case that the left is an MtBST, we return the right child in either case. So we can further simplify this to:

  if <LEFT IS AN MtBST> {
    return this.right;
  } else { // left is a DataBST
    if <RIGHT IS AN MtBST> {
      return this.left;
    } else { // both are DataBSTs
      ...
    }
  }

Putting this reorganized code back into the original remElt method yields the following:

  public IBST remElt (int elt) {
    if (elt == this.data) {
      if <LEFT IS AN MtBST> {
        return this.right;
      } else { // <LEFT IS A DATABST>
        if <RIGHT IS AN MtBST> {
          return this.left;
        } else { // both children are DataBSTs
          return new DataBST(this.left.largestElt(),
                             this.left.remElt(this.left.largestElt()),
                             this.right);
        }
      }
    } else if (elt < this.data) {
      return new DataBST(this.data,
                         this.left.remElt(elt),
                         this.right);
    } else { // elt > this.data
      return new DataBST(this.data,
                         this.left,
                         this.right.remElt(elt));
    }
  }

Reminder: all we have done up to here is move around if statements. We have not done anything interesting with Java.

Make sure you are comfortable with this code implementing remElt before you move on.

3.2 Step 2: Replace if on the LEFT child with a method

Next, we turn the if statement on the left child into a method: the answer for the if will be the body of the method in the MtBST class, and the answer for the else will be the body of the method in the DataBST class (just as we did when writing methods on animals in week 1). We need to pick a name for this new method. Let’s call it remParent because the goal is to remove the parent of the left node. When we are done, the new remElt code will simply call remParent on the left child and return the result, as shown below (the ... is because we haven’t yet discussed whether remParent needs any parameters):

  public IBST remElt (int elt) {
    if (elt == this.data) {
      return this.left.remParent(...);
    } else if (elt < this.data) {
      return new DataBST(this.data,
                         this.left.remElt(elt),
                         this.right);
    } else { // elt > this.data
      return new DataBST(this.data,
                         this.left,
                         this.right.remElt(elt));
    }
  }

Let’s add remParent to the MtBST class. The body should be the code that was in the if case. That suggests the following (again, ignoring parameters to remParent for now):

  class MtBST implements IBST {
    ...
    IBST remParent(...) {
      return this.right;
    }
  }

In principle, this is the right idea. But there is a problem.

Inside the original remElt code, this refers to the node whose data we are trying to remove. this.left and this.right refer to its children. When the left child is empty, we want to return the right child. (Draw yourself an example tree to follow along here.)

But notice that we call remParent on the left child. So any references to this inside remParent now refer to the left child of the node to eliminate, not the node to eliminate. So in our initial version of remParent in the MtBST class, this.right is incorrect: it refers to a grandchild of the node we want to delete, not its right child. In fact, MtBST has no right field at all, so that version would not even compile.

When you move code into new methods, you must pass any objects that the code was using as parameters (other than the object that the method will be called on, which will just be this in the new method). The if statement that we are turning into remParent references the right sibling of the left child. So we will pass that sibling as a parameter, and replace any references to this.right from the original code with a reference to the parameter.

  public IBST remElt (int elt) {
    if (elt == this.data) {
      return this.left.remParent(this.right);
    } else if (elt < this.data) {
      return new DataBST(this.data,
                         this.left.remElt(elt),
                         this.right);
    } else { // elt > this.data
      return new DataBST(this.data,
                         this.left,
                         this.right.remElt(elt));
    }
  }

With the parameter in place, remParent in the MtBST class simply returns the right sibling that was passed in:

  class MtBST implements IBST {
    ...
    IBST remParent(IBST rightSibling) {
      return rightSibling;
    }
  }

Now we create remParent in the DataBST class, using the code that was in the else // <LEFT IS A DATABST> portion. The process is similar: copy the code, replace uses of this.right with rightSibling, and replace uses of this.left with this. The resulting code looks like:

  class DataBST implements IBST {
    ...
    IBST remParent(IBST rightSibling) {
      if <RIGHT IS AN MtBST> {
        return this;
      } else { // both children are DataBSTs
        return new DataBST(this.largestElt(),
                           this.remElt(this.largestElt()),
                           rightSibling);
      }
    }
  }

3.3 Step 3: Repeat process to remove if on right child

We still have a test on the type of the right subtree in remParent in the DataBST class, but we can use the same technique: introduce a new method name corresponding to the if statement, then divide the code across the classes to match each of the if and else clauses. Let’s call the new method mergeToRemoveParent. It will be called on the right sibling, taking the left sibling as an argument:

  // goes into the MtBST class
  // "this" is the right sibling; leftSibling is a DataBST
  IBST mergeToRemoveParent(IBST leftSibling) {
    return leftSibling;
  }

  // goes into the DataBST class
  // "this" is the right sibling; leftSibling is a DataBST
  IBST mergeToRemoveParent(IBST leftSibling) {
    // this is where we choose largest-in-left or smallest-in-right,
    //   branching accordingly.  Only showing largest-in-left here
    int newRoot = leftSibling.largestElt();
    return new DataBST(newRoot,
                       leftSibling.remElt(newRoot),
                       this);
  }

In the implementation of mergeToRemoveParent for the DataBST class, we have filled in the ellipses that we have carried through the example, refining the terms to match the variable names in the method.

To make this code compile, we also need to add both remParent and mergeToRemoveParent to the IBST interface, and accordingly mark all implementations of these methods as public. The full solution shows all of these details.
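As a rough, self-contained sketch of how the pieces from this section might fit together (this is not the full solution itself), the code below puts remParent and mergeToRemoveParent into the interface and has remParent in DataBST hand the remaining question to the right sibling. Two details are assumptions introduced for this sketch: largestElt on an empty tree throws an exception, and largestOr is a hypothetical helper that lets largestElt avoid a type check of its own.

```java
// Sketch only: largestOr and the exception in MtBST.largestElt are assumptions.
interface IBST {
  IBST remElt(int elt);
  int largestElt();
  int largestOr(int candidate);               // hypothetical helper
  IBST remParent(IBST rightSibling);          // called on the left child
  IBST mergeToRemoveParent(IBST leftSibling); // called on the right child
}

class MtBST implements IBST {
  MtBST() {}

  public IBST remElt(int elt) { return this; }

  public int largestElt() {
    throw new IllegalStateException("empty tree has no largest element");
  }

  // nothing here is larger, so the candidate stands
  public int largestOr(int candidate) { return candidate; }

  // left child is empty: the right sibling replaces the parent
  public IBST remParent(IBST rightSibling) { return rightSibling; }

  // right child is empty: the left sibling replaces the parent
  public IBST mergeToRemoveParent(IBST leftSibling) { return leftSibling; }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  public IBST remElt(int elt) {
    if (elt == this.data) {
      return this.left.remParent(this.right);
    } else if (elt < this.data) {
      return new DataBST(this.data, this.left.remElt(elt), this.right);
    } else { // elt > this.data
      return new DataBST(this.data, this.left, this.right.remElt(elt));
    }
  }

  // the largest element is this.data unless the right subtree has one
  public int largestElt() { return this.right.largestOr(this.data); }

  // a non-empty tree always beats the candidate passed in from above
  public int largestOr(int candidate) { return this.largestElt(); }

  // "this" is a non-empty left child; let the right sibling decide
  public IBST remParent(IBST rightSibling) {
    return rightSibling.mergeToRemoveParent(this);
  }

  // both siblings are non-empty: promote the largest element on the left
  public IBST mergeToRemoveParent(IBST leftSibling) {
    int newRoot = leftSibling.largestElt();
    return new DataBST(newRoot, leftSibling.remElt(newRoot), this);
  }
}
```

For a tree with 5 at the root, 3 (with children 2 and 4) on the left, and 8 on the right, remElt(5) calls remParent on the 3 node, which calls mergeToRemoveParent on the 8 node; the result has 4 at the root, with 3 (and its child 2) on the left and 8 on the right. No method in the sketch asks about the type of a subtree.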

NOTE: There are other approaches you might take to working around the type checks in this code. If you want to explore them, work on the advanced option in lab 3.

4 Other Notables in the Java BST Implementation

Several other details are embedded in the full BST implementation. In particular: