Implementing Binary Search Trees in Java
Having covered how the binary search tree (BST) operations work conceptually, we turn to implementing actual Java classes for BSTs. Our examples illustrated that any operation that modifies the data structure (here, addElt and remElt) must maintain the invariant. Operations that merely inspect the data structure (such as hasElt) are free to exploit the invariant, though the invariant does not necessarily affect all operations (such as size).
Since BSTs are binary trees, we start with the code for binary trees (similar to what we wrote for family trees). From there, the implementations of size, hasElt, and addElt are straightforward. The implementation of remElt, however, has some interesting implications in Java, so the rest of these notes will focus on remElt. The final BST implementation in Java shows the details of all four operations.
1 Starting Out: The Initial BST Classes and Interface
So we can agree on names, here are the initial (no methods) interface and classes for BSTs. Note there is nothing interesting here – these are the same as what we did for family trees, just with different names for the classes and interface.
interface IBST {}

class MtBST implements IBST {
  MtBST() {}
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }
}
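As a minimal sketch of the three straightforward operations, the classes above could be extended as shown here. The method bodies are assumptions inferred from the conceptual description (a set without duplicates, where each operation returns a tree rather than mutating one); the full implementation may differ in its details.

```java
// Sketch only: these bodies are assumed, not taken from the full solution.
interface IBST {
  int size();
  boolean hasElt(int elt);
  IBST addElt(int elt);
}

class MtBST implements IBST {
  MtBST() {}

  // An empty tree holds no elements.
  public int size() { return 0; }

  public boolean hasElt(int elt) { return false; }

  // Adding to an empty tree yields a single-node tree.
  public IBST addElt(int elt) {
    return new DataBST(elt, new MtBST(), new MtBST());
  }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  // size does not need the invariant: it visits every node.
  public int size() {
    return 1 + this.left.size() + this.right.size();
  }

  // hasElt exploits the invariant: it searches only one subtree.
  public boolean hasElt(int elt) {
    if (elt == this.data) {
      return true;
    } else if (elt < this.data) {
      return this.left.hasElt(elt);
    } else {
      return this.right.hasElt(elt);
    }
  }

  // addElt maintains the invariant and ignores duplicates.
  public IBST addElt(int elt) {
    if (elt == this.data) {
      return this;
    } else if (elt < this.data) {
      return new DataBST(this.data, this.left.addElt(elt), this.right);
    } else {
      return new DataBST(this.data, this.left, this.right.addElt(elt));
    }
  }
}
```

Note how the MtBST and DataBST versions of each method never ask what kind of tree they are; Java picks the right version when the method is called.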
2 Implementing remElt with BSTs
In the MtBST case, remElt will return the MtBST (since a set minus an element that isn’t there is the set itself). Since this case is straightforward, we focus instead on the DataBST.
First, let’s turn the general description of the remElt algorithm into Java code. For simplicity as we look at the subtleties of Java, we will always grab the largest element in the left child when we need to remove the root of a tree with two populated subtrees. The parts that raise interesting Java points are written in all capital letters between angle brackets (these are not valid Java code).
public IBST remElt (int elt) {
  if (elt == this.data) {
    if <BOTH CHILDREN ARE MtBSTs> {
      return new MtBST();
    } else if <LEFT IS AN MtBST> {
      return this.right;
    } else if <RIGHT IS AN MtBST> {
      return this.left;
    } else { // both children are DataBSTs
      return new DataBST(this.left.largestElt(),
                         this.left.remElt(this.left.largestElt()),
                         this.right);
    }
  } else if (elt < this.data) {
    return new DataBST(this.data,
                       this.left.remElt(elt),
                       this.right);
  } else { // elt > this.data
    return new DataBST(this.data,
                       this.left,
                       this.right.remElt(elt));
  }
}
Before you go on: make sure you see that this code skeleton would implement the BST remElt algorithm. You should be able to articulate why this code preserves the BST invariant.
3 Turning the ALL-CAPS Questions into Code
Now we need to capture the all-caps test questions in Java. To write these tests, we need a way to determine whether each child tree is an MtBST or a DataBST. Understanding how to do this properly is the point of this section of the presentation.
If you have had Java before, you may have been taught that you can check whether an object was created from a given class using an operator called instanceof. We have, however, discussed that good OO programs should not check the type of objects explicitly. Recall that one of the key points of OO languages is that they automatically find the right method based on the type of an object (this is called dispatch). So we need a way to ask the all-caps questions without explicitly asking for the types of the left and right subtrees.
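For contrast only, here is a sketch of what the skeleton would look like if the all-caps questions were answered with instanceof. This is the style these notes set out to avoid. The exception type thrown by largestElt on an MtBST is an assumption, since the notes do not name one.

```java
// Contrast sketch: explicit type checks, which the notes replace with dispatch.
interface IBST {
  IBST remElt(int elt);
  int largestElt();
}

class MtBST implements IBST {
  MtBST() {}

  // Removing from an empty set leaves the empty set.
  public IBST remElt(int elt) { return this; }

  // Assumed error type; an empty tree has no largest element.
  public int largestElt() {
    throw new UnsupportedOperationException("largestElt on an empty tree");
  }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  // The largest element is the rightmost one.
  public int largestElt() {
    if (this.right instanceof MtBST) {
      return this.data;
    }
    return this.right.largestElt();
  }

  public IBST remElt(int elt) {
    if (elt == this.data) {
      if (this.left instanceof MtBST && this.right instanceof MtBST) {
        return new MtBST();
      } else if (this.left instanceof MtBST) {
        return this.right;
      } else if (this.right instanceof MtBST) {
        return this.left;
      } else { // both children are DataBSTs
        int newRoot = this.left.largestElt();
        return new DataBST(newRoot, this.left.remElt(newRoot), this.right);
      }
    } else if (elt < this.data) {
      return new DataBST(this.data, this.left.remElt(elt), this.right);
    } else { // elt > this.data
      return new DataBST(this.data, this.left, this.right.remElt(elt));
    }
  }
}
```

Every instanceof test here is a question about which class an object came from, which is exactly the question dispatch can answer for us.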
3.1 Step 1: Reorganize the if statements
Recall that calling a method on an object calls the version of that method stored in the object. We have different classes for MtBST and DataBST. So if we can break our remElt code into fragments to run on different classes, Java will handle the if statements automatically. To help with that, let’s reorganize the conditional tests in the elt==this.data case around the types of the children. We start with a conditional based on the type of the left child:
if <LEFT IS AN MtBST> {
  if <RIGHT IS AN MtBST> {
    return new MtBST();
  } else {
    return this.right;
  }
} else { // left is a DataBST
  if <RIGHT IS AN MtBST> {
    return this.left;
  } else { // both are DataBSTs
    ...
  }
}
Convince yourself that this version is indeed equivalent to the first version we sketched out. Now, note that when the left is an MtBST, we can return the right child in either case: if the right child is also an MtBST, returning this.right is the same as returning new MtBST(). So we can further simplify this to:
if <LEFT IS AN MtBST> {
  return this.right;
} else { // left is a DataBST
  if <RIGHT IS AN MtBST> {
    return this.left;
  } else { // both are DataBSTs
    ...
  }
}
Putting this reorganized code back into the original remElt method yields the following:
public IBST remElt (int elt) {
  if (elt == this.data) {
    if <LEFT IS AN MtBST> {
      return this.right;
    } else { // <LEFT IS A DATABST>
      if <RIGHT IS AN MtBST> {
        return this.left;
      } else { // both children are DataBSTs
        return new DataBST(this.left.largestElt(),
                           this.left.remElt(this.left.largestElt()),
                           this.right);
      }
    }
  } else if (elt < this.data) {
    return new DataBST(this.data,
                       this.left.remElt(elt),
                       this.right);
  } else { // elt > this.data
    return new DataBST(this.data,
                       this.left,
                       this.right.remElt(elt));
  }
}
Reminder: all we have done up to here is move around if statements. We have not done anything interesting with Java.
Make sure you are comfortable with this code implementing remElt before you move on.
3.2 Step 2: Replace if on the LEFT child with a method
Next, we turn the if statement on the left child into a method: the answer for the if will be the body of the method in the MtBST class, and the answer for the else will be the body of the method in the DataBST class (just as we did when writing methods on animals in week 1). We need to pick a name for this new method. Let’s call it remParent because the goal is to remove the parent of the left node. When we are done, we want the new remElt code to simply call remParent on the left child, as shown below (the ... is because we haven’t yet discussed whether remParent needs any parameters):
public IBST remElt (int elt) {
  if (elt == this.data) {
    return this.left.remParent(...);
  } else if (elt < this.data) {
    return new DataBST(this.data,
                       this.left.remElt(elt),
                       this.right);
  } else { // elt > this.data
    return new DataBST(this.data,
                       this.left,
                       this.right.remElt(elt));
  }
}
Let’s add remParent to the MtBST class. The body should be the code that was in the if case. That suggests the following (again, ignoring parameters to remParent for now):
class MtBST implements IBST {
  ...
  IBST remParent(...) {
    return this.right;
  }
}
In principle, this is the right idea. But there is a problem.
Inside the original remElt code, this refers to the node whose data we are trying to remove. this.left and this.right refer to its children. When the left child is empty, we want to return the right child. (Draw yourself an example tree to follow along here.)
But notice that we call remParent on the left child. So any references to this inside remParent now refer to the left child of the node to eliminate, not the node to eliminate. So in our initial version of remParent, this.right is incorrect: it would refer to the right child of the left child (a grandchild of the node we want to delete), not to the deleted node's own right child. In the MtBST class it would not even compile, since an MtBST has no right field.
When you move code into new methods, you must pass any objects that the code was using as parameters (other than the object that the method will be called on, which will just be this in the new method). The if statement that we are turning into remParent references the right sibling of the left child. So we will pass that sibling as a parameter, and replace any references to this.right from the original code with a reference to the parameter.
public IBST remElt (int elt) {
  if (elt == this.data) {
    return this.left.remParent(this.right);
  } else if (elt < this.data) {
    return new DataBST(this.data,
                       this.left.remElt(elt),
                       this.right);
  } else { // elt > this.data
    return new DataBST(this.data,
                       this.left,
                       this.right.remElt(elt));
  }
}
The MtBST version of remParent now takes the right sibling as a parameter and returns it, instead of referring to this.right:
class MtBST implements IBST {
  ...
  IBST remParent(IBST rightSibling) {
    return rightSibling;
  }
}
Now we create remParent in the DataBST class. The process is similar: copy the code that was in the else // <LEFT IS A DATABST> portion, replace uses of this.right with rightSibling, and replace uses of this.left with this. The resulting code looks like:
class DataBST implements IBST {
  ...
  IBST remParent(IBST rightSibling) {
    if <RIGHT IS AN MtBST> {
      return this;
    } else { // both children are DataBSTs
      return new DataBST(this.largestElt(),
                         this.remElt(this.largestElt()),
                         rightSibling);
    }
  }
}
3.3 Step 3: Repeat process to remove if on right child
We still have a test on the type of the right subtree in remParent in the DataBST class, but we can use the same technique: introduce a new method name corresponding to the if statement, then divide the code across the classes to match each of the if and else clauses. Let’s call the new method mergeToRemoveParent. It will be called on the right sibling, taking the left sibling as an argument, so the body of remParent in the DataBST class reduces to return rightSibling.mergeToRemoveParent(this). The two versions of the new method are:
// goes into the MtBST class
// "this" is the right sibling; leftSibling is a DataBST
IBST mergeToRemoveParent(IBST leftSibling) {
  return leftSibling;
}

// goes into the DataBST class
// "this" is the right sibling; leftSibling is a DataBST
IBST mergeToRemoveParent(IBST leftSibling) {
  // this is where we choose largest-in-left or smallest-in-right,
  // branching accordingly. Only showing largest-in-left here
  int newRoot = leftSibling.largestElt();
  return new DataBST(newRoot,
                     leftSibling.remElt(newRoot),
                     this);
}
In the implementation of mergeToRemoveParent for the DataBST class, we have filled in the ellipses that we have carried through the example, refining the terms to match the variable names in the method.
To make this code compile, we also need to add both remParent and mergeToRemoveParent to the IBST interface, and accordingly mark all implementations of these methods as public. The full solution shows all of these details.
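To see the pieces working together, here is a sketch that assembles the fragments from this section into one compilable unit, with both helper methods declared in IBST and marked public. It is not the full solution: it omits size, hasElt, and addElt, and the exception thrown by largestElt in MtBST is an assumed choice. The DataBST version of largestElt uses instanceof only to keep the sketch short.

```java
// Sketch assembled from the fragments above; not the course's full solution.
interface IBST {
  IBST remElt(int elt);
  int largestElt();
  // Called on the LEFT child of the node being removed.
  IBST remParent(IBST rightSibling);
  // Called on the RIGHT child of the node being removed.
  IBST mergeToRemoveParent(IBST leftSibling);
}

class MtBST implements IBST {
  MtBST() {}

  public IBST remElt(int elt) { return this; }

  // Assumed error type; never reached by remElt in this sketch.
  public int largestElt() {
    throw new UnsupportedOperationException("largestElt on an empty tree");
  }

  // Left child is empty: the right sibling replaces the parent.
  public IBST remParent(IBST rightSibling) { return rightSibling; }

  // Right child is empty: the left sibling replaces the parent.
  public IBST mergeToRemoveParent(IBST leftSibling) { return leftSibling; }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  public IBST remElt(int elt) {
    if (elt == this.data) {
      return this.left.remParent(this.right);
    } else if (elt < this.data) {
      return new DataBST(this.data, this.left.remElt(elt), this.right);
    } else { // elt > this.data
      return new DataBST(this.data, this.left, this.right.remElt(elt));
    }
  }

  // Left child is populated: let the right sibling decide what to do.
  public IBST remParent(IBST rightSibling) {
    return rightSibling.mergeToRemoveParent(this);
  }

  // Both children populated: promote the largest element of the left.
  public IBST mergeToRemoveParent(IBST leftSibling) {
    int newRoot = leftSibling.largestElt();
    return new DataBST(newRoot, leftSibling.remElt(newRoot), this);
  }

  public int largestElt() {
    if (this.right instanceof MtBST) {
      return this.data;
    }
    return this.right.largestElt();
  }
}
```

For example, removing 5 from a tree with root 5, left subtree 3 (whose right child is 4), and right subtree 8 promotes 4 to the root, leaving 3 on the left and 8 on the right. Neither remParent nor mergeToRemoveParent contains a type test: each question about a child's class has become a method call on that child.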
NOTE: There are other approaches you might take to working around the type checks in this code. If you want to explore them, work on the advanced option in lab 3.
4 Other Notables in the Java BST Implementation
Several other details are embedded in the full BST implementation. In particular:
The remElt method requires a largestElt method on DataBSTs. Our program never invokes largestElt on an MtBST (convince yourself of this by looking at where it gets used). Unfortunately, the Java type checker isn’t smart enough to determine this, so it requires that all variants of IBST have a largestElt method. This requirement forces largestElt into the IBST interface, and thus into the MtBST class (try removing it and see what error you get). The behavior of largestElt isn’t well-defined on MtBSTs. Therefore, the implementation of largestElt in the MtBST class simply raises an error if it is ever invoked. We will cover error handling more explicitly in a couple of weeks.
largestElt in DataBST still uses the instanceof construct that we wanted to avoid using in remParent. Rewriting largestElt to NOT use instanceof would be a good exercise for you.
We now have two broad criteria to exercise in our test cases: that BSTs provide a proper implementation of Iset and that our implementation satisfies the BST invariant. The incomplete set of tests in the Examples class illustrates these points: test1 checks size as a standalone function; test2 and test3 check that size and addElt interact properly to satisfy the no-duplicates requirement of sets; test4 through test6 begin to check that remElt preserves the BST invariant. We say "begin" here because these tests are low-level examples of the results of the invariant rather than a convincing expression of the invariant itself. We’ll be returning to that point in a couple of days.
A full remElt implementation would choose between upgrading the largest element of the left subtree or smallest element of the right. For simplicity, this code has only provided the former. This decision vastly (and unrealistically) simplified the last three tests in the Examples class.
The tests in the Examples class check both the behavior of the operations on BSTs and the axioms on the Set ADT.
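One possible way to express the invariant itself in a test, rather than checking specific result trees, is a helper that walks a tree and confirms every element falls in the range its ancestors allow. The sketch below is an illustration only: the names orderedWithin, InvariantCheck, and isBST are hypothetical and do not come from the full implementation, and the bounds are long values so that every int strictly fits inside the initial range.

```java
// Hypothetical invariant checker; none of these method names are from the notes.
interface IBST {
  // True if every element is strictly between lo and hi
  // and every subtree is ordered in the same way.
  boolean orderedWithin(long lo, long hi);
}

class MtBST implements IBST {
  MtBST() {}

  // An empty tree trivially satisfies the invariant.
  public boolean orderedWithin(long lo, long hi) { return true; }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  // Elements on the left must be smaller than data, those on the right larger,
  // and both subtrees inherit the bounds imposed by ancestors.
  public boolean orderedWithin(long lo, long hi) {
    return lo < this.data && this.data < hi
        && this.left.orderedWithin(lo, this.data)
        && this.right.orderedWithin(this.data, hi);
  }
}

class InvariantCheck {
  // Entry point: start with bounds wider than any int.
  static boolean isBST(IBST t) {
    return t.orderedWithin(Long.MIN_VALUE, Long.MAX_VALUE);
  }
}
```

A test could then assert that InvariantCheck.isBST holds on the result of each remElt call, whichever element the implementation chose to promote.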