Removing Elements from BSTs

These are optional notes on how to implement removal of elements from BSTs in Java. There are two versions here: one that uses a construct we haven’t covered yet (and that is a bit of a cheat, since using it here isn’t pure OO programming), and one that is a bit more complex but uses pure OO programming. The pure-OO one may be a bit confusing if you are still getting the hang of Java.

None of this material will be tested on exams. This is purely for your interest/education.

1 Implementing remElt with BSTs

First, let’s turn the general description of the remElt algorithm into Java code. For simplicity as we look at the subtleties of Java, we will always grab the largest element in the left child when we need to remove the root of a tree with two populated subtrees. The parts that raise interesting Java points are written in all capital letters between angle brackets (these are not valid Java code).

public IBST remElt (int elt) {
  if (elt == this.data) {
    if <BOTH CHILDREN ARE MtBSTs> {
      return new MtBST();
    } else if <LEFT IS AN MtBST> {
      return this.right;
    } else if <RIGHT IS AN MtBST> {
      return this.left;
    } else { // both children are DataBSTs
      return new DataBST(this.left.largestElt(),
                         this.left.remElt(this.left.largestElt()),
                         this.right);
    }
  } else if (elt < this.data) {
    return new DataBST(this.data,
                       this.left.remElt(elt),
                       this.right);
  } else { // elt > this.data
    return new DataBST(this.data,
                       this.left,
                       this.right.remElt(elt));
  }
}

Before you go on: make sure you see that this code implements the BST remElt algorithm. You should be able to articulate why this code preserves the BST invariant.

Now we need to capture the all-caps test questions in Java. To write these tests, we need a way to determine whether each child tree is an MtBST or a DataBST. Understanding how to do this properly is the point of this section of the presentation.

If you have had Java before, you may have been taught that you can check whether an object was created from a given class using an operator called instanceof. Using instanceof, we would fill in the holes as follows:

if (this.left instanceof MtBST && this.right instanceof MtBST) {
  return new MtBST();
} else if (this.left instanceof MtBST) {
  return this.right;
} else if (this.right instanceof MtBST) {
  return this.left;
} else { ... }

Back when we showed how to migrate Racket programs over mixed data to Java, however, we discussed that good OO programs should not check the type of objects explicitly. Remember that one of the key points of OO languages is that they handle finding the right method based on the type of an object automatically (this is called dispatch). So while instanceof works here, it isn’t a proper solution in an OO language.
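To make the idea of dispatch concrete, here is a minimal sketch using size, an operation that needs no type checks at all. The class bodies are simplified for illustration (the real classes have more methods), and DispatchDemo is a hypothetical name used only for this example.

```java
// Simplified for illustration: only size() is shown.
interface IBST {
  int size();
}

class MtBST implements IBST {
  // An empty tree has no elements.
  public int size() { return 0; }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  // Java selects the MtBST or DataBST version of size() for each child
  // at run time, so no instanceof test is needed here.
  public int size() {
    return 1 + this.left.size() + this.right.size();
  }
}

// Hypothetical driver class, only for this example.
class DispatchDemo {
  public static void main(String[] args) {
    IBST tree = new DataBST(5,
                            new DataBST(3, new MtBST(), new MtBST()),
                            new MtBST());
    System.out.println(tree.size()); // prints 2
  }
}
```

The DataBST code never asks what kind of tree each child is; it just calls size and lets each class answer for itself. The rest of these notes applies the same idea to remElt.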

2 Rewriting Code to Eliminate instanceof

A proper OO solution requires that we capture the effect of the instanceof uses in methods; these methods will have different implementations in the MtBST class and the DataBST class that together achieve the effect of the original instanceof tests. Our goal then is to design a method that can dispatch on the children to perform the appropriate computation.

To help with that, let’s reorganize the conditional tests around the types of the children. We start with a conditional based on the type of the left child:

if (this.left instanceof MtBST) {
  if (this.right instanceof MtBST) {
    return new MtBST();
  } else {
    return this.right;
  }
} else { // left is a DataBST
  if (this.right instanceof MtBST) {
    return this.left;
  } else { ... }
}

Convince yourself that this version is indeed equivalent to the first version we sketched out. Now, note that when the left is an MtBST, we can return the right child in either case: if the right child is also an MtBST, returning it is the same as returning a new MtBST. So we can further simplify this to:

if (this.left instanceof MtBST) {
  return this.right;
} else { // left is a DataBST
  if (this.right instanceof MtBST) {
    return this.left;
  } else { ... }
}
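If you would rather check the equivalence mechanically than by inspection, the following sketch compares the original flat conditional against the simplified one on all four combinations of children. Everything here is a hypothetical stand-in: Mt and Node play the roles of MtBST and DataBST, and the MERGE sentinel stands in for the elided else branch.

```java
// Hypothetical stand-ins for the two tree variants; only their types matter here.
class Mt {}
class Node {}

class EquivalenceCheck {
  // Sentinel standing in for the elided "else { ... }" branch.
  static final Object MERGE = new Object();

  // The original flat version: three tests in sequence.
  static Object flat(Object left, Object right) {
    if (left instanceof Mt && right instanceof Mt) {
      return new Mt();
    } else if (left instanceof Mt) {
      return right;
    } else if (right instanceof Mt) {
      return left;
    } else {
      return MERGE;
    }
  }

  // The simplified version: branch on the left child first.
  static Object simplified(Object left, Object right) {
    if (left instanceof Mt) {
      return right;
    } else { // left is a Node
      if (right instanceof Mt) {
        return left;
      } else {
        return MERGE;
      }
    }
  }

  // Two results agree if both are empty trees or both are the same object.
  static boolean sameResult(Object a, Object b) {
    return (a instanceof Mt && b instanceof Mt) || a == b;
  }

  // Tries all four combinations of empty and non-empty children.
  static boolean allCasesAgree() {
    for (boolean leftEmpty : new boolean[] { true, false }) {
      for (boolean rightEmpty : new boolean[] { true, false }) {
        Object left = leftEmpty ? new Mt() : new Node();
        Object right = rightEmpty ? new Mt() : new Node();
        if (!sameResult(flat(left, right), simplified(left, right))) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(allCasesAgree()); // prints true
  }
}
```

Note that sameResult treats any two empty trees as equal. That is the one place where the two versions differ in the object they return (a fresh empty tree versus the existing empty right child) without differing in meaning.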

Next, we turn this into a method on IBST that we will call on the left child: the answer for the if will be the body of the method in the MtBST class, and the answer for the else will be the body of the method in the DataBST class (just as we did when writing methods on animals in week 1). Let’s call the method remParent:

// goes into the MtBST class
IBST remParent(IBST rightSibling) {
  return rightSibling;
}

// goes into the DataBST class. "this" is the left sibling
IBST remParent(IBST rightSibling) {
  if (rightSibling instanceof MtBST) {
    return this;
  } else { ... }
}

We would call this method from within remElt in the DataBST class, as follows:

// remElt in the DataBST class
public IBST remElt (int elt) {
  if (elt == this.data)
    return this.left.remParent(this.right);
  else if (elt < this.data)
    ... // code is the same after here
}

We still have a use of instanceof in the body of remParent, but we can use the same technique. The conditional already branches on the type of a single object, so we simply introduce a new method to handle the dispatch. We will call the new method mergeToRemoveParent. It will be called on the right sibling, taking the left sibling as an argument:

// goes into the MtBST class
// "this" is the right sibling; leftSibling is a DataBST
IBST mergeToRemoveParent(IBST leftSibling) {
  return leftSibling;
}

// goes into the DataBST class.
// "this" is the right sibling; leftSibling is a DataBST
IBST mergeToRemoveParent(IBST leftSibling) {
  // this is where we choose largest-in-left or smallest-in-right,
  // branching accordingly. Only showing largest-in-left here
  int newRoot = leftSibling.largestElt();
  return new DataBST(newRoot,
                     leftSibling.remElt(newRoot),
                     this);
}

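In the implementation of mergeToRemoveParent for the DataBST class, we have filled in the ellipses that we have carried through the example, renaming the terms to match the variable names in the method. With this method in place, the body of remParent in the DataBST class no longer needs its conditional: it reduces to the single statement return rightSibling.mergeToRemoveParent(this); because the MtBST version of mergeToRemoveParent returns the left sibling, exactly as the instanceof branch did.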

To make this code compile, we also need to add both remParent and mergeToRemoveParent to the IBST interface, and accordingly mark all implementations of these methods as public. The full solution shows all of these details.
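Putting the pieces of this section together, a self-contained sketch might look like the following. It is not the full solution referred to above. It makes several assumptions for the sake of running on its own: remElt is declared directly on IBST (which sidesteps the type issue discussed in the next section), removing from an empty tree returns an empty tree, and largestOr, inOrder, and RemEltDemo are hypothetical helpers that do not appear in the course code.

```java
// Sketch only: remElt is declared on IBST here (an assumption), so no casts are needed.
interface IBST {
  IBST remElt(int elt);
  int largestElt();
  int largestOr(int fallback);                 // hypothetical helper
  IBST remParent(IBST rightSibling);
  IBST mergeToRemoveParent(IBST leftSibling);
  String inOrder();                            // hypothetical helper for display
}

class MtBST implements IBST {
  // Assumption: removing from an empty tree yields an empty tree.
  public IBST remElt(int elt) { return this; }

  // Not well-defined on an empty tree, so raise an error.
  public int largestElt() {
    throw new UnsupportedOperationException("largestElt on an empty BST");
  }

  public int largestOr(int fallback) { return fallback; }

  // "this" is the empty left sibling: the right sibling replaces the parent.
  public IBST remParent(IBST rightSibling) { return rightSibling; }

  // "this" is the empty right sibling: the left sibling replaces the parent.
  public IBST mergeToRemoveParent(IBST leftSibling) { return leftSibling; }

  public String inOrder() { return ""; }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  public IBST remElt(int elt) {
    if (elt == this.data) {
      return this.left.remParent(this.right);
    } else if (elt < this.data) {
      return new DataBST(this.data, this.left.remElt(elt), this.right);
    } else {
      return new DataBST(this.data, this.left, this.right.remElt(elt));
    }
  }

  // The largest element is the rightmost one; this.data is the answer
  // when the right subtree is empty.
  public int largestElt() { return this.right.largestOr(this.data); }

  public int largestOr(int fallback) { return this.largestElt(); }

  // "this" is the non-empty left sibling: dispatch on the right sibling.
  public IBST remParent(IBST rightSibling) {
    return rightSibling.mergeToRemoveParent(this);
  }

  // "this" is the non-empty right sibling; leftSibling is also non-empty.
  public IBST mergeToRemoveParent(IBST leftSibling) {
    int newRoot = leftSibling.largestElt();
    return new DataBST(newRoot, leftSibling.remElt(newRoot), this);
  }

  // Elements in increasing order, each followed by a space.
  public String inOrder() {
    return this.left.inOrder() + this.data + " " + this.right.inOrder();
  }
}

// Hypothetical driver class, only for this example.
class RemEltDemo {
  static IBST leaf(int n) { return new DataBST(n, new MtBST(), new MtBST()); }

  public static void main(String[] args) {
    IBST tree = new DataBST(5, new DataBST(3, leaf(1), leaf(4)), leaf(8));
    System.out.println(tree.remElt(5).inOrder().trim()); // prints 1 3 4 8
    System.out.println(tree.remElt(3).inOrder().trim()); // prints 1 4 5 8
  }
}
```

Removing 5 exercises the two-children case: the largest element of the left subtree (4) becomes the new root. No instanceof appears anywhere; every decision about the kind of a child is made by calling a method on that child.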

NOTE: There are other approaches you might take to eliminating instanceof in this code. If you want to explore them, work on the advanced option in lab this week.

3 Casting: Making the Types Work Out

Unfortunately, the code as we have it still won’t compile due to one last subtle issue. Remember that we are using BSTs to implement sets. We now have the following interfaces for Iset and IBST, and the following parameter types in the DataBST constructor:

interface Iset {
  Iset addElt (int elt);
  Iset remElt (int elt);
  int size ();
  boolean hasElt (int elt);
}

interface IBST extends Iset {
  int largestElt();
  IBST remParent(IBST sibling);
  IBST mergeToRemoveParent(IBST sibling);
}

class DataBST implements IBST {
  ...
  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }
  ...
}

Now, look closely at the types of objects we are passing to the DataBST constructor within remElt in the DataBST class:

public IBST remElt (int elt) {
  if (elt == this.data)
    return this.left.remParent(this.right);
  else if (elt < this.data)
    return new DataBST(this.data,
                       this.left.remElt(elt),
                       this.right);
  ...
}

The second argument to DataBST here is the result of remElt. The interfaces indicate that remElt returns an object of type Iset. But the DataBST constructor expects the second input to be of type IBST. The Java compiler will reject this code on a type mismatch.

But wait – we know that we are implementing Iset through IBST in this program. The actual remElt method we are calling returns an IBST. Aren’t we then guaranteed that the types are fine when we run the code?

Yes, we are. However, the Java type system cannot confirm this automatically (designing type systems in the presence of inheritance is very tricky, precisely for cases such as this). The Java compiler has no choice but to reject this code. The Java language, then, needs to provide programmers with a way to take responsibility for this code executing properly in practice.

Java programmers do this by claiming what type the result of remElt will have at run time. This claim is called a cast. It is written as follows:

public IBST remElt (int elt) {
  if (elt == this.data)
    return this.left.remParent(this.right);
  else if (elt < this.data)
    return new DataBST(this.data,
                       (IBST) this.left.remElt(elt),
                       this.right);
  ...
}

The (IBST) before the call to remElt tells the compiler "assume this object is an IBST when you compile". The run-time system, in turn, will check this claim when the program is actually running. If the actual object does not implement IBST, an error will be reported as the program runs.

Once you have hierarchies of classes and interfaces, casts are sometimes necessary to make code compile. They slightly hurt the performance of running programs (since the types are checked at run-time rather than compile time). As a Java programmer, you should be careful to only use a cast when you are confident that the objects you are casting can actually be of the indicated type.
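The sketch below shows both outcomes of the run-time check: a cast that succeeds and one that fails. The interfaces are cut down to a single method, and OtherSet and CastDemo are hypothetical names invented for this example. The error that Java reports for a failed cast is a ClassCastException.

```java
// Cut-down versions of the interfaces, for illustration only.
interface Iset {
  Iset remElt(int elt);
}

interface IBST extends Iset { }

class MtBST implements IBST {
  public Iset remElt(int elt) { return this; }
}

// Hypothetical Iset implementation that is not a BST.
class OtherSet implements Iset {
  public Iset remElt(int elt) { return this; }
}

// Hypothetical driver class, only for this example.
class CastDemo {
  // Returns true if the cast passes the run-time check, false otherwise.
  static boolean castSucceeds(Iset s) {
    try {
      // The compiler accepts this line for any Iset;
      // the claim is checked only when the line actually runs.
      IBST asBst = (IBST) s.remElt(0);
      return true;
    } catch (ClassCastException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(castSucceeds(new MtBST()));    // prints true
    System.out.println(castSucceeds(new OtherSet())); // prints false
  }
}
```

Both calls compile, because as far as the type checker can tell any Iset might turn out to be an IBST. Only the second fails, and only at run time, which is why a cast shifts responsibility from the compiler to the programmer.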

4 Other Notables in the Java BST Implementation

Several other details are embedded in the full BST implementation. In particular:

Methods common to IBST classes (such as largestElt) go into the IBST interface, not the Iset interface.
The remElt method requires a largestElt method on DataBSTs. Our program never invokes largestElt on an MtBST (convince yourself of this by looking at where it gets used). Unfortunately, the Java type checker isn’t smart enough to determine this, so it requires that all variants of IBST have a largestElt method. This requirement forces largestElt into the IBST interface, and thus into the MtBST class (try removing it and see what error you get). The behavior of largestElt isn’t well-defined on MtBSTs. Therefore, the implementation of largestElt in the MtBST class simply raises an error if it is ever invoked. We will cover error handling more explicitly in a couple of weeks.
The addElt code also uses casts. The code actually shows another way to handle the type mismatch. We could refine the type of addElt within the IBST interface. The following interface would achieve this:
  interface IBST extends Iset {
    IBST addElt (int elt);
    ...
  }
Casting is the more common solution, but there are advantages to the interface-based solution. In particular, Java would report an error if the addElt implementation within either BST class returned some Iset other than a BST. Without addElt in the interface, addElt implementations are free to return other Iset implementations, even though those would be nonsensical elsewhere in the program.
We now have two broad criteria to exercise in our test cases: that BSTs provide a proper implementation of Iset and that our implementation satisfies the BST invariant. The incomplete set of tests in the Examples class illustrates these points: test1 checks size as a standalone function; test2 and test3 check that size and addElt interact properly to satisfy the no-duplicates requirement of sets; test4 through test6 begin to check that remElt preserves the BST invariant. We say "begin" here because these tests are low-level examples of the results of the invariant rather than a convincing expression of the invariant itself. We’ll be returning to that point in a couple of days.
A full remElt implementation would choose between upgrading the largest element of the left subtree or smallest element of the right. For simplicity, this code has only provided the former. This decision vastly (and unrealistically) simplified the last three tests in the Examples class.
The tests in the Examples class check both the behavior of the operations on BSTs and the axioms on Iset.
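The interface-based alternative to casting mentioned in the addElt item above can be sketched as follows. This is a cut-down illustration rather than the course's full implementation: only addElt and size are included, the addElt bodies are a plausible guess at a standard BST insertion, and RefinedTypeDemo is a hypothetical name.

```java
// Cut-down version of the set interface, for illustration only.
interface Iset {
  Iset addElt(int elt);
  int size();
}

interface IBST extends Iset {
  // Refined return type: every BST's addElt must return a BST.
  IBST addElt(int elt);
}

class MtBST implements IBST {
  public IBST addElt(int elt) {
    return new DataBST(elt, new MtBST(), new MtBST());
  }

  public int size() { return 0; }
}

class DataBST implements IBST {
  int data;
  IBST left;
  IBST right;

  DataBST(int data, IBST left, IBST right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  public IBST addElt(int elt) {
    if (elt == this.data) {
      return this; // already present: sets have no duplicates
    } else if (elt < this.data) {
      // No cast needed: this.left.addElt(elt) is known to be an IBST.
      return new DataBST(this.data, this.left.addElt(elt), this.right);
    } else {
      return new DataBST(this.data, this.left, this.right.addElt(elt));
    }
  }

  public int size() {
    return 1 + this.left.size() + this.right.size();
  }
}

// Hypothetical driver class, only for this example.
class RefinedTypeDemo {
  public static void main(String[] args) {
    IBST tree = new MtBST().addElt(5).addElt(3).addElt(5);
    System.out.println(tree.size()); // prints 2
  }
}
```

Because IBST redeclares addElt with the narrower return type, the compiler itself verifies what a cast would only check at run time. A BST class whose addElt was declared to return a plain Iset would be rejected at compile time.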
