Balanced Binary Search Trees (AVL Trees)
At the end of the notes on Binary Search Trees, we noted that a tree with only one long branch counts as a BST, but is no better than a list with regard to searching for elements. So while BSTs are usually faster than lists for finding elements, they aren’t guaranteed to be faster on this operation.
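To see the problem concretely, here is a minimal sketch of a plain (unbalanced) BST of ints. The class `PlainBST`, its nested `Node`, and `height` are hypothetical names for illustration; only `addElt` follows these notes. Height is measured by counting the nodes on the longest root-to-leaf path.

```java
// Hypothetical, minimal BST of ints with no balancing.
class PlainBST {
    // A node holds a value and (possibly null) left and right subtrees.
    private static final class Node {
        final int value;
        Node left, right;

        Node(int value) { this.value = value; }
    }

    private Node root;

    // Standard BST insertion: smaller values go left, larger values go right.
    // Duplicates are ignored in this sketch.
    void addElt(int value) { root = add(root, value); }

    private static Node add(Node node, int value) {
        if (node == null) return new Node(value);
        if (value < node.value) node.left = add(node.left, value);
        else if (value > node.value) node.right = add(node.right, value);
        return node;
    }

    // Height = number of nodes on the longest root-to-leaf path (empty tree = 0).
    int height() { return height(root); }

    private static int height(Node node) {
        if (node == null) return 0;
        return 1 + Math.max(height(node.left), height(node.right));
    }
}
```

Adding 1, 2, ..., 10 in increasing order builds a single right-leaning branch of height 10, so a search may visit every node, just as it would in a list. Adding the same kind of data in a friendlier order, such as 4, 2, 6, 1, 3, 5, 7, gives a tree of height 3.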
The core problem is that BSTs don’t require trees to be wide (as opposed to tall). If trees had to be wide, then each comparison during a search would let us ignore a large portion of the remaining elements. We formalize the idea of "wide" by creating balanced binary search trees (BBSTs).
There are several flavors of BBSTs, depending on the specific constraint that they use to achieve balance. Here, we will work with a BBST variant called AVL Trees.
1 AVL Trees
AVL trees augment the binary search tree invariant to require that, at every node, the heights of the left and right subtrees differ by at most one. ("Height" here is the number of nodes on the longest path from the root down to a leaf, so an empty tree has height 0.) For the two trees that follow, the one on the left is a BBST, but the one on the right is not (it is just a BST):
    6              6
   / \            / \
  3   8          3   9
                    / \
                   7   12
                      /
                    10
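The balance condition can be checked directly from its definition. This is a sketch with a hypothetical immutable `Node` class and a hypothetical `isBalanced` method. It checks only the balance part of the AVL invariant (not the BST ordering), and it recomputes heights at every node, which is fine for small examples but slow on large trees.

```java
// Hypothetical helper for checking the AVL balance condition.
class BalanceCheck {
    // Immutable binary tree node; null represents the empty tree.
    static final class Node {
        final int value;
        final Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        Node(int value) { this(value, null, null); }
    }

    // Height = number of nodes on the longest path down to a leaf (empty tree = 0).
    static int height(Node node) {
        if (node == null) return 0;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    // True if, at every node, the left and right subtree heights differ by at most one.
    static boolean isBalanced(Node node) {
        if (node == null) return true;
        int diff = height(node.left) - height(node.right);
        return Math.abs(diff) <= 1 && isBalanced(node.left) && isBalanced(node.right);
    }
}
```

For the two trees above, `isBalanced` returns true for the left tree and false for the right one: at the right tree's root, the left subtree has height 1 but the right subtree (rooted at 9) has height 3.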
We already saw how to maintain the BST invariant, so we really just need to understand how to maintain balance. We’ll look at this by way of examples. Consider the following BST. It is not balanced, since the left subtree has height 2 and the right subtree is empty (height 0):
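        4
       /
      2
     / \
    1   3
To rebalance it, we rotate the tree clockwise around the root: 2 moves up to become the new root, and 4 moves down to become its right child. That leaves 1 and 3 to be placed:
      2
     / \
    ?   4
1 stays as the left child of 2. 3 is larger than 2 but smaller than 4, so it becomes the left child of 4:
      2
     / \
    1   4
       /
      3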
Even though the new tree has the same height as the original, the two subtrees of the root are now closer in size. This makes it more likely that we can add elements to either side of the tree without needing to rebalance later.
A symmetric (mirror-image) counterclockwise rotation handles the case where the right subtree is taller than the left.
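Each rotation takes only a few pointer updates. Here is a sketch using a hypothetical mutable `Node` class; the method names `rotateClockwise`, `rotateCounterclockwise`, and `inOrder` are ours, not from any library. The key step is that the "middle" subtree (values between the old root and the new root) switches parents, which keeps the BST ordering intact.

```java
// Hypothetical sketch of single rotations on a mutable binary tree.
class Rotations {
    static final class Node {
        int value;
        Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Clockwise rotation: the left child becomes the new root. Its old right
    // subtree holds values between it and the old root, so that subtree
    // becomes the old root's new left subtree.
    static Node rotateClockwise(Node root) {
        Node newRoot = root.left;
        root.left = newRoot.right;
        newRoot.right = root;
        return newRoot;
    }

    // Counterclockwise rotation: the mirror image of rotateClockwise.
    static Node rotateCounterclockwise(Node root) {
        Node newRoot = root.right;
        root.right = newRoot.left;
        newRoot.left = root;
        return newRoot;
    }

    // Values in sorted (in-order) order, e.g. "1 2 3 4", to confirm ordering is preserved.
    static String inOrder(Node node) {
        StringBuilder sb = new StringBuilder();
        appendInOrder(node, sb);
        return sb.toString().trim();
    }

    private static void appendInOrder(Node node, StringBuilder sb) {
        if (node == null) return;
        appendInOrder(node.left, sb);
        sb.append(node.value).append(' ');
        appendInOrder(node.right, sb);
    }
}
```

Applied to the tree rooted at 4 above, `rotateClockwise` returns the tree rooted at 2, with 3 now the left child of 4; `inOrder` gives `1 2 3 4` both before and after, and rotating the result counterclockwise restores the original shape.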
One more example:
      7
     / \
    5   10
       /  \
      8    15
       \
        9
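Rotating the whole tree counterclockwise around 7 does not fix the problem. The result is still unbalanced: the left subtree (rooted at 7) has height 3, while the right subtree (just 15) has height 1:
        10
       /  \
      7    15
     / \
    5   8
         \
          9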
The solution in this case is to first rotate the tree rooted at 10 (in the original tree) clockwise, then rotate the resulting whole tree (starting at 7) counterclockwise:
After rotating the subtree rooted at 10 clockwise:
      7
     / \
    5   8
         \
          10
         /  \
        9    15
Then, after rotating the whole tree (rooted at 7) counterclockwise:
        8
       / \
      7   10
     /   /  \
    5   9    15
An astute reader would notice that the original tree in this example was not balanced, so maybe this case doesn’t arise in practice. Not so fast. The original tree without the 9 is balanced. So we could start with the original tree sans 9, then call addElt(9), and end up with a tree that needs a double rotation. You should convince yourself, however, that if the tree was balanced before adding an element, a single or double rotation (never more than two rotations) is enough to restore balance. Removing an element is different: the rebalancing may have to be repeated at several nodes on the path back up to the root, although no single node needs more than two rotations.
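The choice between a single and a double rotation depends on which way the taller child leans. Below is a sketch of a `rebalance` method (a hypothetical name) for a hypothetical mutable `Node` class. It recomputes heights for simplicity and assumes the two subtrees of `root` are already balanced and differ in height by at most two, as happens right after one insertion or removal. When the taller child leans toward the inside, as 10 does in the example above (its left subtree is taller than its right), it first rotates that child, then the root.

```java
// Hypothetical sketch: rebalancing one node with a single or double rotation.
class DoubleRotation {
    static final class Node {
        int value;
        Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        Node(int value) { this(value, null, null); }
    }

    // Height = number of nodes on the longest path down to a leaf (empty tree = 0).
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    static Node rotateClockwise(Node root) {
        Node newRoot = root.left;
        root.left = newRoot.right;
        newRoot.right = root;
        return newRoot;
    }

    static Node rotateCounterclockwise(Node root) {
        Node newRoot = root.right;
        root.right = newRoot.left;
        newRoot.left = root;
        return newRoot;
    }

    // Returns a balanced replacement for root (assumes its subtrees are balanced).
    static Node rebalance(Node root) {
        int diff = height(root.left) - height(root.right);
        if (diff > 1) {                                       // left side too tall
            if (height(root.left.left) < height(root.left.right)) {
                root.left = rotateCounterclockwise(root.left); // left child leans right
            }
            return rotateClockwise(root);
        }
        if (diff < -1) {                                      // right side too tall
            if (height(root.right.right) < height(root.right.left)) {
                root.right = rotateClockwise(root.right);      // right child leans left
            }
            return rotateCounterclockwise(root);
        }
        return root;                                          // already balanced
    }
}
```

On the example tree rooted at 7, `rebalance` performs the clockwise rotation at 10 followed by the counterclockwise rotation at 7, returning the tree rooted at 8. On the earlier tree rooted at 4, the left child 2 does not lean right, so a single clockwise rotation is used.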
We tie the rotation algorithm into a BBST implementation by using it to rebalance the tree after every addElt or remElt call. The rebalancing should be built into the addElt and remElt implementations themselves, so that code using the tree never sees an unbalanced tree.
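To show where the rebalancing goes, here is a sketch of a small AVL tree of ints. The method names `addElt` and `remElt` follow these notes, but the class `AvlTree`, its fields, and its helpers (`rebalance`, `rootValue`, and so on) are hypothetical. Each node caches its height so rebalancing does not recompute heights recursively, and the recursive `add` and `remove` call `rebalance` on every node along the path back up to the root.

```java
// Hypothetical AVL tree of ints; rebalancing is built into addElt and remElt.
class AvlTree {
    private static final class Node {
        int value;
        Node left, right;
        int height = 1;                          // cached height; a lone node has height 1

        Node(int value) { this.value = value; }
    }

    private Node root;

    private static int height(Node n) { return n == null ? 0 : n.height; }

    private static void update(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    private static Node rotateClockwise(Node root) {
        Node newRoot = root.left;
        root.left = newRoot.right;
        newRoot.right = root;
        update(root);                            // old root is now a child: update it first
        update(newRoot);
        return newRoot;
    }

    private static Node rotateCounterclockwise(Node root) {
        Node newRoot = root.right;
        root.right = newRoot.left;
        newRoot.left = root;
        update(root);
        update(newRoot);
        return newRoot;
    }

    // Single or double rotation at n, depending on which way the taller child leans.
    private static Node rebalance(Node n) {
        update(n);
        int diff = height(n.left) - height(n.right);
        if (diff > 1) {
            if (height(n.left.left) < height(n.left.right)) {
                n.left = rotateCounterclockwise(n.left);
            }
            return rotateClockwise(n);
        }
        if (diff < -1) {
            if (height(n.right.right) < height(n.right.left)) {
                n.right = rotateClockwise(n.right);
            }
            return rotateCounterclockwise(n);
        }
        return n;
    }

    public void addElt(int value) { root = add(root, value); }

    // Ordinary BST insertion, rebalancing each node on the way back up.
    private static Node add(Node n, int value) {
        if (n == null) return new Node(value);
        if (value < n.value) n.left = add(n.left, value);
        else if (value > n.value) n.right = add(n.right, value);
        else return n;                           // duplicate: tree unchanged
        return rebalance(n);
    }

    public void remElt(int value) { root = remove(root, value); }

    // Ordinary BST removal, also rebalancing each node on the way back up.
    private static Node remove(Node n, int value) {
        if (n == null) return null;              // value not in the tree
        if (value < n.value) {
            n.left = remove(n.left, value);
        } else if (value > n.value) {
            n.right = remove(n.right, value);
        } else if (n.left == null) {
            return n.right;                      // zero or one child: splice n out
        } else if (n.right == null) {
            return n.left;
        } else {
            Node next = n.right;                 // two children: copy in the successor
            while (next.left != null) next = next.left;
            n.value = next.value;
            n.right = remove(n.right, next.value);
        }
        return rebalance(n);
    }

    public int height() { return height(root); }

    public Integer rootValue() { return root == null ? null : root.value; }
}
```

Adding 7, 5, 10, 8, 15, 9 in that order reproduces the double-rotation example above, leaving 8 at the root. Adding 1 through 7 in increasing order, which would produce a single branch of height 7 in a plain BST, gives a perfectly balanced tree of height 3 rooted at 4.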
Wikipedia has a good description of AVL trees. Refer to that for details on the rotation algorithm if you are interested.
2 Takeaways for Different Groups of Students
As we work through these data structures, there are a lot of details here: definitions of data structures, how the operations work conceptually, how the operations get implemented (particularly in Java), the performance traits of different data structures, and the mathematical arguments that guarantee those performance traits.
What you should take away from this depends on your reason for taking the course.
I expect everyone to know the definitions of the various data structures we cover and the performance guarantees of the operations we’ve discussed. You should be able to recognize when a tree meets one of these definitions and to draw a BST, AVL tree, or heap for a given set of values. If we describe an application and the operations it needs to perform often, you should be able to recommend a suitable data structure for that application.
I do not expect that you know how to implement the operations on the various data structures. These notes discuss the implementations because I know some of you are interested in this content. You will not be asked to rotate a tree, explain how to write any of the operations, etc. If you plan to major in CS, however, you will need to understand these data structures and their implementations as you go through the curriculum and your career.
Summarizing the key details that you should take away from this:
When we define a tree-based data structure, we layer a constraint on top of the core tree shape. This constraint governs how data is organized within the tree (such as smaller items going to the left). The technical term for such a constraint is an invariant. If a data structure has an invariant, every implementation is required to maintain that invariant.
Binary search trees and AVL trees are binary trees with different invariants. Each of these invariants yields a different run-time performance guarantee.
A data structure consists of a core data structure you already know (such as a binary tree) and an optional invariant. When implementing, follow the design recipe (template, etc.) for the core data structure. Build the invariant into the methods that you write over that core template.
Different programming languages explicitly capture different invariants. Types are a common kind of invariant that is captured in code. More complex invariants, such as those for binary search trees or AVL trees, need to be documented in Java classes (for example, in comments). We can write separate methods to check invariants at run time, but Java does not provide mechanisms to check such invariants at compile time as it does for types.
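As an example of such a separate method, here is a sketch of a run-time check for both invariants (BST ordering and AVL balance) in a single pass over the tree. The class `InvariantCheck` and its methods are hypothetical. Note that the compiler accepts any arrangement of `Node` objects, including ones that break the ordering; only a check like this one catches that.

```java
// Hypothetical run-time check of the BST and AVL invariants together.
class InvariantCheck {
    static final class Node {
        final int value;
        final Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        Node(int value) { this(value, null, null); }
    }

    // True if root satisfies both invariants (the empty tree trivially does).
    static boolean isAvl(Node root) {
        return check(root, Long.MIN_VALUE, Long.MAX_VALUE) >= 0;
    }

    // Returns the subtree's height if every value lies strictly between lo and hi
    // (BST ordering) and every node is balanced (AVL); otherwise returns -1.
    // Bounds are longs so that any int value fits strictly inside them.
    private static int check(Node n, long lo, long hi) {
        if (n == null) return 0;
        if (n.value <= lo || n.value >= hi) return -1;   // ordering violated
        int lh = check(n.left, lo, n.value);
        int rh = check(n.right, n.value, hi);
        if (lh < 0 || rh < 0) return -1;                 // violation further down
        if (Math.abs(lh - rh) > 1) return -1;            // balance violated here
        return 1 + Math.max(lh, rh);
    }
}
```

Passing bounds down the tree matters: a tree where 7 is the right child of 3 inside the left subtree of 6 looks fine locally at 3, but it violates the BST invariant at the root, and `isAvl` reports false for it.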