Being Precise about Performance

We started discussing binary search trees in hopes of getting a more efficient implementation of set operations than lists would provide. In particular, we considered the efficiency of checking whether an element is in the set. Let’s contrast the efficiency of hasElt on a list implementation to that on a BST implementation.

1 Performance of hasElt on Lists

Recall that the list implementation of hasElt eliminated one element from consideration on each iteration. Let T be a function that consumes the number of elements in a list and produces the worst-case time required to check whether a given element is in the list:

T(n) = T(n-1) + c

Read this formula as saying "computing hasElt on a list with n elements takes the time to compute it on n-1 elements, plus a constant". The T(n-1) comes from calling hasElt on the rest of the list. The constant c captures the time for the other operations that link hasElt on the rest to hasElt on the whole list (the conditional, the or, etc.). We use a constant here because none of these extra operations depend on characteristics of the particular datum at the front of the list.

Technically, recursive formulas such as this are called recurrences.
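
To see what this recurrence says about total running time, we can unroll it by substituting the formula into itself. The sketch below assumes T(0), the time to check an empty list, is some fixed constant; that base case is our assumption and is not stated above.

```latex
\begin{align*}
T(n) &= T(n-1) + c \\
     &= T(n-2) + 2c \\
     &= T(n-3) + 3c \\
     &\;\;\vdots \\
     &= T(0) + n \cdot c
\end{align*}
```

In other words, the worst-case time grows linearly with the number of elements in the list.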

2 Performance of hasElt on BSTs

Let’s write a similar formula for hasElt on a BST. At worst, how many elements still need to be considered after we rule out the root of the tree? A BST could have all of its remaining nodes in only one child. If this situation applies at every node, the BST is really just a list, so we get the same recurrence:

T(n) = T(n-1) + c

Put differently, in the worst case, a BST is no more efficient than a list. In practice, the BST could be better, since BSTs are not list-shaped as a general rule. However, we get no guarantee of good performance from a BST implementation of sets. If we want guaranteed performance improvements, we need a data structure with a stronger invariant.
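
To make the worst case concrete, here is a minimal Python sketch (not the course's own code) that counts how many nodes a search visits. The names Node, insert, build, and has_elt are hypothetical helpers invented for this illustration. The insert is a plain BST insert with no rebalancing, so the shape of the tree depends entirely on the order in which elements arrive.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Node:
    """One node of a plain (unbalanced) BST. Hypothetical illustration type."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], value: int) -> Node:
    """Standard BST insert with no rebalancing; duplicates are ignored."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root


def build(values: Iterable[int]) -> Optional[Node]:
    """Insert the values one at a time, in the order given."""
    root = None
    for v in values:
        root = insert(root, v)
    return root


def has_elt(root: Optional[Node], value: int) -> Tuple[bool, int]:
    """Return (found, number of nodes visited during the search)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if value == node.value:
            return True, visited
        node = node.left if value < node.value else node.right
    return False, visited


# Inserting in sorted order gives a list-shaped tree: every node has only a right child.
chain = build(range(1, 16))

# Inserting the same 15 values in a "middle first" order gives a bushy tree.
bushy = build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15])

print(has_elt(chain, 16))  # searches the list-shaped tree for an absent element
print(has_elt(bushy, 16))  # searches the bushy tree for the same absent element
```

Both trees hold the same 15 elements and both satisfy the BST invariant, yet the search in the list-shaped tree visits every node while the search in the bushy tree visits only one node per level. Nothing in the BST invariant rules out the first shape.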

When you get to Algorithms (CS2223), you will discuss recurrences and related issues in much more detail, including looking at average-case running times and space efficiency. Recurrences are just one high-level way of expressing the running time of an algorithm. We’ll return to them periodically during the course.

3 Performance of hasElt on AVL Trees

What is the recurrence for the time performance of hasElt on AVL trees? Since the tree is balanced, we are guaranteed to throw away roughly half of the remaining elements on each iteration. Therefore,

T(n) = T(n/2) + c

Any recurrence of this form yields a running time proportional to log(n), because n can be halved only about log(n) times (base 2) before nothing is left to search. This is a clear performance improvement over BSTs or lists, though at the cost of a more complicated algorithm.
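
The logarithm can be seen by unrolling the recurrence, just as one would for the list case. The sketch below assumes, for simplicity, that n is an exact power of two (n = 2^k) and that T(1), the time to check a single-node tree, is a constant; both are our assumptions for illustration.

```latex
\begin{align*}
T(n) &= T(n/2) + c \\
     &= T(n/4) + 2c \\
     &= T(n/8) + 3c \\
     &\;\;\vdots \\
     &= T(n/2^{k}) + k \cdot c \\
     &= T(1) + c \cdot \log_2 n \qquad \text{since } n = 2^{k} \text{ means } k = \log_2 n
\end{align*}
```

For a set of about a million elements, that is roughly 20 steps instead of roughly a million.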

4 Summary

This class does not expect you to be able to work out recurrences for operations (that’s part of CS2223). We’re merely giving you a sense of how we can say with technical accuracy that one data structure should perform better than another on a particular problem. What you should understand about performance is that different shapes of data and invariants on data lead to critical differences in performance.
