CS 2223 May 01 2023
Classical selection: Rachmaninoff: Piano Concerto No. 2 [Kissin] (1901)
Visual Selection: Peasant Woman Binding Sheaves, Vincent Van Gogh (1889)
Live Selection: 25 or 6 to 4, Chicago (1970)
Jazz Selection: My Favorite Things, John Coltrane (1961)
1 Quicksort
1.1 Class Climate
As of 10PM last night, there were 35 responses, which is a 24% response rate. Can we get this to 30% in short order? I will mention it each time we meet.
9AM and we are now at 36 (25%). Five more students will lift us over the 30% mark. You know you want to be the one to move the needle...
1.2 Main topics in sorting
Thou dost lie in’t,
to be in’t and say it is thine.
’Tis for the dead,
not for the quick;
therefore thou liest.
William Shakespeare
Hamlet
We have made some major progress in analyzing the sorting problem. From last lecture we were able to assert that Mergesort is an asymptotically optimal compare-based sorting algorithm.
We made this claim because we demonstrated that any compare-based sorting algorithm requires at least ~N log N comparisons (that is, Ω(N log N)) in the worst case to correctly sort an array of N elements, and Mergesort uses at most ~N log N comparisons.
The upper bound for Mergesort came from a recursive formula for the number of comparisons it requires. We came up with the following formula, which bounds the maximum number of comparisons Mergesort uses:
C(N) <= C(N/2) + C(N/2) + N
C(N) <= 2*C(N/2) + N
Be sure you understand the logic as presented in lecture, which is supported by the book; review the class capture if that would be helpful.
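One way to convince yourself that this recurrence solves to N lg N is to divide both sides by N: C(N)/N = C(N/2)/(N/2) + 1, and applying this lg N times down to C(1)/1 = 0 gives C(N) = N lg N when N is a power of 2. The sketch below simply evaluates the recurrence, assuming C(1) = 0 and splitting N into floor(N/2) and ceil(N/2); the class name MergeRecurrence is hypothetical.

```java
// Evaluates C(N) = C(floor(N/2)) + C(ceil(N/2)) + N with C(1) = 0 (the Mergesort bound)
class MergeRecurrence {
    static long compares(int n) {
        if (n <= 1) return 0;                           // zero or one element needs no compares
        int half = n / 2;                               // floor(N/2) elements in the left half
        return compares(half) + compares(n - half) + n; // sort both halves, then merge with at most N compares
    }

    public static void main(String[] args) {
        // For N a power of 2 the recurrence solves exactly to N lg N
        for (int n = 2; n <= 1024; n *= 2) {
            int lg = Integer.numberOfTrailingZeros(n);  // lg N, exact because n is a power of 2
            System.out.println("N = " + n + ": C(N) = " + compares(n) + ", N lg N = " + (long) n * lg);
        }
    }
}
```

Each printed line shows the two values agreeing, for example C(1024) = 10240 = 1024 * 10.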
1.3 Quicksort Implementation
At this point, you might ask why we should continue investigating sorting as a problem. Well, the first issue is that MergeSort requires O(N) extra storage with which to work. Is it possible to eliminate this extra overhead? It is, and the surprisingly brief implementation might stun you.
Find this implementation here: Quick.
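public static void sort(Comparable[] a) {
    shuffle(a);                         // randomize the order before sorting
    sort(a, 0, a.length - 1);
}

// quicksort the subarray from a[lo] to a[hi]
private static void sort(Comparable[] a, int lo, int hi) {
    if (hi <= lo) return;               // zero or one element: already sorted
    int loc = partition(a, lo, hi);     // a[loc] is now in its final position
    sort(a, lo, loc-1);                 // sort the values to its left
    sort(a, loc+1, hi);                 // sort the values to its right
}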
Once again, you will note there are no comparisons or exchanges here, so the real work must be done in partition. Given only the information above, let’s see if we can reverse engineer what partition is supposed to do.
It is instructive to compare this method with MergeSort:
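private static void mergesort(Comparable[] a, int lo, int hi) {
    if (hi <= lo) return;
    int mid = lo + (hi - lo) / 2;       // midpoint of a[lo..hi]
    mergesort(a, lo, mid);              // sort the left half
    mergesort(a, mid+1, hi);            // sort the right half
    merge(a, lo, mid, hi);              // merge both halves using auxiliary storage
}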
In MergeSort, both halves of the array are sorted, and then merged into place using auxiliary storage.
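To make the auxiliary storage concrete, here is a minimal sketch of a complete top-down Mergesort that follows the structure of the mergesort method above, with the auxiliary array passed down the recursion. The class name MergeSketch and the aux parameter are assumptions for illustration, not the course's Merge code.

```java
// Top-down Mergesort; the aux array is the O(N) extra storage
@SuppressWarnings({"rawtypes", "unchecked"})
class MergeSketch {
    private static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    static void sort(Comparable[] a) {
        Comparable[] aux = new Comparable[a.length];   // extra storage proportional to N
        mergesort(a, aux, 0, a.length - 1);
    }

    private static void mergesort(Comparable[] a, Comparable[] aux, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        mergesort(a, aux, lo, mid);          // sort the left half
        mergesort(a, aux, mid + 1, hi);      // sort the right half
        merge(a, aux, lo, mid, hi);          // merge the two sorted halves
    }

    // Merge sorted a[lo..mid] and a[mid+1..hi], using aux as scratch space
    private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi) {
        for (int k = lo; k <= hi; k++) aux[k] = a[k];
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid)                    a[k] = aux[j++];   // left half used up
            else if (j > hi)                a[k] = aux[i++];   // right half used up
            else if (less(aux[j], aux[i]))  a[k] = aux[j++];   // right value is smaller
            else                            a[k] = aux[i++];   // left value is smaller or equal
        }
    }
}
```

Every merge copies a[lo..hi] into aux before writing back, which is exactly the overhead Quicksort avoids.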
In Quicksort, an item in a[lo..hi] is found and put into its final spot at a location identified by loc. Let’s assume for a moment that this is even possible. Then, all you need to do is sort the left side a[lo..loc-1] and then the right side a[loc+1..hi].
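Now, if the value that partition places at loc lands "near the middle" of the array, then the left and right subproblems will each be roughly half the size of the original problem. Since partition examines each element of a[lo..hi] about once (roughly N compares), the number of compares obeys the same recurrence as MergeSort, C(N) <= 2*C(N/2) + N, and we will be able to demonstrate an asymptotically optimal compare-based sorting algorithm.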
1.4 It’s all in how you partition it...
The speed and elegance of Quicksort come from the partition method. Numerous methods have been proposed:
Collapsing Walls – Partition described by Sedgewick. Advances a left index and a right index toward each other, swapping pairs of values that are on the wrong side.
Single pivot – QuickAlternate, which uses a[hi] as the pivot, moves all values <= a[hi] to the front of the array, and then swaps a[hi] into place just after them.
Dutch National Flag Partitioning – Dijkstra. Helps when there are many duplicate values.
Dual-Pivot Quicksort – Vladimir Yaroslavskiy, Jon Bentley, and Josh Bloch. Offers O(N log N) performance on many data sets that cause other quicksort algorithms to degrade to quadratic, and is typically faster than traditional (one-pivot) Quicksort implementations.
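The single-pivot scheme in the list above can be sketched in the style commonly attributed to Lomuto. This is a minimal illustration assuming v = a[hi] as the pivot; it is not the course's QuickAlternate code, and the class name LomutoSketch is hypothetical.

```java
// Single-pivot partition in the Lomuto style: the pivot is v = a[hi]
@SuppressWarnings({"rawtypes", "unchecked"})
class LomutoSketch {
    private static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }

    static int partition(Comparable[] a, int lo, int hi) {
        Comparable v = a[hi];
        int store = lo;                    // invariant: a[lo..store-1] <= v
        for (int k = lo; k < hi; k++) {
            if (!less(v, a[k])) {          // a[k] <= v, so move it to the front section
                exch(a, store, k);
                store++;
            }
        }
        exch(a, store, hi);                // v lands just after all the values <= v
        return store;
    }

    static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int loc = partition(a, lo, hi);
        sort(a, lo, loc - 1);
        sort(a, loc + 1, hi);
    }
}
```

On the seven-member array used later in this lecture (egg, fly, ant, cat, dog, get, bat), the pivot is bat; only ant is <= bat, so partition returns 1. Note also that when every value is equal, each a[k] <= v, so this partition returns hi and each call shrinks the problem by only one.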
The book lists one (p. 290) that is clean and easy to visualize. It also has some nice features that take advantage of duplicate values that might exist in the array.
But there are so many variations!
First the intuition: pick one of the existing values in a[lo..hi] and try to find where it belongs. For convenience, start with the value in a[lo]. Now, imagine sweeping through the array with two index values. Index left starts at lo and advances towards hi, while a second index, right, starts at hi+1 and descends towards lo. Until these indices "cross each other," you try to find two elements, one on each side, that are out of place and can be swapped with each other.
Let’s try to identify some truths in the following annotated code. The handout has the clean code, so the following is a bit verbose but it has everything you need to know.
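The two-index partition described above can be sketched in Java as follows. It is modeled on the book's version (p. 291) and uses the names left, right, and v from these notes; the helpers less and exch and the class name PartitionSketch are assumptions so the block compiles on its own.

```java
// Two-index ("collapsing walls") partition, modeled on the book's version
@SuppressWarnings({"rawtypes", "unchecked"})
class PartitionSketch {
    private static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Places v = a[lo] in its final spot and returns that index
    static int partition(Comparable[] a, int lo, int hi) {
        Comparable v = a[lo];
        int left = lo;           // pre-incremented, so the first value examined is a[lo+1]
        int right = hi + 1;      // pre-decremented, so the first value examined is a[hi]
        while (true) {
            // advance left past values smaller than v; stop on a value >= v (or at hi)
            while (less(a[++left], v)) {
                if (left == hi) break;   // do not run off the right end
            }
            // retreat right past values larger than v; stop on a value <= v
            while (less(v, a[--right])) {
                if (right == lo) break;  // redundant (a[lo] is v itself) but harmless
            }
            if (left >= right) break;    // the indices have crossed: done scanning
            exch(a, left, right);        // a[left] >= v and a[right] <= v: swap them
        }
        exch(a, lo, right);              // move v into its final position
        return right;                    // now a[lo..right-1] <= a[right] <= a[right+1..hi]
    }
}
```

On the seven-member array below (egg, fly, ant, cat, dog, get, bat), this sketch swaps fly and bat, then stops with left at get (index 5) and right at dog (index 4). Since the indices have crossed, it exchanges egg with dog and returns 4, leaving dog bat ant cat egg get fly.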
Unlike BinaryArraySearch, note that the values of lo and hi do not change within the partition method.
These ten lines of code pack a number of complicated logical conditions into a small space. You truly need to work out a number of examples BY HAND to make sure you are able to follow the logic. Page 291 of the book has a long example and I encourage you to review it.
Let’s try our hand at partitioning the following seven-member array:
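index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
value | egg | fly | ant | cat | dog | get | bat |
marker | lo |  |  |  |  |  | hi |
Start as follows:
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
value | egg | fly | ant | cat | dog | get | bat |
marker | left |  |  |  |  |  |  | right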
1.5 And now we see why shuffle is necessary
Quicksort is dependent on the selection of the partitioning element. As you will see on page 295, if each partition only reduces the problem size by 1, then the total number of compares will be ~N^2/2, that is, O(N^2), which was exactly the case for Selection Sort.
So the trick is to ensure the value chosen to partition is close to the median value, that is, the value that has as many items smaller than or equal to it as items greater than or equal to it.
You might think that you need to sort a collection to find its median value, but it turns out there is an approach that determines it in time directly proportional to the number of elements in the array. However, it is not used in practice because of its complexity.
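To see how much the shuffle matters, one can count compares. The sketch below (hypothetical class CompareCount, with a static counter inside less) uses the book-style two-index partition and sorts an already-sorted array of N = 200 integers, once as-is and once after a Knuth shuffle; the seed 2223 is arbitrary and only makes runs repeatable.

```java
import java.util.Random;

// Counts compares made by Quicksort (two-index partition) with and without a shuffle
@SuppressWarnings({"rawtypes", "unchecked"})
class CompareCount {
    static long compares = 0;   // incremented by every call to less()

    private static boolean less(Comparable v, Comparable w) { compares++; return v.compareTo(w) < 0; }

    private static void exch(Comparable[] a, int i, int j) { Comparable t = a[i]; a[i] = a[j]; a[j] = t; }

    private static int partition(Comparable[] a, int lo, int hi) {
        Comparable v = a[lo];
        int left = lo, right = hi + 1;
        while (true) {
            while (less(a[++left], v)) if (left == hi) break;
            while (less(v, a[--right])) if (right == lo) break;
            if (left >= right) break;
            exch(a, left, right);
        }
        exch(a, lo, right);
        return right;
    }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int loc = partition(a, lo, hi);
        sort(a, lo, loc - 1);
        sort(a, loc + 1, hi);
    }

    // Knuth (Fisher-Yates) shuffle; the fixed seed only makes runs repeatable
    private static void shuffle(Comparable[] a, Random rnd) {
        for (int i = a.length - 1; i > 0; i--) exch(a, i, rnd.nextInt(i + 1));
    }

    static long count(int n, boolean shuffleFirst) {
        Integer[] a = new Integer[n];
        for (int i = 0; i < n; i++) a[i] = i;          // already in ascending order
        if (shuffleFirst) shuffle(a, new Random(2223));
        compares = 0;
        sort(a, 0, n - 1);
        return compares;
    }

    public static void main(String[] args) {
        System.out.println("N = 200, no shuffle:     " + count(200, false));
        System.out.println("N = 200, shuffled first: " + count(200, true));
    }
}
```

Without the shuffle, every partition of a subarray of size n places its smallest value at lo after n + 1 compares, so the total is N(N+1)/2 + N - 2 = 20298 for N = 200, the quadratic behavior described above. The shuffled run needs only a small fraction of that.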
1.6 Weakness in this partition
Review what happens when all elements are the same!? That is, given an array of SEVEN identical elements, how does partition operate? And what value of right is returned?
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
value | the | the | the | the | the | the | the |
marker | lo |  |  |  |  |  | hi |
Start by selecting a[lo] as the element v to place, set left to lo and right to be one greater than hi:
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
value | the | the | the | the | the | the | the |
marker | left |  |  |  |  |  |  | right
Remember that the inner while loops use pre-increment for left and pre-decrement for right. In both cases, the condition is false because all values are the same; nevertheless, left is incremented and right is decremented before each test. When we get to the exch within the while loop, the two values are exchanged, even though they are the same value. The resulting state is as follows:
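index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
value | the | the | the | the | the | the | the |
marker |  | left |  |  |  |  | right |
This repeats three more times: two more exchanges of equal values, and then a final pass in which left and right cross paths (left = 4, right = 3), terminating the loop. Naturally, every one of these exchanges is wasted. On the bright side, left and right meet in the middle, so partition returns right = 3 and the two subproblems are the same size.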
But ask yourself whether it is worth your time to add an additional comparison just to ensure that the elements being swapped are different. Since you are trying to reduce the number of comparisons, it turns out not to help.
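To check the walk-through above, the following sketch instruments the same two-index partition and runs it on seven identical values. The class AllEqualDemo and its compares and swaps counters are hypothetical instrumentation added only for this illustration.

```java
// Instrumented copy of the two-index partition, run on an array of identical values
@SuppressWarnings({"rawtypes", "unchecked"})
class AllEqualDemo {
    static int compares = 0;   // calls to less()
    static int swaps = 0;      // calls to exch()

    private static boolean less(Comparable v, Comparable w) { compares++; return v.compareTo(w) < 0; }

    private static void exch(Comparable[] a, int i, int j) { swaps++; Comparable t = a[i]; a[i] = a[j]; a[j] = t; }

    static int partition(Comparable[] a, int lo, int hi) {
        Comparable v = a[lo];
        int left = lo, right = hi + 1;
        while (true) {
            while (less(a[++left], v)) if (left == hi) break;    // stops at once: "the" is not less than "the"
            while (less(v, a[--right])) if (right == lo) break;  // likewise stops at once
            if (left >= right) break;
            exch(a, left, right);                                // swaps two equal values: wasted work
        }
        exch(a, lo, right);
        return right;
    }

    public static void main(String[] args) {
        String[] a = {"the", "the", "the", "the", "the", "the", "the"};
        int loc = partition(a, 0, a.length - 1);
        System.out.println("returned " + loc + " after " + compares + " compares and " + swaps + " exchanges");
    }
}
```

Running main prints `returned 3 after 8 compares and 4 exchanges`. Three of the exchanges happen inside the loop, each swapping two equal values, and the fourth is the final exch(a, lo, right).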
1.7 Lecture Key Topics
MergeSort does not improve with increased duplicate values
Quicksort suffers if partition only reduces the problem by size 1. For this reason, the values are first randomly shuffled. Do you think this is a waste of time? Consider the performance of shuffle: it is O(N), which is small compared to the O(N log N) compares of the sort itself.
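The shuffle costs only O(N) because it makes a single pass with one exchange per position. Here is a minimal sketch of a standard Knuth (Fisher-Yates) shuffle; the class ShuffleSketch and the use of java.util.Random are assumptions for illustration, not the course's shuffle.

```java
import java.util.Random;

// Knuth (Fisher-Yates) shuffle: one pass, one exchange per position, so O(N) time
class ShuffleSketch {
    static void shuffle(Object[] a, Random rnd) {
        for (int i = a.length - 1; i > 0; i--) {
            int r = rnd.nextInt(i + 1);               // uniform random index in [0, i]
            Object t = a[i]; a[i] = a[r]; a[r] = t;   // a[i] is now fixed; a[0..i-1] remain to shuffle
        }
    }

    public static void main(String[] args) {
        Integer[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        shuffle(a, new Random());
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

Assuming an ideal random source, each of the N! orderings is equally likely, and the N - 1 exchanges are negligible next to the work of the sort that follows.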
1.8 Version : 2023/05/01
(c) 2023, George T. Heineman