CS 2223 Apr 05 2021
Daily Exercise:
Classical selection: Rachmaninoff: Piano Concerto No. 2 [Kissin] (1901)
Visual Selection: Peasant Woman Binding Sheaves, Vincent Van Gogh (1889)
Live Selection: 25 or 6 to 4, Chicago (1970)
Daily Question: DAY08 (Problem Set DAY08)
1 Quicksort
1.1 Homework due at 10AM today
Update: At 9:30 PM Sunday night, 32 homeworks submitted.
Update: At 7:00 AM Monday morning, 87 homeworks submitted.
Update: At 9:00 AM Monday morning, 97 homeworks submitted.
Please complete before the 10AM deadline to be able to receive full credit.
Regarding the daily questions, be sure to answer them every day as a way to verify that you know the material. Remember, 5% of your total grade is based on answering these questions. It may very well be the difference between an A and a B for you.
1.2 Main topics in sorting
Thou dost lie in’t,
to be in’t and say it is thine.
’Tis for the dead,
not for the quick;
therefore thou liest.
William Shakespeare
Hamlet
We have made some major progress in analyzing the sorting problem. In the last lecture we were able to assert that Mergesort is an asymptotically optimal compare-based sorting algorithm.
We made this claim because we demonstrated that any comparison-based sorting algorithm requires ~ N log N comparisons to correctly sort an array of N elements.
The logic was based on identifying a recursive formula that computes the number of comparisons required. We came up with the following formula, which bounds the maximum number of comparisons that Mergesort would use.
C(N) <= C(N/2) + C(N/2) + N
C(N) <= 2*C(N/2) + N
Be sure you understand the logic as presented in lecture, which is supported by the book; review the class capture if that would be helpful.
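One way to see where N log N comes from is to unroll the recurrence. The sketch below assumes N is a power of two (N = 2^n) and that a one-element array needs no comparisons (C(1) = 0); dividing both sides by 2^n makes the sum telescope.

```latex
\begin{align*}
C(2^n) &\le 2\,C(2^{n-1}) + 2^n \\
\frac{C(2^n)}{2^n} &\le \frac{C(2^{n-1})}{2^{n-1}} + 1 \\
&\le \frac{C(2^{n-2})}{2^{n-2}} + 2 \\
&\;\;\vdots \\
&\le \frac{C(2^0)}{2^0} + n = n \\
C(N) &\le N \cdot n = N \log_2 N
\end{align*}
```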
1.3 Quicksort Implementation
At this point, you might ask why we should continue investigating sorting as a problem. Well, the first issue is that MergeSort still requires ~ N extra storage with which to work. Is it possible to eliminate this extra overhead? It is, and the surprisingly brief implementation might stun you.
Find this implementation here: Quick.
public static void sort(Comparable[] a) {
  shuffle(a);
  sort(a, 0, a.length - 1);
}

// quicksort the subarray from a[lo] to a[hi]
private static void sort(Comparable[] a, int lo, int hi) {
  if (hi <= lo) return;
  int loc = partition(a, lo, hi);
  sort(a, lo, loc-1);
  sort(a, loc+1, hi);
}
Once again, you will note there are no comparisons or exchanges, so the real work appears to be done in partition. Just given the information above, let’s see if we can reverse engineer what the method is supposed to do.
It is instructive to compare this method with MergeSort:
// mergesort the subarray from a[lo] to a[hi]
private static void sort(Comparable[] a, int lo, int hi) {
  if (hi <= lo) return;
  int mid = lo + (hi - lo)/2;
  sort(a, lo, mid);
  sort(a, mid+1, hi);
  merge(a, lo, mid, hi);
}
In MergeSort, both halves of the array are sorted, and then merged into place using auxiliary storage.
In Quicksort, an item in a[lo..hi] is found and put into its final spot at a location identified by loc. Let’s assume for a moment that this is even possible. Then, all you need to do is sort the left side a[lo..loc-1] and then the right side a[loc+1..hi].
Now, if partition is able to be effective at finding a value "near the middle" of the array, then the left and right sort subproblems will more or less be half the size of the original problem, and we will be able to demonstrate an asymptotically optimal compare-based sorting algorithm.
1.4 It’s all in how you partition it...
The speed and elegance of Quicksort come from the partition method. Numerous methods have been proposed (there are at least four that I know of). The book lists one (p. 290) that is clean and easy to visualize. It also has some nice features that take advantage of duplicate values that might exist in the array.
First the intuition: Pick one of the existing values in a[lo..hi] and try to find where it belongs. For convenience, start with the value in a[lo]. Now, imagine sweeping through the array with two index values. Index left starts at lo and advances towards hi, while another index right starts at hi+1 and descends towards lo. Until these indices "cross each other" you try to find two elements that can be swapped with each other.
Let’s try to identify some truths in the following annotated code. The handout has the clean code, so the following is a bit verbose but it has everything you need to know.
Unlike BinaryArraySearch, note that the values of lo and hi do not change within the partition method.
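For reference, here is a minimal sketch of a two-index partition in the style described above. It is reconstructed from the description in these notes rather than copied from the handout: the class name PartitionSketch and the helpers less and exch are assumptions, and the index names left and right follow the text.

```java
// Sketch only: reconstructed from the description in these notes.
// PartitionSketch, less and exch are assumed names, not the handout's code.
@SuppressWarnings({"rawtypes", "unchecked"})
class PartitionSketch {
    static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    static void exch(Object[] a, int i, int j) { Object t = a[i]; a[i] = a[j]; a[j] = t; }

    // Places a[lo] into its final position within a[lo..hi] and returns that index.
    // Assumes lo < hi; the recursive sort never calls it otherwise.
    static int partition(Comparable[] a, int lo, int hi) {
        int left = lo, right = hi + 1;   // lo and hi themselves never change
        Comparable v = a[lo];            // the value to place
        while (true) {
            // advance left until it reaches an item that is not smaller than v
            while (less(a[++left], v)) if (left == hi) break;
            // retreat right until it reaches an item that is not larger than v
            while (less(v, a[--right])) if (right == lo) break;
            if (left >= right) break;    // the indices have crossed
            exch(a, left, right);        // both items are on the wrong side: swap them
        }
        exch(a, lo, right);              // v lands in its final spot
        return right;
    }
}
```

Both inner loops stop on an item equal to v, which matters when the array contains duplicate values.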
These ten lines of code pack a number of complicated logical conditions into a small space. You truly need to work out a number of examples BY HAND to make sure you are able to follow the logic. Page 291 of the book has a long example and I encourage you to review that example.
Let’s try our hand at partitioning the following seven-member array:
egg  | fly | ant | cat | dog | get | bat |       |
lo   |     |     |     |     |     | hi  |       |
Start as follows:
egg  | fly | ant | cat | dog | get | bat |       |
left |     |     |     |     |     |     | right |
1.5 And now we see why shuffle is necessary
Quicksort is dependent on the selection of the partitioning element. As you will see on page 295, if the partitioning only reduces the problem size by 1 then the total number of compares will be ~ N^2/2, which was exactly the case for Selection Sort.
So the trick is to ensure the value chosen to partition is close to the median value, that is, the value in the list which has as many items smaller than or equal to it as it has items greater than or equal to it.
You might think that you need to sort a collection to find its median value, but it turns out there is an approach you can use to determine it in time directly proportional to the number of elements in the array. However, it is not used in practice because of its complexity.
The advanced mathematics alluded to in the book declares that "on average" the partition will split the array roughly in half, and in the long run, this will lead to a demonstrated number of comparisons that is about 40% above optimum.
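The effect of the shuffle can be checked empirically. The following is a rough sketch, not the course's code: it assumes a partition that always takes a[lo] as the partitioning element, and the names CompareCount and count, along with the fixed random seed, are made up for illustration. It counts calls to less when sorting an already sorted array with and without a shuffle first.

```java
import java.util.Random;

// Sketch only: CompareCount and count are hypothetical names for this experiment.
@SuppressWarnings({"rawtypes", "unchecked"})
class CompareCount {
    static long compares = 0;   // incremented on every call to less

    static boolean less(Comparable v, Comparable w) { compares++; return v.compareTo(w) < 0; }

    static void exch(Object[] a, int i, int j) { Object t = a[i]; a[i] = a[j]; a[j] = t; }

    // Two-index partition that always uses a[lo] as the partitioning element.
    static int partition(Comparable[] a, int lo, int hi) {
        int left = lo, right = hi + 1;
        Comparable v = a[lo];
        while (true) {
            while (less(a[++left], v)) if (left == hi) break;
            while (less(v, a[--right])) if (right == lo) break;
            if (left >= right) break;
            exch(a, left, right);
        }
        exch(a, lo, right);
        return right;
    }

    static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int loc = partition(a, lo, hi);
        sort(a, lo, loc - 1);
        sort(a, loc + 1, hi);
    }

    // Sorts the values 0..n-1, optionally shuffling first; returns the number of compares.
    static long count(int n, boolean shuffleFirst) {
        Integer[] a = new Integer[n];
        for (int i = 0; i < n; i++) a[i] = i;            // already sorted input
        if (shuffleFirst) {
            Random rnd = new Random(2223);                // arbitrary fixed seed
            for (int i = n - 1; i > 0; i--) exch(a, i, rnd.nextInt(i + 1));
        }
        compares = 0;
        sort(a, 0, n - 1);
        return compares;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("sorted input, no shuffle: " + count(n, false));
        System.out.println("shuffled first:           " + count(n, true));
        System.out.println("N^2/2:                    " + (long) n * n / 2);
    }
}
```

Without the shuffle, every partition of sorted input peels off a single element, so the count lands near N^2/2; with the shuffle it should be far smaller.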
1.6 Weakness in this partition
Review what happens when all elements are the same. That is, given an array of seven elements, how does partition operate, and what value of right is returned?
the  | the | the | the | the | the | the |       |
lo   |     |     |     |     |     | hi  |       |
Start by selecting a[lo] as the element v to place, set left to lo and right to be one greater than hi:
the  | the | the | the | the | the | the |       |
left |     |     |     |     |     |     | right |
Remember that the inner while loops use pre-increment for left and pre-decrement for right. In both cases, the condition is false because all values are the same; however, left has already been incremented and right has already been decremented. When we get to the exch within the while loop, the two values are exchanged, even though they are the same value. The resulting state is as follows:
the  | the  | the | the | the | the | the   |  |
     | left |     |     |     |     | right |  |
This will repeat three more times, with left and right finally crossing paths, thus terminating the loop. Naturally, during all this time, these exchanges are wasted.
But ask yourself whether it is worth your time to add an additional comparison just to ensure that the elements being swapped are different. Since you are trying to reduce the number of comparisons, it turns out not to help.
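The all-equal case can be replayed in code. This is a small sketch under the same assumptions as before (a two-index partition that takes a[lo] and stops on equal keys); the class name DuplicateKeys and the exchange counter are hypothetical additions for illustration.

```java
// Sketch only: DuplicateKeys and the exchanges counter are hypothetical.
@SuppressWarnings({"rawtypes", "unchecked"})
class DuplicateKeys {
    static int exchanges = 0;   // incremented on every call to exch

    static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    static void exch(Object[] a, int i, int j) {
        exchanges++;
        Object t = a[i]; a[i] = a[j]; a[j] = t;
    }

    static int partition(Comparable[] a, int lo, int hi) {
        int left = lo, right = hi + 1;
        Comparable v = a[lo];
        while (true) {
            while (less(a[++left], v)) if (left == hi) break;    // stops at once on an equal key
            while (less(v, a[--right])) if (right == lo) break;  // stops at once on an equal key
            if (left >= right) break;
            exch(a, left, right);                                // swaps two identical values
        }
        exch(a, lo, right);
        return right;
    }

    public static void main(String[] args) {
        String[] a = {"the", "the", "the", "the", "the", "the", "the"};
        int loc = partition(a, 0, a.length - 1);
        System.out.println("returned index: " + loc);        // 3, the middle of 0..6
        System.out.println("exchanges:      " + exchanges);  // 4: three in the loop plus the final one
    }
}
```

The exchanges are wasted, but because both indices stop on equal keys, right ends up in the middle and the two subproblems are about the same size.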
1.7 Daily Exercise
By now you have heard me state on a number of occasions that you need at least N-1 comparisons to locate the largest element within a collection of N elements.
Consider the following recursive solution for determining the maximum element in an array.
static Comparable largest (Comparable[] a, int lo, int hi) {
  if (hi <= lo) { return a[lo]; }

  int mid = lo + (hi - lo)/2;
  Comparable maxLeft = largest (a, lo, mid);
  Comparable maxRight = largest (a, mid+1, hi);
  if (less(maxLeft, maxRight)) {
    return maxRight;
  } else {
    return maxLeft;
  }
}
Using the same approach used to count comparisons in MergeSort, write an equation to compute C(N), the number of comparisons, and solve the equation assuming N = 2^n, that is, a power of two.
Note: this could be an exam question, but I need to prepare students more for this to be a reality.
1.8 Review HeapBuster
When working with recursive algorithms, you need to be really careful that the divide-and-conquer technique is really dividing the problem into problems that are "more-or-less" half of their original size.
In the degenerate case, the problem is only divided into one problem of size 1, and another problem of size N-1. In this case, the depth of the recursion will tend towards N, rather than log N.
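The contrast can be sketched by measuring how deep the recursion goes under each kind of split. The class RecursionDepth and its depth method are hypothetical names used only for this illustration; no actual work is done on an array, only the index ranges are divided.

```java
// Sketch only: RecursionDepth is a hypothetical helper, not course code.
class RecursionDepth {
    // Returns the deepest level of recursion reached on the range lo..hi.
    // balanced = true splits in half; false splits into size 1 and size N-1.
    static int depth(int lo, int hi, boolean balanced) {
        if (hi <= lo) return 1;
        int mid = balanced ? lo + (hi - lo) / 2 : lo;
        return 1 + Math.max(depth(lo, mid, balanced), depth(mid + 1, hi, balanced));
    }

    public static void main(String[] args) {
        int n = 1024;
        System.out.println("balanced split: " + depth(0, n - 1, true));   // 11, about log N
        System.out.println("1 vs N-1 split: " + depth(0, n - 1, false));  // 1024, that is N
    }
}
```

With larger N the lopsided version eventually exhausts the call stack, while the balanced version barely grows.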
Check out the problem that results in HeapBuster.
1.9 Homework 2 Instructions
Homework2 is now available. I will put together a homework video tonight explaining the assigned work and how you should approach it.
1.10 Tilde Explanation
This notation was introduced on page 179 of the book. It is meant to approximate a more complicated function whose lower-order terms can be ignored as N grows.
As another example, assume f(n) = n^3/2 - 950n. In the long run, the lower-order term 950n doesn’t matter with regard to the overall growth of this function, and it can be approximated with the Tilde approximation:
~ n^3/2
This concept is bound to the notion of Order of growth. You can start with the Tilde approximation and eliminate constants, because you want to know how the performance will change, for example, as the problem size doubles.
In the above case, the Order of Growth is n^3.
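A quick numeric check makes both ideas concrete. The sketch below (the class name TildeSketch is made up) evaluates the example function f, compares it against its Tilde approximation, and shows what happens when the problem size doubles.

```java
// Sketch only: TildeSketch is a hypothetical name; f is the example function above.
class TildeSketch {
    static double f(double n) { return n * n * n / 2 - 950 * n; }

    public static void main(String[] args) {
        for (double n = 1_000; n <= 100_000; n *= 10) {
            double tildeRatio = f(n) / (n * n * n / 2);   // approaches 1 as n grows
            double doubling = f(2 * n) / f(n);            // approaches 2^3 = 8 as n grows
            System.out.printf("n=%.0f  f(n)/(n^3/2)=%.6f  f(2n)/f(n)=%.6f%n", n, tildeRatio, doubling);
        }
    }
}
```

The first ratio tending to 1 is what the Tilde approximation claims; the second tending to 8 is what an order of growth of n^3 predicts for a doubled problem size.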
Starting this week, we will move away from Tilde explanations, preferring to use the Big O notation I’ve begun to introduce. In this case, a program whose behavior conforms to this order of growth would be classified as O(N^3).
1.11 Lecture Key Topics
MergeSort does not improve with increased duplicate values
Quicksort suffers if partition only reduces the problem size by 1. For this reason, the values are randomly shuffled before sorting begins. Do you think this is a waste of time? Evaluate its performance as ~ N exchanges, which can be easily absorbed by our expected ~ N log N performance.
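To see why the shuffle costs only about N exchanges, here is a minimal sketch of one common way to shuffle an array uniformly (the Fisher-Yates, or Knuth, shuffle). It is an assumption that the shuffle used in class works this way; the class name ShuffleSketch and the explicit Random parameter are illustrative choices.

```java
import java.util.Random;

// Sketch only: one common uniform shuffle, not necessarily the course's shuffle method.
class ShuffleSketch {
    // Performs exactly a.length - 1 exchanges (none for an empty or one-element array).
    static void shuffle(Object[] a, Random rnd) {
        for (int i = a.length - 1; i > 0; i--) {
            int r = rnd.nextInt(i + 1);              // uniform index in 0..i
            Object t = a[i]; a[i] = a[r]; a[r] = t;  // one exchange per position
        }
    }

    public static void main(String[] args) {
        String[] a = {"egg", "fly", "ant", "cat", "dog", "get", "bat"};
        shuffle(a, new Random());
        System.out.println(String.join(" ", a));
    }
}
```

Each position is visited once and exchanged once, so the work grows linearly with N.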
1.12 Daily Question
The assigned daily question is DAY08 (Problem Set DAY08)
If you have any trouble accessing this question, please let me know immediately on Discord.
1.13 Version : 2021/04/06
(c) 2021, George T. Heineman