CS 2223 May 11 2020

Lecture Path: 27

Expected reading:
Daily Exercise:
Classical selection: Beethoven: Symphony No. 9 (1824)

Musical Selection: Questions, Moody Blues (1970)

Visual Selection: The Scream, Edvard Munch (1893)

Daily Question: DAY27 (Problem Set In Canvas)

1 Final Preparations

Please take a few minutes before class begins to log into Canvas and complete the course evaluation. I use this information to improve my next course offering. You have until 10PM tomorrow night, but please try to complete the survey today so you don’t forget.

HW4 is due today at 6PM.

1.1 Data Structures

"The time has come,"
the Walrus said, "To talk of many things:
Of shoes–and ships–
and sealing-wax–
Of cabbages–and kings–
And why the sea is boiling hot–
And whether pigs have wings."
Lewis Carroll

You are assumed to know the following basic structures:

Arrays
Linked Lists

You know when you should use these structures, and the implications of accessing aggregate data stored in them.

You know about access performance in unordered arrays and linked lists.

Sample Question: You can locate the maximum value in an array of N elements in N-1 comparisons. Now assume you want to find the smallest and the largest value in the same array. What is the lower bound on the number of comparisons you need to make (i.e., the best case)? What is the upper bound on the number of comparisons you need to make (i.e., the worst case)? Can you provide sample instance problems with four elements to cover both these cases?

With linked lists, we saw how they were useful for storing loosely structured collections of values. They are used to implement Bag types, when there is no need to search through, but only retrieve all values in the Bag.
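As a reminder of how little machinery a Bag needs, here is a minimal sketch of a linked-list Bag. The class name LinkedBag and its method names are illustrative assumptions, not the course's reference implementation. It supports only adding a value and iterating over all values, with no search operation.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: a Bag backed by a singly linked list.
class LinkedBag<T> implements Iterable<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private Node<T> first;  // most recently added value
    private int size;

    // Prepend in constant time; a Bag makes no promise about ordering.
    void add(T value) {
        first = new Node<>(value, first);
        size++;
    }

    int size() { return size; }

    // Visit every value exactly once by walking the chain of nodes.
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> current = first;

            @Override
            public boolean hasNext() { return current != null; }

            @Override
            public T next() {
                if (current == null) throw new NoSuchElementException();
                T value = current.value;
                current = current.next;
                return value;
            }
        };
    }
}
```

Because the class implements Iterable, a client can retrieve all values with a for-each loop such as `for (int v : bag) { ... }`.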

You have seen how a linked list can efficiently implement a queue by maintaining two separate pointers, first and last.
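The two-pointer idea can be sketched as follows. This is a minimal illustration under assumed names (LinkedQueue, enqueue, dequeue), not necessarily the code shown in lecture. The point is that both operations touch only one end of the list, so neither needs to traverse it.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch: a queue backed by a singly linked list.
class LinkedQueue<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> first;  // oldest element, removed by dequeue
    private Node<T> last;   // newest element, appended by enqueue
    private int size;

    boolean isEmpty() { return first == null; }

    int size() { return size; }

    // Append at the tail in constant time by following the last pointer.
    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        if (isEmpty()) {
            first = node;
        } else {
            last.next = node;
        }
        last = node;
        size++;
    }

    // Remove from the head in constant time by advancing the first pointer.
    T dequeue() {
        if (isEmpty()) throw new NoSuchElementException("queue is empty");
        T value = first.value;
        first = first.next;
        if (first == null) last = null;  // queue became empty
        size--;
        return value;
    }
}
```

Without the last pointer, enqueue would have to walk the entire list to find the tail, which would make it O(N) instead of O(1).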

Sample Question: Explain how to use a linked list to implement the stack data type.

1.2 Types

You should be well-versed in the basic data types used in this course. This includes:

Bag
Sample Question: It is possible to add a min() operation to the Bag type with performance of O(1) in worst case. How would you do it?
Stack
Sample Question: In Java it is not possible to resize an array of size N; rather you must create a new one (with a size > N) and manually copy the N elements into the new array. With this in mind, how is it possible that an array-based implementation of Stack can still guarantee O(1) constant performance on the push operation?
Queue
Sample Question: If you have a fixed-size queue that never fills up, how do you avoid moving elements around (wastefully) whenever values are enqueued and dequeued?
Priority Queue with Heap (either MIN- or MAX- variety)
Sample Question: Why does HeapSort use a MAX-PQ structure instead of a MIN-PQ structure?
Binary Search Tree
Sample Question: Under what circumstances will Binary Search Trees degenerate and (over time) deliver worse-and-worse performance?
Balanced Binary Search Tree – AVL variety
Sample Question: You are given an AVL tree that contains 15 unique values. Compute the maximum height of the tree, that is the greatest distance from any leaf node to the root, that still guarantees the AVL property.
Separate Chaining Hash Symbol Table
Sample Question: How can a symbol table provide average-case expected performance of Θ(1) for search, insert and delete? What is the classification of the worst case for these same operations?
Indexed Priority Queue with Heap Supporting DecreaseKey
Sample Question: How is Indexed Priority Queue able to efficiently locate an element in the queue so it can decrease its priority? Wouldn’t it have to search all N elements to find the one it is looking for?
Undirected Graph
Sample Question: Given an undirected graph G, find all pairs of mirror vertices, that is, two vertices u and v such that the vertices adjacent to u are exactly the same as the vertices adjacent to v. What is the worst case performance of your algorithm?
Directed Graph
Sample Question: Does the mirror vertex question change if the graph is a directed graph? If so, why? If not, why not?
Directed Weighted Graph

These types all have common operations as well as specific ones. For example, priority queue and binary search tree both support a deleteMin operation.

Sample Question: How does a Min Priority Queue support a decreaseKey operation?
And why is it hard to envision adding an increaseKey operation?

Sample Question: You are given a connected undirected graph with an even number of vertices, V, and an even number of edges, E. This graph can be split into two graphs G1 and G2, each of which contains half of the vertices and half of the edges from the original graph. True or false? If false, provide a counterexample. If true, explain your reasoning.

1.3 Performance classifications

We finally introduced the Big O notation as a means to classify the order of growth of a function, which typically represents the run-time performance of an algorithm or the exact number of times an operation executes. This provided the finishing touches on the performance analysis that we conducted throughout the term.

You should be able to reflect on the performance families we have seen:

O(1) constant
O(log N) logarithmic
O(N) linear
O(N log N) linearithmic – asymptotically grows more slowly than any N^k where k > 1, even k = 1.001
O(N^1.58) = O(N^log(3)) = Karatsuba Multiplication (1962) – multiplies two n-digit numbers faster than the expected n^2 algorithm.
O(N^2) quadratic
O(N^2.807) = O(N^log(7)) = Strassen Matrix Multiplication (1969) which multiplies two 2x2 matrices using 7 multiplications (not 8)
Figure 1: Strassen Reduces # of multiplications
O(N^3) cubic
O(N^k) polynomial
O(k^N) exponential
O(N!) factorial
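The linearithmic entry above compares N log N against N^k for k slightly larger than 1. That comparison is an asymptotic one, and the crossover can occur at astronomically large N. The sketch below (the class and method names are illustrative assumptions) works with x = log N (base 2) rather than N itself, so that it does not overflow. It relies on the fact that N log N < N^k exactly when log N < N^(k-1), that is, when x < 2^((k-1)*x).

```java
// Hypothetical sketch: where does N*log(N) drop below N^k?
class GrowthCrossover {
    // For N = 2^x, reports whether N*log2(N) < N^k.
    // Dividing both sides by N gives log2(N) < N^(k-1), i.e. x < 2^((k-1)*x).
    static boolean linearithmicIsSmaller(double x, double k) {
        return x < Math.pow(2.0, (k - 1.0) * x);
    }

    public static void main(String[] args) {
        double k = 1.001;
        for (double x : new double[] {10, 100, 10000, 20000}) {
            System.out.println("N = 2^" + (long) x + ": N log N < N^" + k
                    + " is " + linearithmicIsSmaller(x, k));
        }
    }
}
```

For k = 1.001 the comparison is still false at N = 2^10000 and only becomes true for larger N such as N = 2^20000. This is why the classification describes long-run growth and not the behavior on small inputs.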

Sample Question: You are given a recurrence equation T(N) that is used to estimate the running time performance of an algorithm. You are told that T(N) = 2*T(N/3) + N/3. What is the overall classification of T(N) using the above families?

1.4 Mathematical Analysis

We have seen situations where we were concerned about counting the exact number of times an operation executed. Sometimes without knowing the exact input, it is only possible to determine the fewest number of times an operation executed (called the lower bound) or the maximum number of times an operation executed (called the upper bound).

1.4.1 Best-Case and Worst-Case

For BinaryArraySearch for example, given N integers in sorted ascending order in an array, we know that you can find (in the worst case) whether it contains a target integer in no worse than floor(log N) + 1 array inspections.

Note: you could try to claim that you need N array inspections by using a simple for loop, but this would not be "the best of the worst case" algorithms.

Thus to use "Big-Oh" notation O(g(n)) to classify the worst-case performance of Binary Array Search on a problem of size N, we would state that the worst case behavior is O(log N).

For this same problem, the best case is you would find the integer after just a single array inspection. In this case using Ω(g(n)) notation to classify the best-case performance of Binary Array Search on a problem of size N, we would state that best case behavior is Ω(1).
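Both bounds can be checked empirically. The following is a minimal sketch, assuming a hypothetical class named CountingBinarySearch that counts one array inspection per loop iteration and treats log as base 2. It is meant as an illustration and is not the BinaryArraySearch code from the course.

```java
// Hypothetical sketch: binary search that counts array inspections.
class CountingBinarySearch {
    int inspections;  // elements examined by the most recent search

    boolean contains(int[] sorted, int target) {
        inspections = 0;
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
            int value = sorted[mid];       // the single inspection for this iteration
            inspections++;
            if (target < value) {
                hi = mid - 1;
            } else if (target > value) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }
}
```

With N = 16 sorted values, searching for the value at the first midpoint takes one inspection (the best case), while searching for a value larger than every element takes floor(log 16) + 1 = 5 inspections (the worst case).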

1.4.2 MergeSort example, which has subtle differences in best/worst cases

MergeSort is analyzed on pages 272-273 of the book, and in my lecture on Apr 03 2020.

When faced with the final "merge" step in MergeSort one can see that in the best case, one of the sub-arrays contains values that are all smaller than the other sub-array, which means that the merge can complete with N/2 comparisons in the best case. In the worst case, one would need N-1 comparisons (if the values alternated with each other). This analysis was trying to count C(N) or the number of compare invocations needed to sort an array of length N.
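The two extremes of the merge step can be observed directly. This is a minimal sketch with an assumed helper name (mergeComparisons) that merges two sorted arrays into an output array and reports how many comparisons it made.

```java
// Hypothetical sketch: count the comparisons made by a single merge step.
class MergeCount {
    // Merges sorted arrays left and right into out and returns the comparison count.
    // out must have length left.length + right.length.
    static int mergeComparisons(int[] left, int[] right, int[] out) {
        int i = 0, j = 0, k = 0;
        int comparisons = 0;
        while (i < left.length && j < right.length) {
            comparisons++;
            if (right[j] < left[i]) {
                out[k++] = right[j++];
            } else {
                out[k++] = left[i++];
            }
        }
        // One side is exhausted; the rest is copied without any comparisons.
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return comparisons;
    }
}
```

For N = 8, merging {1, 2, 3, 4} with {5, 6, 7, 8} needs N/2 = 4 comparisons, while merging the alternating halves {1, 3, 5, 7} and {2, 4, 6, 8} needs N-1 = 7.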

In the best case (again assuming that N is a power of 2), we could write:

C(N) >= 2*C(N/2) + N/2

Using telescoping you get:

C(N) >= 2*C(N/2) + N/2
C(N) >= 2*[2*C(N/4) + N/4] + N/2
C(N) >= 2*[2*[2*C(N/8) + N/8] + N/4] + N/2

C(N) >= 2^3*C(N/2^3) + 3*N/2

in the general case:

C(N) >= 2^k*C(N/2^k) + k*N/2

and this can continue until k = log N. With a base case of C(1) = 0, this results in:

C(N) >= N*C(1) + log(N) * N/2
C(N) >= 0 + (1/2) * log(N) * N

thus C(N) is Ω(N * log(N)) since this is the best case and we can’t do better than it.

Similarly, for the worst case, you have:

C(N) <= 2*C(N/2) + N - 1
C(N) <= 2*[2*C(N/4) + N/2 - 1] + N - 1
C(N) <= 2*[2*[2*C(N/8) + N/4 - 1] + N/2 - 1] + N - 1

C(N) <= 2^3*C(N/2^3) + 3*N - (4 + 2 + 1)

in the general case:

C(N) <= 2^k*C(N/2^k) + k*N - (2^k - 1)

and this can continue until k = log N. With a base case of C(1) = 0, this results in:

C(N) <= N*C(1) + log (N)*N - (N-1)
C(N) <= 0 + N*log(N) - (N-1)

Since (N-1) is much smaller than (N*log (N)) we can omit it from the classification scheme, and thus C(N) is O(N * log(N)) since this is the worst case, for which we will never do worse.

In summary, since MergeSort is classified as both Ω(N*log (N)) and O(N* log (N)) we can state categorically that it is Θ(N*log (N)).
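The two bounds can be sanity-checked by instrumenting a top-down MergeSort to count comparisons. The sketch below uses assumed names (MergeSortCount, comparisons) and is an illustration, not the book's implementation. For N a power of 2, the count should fall between (1/2)*N*log(N) and N*log(N) - (N-1).

```java
// Hypothetical sketch: top-down MergeSort that counts element comparisons.
class MergeSortCount {
    static long comparisons;  // comparisons made by the most recent sort

    static void sort(int[] a) {
        comparisons = 0;
        sort(a, new int[a.length], 0, a.length - 1);
    }

    private static void sort(int[] a, int[] aux, int lo, int hi) {
        if (lo >= hi) return;  // C(1) = 0: a single element needs no comparison
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);
    }

    private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);
        int i = lo;
        int j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid) {
                a[k] = aux[j++];        // left half exhausted, no comparison
            } else if (j > hi) {
                a[k] = aux[i++];        // right half exhausted, no comparison
            } else {
                comparisons++;
                if (aux[j] < aux[i]) {
                    a[k] = aux[j++];
                } else {
                    a[k] = aux[i++];
                }
            }
        }
    }
}
```

For N = 16, an already sorted array hits the lower bound of (1/2)*16*4 = 32 comparisons, and no input can exceed the upper bound of 16*4 - 15 = 49.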

1.5 Algorithm Families

We discussed a number of thematically related algorithms:

Sorting arrays of unordered elements. This includes InsertionSort, SelectionSort, MergeSort, QuickSort and finally CountingSort
Searching for values in collections
Exploring Graph structures to validate properties
Computing shortest path over weighted graphs

Sample Question: Given a directed graph, G, compute an undirected graph H in which an edge (u,v) exists in H if either the directed edge u –> v or the directed edge v –> u exists in G. What is the running time/performance of your algorithm?

1.6 Daily Question

The assigned daily question is DAY27 (Find in Canvas not Assistments)

If you have any trouble accessing this question, please let me know immediately on Piazza.