CS 2223 May 11 2021
Musical Selection: Question, Moody Blues (1970)
Visual Selection: The Scream, Edvard Munch (1893)
1 Final Preparations
Please take a few minutes before class begins to log into Canvas to complete the course evaluation. I use this information to improve my next course offering. You will have until 10PM May 13th, so don’t forget and try to take care of this survey today.
HW5 is due today at 6PM.
I have updated the git repository to have my solutions for HW1 to HW4. I will upload my HW5 solutions after 6PM tonight.
1.1 Data Structures
"The time has come,"
the Walrus said, "To talk of many things:
Of shoes–and ships–
and sealing-wax–
Of cabbages–and kings–
And why the sea is boiling hot–
And whether pigs have wings."
Lewis Carroll
Arrays
Linked Lists
You know when you should use these structures, and the implication of accessing aggregate data when stored in these structures.
You know about access performance in unordered arrays and linked lists.
Sample Question: You can locate the maximum value in an array of N elements in N-1 comparisons. Now assume you want to find the smallest and the largest value in the same array. What is the lower bound on the number of comparisons you need to make (i.e., the best case)? What is the upper bound on the number of comparisons you need to make (i.e., the worst case)? Can you provide sample instance problems with four elements to cover both these cases?
With linked lists, we saw how they were useful for storing loosely structured collections of values. They are used to implement Bag types, when there is no need to search through the values, only to retrieve all of them.
You have seen how linked lists can effectively implement a queue by maintaining two separate pointers, first and last.
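The two-pointer idea can be sketched as follows. This is a minimal illustration rather than the course's own implementation: the class name LinkedQueue and its method names are assumptions made for this example. Values are appended at last and removed at first, so neither operation has to traverse the list.

```java
import java.util.NoSuchElementException;

// Hypothetical class name: a sketch of a queue backed by a singly linked list.
class LinkedQueue<T> {
    private static class Node<T> {
        T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> first;  // oldest node: dequeue removes from here
    private Node<T> last;   // newest node: enqueue appends here
    private int size;

    boolean isEmpty() { return first == null; }
    int size() { return size; }

    // O(1): link the new node after last, no traversal needed.
    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        if (isEmpty()) {
            first = node;
        } else {
            last.next = node;
        }
        last = node;
        size++;
    }

    // O(1): unlink the node at first.
    T dequeue() {
        if (isEmpty()) { throw new NoSuchElementException("queue is empty"); }
        T value = first.value;
        first = first.next;
        if (first == null) { last = null; }  // the queue just became empty
        size--;
        return value;
    }
}
```

Enqueuing 1, 2, 3 and then dequeuing returns the values in the same order, which is the first-in, first-out behavior expected of a queue.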
Sample Question: Explain how to use a linked list to implement the stack data type.
1.2 Types
You should be well-versed in the basic data types used in this course. This includes:
Bag
Sample Question: It is possible to add a min() operation to the Bag type with performance of O(1) in the worst case. How would you do it?
Stack
Sample Question: In Java it is not possible to resize an array of size N; rather you must create a new one (with a size > N) and manually copy the N elements into the new array. With this in mind, how is it possible that an array-based implementation of Stack can still guarantee O(1) constant performance on the push operation?
Queue
Sample Question: If you have a fixed-size queue that never fills up, how do you avoid moving elements around (wastefully) whenever values are enqueued and dequeued?
Priority Queue with Heap (either MIN- or MAX- variety)
Sample Question: Why does HeapSort use a MAX-PQ structure instead of a MIN-PQ structure?
Binary Search Tree
Sample Question: Under what circumstances will Binary Search Trees degenerate and (over time) deliver worse-and-worse performance?
Balanced Binary Search Tree – AVL variety
Sample Question: You are given an AVL tree that contains 15 unique values. Compute the maximum height of the tree, that is, the greatest distance from any leaf node to the root, that still guarantees the AVL property.
Separate Chaining Hash Symbol Table
Sample Question: How can a symbol table provide average-case expected performance of Θ(1) for search, insert and delete? What is the classification of the worst case for these same operations?
Indexed Priority Queue with Heap Supporting DecreaseKey
Sample Question: How is Indexed Priority Queue able to efficiently locate an element in the queue so it can decrease its priority? Wouldn’t it have to search all N elements to find the one it is looking for?
Undirected Graph
Sample Question: Given an undirected graph G, find all pairs of mirror vertices, that is, two vertices u and v such that the vertices adjacent to u are exactly the same as the vertices adjacent to v. What is the worst-case performance of your algorithm?
Directed Graph
Sample Question: Does the mirror vertex question change if the graph is a directed graph? If so, why? If not, why not?
Directed Weighted Graph
These types all have common operations as well as specific ones. For example, priority queue and binary search tree both support a deleteMin operation.
Sample Question: How does a Min Priority Queue support a decreaseKey operation? And why is it hard to envision adding an increaseKey operation?
Sample Question: You are given a connected undirected graph with an even number of vertices, V, and an even number of edges, E. This graph can be split into two graphs G1 and G2, each of which contains half of the vertices and half of the edges from the original graph. True or false? If false, provide a counterexample. If true, explain your reasoning.
1.3 Since Midterm
Binary Search Trees are a central data structure in this course and beyond. Numerous data structures are based on their principles:
Recursive – Numerous capabilities are based on the ability to define an operation in terms of recursive sub-calls. For HW4, you saw several examples of recursive functions in BST. Some were structural (just inspecting left and right references, like height). Some were read-only (like iterator) while others mutated the structure (like put/insert).
Balanced – AVL trees provide a strategy for self-balancing when the tree "skews" left or right. You need to understand the AVL tree property that is being maintained (height difference of -1, 0, +1) and the underlying mechanics of how balance is restored.
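The AVL property described above can be checked with a short recursive sketch. The class and method names here (AvlCheck, height, balance, isAvl) are assumptions made for this illustration. The sketch recomputes heights on every call to stay short, whereas an AVL implementation would normally store the height in each node, and it checks only the balance condition, not the ordering of keys.

```java
// Hypothetical names: a sketch that checks the AVL balance condition on a binary tree.
class AvlCheck {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Convention assumed here: an empty subtree has height -1, so a leaf has height 0.
    static int height(Node n) {
        if (n == null) { return -1; }
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // Height difference at a node: left height minus right height.
    static int balance(Node n) {
        return height(n.left) - height(n.right);
    }

    // True when every node has a height difference of -1, 0, or +1.
    static boolean isAvl(Node n) {
        if (n == null) { return true; }
        int b = balance(n);
        return b >= -1 && b <= 1 && isAvl(n.left) && isAvl(n.right);
    }
}
```

A tree built by inserting 1, 2, 3 in order without rebalancing leans right: its root has a height difference of -2, which is the kind of violation a rotation repairs.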
Graphs close out the remainder of the class material:
Undirected – undirected graphs are the fundamental starting point, and you need to know how DFS and BFS work. DFS offers a recursive strategy for exploring a graph while BFS uses a Queue to ensure minimal paths to each vertex from a designated source vertex in terms of the number of edges traversed.
Directed – directed graphs lead to new questions, such as checking for cycles and deriving a topological sort, or a linear ordering of vertices where each ordering constraint imposed by an edge is honored.
Weighted – adding weights introduces the notion of defining "smallest path" in terms of accumulated edge weights. Dijkstra’s single-source, shortest path algorithm uses a novel data structure, the indexed min priority queue, to complete its tasks. Bellman-Ford can solve single-source shortest path if edge weights can be negative (though no negative cycle can exist). Finally, Floyd-Warshall demonstrates how to compute the all-pairs, shortest path by exploring different shortest paths that increasingly use more and more vertices in the graph.
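As a reminder of how the all-pairs computation grows its answer one intermediate vertex at a time, here is a minimal sketch of Floyd-Warshall over an adjacency matrix. The class name AllPairs and the matrix representation are assumptions made for this example (the course code may represent graphs differently), and the input is assumed to contain no negative cycle.

```java
// Hypothetical names: a sketch of Floyd-Warshall on an adjacency matrix.
class AllPairs {
    static final double INF = Double.POSITIVE_INFINITY;

    // weight[u][v] is the weight of edge u -> v, INF when there is no such edge,
    // and 0 on the diagonal. Assumes the graph has no negative cycle.
    static double[][] floydWarshall(double[][] weight) {
        int n = weight.length;
        double[][] dist = new double[n][];
        for (int u = 0; u < n; u++) {
            dist[u] = weight[u].clone();  // leave the caller's matrix untouched
        }

        // After round k, dist[u][v] is the shortest path from u to v whose
        // intermediate vertices all come from {0, ..., k}.
        for (int k = 0; k < n; k++) {
            for (int u = 0; u < n; u++) {
                for (int v = 0; v < n; v++) {
                    if (dist[u][k] + dist[k][v] < dist[u][v]) {
                        dist[u][v] = dist[u][k] + dist[k][v];
                    }
                }
            }
        }
        return dist;
    }
}
```

The three nested loops make the O(N^3) running time visible directly, where N is the number of vertices.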
1.4 Performance classifications
We finally introduced the Big O notation as a means to classify the order of growth of a function, which typically represents the run-time performance of an algorithm or the exact number of times an operation executes. This provided the finishing touches on the performance analysis that we conducted throughout the term.
You should be able to reflect on the performance families we have seen:
O(1) constant
O(log N) logarithmic
O(N) linear
O(N log N) linearithmic – asymptotically outperforms any N^k where k > 1, even k = 1.001
O(N^1.58) = O(N^log2(3)) = Karatsuba Multiplication (1962) – multiplies two n-digit numbers faster than the expected n^2 algorithm.
O(N^2) squared
O(N^2.807) = O(N^log2(7)) = Strassen Matrix Multiplication (1969) which multiplies two 2x2 matrices using 7 multiplications (not 8)
Figure 1: Strassen Reduces # of multiplications
O(N^3) cubic
O(N^k) polynomial
O(k^N) exponential
O(N!) factorial
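The Strassen entry in the list above can be made concrete with a small sketch of the 2x2 case. The class name Strassen2x2 is an assumption made for this illustration. The seven products m1 through m7 are the standard Strassen products; applying the same scheme recursively to the four blocks of an N x N matrix gives 7 subproblems of half the size, which is where the exponent log2(7) comes from.

```java
// Hypothetical class name: a sketch of Strassen's scheme for two 2x2 matrices.
class Strassen2x2 {
    // Returns a * b using 7 multiplications instead of the usual 8.
    static long[][] multiply(long[][] a, long[][] b) {
        long m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1]);
        long m2 = (a[1][0] + a[1][1]) * b[0][0];
        long m3 = a[0][0] * (b[0][1] - b[1][1]);
        long m4 = a[1][1] * (b[1][0] - b[0][0]);
        long m5 = (a[0][0] + a[0][1]) * b[1][1];
        long m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1]);
        long m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1]);

        // Only additions and subtractions are needed to assemble the result.
        return new long[][] {
            { m1 + m4 - m5 + m7, m3 + m5 },
            { m2 + m4, m1 - m2 + m3 + m6 }
        };
    }
}
```

The trade is fewer multiplications for more additions, which pays off only because additions are cheaper than the recursive multiplications once the entries are themselves large blocks.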
Sample Question: You are given a recurrence equation T(N) that is used to estimate the running time performance of an algorithm. You are told that T(N) = 2*T(N/3) + N/3. Using the families above, what is the overall classification of T(N)?
1.5 Mathematical Analysis
We have seen situations where we were concerned about counting the exact number of times an operation executed. Sometimes without knowing the exact input, it is only possible to determine the fewest number of times an operation executed (called the lower bound) or the maximum number of times an operation executed (called the upper bound).
1.5.1 Best-Case and Worst-Case
For BinaryArraySearch, for example, given N integers in sorted ascending order in an array, we know that you can determine whether it contains a target integer in no more than floor(log N) + 1 array inspections, even in the worst case.
Note: you could try to claim that you need N array inspections by using a simple for loop, but this would not be "the best of the worst case" algorithms.
Thus to use "Big-Oh" notation O(g(n)) to classify the worst-case performance of Binary Array Search on a problem of size N, we would state that the worst case behavior is O(log N).
For this same problem, the best case is you would find the integer after just a single array inspection. In this case using Ω(g(n)) notation to classify the best-case performance of Binary Array Search on a problem of size N, we would state that best case behavior is Ω(1).
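The best-case and worst-case counts above can be observed with a small sketch that counts array inspections. The class and method names are assumptions made for this example, and one "inspection" is taken to mean one probe of a[mid], regardless of how many comparisons are made against that value.

```java
// Hypothetical names: a sketch of binary array search that counts array inspections.
class CountingBinarySearch {
    // Returns how many array positions were inspected while searching the
    // sorted (ascending) array a for target, whether or not it is found.
    static int inspections(int[] a, int target) {
        int lo = 0;
        int hi = a.length - 1;
        int count = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            count++;                       // one inspection of a[mid]
            if (a[mid] == target) {
                return count;              // found: stop immediately
            }
            if (a[mid] < target) {
                lo = mid + 1;              // target can only be to the right
            } else {
                hi = mid - 1;              // target can only be to the left
            }
        }
        return count;                      // not found
    }
}
```

With the seven values 1 through 7, searching for 4 takes a single inspection (the best case), while searching for 7 or for the absent value 8 takes three, matching floor(log N) + 1 for N = 7 with a base-2 logarithm.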
1.6 Algorithm Families
We discussed a number of thematically related algorithms:
Sorting arrays of unordered elements. This includes InsertionSort, SelectionSort, MergeSort, QuickSort and finally CountingSort.
Searching for values in collections
Exploring Graph structures to validate properties
Computing shortest path over weighted graphs
Sample Question: Given a directed graph, G, compute an undirected graph H in which an edge (u,v) exists in H if either the directed edge u -> v or the directed edge v -> u exists in G. What is the running time performance of your algorithm?
1.7 Daily Question
Final daily question is available today.
If you have any trouble accessing this question, please let me know immediately on Discord.
1.8 Version : 2021/05/13
(c) 2021, George T. Heineman