CS 2223 May 02 2023
Expected reading:
Musical Selection:
Questions, Moody Blues (1970)
Visual Selection: The Scream, Edvard Munch (1893)
Live Selection: Secret World Live, Peter Gabriel (1994)
Jazz Selection: God Bless the Child. Billie Holiday & Count Basie.
1 Final Preparations
HW4 is due today at 6PM.
Currently the response rate for the course evaluation is just 31%. In the past I have found that students typically do not fill out the online survey after the final exam, especially in D term at the end of an exceptionally long year.
Please take a few minutes before class begins to log into Canvas to complete the course evaluation. I use this information to improve my next course offering. You will have until 11:59PM Thursday May 4th, so don’t forget and try to take care of this survey today.
1.1 Data Structures
"The time has come,"
the Walrus said, "To talk of many things:
Of shoes–and ships–
and sealing-wax–
Of cabbages–and kings–
And why the sea is boiling hot–
And whether pigs have wings."
Lewis Carroll
Arrays
Linked Lists
You know when you should use these structures, and the implications of accessing aggregate data stored in them.
You know about access performance in unordered arrays and linked lists.
Sample Question: You can locate the maximum value in an array of N elements in N-1 comparisons. Now assume you want to find the smallest and the largest value in the same array. What is the lower bound on the number of comparisons you need to make (i.e., the best case)? What is the upper bound on the number of comparisons you need to make (i.e., the worst case)? Can you provide sample instance problems with four elements to cover both these cases?
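One straightforward strategy (not necessarily the one with the fewest comparisons) compares each element against the current maximum and, only when it is not a new maximum, against the current minimum. The sketch below counts element comparisons so you can experiment with four-element inputs; the class name MinMax and the static comparisons counter are illustrative, not course code.

```java
import java.util.Arrays;

// Illustrative helper: finds min and max together while counting element comparisons.
class MinMax {
    static int comparisons = 0;

    // Returns {min, max}. Assumes a.length >= 1.
    static int[] minMax(int[] a) {
        int min = a[0], max = a[0];
        for (int i = 1; i < a.length; i++) {
            comparisons++;
            if (a[i] > max) {
                max = a[i];           // a new maximum cannot also be a new minimum
            } else {
                comparisons++;        // only now do we need to check against min
                if (a[i] < min) {
                    min = a[i];
                }
            }
        }
        return new int[] { min, max };
    }

    public static void main(String[] args) {
        int[][] samples = { { 1, 2, 3, 4 }, { 4, 3, 2, 1 } };
        for (int[] s : samples) {
            comparisons = 0;
            int[] mm = minMax(s);
            System.out.println(Arrays.toString(s) + " -> min=" + mm[0]
                + ", max=" + mm[1] + ", comparisons=" + comparisons);
        }
    }
}
```

For this particular strategy, the ascending input uses 3 comparisons (N-1) and the descending input uses 6 (2(N-1)).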
We saw that linked lists are useful for storing loosely structured collections of values. They are used to implement the Bag type, where there is no need to search through the values, only to retrieve all of them.
You have also seen how a linked list efficiently implements a queue by maintaining two separate references, first and last.
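A minimal sketch of that idea, assuming a generic singly linked list (the class and method names here are illustrative): enqueue appends after last and dequeue removes at first, so each operation touches only one end of the list and runs in O(1).

```java
import java.util.NoSuchElementException;

// Linked-list queue sketch: enqueue at 'last', dequeue from 'first'.
class LinkedQueue<Item> {
    private class Node {
        Item item;
        Node next;
    }

    private Node first;   // least recently added node
    private Node last;    // most recently added node
    private int n;

    boolean isEmpty() { return first == null; }
    int size() { return n; }

    // O(1): append a new node after 'last'.
    void enqueue(Item item) {
        Node oldLast = last;
        last = new Node();
        last.item = item;
        if (isEmpty()) {
            first = last;             // queue was empty: both references point to the new node
        } else {
            oldLast.next = last;
        }
        n++;
    }

    // O(1): remove the node referenced by 'first'.
    Item dequeue() {
        if (isEmpty()) throw new NoSuchElementException("queue underflow");
        Item item = first.item;
        first = first.next;
        if (isEmpty()) last = null;   // queue became empty: avoid a stale 'last'
        n--;
        return item;
    }
}
```

Enqueuing "a", "b", "c" and then dequeuing three times returns them in the same order.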
Sample Question: Explain how to use a linked list to implement the stack data type.
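One possible answer, sketched under the assumption of a generic singly linked list (names are illustrative): a stack needs only the first reference, because push and pop both operate at the head of the list.

```java
import java.util.NoSuchElementException;

// Linked-list stack sketch: the head of the list is the top of the stack.
class LinkedStack<Item> {
    private class Node {
        Item item;
        Node next;
    }

    private Node first;   // top of the stack
    private int n;

    boolean isEmpty() { return first == null; }
    int size() { return n; }

    // O(1): the new node becomes the head of the list.
    void push(Item item) {
        Node oldFirst = first;
        first = new Node();
        first.item = item;
        first.next = oldFirst;
        n++;
    }

    // O(1): remove and return the head of the list.
    Item pop() {
        if (isEmpty()) throw new NoSuchElementException("stack underflow");
        Item item = first.item;
        first = first.next;
        n--;
        return item;
    }

    // O(1): inspect the top without removing it.
    Item peek() {
        if (isEmpty()) throw new NoSuchElementException("stack underflow");
        return first.item;
    }
}
```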
1.2 Types
You should be well-versed in the basic data types and data structures used in this course. This includes:
Bag
Sample Question: It is possible to add a min() operation to the Bag type with O(1) performance in the worst case. How would you do it?
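One possible approach, assuming the Bag supports only add, size and iteration (no removal): the minimum can change only during add(), so keep a running minimum. The class name MinBag and the ArrayList storage are illustrative choices made for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative MinBag: a Bag (add + iterate, no removal) that also answers min() in O(1).
class MinBag<Item extends Comparable<Item>> implements Iterable<Item> {
    private final List<Item> items = new ArrayList<>();
    private Item min;   // smallest item added so far; null while the bag is empty

    // One extra comparison per add keeps the running minimum up to date.
    void add(Item item) {
        items.add(item);
        if (min == null || item.compareTo(min) < 0) {
            min = item;
        }
    }

    // O(1) worst case: no search is needed because a Bag never removes items.
    Item min() {
        if (min == null) throw new NoSuchElementException("bag is empty");
        return min;
    }

    int size() { return items.size(); }

    public Iterator<Item> iterator() { return items.iterator(); }
}
```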
Stack
Sample Question: In Java it is not possible to resize an array of size N; rather, you must create a new one (with a size > N) and manually copy the N elements into the new array. With this in mind, how is it possible that an array-based implementation of Stack can still guarantee O(1) amortized constant performance on the push operation?
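A common answer relies on doubling the array when it fills up. The sketch below (names are illustrative) shows that strategy; a single push that triggers a resize still costs O(N), but the total copying across N pushes stays below 2N.

```java
import java.util.NoSuchElementException;

// Array-based stack sketch that doubles its capacity when full and halves it when 1/4 full.
class ResizingArrayStack<Item> {
    private Item[] a;
    private int n;   // number of items on the stack

    @SuppressWarnings("unchecked")
    ResizingArrayStack() {
        a = (Item[]) new Object[1];
    }

    int size() { return n; }
    boolean isEmpty() { return n == 0; }
    int capacity() { return a.length; }

    // Create a new array and copy all n items into it: O(n) for this one call.
    @SuppressWarnings("unchecked")
    private void resize(int capacity) {
        Item[] copy = (Item[]) new Object[capacity];
        for (int i = 0; i < n; i++) {
            copy[i] = a[i];
        }
        a = copy;
    }

    // With doubling, the copies made across N pushes total 1 + 2 + 4 + ... < 2N,
    // so N pushes cost O(N) overall: amortized O(1) per push.
    void push(Item item) {
        if (n == a.length) resize(2 * a.length);
        a[n++] = item;
    }

    Item pop() {
        if (isEmpty()) throw new NoSuchElementException("stack underflow");
        Item item = a[--n];
        a[n] = null;                                           // avoid loitering
        if (n > 0 && n == a.length / 4) resize(a.length / 2);  // shrink when 1/4 full
        return item;
    }
}
```

Shrinking at one quarter (rather than one half) avoids repeatedly resizing when pushes and pops alternate at a capacity boundary.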
Queue
Sample Question: If you have a fixed-size queue that never fills up, how do you avoid moving elements around (wastefully) whenever values are enqueued and dequeued?
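One common answer is a circular (ring) buffer: the head and tail indices wrap around with modulo arithmetic, so no element is ever shifted. The sketch assumes int values and a capacity fixed at construction; the class name CircularQueue is illustrative.

```java
import java.util.NoSuchElementException;

// Fixed-capacity queue stored in a circular array.
class CircularQueue {
    private final int[] a;
    private int head = 0;   // index of the next element to dequeue
    private int tail = 0;   // index where the next element will be enqueued
    private int n = 0;      // number of elements currently stored

    CircularQueue(int capacity) { a = new int[capacity]; }

    boolean isEmpty() { return n == 0; }
    int size() { return n; }

    void enqueue(int value) {
        if (n == a.length) throw new IllegalStateException("queue is full");
        a[tail] = value;
        tail = (tail + 1) % a.length;   // wrap to index 0 after the last slot
        n++;
    }

    int dequeue() {
        if (isEmpty()) throw new NoSuchElementException("queue underflow");
        int value = a[head];
        head = (head + 1) % a.length;   // advance the head instead of shifting elements
        n--;
        return value;
    }
}
```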
Priority Queue with Heap (either MIN- or MAX- variety)
Sample Question: Why does HeapSort use a MAX-PQ structure instead of a MIN-PQ structure?
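The in-place HeapSort sketch below illustrates the idea behind this question: each time the maximum is removed from a MAX-heap, the heap shrinks by one slot, and that freed slot at the end of the array is exactly where the maximum belongs in ascending order. It uses 0-based indexing and int values for brevity (the course's heap code may use 1-based indexing).

```java
import java.util.Arrays;

// In-place HeapSort sketch using a MAX-heap stored in a[0..n-1]
// (children of index k are 2k+1 and 2k+2 in this 0-based layout).
class HeapSort {
    static void sort(int[] a) {
        int n = a.length;
        // Phase 1: heap construction, sinking every non-leaf node bottom-up.
        for (int k = n / 2 - 1; k >= 0; k--) {
            sink(a, k, n);
        }
        // Phase 2: swap the maximum (root) into its final slot at the end,
        // shrink the heap by one, then restore the heap property.
        while (n > 1) {
            swap(a, 0, n - 1);
            n--;
            sink(a, 0, n);
        }
    }

    // Move a[k] down until neither child is larger (heap occupies a[0..n-1]).
    private static void sink(int[] a, int k, int n) {
        while (2 * k + 1 < n) {
            int j = 2 * k + 1;                       // left child
            if (j + 1 < n && a[j + 1] > a[j]) j++;   // pick the larger child
            if (a[k] >= a[j]) break;
            swap(a, k, j);
            k = j;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = { 5, 1, 9, 3, 7, 2 };
        sort(a);
        System.out.println(Arrays.toString(a));   // prints [1, 2, 3, 5, 7, 9]
    }
}
```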
Binary Search Tree
Sample Question: Under what circumstances will Binary Search Trees degenerate and (over time) deliver worse-and-worse performance?
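One circumstance is easy to demonstrate: inserting keys in sorted order. The minimal BST sketch below (int keys only; the class name is illustrative) inserts the same 15 keys in two orders and reports the resulting heights, measured in edges.

```java
// Minimal BST sketch (keys only) showing how insertion order affects height.
class BSTHeightDemo {
    private static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    void insert(int key) { root = insert(root, key); }

    private Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        return node;   // duplicate keys are ignored
    }

    // Height in edges: an empty tree is -1, a single node is 0.
    int height() { return height(root); }

    private int height(Node node) {
        if (node == null) return -1;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    public static void main(String[] args) {
        BSTHeightDemo sorted = new BSTHeightDemo();
        for (int k = 1; k <= 15; k++) sorted.insert(k);       // ascending order
        BSTHeightDemo balanced = new BSTHeightDemo();
        for (int k : new int[] { 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15 }) {
            balanced.insert(k);                                 // "median first" order
        }
        System.out.println(sorted.height() + " " + balanced.height());   // prints "14 3"
    }
}
```

With ascending input every node has only a right child, so the tree is effectively a linked list and search costs O(N).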
Balanced Binary Search Tree – AVL variety
Sample Question: You are given an AVL tree that contains 15 unique values. Compute the maximum height the tree can have (that is, the greatest distance from any leaf node to the root) while still satisfying the AVL property.
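A common way to approach this question is to count the fewest nodes an AVL tree of height h can contain: the sparsest such tree has a root whose subtrees are the sparsest AVL trees of heights h-1 and h-2. The sketch assumes height is measured in edges (a single node has height 0), matching the leaf-to-root distance in the question; the class and method names are illustrative.

```java
// Fewest nodes an AVL tree of height h can have (height in edges), and the
// resulting bound on height for a given node count.
class AvlMinNodes {
    static long minNodes(int h) {
        if (h == 0) return 1;
        long prev2 = 1, prev1 = 2;           // minNodes(0), minNodes(1)
        for (int i = 2; i <= h; i++) {
            long cur = prev1 + prev2 + 1;    // root + sparsest subtrees of heights i-1 and i-2
            prev2 = prev1;
            prev1 = cur;
        }
        return prev1;
    }

    // Any AVL tree with n nodes satisfies minNodes(height) <= n,
    // so its height is at most the largest h returned here.
    static int maxHeight(long n) {
        int h = 0;
        while (minNodes(h + 1) <= n) h++;
        return h;
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 5; h++) {
            System.out.println("h=" + h + " minNodes=" + minNodes(h));
        }
        System.out.println("max height with 15 nodes: " + maxHeight(15));
    }
}
```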
Symbol Table
On HW4 you used a Symbol Table to record the frequency (or count) of specific values of interest. This is a natural way to use Symbol Tables. As long as you use a hash-based implementation rather than the original SequentialSearchST, you can expect (using amortized analysis to account for occasional resizing) Θ(1) average-case performance for key operations.
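The frequency-counting pattern looks like the sketch below. Here java.util.HashMap stands in for the course's hash-based symbol table classes (an assumption made so the example is self-contained); the get-then-put-count-plus-one idiom is the same with either.

```java
import java.util.HashMap;
import java.util.Map;

// Frequency counting with a symbol table (java.util.HashMap used as the ST).
class FrequencyCount {
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            Integer old = counts.get(w);                  // null if w has not been seen yet
            counts.put(w, (old == null) ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = "the walrus said the time has come the walrus said".split(" ");
        Map<String, Integer> counts = countWords(words);
        System.out.println(counts.get("the") + " " + counts.get("walrus"));   // prints "3 2"
    }
}
```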
Sample Question: How can a LinearProbingHashST or SeparateChainingHashST Symbol Table provide average-case expected performance of Θ(1) for search, insert and delete? What is the classification of the worst case for these same operations?
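The sketch below is a simplified take on linear probing, not the course's LinearProbingHashST itself: String keys, Integer values, no delete, and a table that doubles whenever it becomes half full. Under the uniform hashing assumption, keeping the table sparse keeps the expected number of probes per operation constant; the class name LinearProbingSketch is illustrative.

```java
// Simplified linear-probing hash table sketch (String keys, Integer values, no deletion).
class LinearProbingSketch {
    private String[] keys = new String[16];
    private Integer[] vals = new Integer[16];
    private int n = 0;   // number of key-value pairs

    private int hash(String key, int m) {
        return (key.hashCode() & 0x7fffffff) % m;   // non-negative index in [0, m)
    }

    void put(String key, Integer val) {
        if (n >= keys.length / 2) resize(2 * keys.length);   // keep the table at most half full
        int i = hash(key, keys.length);
        while (keys[i] != null) {                   // probe until the key or an empty slot
            if (keys[i].equals(key)) { vals[i] = val; return; }
            i = (i + 1) % keys.length;
        }
        keys[i] = key;
        vals[i] = val;
        n++;
    }

    Integer get(String key) {
        int i = hash(key, keys.length);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return vals[i];
            i = (i + 1) % keys.length;
        }
        return null;                                // reached an empty slot: key absent
    }

    int size() { return n; }

    // Rehash every pair into a larger table: O(n), but rare enough to amortize over the puts.
    private void resize(int capacity) {
        String[] oldKeys = keys;
        Integer[] oldVals = vals;
        keys = new String[capacity];
        vals = new Integer[capacity];
        n = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) put(oldKeys[j], oldVals[j]);
        }
    }
}
```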
We also saw how to use BST and AVLTreeST to implement a Symbol Table. While a BST delivers O(log N) behavior in the average case, it can quickly degrade into O(N) behavior with a poor insertion order of (key, value) pairs. AVLTreeST guarantees Θ(log N) performance by balancing the tree.
Indexed Priority Queue with Heap Supporting DecreaseKey
Sample Question: How is Indexed Priority Queue able to efficiently locate an element in the queue so it can decrease its priority? Wouldn’t it have to search all N elements to find the one it is looking for?
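The key idea is a second array that records where each index currently sits in the heap, updated on every swap. The sketch below loosely follows that design; the class name IndexMinPQSketch, double keys, and 1-based heap positions are illustrative assumptions, not the course's exact code.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Indexed min priority queue sketch over indices 0..maxN-1 with double keys.
// pq[] is the binary heap of indices (1-based); qp[] is its inverse: qp[i] is the
// heap position of index i, or -1 if i is absent. qp[] lets decreaseKey find i in O(1).
class IndexMinPQSketch {
    private final int[] pq;
    private final int[] qp;
    private final double[] keys;
    private int n = 0;

    IndexMinPQSketch(int maxN) {
        pq = new int[maxN + 1];
        qp = new int[maxN];
        keys = new double[maxN];
        Arrays.fill(qp, -1);
    }

    boolean isEmpty() { return n == 0; }
    boolean contains(int i) { return qp[i] != -1; }

    void insert(int i, double key) {
        if (contains(i)) throw new IllegalArgumentException("index already present");
        n++;
        qp[i] = n;
        pq[n] = i;
        keys[i] = key;
        swim(n);
    }

    // O(log N): locate i via qp[] in O(1), lower its key, then swim it toward the root.
    void decreaseKey(int i, double key) {
        if (!contains(i)) throw new NoSuchElementException("index not in queue");
        if (key >= keys[i]) throw new IllegalArgumentException("key does not decrease");
        keys[i] = key;
        swim(qp[i]);
    }

    // Remove and return the index with the smallest key.
    int delMin() {
        if (n == 0) throw new NoSuchElementException("queue underflow");
        int min = pq[1];
        exch(1, n--);
        sink(1);
        qp[min] = -1;
        return min;
    }

    private boolean greater(int a, int b) { return keys[pq[a]] > keys[pq[b]]; }

    // Swap two heap positions and keep qp[] consistent with pq[].
    private void exch(int a, int b) {
        int t = pq[a]; pq[a] = pq[b]; pq[b] = t;
        qp[pq[a]] = a;
        qp[pq[b]] = b;
    }

    private void swim(int k) {
        while (k > 1 && greater(k / 2, k)) {
            exch(k, k / 2);
            k = k / 2;
        }
    }

    private void sink(int k) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && greater(j, j + 1)) j++;
            if (!greater(k, j)) break;
            exch(k, j);
            k = j;
        }
    }
}
```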
Undirected Graph
Sample Question: Given an undirected graph G, find all pairs of mirror vertices, that is, two vertices u and v such that the vertices adjacent to u are exactly the same as the vertices adjacent to v. What is the worst-case performance of your algorithm?
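One straightforward (unoptimized) approach compares the neighbor sets of every pair of vertices. The sketch assumes the graph is given as adjacency lists over vertices 0..V-1; the class name MirrorVertices is illustrative. There are V(V-1)/2 pairs and each set comparison can cost up to O(V), which is useful to keep in mind when classifying the worst case.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Brute-force mirror-vertex sketch: adj.get(v) lists the neighbors of v.
class MirrorVertices {
    static List<int[]> mirrorPairs(List<List<Integer>> adj) {
        int V = adj.size();
        // Convert each adjacency list to a set so comparison ignores order.
        List<Set<Integer>> nbrs = new ArrayList<>();
        for (List<Integer> list : adj) nbrs.add(new HashSet<>(list));
        List<int[]> pairs = new ArrayList<>();
        for (int u = 0; u < V; u++) {
            for (int v = u + 1; v < V; v++) {
                if (nbrs.get(u).equals(nbrs.get(v))) pairs.add(new int[] { u, v });
            }
        }
        return pairs;
    }
}
```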
Directed Graph
Sample Question: Does the mirror vertex question change if the graph is a directed graph? If so, why? If not, why not?
Directed Weighted Graph
Algorithms that process directed, weighted graphs are interested in (a) the shortest paths (in terms of accumulated edge weights) from a single source to all other vertices; (b) the shortest paths (in terms of accumulated edge weights) between all pairs of vertices in the graph; (c) whether the graph has a cycle; (d) whether the graph has a negative cycle.
These types all have common operations as well as specific ones. For example, priority queue and binary search tree both support a deleteMin operation.
Sample Question: How does a Min Priority Queue support the decreaseKey operation? And why is it hard to envision adding an increaseKey operation?
Sample Question: You are given a connected undirected graph with an even number of vertices, V, and an even number of edges, E. This graph can be split into two graphs G1 and G2, each of which contains half of the vertices and half of the edges from the original graph. True or false? If false, provide a counterexample. If true, explain your reasoning.
1.3 Since Midterm
Well, technically, I covered Heaps on Apr 03 2023, the day before the midterm; however, I didn't complete the discussion in time to assign that material for the midterm, so it is still relevant. Practice using the "Heap In Class Exercise.pptx" you can find in Files | handouts in Canvas.
Binary Search Trees are a central data structure in this course and beyond. Numerous data structures are based on their principles:
Recursive – Numerous capabilities are based on the ability to define an operation in terms of recursive sub-calls. For HW3 and HW4, you saw several examples of recursive functions on BSTs:
Structural – just inspecting left and right references, like height
Read-only Traversals – like summing the elements in a Binary Search Tree or producing the post order traversal
Mutative – like inserting or removing values
Balanced – AVL trees provide a strategy for self-balancing when the tree "skews" left or right. You need to understand the AVL tree property that is being maintained (height difference of -1, 0, +1) and the underlying mechanics of how balance is restored.
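The three recursive patterns above (structural, read-only traversal, mutative) can be sketched on a minimal int-keyed BST; the class name RecursiveBST and its methods are illustrative, not the HW3/HW4 code.

```java
import java.util.ArrayList;
import java.util.List;

// Recursive BST patterns on int keys.
class RecursiveBST {
    private static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Mutative: return the (possibly new) subtree root so the parent link is updated.
    void insert(int key) { root = insert(root, key); }
    private Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // Structural: height depends only on left/right references (empty tree = -1).
    int height() { return height(root); }
    private int height(Node n) {
        return (n == null) ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Read-only traversal: sum all keys.
    int sum() { return sum(root); }
    private int sum(Node n) {
        return (n == null) ? 0 : n.key + sum(n.left) + sum(n.right);
    }

    // Read-only traversal: post order visits left subtree, right subtree, then the node.
    List<Integer> postOrder() {
        List<Integer> out = new ArrayList<>();
        postOrder(root, out);
        return out;
    }
    private void postOrder(Node n, List<Integer> out) {
        if (n == null) return;
        postOrder(n.left, out);
        postOrder(n.right, out);
        out.add(n.key);
    }
}
```

Inserting 5, 3, 8, 1, 4, 9 yields height 2, sum 30, and post order 1, 4, 3, 9, 8, 5.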
Graphs close out the remainder of the class material:
Undirected – undirected graphs are the fundamental starting point, and you need to know how DFS and BFS work. DFS offers a recursive strategy for exploring a graph, while BFS uses a Queue to find, from a designated source vertex, a path to each reachable vertex that is minimal in terms of the number of edges traversed.
Directed – directed graphs lead to new questions, such as checking for cycles and deriving a topological sort, that is, a linear ordering of the vertices that honors the ordering constraint imposed by each edge.
Weighted – adding weights introduces the notion of defining "shortest path" in terms of accumulated edge weights. Dijkstra's single-source shortest path algorithm uses a novel data structure, the indexed min priority queue, to complete its task. Bellman-Ford solves the single-source shortest path problem even when some edge weights are negative (as long as no negative cycle exists). Finally, Floyd-Warshall computes all-pairs shortest paths by considering shortest paths that may use more and more vertices of the graph as intermediate vertices.
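Three of the algorithms named above can be sketched compactly: BFS distances, a DFS-based topological sort (reverse postorder), and Floyd-Warshall. These are not the course's Graph/Digraph classes; the adjacency-list and weight-matrix representations, and all names, are assumptions made to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Graph algorithm sketches. adj.get(v) lists the vertices adjacent to v (or reachable
// by one directed edge from v), vertices numbered 0..V-1.
class GraphSketches {
    // BFS: distTo[v] = fewest edges on a path from s to v, or -1 if v is unreachable.
    static int[] bfs(List<List<Integer>> adj, int s) {
        int[] distTo = new int[adj.size()];
        Arrays.fill(distTo, -1);
        Deque<Integer> queue = new ArrayDeque<>();
        distTo[s] = 0;
        queue.addLast(s);
        while (!queue.isEmpty()) {
            int v = queue.removeFirst();
            for (int w : adj.get(v)) {
                if (distTo[w] == -1) {            // first discovery uses a fewest-edge path
                    distTo[w] = distTo[v] + 1;
                    queue.addLast(w);
                }
            }
        }
        return distTo;
    }

    // Topological sort of a DAG as reverse DFS postorder. Assumes no directed cycle.
    static List<Integer> topologicalOrder(List<List<Integer>> adj) {
        boolean[] marked = new boolean[adj.size()];
        Deque<Integer> reversePost = new ArrayDeque<>();
        for (int v = 0; v < adj.size(); v++) {
            if (!marked[v]) dfs(adj, v, marked, reversePost);
        }
        return new ArrayList<>(reversePost);      // head of the deque (last finished) first
    }

    private static void dfs(List<List<Integer>> adj, int v, boolean[] marked,
                            Deque<Integer> reversePost) {
        marked[v] = true;
        for (int w : adj.get(v)) {
            if (!marked[w]) dfs(adj, w, marked, reversePost);
        }
        reversePost.push(v);                      // v finishes after everything it reaches
    }

    // Floyd-Warshall: weight[i][j] is the edge weight, POSITIVE_INFINITY if no edge,
    // 0 on the diagonal. After round k, dist[i][j] is the shortest path length that uses
    // only vertices 0..k as intermediates. Assumes no negative cycle (one would make
    // some dist[i][i] negative).
    static double[][] allPairs(double[][] weight) {
        int n = weight.length;
        double[][] dist = new double[n][];
        for (int i = 0; i < n; i++) dist[i] = weight[i].clone();
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (dist[i][k] + dist[k][j] < dist[i][j]) {
                        dist[i][j] = dist[i][k] + dist[k][j];
                    }
                }
            }
        }
        return dist;
    }
}
```

Dijkstra and Bellman-Ford are not sketched here; Dijkstra in particular relies on the indexed min priority queue discussed earlier.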
1.4 Performance classifications
We finally introduced Big O notation as a means to classify the order of growth of a function, which typically represents the run-time performance of an algorithm or the exact number of times an operation executes. This provided the finishing touches on the performance analysis that we conducted throughout the term.
You should be able to reflect on the performance families we have seen. I also include some new ones that I have alluded to, but didn’t have time to discuss:
O(1) constant
O(log N) logarithmic
O(N) linear
O(N log N) linearithmic – asymptotically outperforms any $N^k$ where k > 1, even k = 1.001
$O(N^{1.58}) = O(N^{\log_2 3})$ – Karatsuba Multiplication (1962) multiplies two n-digit numbers faster than the expected $n^2$ algorithm.
$O(N^2)$ squared
$O(N^{2.807}) = O(N^{\log_2 7})$ – Strassen Matrix Multiplication (1969), which multiplies two 2x2 matrices using 7 multiplications (not 8)
Figure 1: Strassen Reduces # of multiplications
$O(N^3)$ cubic
$O(N^k)$ polynomial
$O(k^N)$ exponential
O(N!) factorial – Consider the N-queens problem (code in repository): finding the first configuration (and ultimately the total number of configurations) for placing N non-attacking queens on an NxN chess board. The naive solution is to try all N! permutations, and this might turn out to be the worst case every time. Optimizations can improve performance but do not break this behavior (see Book1Notes.xls).
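Returning to the Strassen entry in the list above, the seven products can be written out concretely. The sketch uses the standard textbook formulation of Strassen's 2x2 step with scalar long entries (Figure 1 may label the products differently). In the full algorithm each entry is an (N/2)x(N/2) block and the seven products recurse, giving $T(N) = 7T(N/2) + O(N^2)$, which solves to $O(N^{\log_2 7})$.

```java
// Strassen's seven products for a 2x2 matrix product C = A * B (scalar entries here;
// in the real algorithm each entry is a block and each product is a recursive call).
class Strassen2x2 {
    static long[][] multiply(long[][] a, long[][] b) {
        long m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1]);
        long m2 = (a[1][0] + a[1][1]) * b[0][0];
        long m3 = a[0][0] * (b[0][1] - b[1][1]);
        long m4 = a[1][1] * (b[1][0] - b[0][0]);
        long m5 = (a[0][0] + a[0][1]) * b[1][1];
        long m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1]);
        long m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1]);
        // Only additions and subtractions are needed to assemble C from m1..m7.
        return new long[][] {
            { m1 + m4 - m5 + m7, m3 + m5 },
            { m2 + m4, m1 - m2 + m3 + m6 }
        };
    }
}
```

For A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] this returns [[19, 22], [43, 50]], matching the ordinary 8-multiplication result.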
1.5 Mathematical Analysis
We have seen situations where we were concerned with counting the exact number of times an operation executes. Sometimes, without knowing the exact input, it is only possible to determine the fewest number of times an operation executes (called the lower bound) or the greatest number of times it executes (called the upper bound).
1.5.1 Best-Case and Worst-Case
For example, given an array of N integers sorted in ascending order, BinaryArraySearch determines whether the array contains a target integer using no more than floor(log N) + 1 array inspections in the worst case.
Note: you could claim that you need N array inspections by using a simple for loop, but that algorithm would not be "the best of the worst case" algorithms.
Thus, to use "Big-Oh" notation O(g(n)) to classify the worst-case performance of Binary Array Search on a problem of size N, we would state that the worst-case behavior is O(log N). Note that, by itself, O(...) is not exclusively meant for worst-case scenarios; it is only a tool to classify an upper bound on some function.
For this same problem, the best case is that you find the target integer after just a single array inspection. Using Ω(g(n)) notation to classify the best-case performance of Binary Array Search on a problem of size N, we would state that the best-case behavior is Ω(1).
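Both bounds can be checked with a small instrumented sketch. It treats each examination of a[mid] as one array inspection; the class name BinaryArraySearch and its static counter are illustrative, not the course's exact code.

```java
// Binary search over a sorted int array that counts array inspections.
// Worst case: floor(log2 N) + 1 inspections; best case: 1 (target at the first midpoint).
class BinaryArraySearch {
    static int inspections = 0;

    static boolean contains(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;               // avoids int overflow of (lo + hi)
            inspections++;
            int rc = Integer.compare(target, a[mid]);   // one inspection of a[mid]
            if (rc < 0) hi = mid - 1;
            else if (rc > 0) lo = mid + 1;
            else return true;
        }
        return false;
    }
}
```

With N = 15 sorted values, a target equal to the middle element needs 1 inspection, while a target larger than every element needs 4 = floor(log 15) + 1.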
1.6 Algorithm Families
We discussed a number of thematically related algorithms:
Sorting arrays of unordered elements. This includes InsertionSort, SelectionSort, MergeSort, HeapSort and QuickSort
Searching for values in collections
Exploring Graph structures to validate properties, using Breadth-first Search and Depth-first Search
Computing shortest path over weighted graphs
Sample Question: Given a directed graph G, compute an undirected graph H in which an edge (u,v) exists in H if either the directed edge u –> v or the directed edge v –> u exists in G. What is the running time/performance of your algorithm?
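One possible approach, assuming both graphs are stored as adjacency structures over vertices 0..V-1: visit each directed edge u –> v once and record u - v in both directions, using sets so that a pair of opposite edges in G does not produce a duplicate edge in H. The class name DirectedToUndirected is illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Build undirected H from directed G (g.get(u) lists the heads of edges leaving u).
class DirectedToUndirected {
    static List<Set<Integer>> toUndirected(List<List<Integer>> g) {
        int V = g.size();
        List<Set<Integer>> h = new ArrayList<>();
        for (int v = 0; v < V; v++) h.add(new LinkedHashSet<>());
        for (int u = 0; u < V; u++) {
            for (int v : g.get(u)) {      // visits every directed edge exactly once
                h.get(u).add(v);          // sets ignore the duplicate from a v -> u edge
                h.get(v).add(u);
            }
        }
        return h;
    }
}
```

Since each vertex and each directed edge is processed once with expected O(1) set operations, this sketch runs in expected O(V + E) time.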
1.7 Version : 2023/05/01
(c) 2023, George T. Heineman