CS 2223 Mar 28 2022

Lecture Path: 09

Expected reading: 361-374 (Section 3.1, Symbol Tables)
Daily Exercise:
Classical selection: Beethoven: Sonata No 21 ’Waldstein’ (1804)

Musical Selection: Billy Joel: Tell Her About It (1983)

Visual Selection: Haystacks - White Frost, Sunrise, Claude Monet (1890)
Live Selection: Do you feel like we do, Peter Frampton (1973)

Daily Question: DAY09 (Problem Set DAY09)

1 Fundamental Data Types

Did anyone try the interview question from Friday?

1.1 Importance of Sorting

Sorting is useful for two reasons. First, it presents information to humans in an order that is easier for us to process. Second, as you know from BINARY ARRAY SEARCH, a sorted array reduces lookup time. However, there is a cost associated with sorting: the values must be kept in a contiguous block of memory, which makes it challenging to add new values to the collection (or to remove existing ones).

Up next is a data type that can provide efficient implementations of key operations without demanding that all values in a collection be completely sorted.

1.2 Back to Fundamental Data Types

A good decision is based on knowledge and not on numbers
Plato

We are now ready to complete the last of the major data type families for this course. Much like the platonic solids from geometry, these are the fundamental building blocks for algorithms.

These data types are:

Bag (p. 121) – Use with collections of non-comparable objects. Use when you don’t care overmuch about individual retrieval of elements but rather only want to retrieve all elements one at a time.
Stack (p. 121) – Use when you want Last-in, First-out (LIFO) behavior. Can be structured to support expandable collections or can be restricted to fixed capacity.
Queue (p. 121) – Use when you want First-in, First-out (FIFO) behavior. Can be structured to support expandable collections or can be restricted to fixed capacity.
Max Priority Queue (p. 309) – Use when you want to retrieve the specific element that has the "largest value" or "highest priority". Can be structured to support expandable collections or can be restricted to fixed capacity. Can be augmented to support arbitrary re-classification of "value" or "priority" (IndexMinPQ (p. 320), which we will cover in a few weeks).
Symbol Table (p. 363) – Use when you want to associate a value with a key.

We are going to introduce the Symbol Table type today and over the next lecture we will complete its implementation.

1.3 Symbol Table

Each symbol table has a type representing the key and a type representing the value. For example, to count word frequencies in text (p. 372) you might want to associate an Integer with a String.

Operation	Description
void put (Key key, Value value)	associate (key,value) in table
Value get (Key key)	retrieve value for key
void delete (Key key)	remove (key,value) pair in table
boolean contains (Key key)	check if table has key
int size ()	return number of pairs
boolean isEmpty ()	determine if empty
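
To make the word-frequency use case concrete, here is a minimal sketch. It uses java.util.HashMap as a stand-in for a symbol table, since no course implementation has been shown at this point in the notes; the class name WordFrequencyDemo and the method countWords are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class WordFrequencyDemo {
    // Counts how often each whitespace-separated word occurs in text.
    // HashMap is only a stand-in for the Symbol Table type (an assumption).
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) { continue; }   // empty input yields one empty token
            if (counts.containsKey(word)) {
                counts.put(word, counts.get(word) + 1);   // key exists: replace its value
            } else {
                counts.put(word, 1);                      // first time we see this key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            countWords("it was the best of times it was the worst of times");
        System.out.println(counts.get("it"));    // 2
        System.out.println(counts.get("best"));  // 1
        System.out.println(counts.size());       // 7 distinct words
    }
}
```

Note that the only operations used are put, get, contains (containsKey in HashMap) and size, which is the same small vocabulary as the table above.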

To date, we have been concerned with collections for storing and retrieving individual items. Now we are contemplating an extension for storing associated pairs.

I claim that we already have the underlying data structures in place to properly implement the Symbol Table type.

1.4 Key Equality

These symbol tables are primarily concerned with testing equality of keys. Primitive values are compared with the "==" operator. Here, however, each Key is a full Java object, and "==" applied to two objects only checks whether both references point to the same object. We therefore assume that the Key class has an associated boolean equals (Object o) method that provides a semantic equality that goes beyond reference equality.
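
The difference between "==" and equals can be seen in a small sketch. The Point class below is a hypothetical key type invented for this example; it is not from the book.

```java
class KeyEqualityDemo {
    // Hypothetical key type: two points are equal when their coordinates match.
    static final class Point {
        final int x, y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) { return true; }
            if (!(o instanceof Point)) { return false; }
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() { return 31 * x + y; }   // keep consistent with equals
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a == b);        // false: two distinct objects
        System.out.println(a.equals(b));   // true: same coordinates
    }
}
```

A symbol table that compared keys with "==" would treat a and b as different keys, which is rarely what you want.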

1.5 Ordered Symbol Tables

We will begin to cover ordered symbol tables after the first exam.

1.6 Potential Implementation

We have covered enough structures to support the Symbol Table API. Today we will describe this in the context of linked lists that store additional information. This SequentialSearchST<Key, Value> implementation is from p. 375 of the book.

We have already seen how to use linked lists to store information, both ordered and unordered. The change here is to modify the structure of each node. The following defines the class and its inner Node class used to store the information.

public class SequentialSearchST<Key, Value> {
    int N;        // number of key-value pairs
    Node first;   // the linked list of key-value pairs

    // Nodes now store (key and value)
    class Node {
        Key key;
        Value value;
        Node next;

        public Node (Key key, Value val, Node next) {
            this.key = key;
            this.value = val;
            this.next = next;
        }
    }
}

As you might imagine, we will build up linked lists of these (key, value) pairs using put(key,value) operations, which become a bit more complicated only because you may be replacing a value that is already associated with key in the SequentialSearchST symbol table.

First observe that there is a useful constructor for creating Node objects from a (key, value) pair and a link to the next Node to use. Should you not wish to have the node have a next link, then simply pass null as the third parameter to this constructor.
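
As a quick illustration of that constructor, the sketch below builds a two-node chain by hand. To keep it short, the Node here is fixed to String keys and Integer values instead of being generic, and the describe helper is a hypothetical addition used only to print the chain.

```java
class NodeChainDemo {
    // Same shape as the Node described above, but with concrete types for brevity.
    static class Node {
        String key;
        Integer value;
        Node next;

        Node(String key, Integer value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain from first to last and renders it as text.
    static String describe(Node first) {
        StringBuilder sb = new StringBuilder();
        for (Node n = first; n != null; n = n.next) {
            sb.append("(").append(n.key).append(",").append(n.value).append(")");
            if (n.next != null) { sb.append(" -> "); }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node last = new Node("b", 2, null);    // null: no next node
        Node first = new Node("a", 1, last);   // links to the node created above
        System.out.println(describe(first));   // (a,1) -> (b,2)
    }
}
```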

The put method follows a standard template for traversing a linked list, which you will see a lot this week:

Node n = first;
while (n != null) {
    // some code

    n = n.next;
}

Here is the put method implementation:

public void put(Key key, Value val) {
    Node n = first;
    while (n != null) {
        if (key.equals (n.key)) {
            n.value = val;
            return;
        }
        n = n.next;
    }

    // add as new node at beginning
    first = new Node (key, val, first);
    N++;
}

The above while loop visits each Node in the linked list to see if it is the one whose key matches the incoming key parameter. Should there be a match, then this is a request to reassociate the new value val with this existing key, so the value associated with that node in the linked list is updated and the function returns.

Should there be no match with an existing key in the linked list, then we must add a new node. This is done, here, by making it the new first node of the linked list. This is the same behavior as you saw earlier with the Bag data type. Don’t forget to increment N, which keeps track of the number of items in the linked list.

1.7 Retrieve information

The get(key) method is even simpler than the put method. You simply traverse the linked list one node at a time, trying to find the node whose key matches the key parameter. If it is found, return the associated value stored by that Node; otherwise return null.

public Value get(Key key) {
    Node n = first;
    while (n != null) {
        if (key.equals (n.key)) {
            return n.value;
        }
        n = n.next;
    }

    return null;   // not present
}
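
Here is a small usage sketch that exercises put and get together. The class MiniST is a hypothetical, trimmed-down copy of the structure described above, and the contains, size and isEmpty helpers are assumptions about how the remaining API operations could be written on top of get and N; they are not taken from the notes.

```java
class MiniST<Key, Value> {
    private int N;        // number of key-value pairs
    private Node first;   // head of the linked list

    private class Node {
        Key key; Value value; Node next;
        Node(Key key, Value value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    public void put(Key key, Value val) {
        for (Node n = first; n != null; n = n.next) {
            if (key.equals(n.key)) { n.value = val; return; }   // replace existing value
        }
        first = new Node(key, val, first);                      // otherwise add at front
        N++;
    }

    public Value get(Key key) {
        for (Node n = first; n != null; n = n.next) {
            if (key.equals(n.key)) { return n.value; }
        }
        return null;   // not present
    }

    // Assumed helpers: one plausible way to finish the API.
    public boolean contains(Key key) { return get(key) != null; }
    public int size() { return N; }
    public boolean isEmpty() { return N == 0; }

    public static void main(String[] args) {
        MiniST<String, Integer> st = new MiniST<>();
        st.put("apple", 1);
        st.put("pear", 2);
        st.put("apple", 5);                        // replaces the value; size stays at 2
        System.out.println(st.get("apple"));       // 5
        System.out.println(st.get("plum"));        // null
        System.out.println(st.contains("pear"));   // true
        System.out.println(st.size());             // 2
    }
}
```

Writing contains in terms of get only works if null is never stored as a value, which is the assumption made in this sketch.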

1.8 Delete information

What if you want to remove a (key, value) pair from the symbol table? Then you would invoke the delete(key) method. To efficiently remove a node from a linked list, you need to know its previous node. But how can you do this if all of the Node objects only point to the next one?

public void delete(Key key) {
    if (first == null) { return; }

    Node prev = null;
    Node n = first;
    while (n != null) {
        if (key.equals (n.key)) {
            if (prev == null) {          // no previous? Must have been first
                first = n.next;
            } else {
                prev.next = n.next;      // have previous one link around
            }
            N--;                         // one fewer pair in the table
            return;
        }

        prev = n;                        // don't forget to update!
        n = n.next;
    }
}
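
The three cases for delete (a middle node, the first node, and a key that is not present) can be traced with the sketch below. DeleteDemoST is a hypothetical, trimmed-down table written for this example, and keys() is a helper added only to show the list order. The sketch decrements N on a successful delete, on the assumption that size() should stay in sync with the list.

```java
class DeleteDemoST<Key, Value> {
    private int N;
    private Node first;

    private class Node {
        Key key; Value value; Node next;
        Node(Key key, Value value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    public void put(Key key, Value val) {
        for (Node n = first; n != null; n = n.next) {
            if (key.equals(n.key)) { n.value = val; return; }
        }
        first = new Node(key, val, first);
        N++;
    }

    public void delete(Key key) {
        Node prev = null;
        Node n = first;
        while (n != null) {
            if (key.equals(n.key)) {
                if (prev == null) { first = n.next; }   // removing the first node
                else { prev.next = n.next; }            // link around the removed node
                N--;                                    // assumption: keep the count in sync
                return;
            }
            prev = n;
            n = n.next;
        }
    }

    public int size() { return N; }

    // Lists keys from first to last, separated by spaces.
    public String keys() {
        StringBuilder sb = new StringBuilder();
        for (Node n = first; n != null; n = n.next) {
            if (sb.length() > 0) { sb.append(' '); }
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DeleteDemoST<String, Integer> st = new DeleteDemoST<>();
        st.put("a", 1); st.put("b", 2); st.put("c", 3);
        System.out.println(st.keys());   // c b a (newest first)
        st.delete("b");                  // middle node
        System.out.println(st.keys());   // c a
        st.delete("c");                  // first node
        st.delete("zzz");                // not present: nothing changes
        System.out.println(st.keys() + " size=" + st.size());   // a size=1
    }
}
```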

1.9 Big O Approximation

Assuming that there were N elements in the Symbol Table, what is your analysis of the running time of the core operations, put, get, and delete? State your answer in terms of N.
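
One way to check your analysis empirically is to count key comparisons directly. The sketch below is a simplified stand-in, not the course code: it uses int keys compared with "==" rather than generic keys with equals, and the names ComparisonCountDemo, build and comparisons are hypothetical.

```java
class ComparisonCountDemo {
    static class Node {
        int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Builds a chain holding keys 0..n-1, each new key added at the front.
    static Node build(int n) {
        Node first = null;
        for (int i = 0; i < n; i++) {
            first = new Node(i, first);
        }
        return first;
    }

    // Returns how many key comparisons a sequential search for target performs.
    static int comparisons(Node first, int target) {
        int count = 0;
        for (Node n = first; n != null; n = n.next) {
            count++;
            if (n.key == target) { break; }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 4000; n *= 2) {
            // -1 is never stored, so this search visits every node
            System.out.println(n + " nodes: " + comparisons(build(n), -1) + " comparisons");
        }
    }
}
```

Doubling the number of nodes doubles the comparisons for an unsuccessful search, which is the pattern to look for when you state your answer in terms of N.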

Let’s come up with some more O(...) examples.

The idea is to come up with a model that can be used to compute the order of growth as N grows. Thus if you have a polynomial equation, you can eliminate the lower order terms regardless of their constants, because as N grows they will matter less and less.

Thus N^3 + 1,000,000*N is going to be ~ N^3.

Why? Because once N is larger than 1,000, N^3 exceeds 1,000,000*N, and from then on N^3 grows much, much faster than N.
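
To see how quickly the lower-order term stops mattering, the following sketch prints the share of N^3 + 1,000,000*N that comes from the 1,000,000*N term. The class and method names are hypothetical.

```java
class GrowthDemo {
    // Fraction of N^3 + 1,000,000*N contributed by the lower-order term.
    static double lowerOrderShare(long n) {
        double cubic = (double) n * n * n;
        double linear = 1_000_000.0 * n;
        return linear / (cubic + linear);
    }

    public static void main(String[] args) {
        long[] sizes = {100, 1_000, 10_000, 100_000};
        for (long n : sizes) {
            System.out.printf("N=%d share of 1,000,000*N: %.4f%n", n, lowerOrderShare(n));
        }
        // At N=1,000 both terms are equal (share 0.5); by N=100,000 the share is about 0.0001.
    }
}
```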

Let’s look at some other examples (exercise 1.4.5, p. 208):

N+1
1 + 1/N
(1 + 1/N)*(1+2/N)
2*N^3 - 15*N^2 + N
log (2N) / log (N)
log (N^2+1) / log (N)
N^100 / 2^N

1.10 Daily Exercise

Let’s try to analyze the recursive solution in terms of the maximum number of comparisons.

You should have been able to declare C(N) as the number of comparisons and then define its behavior as:

C(N) = C(N/2) + C(N/2) + 1

assuming that N = 2^n is a power of 2.

C(N) = 2*C(N/2) + 1
C(N) = 2*(2*C(N/4) + 1) + 1
C(N) = 2*(2*(2*C(N/8) + 1) + 1) + 1

and this leads to...

C(N) = 8*C(N/8) + 4 + 2 + 1

since N = 2^n and we are still at k=3...

C(2^n) = 2^k*C(N/2^k) + (2^k - 1)

Now we can continue until k = n = log N (base 2), which would lead to...

C(2^n) = 2^n*C(N/2^n) + (2^n - 1)

and since C(1) = 0, we have

C(N) = N - 1
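
As a sanity check on the closed form, the recurrence can be evaluated directly and compared against N - 1 for powers of 2. This sketch only evaluates the recurrence as written above; it does not reproduce the exercise solution itself, and the names are hypothetical.

```java
class RecurrenceDemo {
    // Evaluates C(N) = 2*C(N/2) + 1 with C(1) = 0, for N a power of 2.
    static long comparisons(long n) {
        if (n == 1) { return 0; }
        return 2 * comparisons(n / 2) + 1;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 1024; n *= 2) {
            System.out.println("N=" + n + "  C(N)=" + comparisons(n) + "  N-1=" + (n - 1));
        }
    }
}
```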

1.11 Thoughts on HW2

The rubric is posted, together with a video giving a high-level overview.

Is anyone working on the bonus questions?

1.12 Daily Question

The assigned daily question is DAY09 (Problem Set DAY09)

If you have any trouble accessing this question, please let me know immediately on Discord.