CS 2223 Mar 18 2022

Lecture Path: 04

Expected reading: pp. 132-141 (1.3, Implementation Collections), 176-183 (1.4, Analysis of experimental data), p. 185 (Useful approximations for the analysis of algorithms)
Lecture Challenges: Continuing Anagram hunt

Sample exam question: Postfix processing
Classical selection: Tchaikovsky: Piano Concerto No. 1 (1874)
Musical Selection: Village People: YMCA (1978)
Visual Selection: Napoleon Bonaparte, First Consul, crossing the Alps at Great St. Bernard Pass (1803)
Live Selection: La Campanella (Liszt, 1851) (Evgeny Kissin, 2010)

Daily Question: DAY04 (Problem Set DAY04)

1 Arrays are Structures, not types

1.1 Announcements

I have an anonymous canvas quiz/survey asking you to report on your status of HW1. Please take a few seconds to go there and provide feedback.

Yesterday’s handout is available in Canvas under Files | handouts | day 03 binary array search notes.

I will be holding my first Museum walk today. As I mentioned in the course announcements, this is one of the social activities that I am putting together in the first few weeks. If you have time in your schedule and would like a break from academic activities, let’s meet at 11 AM at the bottom of the Fuller Stairs and walk over to the museum. We will be back by around noon or 12:15, depending on the size of the group.

1.2 Important concepts from readings

Arrays are a data structure not a type
We struggle with, agonize over and bluster heroically about the great questions of life when the answers to most of these lie hidden in our attitude toward the thousand minor details of each day.
Robert Grudin
Time and the art of living
If you use array for Stack (or Queue), then you are responsible for managing size dynamically
Analysis of Experimental data

1.3 Reminder of key operations

Operation	Bag	Queue	Stack
add	add(Item)	enqueue(Item)	push(Item)
remove	--	dequeue()	pop()
size	size()	size()	size()
isEmpty	isEmpty()	isEmpty()	isEmpty()
--	--	--	--
iterator	any order	dequeue order	pop order

The iterators allow you to view the elements of these data types without changing their state.

We have not yet shown how to define an operation that determines whether one of these data types contains a specific value. Use the iterator to retrieve all values without updating the state of the data type:

public void output(FixedCapacityStackOfStrings stack) {
    for (String s : stack) {
        StdOut.println(s);
    }
}
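The same iteration idea can answer the "contains" question raised above. What follows is only a sketch, not code from the course repository: the class name IterableSearch and the method contains are hypothetical, and java.util.ArrayDeque stands in for a course Stack. Any type that supports the for-each loop (that is, implements Iterable) could be passed in.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

// Hypothetical helper, not part of the course repository.
class IterableSearch {
    // Returns true if target appears in items. It only reads through the
    // iterator, so the collection's state is left unchanged.
    static <T> boolean contains(Iterable<T> items, T target) {
        for (T item : items) {
            if (Objects.equals(item, target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();   // stand-in for a course Stack
        stack.push("to");
        stack.push("be");
        System.out.println(contains(stack, "be"));   // true
        System.out.println(contains(stack, "not"));  // false
        System.out.println(stack.size());            // 2, nothing was removed
    }
}
```

In the worst case the loop visits every element, so this search takes time proportional to the number of items stored.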

1.4 Answer to question from yesterday

Yesterday I asked the following question:

You are given a train track arrangement with an incoming track containing (from left to right) cars 1, 2, 3, ..., N. There is a single spur that can contain up to N cars, and a single outgoing track. What is the largest number of distinct permutations that you can achieve on the outgoing track? There are N! possible permutations in total.


This is an interesting problem (which has already been solved before):

Answers

N	N!	Num Possible
1	1	1
2	2	2
3	6	5
4	24	14
5	120	42
6	720	132
7	5040	429
8	40320	1430
9	362880	4862

Check out the implementation in SingleSpurProblem. The solution makes an interesting use of Stacks, if you would like to check it out. This problem is a bit more challenging than what I could put on a homework.

Interested in more insight into the sequence 1, 2, 5, 14, 42, 132, 429, 1430, 4862, ... ? Check out https://oeis.org/A000108.
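The "Num Possible" column can be reproduced by brute force for small N. The sketch below is not the SingleSpurProblem implementation from the repository; the class SpurPermutationCounter and its methods are hypothetical names. It treats the spur as a stack, tries every legal sequence of moves, and collects the distinct outgoing orders in a set. A car that goes straight through is modeled as a move onto the spur followed immediately by a move off it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical brute-force counter; not the course's SingleSpurProblem class.
class SpurPermutationCounter {
    // Number of distinct outgoing orders for cars 1..n using one spur (a stack).
    static int count(int n) {
        Set<String> seen = new HashSet<>();
        explore(1, n, new ArrayDeque<>(), "", seen);
        return seen.size();
    }

    // next: next car waiting on the incoming track; spur: cars parked on the spur.
    private static void explore(int next, int n, Deque<Integer> spur,
                                String out, Set<String> seen) {
        if (next > n && spur.isEmpty()) {
            seen.add(out);             // every car has reached the outgoing track
            return;
        }
        if (next <= n) {               // move the next incoming car onto the spur
            spur.push(next);
            explore(next + 1, n, spur, out, seen);
            spur.pop();                // undo the move before trying the alternative
        }
        if (!spur.isEmpty()) {         // move the top spur car to the outgoing track
            int car = spur.pop();
            explore(next, n, spur, out + car + " ", seen);
            spur.push(car);            // undo the move
        }
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 9; n++) {
            System.out.println(n + "\t" + count(n));
        }
    }
}
```

The running time grows quickly with N, so this is only practical for small inputs such as the ones in the table.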

1.5 Daily Questions

So far, three daily questions are behind us. I have fully scored question 1 (and am trying to figure out how to get these results into Canvas). I have scored a subset of the next two questions, and this effort will soon involve the TAs in the class so I can keep up with these daily questions.

I will fill in these questions after they have been asked!

1.6 Data Type vs. Data Structures

Some languages, such as Java, allow you to determine the size of an array, but this is used by programmers to write safer code and doesn’t change the behavior of an array.

An array is a data structure, not a data type. You can be quite certain of that. The reason is simple. There is no API for an array. It offers two specific capabilities, namely, get the nth item and set the nth item.

In addition, you can’t even resize most arrays. The best you can do is create a new array that is larger (or smaller) than the original one, and then copy elements from the original array into the new structure. The obvious conclusion is that resizing a stack takes time directly proportional to the stack size. This point will come up later in this lecture.

This distinction is important because we are going to use arrays to implement a number of data types, starting with Stack.

1.7 Fixed Capacity Stack Implementation

You have seen the FixedCapacityStackOfStrings implementation (p. 133). If you continue to p. 135, you will see an implementation that uses the Java Generics capability to define a Stack of any type of element. The functionality is the same; the generic version is just easier to program with.

The generic implementation can be found in the Git repository, in class FixedCapacityStack.

public class FixedCapacityStack<E> {
    private E[] a;    // holds the items
    private int N;    // number of items in stack

    // create an empty stack with given capacity
    public FixedCapacityStack(int capacity) {
        a = (E[]) new Object[capacity];
        N = 0;
    }

    public boolean isEmpty()  { return N == 0; }
    public boolean isFull()   { return N == a.length; }
    public void push(E item)  { a[N++] = item; }
    public E pop()            { return a[--N]; }
}

Here are some questions to ask

Are you familiar with the N++ and --N operators?

What does the field N represent? Can you put it in words?

Can you make the E[] a field have the final modifier?

Can you make the int N field have the final modifier?

The biggest limitation is that the stack has a fixed capacity: once the array is full, there is no room for another item, so we need a strategy for expanding storage on demand.

1.8 Expandable Fixed Capacity Stack Implementation

Pages 136-137 describe how to resize the stack as needed to ensure it grows to support the full set of objects being pushed onto the stack. This implementation is efficient for several reasons:

Memory efficient – once popped, the former value in the array location is set to null. This improves Garbage Collection, and it also makes the stack easier to understand within the debugger.
Amortized constant performance – A tricky statistical consideration.

The goal, as stated by Sedgewick, is to ensure that "push and pop operations take time independent of the stack size." (p. 132). How can you be sure this is true if you know that resizing the stack takes time directly proportional to the stack size?

Statistics to the rescue!

Let’s say you did N operations, and each one took a fixed amount of time. For simplicity, we’ll call this a single unit of time, without regard to whether it is seconds or hours in length. Once completed, these N operations will have taken up N units of time.

1 + 1 + 1 + 1 + ... + 1 + 1 = N time units

So the average is N/N or 1 time unit.

All good so far. What if some operations take longer? For example, what if each successive operation takes one more unit of time than the previous? With N operations, you would then have

1 + 2 + 3 + 4 + ... + N-1 + N = N*(N+1)/2 time units

Here the average is (N+1)/2 time units. No longer is the time of an operation considered to be independent of other operations. In addition, the average is no longer a constant number.

Now let’s review the behavior of Stack. As long as the stack is not full, you can be assured that each push or pop operation will take time "independent of stack size", so it can be treated as a single time unit.

Now, what if you could guarantee that, given a sequence of N operations, N-1 of the operations would require a single time unit while just one would require N time units? Note that it is important that N is the same value in both roles (counting the number of operations and measuring the time of the one expensive operation). The reason is that this special operation would run "in time that is dependent on the size of the input" (to twist the statement on page 132).

Now your summation becomes:

1 + 1 + 1 + 1 + ... + 1 + N

That is, 1 added N-1 times, and N added one final time. The total is 2N-1 time units, which when divided by N gives an average of 2 - 1/N. This average stays below 2 and approaches 2 as N increases, so it can be treated as a constant.

How do you ensure this nice distribution? First, let’s review the code (which I have placed in ResizingArrayStackResizeStrategies). Here is the revised push function (a bit simplified from my actual code, which is instrumented to show why this strategy is the right choice).

public void push(Item item) {
    if (N == a.length) { resize(2*a.length); }
    a[N++] = item;    // add item
}

Why double the size of the array?

So only when the array is FULL does the resize operation take place. But doesn’t it seem like OVERKILL to double the size of the array, just to make room for one more item?

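Here’s the thing. Empirical studies have consistently shown that this is the best approach when you have no idea as to the number of elements that will be pushed onto the stack. Because the capacity doubles each time, a sequence of N pushes triggers only about log N resizes, and the total number of elements copied across all of those resizes stays proportional to N.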

ResizingArrayStackResizeStrategies demonstrates the benefit. When you compare two separate runs, how much SLOWER is the one that only extends linearly, say by 100 positions?

Resize by extending by 100 positions
Time: 0.06300 (size = 59982, #Resize = 600) Avg=0.03316 +/- 0.01404
Time: 0.03200 (size = 59950, #Resize = 600) Avg=0.00954 +/- 0.01216
Time: 0.04600 (size = 60292, #Resize = 603) Avg=0.01759 +/- 0.01105
Time: 0.03200 (size = 60146, #Resize = 602) Avg=0.00665 +/- 0.00973
Time: 0.03100 (size = 60226, #Resize = 603) Avg=0.01305 +/- 0.01278

Resize by doubling
Time: 0.00000 (size = 60004, #Resize = 15) Avg=0.00000 +/- 0.00000
Time: 0.01600 (size = 60254, #Resize = 15) Avg=0.00996 +/- 0.00776
Time: 0.00000 (size = 60152, #Resize = 15) Avg=0.00000 +/- 0.00000
Time: 0.01600 (size = 60296, #Resize = 15) Avg=0.00951 +/- 0.00786
Time: 0.00000 (size = 60070, #Resize = 15) Avg=0.00000 +/- 0.00000
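The gap between the two strategies can also be seen without a stopwatch by counting how many array elements get copied. The following is a rough cost model, not the instrumented ResizingArrayStackResizeStrategies code: the class ResizeCostModel is hypothetical, it assumes an initial capacity of 2, and it assumes a run of pushes with no pops in between.

```java
// Hypothetical cost model; counts array-element copies instead of measuring time.
class ResizeCostModel {
    // Returns {number of resizes, total elements copied} for n consecutive pushes.
    // If doubling is false, capacity grows by a fixed step of 100 slots.
    static long[] simulate(int n, int initialCapacity, boolean doubling) {
        long resizes = 0;
        long copies = 0;
        int capacity = initialCapacity;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {                 // array is full: resize first
                capacity = doubling ? 2 * capacity : capacity + 100;
                copies += size;                     // every stored item is copied
                resizes++;
            }
        }
        return new long[] { resizes, copies };
    }

    public static void main(String[] args) {
        int n = 60000;
        long[] linear = simulate(n, 2, false);
        long[] doubled = simulate(n, 2, true);
        System.out.println("extend by 100: resizes=" + linear[0] + " copies=" + linear[1]);
        System.out.println("doubling:      resizes=" + doubled[0] + " copies=" + doubled[1]);
    }
}
```

Under these assumptions, 60000 pushes trigger 600 resizes when extending by 100 and 15 resizes when doubling, which lines up with the #Resize values reported above. The copy totals differ far more sharply: about 18 million element copies versus about 65 thousand.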

Shrink size of array and prevent loitering

Note that the Resize capability has two distinct features. First, once the array is only 1/4 full, its size is cut in half. This is the opposite logic of the resize, if you think about it. Also, it makes sure to null out any entries once a value has been popped. This helps during the Garbage collection process.

public Item pop() {
    if (isEmpty()) throw new NoSuchElementException("Stack underflow");
    Item item = a[N-1];
    a[N-1] = null;    // to avoid loitering
    N--;
    if (N > 0 && N == a.length/4) {    // shrink size of array
        resize(a.length/2);            // if necessary
    }
    return item;
}
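Both push and pop rely on a resize helper that is not shown in these notes. A minimal sketch of how the pieces could fit together follows. The class name ResizingStackSketch, the capacity() accessor, and the initial capacity of 2 are assumptions made for illustration, and the repository code may differ.

```java
import java.util.NoSuchElementException;

// Minimal sketch; the class name and capacity() accessor are hypothetical.
class ResizingStackSketch<Item> {
    private Item[] a;    // holds the items
    private int N;       // number of items in stack

    @SuppressWarnings("unchecked")
    ResizingStackSketch() {
        a = (Item[]) new Object[2];    // assumed initial capacity
        N = 0;
    }

    // Copy the N live items into a new array of the requested capacity.
    // This loop is why a resize costs time proportional to the stack size.
    @SuppressWarnings("unchecked")
    private void resize(int capacity) {
        Item[] copy = (Item[]) new Object[capacity];
        for (int i = 0; i < N; i++) {
            copy[i] = a[i];
        }
        a = copy;
    }

    public boolean isEmpty() { return N == 0; }
    public int size()        { return N; }
    public int capacity()    { return a.length; }

    public void push(Item item) {
        if (N == a.length) { resize(2 * a.length); }    // grow only when full
        a[N++] = item;
    }

    public Item pop() {
        if (isEmpty()) throw new NoSuchElementException("Stack underflow");
        Item item = a[--N];
        a[N] = null;                                    // avoid loitering
        if (N > 0 && N == a.length / 4) {               // shrink when 1/4 full
            resize(a.length / 2);
        }
        return item;
    }
}
```

Shrinking at one quarter full rather than one half leaves the halved array only half full, so the very next push cannot force another resize.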

Discussion on the size of the initial array for the stack.

Did anyone see the initial size of the array used for a newly created stack? Any thoughts?

1.9 In-class question

You are given a stack of elements with storage of size 4 and containing two elements as shown. Propose a sequence of operations that results in the revised stack with storage of size 8 and containing 5 elements as shown.

Figure 1: Open Discussion

1.10 Iteration

Review the discussion on pages 138-139, which describes how to traverse all elements of an aggregate data type. In this case, the structure is an array, so a ReverseArrayIterator is developed as an inner class of ResizingArrayStackResizeStrategies.

Briefly, why does a Stack need a reverse array iterator instead of a regular forward iterator?

You can review the ArrayIterator sample code I have provided for iterating in forward direction over an array of elements.
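For comparison with the forward ArrayIterator, here is one way a reverse iterator over an array could look. It is a stand-alone sketch with a hypothetical class name, ReverseArrayIterable, whereas the book and the lecture code define the iterator as an inner class of the stack itself. The array and the count of used positions are passed in explicitly here only to keep the example self-contained.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-alone version; in the lecture code the iterator is an inner class.
class ReverseArrayIterable<T> implements Iterable<T> {
    private final T[] a;    // backing array; positions 0..n-1 are in use
    private final int n;    // number of items stored

    ReverseArrayIterable(T[] a, int n) {
        this.a = a;
        this.n = n;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int i = n;                 // one past the top of the stack

            @Override
            public boolean hasNext() { return i > 0; }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return a[--i];                 // most recently pushed item first
            }
        };
    }
}
```

Because push stores each new item at the next higher index, walking from index n-1 down to 0 visits the items in the same order that repeated pop calls would return them.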

1.11 Analysis of Experimental Data: Order of growth

Sedgewick presents the case for running benchmarks on your data to determine runtime performance. From this data, you can determine trends by plotting the results.

The goal is to determine Order of Growth for performance (p. 180). Here is sample output for DoublingTest, comparing results from 2015, 2018, and 2021.

Growth Hypothesis?

N	2015	2018	2021
250	0.0	0.0	0.0
500	0.1	0.0	0.0
1000	0.5	0.1	0.0
2000	4.0	0.5	0.6
4000	31.4	4.1	2.2
8000	259.4	31.7	17.3
16000	-	257.5	139.7

These results were run on three different computers, but they exhibit, more or less, the same growth pattern. What can you make of these numbers? Perhaps you might use Excel to compute Trendlines for the available data.

Using this model, you might be able to estimate the time it would take to perform the computation with N=32000 elements. The difference in specific values from column 1 to column 3 reflects the improved hardware performance (and perhaps other factors), since the code is identical.
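An alternative to fitting a trendline in Excel is to look at the ratio between consecutive rows, since N doubles from one row to the next. The sketch below uses the last two entries of the 2021 column. The class DoublingEstimate and its methods are hypothetical names, and the prediction assumes that the most recent ratio continues to hold for the next doubling.

```java
// Hypothetical helper that applies the doubling-ratio idea to the 2021 column.
class DoublingEstimate {
    // Estimated exponent b in T(N) ~ a * N^b, from two timings where N doubles.
    static double exponent(double timeN, double time2N) {
        return Math.log(time2N / timeN) / Math.log(2);
    }

    // Predicted time for the next doubling, assuming the last ratio persists.
    static double predictNext(double timeN, double time2N) {
        return time2N * (time2N / timeN);
    }

    public static void main(String[] args) {
        double t8000 = 17.3;      // 2021 column, N = 8000
        double t16000 = 139.7;    // 2021 column, N = 16000
        System.out.printf("exponent ~ %.2f%n", exponent(t8000, t16000));
        System.out.printf("T(32000) ~ %.0f (same units as the table)%n",
                          predictNext(t8000, t16000));
    }
}
```

In all three columns the last two rows differ by a factor close to 8, which corresponds to an exponent close to 3.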

Develop a mathematical model (following upon Knuth’s foundations) in which the running time of a program is determined by:

Cost of executing each statement
The frequency of execution of each statement

Sedgewick uses Tilde approximation which abstracts most code blocks into constant execution, relying on identifying the most frequently executed operations. Typically these are deep within a nested for loop, or you can determine the number of recursions that a recursive function call makes.

I will replace any references to Tilde approximations with Big O notation. But I will start in an informal manner. So far we have seen four distinctly different growth patterns:

Constant – When the time to perform an operation is independent of N, the size of the problem instance. Think pop() or size() for a stack.
Logarithmic – When the time to perform an operation is directly proportional to log N, where N is the size of the problem instance. Think BINARY ARRAY SEARCH.
Linear – When the time to perform an operation is directly proportional to N, the size of the problem instance. Think of finding the largest value in an unordered array.
Cubic – When the time to perform an operation is directly proportional to N^3, where N is the size of the problem instance. Think ThreeSumModified.

1.12 Sample Exam Question: solved

There is an alternate notation known as postfix notation. The above equation would be represented as "1 4 5 * 2 3 + * +". As its name implies, in postfix notation the operator comes after the arguments. Based on the structure of Dijkstra’s algorithm, devise a one stack solution to compute the value of this sequence of tokens.

On the exam, I would ask you to describe this algorithm using pseudocode:

This only shows for "+" and "*" but you can clearly see how to extend to other binary operators. This would be a suitable answer on an exam.

stack = new Stack
while more input available
    s = next token
    if s is "*" then
        pop off last two values from stack and push back their product
    else if s is "+" then
        pop off last two values from stack and push back their sum
    else if s is value then
        push numeric interpretation of s onto stack
pop value from stack and print it out
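For anyone who wants to run the idea, the pseudocode translates almost line for line into Java. This is a sketch only: the class name PostfixEvaluator is hypothetical, java.util.ArrayDeque stands in for the course Stack, tokens are assumed to be separated by whitespace, and no error handling is attempted for malformed input.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class name; handles only "+" and "*", like the pseudocode.
class PostfixEvaluator {
    static double evaluate(String expression) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String s : expression.trim().split("\\s+")) {
            if (s.equals("*")) {
                double right = stack.pop();
                double left = stack.pop();
                stack.push(left * right);
            } else if (s.equals("+")) {
                double right = stack.pop();
                double left = stack.pop();
                stack.push(left + right);
            } else {
                stack.push(Double.parseDouble(s));    // token is a value
            }
        }
        return stack.pop();                           // final result
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1 4 5 * 2 3 + * +"));    // prints 101.0
    }
}
```

For "+" and "*" the order of the two popped operands does not matter, but keeping left and right distinct makes it straightforward to add "-" and "/" later.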

1.13 Sample Exam Question

This question assumes that you have a Stack of elements.

Write pseudocode for a function that takes a Stack of elements and modifies the stack so that its bottom two elements are swapped with each other.

1.14 Lecture Takeaways

Arrays can store aggregate information. For dynamic behavior, you can both grow and shrink an array.
Homework 1 Rubric is posted

1.15 Thoughts on Homework 1

So far my office hour attendance has been more social than answering questions that students have. This might mean that students are taking care of business and do not have lots of questions. Of course it could also mean that you do have questions, but you are going to the TA office hours (great!). If, however, you haven’t started the homework yet, please take my advice and delay no longer...

1.16 Lecture Challenges

Given the 10 base-10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8 and 9) can you construct a mathematical expression solely of addition that produces the value 100?

For example, 18 + 29 + 30 + 4 + 5 + 6 + 7 = 99. So close! Note how each digit is used exactly once. In trying to solve this problem, try to make statements of fact that you can use to guide your solution. For example, you could start by realizing that no three digit number would ever be used, because that is already over the target total of 100.

1.17 Daily Question

The assigned daily question is DAY04 (Problem Set DAY04)

If you have any trouble accessing this question, please let me know immediately on Discord.

1.18 Interview Challenge

Each Friday I will post a sample interview challenge. During most technical interviews, you will likely be asked to solve a logical problem so the company can see how you think on your feet, and how you defend your answer.

There are five cups placed on a table upside down in a line. Let’s label them "A B C D E". Under one of the cups is a terrified squirrel. Your goal is to find the squirrel under the following restrictions.

You can lift up exactly one cup at a time. If you find the squirrel you win! If not, then you put the cup back in its original position, upside down again, and turn around and wait for five seconds.

When you are turned around, the squirrel has just enough time to move to one of the cups that is a direct neighbor to the one it was hiding underneath. For example, if the squirrel had been hiding in cup "C", it could move to cups "B" or "D". If it had been hiding under cup "E", then it can only move to cup "D".

Devise an algorithm that will find the squirrel after picking up a finite number of cups.

I will post my solution on Monday.