CS 2223 Mar 24 2021
1 A Journey Of A Thousand Miles Begins With A Single Step
1.1 Definition
1.2 Science
1.3 Scientific Method
1.4 Skills
1.5 Lecture Challenges
1.5.1 Closed form formulas
1.5.2 Anagrams
1.5.3 Thought-provoking Questions
1.6 Preparing for Mar 25 2021
1.7 Daily Question
1.8 Three Things To Do Today!
1.9 References
1.10 Version : 2021/03/23


Lecture Path: 01

Expected reading: Today is the only day that you will have no expected readings
Expected demonstration: BINARY ARRAY SEARCH
Lecture Challenges: Closed Form, Anagrams, Defective Search

Visual Selection:

Daily Question: DAY01

For the remainder of this course, I expect you to have completed all readings and demonstrations prior to the start of the lecture on which they are listed.

Before presenting this lecture, I will review the course structure.

Diamonds are found only in the dark places of the earth, truths are found only in the depths of thought.
Victor Hugo
Les Miserables

1 A Journey Of A Thousand Miles Begins With A Single Step

A perfect algorithm is like an artistic study in miniature. Consider the amazing portrait of Ginevra de’ Benci by Leonardo da Vinci [1474/1478]

Can you zoom in and find Leonardo’s fingerprint? Hint: it is near something blue...

Leonardo da Vinci painted this portrait – only 15x15 inches in size – when he was just 21 years old. It signaled a revolution in portraiture, fundamentally changing the way artists approached the subject. There is nothing extra in this artwork. Every brushstroke is meaningful and I encourage you to "zoom in" on the portrait. Click on the link below the image and you will be brought to the National Gallery of Art web site where you can click on the image there to explore. Right-click to see the mouse commands; you can double-click to zoom in and then move around to see the incredible detail of the picture.

Now let’s look at another elegant miniature, this time the BINARY ARRAY SEARCH algorithm, which determines whether a sorted collection contains a target item. Here is a description from Jon Bentley’s amazing book, Programming Pearls:

BINARY [ARRAY] SEARCH solves the problem [of searching within a pre-sorted array] by keeping track of a range within the array in which T [i.e. the sought value] must be if it is anywhere in the array. Initially, the range is the entire array. The range is shrunk by comparing its middle element to T and discarding half the range. The process continues until T is discovered in the array, or until the range in which it must lie is known to be empty. In an N-element table, the search uses roughly log2(N) comparisons.

This algorithm, so briefly stated, is the foundation of many efficient algorithms. It is easily stated, but not so easily implemented. Bentley reports that, given this description and several hours, 90% of professional programmers are unable to code this algorithm correctly. What follows is a Java implementation:

BINARY ARRAY SEARCH

I identify algorithms by name in the margin using ALL CAPS.

Just like Owen Meany would do.

Explore the code for BinaryIntSearch.

    boolean contains(int[] collection, int target) {
        int low = 0;
        int high = collection.length-1;
        while (low <= high) {
            int mid = (low+high)/2;
            int rc = collection[mid] - target;
            if (rc < 0) {
                low = mid+1;
            } else if (rc > 0) {
                high = mid-1;
            } else {
                return true;
            }
        }
        return false;
    }

Given the above program, I have some simple questions to ask:

Important questions are highlighted in orange. Pay attention to these!

By the end of this course, I want all students to appreciate the elegance and beauty of this algorithm. Its simplicity is legendary. And just as important, one needs correct implementations that are validated to handle all cases, especially the boundary cases that make it hard to translate algorithms into code.
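As a minimal sketch of what validating the boundary cases might look like, the class below copies the contains logic shown above (made static for convenience) and checks an empty array, a one-element array, the first and last positions, and values that are absent. The class name BinaryIntSearchChecks and the chosen test values are illustrative assumptions, not part of the course code.

```java
// Sketch only: boundary-case checks for the lecture's contains method.
// The class name and the test values are hypothetical, chosen for illustration.
class BinaryIntSearchChecks {

    // Same logic as the lecture's contains method, made static for easy calling.
    static boolean contains(int[] collection, int target) {
        int low = 0;
        int high = collection.length - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            int rc = collection[mid] - target;
            if (rc < 0) {
                low = mid + 1;
            } else if (rc > 0) {
                high = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }

    // Returns true only if every boundary case behaves as expected.
    static boolean allChecksPass() {
        int[] empty = {};
        int[] one = {7};
        int[] many = {1, 3, 5, 7, 9, 11};

        boolean ok = true;
        ok &= !contains(empty, 7);                        // nothing to find
        ok &= contains(one, 7);                           // single element, present
        ok &= !contains(one, 6) && !contains(one, 8);     // single element, absent
        ok &= contains(many, 1) && contains(many, 11);    // first and last positions
        ok &= !contains(many, 0) && !contains(many, 12);  // below and above the range
        ok &= !contains(many, 4);                         // gap between stored values
        for (int v : many) {
            ok &= contains(many, v);                      // every stored value is found
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(allChecksPass()
                ? "all boundary checks passed"
                : "a boundary check failed");
    }
}
```

Checks like these do not prove the code correct, but they exercise the places where off-by-one mistakes tend to hide.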

There are some subtle features of the above code that you might not notice at first glance:

The following table reports the result of an experiment. We generate a sorted array of 65,536 unique integers drawn from the range [-16777216, 16777216]. We then create a new list of 65,536 target integers in the range [-33554432, 33554432]. The target range is wider since we want to ensure we can search for numbers that are both too low and too high.

Any thoughts on the probability that a target search actually finds a value from the original sorted array? That is, of the T=33,554,432 individual searches, how many times will a random target value be successfully found in the original sorted array? Can you come up with a formula that estimates this number based on the definition of T, the initial sorted array, and the random target values?
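One possible way to start on this estimate is sketched below. It assumes each target is drawn uniformly and independently from the inclusive range [-33554432, 33554432]; the notes do not state the distribution, so treat that as an assumption. The symbols M and p are introduced here only for illustration.

```latex
\begin{align*}
M &= 2 \cdot 33{,}554{,}432 + 1 = 67{,}108{,}865
  && \text{(possible target values)} \\
p &= \frac{65{,}536}{M} \approx 9.77 \times 10^{-4} \approx \frac{1}{1024}
  && \text{(chance that one target is in the array)} \\
E[\text{hits}] &= T \cdot p = 33{,}554{,}432 \cdot \frac{65{,}536}{67{,}108{,}865} \approx 32{,}768
  && \text{(expected successful searches)}
\end{align*}
```

Because the same 65,536 targets are searched in every one of the 512 passes, the count observed in a single run is always 512 times the number of hits in one pass (about 64 expected under this assumption).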

We then want to generate a single trial run. Because computers are so fast these days, in each run we carry out the following task 512 times: search for each of the 65,536 target values in order. Thus there will be a total of 512 x 65,536 = 33,554,432 invocations of the contains method. All reported results are in seconds.
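The trial described above might be sketched as follows. This is not the harness used to produce the reported timings: the class name SearchTrialSketch, the helper names, the fixed seed, and the use of java.util.Random are all assumptions made for illustration, and the timing does no JVM warm-up.

```java
import java.util.Random;
import java.util.TreeSet;

// Sketch only: one way to set up and time a trial like the one described.
// All names and the seed are hypothetical; the real harness may differ.
class SearchTrialSketch {

    // Same logic as the lecture's contains method.
    static boolean contains(int[] collection, int target) {
        int low = 0;
        int high = collection.length - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            int rc = collection[mid] - target;
            if (rc < 0) {
                low = mid + 1;
            } else if (rc > 0) {
                high = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }

    // Draws `count` distinct integers from [-bound, bound], returned in sorted order.
    // Requires count <= 2*bound + 1, otherwise the loop cannot finish.
    static int[] sortedUnique(int count, int bound, Random rnd) {
        TreeSet<Integer> seen = new TreeSet<>();
        while (seen.size() < count) {
            seen.add(rnd.nextInt(2 * bound + 1) - bound);
        }
        int[] result = new int[count];
        int i = 0;
        for (int v : seen) {
            result[i++] = v;
        }
        return result;
    }

    // Draws `count` integers (duplicates allowed) from [-bound, bound].
    static int[] randomTargets(int count, int bound, Random rnd) {
        int[] targets = new int[count];
        for (int i = 0; i < count; i++) {
            targets[i] = rnd.nextInt(2 * bound + 1) - bound;
        }
        return targets;
    }

    // Searches for every target, in order, `passes` times; returns the number of hits.
    static long runTrial(int[] sorted, int[] targets, int passes) {
        long hits = 0;
        for (int p = 0; p < passes; p++) {
            for (int t : targets) {
                if (contains(sorted, t)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Random rnd = new Random(2223); // arbitrary fixed seed for repeatability
        int[] sorted = sortedUnique(65_536, 16_777_216, rnd);
        int[] targets = randomTargets(65_536, 33_554_432, rnd);

        long start = System.nanoTime();
        long hits = runTrial(sorted, targets, 512);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.printf("%d searches, %d hits, %.3f seconds%n",
                512L * targets.length, hits, seconds);
    }
}
```

Returning the hit count from runTrial gives the loop an observable result, which makes it less likely that the JIT compiler discards the searches as dead code.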

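| Program | Trial | PC | cs | ccc | PC/JS | iPhone/JS | iPhone8/JS |
|---|---|---|---|---|---|---|---|
| eq_lt_gt_div | trial-1 | 1.763 | 3.296 | 2.001 | 393.725 | 757.193 | 118.894 |
| eq_lt_gt_div | trial-2 | 1.997 | 3.170 | 2.033 | 392.119 | 766.366 | 112.891 |
| eq_lt_gt_div | trial-3 | 2.012 | 3.170 | 1.963 | 397.914 | 921.355 | 115.374 |
| eq_lt_gt_div | trial-4 | 1.997 | 3.170 | 1.877 | 409.038 | 770.85 | 116.339 |
| eq_lt_gt_div | trial-5 | 2.012 | 3.171 | 1.897 | 416.332 | 775.169 | 117.99 |
| gt_eq_lt_shift | trial-1 | 1.279 | 2.490 | 1.639 | 1.284 | 15.696 | 1.612 |
| gt_eq_lt_shift | trial-2 | 1.607 | 2.222 | 1.508 | 1.284 | 13.858 | 1.435 |
| gt_eq_lt_shift | trial-3 | 1.560 | 2.223 | 1.549 | 1.374 | 13.846 | 1.364 |
| gt_eq_lt_shift | trial-4 | 1.513 | 2.222 | 1.536 | 1.279 | 13.859 | 1.368 |
| gt_eq_lt_shift | trial-5 | 1.513 | 2.222 | 1.544 | 1.294 | 13.847 | 1.343 |
| lt_gt_eq_div | trial-1 | 1.841 | 3.107 | 1.961 | 22.367 | 25.975 | 3.984 |
| lt_gt_eq_div | trial-2 | 2.044 | 3.071 | 1.937 | 21.414 | 25.881 | 3.984 |
| lt_gt_eq_div | trial-3 | 2.028 | 3.071 | 1.952 | 21.609 | 25.008 | 3.886 |
| lt_gt_eq_div | trial-4 | 2.028 | 3.069 | 1.849 | 21.772 | 27.017 | 4.083 |
| lt_gt_eq_div | trial-5 | 2.059 | 3.072 | 1.893 | 21.86 | 24.929 | 3.895 |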

Want to test your own mobile device? Visit RunTrial to test out the speed of your device (warning: This might take up to an hour to run).

To see why the code has such disparate runtime performance, check out Raw Source Code
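The program names in the table hint that the variants differ in the order of the comparisons and in how the midpoint is computed (division versus shift). That reading is an assumption; the linked source is the authority on what each variant actually does. As a sketch only, a Java variant in the spirit of "gt_eq_lt_shift" could look like the following, where the class name ShiftVariantSketch is hypothetical.

```java
// Sketch only: a guess at what a shift-based variant might look like.
// The class name is hypothetical and this is not the author's benchmarked code.
class ShiftVariantSketch {

    static boolean contains(int[] collection, int target) {
        int low = 0;
        int high = collection.length - 1;
        while (low <= high) {
            // Unsigned shift gives the right midpoint even if low + high overflows int.
            int mid = (low + high) >>> 1;
            int value = collection[mid];
            // Compare directly instead of subtracting, so extreme values cannot overflow.
            if (value > target) {
                high = mid - 1;
            } else if (value == target) {
                return true;
            } else {
                low = mid + 1;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] extremes = {Integer.MIN_VALUE, 0, Integer.MAX_VALUE};
        System.out.println(contains(extremes, Integer.MIN_VALUE));
        System.out.println(contains(extremes, 1));
    }
}
```

Two properties of this sketch are independent of the benchmark. The expression collection[mid] - target overflows when the two values are far apart (for example, 0 - Integer.MIN_VALUE wraps to a negative number), whereas direct comparison does not. The values used in the experiment are small enough that this overflow cannot occur there.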

What patterns can you see in the above table? Do you have more questions after seeing these results? We will spend a good amount of time in this course learning how to properly conduct experiments and benchmark results.

1.1 Definition

An Algorithm is a finite, deterministic and effective problem-solving method suitable for implementation as a computer program.

This course is the study of algorithms, which are the fundamental building blocks of computer science.

1.2 Science

So says Wikipedia

Science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe.

This course, CS 2223 Algorithms, aims to introduce you to the study of the fundamental concepts that relate to computational structures, otherwise known as computer programs. We are concerned both with ideas and with the functional expression of those ideas as computer programs. We are glad when a computer program produces the correct answer to a problem, but we are excited when we can prove statements about the problem that help us predict the runtime performance of any implementation, in any programming language, that attempts to solve the same problem.

In presenting the domain of algorithms I would like to remind everyone of similar efforts made by countless scientists over the centuries. For example, what is so special about the following organism?

If you have ever taken a course in biology then you should be familiar with the fruit fly, one of the most studied organisms in scientific history. Researchers study fruit flies because they contain "a wealth of biological data that makes them attractive to study as examples for other species and other natural phenomena that are more difficult to study directly." Drosophila has 13,600 genes and it has been fully sequenced. Scientists study Drosophila not because it is important in and of itself, but because it provides a platform for study.

The power of modeling is most evident when considering the following scrap of paper. Can anyone recognize what it is?

Figure 3: Back of envelope computation

As an aside, if the Ancient Greeks had focused less on geometry and more on computation, who knows how different the world would have been. Squaring the Circle was the ultimate problem: find a square of equal area to a circle using only a finite number of steps with compass and straightedge.

Even Leonardo da Vinci spent countless hours in this pursuit.



Unfortunately, as stated, the problem is impossible.

In this course, we will study a number of fields in computer science that might not seem interesting. Consider the task of sorting in ascending order a collection of arbitrary values. The Sorting domain is one of the most studied fields in all of computer science. Advances in understanding how to write efficient sorting code have led to a deep understanding of Divide and Conquer algorithms.

1.3 Scientific Method

Never forget that this is a computer science class. We approach computational structures with the mind of a scientist. That is:

1.4 Skills

The entire course is focused on learning specific skills in algorithms.

Throughout this course, I will continue to draw your attention to a set of outcomes that I have created for this course. These outcomes are broadly divided into three categories:

1.5 Lecture Challenges

Each day I will present you with the opportunity to exercise and further develop your problem solving skills. These lecture challenges are intended to give you the chance to spend 20 minutes or so thinking about a problem and trying to solve it. I truly believe that you can improve your problem solving abilities with daily practice. I can’t grade these challenges, though I will post my own solutions at the start of the next class so you can review your answer.

Sometimes I will assign a challenge that is "open-ended" without any expectation of how long it will take you to solve the problem. These will be used sparingly, but will be used over a sequence of days to uncover more details about a problem which will help sharpen your attempts at solving the problem.

1.5.1 Closed form formulas

In this course you will become familiar with mathematical tools used to analyze the performance of algorithms. Try your skills at the following question. Later in the course I will conduct similar analyses for algorithmic questions:

Can you come up with a formula that represents the sum of the first n cubic numbers? That is, what is $S_n = \sum_{i=1}^{n} i^3$?

To get you started, the first few terms of $S_n$ are 1, 9, 36, 100, ...

You should be able to come up with a formula for $S_n$ that uses just n. Take notes in your course notebook as you attempt to solve this problem. As an added challenge, can you come up with another way to rephrase the problem with a Σ, i and n?
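While you search for the formula, a brute-force computation can help you test a guess. The sketch below sums the cubes directly and compares a candidate formula against that sum for small n. The class name CubeSumSketch and the helper matchesUpTo are hypothetical, and the candidate in main is deliberately wrong so that the challenge stays open.

```java
import java.util.function.LongUnaryOperator;

// Sketch only: brute-force check for a conjectured closed form.
// Names are hypothetical; the candidate in main is intentionally incorrect.
class CubeSumSketch {

    // Adds 1^3 + 2^3 + ... + n^3 directly; fine for modest n (long overflows for very large n).
    static long sumOfCubes(int n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            sum += i * i * i;
        }
        return sum;
    }

    // Returns true if the candidate agrees with the brute-force sum for n = 1..limit.
    static boolean matchesUpTo(LongUnaryOperator candidate, int limit) {
        for (int n = 1; n <= limit; n++) {
            if (candidate.applyAsLong(n) != sumOfCubes(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println("S_" + n + " = " + sumOfCubes(n));
        }
        // Replace this guess with your own formula.
        LongUnaryOperator guess = n -> n * n * n;
        System.out.println("guess matches for n = 1..10: " + matchesUpTo(guess, 10));
    }
}
```

Agreement on the first few values is evidence, not proof; a proof by induction is the natural next step once a guess survives the check.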

1.5.2 Anagrams

You may find yourself stuck trying to work out some tricky bit of logic. Instead of trying to solve the problem "all at once" you need to make smaller progress or try to find an instance of a solution, before tackling the more general solution. Try your skills at this question:

Rearrange the following letters to form a single English word containing 13 letters.

BRITNEY SPEARS

Take notes in your course notebook as you attempt to solve the problem. Solution will be shown tomorrow. If you solve that one, then try another 13-letter anagram:

INGRID STEEGER

Does solving one anagram make it easier to solve another one?

Here’s one final one, that was (for me at least) surprisingly easy.

ROGER DALTREY
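If you want to confirm a candidate answer without looking up the solution, a small checker is enough. The sketch below compares the sorted letters of two phrases, ignoring case, spaces and punctuation. The class name AnagramCheckSketch and its methods are hypothetical, and the example uses an unrelated word pair so that no answer is given away.

```java
import java.util.Arrays;
import java.util.Locale;

// Sketch only: checks whether two phrases use exactly the same letters.
// Names are hypothetical; only the letters A-Z are considered.
class AnagramCheckSketch {

    // Upper-cases the text, drops everything except A-Z, and sorts the letters.
    static String signature(String text) {
        char[] letters = text.toUpperCase(Locale.ROOT)
                .replaceAll("[^A-Z]", "")
                .toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Two phrases are anagrams when their sorted letters are identical.
    static boolean isAnagram(String a, String b) {
        return signature(a).equals(signature(b));
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("listen", "Silent"));
        System.out.println(signature("BRITNEY SPEARS").length() + " letters to place");
    }
}
```

Sorting the letters turns the question "are these anagrams?" into a simple equality test, which is a small instance of solving a problem by first transforming its input.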

1.5.3 Thought-provoking Questions

Assume you have a collection of exactly 2n integers from 1 to 2n in order. Suppose you make the following mistake in the implementation of BINARY ARRAY SEARCH, namely using "<" instead of "<=" as the condition for the while loop:

while (low < high) { ... }

Now search for each of the 2n integers. Can you come up with a closed formula, p(n), that computes the success rate (a number between 0 and 1) of confirming that a randomly selected integer from 1 to 2n is found by this defective code?
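Before hunting for the closed formula, it can help to gather data. The sketch below runs the defective loop against a collection holding 1 through size and counts how many of those values are found. The size is left as a parameter so you can plug in the collection sizes from the challenge; the class name DefectiveSearchSketch and its method names are hypothetical.

```java
// Sketch only: measures how often the defective search succeeds.
// Names are hypothetical; `size` is whatever collection size you want to try.
class DefectiveSearchSketch {

    // BINARY ARRAY SEARCH with the mistake from the challenge: "<" instead of "<=".
    static boolean defectiveContains(int[] collection, int target) {
        int low = 0;
        int high = collection.length - 1;
        while (low < high) {
            int mid = (low + high) / 2;
            int rc = collection[mid] - target;
            if (rc < 0) {
                low = mid + 1;
            } else if (rc > 0) {
                high = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }

    // Builds the collection 1..size and counts how many of its values are found.
    static int countFound(int size) {
        int[] collection = new int[size];
        for (int i = 0; i < size; i++) {
            collection[i] = i + 1;
        }
        int found = 0;
        for (int target = 1; target <= size; target++) {
            if (defectiveContains(collection, target)) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (int size = 1; size <= 16; size++) {
            int found = countFound(size);
            System.out.printf("size=%d found=%d rate=%.4f%n",
                    size, found, (double) found / size);
        }
    }
}
```

Tabulating the counts for a run of sizes is usually the quickest way to spot a pattern worth trying to prove.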

1.6 Preparing for Mar 25 2021

Be sure to complete the readings for today as well as Mar 25 2021. As mentioned in class, for each of the following lectures, I have designated a list of 14-18 pages that you must read prior to coming to class. In addition, please review the lecture notes which you can find here 24 hours before the lecture itself.

1.7 Daily Question

It’s time for our first Daily Question! Please review the Daily Questions area within Canvas.

Each question is worth 0.25% of your overall grade. Participate in 20 of them and you will earn the maximum of 5%.

The assigned daily question is DAY01 and you can find it in the Assignments area in Canvas.

If you have any trouble accessing this question, please let me know immediately on Discord.

A total of 5% of your total grade is based on your participation in these daily questions. Each student has the chance to complete 20 of these questions. Answering each question is a great way to confirm you know the material. Plus, as I have mentioned in class, I am conducting research on how to use Machine Learning (ML) to provide automatic feedback on open-ended responses. During this term, I need to collect sample data to train the ML models so that in future offerings of this course, students will benefit from immediate feedback.

Each question is only open for 24 hours – each morning I will review the answers prior to the start of class, which will help me judge the overall learning achievements. If I find that students are not "getting" a key concept, I can adjust my lecture accordingly. Thank you for participating!

1.8 Three Things To Do Today!

On to the next class!

1.9 References

[1] Binary_search_algorithm, Wikipedia

1.10 Version : 2021/03/23

(c) 2021, George T. Heineman