cs2223 Class 4

cs2223, D97/98 Class 4

Note: This page contains an attached Excel spreadsheet. A zip version is also attached for those whose browsers do not reliably download spreadsheet files.

Verifying Proofs

In Class 3, we showed how to write the recurrence relation for an algorithm, one which creates identity matrices. There are two parts to the process: writing a recurrence relation which accurately models the relationship between successive values in a sequence, and solving the recurrence relation. We used simulation - a computer program actually created the identity matrices and counted the number of replacement statements - to validate the recurrence relation. We can similarly use calculations to validate the solution of the recurrence relation. The first page of the attached spreadsheet uses direct comparison of the first one hundred elements of the sequence to verify that

is the solution to the recurrence relation:

Summations

The summation symbol is a shorthand notation for the sum of terms:

For example, sum(k=3->7; k) = 3+4+5+6+7.

Summations are widely used in analyzing algorithms. The core of the identity() algorithm shown in Class 3 is two nested for loops:

replacements = 0;
for (int y = 0; y < n; y++)
   for (int x = 0; x < n; x++)
      {
      if (x == y) replacements++, *(array + n * y + x) = 1;
      else replacements++, *(array + n * y + x) = 0;
      }

The additions (increments) of replacements can be modeled as nested summations:

R = sum(y=0->N-1; sum(x=0->N-1; 1)) = sum(y=0->N-1; N) = N*sum(y=0->N-1; 1) = N^2

This is the result we found and proved to be correct in Class 3.

Averages

Summations are also useful for calculating average case results. In Class 1 and Class 3 we analyzed an algorithm for finding the minimum element in an array.

To calculate the average height, called the Expected Value, of the students in cs2223, we would add all of the heights and divide by the number of students:

E(h) = sum(k=1->S; h(k)) / S

In this equation, S is the number of students, and h(k) is the height of the k-th student.

Sometimes we group students who have the same height and reduce the number of the terms by making the summation over groups:

[Table showing the index k; h(k), a height; and n(k), the number of people with that height]

E(h) = sum(k=1->N; h(k)*n(k)) / S

The upper limit N is the number of groups. It is often less than S, the number of students, but they can be equal in the case when the height measurements are so precise that the groups are all of size one.

We can take the constant 1/S inside the summation:

E(h) = sum(k=1->N; h(k)*n(k)/S) = sum(k=1->N; h(k)*Pr{k})

The probability of being in group k - which means the probability of having height h(k) - equals the number of students in the group divided by the total number of students. This result can be generalized. The average value of anything can be found by summing all possible values times their respective probabilities of occurrence.

Average Number of Replacement Operations in minval()

We used the last result to calculate the average number of replacements in the function minval(), which we introduced in Class 3. We use the average value to characterize this algorithm because the spread between the best-case and worst-case performance is so extreme. The best case (minimum number of replacements) occurs when the array elements are in ascending order; only one replacement occurs. The worst case (maximum number of replacements) occurs when the array elements are in descending order; the minimum value is replaced by every array value in succession.

To calculate the average number of replacements, look at each element in the array, starting at the beginning. For each element, find the probability that it is smaller than all of the previous elements. Multiply that probability by the amount the replacement count will increase (one) if the element actually is smaller than every element which came before. The sum of these terms is the average replacement count:

E(R) = sum(k=1->N; 1*Pr{k})

[Table showing three columns: the index k, the value (all question marks), and Pr{k} = 1/k]

To calculate the probabilities, consider the array. The probability that the first element is a minimum (smaller than any which came before) is one. Now look at the second element in the array. If we know nothing about the values in the array, it is reasonable to assume that they are in random order. That means that any of the values could be the minimum. So the second element is as likely to be smaller than the first as it is to be larger, and the probability that the second is a minimum is one-half. When we get to the third element in the array, its probability of being smaller than both of the first two elements is one-third, and so forth. In general, each of the first k elements is equally likely to be the smallest of the k, so Pr{k} = 1/k.

The average number of replacements in the function minval is:

E(R) = sum(k=1->N; Pr{k}) = sum(k=1->N; 1/k) = H(N)

This sum, H(N), is the N-th partial sum of the harmonic series, and it shows up often when analyzing algorithms. It cannot be reduced to a simpler closed form, but it is approximately equal to:

H(N) ≈ ln(N) + gamma

The constant gamma, known as Euler's constant, is approximately 0.5772. It is a fundamental mathematical constant, like pi or e, although unlike them it has not been proved to be irrational. This means that the number of replacements grows roughly as the log of the array size. This approximation is good for values of N greater than about 10, as shown in the second page of the attached spreadsheet. The spreadsheet also contains a function - written in Visual Basic, the macro language of Excel - which calculates the harmonic series.

We tested our calculation with the simulation minval2.C. The source code is contained in the CCC directory:

/cs/cs2223/classes/class04/

The attached script shows the calculation. In the third page of the attached spreadsheet, we compared the measured values from five simulations with the calculated value.

Running Averages

The minval2.C code contains this algorithm for calculating the average number of replacements when the array is shuffled into random order before each of NMAX trials:

average = 0.0;
for (int n = 1; n <= NMAX; n++) // average NMAX cases
   {
   for (int i = 0; i < array_size; i++) array[i] = i + 1;
      // fill the array in order
   shuffle(array_size, array); // shuffle the array
   replaces = 0;
   minval(array_size, array); // find minimum value then ignore it
   average = ((n - 1) * average + replaces) / (float)n;
   } // end for(n)

The next-to-last line is a running average. Instead of adding the total number of replacements and dividing by NMAX, we update the average each time through the loop. Assume we know the average number of replacements after we have gone through the loop N times:

E(N; R) = sum(k=1->N; R(k)) / N

The next time through the loop, N+1, the average is:

E(N+1; R) = sum(k=1->N+1; R(k)) / (N+1)
          = (sum(k=1->N; R(k)) + R(N+1)) / (N+1)
          = (N/(N+1)) * (sum(k=1->N; R(k)) / N) + R(N+1)/(N+1)
          = (N/(N+1)) * E(N; R) + R(N+1)/(N+1)

If we replace N+1 by N, we obtain the running average:

E(N; R) = ((N-1) * E(N-1; R) + R(N)) / N

In the above code, we express this as the command:

average = ((n - 1) * average + replaces) / n;