WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS2223 Algorithms 
Homework 4 - B Term 2005

PROF. CAROLINA RUIZ 

Due Date: Thursday December 1st, 2005 at 2:00 PM 
Late submission policy for this (and only this) homework:
Homework 4 submissions received between 2:00-2:45 pm will be penalized with 20% off. No submissions will be accepted after 2:45 pm on Thursday, Dec. 1st, 2005.
------------------------------------------

Instructions


Problems

  1. (20 points)The cashier exercise
    Consider the problem of paying a given amount of money with the fewest possible number of coins. Given US coin denominations: 1 cent, 5 cents, 10 cents, 25 cents, and 100 cents (1 dollar coins), devise a method for the cashier to pay a given amount to a customer, which is given as the input to your program, using the fewest number of coins. Assume that the cashier has access to as many coins of each denomination as needed.

    Will a greedy algorithm for this problem output an optimal solution? If so,

    1. (10 points) write a greedy algorithm in detail,
    2. (10 points) prove rigorously that it is correct (i.e., it produces an optimal solution for any input), and
    3. Optional: (10 extra points) analyze its time complexity.
    If a greedy approach doesn't work, show an example in which the greedy approach produces a suboptimal solution.

  2. (10 points) The postal service exercise
    Consider the problem of mailing a package using the fewest possible number of stamps. Assume that the US postal denominations (i.e., the cost in cents of available stamps) are: 1, 10, 21, 37, 60, 100, 350, 1225, 1500. That is, given the cost of mailing a package, we want to find the fewest number of stamps such that the sum of their denominations is exactly equal to the mailing cost.

    Show an example in which the greedy approach to solve this problem produces a suboptimal solution.


  3. (30 points)Dijkstra's Algorithm
    Before you can solve this exercise, you need to read Sections 2.5 and 4.4 of the textbook in detail.

    Here is Dijkstra's algorithm as discussed in class (though a little more detail has been added to the pseudo-code below to help you with this homework).

    Input:
    - A graph G = (V,E) with |V|=n, |E|=m.
    - An array length with m positions, such length[(u,v)] = cost/length of the edge (u,v)
    - A node s in G.
    
    Output:
    - An array d of size n such that d[v] = the distance of the shortest path between s and v in G
    - An array Predecessor of size n such that Predecessor[v] = the node in G which preceds v
      in the shortest path from s and v in G
    
    Algorithm:
    
    Disjkstra (G, length, s) {
    
       Create an empty priority queue PQ
    
       d[s] := 0
       Predecessor[s] := nil
       add s with key value 0 to PQ 
     
       For each node v in V different from s {
          d[v] := infinity
          Predecessor[v] := nil
          add v with key value infinity to PQ 
       }
    
       Explored := empty set
    
       While PQ != empty {
          v := ExtractMin(PQ)
          Explored := Explored union {v}
     
          For each each (v,w) adjacent to v in E {
             if d[w] > d[v] + length[(v,w)] then {
                d[w]:= d[v] + length[(v,w)] 
                change w's key value in PQ to the new value d[w]
                Predecessor[w] := v
             }
          }
       }
    
       Return arrays d and Predecessor
    }
    
    Consider the following undirected graph. Follow by hand the implementation of Dijkstra's algorithm discussed in class (given above) to produce the shortest paths from the source node S to all other nodes In the graph.


    1. (5 points) Show the adjacency list representation of the graph. Remember that since the graph is undirected, then each edge (u,v) should belong to both u's and v's adjancecy lists. Also, remember to include the length/cost of each edge in your representation.
    2. Follow the execution of the algorithm step by step. For each step, show the state of the variables used by the algorithm, namely:
      • (2 points) the set of nodes already explored
      • (4 points) the array used to keep track of the cost/length of the shortest path from the source node to each node in the graph.
      • (4 points) the array used to keep track of the parent/predecessor of a node v so that the path from the source node to v can be reconstructed.
      • (10 points) the priority queue used to efficiently retrieve the unexplored node with the minimum cost edge from the set of explored nodes.
        Remember that the costs of the elements in this priority queue need to be updated (and hence the priority queue needs to be modified) each time that a node is moved from being unexplored to being explored.
    3. (5 points) Show the output of the algorithm. That is, for each node v in the graph show what the shortest path from the source node to v is and what the cost/length of the path is.

  4. (40 points) Huffman Codes
    Before you can solve this exercise, you need to read Sections 2.5 and 4.8 of the textbook in detail.

    Implement the Huffman codes algorithm in your favorite programming language (java, C, C++, ...). Consider the following pseudo-code of the algorithm:

    Input: 
    - n: the number of symbols to consider
    - An array S of n positions where S[i] is the ith symbol in S, 1 ≤ i ≤ n.
    - An array P of n positions where P[i] is the frequency/probability of the symbol S[i].
    
    Output: 
    - An array Code of n positions where Code[i] contains the encoding of the symbol S[i].
    - The average number of bits per letter/symbol used by the encoding.
    
    Algorithm:
    
    Huffman(S, P, n) {
          
          create an empty priority queue Q
    
          // Insert each symbol in S into Q as follows: //
          For i := 1 to n {
            create a new tree node v
            v.symbol := S[i] 
            v.freq := P[i] 
            v.left := nil 
            v.right := nil 
            insert v with key value v.freq into Q 
          }
           
          // Construct a binary tree with the symbols in S as follows: //
          For i := 1 to n-1 {
             create a new tree node z
             x := ExtractMin(Q). That is, extract the tree x with the minimum frequency from Q 
             z.right := x
             y := ExtractMin(Q). That is, extract the tree y with the minimum frequency from Q 
             z.left := y 
             z.freq := x.freq + y.freq
             insert z with key value z.freq into Q
          }
    
          extract the minimum (and only) tree T remaining in Q
    
          generate the codes for each symbol in S by following the
            tree T structure in a depth-1st manner assigning "0" to the
            left child of the node and "1" to the right child of the node
            until all the leaves are assigned a code
            (as shown in Figure 4.16 of the textbook)
            Once that a code is assigned to a symbol in S, say to the symbol S[i], store such
            code in the corresponding array position Code[i].
    
          Compute "ABL", that is the average number of bits per letter/symbol.
    
          Output the code generated for each symbol in S and also output the value "ABL"
       }
    
    
    WHAT TO SUBMIT FOR THIS PROBLEM:

    1. Submit a printout of your program. Document your code as much as possible to facilitate the grading of your submission.

    2. Submit a printout of your program's output on the following test cases:
      • Test case 1:
             n = 5
             Symbols:      a    b    c    d    e  
             Frequencies: .32  .25  .20  .18  .05 
             
      • Test case 2:
             n = 6
             Symbols:      a     b     c     d     e    f
             Frequencies: .045  .013  .012  .016  .009  .005
             
      • Test case 3:
             n = 12
             Symbols:      a     b     c     d     e     f     g     h     i     j     k     l   
             Frequencies: .002  .008  .022  .003  .004  .015  .001  .011  .009  .005  .007  .013  
             

    3. Optional (15 points extra credit): Add a new function to your program that receives as input the name of a text file, reads in the text file counting the frequency of each of all the 26 letters a, b, c, ..., z. Remember to divide the number of occurrences of each of these letters by the total number of the letters in the text file. For the purporse of this exercise you can ignore all symbols that are not letters in the alphabet (e.g., punctuation symbols, numbers. etc.) that appear in the text file. Also both capital and lower case letters contribute to the count (i.e., "a" and "A" both contribute to the count of letter "a").

      As an example of a text file you can use the short story Gift of the Magi by O. HENRY included here. Run your new function to read this file and count the frequencies of the 26 letters in the alphabet. Then use these data as input to your Huffman codes program. Include printouts of your new function, the output it generates, and also the output that your Huffman codes program generates on this input.