CS2223 Algorithms. D-2008     Exam 2 Solutions - April 11, 2008

Prof. Carolina Ruiz. Department of Computer Science. Worcester Polytechnic Institute.

Solutions by Prof. Ruiz

Instructions

  • Show your work and Justify your answers
  • Use the space provided to write your answers
  • Ask in case of doubt

    Problem I. Design and Analysis of Algorithms (40 points)

    Consider the problem of an operator (e.g., a restaurant waiter, an office clerk, a computer processor, ...) sequentially serving n customers C1,...,Cn each having a given duration time of service, duration[i]. Write an algorithm that produces an optimal service schedule, in which each customer receives a start-of-service[i] time that allows the customer to be served for the necessary amount of time, and that minimizes the average wait of all customers, which is (1/n)*Σni=1 start-of-service[i]. Assume that the operator and all the customers are ready to start at time 0.
    Input: Output: Algorithm:

    1. (5 points) Describe in detail the strategy you will use to solve this problem.

      Solution:

      I'll use a greedy algorithm with a selection criterion that chooses as the next customer to be served, the customer with the shortest duration among those who haven't been served yet.
    2. (15 points) Write your algorithm (in pseudo-code) here. Remember that you need to keep track of the customers, so that at the end, start-of-service[i] corresponds to the same customer i whose input was duration[i].

      Solution 1:

      /* I'll use a priority queue PQ to store the customers, using their
      duration value as their key. */
                                                     Instruction #Iterations Total
      {
        StartHeap(PQ)                                                          O(1)
        For each i := 1 to n do {                           O(1)      * n   =  O(n)
            Insert customer i with key duration[i] in PQ    O(log(n)) * n   =  O(nlog(n))
        }
        time := 0                                                              O(1)
        For each i := 1 to n do {                           O(1)      * n   =  O(n)
           nextcustomer := ExtractMin(PQ)                   O(log(n)) * n   =  O(nlog(n))
           start-of-service[nextcustomer] := time           O(1)      * n   =  O(n)
           time := time + duration[nextcustomer]            O(1)      * n   =  O(n)
        }
      }      
                                                                              -----------
                                     T(n) =  2*O(1) + 4*O(n) + 2*O(nlog(n)) =  O(nlog(n))
      

      Solution 2:

      /* An alternate solution can be constructed similary by sorting the customers
      in increasing order by their duration time, using an O(nlog(n)) sorting
      algorithm, like mergesort. Care should be taken to keep track of the original
      number of the customer, so that the output start-of-service[i] corresponds
      to the original custormer i. For that, I'll use an array 
      name_duration_array[1...n] that stores in each of its cells an
      object (pair) that consists of two parts: (1) the customer number, and 
      (2) his/her duration time.
      
      
                                                     Instruction #Iterations Total
      {
        For each i := 1 to n do {                           O(1)      * n   =  O(n)
            name_duration_array[i] := (i,duration[i])       O(1)      * n   =  O(n)
        }
        Use mergesort to sort name_duration_array[1...n]                       O(nlog(n))
        in increasing order according to the 2nd component 
        of the pair in each cell (i.e., duration)
      
        time := 0                                                              O(1)
        For each k := 1 to n do {                           O(1)      * n   =  O(n)
           Let name_duration_array[k]=(i,duration[i])       O(1)      * n    = O(n)
           start-of-service[i] := time                      O(1)      * n    = O(n)
           time := time + duration[i]                       O(1)      * n    = O(n)
        }
      }      
                                                                              -----------
                                         T(n) =  O(1) + 6*O(n) + *O(nlog(n)) = O(nlog(n))
      

    3. (10 points) Rigorously prove that the algorithm is correct and optimal. That is, prove that it outputs a valid start-of-service[1...n] array in which all customers are sequentially served, each customer is served for the necessary amount of time, and the average wait of all customers, (1/n)*Σni=1 start-of-service[i], is the minimum possible.

      Solution:

      • All customers are served, with customer i begin time equal to start-of-service[i].

      • Each customer i is scheduled to be served for the entire duration[i], as the variable "time" was incremented by duration[i] after scheduling customer i.

      • Let's show that this greedy algorithm produces the optimal solution. Let the input costumers be C1,C2,...,Cn with durations D1,D2,...,Dn, and let the output be their assigned starting times S1,S2,...,Sn.

        • Alternate Proof 1: The starting time Si of customer Ci is determined by our algorithm to be:
          Si = start_of_service[i]= Σall customers j with (Dj < Di) OR (Dj=Di && j < i) Dj. Since we are making each Si as small as possible, then their average (1/n)*Σni=1 Si will be as small as possible.

        • Alternate Proof 2: Assume by way of contradiction that the sequence S1,S2,...,Sn output by our algorithm is not optimal. Then, there is a different sequence R1,R2,...,Rn that has a lower average than that of S1,S2,...,Sn. Let's sort each of these sequences of starting times in increasing order: Sk1, Sk2,..., Skn and Rm1, Rm2,..., Rmn. Let j be the first position in which they differ (j &ge 1):
          Sk1,Sk2,...,Sk(j-1),Skj,...,Skn 
          Rm1,Rm2,...,Rm(j-1),Rmj,...,Rmn 
          
          |---------------|
          the two sequences coincide here
          
          Since Skj ≠ Rmj, then either:

          1. Rmj < Skj: Let's consider the sequence of customers who would be served before index j:
            Ck1, Ck2,..., Ck(j-1) in the S-sequence
            Cm1, Cm2,..., Cm(j-1) in the R-sequence
            For each i, 1 ≤ i ≤ (j-1), we must have that duration of Cki = duration of Cmi.
            
            Since our algorithm is such that Skj = Sk(j-1) + duration[Ck(j-1)], then the R-sequence doesn't give enough time to fully service customer Cm(j-1) and therefore is not a correct solution to this problem.

          2. Rmj > Skj: Then the R-sequence is wasting the time interval Skj - Rmj and unnecessarily increasing its average, so it cannot be optimal. Our S-sequence is staying ahead of it!

          In both cases we conclude that the R-sequence cannot be an optimal solution to this problem contradicting our original assumption, and therefore the S-sequence output by our algorithm is optimal.


    4. (10 points) Analyze the runtime of your algorithm as a function of n. Provide a runtime function T(n) and the tightest asymptotic upper bound g(n) you can such that T(n)=O(g(n)). [You can do so above neatly next to each instruction of the algorithm, or below in the space provided.]

      Solution:

      See runtime analysis above, together with the algorithm(s).

    Problem II. Time Complexity (20 points)

    1. (10 points) Let A and B be two algorithms with runtimes TA(n) = loga(n) and TB(n) = logb(n), respectively, where a and b are two (possibly different) constants a,b > 1.

      Prove that TA(n) = Θ(TB(n)).
      [Hint: use the fact that loga(n) = logb(n)/ logb(a) ]


      Solution:

      TA(n)
      = loga(n)
      = logb(n)/ logb(a)
      = [1/logb(a)]*logb(n)
      = k*logb(n) where k = 1/logb(a) is a constant (note that logb(a) > 0 since a,b > 1)
      = Θ(logb(n))
      = Θ(TB(n))
    2. (10 points) Let A and B be two algorithms with runtimes TA(n) = nloga(c) and TB(n) = nlogb(c), respectively, where a, b, and c are (possibly different) constants a,b,c > 1.

      Show that TA(n) might not be Θ(TB(n)) by choosing specific values for the constants a, b, and c (e.g., a=2) such that TA(n) is NOT Θ(TB(n)). Justify your answer.
      [Hint: Remember that loga(u) = x if and only if ax = u]


      Solution:

      Take a = 2, b = 4, c = 16. Then,
      • TA(n) = nloga(c) = nlog2(16) = n4
      • TB(n) = nlogb(c) = nlog4(16) = n2
      We know that TA(n) = n4 ≠ Θ(n2) = Θ(TB(n)).
    [The moral of this problem is that the base of the logarithm can be ignored in O, &Omega, and &Theta EXCEPT IF :

    Problem III. Design and Analysis of Algorithms (40 points)

    Let G=(V,E) be a directed graph. We say that G is a directed acyclic graph (DAG) if it does NOT contain any directed cycles. The graph on the left in the figure below is a DAG. The same graph has been drawn differently on the right to show explicitly that it doesn't contain any directed cycles.

    A topological ordering of a directed acyclic graph G=(V,E) is an ordering of its nodes as v1, v2, ., vn so that for every edge (vi, vj) we have i < j. We have added labels v1,...,v7 to the nodes in the depicted graph to show that this graph has a topological ordering of its nodes (imagine that initially, the nodes in G were named 1,...,n in no particular order). Graphically, all edges in the topological ordering of the graph go "from left to right".

    graph

    Indeed, it is true that there exists a topological ordering for any directed acyclic graph. In this problem, you'll construct an algorithm that given a DAG, will output a topological ordering of its nodes. The desired runtime of the algoritm is O(n+m), where n is the number of nodes and m is the number of edges.

    Input:

    Output:

    Algorithm:

    1. The strategy that we will use to solve this problem is based on the following observation: In a DAG, there is always at least one node that has no incoming edges. Check the graph depicted to see that this is the case. A very general sketch of the algorithm is as follows:
         Repeat
           Find a node u in G with no incoming edges
           Make u the first element of (or the next element if some nodes
           have already been added to) the topological ordering of G
           "Remove" node u from G ("G - u" will continue to be a DAG)
         Until all nodes in G have been added to the topological ordering
      

      Maintain the following information:

      • incoming_count[1...n] where incoming_count[w] = remaining number of incoming edges to node w
      • S = set of remaining nodes with no incoming edges
      Initialize incoming_count[1...n] and S in O(m + n) time via a single scan through the graph.

      In order to "remove" a node u from G it suffices to:

      1. remove u from S (in O(1) time)
      2. decrement incoming_count[w] for all edges from u to w, and add w to S if incoming_count[w] hits 0.

    2. (25 points) Write your algorithm (in pseudo-code) here. Remember that your algorithm should run in O(m+n) time.

      Solution:

      Both the algorithmm and its analysis here rely on the fact that the graph is represented by an adjacency list, and that the set S is represented with a list for which adding or removing the 1st element takes O(1) time.
      {
      
                                                     Instruction #Iterations Total
        /* Initialization */
      
        For each node u in G do {                            O(1)     * n    = O(n)
            incoming_count[u] := 0                           O(1)     * n    = O(n)
        }
      
        For each node u in G do {                            O(1)     * n    = O(n)
            For each edge (u,w) adjacent to u do {           O(1)     * m    = O(m) 
                incoming_count[w] := incoming_count[w]+1     O(1)     * m    = O(m)
            }
        }
      
        S := empty list                                                      = O(1)
      
        For each node u in G do {                            O(1)     * n    = O(n)
            if incoming_count[u] = 0 then {                  O(1)     * n    = O(n)
                Add u as the first element of S              O(1)     * n    = O(n)
            }
        }
      
        i:= 1                                                                = O(1)
      
        /* end of initialization */
      
        While S is not empty do {                            O(1)     * n    = O(n)
      
            u := first element in S                          O(1)     * n    = O(n)
            topological_ordering[i] := u                     O(1)     * n    = O(n)
            
            For each edge (u,w) adjacent to u do {           O(1)     * m    = O(m) 
                incoming_count[w] := incoming_count[w]-1     O(1)     * m    = O(m) 
                if incoming_count[u] = 0 then {              O(1)     * n    = O(n)
                    Add u as the first element of S          O(1)     * n    = O(n)
                }
            }
            i := i + 1                                       O(1)     * n    = O(n)
        }
      
        return topological_ordering[1..n]                                    ≤ O(n) 
                                                                            --------
      
                                            T(n) = 2*O(1) + 11*O(n) + 4*O(m) = O(m + n)
      }
      

    3. (15 points) Show that the runtime of your algorithm is O(m+n). [You can do so above neatly next to each instruction of the algorithm, or below in the space provided.]

      Solution:

      See runtime analysis above, together with the algorithm.