WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS2223 Algorithms
Homework 2 - B Term 2013

PROF. CAROLINA RUIZ

Due Date: Nov. 12, 2013 at 1:00 PM code, and 2:00 PM written report
See course policy regarding late homework submissions

------------------------------------------

Homework Instructions:


Homework Problems:

  1. (70 points) Problem 1. Consider the following problem: Given n sets S0, ..., Sn-1 where each of the sets Si is a SUBSET of {0,1,...,n-1}, determine if there is a pair of sets among the ones given, say Si and Sj, that are disjoint (that is, don't have any elements in common).

    The following algorithm solves this problem:

    INPUT:
    n: number of sets
    S0, ..., Sn-1, where Si is a SUBSET of {0,1,...,n-1}.
    
    OUTPUT:
    True, if there is a pair of the input sets Si and Sj that are disjoint (in this case, i  and j are printed).
    False, otherwise. 
    
    
    ALGORITHM: 
    
    
      cost of each instruction # of times instruction will be repeated
     for each set Si:
         for each other set Sj:
             disjoint = true
             for each element k of Si:
                 if k belongs to Sj:
                     disjoint = false
             if disjoint == true
                 print 'Sets ',i,j,' are disjoint'
                 return(true)
    
     return(false)
    
       
      We will implement the above algorithm in Python. Consider n as the size of the input. The runtime of the algorithm, T(n), will depend on the data structure we use to represent the input sets. Below we will consider two different, alternative data structures to represent these sets, and see how T(n) is affected by the choice of data structure.

      T(n) will be affected also by how you implement the "for each other set Sj" instruction. Note that it is enough to consider j > i only. [Hint: Studying the runtime analysis of InsertionSort in section 2.2 of the textbook will help you think of ideas to express the number of iterations of the instructions in the algorithm above.]

    1. Implementation Version 1: Suppose that we use a Python list to represent each set. As an example, take n = 5 and S0 = {2, 3, 0}, S1 = {4, 2}, S2 = {1, 4, 3, 0}, S3 = {2}, S4 = {}.
      We will represent the list of these sets in Python as:
      [[2, 3, 0], [4, 2], [1, 4, 3, 0], [2], []]
      
      1. (10 points) Use worst case analysis to analyze the runtime of the algorithm if it is implemented using this data structure. That is, calculate the function T(n) for this algorithm by estimating the cost of performing each of the instructions above, and the number of times that each instruction would be executed in the worst case. Explain your work.
        In order to determine "if k belongs to Sj" in the algorithm above, traverse the list Sj element by element until either you find k, or you have finished traversing the list. That is, do not use any specialized Python functions to search a list (since you wouldn't know the time complexity of such a Python function to include in your runtime analysis).
      2. (5 points) Provide a function g(n) such that T(n) = O(g(n)). Make the asymtotic upper bound g(n) as tight as possible.
      3. (5 points) Write a rigorous mathematical proof showing that T(n) = O(g(n)).

    2. Implementation Version 2: Suppose that we use a Python list to represent each set, but this time a set will be represented as a list of size n where position k of the list will be 1 if k belongs to the set, and 0 otherwise. As an example, take n = 5 and S0 = {2, 3, 0}, S1 = {4, 2}, S2 = {1, 4, 3, 0}, S3 = {2}, S4 = {}.
      We will represent the list of these sets in Python as:
      [[1, 0, 1, 1, 0], [0, 0, 1, 0, 1], [1, 1, 0, 1, 1], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
      
      1. (10 points) Use worst case analysis to analyze the runtime of the algorithm if it is implemented using this data structure. That is, calculate the function T(n) for this algorithm by estimating the cost of performing each of the instructions above, and the number of times that each instruction would be executed in the worst case. Explain your work.
        Once again, in order to determine "if k belongs to Sj" in the algorithm above, do not use any specialized Python functions to search a list. Use the obvious way to determine if k belongs to Sj that takes advantage of the set representation used in this Version 2.
      2. (5 points) Provide a function g(n) such that T(n) = O(g(n)). Make the asymtotic upper bound g(n) as tight as possible.
      3. (5 points) Write a rigorous mathematical proof showing that T(n) = O(g(n)).

    3. Experimental Comparison of the two Versions:
      1. (5 points) Implement each of the two versions of the the algorithm in Python as two separate functions inside the same Python program:
            def DisjointSetsVersion1(L):
                ...
        
            def DisjointSetsVersion2(L):
                ...
        
      2. For each of the following values of n, n = 100, 200, 300, 400, 500:
        1. (5 points total) Randomly generate an input of size n. That is, a collection of n sets as described above.
        2. (5 points total) Convert this input into each of the two data representations (version 1 and version 2).
        3. (5 points total) Run each of the two functions on this input, timing its execution. Include the execution times in your report.
      3. (10 points) Plot a graph where the x axis corresponds to n and the y axis corresponds to execution time. Include in this plot all the execution times recorded for the two functions and for each of the values of n. Use some convention (e.g., color) to distinguish between the execution times of DisjointSetsVersion1 and DisjointSetsVersion2 in the plot. Describe any observations you can make about this plot. Do the growth of these curves match the asymptotic behavior of these functions described by your Big-O analysis above? Explain.

      Submit your code on myWPI.


  2. (70 points) Problem 2. Rank the following functions by order of growth. That is, find an arrangement f1, f2, ..., f6 of the functions satisfying f1 = Ω(f2), f2 = Ω(f3), ..., f5 = Ω(f6). Partition your list into equivalent classes such that functions f(n) and g(n) are in the same class if and only if f(n) = Θ(g(n)).
    an
    na
    ana
    loga(n)
    aa*n
    nloga(n)

    where a is a constant greater than 1.

    Provide a detailed, rigorous proof of each part of your solution.

  3. (20 points) Problem 3. Find a function f(n) and a function g(n) such that f(n) ≠ O(g(n)), f(n) ≠ Ω(g(n)), and f(n) ≠ Θ(g(n)). Explain your answer in detail.

  4. (15 points) Problem 4. Write a formal mathematical proof of the following property of Big-O.
    Transitivity of Big-O: If f(n) = O(g(n)) and g(n) = O(h(n)) then f(n) = O(h(n)).

  5. (25 points) Problem 5. Consider the following sorting methods: bubble sort, insertion sort, merge sort, and quick sort, as described and implemented in the online textbook Problem Solving with Algorithms and Data Structures by Brad Miller and David Ranum.
    1. Read the description of each of these 4 sorting methods.
    2. Include in a single Python program the code provided on the aforementioned webpage implementing each of these 4 sorting methods.
      Note:If Python produces "RuntimeError: maximum recursion depth exceeded", then increase the number of recursion levels allowed by adding the following code to your program:
      import sys
      sys.setrecursionlimit(12000)
      
      I include here the skeleton of a Python program that I wrote which satisfies the requirements below.

    3. Let n be the size of the list to be sorted. For each n, n = 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, and 10000, generate a list of n elements in decreasing order:
       
      [n-1, n-2, ..., 2, 1, 0]
      
    4. (10 points) Run each of the 4 sorting functions on this input, timing its execution. Include the execution times in your report.
    5. (15 points) Plot a graph where the x axis corresponds to n and the y axis corresponds to execution time. Include in this plot all the execution times recorded for the 4 sorting functions and for each of the values of n. Use some convention (e.g., color) to distinguish between the execution times of the different sorting functions in the plot. Describe any observations you can make about this plot. Do the growth of these curves match the asymptotic behavior of these functions described by the Big-O analysis of these sorting methods on the webpage? Explain.

    Submit your Python program on myWPI.