CS 2102 Homework Assignment #2

Assigned: Tuesday October 31, 2006, 10:00 AM
Due: Tuesday November 07, 2006, 10:00 AM

Guidelines:

Now Available. Now that we have progressed to supporting input from the keyboard, you must include the appropriate test cases with your homework assignment. These test cases will be graded, so try to develop ones that are meaningful for your problem.

Sample Data

For questions 3 and 4, you will need to create some files on which to test your program. Please make sure that you document properly the sample test cases that you come up with to test your program. In an effort to help you with questions 3 and 4, I have come up with the following sample data that you could use:
 

Sample Input File:

    actgactgaacgtacgtacgggcatcagctgactacttatcgtacgtagct

Sample Sequences File:

    ctgaacgtac
    cagctgatgcccgtacg
    tacttatcg
    agctgatcgtgctagtacca
    tcagtcagt

Description

  1. [20 pts.] Write a program that displays the prompt "Enter a line of text and I will print it in reverse", then reads from the keyboard a line of input (containing any number of characters, including white space) terminated by the newline ('\n') character (see p. 83 of the text). Then create a String rev that represents the reverse of the line. Finally, output rev back to the console, on a line by itself. Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.
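
One way to approach this problem is sketched below. It is an illustration only, not a required structure: the class name ReverseLine and the helper method reverse are hypothetical names chosen for this example, and the hasNextLine check is an extra guard that the problem statement does not ask for.

```java
import java.util.Scanner;

// Hypothetical class name, used only for this sketch.
class ReverseLine {

    // Builds the reverse of a line by walking it from the last character to the first.
    static String reverse(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = line.length() - 1; i >= 0; i--) {
            sb.append(line.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Enter a line of text and I will print it in reverse");
        Scanner sc = new Scanner(System.in);
        if (sc.hasNextLine()) {              // guard: do nothing if no input is available
            String rev = reverse(sc.nextLine());
            System.out.println(rev);         // rev on a line by itself
        }
    }
}
```

Keeping the reversal in its own method makes it easy to try the test cases you document (an empty line, a single character, a line containing spaces) without retyping input each time.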
     
  2. [15 pts.] Write a program that loads a String from a file called "input.txt". Note that in Eclipse, you will need to create this file within the project where your homework is being written. To do this, follow these instructions:

    Right-click on the Project and select New -> File. When prompted, type the File Name "input.txt". The file will be created as a direct child of the Project on which you right-clicked.

    Placing the file here lets you read it with the statement:

    Scanner sc = new Scanner (new File ("input.txt"));

    Then you can read Strings and the like from this scanner object in the same way as shown in class.

    Check out the oct-31 package for an example of loading Strings from a file.


    Now, assume that this input file is composed only of the characters 'a', 'c', 't', and 'g' (Why? Check this out). That is, there are no whitespace characters. Compute the fraction of the input file made up by each of these characters and output a summary. Using the sample data provided above, the following table should be output:
     
    a:0.2549019607843137
    c:0.2549019607843137
    t:0.2549019607843137
    g:0.23529411764705882

    The most common letter is: a

    If two or more letters share the maximum percentage, you may arbitrarily select one of them to output in the final statement.

    Regarding formatting: you may output the percentage as a raw double (e.g., 0.2301019238487), format it as a true percentage (e.g., 23.01019238487%), or reduce the number of digits (e.g., 23%). The choice is yours.


    Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.
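
The counting step can be sketched as follows. This is only one possible arrangement: the class name BaseReport and the methods fractions and mostCommon are hypothetical, the sample target is written inline rather than read from "input.txt" so that the sketch runs on its own, and the target is assumed to be non-empty.

```java
// Hypothetical class name, used only for this sketch.
class BaseReport {

    static final char[] LETTERS = {'a', 'c', 't', 'g'};

    // Returns the fraction of target made up of each letter, in LETTERS order.
    // Assumes target is non-empty and contains only the four letters.
    static double[] fractions(String target) {
        int[] counts = new int[LETTERS.length];
        for (int i = 0; i < target.length(); i++) {
            for (int k = 0; k < LETTERS.length; k++) {
                if (target.charAt(i) == LETTERS[k]) {
                    counts[k]++;
                }
            }
        }
        double[] result = new double[LETTERS.length];
        for (int k = 0; k < LETTERS.length; k++) {
            result[k] = (double) counts[k] / target.length();
        }
        return result;
    }

    // Returns the letter with the largest fraction; ties go to the earliest letter.
    static char mostCommon(double[] fractions) {
        int best = 0;
        for (int k = 1; k < fractions.length; k++) {
            if (fractions[k] > fractions[best]) {
                best = k;
            }
        }
        return LETTERS[best];
    }

    public static void main(String[] args) {
        // Sample target written inline; a real solution would read it from "input.txt".
        String target = "actgactgaacgtacgtacgggcatcagctgactacttatcgtacgtagct";
        double[] f = fractions(target);
        for (int k = 0; k < LETTERS.length; k++) {
            System.out.println(LETTERS[k] + ":" + f[k]);
        }
        System.out.println();
        System.out.println("The most common letter is: " + mostCommon(f));
    }
}
```

In the sample data 'a', 'c', and 't' each occur 13 times out of 51 characters, so the strict comparison in mostCommon settles the tie in favor of 'a'; any of the three would be an acceptable answer.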

  3. [25 pts.] Write a program that loads a String from a file called "input.txt". You should reuse the input file that you used for question 2. This file shall be composed only of the characters 'a', 'c', 't', and 'g' (no whitespace characters will be present), so it consists of a single String, which we'll refer to as target. A second file, "sequences.txt", shall also be created (in the same way as "input.txt" was for Problem 2); it contains a set of lines, each of which is a String composed of the characters 'a', 'c', 't', and 'g' (Why? Check this out).

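    Your task is to read each String si from "sequences.txt", one by one, and search for the first occurrence (if any) of si within target. If si is located as a substring in target, then output "x..y + fragment", where x and y are the positions in target of the first and last characters of the match (counting from 1, as in the sample output below) and fragment is si.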


    If the sequence si can't be found in target then output "UNMATCHED" on a line by itself.
     
    Sample Output using the above sample sequences:

    6..15 + ctgaacgtac
    UNMATCHED
    34..42 + tacttatcg
    UNMATCHED
    UNMATCHED

    Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.
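
A minimal sketch of the search step, under the assumption (taken from the sample output) that positions are counted from 1 and that both ends are inclusive. The class name ForwardSearch and the method report are hypothetical, and the sample data is written inline rather than read from the two files so that the sketch runs on its own.

```java
// Hypothetical class name, used only for this sketch.
class ForwardSearch {

    // Returns the report line for one sequence si searched within target.
    // Assumes si is non-empty.
    static String report(String target, String si) {
        int start = target.indexOf(si);      // 0-based index of first occurrence, or -1
        if (start < 0) {
            return "UNMATCHED";
        }
        int x = start + 1;                   // 1-based position of the first matched character
        int y = start + si.length();         // 1-based position of the last matched character
        return x + ".." + y + " + " + si;
    }

    public static void main(String[] args) {
        // Sample data written inline; a real solution would read both files with a Scanner.
        String target = "actgactgaacgtacgtacgggcatcagctgactacttatcgtacgtagct";
        String[] sequences = {
            "ctgaacgtac", "cagctgatgcccgtacg", "tacttatcg",
            "agctgatcgtgctagtacca", "tcagtcagt"
        };
        for (String si : sequences) {
            System.out.println(report(target, si));
        }
    }
}
```

String.indexOf returns a 0-based index, which is why the sketch adds 1 to obtain x; forgetting this shift is an easy mistake to catch with the sample data.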

  4. [25 pts.] For this program, you will reuse the "input.txt" file created for Problem 2 and the "sequences.txt" file created for Problem 3. This time, your task is to read each String si from "sequences.txt", one by one, and search for the first occurrence (if any) of rev(si) within target. You must assume that the target and si Strings are composed only of 'a', 'c', 't', and 'g' characters.
      For example, rev(si) for the String si="acggtcgattcg" is equal to the value "cgaatcgaccgt". As with Problem 3, you are to produce a report. If rev(si) is located as a substring in target, then you should output a line of the form "y..x - fragment", where fragment is rev(si) and y and x are the positions in target of the last and first characters of the match (as in the sample output below).
     
    Sample Output using the above sample sequences:

    UNMATCHED
    31..15 - cgtacgggca...
    UNMATCHED
    UNMATCHED
    9..1 - actgactga

    Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.
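
The sketch below rests on an assumption about rev: the example rev("acggtcgattcg") = "cgaatcgaccgt" and the sample output are both consistent with rev(si) being si read backwards with each character swapped for its partner (a with t, c with g), so that is how rev is written here. The class name ReverseSearch and the methods complement and report are hypothetical, positions are assumed to count from 1, and the full fragment is printed even though the sample output abbreviates one of them with "...".

```java
// Hypothetical class name, used only for this sketch.
class ReverseSearch {

    // Assumed pairing, inferred from the example in the problem: a<->t, c<->g.
    static char complement(char c) {
        switch (c) {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:
                throw new IllegalArgumentException("unexpected character: " + c);
        }
    }

    // Walks si from its last character to its first, appending each character's partner.
    static String rev(String si) {
        StringBuilder sb = new StringBuilder();
        for (int i = si.length() - 1; i >= 0; i--) {
            sb.append(complement(si.charAt(i)));
        }
        return sb.toString();
    }

    // Returns the report line for one sequence si; assumes si is non-empty.
    static String report(String target, String si) {
        String fragment = rev(si);
        int start = target.indexOf(fragment);    // 0-based, or -1 if absent
        if (start < 0) {
            return "UNMATCHED";
        }
        int x = start + 1;                       // 1-based first position of the match
        int y = start + fragment.length();       // 1-based last position of the match
        return y + ".." + x + " - " + fragment;  // end position is printed first
    }

    public static void main(String[] args) {
        // Sample data written inline; a real solution would read both files with a Scanner.
        String target = "actgactgaacgtacgtacgggcatcagctgactacttatcgtacgtagct";
        String[] sequences = {
            "ctgaacgtac", "cagctgatgcccgtacg", "tacttatcg",
            "agctgatcgtgctagtacca", "tcagtcagt"
        };
        for (String si : sequences) {
            System.out.println(report(target, si));
        }
    }
}
```

Because report differs from a forward search only in the string it looks for and in the order in which the two positions are printed, most of the logic from Problem 3 can be reused.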
     

  5. [15 pts.] Write a program that reads in a sequence of n numbers from the keyboard. The user is first prompted "How many numbers are in the sequence", to which they reply with an int value n > 0. Then your program should read in n int values.

    The task of your program is to (a) identify the longest sequence of identical values in a row; and (b) print that value to the console. A sample run of your program should look like the following:

     
    How many numbers are in the sequence
    6
    Please enter 6 numbers separated by whitespace
    3
    4
    4
    4
    5
    5

    The largest sequence of consecutive values is a sequence of 3 int with value 4.


    Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.

    If several sequences of identical values share the maximal length, you may arbitrarily choose one of them to output. For example, given the sequence "3 4 4 4 5 5 5" you could output either 4 or 5 as the value with the largest sequence of consecutive values.
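
The scan for the longest run can be sketched as a single pass that remembers the length of the current run and the best run seen so far. The class name LongestRun and the method longestRun are hypothetical, and the hasNextInt checks are extra guards that the problem statement does not require.

```java
import java.util.Scanner;

// Hypothetical class name, used only for this sketch.
class LongestRun {

    // Returns {value, length} for the longest run of identical consecutive values.
    // Assumes values is non-empty; on a tie the earliest run is kept.
    static int[] longestRun(int[] values) {
        int bestValue = values[0];
        int bestLength = 1;
        int currentLength = 1;
        for (int i = 1; i < values.length; i++) {
            if (values[i] == values[i - 1]) {
                currentLength++;             // the current run continues
            } else {
                currentLength = 1;           // a new run starts at position i
            }
            if (currentLength > bestLength) {
                bestLength = currentLength;
                bestValue = values[i];
            }
        }
        return new int[] {bestValue, bestLength};
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.println("How many numbers are in the sequence");
        if (!in.hasNextInt()) {
            return;                          // guard: no usable input
        }
        int n = in.nextInt();
        if (n <= 0) {
            return;                          // the problem guarantees n > 0
        }
        System.out.println("Please enter " + n + " numbers separated by whitespace");
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            if (!in.hasNextInt()) {
                return;                      // guard: fewer than n values supplied
            }
            values[i] = in.nextInt();
        }
        int[] run = longestRun(values);
        System.out.println("The largest sequence of consecutive values is a sequence of "
                + run[1] + " int with value " + run[0] + ".");
    }
}
```

With the strict comparison, the tie in "3 4 4 4 5 5 5" is resolved in favor of 4; changing > to >= would report 5 instead, and either answer is allowed.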
     

Optional Non-Graded

  1. See if you can combine Problem 3 and Problem 4 together

Deliverables

Your goal is to turn in the Project files by Tuesday November 07th at 10:00 AM. Further details will be posted showing the preferred means of uploading your solution to the TAs. Please be aware that no late homework will be accepted. This means that we will grade as zero any homework not submitted by the above means.


Notes

  1. [11/06/06 8:30 PM] Clarification to problem 4. The fragment to be reported is described as rev(si) but I don't show this in the sample output. I have instructed the TAs to accept either form as part of the output. In this homework, I have fixed the output as it should be.
  2. [11/05/06 12:40 AM] Clarification to problem 2 regarding formatting. Sample data set provided for q3 and q4 to make it easier to debug your solutions; also provided sample output with regards to this sample data.
  3. [11/05/06 12:14 AM] Clarification to problem 5 added.
  4. [11/01/06 5:14 PM] There was an error in the example for question 5. It stated that there were '5' numbers in the sequence, when in fact there were six. This has been updated (in red). Updated sample output for q3 and q4.
  5. [10/31/06 1:12 AM] Homework2 completed. Homework2 guidelines to be completed next...
  6. [10/28/06 11:54 PM] Homework2 to be posted here.

©2006 George T. Heineman