CS 2102 Homework Assignment #2

Assigned: Tuesday October 31, 2006, 10:00 AM
Due: Tuesday November 07, 2006, 10:00 AM

Guidelines:

Now Available. Now that we have progressed to supporting input from the keyboard, you must include the appropriate test cases with your homework assignment. These test cases will be graded, so try to develop ones that are meaningful for your problem.

Sample Data

For questions 3 and 4, you will need to create some files on which to test your program. Please make sure that you properly document the sample test cases that you come up with. To help you with questions 3 and 4, I have come up with the following sample data that you could use:

Sample Input File ("input.txt"):
actgactgaacgtacgtacgggcatcagctgactacttatcgtacgtagct

Sample Sequences File ("sequences.txt"), one sequence per line:
ctgaacgtac
cagctgatgcccgtacg
tacttatcg
agctgatcgtgctagtacca
tcagtcagt

Description

1. [20 pts.] Write a program that displays the prompt "Enter a line of text and I will print it in reverse", then reads a line of input from the keyboard (containing any number of characters, including white space) terminated by the newline ('\n') character (see p. 83 of the text). Then create a String rev that represents the reverse of the line. Finally, output rev back to the console, on a line by itself. Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.
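
One possible shape for this program is sketched below. It is only a sketch: the class name ReverseLine and the helper method reverse are names chosen for illustration, and the loop-based reversal is one of several reasonable approaches.

```java
import java.util.Scanner;

class ReverseLine {
    // Builds the reverse by walking the line from its last character to its first.
    // (Hypothetical helper; the assignment only requires a String named rev.)
    static String reverse(String line) {
        String rev = "";
        for (int i = line.length() - 1; i >= 0; i--) {
            rev = rev + line.charAt(i);
        }
        return rev;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter a line of text and I will print it in reverse");
        // nextLine() consumes everything up to the '\n', white space included.
        if (sc.hasNextLine()) {
            String rev = reverse(sc.nextLine());
            System.out.println(rev);
        }
    }
}
```

Test cases worth documenting include an empty line, a line with leading or trailing spaces, and a palindrome.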

2. [15 pts.] Write a program that loads up a String from a file called "input.txt". Note that in Eclipse, you will need to create this file within the project where your homework is being written. To do this, follow these instructions:

Right-click on the Project and select New -> File. When prompted, type the file name "input.txt". Note that this creates the file as a direct child of the Project on which you right-clicked.

By placing the File here, you make it possible to easily read the file by the statement:

Scanner sc = new Scanner (new File ("input.txt"));

Then you can read Strings and the like from this scanner object in the same way as shown in class.

Check out the oct-31 package for an example of loading Strings from a file.
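
For reference, here is a minimal sketch of how the Scanner statement above fits into a complete class. The class name LoadTarget and the methods firstToken and load are hypothetical names, and returning null for a missing file is just one way to handle the FileNotFoundException that the Scanner constructor can throw.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class LoadTarget {
    // Returns the first whitespace-delimited token, or "" if there is none.
    static String firstToken(Scanner sc) {
        if (sc.hasNext()) {
            return sc.next();
        }
        return "";
    }

    // Opens fileName relative to the working directory (the project folder
    // when run from Eclipse). Returns null if the file cannot be found.
    static String load(String fileName) {
        try {
            Scanner sc = new Scanner(new File(fileName));
            String target = firstToken(sc);
            sc.close();
            return target;
        } catch (FileNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String target = load("input.txt");
        if (target == null) {
            System.out.println("Could not find input.txt");
        } else {
            System.out.println("Read " + target.length() + " characters");
        }
    }
}
```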

Now, assume that this input file is composed only of characters 'a', 'c', 't', and 'g' (Why? Check this out). That is, there are no whitespace characters. Produce a report that computes, for each of these four characters, the fraction of the input file that it makes up, and output a summary. Using the sample data provided above, the following table should be output:

a:0.2549019607843137
c:0.2549019607843137
t:0.2549019607843137
g:0.23529411764705882

The most common letter is: a

If two or more letters share the maximum percentage, you may arbitrarily select one of them to output in the final statement.

Regarding formatting: you could choose to output the percentage as a raw double (e.g., 0.2301019238487), format it as a true percentage (e.g., 23.01019238487%), or reduce the number of digits (e.g., 23%). The choice is yours.

Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.
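
The counting step can be sketched as follows. This is an illustration under some assumptions, not a required design: the class name BaseReport, the constant LETTERS, and the methods fractions and mostCommon are hypothetical names, the target is assumed to be non-empty, and ties are broken in favor of the letter that comes first in "actg".

```java
class BaseReport {
    // Order in which the letters are reported (matches the sample table).
    static final String LETTERS = "actg";

    // Returns the fraction of target made up by 'a', 'c', 't', 'g', in that order.
    // Assumes target is non-empty.
    static double[] fractions(String target) {
        int[] counts = new int[LETTERS.length()];
        for (int i = 0; i < target.length(); i++) {
            int which = LETTERS.indexOf(target.charAt(i));
            if (which >= 0) {
                counts[which]++;
            }
        }
        double[] result = new double[counts.length];
        for (int k = 0; k < counts.length; k++) {
            result[k] = (double) counts[k] / target.length();
        }
        return result;
    }

    // Returns the letter with the largest fraction; the earliest letter wins a tie.
    static char mostCommon(double[] fractions) {
        int best = 0;
        for (int k = 1; k < fractions.length; k++) {
            if (fractions[k] > fractions[best]) {
                best = k;
            }
        }
        return LETTERS.charAt(best);
    }

    public static void main(String[] args) {
        String target = "actgactgaacgtacgtacgggcatcagctgactacttatcgtacgtagct";
        double[] f = fractions(target);
        for (int k = 0; k < f.length; k++) {
            System.out.println(LETTERS.charAt(k) + ":" + f[k]);
        }
        System.out.println();
        System.out.println("The most common letter is: " + mostCommon(f));
    }
}
```

The cast to double before dividing matters: dividing one int by another would truncate every fraction to 0.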

3. [25 pts.] Write a program that loads up a String from a file called "input.txt". You should reuse the input file that you used for question 2. This file shall be composed only of the characters 'a', 'c', 't', and 'g' (no whitespace characters will be present), so it will consist of a single String, which we'll refer to as target. A second file, "sequences.txt", shall also be created (in the same way as for Problem 2); it contains a set of lines, each of which is a String composed of the characters 'a', 'c', 't', and 'g' (Why? Check this out).

Your task is to read String s_i from "sequences.txt", one by one, and search for the first occurrence (if any) of s_i within target. If s_i is located as a substring in target, then output "x..y + fragment" where:
- x represents the starting position of s_i within target, whose proper value is 1 through target.length(). NOTE THAT THIS IS DIFFERENT FROM THE CHARACTER INDICES OF 0 THROUGH target.length()-1 AS WE HAVE SEEN IN CLASS.
- y represents the ending position of s_i within target, whose proper value is 1 through target.length().
- "+" is just a plus sign.
- fragment represents the first ten characters of the sequence s_i followed by "...". If s_i contains ten or fewer characters, then you should output s_i without the trailing "..." characters (as the sample output below does for a sequence of exactly ten characters).
If the sequence s_i can't be found in target, then output "UNMATCHED" on a line by itself.

Sample Output using the above sample sequences:
6..15 + ctgaacgtac
UNMATCHED
34..42 + tacttatcg
UNMATCHED
UNMATCHED

Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.
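
The position arithmetic for this problem can be sketched as shown here. The class name ForwardSearch and the methods fragment and report are hypothetical names, and the sketch assumes each sequence is non-empty; String.indexOf returns the 0-based index of the first occurrence, or -1 if there is none.

```java
class ForwardSearch {
    // First ten characters plus "..." for long sequences; the whole
    // sequence otherwise (including sequences of exactly ten characters).
    static String fragment(String s) {
        if (s.length() > 10) {
            return s.substring(0, 10) + "...";
        }
        return s;
    }

    // Builds one report line for sequence s searched within target.
    static String report(String target, String s) {
        int index = target.indexOf(s);      // 0-based, or -1 if absent
        if (index < 0) {
            return "UNMATCHED";
        }
        int x = index + 1;                  // 1-based starting position
        int y = index + s.length();         // 1-based ending position
        return x + ".." + y + " + " + fragment(s);
    }

    public static void main(String[] args) {
        String target = "actgactgaacgtacgtacgggcatcagctgactacttatcgtacgtagct";
        System.out.println(report(target, "ctgaacgtac"));
        System.out.println(report(target, "cagctgatgcccgtacg"));
        System.out.println(report(target, "tacttatcg"));
    }
}
```

In a full solution the sequences would come from "sequences.txt" through a Scanner rather than being written into main.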

4. [25 pts.] For this program, you will reuse the "input.txt" file created for Problem 2 and the "sequences.txt" file created for Problem 3. This time, your task is to read String s_i from "sequences.txt", one by one, and search for the first occurrence (if any) of rev(comp(s_i)) within target. You must assume that the target and s_i Strings are composed only of 'a', 'c', 't', and 'g' characters.
- comp(s_i) is the complement of s_i, created by swapping (a,t) values and swapping (c,g) values. For example, if s_i is "acggtcgattcg" then comp(s_i) is "tgccagctaagc".
- rev(s) is defined as the reverse of a String s. Thus rev("acggtcgattcg") is "gcttagctggca".
So, rev(comp(s_i)) for the String s_i="acggtcgattcg" is equal to the value "cgaatcgaccgt". As with Problem 3, you are to produce a report. If rev(comp(s_i)) is located as a substring in target, then you should output "y..x - fragment", where:
- x represents the starting position of rev(comp(s_i)) within target, whose proper value is 1 through target.length(). NOTE THAT THIS IS DIFFERENT FROM THE CHARACTER INDICES OF 0 THROUGH target.length()-1 AS WE HAVE SEEN IN CLASS.
- y represents the ending position of rev(comp(s_i)) within target, whose proper value is 1 through target.length(). Note that y is greater than x, so the larger position is printed first.
- "-" is just the minus sign.
- fragment represents the first ten characters of rev(comp(s_i)) followed by "...". If rev(comp(s_i)) contains ten or fewer characters, then you should output rev(comp(s_i)) without the trailing "..." characters.
If rev(comp(s_i)) can't be found in target, then output "UNMATCHED" on a line by itself, as in Problem 3.

Sample Output using the above sample sequences:
UNMATCHED
31..15 - cgtacgggca...
UNMATCHED
UNMATCHED
9..1 - actgactga

Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.
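
The complement-then-reverse search for this problem can be sketched as follows. All names here (the class ReverseComplementSearch and its methods) are hypothetical, the sketch assumes non-empty sequences containing only 'a', 'c', 't', and 'g', and it uses StringBuilder simply as a convenient way to build up a String one character at a time.

```java
class ReverseComplementSearch {
    // Swaps a<->t and c<->g; any other character is copied unchanged.
    static String complement(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == 'a') sb.append('t');
            else if (ch == 't') sb.append('a');
            else if (ch == 'c') sb.append('g');
            else if (ch == 'g') sb.append('c');
            else sb.append(ch);
        }
        return sb.toString();
    }

    // Returns the characters of s in the opposite order.
    static String reverse(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length() - 1; i >= 0; i--) {
            sb.append(s.charAt(i));
        }
        return sb.toString();
    }

    // First ten characters plus "..." for long Strings; the whole String otherwise.
    static String fragment(String s) {
        if (s.length() > 10) {
            return s.substring(0, 10) + "...";
        }
        return s;
    }

    // Builds one report line: the ending position is printed before the start.
    static String report(String target, String s) {
        String rc = reverse(complement(s));
        int index = target.indexOf(rc);     // 0-based, or -1 if absent
        if (index < 0) {
            return "UNMATCHED";
        }
        int x = index + 1;                  // 1-based starting position
        int y = index + rc.length();        // 1-based ending position
        return y + ".." + x + " - " + fragment(rc);
    }

    public static void main(String[] args) {
        String target = "actgactgaacgtacgtacgggcatcagctgactacttatcgtacgtagct";
        System.out.println(report(target, "cagctgatgcccgtacg"));
        System.out.println(report(target, "tcagtcagt"));
    }
}
```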

5. [15 pts.] Write a program that reads in a sequence of n numbers from the keyboard. The user is first prompted "How many numbers are in the sequence", to which they reply with an int value n > 0. Then your program should read in n int values.

The task of your program is to (a) identify the longest sequence of identical values in a row; and (b) print the length of that sequence and its value to the console. A sample run of your program should look like the following:

How many numbers are in the sequence
6
Please enter 6 numbers separated by whitespace
3
4
4
4
5
5
The largest sequence of consecutive values is a sequence of 3 int with value 4.

Provide within the documentation of your class the sample test cases that you used to validate the correctness of your program.

If multiple sequences of identical values share the same maximal length, then you can arbitrarily choose one to output. For example, given the sequence "3 4 4 4 5 5 5" you could choose to output either 4 or 5 as containing the largest sequence of consecutive values.
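
A single pass over the values is enough to find the longest run, as the sketch below suggests. The class name LongestRun and the method longestRun are hypothetical names, the array is assumed to be non-empty (n > 0), and this version reports the earliest run when several share the maximal length. Storing the values in an array is a convenience of the sketch; the same bookkeeping works while reading the values one at a time.

```java
import java.util.Scanner;

class LongestRun {
    // Returns {length, value} of the first longest run of identical values.
    // Assumes values contains at least one element.
    static int[] longestRun(int[] values) {
        int bestLength = 1;
        int bestValue = values[0];
        int currentLength = 1;
        for (int i = 1; i < values.length; i++) {
            if (values[i] == values[i - 1]) {
                currentLength++;            // the current run continues
            } else {
                currentLength = 1;          // a new run starts here
            }
            if (currentLength > bestLength) {
                bestLength = currentLength;
                bestValue = values[i];
            }
        }
        return new int[] { bestLength, bestValue };
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("How many numbers are in the sequence");
        if (!sc.hasNextInt()) {
            return;
        }
        int n = sc.nextInt();
        if (n <= 0) {
            return;
        }
        System.out.println("Please enter " + n + " numbers separated by whitespace");
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            if (!sc.hasNextInt()) {
                return;
            }
            values[i] = sc.nextInt();
        }
        int[] best = longestRun(values);
        System.out.println("The largest sequence of consecutive values is a sequence of "
                + best[0] + " int with value " + best[1] + ".");
    }
}
```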

Optional Non-Graded

See if you can combine Problem 3 and Problem 4 into a single program.

Deliverables

Your goal is to turn in the Project files by Tuesday, November 7th at 10:00 AM. Further details will be posted here showing the preferred means of uploading your solution to the TAs. Please be aware that no late homework will be accepted. This means that we will grade as zero any homework not submitted by the above deadline through the posted turn-in means.

Notes

[11/06/06 8:30 PM] Clarification to problem 4. The fragment to be reported is described as rev(si) but I don't show this in the sample output. I have instructed the TAs to accept either form as part of the output. In this homework, I have fixed the output as it should be.
[11/05/06 12:40 AM] Clarification to problem 2 regarding formatting. Sample data set provided for q3 and q4 to make it easier to debug your solutions; also provided sample output with regards to this sample data.
[11/05/06 12:14 AM] Clarification to problem 5 added.
[11/01/06 5:14 PM] There was an error in the example for question 5. It stated that there were '5' numbers in the sequence, when in fact there were six. This has been updated (in red). Updated sample output for q3 and q4.
[10/31/06 1:12 AM] Homework2 completed. Homework2 guidelines to be completed next...
[10/28/06 11:54 PM] Homework2 to be posted here.

©2006 George T. Heineman