;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;        CS1101 Lab 6 - Assignment with application in DNA Sequencing        ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; Lab Motivation and Goals:
;; This lab is designed to help you practice
;;  - using the set! assignment operator
;;  - deciding when use of assignment is appropriate
;;  - writing programs that work on a DNA database
;;
;; Background Information:
;; In the last few decades, scientists have made rapid progress in sequencing
;; genes and proteins for many organisms, including humans. This progress
;; produced a large amount of data, but the only way for scientists to find
;; information about sequenced genes was to search through numerous published
;; articles. This was time-consuming and inefficient, but advances in computer
;; science and database technology provided the solution to this problem.
;;
;; Today, there are several biological databases that contain long strings of
;; nucleotides that make up DNA and RNA (guanine, adenine, thymine, cytosine,
;; and uracil) and which are associated with certain genes. The nucleotides are
;; stored as abbreviations (G, A, T, C, and U, respectively) in order to save
;; disk space. Scientists use these databases to store, organize, and analyze
;; genetic sequences. The ease of using such databases has greatly contributed
;; to the rapid developments in genetics research.
;;
;; In this lab, you will develop a small version of a biological database that
;; contains DNA fingerprints of people who have been arrested in a city. Each
;; person's DNA is unique, so law enforcement officials are beginning to use
;; DNA fingerprinting to identify criminals. Since actual DNA sequences are very
;; long, for the purposes of this lab we will limit the length to 6 nucleotides.
;; Your database needs to store many sequences and the names of the people who
;; have those specific sequences. You will also write various storage,
;; organizational, and analytical tools that can be used by forensic scientists.
;;
;; It is important that every function called on a database sees the same
;; items. For example, if a police officer adds a new DNA sequence to the
;; database, that sequence needs to be present when a forensic scientist looks
;; it up later. To do this, you will need to use assignment operators.
;;
;; Use the following sample information to test your programs.
;;
;;  NAME                DNA SEQUENCE
;;  ----------------------------------
;;  Julius No           ACT GAT
;;  Rosa Klebb          TCU CCT
;;  Auric Goldfinger    CTG GAA
;;  Ernst Blofeld       ACT GTA
;;  Kamal Khan          GUA TCG
;;  Hugo Drax           ATA AAT
;;  Max Zorin           GGC UAT
;;
;; You will need to use the following structs to write your programs:
;;
;; A dna-sequence is a
;;   (make-dna-sequence string string string string string string)
(define-struct dna-sequence (n1 n2 n3 n4 n5 n6))

;; A dna-fingerprint is a
;;   (make-dna-fingerprint string string dna-sequence)
(define-struct dna-fingerprint (first-name last-name dna-seq))

;; A dna-database is either
;;  - empty, or
;;  - (cons dna-fingerprint dna-database)
;;
;; You will be working with two databases, worcester-db and boston-db, which are
;; defined as constants below. These databases are currently empty, but you will
;; use your functions to add information into them.
(define worcester-db empty)
(define boston-db empty)

;; Exercises:
;; 1. In order to use the database, you will need to implement some searching
;;    capabilities.
;;
;;    (a) Write a function find-by-name, which takes in a first name, a last
;;        name, and a DNA database and returns the database entry for the person
;;        with the given name. You may assume that names are unique and do not
;;        repeat in the database. If the person's DNA fingerprint is not present,
;;        return false.
;;
;;    (b) Next, write a function find-by-dna-seq, which takes in a dna-sequence
;;        and a DNA database and returns the database entry containing that
;;        sequence. Again, if the entry does not exist, return false.

;; 2. It is also necessary to be able to update a database with new information
;;    as it is acquired.
;;
;;    (a) Write a function add-to-worc-db, which takes in a dna-fingerprint and
;;        adds it to the worcester-db. Make sure that duplicate entries are not
;;        added to the database. The function add-to-worc-db should have the
;;        following contract:
;;
;;        add-to-worc-db : dna-fingerprint -> void
;;
;;    (b) Try to use your function above to add at least two of the sample
;;        criminals listed at the beginning of this assignment to the
;;        worcester-db.
;;
;;    (c) Now use one of the find functions you wrote in exercise 1 to look up a
;;        criminal that you have just added. What result do you expect to get?
;;        Is that what happens?
;;
;;    (d) Since you are working with two databases, it is tempting to write a
;;        general function add-to-database, which takes in a dna-fingerprint and
;;        a dna-database and adds the print to the given database. This function
;;        would make it possible to add prints to any database, without limiting
;;        it to just Worcester. Modify your add-to-worc-db function to take in
;;        a database as an argument and perform all the actions on this
;;        database. What happens when you run this program?
;;
;;    (e) Write down what you have observed regarding how set! works. Why is it
;;        possible to use one find function for multiple databases, but not
;;        possible to do the same for updating databases?

;; 3. Since it is sometimes necessary to delete records from a database, write a
;;    function delete-from-worc-db, which takes in a dna-fingerprint and
;;    deletes it from the worcester-db.

;; Everybody should be able to finish up to this point.

;; 4. In order to generate reports and to enhance the performance of your
;;    database, it is useful to have built-in sorting tools as well as to be
;;    able to extract only necessary information.
;;
;;    (a) Write a function sort-by-name, which sorts all the records in
;;        worcester-db by the last name of the criminal.
;;
;;    (b) Write a function extract-names, which first sorts worcester-db using
;;        the function you just wrote, and then returns a list of just the last
;;        names of all the people in the database.

;; 5. Sometimes you may not have all the necessary information available to you
;;    right away, so you might need to do searches through the database with
;;    partial information. Write a function find-by-incomplete, which takes in a
;;    dna-sequence, with 'unknown marking the unknown nucleotides, and a
;;    database and returns the entry of the best match for that sequence. A best
;;    match is one that has the most nucleotides matching the given sequence.
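
;; Illustration for exercise 5 (a sketch under stated assumptions, not a
;; required part of your solution). A partial sequence can be built with the
;; same constructor, placing 'unknown in the positions that are not known:
;;
;;   (make-dna-sequence "A" "C" "T" 'unknown "T" 'unknown)
;;
;; Against the sample data, this query agrees with Ernst Blofeld (ACT GTA) in
;; four positions and with Julius No (ACT GAT) in three, so Ernst Blofeld
;; would be the best match. One possible building block is a helper that
;; compares a single position. The name nucleotide-match? is hypothetical and
;; is not defined anywhere else in this lab.

;; nucleotide-match? : (string or 'unknown) string -> boolean
;; returns true only when the query nucleotide is known and equals the stored one
(define (nucleotide-match? query actual)
  (and (string? query) (string=? query actual)))

(check-expect (nucleotide-match? "A" "A") true)
(check-expect (nucleotide-match? "T" "A") false)
(check-expect (nucleotide-match? 'unknown "A") false)

;; Treating 'unknown as "no match" is a design choice of this sketch: unknown
;; positions then neither help nor hurt a candidate, so counting the matching
;; positions ranks entries only by the nucleotides that are actually known.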