Homework 5: Memoization and Graphs

Please document all of your classes and methods with Javadoc for this assignment.

1 Problem 1: Lehvenstein Distance

Have you ever used a software application (such as a text-messaging tool) that tries to automatically correct or complete your typing? For example, if I typed "Jaca" in a smartphone, the editor might list "Java" as an alternative. Such applications find alternatives using a measure of the "closeness" of two words. One standard metric for closeness is called Levenshtein distance. For this problem, you will write a memoized function to compute the Levenshtein distance between two strings.

Levenshtein distance computes how many operations it would take to convert one string into another. Valid operations here include inserting a character, deleting a character or substituting one single character with another. For example, we could convert "baking" to "masking" by (1) changing "b" to "m" and (2) inserting an "s".

Intuitively, the algorithm compares all the prefixes of the two strings, computing the minimum number of operations needed to convert from one prefix to another (since the operations are all invertible, we get the same number regardless of whether we convert from prefix A to prefix B or from prefix B to prefix A). You can visualize this through a matrix. The characters of one string label the columns; the characters of the other label the rows (with an additional column/row at the left and top for the empty prefix). Each cell contains a number that indicates the minimum number of edits needed to convert between the prefixes up to that cell’s row and column. The Wikipedia description of Levenshtein distance shows a complete example.

The matrix representation is helpful because we compute the value of one cell from the values of its neighboring cells. Specifically:

If one string is empty and other has k characters, the Levenshtein distance is k (insert or delete all the characters).
Otherwise, let the first string be Sx and the second be Ty, where S and T are strings and x and y are characters. If x=y, then the Levenshtein distance is the same as the distance between S and T. Otherwise, we will convert these strings either by
- Converting S to Ty, then deleting y,
- Converting Sx to T, then inserting y,
- Converting S to T, then substituting y for x.
We will choose whichever option requires the fewest operations. Thus, we compute the value of a cell by taking the minimum value of its left/upper neighbors and adding 1 (to process x/y).

At this point: stop and work a couple of examples by hand. Forget code. Examples will help you understand how this algorithm is working.

1.1 Implementing the Algorithm

Write a function that takes two strings and computes their Lehvenstein distance. You may simply put this function your Main class. However, do not use arrays (if you know them) to construct the table. Instead, do this through a function that takes the row and column index of a cell and uses a memoization table to store the results so far. Your function should use your memoization data, so that you never actually compute a value for the same cell more than once.

In writing this, you will want the String method charAt, which takes an integer and returns the character at that position into the String. The first character in the string is at index 0. The largest index is one less than the number of characters in the string.

If you do this right, we should not see any sort of iteration (for loop, etc) over the indices into the string. Your implementation should be recursive, starting with the indices of the last position of each of your strings, and computing the distance for smaller indices as needed by the algorithm described above. Again, memoize your function so that you never compute the value of any given cell more than once.

2 Traversing Graphs

Airlines are struggling to make a profit. A handful of regional, low-fare airlines have decided to merge into a unified network, hoping to entice more business with broader low-cost service. Since each airline used to operate its own services, the new "national" airline has some gaps. For example, if one partner airline serviced only the southwest and another only the midwest, the merged airline will need at least one flight between the southwest and the midwest to meet its service goals.

The flight-scheduling team of the merged airline has already created a large graph of all the existing flights offered across the partner airlines. You have to write a program to (a) help find the gaps in service and (b) suggest how they might fill those gaps. There may have been gaps in the service within an individual regional airline, so viewing the gaps through the regional airlines’ networks will not help here.

To find the gaps in service, we first need to break the national flight graph into subsets of cities that can reach one another, but cannot reach cities in any other subset. For example, assume we have flights between Boston/Worcester, Boston/Hartford, Chicago/Denver, Phoenix/Houston, and Houston/Tulsa. This graph would yield three subsets: (Boston, Worcester, Hartford), (Chicago, Denver), and (Phoenix, Houston, Tulsa). We’ll refer to each subset as a network.

Create a class Graph that contains a list of Nodes (as developed in class), with one node per city. Unlike in class, a connection between two cities should denote that there are flights in either direction between the cities (in other words, if you can fly from city A to city B, you can fly from city B to city A).
Write a method getNetworks on graphs that returns a LinkedList of networks, where each network is a LinkedList of city names (Strings).
Write a method newFlights on graphs that returns a list of pairs of city names such that adding flights between each pair would turn the entire graph into one network. Your returned list should have as few pairs as possible, but does not need to account for any other constraints (such as the distance between cities for the proposed flights).

Hint: Examples and Test Cases!!! Work out your examples on paper. Be sure to understand these problems thorougly away from the code before you start to code. Once you understand these problems, the code isn’t that hard. Without understanding the problems, you’ll get nowhere fast. Pictures are your friend here.

2.1 Extra Credit

For extra credit, account for the distances between cities when recommending new flights. You’ll have to store distances somewhere and reference it appropriately when proposing new flights. Your documentation should describe the extent to which your approach minimizes distances within the new flights. You do not have to minimize distances over the entire network (though you can try this if you want to).

3 What to Turn In

Submit the .java files for all classes, interfaces, and examples developed for this assignment. Organize your files into one subdirectory for each of the Lehvenstein-distance and the graph-traversal problems. Do not submit the .class files. Submit a separate file for each class/interface. You do not need to turn in your Javadoc files, but your code should be annotated such that we could generate Javadocs from your files.

4 Grading

We will be looking for evidence that you understand the algorithms at hand, as evidenced by your code and test cases. We expect to see Javadoc documentation on all classes and methods, good tests, and clean code.

1	Problem 1: Lehvenstein Distance
2	Traversing Graphs
3	What to Turn In
4	Grading