Program Planning
Up until now, we have written list-processing programs that all follow a similar structure:
Create a single variable for the running result – this variable has the same type that the function returns
Write a for-loop over the input list, updating the running result variable as you visit each element
Return the running result at the end of the loop.
This structure is how you express template-style programs from 1101 in Java. The pattern works if your program simply builds its result from accumulating information about each element in turn. In practice, sometimes list programs do more than accumulate a running result. Then, we need different programming patterns.
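As a reminder, here is a minimal sketch of the default pattern described above, applied to summing a list of doubles. The class name DefaultPattern and the method name total are made up for this illustration.

```java
import java.util.LinkedList;

class DefaultPattern {
    // Default pattern: one running variable whose type (double)
    // matches the return type of the method.
    static double total(LinkedList<Double> data) {
        double runningTotal = 0;              // the single running result
        for (Double d : data) {               // visit each element in turn
            runningTotal = runningTotal + d;  // update the running result
        }
        return runningTotal;                  // return it after the loop
    }
}
```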
For the next two lectures, we are going to work through examples of different list-processing situations and the program structures that you can use to solve them. We will discuss different approaches that you can take to the same problem, as different approaches have different characteristics in practice. We will show you some techniques for breaking a problem down into separate methods/functions. The goal of these two lectures is to help you plan out solutions to programming problems.
1 Rainfall
Recall the Rainfall problem that you worked on in lab:
Write a program called rainfall that consumes a LinkedList<Double> representing daily rainfall readings. The list may contain the number -999 indicating the end of the data of interest. Produce the average of the non-negative values in the list up to the first -999 (if it shows up). There may be negative numbers other than -999 in the list (representing faulty readings). If you cannot compute the average for whatever reason, return -1.
Let’s look at one common solution to this problem:
class Rainfall {
  double rainfall1(LinkedList<Double> data) {
    double sum = 0;
    int count = 0;

    for (Double d : data) {
      if (d == -999) {
        if (count > 0) {
          return sum/count;
        }
        else { return -1; }
      }
      else if (d >= 0) {
        sum = sum + d;
        count = count + 1;
      }
    }
    if (count > 0) {
      return sum/count;
    }
    else { return -1; }
  }
}
This solution traverses the list, updating the sum and count variables for every non-negative element before -999. When either -999 or the end of the list is reached, the program returns the average.
Notice that this program does NOT follow the default pattern. We have two variables (one tracking count and one tracking sum), one of which has a different type than the return type of the function.
This is because Rainfall is a problem with multiple subtasks. Rainfall involves
computing the sum (part of average)
counting the elements (part of average)
filtering out the negative numbers
stopping at the -999
Note that two of these tasks require traversing the list (summing and counting). You therefore can’t use the default pattern that we’ve had (with only one variable), as one variable can only track the results of one list traversal. In this solution, we use two variables: one to hold the count results and one to hold the sum results.
Note: Being able to look at a programming problem and identify the (sub)tasks that the problem requires is a critical skill in programming. Identifying tasks up front can give you ideas for structuring your code.
Note that since the default pattern doesn’t work, you can’t write rainfall as a simple template function either (in which we had lists that we could process recursively). You actually need to write at least two traversal functions (or expressions with map/filter/fold) in order to write rainfall, or you need a single recursive function with two accumulator variables (one for each of the running sum and count).
How else might you have done this problem? Well, you might have thought about how to leverage the existing size method on lists: if you cleaned out the data so that you only had the numbers you want, you could use size as part of a helper that computes the average of a list.
// compute average of list, returning -1 if list is empty
double average(LinkedList<Double> numList) {
  if (numList.size() > 0) {
    double sum = 0;
    for (Double d : numList) {
      sum = sum + d;
    }
    return sum/numList.size();
  }
  else { return -1; }
}

// rainfall version that cleans out the data then
// uses helper for the average
double rainfall2(LinkedList<Double> data) {
  LinkedList<Double> clean = new LinkedList<Double>();

  for (Double d : data) {
    if (d == -999) {
      return average(clean);
    }
    else if (d >= 0) {
      clean.add(d);
    }
  }
  // we need this here as well, in case there is no -999 in the list
  return average(clean);
}
Note: When we ask you to come up with multiple solutions with different structures (as in hwk3), we are asking you to cluster the subtasks differently across solutions. These two solutions show an example of this.
For those coming from Racket – ask yourself whether you would have preferred a different solution structure had you been in Racket (perhaps using multiple template functions, or perhaps using map and filter).
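For comparison, Java's stream library offers operations in the spirit of filter, so a third clustering of the subtasks is possible. The following is only a sketch, not one of the lecture's solutions: the names RainfallStreams and rainfall3 are made up, and it assumes Java 9 or later (for takeWhile).

```java
import java.util.LinkedList;

class RainfallStreams {
    // Each subtask becomes its own stream step.
    static double rainfall3(LinkedList<Double> data) {
        return data.stream()
                .mapToDouble(Double::doubleValue)  // work with primitive doubles
                .takeWhile(d -> d != -999)         // stop at the first -999
                .filter(d -> d >= 0)               // drop faulty negative readings
                .average()                         // sum and count in one step
                .orElse(-1);                       // no usable readings: return -1
    }
}
```

Here each of the four subtasks identified earlier maps onto one line, which some readers may find closer to the map/filter style than either loop-based version.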
1.1 Tradeoffs Across Solutions
Which of these two rainfall solutions do you prefer and why? What are the tradeoffs surrounding each one? Here are some thoughts.
The first one traverses the list only one time, so it runs a bit faster (note, however, if all of your lists are fairly small, the time difference isn’t that important).
By virtue of cleaning out the data, the second one lets us use existing methods (in this case, size – you might well also have an average method lying around).
By virtue of cleaning out the data, the second one lets you more easily do other computations on the useful data if your application warranted that.
Some people will find one of these solutions easier to read than the other.
2 Max Triple Length
Let’s try this again on a different problem.
Write a program called maxTripleLength that consumes a LinkedList<String> and produces the length of the longest concatenation of three consecutive elements. Assume the input contains at least three strings.
For example, given a list containing ["a", "bb", "c", "dd", "e"], the program would return 5 (for "bb", "c", "dd").
You don’t have to actually concatenate the strings to solve this, but if you want to, you can do this with +, as follows:
"go " + "goats"
First, what are the tasks within this problem? We have to (a) identify the triples, (b) compute the length of each triple, and (c) compute the max length across the triples. The first and third involve traversals, but the second does not. That means we need either a variable or a loop/helper to manage each of the first and third tasks.
When thinking out how to write something like this, it can help to draw out an example, starting from a concrete sample input. For example:
"a" "bb" "c" "dd" "e" |
------------            <-- length 4
    -------------       <-- length 5
         ------------   <-- length 4
Looking at the sketch suggests a couple of ways we could go about this problem:
Write a helper to create a list of explicit triples, then have a loop that traverses that list to compute the max.
Write a helper to create a list of triple lengths, then have a separate function to get the max number from that list (most languages provide a built-in max function).
Write a single loop that looks at three elements at a time within the list, computing both the length of the triple and the max.
Let’s start with the third one. We need to be able to traverse the list and look at three consecutive elements in each pass. The for-loops we have seen so far give us only one element at a time. They seem a poor fit for this problem. Java (and most other languages) provides a second style of for-loop that iterates not over a list, but over a variable with fixed increments. If you had a loop with a variable that kept incrementing (e.g., 0, 1, 2, 3, ...), you could use that variable to pull out list elements based on their position.
Let’s see an example of this loop style, using a simple function to sum the elements of a list of integers:
int sum(LinkedList<Integer> numList) {
  int sum = 0;

  for (int i = 0; i < numList.size(); i = i+1) {
    sum = sum + numList.get(i);
  }
  return sum;
}
There are three components to the for line: (1) create and initialize a variable (i), (2) indicate the condition (on that variable) under which the loop should finish, and (3) indicate how the value of the variable should change from one iteration to the next.
In this example, we want a variable i that takes on the values from 0 up to (but not including) the size of the list, in increments of 1. (The values run from 0 to under the size of the list because in Java the position of the "first" element is 0, not 1 – most programming languages start positions from 0).
Inside the loop, we use a method called get on lists, which retrieves the element at the given position. So numList.get(0) gets the first element, and so on.
Now let’s use this style of for-loop to write our MaxTriple code. We want to iterate through all the positions at which triples can start, so the loop runs only while the position index is less than the size of the list minus 2 (the last triple starts two positions before the last element). Each time through the loop, we get the element at the current index i (positions into lists are usually called indices in programming terminology). But since i is an integer, we can also get neighbors of i by computing other indices based on i.
class MaxTripleTwo {
  public int maxTripleLength (LinkedList<String> args) {
    int maxLength = 0;

    for (int i = 0; i < args.size() - 2; i++) {
      String concat = args.get(i) + args.get(i+1) + args.get(i+2);
      if (maxLength < concat.length()) {
        maxLength = concat.length();
      }
    }

    return(maxLength);
  }
}
2.1 Which kind of for loop to use?
As a general rule, you should use the per-element loop that we showed you last week whenever possible, shifting to this index-based approach only when necessary. "Necessary" includes cases like needing to access several elements per iteration, or needing to skip over some elements in a predictable pattern (like looking only at every other element). If you don’t need the flexibility of the index, use the per-element form instead.
Why? In part, flexibility opens opportunities for mistakes. Once you have the variable for the index floating around, it can be easy to confuse the index for the element at that index. This error happens all the time. For example, you might accidentally use the expression sum = sum + i in the sum-a-list example above. It’s a very common mistake, and one you simply avoid with a per-element loop.
In part, the per-element style is more direct to read, because the reader knows exactly how the loop goes through the elements. With the index-based for loop, someone has to read and check that you are just incrementing the position one at a time. Why waste a colleague’s brain cycles checking that, when the other style fixes the iteration pattern?
Finally, the index-based loop is something of a historical artifact. In the earlier days of programming, programs viewed data as closely tied to computer memory, and how items were laid out in memory. Index-based for loops started as a reflection of how and where individual elements were stored "under the hood". As languages get more expressive, and capture programs at the level of data more than the level of memory, the idea of "iterating through the data" is simply a better conceptual match than "iterating through how the data is organized in memory". More and more languages are providing constructs that take a per-element view of iterating over data (list comprehensions in Python, map/filter in Racket, and so on). These are becoming the new standard idiom, unless you need to fall back on the index-based loops for a particular reason.
For those with prior programming experience, this may run counter to how you were taught. Textbooks often lag many years behind practice (given the nature of publishing cycles), and some instructors stick to the patterns they first learned as novice programmers. The per-element loops are, however, now accepted as the better practice.
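To make the guidance in this section concrete, here is a small sketch contrasting the two loop styles. The class and method names (LoopStyles, sumAll, sumEveryOther) are made up for this illustration. The first method has no index to confuse with an element; the second uses an index because it genuinely needs one (it looks only at every other element).

```java
import java.util.LinkedList;

class LoopStyles {
    // Per-element loop: the loop variable IS the element,
    // so the "sum + i" mistake cannot happen here.
    static int sumAll(LinkedList<Integer> numList) {
        int sum = 0;
        for (Integer n : numList) {
            sum = sum + n;
        }
        return sum;
    }

    // Index-based loop, justified because we skip elements in a
    // predictable pattern (positions 0, 2, 4, ...).
    static int sumEveryOther(LinkedList<Integer> numList) {
        int sum = 0;
        for (int i = 0; i < numList.size(); i = i + 2) {
            sum = sum + numList.get(i);  // the element at i, not i itself
        }
        return sum;
    }
}
```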
2.2 A MaxTriples Version that Pulls out the Triples
Let’s go back and look at another proposal for processing the triples: namely, we first create a list of explicit triples, then we find the length of the longest in that list.
First, we will need a class to store the triples.
class Triple {
  String a, b, c;

  Triple (String a, String b, String c) {
    this.a = a;
    this.b = b;
    this.c = c;
  }
}
How do we convert a list of strings into a list of triples? We can use the index-based for as with the previous solution, but instead of computing the max length, we simply create and store triples. First, create the triples:
public LinkedList<Triple> BreakIntoTriples (LinkedList<String> args) {
  LinkedList<Triple> TripList = new LinkedList<Triple>();

  for (int i = 0; i < args.size()-2; i++) {
    TripList.add(new Triple(args.get(i),
                            args.get(i+1),
                            args.get(i+2)));
  }
  return(TripList);
}
Now we use this as a helper method in our loop that computes the max (this assumes we’ve added a totalLen method to the Triple class):
public int maxTripleLength (LinkedList<String> args) {
  int maxLength = 0;

  for (Triple t : BreakIntoTriples(args)) {
    if (maxLength < t.totalLen()) {
      maxLength = t.totalLen();
    }
  }
  return(maxLength);
}
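The notes assume a totalLen method without showing it. One plausible way to write it, as a sketch, is to add up the lengths of the three strings (no concatenation needed):

```java
class Triple {
    String a, b, c;

    Triple(String a, String b, String c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    // Assumed helper: total number of characters across the three strings.
    int totalLen() {
        return a.length() + b.length() + c.length();
    }
}
```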
Notice what this version achieves: it puts different tasks into different methods. This enables us to reuse methods across problems, and gives each method a more focused purpose.
The idea of reshaping your input data (here, from a list of strings to a list of triples) is called parsing (though you may find the term reshaping more intuitive for now). Reshaping your data can make later tasks much simpler. When facing a new problem, it is always worth asking whether the problem would be easier to solve if you just had the data in another format. If so, write a helper to convert the shape, then write the simpler program for the core computation.
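The remaining proposal from the earlier list (build a list of triple lengths, then take the max of that list) can be sketched in the same spirit. The names MaxTripleLengths and tripleLengths are made up for this illustration; Collections.max is the standard library's built-in max over a collection. As in the problem statement, the sketch assumes the input has at least three strings (Collections.max throws an exception on an empty list).

```java
import java.util.Collections;
import java.util.LinkedList;

class MaxTripleLengths {
    // Reshape: one length per triple of consecutive strings.
    static LinkedList<Integer> tripleLengths(LinkedList<String> args) {
        LinkedList<Integer> lengths = new LinkedList<>();
        for (int i = 0; i < args.size() - 2; i++) {
            lengths.add(args.get(i).length()
                        + args.get(i + 1).length()
                        + args.get(i + 2).length());
        }
        return lengths;
    }

    // Core computation: the max of the reshaped data.
    static int maxTripleLength(LinkedList<String> args) {
        return Collections.max(tripleLengths(args));
    }
}
```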
3 Summary of Design Steps
This lecture has presented several ideas about how to approach a programming problem with multiple tasks. Summarizing in one place:
Identify the tasks within the problem before you try writing code. If you start with code, you’re almost certain to start having the same loop implement little bits of different tasks at the same time, and you will end up with a mess. A little pre-planning goes a long way.
Write down a rich, concrete example of the input to your problem and annotate it (literally sketch on paper) to show how that data gets broken down or used across the tasks in a problem: identify the triples, or write down a second list for Rainfall that has all the irrelevant data removed. Once you see how the data transforms through the problem, you can consider writing helper methods to do each transformation.
Figure out which tasks involve traversing the data (either the original data or a cleaned/reformatted version). When you get to write the actual code, you will need either a separate loop (whether through a built-in method or a manual loop) for each traversal task, or a separate "running" variable to track the progress of each task being done within the same loop.
Consider whether the program might benefit from one of the two concrete strategies we discussed here: cleaning out irrelevant data, or reshaping/parsing data into a more manageable format.
4 Several Other Problems to Try
Here are several additional problems on which to practice planning. For each one, make sure you can list out the tasks and think of different ways to organize the tasks. At least during lecture, being able to break the problem into tasks is more important than actually producing the code.
4.1 Adding Machine
Design a program called addingMachine that consumes a list of numbers and produces a list of the sums of each non-empty sublist separated by zeros. Ignore input elements that occur after the first occurrence of two consecutive zeros.
If you finish this one quickly in Java and went through 1101/1102, try this in Racket. Does Racket lead you to a different style of solution?
The following table lays out the tasks. The two solutions are in this file, labeled "Version 1" and "Version 2". The first does a single loop, the second reshapes the data then does a pass to sum the reshaped data. The table shows how each task is realized in each solution.
Task | Traversal? | Captured V1 | Captured V2
Identify sublists between 0s | yes | variable i in adding1 | variable i in breakIntoSublists
Sum sublists between 0s | yes | currSum variable | inner for loop in sumSublists
Locate 00 | depends | if-test | if-test
Construct output list of sums | yes | results variable in adding1 | results variable in sumSublists
We list "Locate 00" as "depends" in the traversal column because you could choose to track the 00 as you go (no separate traversal), or you could choose to truncate the data after the 00 as a first pass, then just process each item separately (with a separate traversal).
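To illustrate the "track the 00 as you go" option, here is one possible single-loop sketch. It is not the "Version 1" from the linked file: the class name and the two boolean flags are made up for this illustration, and it uses a per-element loop. It also assumes that a final sublist not followed by a zero still contributes its sum, which the problem statement leaves open.

```java
import java.util.LinkedList;

class AddingMachine {
    static LinkedList<Integer> addingMachine(LinkedList<Integer> nums) {
        LinkedList<Integer> results = new LinkedList<>();  // output list of sums
        int currSum = 0;              // running sum of the current sublist
        boolean inSublist = false;    // seen a nonzero since the last zero?
        boolean prevWasZero = false;  // was the previous element a zero?

        for (Integer n : nums) {
            if (n == 0) {
                if (prevWasZero) {
                    return results;   // two consecutive zeros: ignore the rest
                }
                if (inSublist) {
                    results.add(currSum);  // close off a non-empty sublist
                }
                currSum = 0;
                inSublist = false;
                prevWasZero = true;
            } else {
                currSum = currSum + n;
                inSublist = true;
                prevWasZero = false;
            }
        }
        // Assumption: an unterminated final sublist still counts.
        if (inSublist) {
            results.add(currSum);
        }
        return results;
    }
}
```

Note how each row of the table shows up as its own variable or if-test: results for the output list, currSum for the sums, and the prevWasZero test for locating the 00.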
(I know some of you want to see a Racket solution – I haven’t had time to write that up yet.)
4.2 Shopping Cart
An online clothing store applies discounts during checkout. A shopping cart is a list of the items being purchased. Each item has a name (a string like “shoes”) and a price (a real number like 12.50). Design a program called checkout that consumes a shopping cart and produces the total cost of the cart after applying the following two discounts:
if the cart contains at least 100 worth of shoes, take 20% off the cost of all shoes (match only items whose exact name is "shoes")
if the cart contains at least two hats, take 10 off the total of the cart (match only items whose exact name is "hat")
Use the following classes for items and carts:
class CartItem {
  String name;
  double price;

  CartItem (String name, double price) {
    this.name = name;
    this.price = price;
  }
}

// A sample cart in your Examples class
LinkedList<CartItem> cart;
4.3 Least Healthy Teams
A company maintains records on its employees. For each, it stores the person’s name, which team they are on, and how many days they have been out sick. The company tracks the health of each team by totalling the number of sick days across all members of the team.
Write a method leastHealthy that takes a list of employees and returns a list of the team names in order from the one with the most to the one with the least sick days. Here’s a class of Employee to get you started.
class Employee {
String name;
String team;
int missedDays;
}
4.4 Nearest Point Distance
Imagine that you had an application that was tracking devices as they moved around (such as phones, flying drones, etc.). You need a program to figure out how close the nearest tracked device is to some specified point. Because of limitations in your tracking application, rather than getting a list of location objects, your program will get all of the coordinates flattened out into a single sequence like
device1-x device1-y device2-x device2-y ...
(where device1-x and device1-y are the x and y coordinates of device1, and so on). We’ll work with just 2-dimensional coordinates and check the distance to the origin (0,0) to keep things simple.
Write a program nearestPointDistance that takes a LinkedList of integers for the locations as shown above and returns the distance (a double) of the closest point to the origin (the coordinate at (0,0)). For example:
nearestPointDistance([3, 4, 0, 2, 5, 12]) should return 2
How do we get 2? The distance from (3,4) is 5, the distance from (0,2) is 2, and the distance from (5,12) is 13. The smallest of these distances is 2.
As a reminder, given a point (x,y), its distance to the origin is given by the formula
sqrt(x*x + y*y)
In Java, you get the square root of a number num by writing Math.sqrt(num). This function returns a double. The Math class lives in java.lang, which Java imports automatically, so the following import is optional:
import java.lang.Math;
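As a small worked check of the formula (not a full solution to the problem), here is a sketch of a distance helper. The names Distance and distanceToOrigin are made up for this illustration.

```java
class Distance {
    // Distance from the point (x, y) to the origin (0, 0).
    static double distanceToOrigin(int x, int y) {
        return Math.sqrt(x * x + y * y);
    }
}
```

For the three points in the example above, distanceToOrigin(3, 4) gives 5.0, distanceToOrigin(0, 2) gives 2.0, and distanceToOrigin(5, 12) gives 13.0. Pulling out pairs of coordinates from the flattened list and tracking the minimum are the remaining tasks.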