Interfaces and Abstract Datatypes
We’ve covered the fundamental parts of Java programs: classes, abstract classes, and interfaces. Classes capture new concrete kinds of data. Abstract classes share fields, methods, and helper functions that are common to several related classes. Interfaces define types based on required methods, but omit the details of fields and how the methods are implemented.
So far, we have seen one use for interfaces: to create types for data that have choices, or variants (such as the many specific kinds of animals, or the different options in a family tree). Interfaces have many other uses, though. The general idea of summarizing what something does (specification) rather than how it does it (implementation) is one of the most fundamental concepts in all of engineering. It underlies good programming in every language, as well as good software design at a large scale. Today, we consider another programming problem that interfaces help address.
1 Three Motivating Problems
Consider the following problems:
Store all of the URLs visited from a web browser, with the ability to check whether a specific URL has been visited.
Gather all the words that someone generated during a word game (such as "write down all the words you can make from the letters d e h l l l o o r w"), with the ability to add words, find out how many words the player generated, and get all of the words for purposes of scoring.
Maintain information on who is coming to a dinner party. In addition to adding and removing names, you need to be able to compare the current number of attendees to the number of chairs in your apartment.
If we were to write classes for each of these problems, we might do something like the following (where bunchOStrings is the name of some made-up type that can store a bunch of strings):
```java
// The web browser code
class webBrowser {
  bunchOStrings visitedURLs;

  bunchOStrings addURLIfNew(String newURL) {
    if (newURL not in visitedURLs) {   // pseudocode: "not in" is not Java syntax
      // add newURL to visitedURLs
    }
    return this.visitedURLs;
  }
}

// The word game code
class WordGame {
  bunchOStrings words;

  bunchOStrings addWordIfNew(String newWord) {
    if (newWord not in words) {
      // add newWord to words
    }
    return this.words;
  }
}

// The party invites code
class Invites {
  bunchOStrings guests;

  bunchOStrings addGuestIfNew(String name) {
    if (name not in guests) {
      // add name to guests
    }
    return this.guests;
  }
}
```
Hopefully, you would quickly realize that you were writing essentially the same code three times, once for each addIfNew method. Hopefully, you’ve developed the instinct not to do that. So what should you do about it? Based on what we’ve covered so far, there seem to be three options:
Move this common code (including the bunchOStrings field) to a super class that these three classes extend, putting addIfNew in the super class.
Make bunchOStrings a class that has the common addIfNew method.
Make bunchOStrings an interface that requires an addIfNew method.
Stop and think: which of these options do you like?
The super-class idea (first option) doesn’t make sense in this case. Super-classes (abstract or not) are used to share code across related classes in some common domain. These three classes (webBrowser, WordGame, and Invites) have nothing to do with one another conceptually, so they don’t belong in the same class hierarchy.
So we have to choose between making a class or an interface across these three applications.
If we made a class, we might write something like this:
```java
class bunchOStrings {
  // <fields to hold the data>

  // <constructor>

  bunchOStrings addElt(String newElt) { ... }
  bunchOStrings remElt(String elt) { ... }
  int size() { ... }
  boolean hasElt(String elt) { ... }
}
```
The problem here is that a class forces every use of bunchOStrings to share the same code for each method. If we consider our motivating problems, however, different operations matter more in each one:
The collection of previously-visited URLs in a typical browser is huge. Browsers commonly style or color visited URLs differently from unvisited ones; to do so, the browser needs to quickly search the collection for each URL in the page it is rendering.
When scoring the word game, the ability to get the longest word a player found is important (as the long words are worth more points). Searching doesn’t happen often in this application.
The party list is likely small, so the performance of particular operations probably doesn’t matter as much.
Thus, while the collection of operations we need for each problem is similar, the code that provides the best performance for each context could be rather different. A single bunchOStrings class, however, would fix one version of each method. We would be much better off creating an interface here, and several different classes (with different performance characteristics) that implement that interface. Programmers could then choose whichever of those classes performs well for their kinds of data.
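As a sketch of this idea, the code below defines an ISet interface (the same operations as the bunchOStrings class above) and two hypothetical implementations, ListStringSet and HashStringSet; these class names are ours, not part of these notes. Both are written in an immutable style, so addElt and remElt return a new set and leave the original unchanged. ListStringSet scans a list, which is adequate for a small guest list. HashStringSet delegates to java.util.HashSet, whose contains check takes constant time on average, which suits the browser's frequent lookups. In this simple sketch every change copies the underlying collection, so neither class is tuned for fast insertion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// The operations every set implementation must provide
interface ISet {
  ISet addElt(String newElt);   // returns a set that also contains newElt
  ISet remElt(String elt);      // returns a set without elt
  int size();                   // number of items in the set
  boolean hasElt(String elt);   // is elt in the set?
}

// Hypothetical list-backed set: linear-time hasElt, fine for small sets
class ListStringSet implements ISet {
  private final List<String> items;

  ListStringSet() { this(new ArrayList<>()); }
  private ListStringSet(List<String> items) { this.items = items; }

  public ISet addElt(String newElt) {
    if (this.hasElt(newElt)) { return this; }   // sets keep no duplicates
    List<String> copy = new ArrayList<>(this.items);
    copy.add(newElt);
    return new ListStringSet(copy);
  }

  public ISet remElt(String elt) {
    List<String> copy = new ArrayList<>(this.items);
    copy.remove(elt);                           // removes elt if present
    return new ListStringSet(copy);
  }

  public int size() { return this.items.size(); }
  public boolean hasElt(String elt) { return this.items.contains(elt); }  // linear scan
}

// Hypothetical hash-backed set: constant average-time hasElt
class HashStringSet implements ISet {
  private final HashSet<String> items;

  HashStringSet() { this(new HashSet<>()); }
  private HashStringSet(HashSet<String> items) { this.items = items; }

  public ISet addElt(String newElt) {
    HashSet<String> copy = new HashSet<>(this.items);
    copy.add(newElt);                           // HashSet ignores duplicates
    return new HashStringSet(copy);
  }

  public ISet remElt(String elt) {
    HashSet<String> copy = new HashSet<>(this.items);
    copy.remove(elt);
    return new HashStringSet(copy);
  }

  public int size() { return this.items.size(); }
  public boolean hasElt(String elt) { return this.items.contains(elt); }
}
```

A guest list could then declare `ISet guests = new ListStringSet();` while a browser declares `ISet visited = new HashStringSet();`; the rest of each program calls only the ISet methods, so switching implementations touches a single line.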
An Aside: You might notice that our original premise has shifted: we started by saying we wanted to share code across our three problems, but now we are saying we might not want the same code after all. For the three specific problems at the start of these notes, that is correct: we did shift to wanting different implementations of bunchOStrings. However, you can certainly imagine that different applications might want the same implementation (such as two applications that each maintain sets of small numbers of items), so the original motivation still holds.
2 Abstract Datatypes
The moral of the story so far is that we sometimes want to talk about the same operations on a collection of data, but that doesn’t mean every use of those operations should share the same code.
Computer Science has a term for this: an abstract datatype is a pattern of data with standard operations (such as adding members or counting members) that can be used in many contexts. With our three motivating examples, we have identified the need for some kind of data that supports several basic operations:
A notion of a collection of items
The ability to add items to and remove items from the collection
The ability to count items in the collection
The ability to check whether an item is in the collection
A datatype with these operations is called a Set. Ideally, we would like to have a type for sets, and then define each of visitedURLs, words, and guests as a set.
The following interface captures the required operations on sets:
```java
interface ISet {
  ISet addElt(String newElt);   // adds item to the set
  ISet remElt(String elt);      // removes item from the set
  int size();                   // returns number of items in the set
  boolean hasElt(String elt);   // determines whether item is in the set
}
```
Note that using the ISet interface, rather than a concrete class such as bunchOStrings, has little impact on how we write the classes for our motivating problems. We simply change the type of the field, as shown below for the web browser example:
```java
// The web browser code
class webBrowser {
  ISet visitedURLs;

  // mark a new URL as visited
  // (returns the updated set; the visitedURLs field itself is not reassigned)
  ISet addURL(String newURL) {
    return this.visitedURLs.addElt(newURL);
  }

  // determine whether URL has already been visited
  boolean haveVisited(String aURL) {
    return this.visitedURLs.hasElt(aURL);
  }
}
```
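To see that webBrowser depends only on the ISet operations, here is a self-contained sketch. It repeats the ISet interface and includes one hypothetical implementation, HashStringSet, backed by java.util.HashSet. Two details are our own assumptions rather than part of these notes: the constructor takes the ISet, so the caller chooses the implementation, and addURL returns a new webBrowser holding the updated set (a functional style), so the added URL is not lost.

```java
import java.util.HashSet;

interface ISet {
  ISet addElt(String newElt);
  ISet remElt(String elt);
  int size();
  boolean hasElt(String elt);
}

// Hypothetical implementation backed by java.util.HashSet (each change copies)
class HashStringSet implements ISet {
  private final HashSet<String> items;
  HashStringSet() { this(new HashSet<>()); }
  private HashStringSet(HashSet<String> items) { this.items = items; }

  public ISet addElt(String newElt) {
    HashSet<String> copy = new HashSet<>(this.items);
    copy.add(newElt);
    return new HashStringSet(copy);
  }
  public ISet remElt(String elt) {
    HashSet<String> copy = new HashSet<>(this.items);
    copy.remove(elt);
    return new HashStringSet(copy);
  }
  public int size() { return this.items.size(); }
  public boolean hasElt(String elt) { return this.items.contains(elt); }
}

// The browser only calls ISet methods, so any implementation can be plugged in
class webBrowser {
  private final ISet visitedURLs;

  webBrowser(ISet visitedURLs) { this.visitedURLs = visitedURLs; }

  // mark a URL as visited, returning a browser that remembers it
  webBrowser addURL(String newURL) {
    return new webBrowser(this.visitedURLs.addElt(newURL));
  }

  // determine whether a URL has already been visited
  boolean haveVisited(String aURL) {
    return this.visitedURLs.hasElt(aURL);
  }
}
```

Replacing `new HashStringSet()` in the constructor call with any other ISet implementation requires no change to webBrowser itself.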
2.1 Abstract Datatypes versus Data Structures
In your prior CS courses, you should have encountered data structures, such as structures/records, lists, and trees. A data structure is a specific shape of data together with algorithms for manipulating that data to implement desired operations. An abstract datatype (abbreviated ADT throughout computer science) is a collection of operations (with the types of their inputs and outputs) that does not fix an implementation. The term "abstract" reflects this lack of a concrete implementation.
Even though you may only have a vague sense of abstract datatypes at this point, you should have an initial sense that data structures are implemented with concrete details (in Java, this means in classes), while abstract datatypes simply summarize names and types of operations (in Java, this means interfaces). The full story is more subtle, but you should have at least that intuition for now.
2.2 Different Flavors of Similar Datatypes
Now, let’s consider another motivating problem: tracking votes. We want to determine both how many votes were cast and how many votes were cast for each candidate. A collection of votes is also just a collection of strings (candidate names), so we should be able to manage votes using an ISet as well.
Stop and think: what should the result of the following expression be for each of the party guests and votes problems (assuming in one case Elmo has been invited to the party and in the other we are voting for Elmo)?
```java
ISet s = new Set();   // Set: some class that implements ISet
s.addElt("Elmo").addElt("Elmo").size()
```
We want this expression to yield different answers in these two problems. For votes, it should produce 2: each addition of "Elmo" is a separate vote and must be recorded as a distinct element. For party guests, it should produce 1: Elmo is one person and should be counted only once.
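The two answers can be made concrete with two hypothetical classes that share the ISet interface: StringSet ignores a repeated addition, while StringBag records it. The class names are illustrative, not part of these notes; both return new objects rather than modifying themselves.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

interface ISet {
  ISet addElt(String newElt);
  ISet remElt(String elt);
  int size();
  boolean hasElt(String elt);
}

// Hypothetical: no duplicates, so adding an existing element leaves the size unchanged
class StringSet implements ISet {
  private final HashSet<String> items;
  StringSet() { this(new HashSet<>()); }
  private StringSet(HashSet<String> items) { this.items = items; }

  public ISet addElt(String newElt) {
    HashSet<String> copy = new HashSet<>(this.items);
    copy.add(newElt);            // no effect if newElt is already present
    return new StringSet(copy);
  }
  public ISet remElt(String elt) {
    HashSet<String> copy = new HashSet<>(this.items);
    copy.remove(elt);
    return new StringSet(copy);
  }
  public int size() { return this.items.size(); }
  public boolean hasElt(String elt) { return this.items.contains(elt); }
}

// Hypothetical: duplicates allowed, so every addition is recorded
class StringBag implements ISet {
  private final List<String> items;
  StringBag() { this(new ArrayList<>()); }
  private StringBag(List<String> items) { this.items = items; }

  public ISet addElt(String newElt) {
    List<String> copy = new ArrayList<>(this.items);
    copy.add(newElt);            // always grows the collection by one
    return new StringBag(copy);
  }
  public ISet remElt(String elt) {
    List<String> copy = new ArrayList<>(this.items);
    copy.remove(elt);            // removes one occurrence, if any
    return new StringBag(copy);
  }
  public int size() { return this.items.size(); }
  public boolean hasElt(String elt) { return this.items.contains(elt); }
}
```

With these classes, `new StringSet().addElt("Elmo").addElt("Elmo").size()` yields 1 and `new StringBag().addElt("Elmo").addElt("Elmo").size()` yields 2, even though both types have exactly the same method signatures.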
This suggests that there is more to data types than just the operations they support. We also need to know something about other properties of the datatype, as reflected in interactions between the operations.
2.3 Properties as Part of Datatypes
Our examples (including votes) point to two different datatypes:
Sets, which are collections of elements with no duplicates.
Bags, which are collections of elements that may contain duplicates.
We need to build this distinction into the datatypes. But how?
Here, we have to distinguish between the concept of abstract datatypes and the use of interfaces to capture (part of) them in Java. Ignore Java for the moment. If we just wanted to describe the operations on the Set datatype, what could we write down?
The difference between Sets and Bags lies in duplicate elements. Ultimately, this difference is reflected in the operations: if we add an existing element to a Set, the size of the set remains unchanged. If we add an existing element to a Bag, the size of the bag increases by 1. This suggests that describing the interactions between operations is a critical part of defining a datatype. We describe these interactions through precise, mathematical statements about the effects of operations.
As an example, here is the ADT for Sets:
```
-------------------------------------------
Name: Set
Description: a collection of unordered elements
             without duplicates
Operations:
  addElt : Set element -> Set
  remElt : Set element -> Set
  size   : Set -> integer
  hasElt : Set element -> boolean
Properties:
  // on hasElt and addElt
  - if hasElt(S,e) then addElt(S,e) = S
  - hasElt(addElt(S,e),e) = true
  - if hasElt(S,e) and e!=f then hasElt(addElt(S,f),e)

  // on hasElt and remElt
  - if (not hasElt(S,e)) then remElt(S,e) = S
  - hasElt(remElt(S,e),e) = false
  - if hasElt(S,e) and e!=f then hasElt(remElt(S,f),e)

  // on addElt/remElt and size
  - if (not hasElt(S,e)) then size(addElt(S,e)) = size(S) + 1
  - if hasElt(S,e) then size(remElt(S,e)) = size(S) - 1

  // on addElt and remElt
  - if (not hasElt(S,e)) then remElt(addElt(S,e),e) = S
-------------------------------------------
```
The properties are grouped by the operations whose interactions they capture. The order of properties here is not relevant. While properties may look like test cases, they are not. Test cases are over concrete data, while properties have variables. Properties are, however, useful blueprints for test cases.
Keep in mind that ADT descriptions are just documentation, not code. We write properties in a general form that can be adapted to working in any programming language.
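Because properties have variables, one way to use them as blueprints is to write a checker that takes S, e, and f as parameters and then call it on many concrete sets and elements. The sketch below does this for the Set ADT using a hypothetical immutable StringSet whose equals compares contents (the ADT's "=" needs some notion of equality, and content equality is our assumption). The property relating addElt and remElt is checked only when e is not already in S: if e is present, adding it changes nothing and removing it then shrinks S. The sketch uses List.of and Set.of, which require Java 9 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical immutable set of strings; equality compares contents
final class StringSet {
  private final Set<String> items;
  StringSet(Set<String> items) { this.items = new HashSet<>(items); }

  StringSet addElt(String e) {
    Set<String> copy = new HashSet<>(this.items);
    copy.add(e);
    return new StringSet(copy);
  }
  StringSet remElt(String e) {
    Set<String> copy = new HashSet<>(this.items);
    copy.remove(e);
    return new StringSet(copy);
  }
  int size() { return this.items.size(); }
  boolean hasElt(String e) { return this.items.contains(e); }

  @Override public boolean equals(Object o) {
    return o instanceof StringSet && ((StringSet) o).items.equals(this.items);
  }
  @Override public int hashCode() { return this.items.hashCode(); }
}

final class SetProperties {
  // Returns true when every Set property holds for this particular S, e, f
  static boolean check(StringSet s, String e, String f) {
    boolean ok = true;
    // on hasElt and addElt
    if (s.hasElt(e)) ok &= s.addElt(e).equals(s);
    ok &= s.addElt(e).hasElt(e);
    if (s.hasElt(e) && !e.equals(f)) ok &= s.addElt(f).hasElt(e);
    // on hasElt and remElt
    if (!s.hasElt(e)) ok &= s.remElt(e).equals(s);
    ok &= !s.remElt(e).hasElt(e);
    if (s.hasElt(e) && !e.equals(f)) ok &= s.remElt(f).hasElt(e);
    // on addElt/remElt and size
    if (!s.hasElt(e)) ok &= s.addElt(e).size() == s.size() + 1;
    if (s.hasElt(e)) ok &= s.remElt(e).size() == s.size() - 1;
    // on addElt and remElt (only when e is not already in S)
    if (!s.hasElt(e)) ok &= s.addElt(e).remElt(e).equals(s);
    return ok;
  }

  // Concrete test data: every combination of these sample sets and elements
  static boolean checkAll() {
    List<StringSet> sets = List.of(
        new StringSet(Set.of()),
        new StringSet(Set.of("a")),
        new StringSet(Set.of("a", "b", "c")));
    List<String> elts = List.of("a", "b", "z");
    for (StringSet s : sets) {
      for (String e : elts) {
        for (String f : elts) {
          if (!check(s, e, f)) { return false; }
        }
      }
    }
    return true;
  }
}
```

Each call to check is an ordinary test case over concrete data; the nested loops simply generate many such cases from the same general property.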
2.4 The ADT for Bags
For reference, here is the ADT for Bags. Bags drop the no-duplicates requirement, and the properties change accordingly (do you see where?). Note that this ADT for bags assumes that remElt removes one occurrence of the element, not all occurrences.
```
-------------------------------------------
Name: Bag
Description: a collection of unordered elements
             that allows duplicates
Operations:
  addElt : Bag element -> Bag
  remElt : Bag element -> Bag
  size   : Bag -> integer
  hasElt : Bag element -> boolean
Properties:
  // on hasElt and addElt
  - hasElt(addElt(S,e),e) = true
  - if hasElt(S,e) and e!=f then hasElt(addElt(S,f),e)

  // on hasElt and remElt
  - if (not hasElt(S,e)) then remElt(S,e) = S
  - if hasElt(S,e) and e!=f then hasElt(remElt(S,f),e)

  // on addElt/remElt and size
  - size(addElt(S,e)) = size(S) + 1
  - if hasElt(S,e) then size(remElt(S,e)) = size(S) - 1

  // on addElt and remElt
  - remElt(addElt(S,e),e) = S

  // on addElt and remElt and hasElt
  - hasElt(remElt(addElt(addElt(S,e),e),e),e) = true
-------------------------------------------
```
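A similar sketch for Bags: a hypothetical immutable StringBag that stores how many times each element occurs, so two bags are equal when they hold the same elements with the same counts (our assumed notion of equality). The checker mirrors the Bag properties. The size property for remElt is checked only when e is in the bag, because removing a missing element leaves the bag unchanged. Like the Set sketch, it uses List.of (Java 9 or later).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical immutable bag: maps each element to its number of occurrences
final class StringBag {
  private final Map<String, Integer> counts;
  StringBag() { this(new HashMap<>()); }
  private StringBag(Map<String, Integer> counts) { this.counts = counts; }

  StringBag addElt(String e) {
    Map<String, Integer> copy = new HashMap<>(this.counts);
    copy.merge(e, 1, Integer::sum);                              // one more occurrence of e
    return new StringBag(copy);
  }
  StringBag remElt(String e) {
    Map<String, Integer> copy = new HashMap<>(this.counts);
    copy.computeIfPresent(e, (k, n) -> n == 1 ? null : n - 1);  // drop one occurrence
    return new StringBag(copy);
  }
  int size() {
    int total = 0;
    for (int n : this.counts.values()) { total += n; }
    return total;
  }
  boolean hasElt(String e) { return this.counts.containsKey(e); }

  @Override public boolean equals(Object o) {
    return o instanceof StringBag && ((StringBag) o).counts.equals(this.counts);
  }
  @Override public int hashCode() { return this.counts.hashCode(); }
}

final class BagProperties {
  // Returns true when every Bag property holds for this particular S, e, f
  static boolean check(StringBag s, String e, String f) {
    boolean ok = true;
    // on hasElt and addElt
    ok &= s.addElt(e).hasElt(e);
    if (s.hasElt(e) && !e.equals(f)) ok &= s.addElt(f).hasElt(e);
    // on hasElt and remElt
    if (!s.hasElt(e)) ok &= s.remElt(e).equals(s);
    if (s.hasElt(e) && !e.equals(f)) ok &= s.remElt(f).hasElt(e);
    // on addElt/remElt and size
    ok &= s.addElt(e).size() == s.size() + 1;
    if (s.hasElt(e)) ok &= s.remElt(e).size() == s.size() - 1;
    // on addElt and remElt
    ok &= s.addElt(e).remElt(e).equals(s);
    // adding e twice and removing it once leaves e in the bag
    ok &= s.addElt(e).addElt(e).remElt(e).hasElt(e);
    return ok;
  }

  static boolean checkAll() {
    StringBag empty = new StringBag();
    List<StringBag> bags = List.of(
        empty, empty.addElt("a"), empty.addElt("a").addElt("a").addElt("b"));
    List<String> elts = List.of("a", "b", "z");
    for (StringBag s : bags) {
      for (String e : elts) {
        for (String f : elts) {
          if (!check(s, e, f)) { return false; }
        }
      }
    }
    return true;
  }
}
```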
3 Where Do the Pieces of ADTs go in Java?
ADTs are independent of programming language. When you program in a new language, you need to figure out how that language captures the pieces of an ADT. In Java, interfaces capture the name and operations of an ADT. The properties cannot be written in the interface; they must be enforced by the code in the classes that implement the ADT and checked by test cases.
Some languages provide more support for building properties into programs than Java (just as Java provides more support for interfaces than Racket does). It is important to understand what facilities each language you work in provides for capturing these concepts. Whatever your language doesn’t capture explicitly becomes your responsibility to document and test extensively.
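One way for implementing code to carry some of the properties is to check them as postconditions with Java's assert statement, which the JVM evaluates only when assertions are enabled (for example, with the -ea flag). The hypothetical CheckedStringSet below asserts two Set properties inside addElt; the remaining properties still need documentation and test cases.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical set that checks some ADT properties as postconditions of addElt
final class CheckedStringSet {
  private final Set<String> items;
  CheckedStringSet() { this(new HashSet<>()); }
  private CheckedStringSet(Set<String> items) { this.items = items; }

  CheckedStringSet addElt(String e) {
    Set<String> copy = new HashSet<>(this.items);
    copy.add(e);
    CheckedStringSet result = new CheckedStringSet(copy);
    // Property: hasElt(addElt(S,e),e) = true
    assert result.hasElt(e) : "added element missing";
    // Property: size grows by 1 exactly when e was not already present
    assert result.size() == this.size() + (this.hasElt(e) ? 0 : 1) : "size property violated";
    return result;
  }

  int size() { return this.items.size(); }
  boolean hasElt(String e) { return this.items.contains(e); }
}
```

Runtime checks like these catch violations only for the inputs a program happens to see, which is why they complement tests rather than replace them.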
4 Summary
This lecture shifted us from learning Java constructs into more abstract computer science concepts. In particular, we saw:
Separating specification from implementation is one of the most fundamental concepts in engineering: there are many specific approaches to a problem, each meeting different constraints. In Computer Science, typical constraints are performance, power, or memory usage.
When we look at a programming problem, we need to think about the data we need, the operations we need to perform on that data, which operations need optimal performance for our context, and what detailed behavior operations must provide.
Java interfaces capture operations and their types. Java classes capture different implementations of operations (which implies different performance of the operations). Abstract datatypes capture required behavior on operations.
Two ADTs can have the same operations, but different behavioral requirements. The Java interfaces for these two ADTs may appear the same, but their documentation should differ. Sets and Bags are a good example of this.
Data structures provide specific implementations of abstract datatypes. We will see much more of this concept in the coming lectures.