Interfaces and Abstract Datatypes
We’ve covered the fundamental parts of Java programs: classes, abstract classes, and interfaces. Classes capture new concrete kinds of data. Abstract classes share fields, methods, and helper functions that are common to classes. Interfaces define types based on required methods, but omit the details of fields and how the methods are implemented.
So far, we have seen one use for interfaces: to create types for data that have choices, or variants (such as many specific animals, or different options of family tree). Interfaces have many other uses, though. The general idea of summarizing what something does (specification) rather than how it does it (implementation) is one of the most fundamental concepts in all of engineering. It underlies good programming in every language, as well as good software design at a large scale. Today, we consider another programming problem that interfaces help address.
1 Three Motivating Problems
Consider the following problems:
Store all of the URLs visited from a web browser, with the ability to check whether a specific URL has been visited.
Gather all the words that someone generated during a word game (such as "write down all the words you can make from the letters d e h l l l o o r w"), with the ability to add words, find out how many words the player generated, and get all of the words for purposes of scoring.
Maintain information on who is coming to a dinner party. In addition to adding and removing names, you need to be able to compare the current number of attendees to the number of chairs in your apartment.
If we were to write classes for each of these problems, we might do something like the following (where bunchOStrings is the name of some made-up type that can store a bunch of strings):
// The web browser code
class webBrowser {
  bunchOStrings visitedURLs;

  bunchOStrings addURLIfNew(String newURL) {
    if (newURL not in visitedURLs) {
      // add newURL to visitedURLs
    }
  }
}

// The word game code
class WordGame {
  bunchOStrings words;

  bunchOStrings addWordIfNew(String newWord) {
    if (newWord not in words) {
      // add newWord to words
    }
  }
}

// The party invites code
class Invites {
  bunchOStrings guests;

  bunchOStrings addGuestIfNew(String name) {
    if (name not in guests) {
      // add name to guests
    }
  }
}
Hopefully, you would quickly realize that you were writing the same fundamental code once for each of the addIfNew methods, and you’ve developed the instinct not to do that. So what should you do about it? So far, the answer you’ve learned might be "make an abstract class". Concretely, we might make an abstract class around the bunchOStrings field and put a common addIfNew method there. This approach doesn’t make sense here. Abstract classes share code across related classes in some common domain, and these three classes (webBrowser, WordGame, and Invites) have nothing to do with one another conceptually.
What you have really hit on is the need for an abstract datatype: a pattern of data with standard operations (such as adding members or counting members) that can be used in many contexts. Here, the addIfNew method should be built into the bunchOStrings class, since it is an operation that we commonly want on bunchOStrings objects.
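As a rough sketch of what "building addIfNew into bunchOStrings" could look like, the following assumes a hypothetical BunchOStrings class backed by an ArrayList. The representation, the boolean return type, and the size and guestCount methods are assumptions made for illustration; the notes leave all of these open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the made-up bunchOStrings type.
// The ArrayList representation is an assumption for illustration only.
class BunchOStrings {
    private final List<String> items = new ArrayList<>();

    // Add s only if it is not already present; report whether it was added.
    boolean addIfNew(String s) {
        if (items.contains(s)) {
            return false;
        }
        items.add(s);
        return true;
    }

    int size() {
        return items.size();
    }
}

// One of the three applications, now reusing addIfNew instead of rewriting it.
class Invites {
    private final BunchOStrings guests = new BunchOStrings();

    boolean addGuestIfNew(String name) {
        return guests.addIfNew(name);
    }

    int guestCount() {
        return guests.size();
    }
}
```

The web browser and word game classes would hold their own BunchOStrings field in the same way, so the membership check is written exactly once.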
Today’s lecture starts us studying how we design and implement abstract datatypes such as bunchOStrings.
2 Classes as Datatypes
Our three motivating problems indicate that we need some kind of data that supports several basic operations:
A notion of a collection of items
The ability to add items to and remove items from the collection
The ability to count items in the collection
The ability to check whether an item is in the collection
A datatype with these operations is called a Set. Ideally, we would like to have a type for sets, and then define each of visitedURLs, words, and guests as a set.
You might consider replacing the (totally-made-up) bunchOStrings class with a (more realistic) Set class that has the general operations that you need. Something like:
class Set {
  <Fields to hold the data>

  <Constructor>

  Set addElt(String newElt) { ... }
  Set remElt(String elt) { ... }
  int size() { ... }
  boolean hasElt(String elt) { ... }
}
We would then use Set in place of bunchOStrings in the previous examples. (Ignore the "if new" part of the add operation for a moment; we’ll return to that shortly.)
Stop and think: is making a Set class and having a variable of type Set in each of these applications a good idea?
3 From Classes to Abstract Datatypes
Return to our three motivating problems again: they all require the same operations, but different operations matter more in these different applications:
The set of previously-visited URLs in a typical browser is huge. If the user wants to style or color visited URLs differently from unvisited ones (common), the browser needs to quickly search the set for each URL in the page it is rendering.
When scoring the word game, the ability to get the longest word a player found is important (as the long words are worth more points). Searching doesn’t happen often in this application.
The party list is likely small, so the performance of particular operations probably doesn’t matter as much.
Thus, while the collection of operations we need for each problem is similar, the code that provides the best performance for each context could be rather different. A single Set class, however, would fix one version of each method. We would be much better off creating an interface for sets, and several different classes (with different performance characteristics) that implement that interface. Programmers could then choose whichever of those classes performs well for their kinds of data.
An Aside: You might notice that our original premise has shifted: we started by saying we wanted to share code across our three problems (hence creating the datatype for sets in the first place), but now we are saying we might not want the same code after all. For the three specific problems at the start of these notes, that is correct: we did shift to wanting different implementations of sets. However, you can certainly imagine that different uses of sets might want the same implementation (such as two applications that each maintain sets of small numbers of items), so the original motivation still holds.
3.1 An Interface for Sets
The following interface captures the required operations on sets:
interface ISet {
  ISet addElt(String newElt);
  ISet remElt(String elt);
  int size();
  boolean hasElt(String elt);
}
Note that using the ISet interface, rather than the Set class, has little impact on how we implement the classes for our motivating problems. We simply change the type of the field, as shown below for the visitedURLs example:
class visitedURLs {
  ISet theURLs;  // the only change is in this line

  // Constructor omitted

  // mark a new URL as visited
  // (addElt returns an ISet, so we store it and return this object)
  visitedURLs addURL(String newURL) {
    this.theURLs = this.theURLs.addElt(newURL);
    return this;
  }

  // determine whether URL has already been visited
  boolean haveVisited(String aURL) {
    return this.theURLs.hasElt(aURL);
  }
}
3.2 Abstract Datatypes, a First Definition
We have just motivated the distinction between a data structure and an abstract datatype. A data structure is a specific set of fields and algorithms for manipulating data in those fields to implement desired operations. An abstract datatype (abbreviated ADT throughout computer science) is a collection of operations (with the types of their inputs and outputs) but without fixing implementations. The term "abstract" reflects the lack of concrete implementations.
Even though you may only have a vague sense of abstract datatypes at this point, you should have an initial sense that data structures are implemented with concrete details (in Java, this means in classes), while abstract datatypes simply summarize names and types of operations (in Java, this means interfaces). The full story is more subtle, but you should have at least that intuition for now.
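To make the distinction concrete, here is a minimal sketch of two data structures behind the one ISet abstract datatype. The class names ListSet and HashedSet are hypothetical, and the choice to update in place and return this is an assumption; the notes do not say whether addElt should build a new set or modify the existing one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// The ISet interface from these notes, repeated so the sketch is self-contained.
interface ISet {
    ISet addElt(String newElt);
    ISet remElt(String elt);
    int size();
    boolean hasElt(String elt);
}

// Data structure 1: an unsorted list. hasElt scans the list element by element.
class ListSet implements ISet {
    private final List<String> elts = new ArrayList<>();

    public ISet addElt(String newElt) {
        if (!elts.contains(newElt)) {   // keep the list free of duplicates
            elts.add(newElt);
        }
        return this;
    }

    public ISet remElt(String elt) {
        elts.remove(elt);               // no effect if elt is absent
        return this;
    }

    public int size() { return elts.size(); }

    public boolean hasElt(String elt) { return elts.contains(elt); }
}

// Data structure 2: a hash table. hasElt is typically a constant-time lookup.
class HashedSet implements ISet {
    private final HashSet<String> elts = new HashSet<>();

    public ISet addElt(String newElt) {
        elts.add(newElt);               // HashSet ignores duplicates itself
        return this;
    }

    public ISet remElt(String elt) {
        elts.remove(elt);
        return this;
    }

    public int size() { return elts.size(); }

    public boolean hasElt(String elt) { return elts.contains(elt); }
}
```

Code written against ISet, such as the visitedURLs class, works with either class. A browser with a huge history would likely prefer something like HashedSet, while a short guest list would be fine with ListSet.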
The structures/classes, lists, and trees that you learned about in 1101 are all data structures. So far, the only abstract datatype we’ve seen is for sets.
3.3 Different Flavors of Similar Datatypes
Now, let’s consider another motivating problem: tracking votes. We want to determine both how many votes were cast and how many votes were cast for each candidate. Votes are also just collections of words, so we should be able to manage votes using an ISet as well.
Stop and think: what should the result of the following expression be for each of the party guests and votes problems (assuming in one case Elmo has been invited to the party and in the other we are voting for Elmo)?
ISet s = new Set();
s.addElt("Elmo").addElt("Elmo").size()
We want this expression to yield different answers in these two problems. For votes, we need to record each addition of "Elmo" as a distinct element in the set. For counting party guests, we only want to count Elmo once.
This suggests that there is more to data types than just the operations they support. We also need to know something about other properties of the datatype, as reflected in interactions between the operations.
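The two answers can be seen side by side in a small sketch. The classes GuestList and VoteTally are hypothetical names, and the interface is trimmed to the two operations the expression uses. Both classes satisfy the same Java interface, yet they disagree on what adding "Elmo" twice means.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed version of ISet: only the operations used by the expression.
interface ISet {
    ISet addElt(String newElt);
    int size();
}

// Party-guest behavior: adding a name that is already present changes nothing.
class GuestList implements ISet {
    private final List<String> elts = new ArrayList<>();

    public ISet addElt(String newElt) {
        if (!elts.contains(newElt)) {
            elts.add(newElt);
        }
        return this;
    }

    public int size() { return elts.size(); }
}

// Vote behavior: every addition is recorded, duplicates included.
class VoteTally implements ISet {
    private final List<String> elts = new ArrayList<>();

    public ISet addElt(String newElt) {
        elts.add(newElt);
        return this;
    }

    public int size() { return elts.size(); }
}

class ElmoDemo {
    public static void main(String[] args) {
        System.out.println(new GuestList().addElt("Elmo").addElt("Elmo").size()); // prints 1
        System.out.println(new VoteTally().addElt("Elmo").addElt("Elmo").size()); // prints 2
    }
}
```

Nothing in the interface itself tells a reader which of these behaviors to expect, which is exactly the gap the rest of this section addresses.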
3.4 Properties and Axioms on Datatypes
Our examples (including votes) point to two different datatypes:
Sets, which are collections of elements with no duplicates.
Bags, which are collections of elements that include duplicates.
We need to build this distinction into the datatypes. But how?
One way is with an informal comment such as the one given above: we simply summarize properties of the datatype in prose.
We should also be clear about how the properties affect the datatype’s operations (such as whether adding an element changes the number of elements in the data structure). We do this by writing logical statements about how the operations interact under the properties; these statements are called axioms.
3.5 Abstract Datatypes, a More Complete Definition
The following shows the full ADT description for Sets:
-------------------------------------------
Name: Set
Operations:
  addElt : Set element -> Set
  remElt : Set element -> Set
  size : Set -> integer
  hasElt : Set element -> boolean
Properties:
  - The set has no duplicates
  - The items within the set are unordered
Axioms:
  // on hasElt and addElt
  - if hasElt(S,e) then addElt(S,e) = S
  - hasElt(addElt(S,e),e) = true
  - if hasElt(S,e) and e!=f then hasElt(addElt(S,f),e)

  // on hasElt and remElt
  - if (not hasElt(S,e)) then remElt(S,e) = S
  - hasElt(remElt(S,e),e) = false
  - if hasElt(S,e) and e!=f then hasElt(remElt(S,f),e)

  // on addElt/remElt and size
  - if (not hasElt(S,e)) then size(addElt(S,e)) = size(S) + 1
  - if hasElt(S,e) then size(remElt(S,e)) = size(S) - 1

  // on addElt and remElt
  - if (not hasElt(S,e)) then remElt(addElt(S,e),e) = S
-------------------------------------------
The axioms are grouped by the operations whose interactions they capture. The order of axioms is not relevant. While axioms may look like test cases, they are not. Test cases are over concrete data, while axioms have variables. Axioms are, however, useful blueprints for test cases.
Keep in mind that ADT descriptions are just documentation, not code. We write properties and axioms in a general form that can be adapted to working in any programming language.
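The idea that axioms are blueprints for test cases can be sketched as follows. SimpleSet is a hypothetical immutable implementation written only for this example, and the three checks cover just a sample of the Set axioms. Each check takes the axiom's variables as parameters, and main supplies concrete data.

```java
import java.util.HashSet;

// Hypothetical immutable set, used only to exercise the axioms.
final class SimpleSet {
    private final HashSet<String> elts;

    SimpleSet() { this.elts = new HashSet<>(); }
    private SimpleSet(HashSet<String> elts) { this.elts = elts; }

    SimpleSet addElt(String e) {
        HashSet<String> copy = new HashSet<>(elts);
        copy.add(e);
        return new SimpleSet(copy);
    }

    SimpleSet remElt(String e) {
        HashSet<String> copy = new HashSet<>(elts);
        copy.remove(e);
        return new SimpleSet(copy);
    }

    int size() { return elts.size(); }
    boolean hasElt(String e) { return elts.contains(e); }

    // Two sets are equal when they hold the same elements.
    @Override public boolean equals(Object o) {
        return o instanceof SimpleSet && elts.equals(((SimpleSet) o).elts);
    }
    @Override public int hashCode() { return elts.hashCode(); }
}

class SetAxiomChecks {
    // axiom: if hasElt(S,e) then addElt(S,e) = S
    static boolean addExistingChangesNothing(SimpleSet s, String e) {
        return !s.hasElt(e) || s.addElt(e).equals(s);
    }

    // axiom: hasElt(remElt(S,e),e) = false
    static boolean removedIsAbsent(SimpleSet s, String e) {
        return !s.remElt(e).hasElt(e);
    }

    // axiom: if (not hasElt(S,e)) then size(addElt(S,e)) = size(S) + 1
    static boolean addNewGrowsByOne(SimpleSet s, String e) {
        return s.hasElt(e) || s.addElt(e).size() == s.size() + 1;
    }

    public static void main(String[] args) {
        // Concrete data turns each axiom into a test case.
        SimpleSet s = new SimpleSet().addElt("hello").addElt("world");
        System.out.println(addExistingChangesNothing(s, "hello"));
        System.out.println(removedIsAbsent(s, "world"));
        System.out.println(addNewGrowsByOne(s, "owl"));
    }
}
```

A passing check on one choice of S and e does not prove the axiom, since the axiom quantifies over all sets and elements; it only shows that this implementation respects it on that data.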
3.5.1 The Relationship between Properties and Axioms
Axioms capture how operations reflect properties of a datatype. They do not necessarily capture all properties though. Consider the property that elements are unordered – we don’t capture that anywhere in the axioms. We can’t, because the axioms are written against the operations and the operations don’t say anything about order. That’s worth repeating more directly: some properties are implicit in the operations provided in the ADT. You cannot capture such properties in axioms.
3.6 The ADT for Bags
For reference, here is the ADT for Bags. Bags lose the no-duplicates property; the axioms change accordingly. Note that this ADT for bags assumes that remove takes out one occurrence of the element, not all occurrences.
-------------------------------------------
Name: Bag
Operations:
  addElt : Bag element -> Bag
  remElt : Bag element -> Bag
  size : Bag -> integer
  hasElt : Bag element -> boolean
Properties:
  - The items within the bag are unordered
Axioms:
  // on hasElt and addElt
  - hasElt(addElt(S,e),e) = true
  - if hasElt(S,e) and e!=f then hasElt(addElt(S,f),e)

  // on hasElt and remElt
  - if (not hasElt(S,e)) then remElt(S,e) = S
  - if hasElt(S,e) and e!=f then hasElt(remElt(S,f),e)

  // on addElt/remElt and size
  - size(addElt(S,e)) = size(S) + 1
  - if hasElt(S,e) then size(remElt(S,e)) = size(S) - 1

  // on addElt and remElt
  - remElt(addElt(S,e),e) = S

  // on addElt and remElt and hasElt
  - hasElt(remElt(addElt(addElt(S,e),e),e),e) = true
-------------------------------------------
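A small sketch of a bag that follows this ADT, assuming a hypothetical ListBag class backed by an ArrayList and updated in place. It relies on the fact that List.remove(Object) removes only the first matching occurrence, which matches the "remove takes out one occurrence" assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-backed bag; duplicates are kept.
class ListBag {
    private final List<String> elts = new ArrayList<>();

    ListBag addElt(String e) {
        elts.add(e);          // always grows the bag by one
        return this;
    }

    ListBag remElt(String e) {
        elts.remove(e);       // removes one occurrence, or nothing if absent
        return this;
    }

    int size() { return elts.size(); }

    boolean hasElt(String e) { return elts.contains(e); }
}

class BagDemo {
    public static void main(String[] args) {
        ListBag votes = new ListBag().addElt("Elmo").addElt("Elmo");
        System.out.println(votes.size());                        // prints 2
        // Add twice, remove once: the element is still present.
        System.out.println(votes.remElt("Elmo").hasElt("Elmo")); // prints true
        System.out.println(votes.size());                        // prints 1
    }
}
```

The second line of output is a concrete instance of the last Bag axiom; the same sequence of calls on a set would report that "Elmo" is gone.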
4 Where Do the Pieces of ADTs go in Java?
ADTs are independent of programming language. When you program in a new language, you need to figure out how that language captures the pieces of an ADT. Java interfaces capture the name and operations of an ADT. The properties are not put into code: they go in comments or other forms of documentation. The axioms get built into the code that implements the ADT and into test cases.
Some languages provide more support for building axioms into programs than Java (just as Java provides more support for interfaces than Racket does). It is important to understand what facilities each language you work in provides for capturing these concepts. Whatever your language doesn’t capture explicitly becomes your responsibility to document and test extensively.
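One way this division of labor might look in Java is sketched below. The interface name IBag and the wording of the comments are assumptions. The method signatures carry the ADT's name and operations, while the properties and axioms appear only as documentation that Java never checks.

```java
/**
 * Set ADT.
 * Properties (documentation only; Java does not enforce these):
 *   - the set has no duplicates
 *   - the items within the set are unordered
 */
interface ISet {
    /** Axiom: if hasElt(newElt) already holds, the result equals this set. */
    ISet addElt(String newElt);

    /** Axiom: afterwards, hasElt(elt) is false. */
    ISet remElt(String elt);

    int size();

    boolean hasElt(String elt);
}

/**
 * Bag ADT: same operations as ISet, but duplicates are allowed.
 * Properties (documentation only):
 *   - the items within the bag are unordered
 */
interface IBag {
    /** Axiom: size always grows by exactly 1. */
    IBag addElt(String newElt);

    /** Removes one occurrence of elt, if there is one. */
    IBag remElt(String elt);

    int size();

    boolean hasElt(String elt);
}
```

Apart from their names, the two interfaces are identical to the compiler. Only the comments, and the tests written from the axioms, distinguish a set from a bag.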
5 Summary
This lecture shifted us from learning Java constructs into more abstract computer science concepts. In particular, we saw:
Separating specification from implementation is one of the most fundamental concepts in engineering: there are many specific approaches to a problem, each meeting different constraints. In Computer Science, typical constraints are performance, power, or memory usage.
When we look at a programming problem, we need to think about the data we need, the operations we need to perform on that data, which operations need optimal performance for our context, and what detailed behavior operations must provide.
Java interfaces capture operations and their types. Java classes capture different implementations of operations (which implies different performance of the operations). Abstract datatypes capture required behavior on operations.
Two ADTs can have the same operations, but different behavioral requirements. The Java interfaces for these two ADTs may appear the same, but their documentation should differ. Sets and Bags are a good example of this.
To identify the axioms for an ADT, (1) consider each pair of operations and ask how they must interact to satisfy the properties; (2) for each property, think about how you might see it reflected in interactions between operations.
Data structures provide a specific implementation of an abstract datatype. We will see much more of this concept in the coming lectures.