Representing and Implementing Sets

Now that you’re able to implement basic, concrete data structures in Java, we turn to more advanced data structures and data types. We’ll be focusing both on these data types in their own right, and on various implications of building them in Java (which in turn will lead us deeper into OO programming in the coming classes).

1 Abstract Data Types

Up until now, we have maintained a close connection between data types and their underlying implementations. For example, when we wanted lists, we used Racket lists; when we wanted trees, we implemented trees (either through Racket structs or Java classes). Today, we start looking at Abstract Data Types, which admit multiple implementations through concrete data structures.

Roughly speaking, an abstract data type (henceforth abbreviated ADT) has three components:

A name
A set of required operations (with contracts)
A set of expected properties

We will work with a specific example to illustrate ADTs.

2 Sets

Sets (collections of items) are a common data type in computing. We might use them to capture the courses someone has registered for, the destinations served by an airline, or someone’s friends in a social network. As a data type, sets have two key characteristics (a.k.a., properties):

They contain no duplicates
The items within the set are unordered

Four operations are common on sets: adding elements, removing elements, determining how many elements are in the set, and asking whether a specific item is in the set.

We summarize this information in an ADT as follows:

-------------------------------------------
ADT Name: Set
ADT operations:
  addElt : Set element -> Set
  remElt : Set element -> Set
  size : Set -> integer
  hasElt : Set element -> boolean
ADT properties (vague):
  1. The set has no duplicates
  2. The items within the set are unordered
ADT axioms
  <will fill in later>
-------------------------------------------

An ADT is like an interface, in that it specifies what to provide but not how to provide it. (There are differences between interfaces and ADTs, which we return to later.) In order to run a program that uses an ADT, we also need to provide an implementation that satisfies the specification. Let’s look at that for Sets via Lists.

2.1 Implementing Sets with Lists

Even though we haven’t yet done lists in Java (we will next week), we can sketch this implementation out in a combination of Racket and pseudocode. To implement an ADT, we provide a concrete data structure to represent the list and implementations of each operation in the ADT that satisfy the signature and the properties.

Here, for example, is an implementation of Sets basd on Lists:

A set is a list

addElt(Set, element) =
  (cons element Set)

remElt(Set, element) =
  (remove element Set)  \* assume remove built-in on lists *\

size(Set) =
  (length Set)

hasElt(Set, element) =
  (member element Set)

Not quite done though. Before we can claim to implement the Set ADT, we have to show that this implementation respects the properties of Sets. How does this implementation stack up against the two properties?

Sets are unordered: Does our list-based implementation respect this property? You might argue no, because lists are, by definition, ordered; they have operations like first which clearly expose ordering. The real question, however, is not whether the implementation supports more operations, but whether it exposes them. Nothing in our API for sets lets a user detect ordering. In particular, the sets API does not give access to the first operation on the underlying list.
Sets have no duplicates: We know that cons will accept duplicates (e.g., (cons 3 (cons 3 empty))), but as we have just argued, this is only a problem if our implementation violates the properties through the API. Does our current implementation let us observe duplicates in sets? Yes. Assume we have a set S. What is the difference between (size S) and (size(add 3 (add 3 S)))? It should be 1, but our implementation yields 2. This mismatch is a problem.
Where should we fix the implementation? Your first thought was probably to implement add differently, by checking whether its argument is already in the set before using cons. But remember, our goal is only to make the API behave properly – the data structure in the implementation need not respect the properties of the ADT! We detected the incorrect behavior using a combination of add and size. This suggests that a different implementation of size could also do the trick. In particular, size could simply report the number of unique elements (a more complicated implementation than length).
Which should we choose? It depends on how you will use your set ADT. In terms of efficiency, changing add makes add more expensive but keeps size relatively cheap; changing size makes it much more expensive, but keeps add very cheap. If you expect lots of additions but few size checks, the latter is preferable. (And yes, there are of course other options such as maintaining a separate size variable; we will come back to that option in a future lecture).

The duplicates problem reflected through add and size illustrates a key point about ADT properties: they in turn suggest good test cases for implementations of ADTs. In fact, one could view such test cases as concrete instances of the properties we have already stated for ADTs. Such instances are called axioms. We can write these down explicitly in our ADT as follows:

-------------------------------------------
ADT Name: Set
ADT operations:
  addElt : Set element -> Set
  remElt : Set element -> Set
  size : Set -> integer
  hasElt : Set element -> boolean
ADT properties (vague):
  1. The set has no duplicates
  2. The items within the set are unordered
ADT axioms (concrete)
  if (not hasElt(S, n)) then size(addElt(S,n)) = size(S) + 1
  if hasElt(S, n) then size(addElt(S,n)) = size(S)

-------------------------------------------

Moral of all of this: when you implement an ADT, make sure that your implementations satisfy the properties. As you discover useful axioms that embody those properties, include them with your ADT definition. They will help you develop good tests later on. We will begin to check that your test cases for ADT implementations check the required properties and axioms.

3 A Java Interface for Sets

To tie this back to Java, let’s formally write tje Sets ADT as a Java interface:

interface Iset {

Iset addElt (int elt);

Iset remElt (int elt);

int size ();

Boolean hasElt (int elt);

}

What about the properties and axioms? The best Java can do here is include them in comments. Java interfaces do NOT fully support ADTs, precisely because Java lacks support for stating axioms outside of comments. The interface is still useful, but just weaker than ADTs.

4 Summary

Key take-aways from these notes:

An Abstract Data Type (ADT) specifies the name, operations, and expected properties on a new kind of data.
The data structure that you use to implement an abstract data type may have different properties than the ADT itself. What matters is that those differences don’t leak through the operations in the ADT. Axioms within the ADT help check for leaks.
Sometimes, choosing a good implementation depends on knowing how you will use the ADT.
Java interfaces are not the same as ADTs. In particular, Java interfaces lack a formal representation of properties.