Implementing the Set ADT with Lists

Even though we haven’t yet done lists in Java, we can sketch this implementation out in pseudocode, relying on your previous understanding of lists. To implement an ADT, we provide a concrete data structure to represent the ADT's values, along with implementations of each operation in the ADT that satisfy the signatures and the properties.

Here, for example, is an implementation of Sets based on Lists:

A set is a list

addElt(Set, element) =
  (cons element Set)

remElt(Set, element) =
  (remove element Set)  /* assume remove is built in on lists */

size(Set) =
  (length Set)

hasElt(Set, element) =
  (member element Set)
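
For readers who want to see the same idea in Java ahead of the lists lecture, here is a minimal sketch of the pseudocode above. It rests on assumptions not in the notes: the class name NaiveListSet is hypothetical, elements are restricted to int, java.util.ArrayList stands in for "list", and the operations mutate the set instead of returning a new one as the pseudocode does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: a set of ints represented directly as a list.
class NaiveListSet {
    // The underlying list; nothing outside this class can reach it.
    private final List<Integer> elts = new ArrayList<>();

    // Like cons: adds unconditionally, even if element is already present.
    void addElt(int element) {
        elts.add(element);
    }

    // Like remove on lists: drops the first occurrence, if there is one.
    // Integer.valueOf avoids calling the index-based remove(int).
    void remElt(int element) {
        elts.remove(Integer.valueOf(element));
    }

    // Like length: counts every stored item.
    int size() {
        return elts.size();
    }

    // Like member: true if element occurs anywhere in the list.
    boolean hasElt(int element) {
        return elts.contains(element);
    }
}
```

ArrayList.add appends to the end of the list whereas cons prepends to the front. That difference cannot be observed through these four operations.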

We are not quite done, though. Before we can claim to implement the Set ADT, we have to show that this implementation respects the properties of Sets. How does this implementation stack up against the two properties?

Sets are unordered: Does our list-based implementation respect this property? You might argue no, because lists are, by definition, ordered; they have operations like first which clearly expose ordering. The real question, however, is not whether the implementation supports more operations, but whether it exposes them. Nothing in our API for sets lets a user detect ordering. In particular, the sets API does not give access to the first operation on the underlying list.
Sets have no duplicates: We know that cons will accept duplicates (e.g., (cons 3 (cons 3 empty))), but as we have just argued, this is only a problem if the duplicates can be observed through the API. Does our current implementation let us observe duplicates in sets? Yes. Assume we have a set S that does not already contain 3. What is the difference between size(S) and size(addElt(addElt(S, 3), 3))? It should be 1, but our implementation yields 2. This mismatch is a problem.
Where should we fix the implementation? Your first thought was probably to implement addElt differently, by checking whether its argument is already in the set before using cons. But remember, our goal is only to make the API behave properly. The data structure in the implementation need not respect the properties of the ADT! We detected the incorrect behavior using a combination of addElt and size. This suggests that a different implementation of size could also do the trick. In particular, size could simply report the number of unique elements (a more complicated implementation than length).
Which should we choose? It depends on how you will use your set ADT. In terms of efficiency, changing addElt makes addElt more expensive but keeps size relatively cheap; changing size makes size much more expensive, but keeps addElt very cheap. If you expect lots of additions but few size checks, the latter is preferable. (And yes, there are of course other options, such as maintaining a separate size variable; we will come back to that option in a future lecture.)
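
The two fixes discussed above can be sketched side by side in Java. This is an illustration under assumptions, not a prescribed implementation: the class names CheckedAddSet and UniqueSizeSet are hypothetical, elements are ints, and java.util.ArrayList plays the role of the list.

```java
import java.util.ArrayList;
import java.util.List;

// Option 1 (hypothetical name): pay on addElt, keep size cheap.
class CheckedAddSet {
    private final List<Integer> elts = new ArrayList<>();

    void addElt(int element) {
        // Linear scan on every add, so the list never holds a duplicate.
        if (!elts.contains(element)) {
            elts.add(element);
        }
    }

    void remElt(int element) {
        // At most one copy exists, so removing the first occurrence suffices.
        elts.remove(Integer.valueOf(element));
    }

    int size() {
        return elts.size();
    }

    boolean hasElt(int element) {
        return elts.contains(element);
    }
}

// Option 2 (hypothetical name): keep addElt cheap, pay on size.
class UniqueSizeSet {
    private final List<Integer> elts = new ArrayList<>();

    void addElt(int element) {
        elts.add(element); // duplicates are allowed inside the list
    }

    void remElt(int element) {
        // Every copy must go, or hasElt would still report the element.
        elts.removeIf(e -> e == element);
    }

    int size() {
        // Count each value once: only its first occurrence contributes.
        int unique = 0;
        for (int i = 0; i < elts.size(); i++) {
            if (elts.indexOf(elts.get(i)) == i) {
                unique++;
            }
        }
        return unique;
    }

    boolean hasElt(int element) {
        return elts.contains(element);
    }
}
```

One consequence of the second option is worth noticing: once duplicates live in the list, remElt has to remove all copies. Removing only the first occurrence would let hasElt return true after a remElt, which is another leak through the API.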

Moral: when you implement an ADT, make sure that your implementations satisfy the properties. As you discover useful axioms that embody those properties, include them with your ADT definition. They will help you develop good tests later on. We will begin to check that your test cases for ADT implementations check the required properties and axioms.
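
As an illustration of how an axiom can turn into a test, the sketch below writes the duplicate check from earlier as a reusable predicate. All names here (IntSet, SetAxioms, ConsSet) are hypothetical, and the interface covers only the operations the axioms need. ConsSet is the unchecked cons-style implementation, included so the axiom has something to catch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal interface: just the operations the axioms use.
interface IntSet {
    void addElt(int element);
    int size();
    boolean hasElt(int element);
}

class SetAxioms {
    // Axiom: adding an element that is already present leaves size unchanged.
    static boolean addTwiceKeepsSize(IntSet s, int x) {
        s.addElt(x);
        int sizeAfterOnce = s.size();
        s.addElt(x);
        return s.size() == sizeAfterOnce;
    }

    // Axiom: after addElt(x), hasElt(x) holds.
    static boolean addThenHas(IntSet s, int x) {
        s.addElt(x);
        return s.hasElt(x);
    }
}

// The unchecked list-backed version: addElt never looks for duplicates.
class ConsSet implements IntSet {
    private final List<Integer> elts = new ArrayList<>();

    public void addElt(int element) {
        elts.add(element);
    }

    public int size() {
        return elts.size();
    }

    public boolean hasElt(int element) {
        return elts.contains(element);
    }
}
```

Calling SetAxioms.addTwiceKeepsSize(new ConsSet(), 3) returns false, which flags the leak, while SetAxioms.addThenHas(new ConsSet(), 3) returns true. Because the predicates take any IntSet, the same checks can be reused against every implementation of the ADT.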

1 Summary

Key take-away from this example:

The data structure that you use to implement an abstract data type may have different properties than the ADT itself. What matters is that those differences don’t leak through the operations in the ADT. Axioms within the ADT help check for leaks.