Hashtables and Java HashMaps
Imagine that you are maintaining information on your contacts (name, email, and phone number). This builds on the PhoneEntry idea from yesterday, but dropping the NameAssoc class and just focusing on the phone entry parts.
A collection of contacts could get fairly large in a big organization. Employees often look up phone numbers and email addresses for contacts based on their names. When we only have a few hundred contacts, maintaining the contacts in a list and searching the list for each contact works well enough (this is a straightforward for loop). But once contacts get into the thousands or more, iterating down a list gets expensive. It would be nice if we had a way to look up a contact directly from a name, without having to iterate through a list.
Enter hashtables, a data structure that can perform lookups in near-constant time. A hashtable maps keys (such as names) to values (such as contact info) without traversing a data structure over all the keys. Let’s first try using hashtables to get a feel for how we use them, then we’ll go under the hood a bit to see how they work.
1 Java Classes and Operations on Hashtables
Java provides two data structures for hashtables: one called Hashtable and one called HashMap. HashMap is generally preferred, unless you have to deal with threads and synchronization (not a topic for this course). Hashtable is a legacy data structure from earlier versions of Java. We will therefore use Java's HashMap class as our implementation of the general hashtable data structure.
Hashtables are built into Java. To use them, include the line
import java.util.HashMap;
HashMap, like LinkedList, requires you to specify the type of data that you want to store in it. Actually, for a hashmap, you specify two types: the type of the key, and the type of the value stored under each key. If K is the type of the key and V is the type of the value, you create a hashmap by writing
HashMap<K,V>
In our case, we want to map from names to contact info. Let’s assume we have a class Contact with the following fields:
class Contact {
  String name;
  PhoneNumber phone;
  String email;
}
A map from names to contacts would then be defined as
new HashMap<String, Contact>();
The most useful HashMap<K,V> methods are:
V put(K key, V value) : stores the value under the key. If there was already a value associated with the key, put returns the previous value; otherwise it returns null.
V get(Object key) : returns the value corresponding to the given key. If there is no value for the key, returns null.
For the full list of HashMap methods, see the Java HashMap documentation.
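To see the return values of put and get concretely, here is a minimal sketch. It maps names to email strings rather than to Contact objects, purely so that it runs on its own; the class name PutGetDemo and the second email address are made up for illustration.

```java
import java.util.HashMap;

class PutGetDemo {
    public static void main(String[] args) {
        HashMap<String, String> emails = new HashMap<String, String>();

        // first put under a key: there is no previous value, so put returns null
        String before = emails.put("Kathi", "kfisler@wpi.edu");
        System.out.println(before);               // null

        // second put under the same key replaces the value and returns the old one
        String replaced = emails.put("Kathi", "kathi@example.org");
        System.out.println(replaced);             // kfisler@wpi.edu

        // get returns the current value, or null when the key is absent
        System.out.println(emails.get("Kathi"));  // kathi@example.org
        System.out.println(emails.get("Pat"));    // null
    }
}
```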
2 Using HashMaps
To put these operations in context, consider the following class for managing a collection of contacts:
class AllContacts {
  // storing contacts in a hashmap from names to contacts
  HashMap<String, Contact> contactMap = new HashMap<String, Contact>();

  AllContacts(){}

  // return contact that goes with a given name
  // will return null if the name isn't in the map
  Contact findContactByName(String name) {
    return contactMap.get(name);
  }

  // add a contact, stored under its name
  // (replaces any contact already stored under that name)
  AllContacts addContact(Contact aContact) {
    contactMap.put(aContact.name, aContact);
    return this;
  }
}
For now, we are ignoring the fact that the findContactByName method might return null if it doesn’t find a contact with that name. We will return to that issue and how to handle it properly after the break.
As a simple interaction with this class, consider the following:
> Contact kathiC = new Contact("Kathi",
                               new PhoneNumber(508, 8315118),
                               "kfisler@wpi.edu");
> AllContacts C = new AllContacts()
> C.addContact(kathiC)
> C.findContactByName("Kathi")
Contact@94bs2ac1 |
For testing purposes such as this, we will find it handy to have a toString method in the Contact class that prints out the contacts in a more readable fashion. For example:
class Contact {
  String name;
  PhoneNumber phone;
  String email;

  // print the name and email address (skipping phone to show we can)
  public String toString() {
    return name + ", email: " + email;
  }
}
If we had this, the last interaction would now look like:
> C.findContactByName("Kathi")
Kathi, email: kfisler@wpi.edu
3 More Complex HashMap Values
Now imagine that we want to add the following two contacts:
Contact pat1 = new Contact("Pat", new PhoneNumber(508, 8315900), "pat@wpi.edu");
Contact pat2 = new Contact("Pat", new PhoneNumber(201, 8675309), "pchan@emayle.net");
C.addContact(pat1);
C.addContact(pat2);
What will we get if we run the following?
C.findContactByName("Pat")
We will get the pat2 object, but we will have lost the pat1 object. Hashmaps store exactly one value of the given value type with each key. So if you try to associate an object with a key that already exists, Java will replace the old value with the new one.
But isn’t it often useful to have multiple values go with a key, with a collection of contacts being a good example? Yes. Remember that you decide the type of the hashmap value. So if you want to store multiple contacts under a single name, you could set up the value as a LinkedList and change your addContact method accordingly.
HashMap<String,LinkedList<Contact>> multContactsPerName;
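One way the revised addContact might look is sketched below. To keep the sketch self-contained it uses a cut-down stand-in class, SimpleContact (name and email only), and the names AllContactsMulti and findContactsByName are hypothetical; the same idea applies to the full Contact class.

```java
import java.util.HashMap;
import java.util.LinkedList;

// simplified stand-in for Contact (phone number omitted to keep the sketch short)
class SimpleContact {
    String name;
    String email;

    SimpleContact(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

class AllContactsMulti {
    // each name maps to the list of all contacts with that name
    HashMap<String, LinkedList<SimpleContact>> multContactsPerName =
        new HashMap<String, LinkedList<SimpleContact>>();

    // add the contact to the list stored under its name,
    // creating that list the first time the name is seen
    AllContactsMulti addContact(SimpleContact aContact) {
        LinkedList<SimpleContact> sameName = multContactsPerName.get(aContact.name);
        if (sameName == null) {
            sameName = new LinkedList<SimpleContact>();
            multContactsPerName.put(aContact.name, sameName);
        }
        sameName.add(aContact);
        return this;
    }

    // return every contact stored under the name (an empty list if there are none)
    LinkedList<SimpleContact> findContactsByName(String name) {
        LinkedList<SimpleContact> found = multContactsPerName.get(name);
        if (found == null) {
            return new LinkedList<SimpleContact>();
        }
        return found;
    }
}
```

With this version, adding two contacts named "Pat" leaves both in the list stored under "Pat", rather than the second replacing the first.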
4 More Complex Key Values
Keys can also be more complex than built-in types. You might want a hashmap that starts from a phone number and retrieves the rest of the contact info (many smartphones do this when you start typing a number, for example). That kind of hashmap could be defined as follows:
HashMap<PhoneNumber,Contact> |
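For this to work, Java needs a way to determine whether two keys are equal, and a way to turn each key into a number. The PhoneNumber class must therefore provide equals and hashCode methods, which the next section discusses.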
5 How Do Hashtables Work?
The diagram at the top right of the Wikipedia entry on hashtables illustrates the concept nicely. Every hashtable has a set of "buckets" to which it maps keys. When each key lands in its own bucket, a lookup goes straight to the matching data. Two keys can also land in the same bucket (a collision), which is unavoidable once there are more keys than buckets. Collisions do not cause wrong answers: Java's HashMap keeps every entry that lands in a bucket and uses equals to pick out the right one. They do make lookups slower, though. We won’t talk about hashtables in depth here, as you’ll see them in detail in Algorithms.
The main thing to understand is that hashtables give near-constant-time access to values because each key gets mapped to a number, which can in turn be used to find the corresponding object in memory quickly (regardless of how many keys/values are in the hashtable).
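To make the key-to-number step concrete, here is a toy hashtable from String keys to String values. It is only a sketch of the general idea, not how Java's HashMap is actually implemented; the class ToyHashtable and everything in it are invented for illustration. Each key's hashCode is reduced to a bucket index, and only that one bucket is searched.

```java
import java.util.ArrayList;
import java.util.LinkedList;

class ToyHashtable {
    // one key/value pair stored in a bucket
    static class Entry {
        String key;
        String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    ArrayList<LinkedList<Entry>> buckets = new ArrayList<LinkedList<Entry>>();

    ToyHashtable(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new LinkedList<Entry>());
        }
    }

    // map a key to a bucket index; floorMod keeps the index non-negative
    // even when hashCode() is negative
    int bucketIndex(String key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    // store the value under the key, replacing any value already there
    void put(String key, String value) {
        LinkedList<Entry> bucket = buckets.get(bucketIndex(key));
        for (Entry e : bucket) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        bucket.add(new Entry(key, value));
    }

    // look only in the key's own bucket; return null if the key is absent
    String get(String key) {
        for (Entry e : buckets.get(bucketIndex(key))) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

Even with a single bucket, where every key collides, get still returns the right value because each entry in the bucket is compared with equals; the lookup just degrades into the list search we were trying to avoid.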
How do keys get mapped to numbers? For built-in types (like String), Java does this automatically. If you use a class you defined as the key, you have to provide this function, similarly to the way you provide equals. The method you need to write is called hashCode, and it must return an int. Two objects that are equal according to equals must return the same hash code, and the value should be unique (to high probability) across all the objects in the class. One good way to create a hash code is to multiply the hash code for each variable in your class by a different prime number, then sum the results. Classes to be used as keys also need an equals method. Here’s a simple example showing equals and hashCode methods for a non-built-in class.
If you are working in Eclipse, there is a menu item that helps you generate equals and hashCode based on essential fields in your class. Items 7 and 8 in Chapter 3 of Effective Java also discuss methods for writing your own hashCode methods.
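As one possible sketch, consider a PhoneNumber class. The notes never show its fields, so the two int fields below (areaCode and number) are an assumption based on calls like new PhoneNumber(508, 8315118). The hash code follows the prime-multiplication recipe; for an int field, the field's own hash code is just its value.

```java
class PhoneNumber {
    int areaCode;  // assumed field name
    int number;    // assumed field name

    PhoneNumber(int areaCode, int number) {
        this.areaCode = areaCode;
        this.number = number;
    }

    // two phone numbers are equal when both fields match
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PhoneNumber)) {
            return false;
        }
        PhoneNumber that = (PhoneNumber) other;
        return this.areaCode == that.areaCode && this.number == that.number;
    }

    // multiply each field by a different prime, then sum;
    // equal phone numbers always produce the same result
    @Override
    public int hashCode() {
        return 31 * areaCode + 37 * number;
    }
}
```

With both methods in place, a HashMap keyed on PhoneNumber can find an entry using a freshly created PhoneNumber object that has the same area code and number as the one used in put.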
6 Summary
Hashtables and hashmaps are data structures that implement a more general ADT called a dictionary or map. These ADTs are used to map from keys to values. Hashtables and hashmaps are particularly fast implementations that achieve their performance by converting keys to integers that are largely unique across the set of hashed items. Hashmaps use these integers to directly access the locations where data is stored in memory (rather than having to search through memory for values).
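In Java, the map ADT shows up as the java.util.Map interface, which HashMap implements. The sketch below writes a helper against Map rather than HashMap, so it would work with any map implementation; the class MapAdtDemo, the method lookupOrDefault, and the "unknown" fallback are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MapAdtDemo {
    // depends only on the Map ADT (keys to values), not on hashing
    static String lookupOrDefault(Map<String, String> emails, String name) {
        String found = emails.get(name);
        if (found == null) {
            return "unknown";
        }
        return found;
    }

    public static void main(String[] args) {
        // a HashMap can be used wherever a Map is expected
        Map<String, String> emails = new HashMap<String, String>();
        emails.put("Kathi", "kfisler@wpi.edu");

        System.out.println(lookupOrDefault(emails, "Kathi"));  // kfisler@wpi.edu
        System.out.println(lookupOrDefault(emails, "Pat"));    // unknown
    }
}
```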