Arrays
Up until now, we have used lists (particularly LinkedList) to maintain an ordered (not necessarily sorted) collection of elements of the same type. In Java and many other languages, arrays are another common data structure for ordered collections. This lecture gives a brief overview of arrays, partly to contrast them with lists, and partly because understanding arrays can help you understand how hashmaps can access elements so quickly.
1 Arrays and When to Use Them
Conceptually, arrays are good for situations when you (1) frequently want to access elements at specific positions in an ordered set, and (2) don’t expect to add or delete elements often. The second requirement means that you should have a pretty good idea what size you need the array to be.
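In Java, the most common built-in class for arrays is called ArrayList. When you create an ArrayList, you typically (though aren’t required to) indicate how many elements you expect it to hold. This number is an initial capacity: it tells Java how much space to reserve, but the ArrayList still starts out empty and grows automatically if you add more elements than that. In this example, assume we are using an array to manage the order of finishers in a 5-person race. We might write the following to create and populate the array.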
import java.util.ArrayList;

class ArrayExample {
  ArrayList<String> race = new ArrayList<String>(5);

  ArrayExample() {
    race.add(0, "Alice");
    race.add(1, "Bill");
    race.add(2, "Chen");
    race.add(3, "Daria");
    race.add(4, "Elan");
  }
}
The constructs to process arrays are extremely similar to what you use on LinkedList – you write for loops and use get to retrieve the element at a specific index. From the perspective of writing code, the two look much the same.
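As a minimal sketch of that similarity, the method below walks a list with an index-based for loop and get. The class name RaceLoops and the method placings are made up for this illustration; because the parameter has the interface type List, the same method accepts either an ArrayList or a LinkedList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical example class, not part of the course code.
class RaceLoops {
    // Builds a string such as "1:Alice 2:Bill" using an index-based loop and get.
    static String placings(List<String> finishers) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < finishers.size(); i++) {
            if (i > 0) {
                result.append(" ");
            }
            // Positions start at 0, so the finishing place is i + 1.
            result.append(i + 1).append(":").append(finishers.get(i));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        List<String> asArray = new ArrayList<String>(Arrays.asList("Alice", "Bill", "Chen"));
        List<String> asLinked = new LinkedList<String>(Arrays.asList("Alice", "Bill", "Chen"));
        // Both calls run the same code and print the same line.
        System.out.println(placings(asArray));
        System.out.println(placings(asLinked));
    }
}
```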
Under the hood, however, arrays are designed to provide faster access to elements based on their position. To understand how this works, you need to understand a bit about how programs use memory within your computer.
Memory is essentially a collection of numbered slots, where each slot has a fixed size/capacity. One slot can store one primitive datum (like an int) or a reference to one object. The "number" on each slot is called an address. Compilers and the underlying systems that run programs are able to retrieve (in constant time) the item stored in the slot at a particular address.
Basically, within the memory of your computer, the elements of an array are stored in consecutive slots (this is not necessarily true of LinkedLists). So, if you ask for the element in position 3 of an array, Java can compute the address for position 3 from the address of the array (a simple addition), and retrieve the data.
The Java documentation tutorial on arrays shows a diagram of this at the top. The rest of the page uses an alternative syntax for arrays (see next section), rather than the ArrayList interface.
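The address computation can be sketched in code. Java does not let programs see real memory addresses, so this is only a simulation: the array named memory stands in for the computer's numbered slots, and the starting address 7 is an arbitrary value chosen for illustration. The sketch also assumes each element occupies exactly one slot.

```java
// Hypothetical simulation of slot addressing; not how you would write real Java code.
class AddressSketch {
    // Address of the element at `index` in an array whose first element
    // is stored at `baseAddress`, assuming one slot per element.
    static int addressOf(int baseAddress, int index) {
        return baseAddress + index;
    }

    public static void main(String[] args) {
        String[] memory = new String[16]; // simulated memory with slots 0..15
        int base = 7;                     // assumed address of the array's first element
        String[] names = {"Alice", "Bill", "Chen", "Daria", "Elan"};

        // Store the five names in consecutive slots starting at `base`.
        for (int i = 0; i < names.length; i++) {
            memory[addressOf(base, i)] = names[i];
        }

        // Position 3 lives at address 7 + 3 = 10: one addition, then one lookup.
        System.out.println(addressOf(base, 3));         // prints 10
        System.out.println(memory[addressOf(base, 3)]); // prints Daria
    }
}
```

The point of the sketch is that finding position 3 takes the same single addition no matter how long the array is.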
1.1 For Your Information: Alternative Array Syntax
In many languages, including Java, there is an alternative syntax for arrays that looks as follows:
class ArrayExampleAlt {
  // create an array of length 5
  String[] race = new String[5];

  ArrayExampleAlt() {
    race[0] = "Alice";
    race[1] = "Bill";
    race[2] = "Chen";
    race[3] = "Daria";
    race[4] = "Elan";
  }
}
Here, the square brackets are used to refer to a specific position within the array. CS2102 will not use this notation, but many other languages do, so you should be aware of it.
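One key difference between this format and ArrayList is that ArrayLists can only contain objects, whereas this alternative format can also hold non-object (primitive) types such as int. A list of numbers therefore has to be declared as ArrayList<Integer>, while a square-bracket array can be declared as int[]. Square-bracket arrays also have no methods such as get or add: you read and write positions with the bracket notation, and you can process one with either an index-based for loop or a for-each loop.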
We include this notation in these notes because you will surely encounter it if you continue in programming beyond CS2102.
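To make the object-versus-primitive difference concrete, here is a small sketch that totals finishing times stored both ways. The class FinishTimes, its method names, and the sample times are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical example class, not part of the course code.
class FinishTimes {
    // A square-bracket array can hold primitive ints directly.
    static int totalOfArray(int[] times) {
        int total = 0;
        for (int i = 0; i < times.length; i++) {
            total += times[i];
        }
        return total;
    }

    // An ArrayList must hold Integer objects; Java converts each one
    // back to an int automatically when it is added to the total.
    static int totalOfList(ArrayList<Integer> times) {
        int total = 0;
        for (int i = 0; i < times.size(); i++) {
            total += times.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] asArray = {61, 64, 70};
        ArrayList<Integer> asList = new ArrayList<Integer>(Arrays.asList(61, 64, 70));
        System.out.println(totalOfArray(asArray)); // prints 195
        System.out.println(totalOfList(asList));   // prints 195
    }
}
```

Note the small differences in how each is used: times.length and times[i] for the square-bracket array, versus times.size() and times.get(i) for the ArrayList.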
2 Arrays versus Lists
We explained above that arrays achieve fast access by keeping all of their contents in consecutive slots in memory. If you add or delete an element in the middle of an array, the elements after it have to be shifted under the hood to keep the slots consecutive. Adding and deleting can therefore be expensive. LinkedLists, in contrast, don’t necessarily keep their elements in consecutive slots, so adding and deleting is fast once you are at the right spot, but it takes longer to retrieve an element based on its position, because Java has to walk through the list to reach it.
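The work needed to keep the slots consecutive can be sketched by hand with a square-bracket array. This is an illustration of the idea, not the actual code inside ArrayList; the class ShiftInsert and the method insertAt are hypothetical, and the sketch assumes the array already has at least one unused slot at the end.

```java
// Hypothetical illustration of why inserting into an array is expensive.
class ShiftInsert {
    // Inserts `value` at position `index`, where the first `count` slots of
    // `slots` are in use. Every element from `index` onward moves one slot
    // to the right. Returns how many elements had to be moved.
    // Assumes count < slots.length and 0 <= index <= count.
    static int insertAt(String[] slots, int count, int index, String value) {
        int moved = 0;
        // Work from the back so no element is overwritten before it is moved.
        for (int i = count; i > index; i--) {
            slots[i] = slots[i - 1];
            moved++;
        }
        slots[index] = value;
        return moved;
    }

    public static void main(String[] args) {
        String[] race = {"Alice", "Bill", "Chen", "Daria", null};
        // Putting a new finisher at the front moves all four existing names.
        int moved = insertAt(race, 4, 0, "Zed");
        System.out.println(moved);   // prints 4
        System.out.println(race[0]); // prints Zed
        System.out.println(race[4]); // prints Daria
    }
}
```

Inserting at the front of an array with n elements moves all n of them, whereas inserting at the very end moves none.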
For the style of programs you are writing in this class, LinkedLists are usually a more appropriate choice, because we add and remove elements frequently. If you go on to take Algorithms, you will use Arrays extensively. It turns out that you get faster implementations of data structures (such as BSTs) if you use arrays in a particular way to organize your data. That topic is beyond the scope of CS2102 though. (If you had CS in high school, you may have already seen array-based implementations of lists and other data structures which exploit positions for performance – you’ll get back to those if you take Algorithms).
3 Arrays Underlie Hashmaps
Arrays are the underlying data structure for HashMaps. As we discussed briefly last week, a key part of having a hashmap is having a hash function that can map your keys to integers (ideally a different integer for each key). Basically, the hashmap underneath is an array, and the hash function converts specific keys to array positions.
This explanation glosses over a fair bit of detail (what if your hash function produces large numbers, for example), but those details are more than we want to go over in CS2102. Here, we just want you to have a sense of roughly how the hashmap can provide fast access to data.
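A toy version of the idea might look like the sketch below. It is not how Java's HashMap is implemented: the class TinyHashTable and its methods are invented for illustration, and it deliberately ignores collisions, so a second key that lands on the same position simply overwrites the first. It uses the built-in hashCode method as the hash function and Math.floorMod to turn a possibly large or negative number into a valid array position.

```java
// Hypothetical toy hash table; real hashmaps handle collisions and resizing.
class TinyHashTable {
    private final String[] keys;
    private final String[] values;

    TinyHashTable(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    // Converts a key to an array position between 0 and capacity - 1.
    int positionOf(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Stores the value at the key's position (overwriting whatever was there).
    void put(String key, String value) {
        int pos = positionOf(key);
        keys[pos] = key;
        values[pos] = value;
    }

    // Jumps straight to the key's position; returns null if the key is not there.
    String get(String key) {
        int pos = positionOf(key);
        return key.equals(keys[pos]) ? values[pos] : null;
    }

    public static void main(String[] args) {
        TinyHashTable places = new TinyHashTable(16);
        places.put("Alice", "first");
        System.out.println(places.get("Alice")); // prints first
    }
}
```

Both put and get compute one position and go directly to it, which is where the fast access comes from.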
If you are interested in the details, the Java HashMap documentation explains how sizes are initialized and maintained as you add elements to the hashmap. For lots of details on the subtleties of making hashtables work, see the Wikipedia entry on hashtables.