Tables

See Chapter 11 of the Shiflet text. A table of information is a set of records. We often look up records by a key. For example, a table for CS2005 contains a record for each student, with the name (or userid) serving as the key.

A table could be implemented as an array of records (structs) or as a linked list.

Two common operations on a table are:

  1. Add a record to a table.

  2. Search for a record given a key.
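As a rough illustration of these two operations, here is a minimal sketch of a table stored as an array of structs and searched linearly by name. The names Record, Table, TableAdd, and TableSearch, the grade field, and the capacity of 100 are all assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

#define MAXRECORDS 100   /* assumed capacity for this sketch */
#define NAMELEN    32    /* assumed maximum key length, including '\0' */

/* One record; the name serves as the key. */
struct Record {
    char name[NAMELEN];
    int  grade;          /* hypothetical payload field */
};

struct Table {
    struct Record rg[MAXRECORDS];
    int count;           /* number of records in use */
};

/* Add a record at the end; returns 0 on success, -1 if the table is full. */
int TableAdd(struct Table *pt, const char *name, int grade)
{
    assert(pt != NULL && name != NULL);
    if (pt->count >= MAXRECORDS)
        return -1;
    strncpy(pt->rg[pt->count].name, name, NAMELEN - 1);
    pt->rg[pt->count].name[NAMELEN - 1] = '\0';   /* always terminate */
    pt->rg[pt->count].grade = grade;
    pt->count++;
    return 0;
}

/* Linear search by key; returns a pointer to the record, or NULL. */
struct Record *TableSearch(struct Table *pt, const char *name)
{
    int i;

    assert(pt != NULL && name != NULL);
    for (i = 0; i < pt->count; i++)
        if (strcmp(pt->rg[i].name, name) == 0)
            return &pt->rg[i];
    return NULL;
}
```

Adding is cheap here, but every search may have to examine all the records, which is the cost that hashing tries to avoid.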

Hashing

One approach for organizing a table of information requires that the records be stored using an array.

What about using an array with the range of 0 to 999,999,999 (one billion entries)? We could then map each student to a unique slot in the array using the student id, which makes add and search trivial. But it is expensive in terms of space.

What about having 10,000 slots (0 to 9999) and using the last four digits of the student id as the index? This is an example of a hash function. A hash function maps the key of a record to an index in the array.

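If there are 100 records then most students will still map to a unique slot, although a collision somewhere in the table remains quite possible (by the birthday-problem estimate, roughly a 40% chance that at least two of the 100 share a slot, assuming the last four digits are uniformly distributed).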

The load factor (number of records divided by number of slots) is 100/10000 = 0.01.
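The last-four-digits idea and the load factor can be sketched in a few lines of C. The names HashId, LoadFactor, and TABLESIZE are assumptions for this example, and the ids are assumed to be non-negative.

```c
#include <assert.h>

#define TABLESIZE 10000   /* slots 0 to 9999 */

/* Hash a student id to a slot using its last four decimal digits. */
int HashId(long id)
{
    assert(id >= 0);      /* this sketch assumes non-negative ids */
    return (int)(id % TABLESIZE);
}

/* Load factor: records stored divided by slots available. */
double LoadFactor(int cRecords, int cSlots)
{
    assert(cSlots > 0);
    return (double)cRecords / cSlots;
}
```

For instance, ids 123456789 and 200016789 both hash to slot 6789, so they would collide even though the table is almost empty.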

Hash Function

See Example 6.10 of the Kruse text.
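Example 6.10 is not reproduced here. As a stand-in, the following is a minimal sketch of one simple way to hash a string key: multiply a running total by 31 and add each character, then reduce modulo the table size. The function body, the multiplier 31, and the table size of 101 are assumptions for illustration and are not taken from the Kruse text.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE 101   /* assumed table size; a prime is a common choice */

/* Map a string key to an index in the range 0..HASHSIZE-1. */
int Hash(const char *sb)
{
    unsigned long h = 0;

    assert(sb != NULL);
    while (*sb != '\0') {
        h = h * 31 + (unsigned char)*sb;   /* unsigned arithmetic wraps safely */
        sb++;
    }
    return (int)(h % HASHSIZE);
}
```

The same key always produces the same index, and different keys may produce the same index, which is why collision resolution is needed.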

Collision Resolution

If two elements hash to the same index, we have a collision. How should it be resolved?

Open Addressing (Linear Probing)

If a collision occurs, do a linear search from that slot (wrapping around in a circular fashion) until the element is found or an empty slot is encountered. This can lead to clustering, but it is the simplest approach and is what I expect to be used in the project.

/* Return the index of the slot that matches sb, or of the empty slot
   where sb should be added; return -1 if the table is full.
   HASHSIZE is assumed to be #defined elsewhere. */
#include <string.h>

char *rgName[HASHSIZE];    /* array of names; file scope, so every entry starts out NULL */
int Hash(char *sb);        /* hash function, defined elsewhere */

int HashedIndex(char *sb)
{
    int i, iSave;

    iSave = i = Hash(sb);
    while (rgName[i] != NULL) {
        if (strcmp(sb, rgName[i]) == 0)
            return(i);         /* match found */
        i = (i + 1)%HASHSIZE;  /* step to the next slot, wrapping around */
        if (i == iSave)
            return(-1);        /* have looped all around -- table full */
    }
    return(i);                 /* empty slot found */
}
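
To make the clustering effect concrete, here is a small self-contained sketch of insertion with linear probing. It uses non-negative integer keys and a ten-slot table so the slots are easy to follow by hand. The names NSLOTS, EMPTY, rgKey, InitSlots, and ProbeInsert are assumptions for this example and are separate from the routine above.

```c
#include <assert.h>

#define NSLOTS 10      /* deliberately tiny so collisions are easy to see */
#define EMPTY  (-1)    /* marker for an unused slot */

static int rgKey[NSLOTS];

/* Mark every slot as unused. */
void InitSlots(void)
{
    int i;

    for (i = 0; i < NSLOTS; i++)
        rgKey[i] = EMPTY;
}

/* Insert key using linear probing.
   Returns the slot holding the key, or -1 if the table is full. */
int ProbeInsert(int key)
{
    int i, c;

    assert(key >= 0);             /* EMPTY is reserved for unused slots */
    i = key % NSLOTS;             /* home slot */
    for (c = 0; c < NSLOTS; c++) {
        if (rgKey[i] == EMPTY || rgKey[i] == key) {
            rgKey[i] = key;
            return i;
        }
        i = (i + 1) % NSLOTS;     /* try the next slot, wrapping around */
    }
    return -1;                    /* examined every slot: table full */
}
```

Inserting 12, 22, and 32 in that order fills slots 2, 3, and 4. A later key 3, whose home slot is 3, must then probe past the cluster and lands in slot 5, making the cluster longer still.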

Chained Addressing

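An alternative approach to resolving collisions; see Fig 6.12 of Kruse. Each slot of the array heads a linked list holding all the records that hash to that index, so once the hash value has been found we search or add within that list.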

  1. We do not have problems with clustering.

  2. Collisions are easy to handle because we just add to the linked list.

  3. We do not have a problem with overflow, since the lists can grow as needed.

  4. Links require space and are a little trickier to program.
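
The points above can be seen in a minimal sketch of chaining, assuming string keys. The names Node, rgBucket, HashName, ChainSearch, and ChainAdd, the bucket count of 101, and the particular hash used are all assumptions for this example; Fig 6.12 of Kruse may organize things differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 101   /* assumed number of list heads */

struct Node {
    char *sbName;              /* the key */
    struct Node *pnodeNext;    /* next record that hashed to the same bucket */
};

static struct Node *rgBucket[NBUCKETS];   /* static storage: all NULL at start */

/* Simple string hash, an assumption for this sketch. */
static int HashName(const char *sb)
{
    unsigned long h = 0;

    while (*sb != '\0')
        h = h * 31 + (unsigned char)*sb++;
    return (int)(h % NBUCKETS);
}

/* Walk the one list selected by the hash; returns the node or NULL. */
struct Node *ChainSearch(const char *sb)
{
    struct Node *p;

    assert(sb != NULL);
    for (p = rgBucket[HashName(sb)]; p != NULL; p = p->pnodeNext)
        if (strcmp(sb, p->sbName) == 0)
            return p;
    return NULL;
}

/* Add sb if absent; returns its node, or NULL if memory runs out. */
struct Node *ChainAdd(const char *sb)
{
    struct Node *p = ChainSearch(sb);
    int i;

    if (p != NULL)
        return p;                          /* already in the table */
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->sbName = malloc(strlen(sb) + 1);
    if (p->sbName == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->sbName, sb);
    i = HashName(sb);
    p->pnodeNext = rgBucket[i];            /* push onto the front of the list */
    rgBucket[i] = p;
    return p;
}
```

A collision here simply makes one list longer, so there is no probing into neighbouring slots and no fixed limit on the number of records, at the cost of one pointer per record and the extra allocation code.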

Analysis of Hashing