Chapter 11 of the Shiflet text. A table of information is a set of records, and we often look up records by a key. For example, a table for CS2005 contains a record for each student, with the student's name (or userid) as the key.
A table could be implemented as an array of records (structs) or as a linked list.
Two common operations on a table are adding a record and searching for a record by its key.
One approach for organizing a table of information requires that the records be stored using an array.
What about using an array indexed 0 to 999,999,999 (one billion entries)? Each student would then map to a unique slot given by the student id, so adding and searching are trivial. However, this is very expensive in space, because only a tiny fraction of the slots would ever be used.
What about having 10,000 slots (0 to 9999) and using the last four digits of the student id as the index? This is an example of a hash function: a function that maps the key of a record to an index in the array.
If there are 100 records, it is likely that each student will still map to a unique slot.
The load factor (the number of records divided by the number of slots) is 100/10000 = 0.01.
For example, to hash a string we can simply add its character codes together:
#define HASHSIZE 97   /* size of the array of records */

/* Hash -- map a string of characters to a number in range 0 to HASHSIZE-1 */
int Hash(char *sb)
{
    unsigned int sum = 0;   /* unsigned, so the result is never negative */

    while (*sb != '\0') {
        sum = sum + (unsigned char)*sb;   /* add in the character value */
        sb++;
    }
    return(sum % HASHSIZE);
}
See example 6.10 of the Kruse text.
What if two keys hash to the same index? Then we have a collision. How do we resolve it?
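One approach is linear probing. If a collision occurs, search linearly from that index, wrapping around circularly, until the element is found or an empty slot is encountered. This can lead to clustering, where runs of occupied slots build up and make later probes longer. However, it is the simplest approach, and it is the one I expect to be used in the project.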
#include <string.h>   /* strcmp */

char *rgName[HASHSIZE];   /* array of names; as a global, every entry starts out NULL */

/* HashedIndex -- return index of the slot matching sb, or of the empty
   slot where sb should be added; return -1 if the table is full */
int HashedIndex(char *sb)
{
    int i, iSave;

    iSave = i = Hash(sb);
    while (rgName[i] != NULL) {
        if (strcmp(sb, rgName[i]) == 0)
            return(i);              /* match found */
        i = (i + 1) % HASHSIZE;     /* try next slot, wrapping around */
        if (i == iSave)
            return(-1);             /* have looped all around -- table full */
    }
    return(i);                      /* empty slot found */
}
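An alternate approach to resolving collisions is chaining (see Fig. 6.12 of Kruse). Each slot holds a linked list of the records that hash to it, so once the hash value has been found, only the list at that slot needs to be searched.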