Searching

Objectives:

Definition and terminology

Definition. Assume k1,k2,...,kn is a collection of distinct keys and

R = {(k1,I1),(k2,I2),..., (kn,In)}

is a collection of records (where Ij is the information stored in record (kj,Ij)) containing those keys.

Given a key value K, the search problem is to locate a record (kj,Ij) such that K = kj.

Successful search - a record with key kj = K was found

Unsuccessful search - no record with key kj = K was found

General search approaches

  1. Sequential methods: the records are considered one at a time according to a predefined ordering
  2. Direct access methods: the records are accessed directly based on the value of the search key
  3. Indexing methods: keys are organized into a tree structure that allows fast searching

 

 

 

To describe search methods we shall use the following record structure definition:

 

struct record{
    key k;     // search key
    info I;    // information stored with the key
};

 

Searching arrays

The problem will be to find a record with key value K in an array

record A[n];

 

  1. Sequential search

 

info SeqSearch (record* A, int n, key K){
    // Compare K with each key in turn, from A[0] to A[n-1].
    for (int i = 0 ; i < n ; i++)
        if (A[i].k == K)
            return A[i].I;
    return NOT_FOUND;
}

  2. Binary search (the array must be sorted by ascending key)

info BinSearch (record* A, int n, key K){
    if (n <= 0)                  // empty range: nothing left to search
        return NOT_FOUND;
    int mid = n/2;
    if (A[mid].k == K)
        return A[mid].I;
    else if (K < A[mid].k)
        return BinSearch(&A[0], mid, K);            // left half: A[0..mid-1]
    else
        return BinSearch(&A[mid+1], n-mid-1, K);    // right half: A[mid+1..n-1]
}
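The recursive binary search above can also be written as a loop. The following is a minimal sketch, assuming integer keys and integer information in an array sorted by ascending key. The names Record and binSearchIter and the sentinel NOT_FOUND = -1 are choices made for this illustration only.

```cpp
#include <cassert>

// Illustrative record type: the notes leave `key` and `info` abstract.
struct Record {
    int k;  // key
    int I;  // information
};

const int NOT_FOUND = -1;  // assumed sentinel; must not be a valid info value

// Iterative binary search over A[0..n-1], which must be sorted by key.
int binSearchIter(const Record* A, int n, int K) {
    assert(n >= 0);
    int lo = 0, hi = n - 1;              // current search interval [lo, hi]
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;    // avoids overflow of lo + hi
        if (A[mid].k == K)
            return A[mid].I;
        if (K < A[mid].k)
            hi = mid - 1;                // continue in the left half
        else
            lo = mid + 1;                // continue in the right half
    }
    return NOT_FOUND;                    // the interval became empty
}
```

Each iteration halves the interval, so at most about log2(n) + 1 keys are compared.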

Example. Assume that the keys are integer numbers uniformly distributed between m1 and m2.

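A key k should be at index approximately (n-1)(k - m1)/(m2 - m1) in a sorted array of n records, so the search can start near that position instead of in the middle of the array.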
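For uniformly distributed keys, the expected position of a key can be estimated by linear interpolation between m1 and m2, which is the idea behind interpolation search. The following is a minimal sketch, assuming integer keys in an array sorted in ascending order and 0-based indices. The function names estimateIndex and interpolationSearch are illustrative and do not come from the notes.

```cpp
#include <cassert>

// Estimated 0-based index of key k among n sorted keys that are
// uniformly distributed between m1 and m2 (m1 < m2, m1 <= k <= m2).
int estimateIndex(int k, int m1, int m2, int n) {
    assert(n > 0 && m1 < m2 && m1 <= k && k <= m2);
    // long long arithmetic keeps the intermediate product from overflowing.
    return (int)(((long long)k - m1) * (n - 1) / ((long long)m2 - m1));
}

// Interpolation search: like binary search, but probes at the estimated
// position inside the current interval. Returns the index of K, or -1.
int interpolationSearch(const int* keys, int n, int K) {
    int lo = 0, hi = n - 1;
    while (lo <= hi && K >= keys[lo] && K <= keys[hi]) {
        if (keys[lo] == keys[hi])                 // all keys in the interval are equal
            return (keys[lo] == K) ? lo : -1;
        int pos = lo + estimateIndex(K, keys[lo], keys[hi], hi - lo + 1);
        if (keys[pos] == K)
            return pos;
        if (keys[pos] < K)
            lo = pos + 1;                         // K can only be to the right
        else
            hi = pos - 1;                         // K can only be to the left
    }
    return -1;                                    // K is outside the remaining range
}
```

If the keys are far from uniformly distributed, the estimate can be poor and the search may degrade toward a sequential scan.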

 

Searching lists

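The expected number of comparisons in a successful sequential search of a list of n records is

T(n) = 1*p1 + 2*p2 + ... + n*pn

where pi is the probability that the record in position i is the one searched for.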

Example. If each record has the same probability of being searched for (pi = 1/n), then T(n) = (1 + 2 + ... + n)/n = (n+1)/2.
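The cost formula can be checked numerically. The following is a small sketch, assuming the access probabilities are given as an array in list order. The function name expectedComparisons is made up for this illustration.

```cpp
#include <cassert>

// Expected number of comparisons for a successful sequential search:
// T(n) = 1*p[0] + 2*p[1] + ... + n*p[n-1],
// where p[i] is the probability that the record at position i is requested.
double expectedComparisons(const double* p, int n) {
    assert(n >= 0);
    double t = 0.0;
    for (int i = 0; i < n; i++)
        t += (i + 1) * p[i];   // the record at index i costs i + 1 comparisons
    return t;
}
```

With four equally likely records (each with probability 0.25) the function returns 2.5, which agrees with (n+1)/2. Placing a frequently requested record first lowers the value, which is the motivation for self-organizing lists.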

 

Heuristics for managing self-organizing lists:

  1. approximate the probabilities using the number of previous accesses
  2. keep an access counter associated with each record
  3. move a record forward if its counter becomes greater than that of the preceding record
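The counting heuristic listed above can be sketched as follows. This is only an illustration, assuming integer keys and information and using a std::vector to hold the list order in place of a linked list. The names CountedRecord and countSearch and the sentinel NOT_FOUND = -1 are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct CountedRecord {
    int k;       // key
    int I;       // information
    int count;   // number of successful accesses so far
};

const int NOT_FOUND = -1;  // assumed sentinel; must not be a valid info value

// Sequential search that keeps the list ordered by access count.
int countSearch(std::vector<CountedRecord>& list, int K) {
    for (std::size_t i = 0; i < list.size(); i++) {
        if (list[i].k != K)
            continue;
        int result = list[i].I;
        list[i].count++;
        // Move the record forward while its counter exceeds its predecessor's.
        while (i > 0 && list[i].count > list[i - 1].count) {
            std::swap(list[i], list[i - 1]);
            i--;
        }
        return result;
    }
    return NOT_FOUND;
}
```

Starting from the key order 1, 2, 3 with all counters at zero, a search for 3 moves that record to the front. A later search for 2 moves it only to the second position, because its counter then equals, but does not exceed, that of record 3.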

Disadvantages

Hashing

 

Hash functions

Example hash functions

  1. Hash integer keys by modulo

 

int HashMod (int k){
    return k % TABLE_SIZE;    // assumes k >= 0; a negative k gives a negative remainder
}

 

  2. Mid-square method (numbers)

 

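// r = log2(TABLE_SIZE), i.e. TABLE_SIZE = 2^r; requires 0 < r < 8*sizeof(int)

int HashMidSq (int k){
    unsigned int sq;
    sq = (unsigned int) k * (unsigned int) k;    // unsigned arithmetic avoids signed overflow
    sq = sq << (sizeof(int)*8 - r) / 2;          // drop the high-order bits
    sq = sq >> (sizeof(int)*8 - r);              // drop the low-order bits; the middle r bits remain
    return sq;
}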
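A worked example may help to see which bits the mid-square method keeps. The sketch below is a variant that assumes a 32-bit square and takes r as a parameter instead of a global. The name hashMidSq and the use of std::uint32_t are choices of this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Returns the middle r bits of the 32-bit square of k (0 < r < 32).
int hashMidSq(int k, int r) {
    assert(r > 0 && r < 32);
    std::uint32_t sq = (std::uint32_t)k * (std::uint32_t)k;  // wraps modulo 2^32
    sq <<= (32 - r) / 2;   // discard the high-order bits
    sq >>= (32 - r);       // discard the low-order bits
    return (int)sq;        // value in 0 .. 2^r - 1
}
```

For k = 1234 and r = 8 the square is 1522756 = 0x173C44. Shifting left by 12 and then right by 24 leaves bits 12 to 19 of the square, which are 0x73 = 115.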

 

  3. Character folding (strings)

 

int HashFoldCh (char* s){
    int sum = 0;
    for ( ; *s != '\0' ; s++)
        sum += (unsigned char) (*s);    // add up the character codes
    return sum % TABLE_SIZE;
}
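Because character folding only adds up character codes, the order of the characters does not affect the result, so strings that are permutations of each other always collide. The following self-contained sketch shows this, assuming TABLE_SIZE = 101 (a value picked for the example) and a lower-case function name to keep it distinct from the version in the notes.

```cpp
#include <cassert>

const int TABLE_SIZE = 101;  // assumed table size for this example

// Character folding: sum of the character codes modulo the table size.
int hashFoldCh(const char* s) {
    assert(s != nullptr);
    unsigned int sum = 0;
    for ( ; *s != '\0'; s++)
        sum += (unsigned char)(*s);    // unsigned char avoids negative codes
    return (int)(sum % TABLE_SIZE);
}
```

For instance, hashFoldCh("listen") and hashFoldCh("silent") return the same slot, and "ab" hashes to (97 + 98) % 101 = 94 in ASCII.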

Collision resolution

Collision resolution techniques:

 

  1. Open hashing (chaining)

 

struct entry{
    key k;
    info I;
    entry* link;    // next record whose key hashes to the same slot
};

 

entry HashTable[TABLE_SIZE];

 

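info HashSearch (entry H[], key K){
    int i;
    entry* pe;
    i = Hash (K);
    if (H[i].k == EMPTY)
        return NOT_FOUND;
    else if (H[i].k == K)
        return H[i].I;
    // Otherwise search the chain of colliding records sequentially.
    for (pe = H[i].link ; pe != NULL ; pe = pe->link)
        if (pe->k == K)
            return pe->I;
    return NOT_FOUND;
}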
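To see chaining work end to end, here is a minimal runnable sketch with insertion as well as search. It makes several simplifying assumptions that differ from the notes: keys and information are ints, each table slot holds only a pointer to the head of its chain (the first record is not stored in the table itself), and the names Node, ChainedTable, hashKey, chainInsert and chainSearch are invented for the illustration.

```cpp
#include <cassert>

const int TABLE_SIZE = 7;   // assumed small table size, for illustration only
const int NOT_FOUND = -1;   // assumed sentinel; must not be a valid info value

struct Node {
    int k;        // key
    int I;        // information
    Node* link;   // next record in the same chain
};

struct ChainedTable {
    Node* bucket[TABLE_SIZE];   // bucket[i] is the head of chain i, or nullptr

    ChainedTable() {
        for (int i = 0; i < TABLE_SIZE; i++) bucket[i] = nullptr;
    }
    ~ChainedTable() {           // free every node of every chain
        for (int i = 0; i < TABLE_SIZE; i++)
            while (bucket[i] != nullptr) {
                Node* next = bucket[i]->link;
                delete bucket[i];
                bucket[i] = next;
            }
    }
    ChainedTable(const ChainedTable&) = delete;
    ChainedTable& operator=(const ChainedTable&) = delete;
};

// Modulo hash mapped into 0 .. TABLE_SIZE-1, also for negative keys.
int hashKey(int k) {
    return ((k % TABLE_SIZE) + TABLE_SIZE) % TABLE_SIZE;
}

// Sequentially search the chain that K hashes to.
int chainSearch(const ChainedTable& t, int K) {
    for (const Node* pe = t.bucket[hashKey(K)]; pe != nullptr; pe = pe->link)
        if (pe->k == K)
            return pe->I;
    return NOT_FOUND;
}

// Insert (K, I) at the head of its chain; returns false if K is already present.
bool chainInsert(ChainedTable& t, int K, int I) {
    assert(I != NOT_FOUND);
    if (chainSearch(t, K) != NOT_FOUND)
        return false;
    int i = hashKey(K);
    t.bucket[i] = new Node{K, I, t.bucket[i]};
    return true;
}
```

With TABLE_SIZE = 7 the keys 3, 10 and 17 all hash to slot 3, so they end up in one chain and a search for any of them walks that chain only.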

Example.

  2. Closed hashing: all records are stored in the hash table itself; a collision is resolved by applying secondary hash functions (probing)

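int hashInsert (record R){
    int h = Hash (R.k);
    int curr = h;
    // Follow the probe sequence until an empty slot is found
    // (assumes the sequence eventually reaches one).
    for ( int i = 1; H[curr].k != EMPTY ; i++){
        if (H[curr].k == R.k)
            return EXISTING_KEY;
        curr = (h + p(R.k, i)) % TABLE_SIZE;
    }
    H[curr] = R;
    return 0;    // success; EXISTING_KEY is assumed to be nonzero
}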

 

info hashSearch (key K);

 

Linear probing

p(k, i) = i

Quadratic probing

p(k, i) = i^2

Double hashing

p(k, i) = i * h2(k)
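The three probe functions can be compared with a small runnable sketch of closed hashing. Everything below is an assumption made for illustration: integer non-negative keys, TABLE_SIZE = 11, EMPTY = -1, the secondary hash hash2(k) = 1 + k % (TABLE_SIZE - 1), and the names Slot, closedInsert and closedSearch. Probing starts at i = 0, where all three functions return 0 and therefore point to the home slot.

```cpp
#include <cassert>

const int TABLE_SIZE = 11;   // assumed prime table size
const int EMPTY = -1;        // assumed marker for a free slot; keys must be >= 0
const int NOT_FOUND = -1;    // assumed sentinel; must not be a valid info value

struct Slot {
    int k;   // key, or EMPTY
    int I;   // information
};

int hash1(int k) { return k % TABLE_SIZE; }
int hash2(int k) { return 1 + k % (TABLE_SIZE - 1); }   // never 0

int probeLinear(int /*k*/, int i)    { return i; }
int probeQuadratic(int /*k*/, int i) { return i * i; }
int probeDouble(int k, int i)        { return i * hash2(k); }

typedef int (*ProbeFn)(int k, int i);

void initTable(Slot* H) {
    for (int i = 0; i < TABLE_SIZE; i++) H[i].k = EMPTY;
}

// Returns false if k is already stored or no free slot is found
// within TABLE_SIZE probes.
bool closedInsert(Slot* H, ProbeFn p, int k, int I) {
    assert(k >= 0);
    int h = hash1(k);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int curr = (h + p(k, i)) % TABLE_SIZE;
        if (H[curr].k == EMPTY) {
            H[curr].k = k;
            H[curr].I = I;
            return true;
        }
        if (H[curr].k == k)
            return false;
    }
    return false;
}

// Follows the same probe sequence as closedInsert; an empty slot ends the search.
int closedSearch(const Slot* H, ProbeFn p, int k) {
    assert(k >= 0);
    int h = hash1(k);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int curr = (h + p(k, i)) % TABLE_SIZE;
        if (H[curr].k == EMPTY)
            return NOT_FOUND;
        if (H[curr].k == k)
            return H[curr].I;
    }
    return NOT_FOUND;
}
```

The keys 5, 16 and 27 all have home slot 5. Inserted in that order, linear probing places them in slots 5, 6 and 7, while quadratic probing places them in slots 5, 6 and 9. With double hashing, key 16 moves by hash2(16) = 7 to slot 1. The sketch does not support deletion, which would need a separate marker so that searches do not stop early.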

 

Analysis of closed hashing