last updated 4/20/07
Linear search
|
Binary search
|
Hashing
|
Unordered storage
|
Ordered Storage
|
Hash Storage
|
Ideas and observations
Assume we want to store values with a range of 1-1000. This is the ideal case since these values can be the key values and thus an index into the array
The referring to a field of a data record would simply be hashTable[key].field
This is unrealistic for large data sets where the key values are scattered and sparse.
Student ids at Juniata are 6 digits numbers, so the range is 1-1000000
To create a 1 million element array for 1300+ students results in 0.13% utilization of the space in the hash table.
Instead, create an array of sufficient size, say 1600. Now we need to transform the key (student id) to an array index by some function.
This requires
a hash function so that a method call such as
hash ( key ) will return an index that can be used as the array subscript.
hashKey = studentId % 1600;
// a simple in-line hash "function" public void add (Hashable element)
// Adds element to this list at position element.hash().
{
int location;
location = element.hash();
list[location] = element;
numElements++;
}
public Hashable get(Hashable element)
// Returns an element e from this list such
// that e.equals(element).
{
int location;
location = element.hash();
return (Hashable)list[location];
}
For hashing to work well you need
It is common to use a combination. Usually the function will finish with division to guarantee that we generate a valid index.
Also, it is recommended to use a representative key set and generate a set of corresponding hash values and analyze its statistical properties for even distribution.
The simplest method of collision resolution is call Linear probing:
final static int SIZE = 127; //a prime number is good choice
Object list [SIZE];
public static void add (Hashable element)
// Adds element to this list at position element.hash(),
// or the next free array slot.
{
int location;
location = element.hash();
while (list[location] != null)
location = (location + 1) % list.length;
list[location] = element;
numElements++;
}
public static Hashable get(Hashable element)
// Returns an element e from this list such that e.equals(element).
{
int location;
location = element.hash();
while (!list[location].equals(element))
location = (location + 1) % list.length;
return (Hashable)list[location];
}
You should not remove items from the hash table in the case of linear probing.
Why?
An approach could be
void remove (element){
list [ element.hash( ) ] = null;
}
Collisions, however, complicate the matter. We cannot be sure that our element is in location element.hash().
Clustering or clumping of data in a local area is a phenomenon of hashing
primary clustering results directly from synonyms
secondary clustering occurs from collisions from clusters growing into other locations
To reduce secondary clustering one of the following can be used

private class LLNode {
//inner class definition for a node
private Object value; private LLNode next;
private LLNode(Object value, LLNode next){
this.value = value;
this.next = next;
}
}
final static int SIZE = 127; LLNode HashTable[] = new LLNode[SIZE]; //the array should be initialize to nulls //LLNode has 'next' and 'obj' inst vars
private int Hash(){
// as an example, this hash function folds strings
int sum = 0;
for(int i=0; i<key.length; i++){
sum = sum + key.charAt(i)*(key.length-i);//weighted
}
return sum % SIZE;
}
public static void add (Hashable element)
// Adds element to this list at position element.hash(),
// or the next free array slot.
{
int location = element.hash();
//assume the key is unique //insert new node in as the head of the synonym list HashTable[location] = new LLNode(element,HashTable[location] ; numElements++;
}
public static Hashable get(Hashable element)
// Returns an element e from this list such that e.equals(element).
{
LLNode temp = HashTable[element.Hash()]; //get head of LL of synonyms
while(temp != null && temp.obj.getKey().compareTo(key)!=0){ //search down the LL
temp = temp.next;
}
if(temp != null){ //return NULL or pointer to found node
return temp.obj;
}else{
return null;
}
}
Objects can be easily deleted in this implementation.
Factors affecting the performance:
The load factor,
, measures how
full a table is.
also can represent the probability that an element is used.
= .5 means that 1/2 of the table is full, or
50% probability that any one element is used
= 1 means that the table is full
= 2 means (in the case of a chained hash
table) that each element has an average chain length of 2
Expected number of probes for successful search of a hash table
| Linear probing | |
| Random, rehash, quadratic | 1/ *ln (1/(1- )) |
| Chaining |
Every object created is stored for fast retrieval through a hash table. The Object class defines a public function hashCode that effectively returns the address of the object. Therefore all Java objects have an associated hash code.
You can take advantage of the Java system's built in hashing function by calling the hashCode method for the object and applying % size to this return address for an easy hash.
The Java Object class exports a hashCode method that returns an int hash code. The standard Java hash code for an object is a function of the object’s memory location.
For most applications, hash codes based on memory locations are not usable. Therefore, many of the Java classes that define commonly used objects (such as String and Integer), override the Object class’s hashCode method with one that is based on the contents of the object.
If you plan to use hash tables in your programs, you should do likewise.
The library includes several other collection classes, such as HashSet, that provide an ADT whose underlying implementation uses the approaches described in this section.
The Java Library includes a HashTable class that uses hash techniques to support storing objects in a table. This is a basic hash table of unique values. Identify the basic operations available.
It is a map which is a pairing of keys and objects