page updated 3/22/07
Searching is one of the common operations applied to a collection, but typical of a vector or list.
We want to search the structure for a value's existence, and possibly its position in order to access related information.
The search function must return two pieces of information:
Search key - the value to be found in the collection or vector
Begin at the one end of the vector, probe each element in turn until search key is found (success) or the other end of the vector is found (failure)
public static int linearSearch ( int vec[],int srchKey)
{
for (int loc = 0; loc< vec.length; loc++)
if(vec[loc] == srchKey)
return loc;
}
return -1;
}
Generalizing to an array of objects...
public static int linearSearch (Object vec[], Object srchKey)
{
for (int loc = 0; loc< vec.length; loc++)
if(vec[loc].equals(srchKey))
return loc;
}
return -1;
}
In the case of an array of objects, you may instead want to search on a value in the state of the objects and return the object....
public static Object linearSearch (
Object vec[], String srchString)
{
for (int loc = 0; loc< vec.length; loc++)
if(vec[loc].getName().equals(srchString))
return vec[loc];
}
return null;
}
This algorithm requires that the vector is sorted.
The "high-low" numbers guessing game is a form of binary searching
public static int binSearch ( int vec[], int srchKey)
{
int lo = 0;
int hi = vec.length-1;
int mid;
while (lo <= hi) { //more to search
mid = (lo + hi)/2; //determine middle
if(vec[mid] == srchKey){ //is a hit?
return mid;
} else if (vec[mid] < srchKey){
lo = mid+1; //continue to right/below
} else {
hi = mid-1; //continue to left/above
}
}
return -1;
}
Enter a search key for the array below and follow the Low, High and Mid indexes through the binary search.
Again you can directly search an array of objects with two requirements:
The class implements Comparable (so that the compareTo function is defined)
The array is sorted accordingly
public static int binSearch ( Object vec[], Object srchStr)
{
int lo = 0;
int hi = vec.length-1;
int mid;
int comparison; //holds result of compareTo
while (lo <= hi) {
mid = (lo + hi)/2; //determine middle
comparison = vec[mid].compareTo(srchStr);
if(comparison == 0){
return mid;
} else if (comparison < 0){
lo = mid+1; //continue to right/below
} else {
hi = mid-1; //continue to left/above
}
}
return -1;
}
Does not require implement Comparable
public static int binSearch ( Object vec[], String srchString)
{
int lo = 0;
int hi = vec.length-1;
int mid;
int comparison; //holds result of compareTo
while (lo <= hi) {
mid = (lo + hi)/2; //determine middle
comparison = vec[mid].getKey().compareTo(srchString);
if(comparison == 0){
return mid;
} else if (comparison < 0){
lo = mid+1; //continue to right/below
} else {
hi = mid-1; //continue to left/above
}
}
return -1;
}
|
Vector size |
maximum iterations |
|
1 |
1 |
|
2 |
2 |
|
3 |
2 |
|
4 |
3 |
|
7 |
3 |
|
15 |
4 |
|
31 |
5 |
|
63 |
6 |
|
127 |
7 |
|
255 |
8 |
Translation: m iterations support up to searching 2m
-1 elements in the array
or the number of
iterations is log2 n, where n
is the vector size
complexity = O(log2 n)
Now we have constant, log and linear complexities on our list operations.
performance in the best possible case
|
Linear search: find the search key on the first probe -> O(1) |
Binary search: same -> O(1) |
Worst case analysis
performance in the worst situations
|
Linear search: search the entire vector -> O(n) |
Binary search: search until low and high pointers cross
-> |
Linear search:
| E(X) = | 1 | (1)+ | 1 | (2)+ | 1 | (3)+...+ | 1 | (n) |
| n | n | n | n |
| E(X) = | 1 | (1+2+3+...+n) |
| n |
| E(X) = | 1 | Σi = | 1 | ( | n(n+1) | ) |
| n | n | 2 |
E(X) = (n+1)/2 ~= n/2
Complexity = O(n) in the average case
If you know something about the frequency access, say from past experience then you can put the most frequently accessed elements earlier in the list.
|
SD |
5 |
|
ND |
4 |
|
PA |
10 |
|
NH |
7 |
|
TX |
30 |
|
CA |
44 |
P(x) = f(x)/
f(x)
E(X) = .05(1) + .04(2) + .10(3) + .07(4) + .30(5) + .44(6)
E(X) = 4.85 for a search length
If you arrange by frequency of access, most to least, then
E(X) = .44(1) + .30(2) + .10(3) + .07(4) + .05(5) + .04(6)
E(X) = 2.11 for an optimal search length
(Could dynamically track access counts and rearrange periodically)
Although formally proven by induction, essentially the analysis process is like the linear search in that E(X) is the sum of integers from 1 to n, but in the binary search we'll sum from 1 to log n.
Thus the average search length is O(log n)