Searching is a common operation applied to a vector or list.
We want to search the structure for a value's existence and to access related information
The search function will return a boolean flag indicating success and also return a pointer/index to the position it was found (a NULL or out of range index can also be used to indicate failure.
Search key - the value to be found in the vector
Begin at the one end of the vector, probe each element in turn until search key is found (success) or the other end of the vector is found (failure)
void Lookup ( /*in */ const ElemType vec[], //PRE: assigned(vec) && vSize >= 0
/*in */ int vSize, // && assigned(key)
/*in */ ElemType key,
/*out*/ Boolean & found, //POST: found == key is an element of vec &&
/*out*/ int & loc) // && found -> key==vec[loc]
{
loc = 0;
while (loc < vSize && vec[loc] != key)
loc++;
found = (loc < vSize);
}
This algorithm requires that the vector is sorted
The "high-low" numbers guessing game is a form of binary searching
void Lookup ( /*in */ const ElemType vec[], //PRE: assigned(vec) && vec is sorted && vSize >= 0
/*in */ int vSize, // && assigned(key)
/*in */ ElemType key,
/*out*/ Boolean & found, //POST: found == key is an element of vec &&
/*out*/ int & loc) // && found -> key==vec[loc]
{
int lo = 0;
int hi = vSize-1;
loc = hi / 2;
while (lo <= hi && vec[loc] != key) {
if (key > vec[loc]){
lo = loc+1;
} else {
hi = loc-1;
}
loc = (lo + hi)/2;
}
found = (lo <= hi)
}
Enter a search key for the array below and follow the Low, High and Mid indexes through the binary search.
Vector size |
maximum iterations |
1 |
1 |
2 |
2 |
3 |
2 |
4 |
3 |
7 |
3 |
15 |
4 |
31 |
5 |
63 |
6 |
127 |
7 |
255 |
8 |
Translation: m iterations support up to 2m -1 elements or number of iterations = log2 n, where n is the vector size
complexity = O(log2 n)
performance in the best possible case
| Linear search: find the search key on the first probe -> O(1) | Binary search: same -> O(1) |
Worst case analysis
performance in the worst situations
| Linear search: search the entire vector -> O(n) | Binary search: search until low and high pointers cross -> O(log2 n) |
Linear search:
Complexity = O(n) in the average case
If you know something about the frequency access, say from past experience then you can put the most frequently accessed elements earlier in the list.
| SD | 5 |
| ND | 4 |
| PA | 10 |
| NH | 7 |
| TX | 30 |
| CA | 44 |
P(x) = f(x)/
f(x)
E(X) = .05(1) + .04(2) + .10(3) + .07(4) + .30(5) + .44(6)
E(X) = 4.85 for a search length
If you arrange by frequency of access, most to least, then
E(X) = .44(1) + .30(2) + .10(3) + .07(4) + .05(5) + .04(6)
E(X) = 2.11 for an optimal search length
(Could dynamically track access counts and rearrange periodically)
Although formally proven by induction, essentially the analysis process is like the linear search in that E(X) is the sum of integers from 1 to n, but in the binary we'll sum from 1 to log n.
Thus the average search length is O(log n)
Variation on the binary search for a slightly faster execution.
It assume that since the average time for a search is the same as worst case, then just assume the worst case and try to improve it.
void Lookup ( /*in */ const ElemType vec[],
/*in */ int vSize,
/*in */ ElemType key,
/*out*/ Boolean & found,
/*out*/ int & loc)
{
int lo = 0;
int hi = vSize-1;
while (lo < hi) { //simpler test here
loc = (lo + hi)/2
if (key > vec[loc]){
lo = loc+1;
} else {
hi = loc;
}
loc = lo;
}
found = (vSize > 0 && vec[loc] == key);
}