Searching

 

Introduction

Searching is a common operation applied to a vector or list.

We want to search the structure for a value's existence and to access related information

The search function will return a boolean flag indicating success and also return a pointer/index to the position it was found (a NULL or out of range index can also be used to indicate failure.

Search key - the value to be found in the vector

 

 

 

 

 

 


Linear searching

Begin at the one end of the vector, probe each element in turn until search key is found (success) or the other end of the vector is found (failure)

void Lookup ( /*in */ const ElemType vec[],  //PRE: assigned(vec) && vSize >= 0
              /*in */ int vSize,             //      && assigned(key)
              /*in */ ElemType key,
              /*out*/ Boolean & found,      //POST: found == key is an element of vec &&
              /*out*/ int & loc)            //      && found -> key==vec[loc]
{
    loc = 0;
    while (loc < vSize && vec[loc] != key)
        loc++;
    found = (loc < vSize);
}

 

 

 

 


Binary Searching

This algorithm requires that the vector is sorted

The "high-low" numbers guessing game  is a form of  binary searching

void Lookup ( /*in */ const ElemType vec[],  //PRE: assigned(vec) && vec is sorted && vSize >= 0
              /*in */ int vSize,             //      && assigned(key)
              /*in */ ElemType key,
              /*out*/ Boolean & found,       //POST: found == key is an element of vec &&
              /*out*/ int & loc)             //      && found -> key==vec[loc]
{
    int lo = 0;
    int hi = vSize-1;
    loc = hi / 2;
    while (lo <= hi && vec[loc] != key) {
        if (key > vec[loc]){
            lo = loc+1;
        } else {
            hi = loc-1;
        }
        loc = (lo + hi)/2;
    }
    found = (lo <= hi)
}

 



Trace of a binary search

Enter a search key for the array below and follow the Low, High and Mid indexes through the binary search.

Enter search key:

Vector:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
10 12 19 25 27 30 39 42 44 48 57 63 69 75 80

 

 

 

 

 


Maximum loop iterations for the binary search

Vector size

maximum iterations

1

1

2

2

3

2

4

3

7

3

15

4

31

5

63

6

127

7

255

8

Translation: m iterations support up to 2m -1 elements or number of iterations = log2 n, where n is the vector size

complexity = O(log2 n)

 

 

 

 


Best, Worst and Average Case Analysis

Best case analysis

performance in the best possible case  

Linear search: find the search key on the first probe -> O(1)  Binary search: same -> O(1) 

  Worst case analysis

performance in the worst situations

Linear search: search the entire vector -> O(n)  Binary search: search until low and high pointers cross ->
O(log2 n) 

 

 

 

 

Average case analysis

average performance over many searches
E(X) = P(x)f(x)
E(X) is the expected search length for vector X. (Assumption: The search length is proportional to the average running time.)
P(x) is the probability of accessing element x
f(x) is the number of probes (again proportional to the number of instructions) to access (get to) element x

Linear search:

Complexity = O(n) in the average case

 

 

 


Improvements on the linear search

If you know something about the frequency access, say from past experience then you can put the most frequently accessed elements earlier in the list.

SD 5
ND 4
PA 10
NH 7
TX 30
CA 44

P(x) = f(x)/ f(x)

E(X) = .05(1) + .04(2) + .10(3) + .07(4) + .30(5) + .44(6)

E(X) = 4.85 for a search length

If you arrange by frequency of access, most to least, then

E(X) = .44(1) + .30(2) + .10(3) + .07(4) + .05(5) + .04(6)

E(X) = 2.11 for an optimal search length

(Could dynamically track access counts and rearrange periodically)

 

 

 


Binary searching Analysis

Although formally proven by induction, essentially the analysis process is like the linear search in that E(X) is the sum of integers from 1 to n, but in the binary we'll sum from 1 to log n.

Thus the average search length is O(log n)

Variation on the binary search for a slightly faster execution.

It assume that since the average time for a search is the same as worst case, then just assume the worst case and try to improve it.

void Lookup ( /*in */ const ElemType vec[],
              /*in */ int vSize,
              /*in */ ElemType key,
              /*out*/ Boolean & found,
              /*out*/ int & loc)
{
    int lo = 0;
    int hi = vSize-1;

    while (lo < hi) { //simpler test here
        loc = (lo + hi)/2
        if (key > vec[loc]){
            lo = loc+1;
        } else {
            hi = loc;
        }
        loc = lo;
    }
    found = (vSize > 0 && vec[loc] == key);
}