Searching

page updated 3/22/07

 


Introduction

Searching is one of the common operations applied to a collection, but typical of a vector or list.

We want to search the structure for a value's existence, and possibly its position in order to access related information.

The search function must return two pieces of information:

  1.  some indication of success and 
  2. a pointer/index to the position it was found  (if successful)

Search key - the value to be found in the collection or vector

 

 

 

 


Linear searching an array

Begin at the one end of the vector, probe each element in turn until search key is found (success) or the other end of the vector is found (failure)

public static int linearSearch ( int vec[],int srchKey)
{
 for (int loc = 0; loc< vec.length; loc++)
    if(vec[loc] == srchKey)
        return loc;
    }
 return -1;
}

 

 

 

 


Linear Search on an Object Array

Generalizing to an array of objects...

public static int linearSearch (Object vec[], Object srchKey)
{
 for (int loc = 0; loc< vec.length; loc++)
    if(vec[loc].equals(srchKey))
              return loc;
    }
 return -1;
}

In the case of an array of objects, you may instead want to search on a value in the state of the objects and return the object....

public static Object linearSearch (  
                     Object vec[], String srchString)
{
 for (int loc = 0; loc< vec.length; loc++)
     if(vec[loc].getName().equals(srchString))
          return vec[loc];
    }
 return null;
}

 

 

 

 


Binary Searching

This algorithm requires that the vector is sorted.

The "high-low" numbers guessing game  is a form of  binary searching

public static int binSearch ( int vec[], int srchKey)
{
 int lo = 0;
 int hi = vec.length-1;
 int mid;
 while (lo <= hi) {     //more to search
    mid = (lo + hi)/2;    //determine middle
    if(vec[mid] == srchKey){  //is a hit?
       return mid;
    } else if (vec[mid] < srchKey){
       lo = mid+1;      //continue to right/below
    } else {
       hi = mid-1;      //continue to left/above
    }
 }
 return -1;
}

 

 

 


Trace of a binary search

Enter a search key for the array below and follow the Low, High and Mid indexes through the binary search.

Enter search key and then click GO to start  

Vector:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
10 12 19 25 27 30 39 42 44 48 57 63 69 75 80

 

 

 

 

 

 

 


Binary Search on Objects

Again you can directly search an array of objects with two requirements:

  1. The class implements Comparable (so that the compareTo function is defined)

  2. The array is sorted accordingly

public static int binSearch ( Object vec[], Object srchStr)
{
 int lo = 0;
 int hi = vec.length-1;
 int mid;
 int comparison;  //holds result of compareTo
 while (lo <= hi) {
     mid = (lo + hi)/2;   //determine middle
     comparison = vec[mid].compareTo(srchStr);
     if(comparison == 0){
        return mid;
     } else if (comparison < 0){
        lo = mid+1;      //continue to right/below
     } else {
        hi = mid-1;      //continue to left/above
     }
 }
 return -1;
}

 

 


Binary Search on Objects' Key Strings

Does not require implement Comparable

public static int binSearch ( Object vec[], String srchString)
{
 int lo = 0;
 int hi = vec.length-1;
 int mid;
 int comparison;  //holds result of compareTo
 while (lo <= hi) {
     mid = (lo + hi)/2;   //determine middle
     comparison = vec[mid].getKey().compareTo(srchString);
     if(comparison == 0){
         return mid;
     } else if (comparison < 0){
         lo = mid+1;      //continue to right/below
     } else {
         hi = mid-1;      //continue to left/above
     }
 }
 return -1;
}

 


Maximum loop iterations for the binary search

Vector size

maximum iterations

1

1

2

2

3

2

4

3

7

3

15

4

31

5

63

6

127

7

255

8

Translation: m iterations support up to searching 2m -1 elements in the array
 or the number of iterations is log2 n, where n is the vector size

complexity = O(log2 n)

Now we have constant, log and linear complexities on our list operations.

 

 

 

 


Best, Worst and Average Case Analysis

Best case analysis

performance in the best possible case  

Linear search: find the search key on the first probe -> O(1) 

Binary search: same -> O(1) 

  Worst case analysis

performance in the worst situations

Linear search: search the entire vector -> O(n) 

Binary search: search until low and high pointers cross ->
O(log2 n) 

 

 

 

 


Average case analysis

average performance over many searches
E(X) = Σ P(x)f(x)
E(X) is the expected search length for vector X. (Assumption: The search length is proportional to the average running time.)
P(x) is the probability of accessing element x
f(x) is the number of probes (again proportional to the number of instructions) to access (get to) element x

Linear search:

E(X) = 1 (1)+ 1 (2)+ 1 (3)+...+ 1 (n)
n n n n

E(X) = 1 (1+2+3+...+n)
n

E(X) = 1 Σ= 1 ( n(n+1) )
n n
2

E(X) = (n+1)/2 ~= n/2

Complexity = O(n) in the average case

 

 

 


Improvements on the linear search

If you know something about the frequency access, say from past experience then you can put the most frequently accessed elements earlier in the list.

SD

5

ND

4

PA

10

NH

7

TX

30

CA

44

P(x) = f(x)/ f(x)

E(X) = .05(1) + .04(2) + .10(3) + .07(4) + .30(5) + .44(6)

E(X) = 4.85 for a search length

If you arrange by frequency of access, most to least, then

E(X) = .44(1) + .30(2) + .10(3) + .07(4) + .05(5) + .04(6)

E(X) = 2.11 for an optimal search length

(Could dynamically track access counts and rearrange periodically)

 

 

 


Binary searching Analysis

Although formally proven by induction, essentially the analysis process is like the linear search in that E(X) is the sum of integers from 1 to n, but in the binary search we'll sum from 1 to log n.

Thus the average search length is O(log n)