I found learning about search algorithms in Python a little tricky; I am reproducing the code I generated during interactive exercises here so that I can refresh my memory of how it works.

Linear

A linear search will iterate through every value in the data set until it finds the first instance of the target value. This can be quite resource/time consuming. Use cases for linear algorithms are where we expect the target to be near the front of the dataset, the dataset is small, or the data has not been sorted.

def linear_search(search_list, target_value):
  for idx in range(len(search_list)):
    if search_list[idx] == target_value:
      return idx
  raise ValueError("{0} not in list".format(target_value))

Binary

A binary search is more efficient than a linear search, but requires sorted data. The algorithm checks the middle value of the sorted data, and compares it with the target. If the target is higher than the middle value, then the left portion of the dataset is discarded from the search. If the target is lower than the middle value, then the right portion of the dataset is discarded. The algorithm ‘homes in’ on the target.

There are two types of binary search; recursive, where the search calls itself, and iterative which uses loops.

Recursive

def binary_search(sorted_list, left_pointer, right_pointer, target):

  if left_pointer >= right_pointer:
    return "value not found"
	
  mid_idx = (left_pointer + right_pointer) // 2
  mid_val = sorted_list[mid_idx]

  if mid_val == target:
    return mid_idx
  if mid_val > target:
    return binary_search(sorted_list, left_pointer, mid_idx, target)
  if mid_val < target:
    return binary_search(sorted_list, mid_idx + 1, right_pointer, target)

Iterative

def binary_search(sorted_list, target):
  left_pointer = 0
  right_pointer = len(sorted_list)
  
  while left_pointer < right_pointer:
    mid_idx = (left_pointer + right_pointer)//2
    mid_val = sorted_list[mid_idx]
    if mid_val == target:
      return mid_idx
    if target < mid_val:
      right_pointer = mid_idx
    if target > mid_val:
      left_pointer = mid_idx + 1

  return "Value not in list"

Sparse

Apparently, business like to inject empty values into their data to test things! This is called a sparse data set.

As a result, our searches can hit empty values which makes comparison more tricky. The following code essentially has a second set of pointers which increment from the middle index out; if they exceed the bounds of the search data then the value is not found. These second pointers are also used to check for empty values, and shift the middle index accordingly.

def sparse_search(data, search_val):
  print("Data: " + str(data))
  print("Search Value: " + str(search_val))
  first, last = 0, len(data) - 1
  while first <= last:
    mid = (first + last)//2
    if not data[mid]:
      left = mid - 1
      right = mid + 1
      while (True):
        if (left < first) and (right > last):
          print '{0} is not in the dataset'.format(search_val)
          return
        elif right <= last and data[right]:
          mid = right
          break
        elif left >= first and data[left]:
          mid = left
          break
        right += 1
        left -= 1
    if data[mid] == search_val:
      print('{0} found at position {1}'.format(search_val, mid))
      return
    if search_val < data[mid]:
      last = mid - 1
    if search_val > data[mid]:
      first = mid + 1
  print('{0} is not in the dataset'.format(search_val))