Sort Algorithms in Python

As with search algorithms, I found learning these a little tricky so I’ve put my notes and the code I generated in interactive lessons in one place for easy reference.

Bubble

Bubble sort compares adjacent values in a dataset and then swaps them if required, iterating across the values in full until no more swaps are necessary (the data is sorted). This can be time/resource intensive with large datasets. Bubble sorting can be faster when applied to data which is already mostly ordered (‘a quick tidy up’).

After each iteration of (ascending value) Bubble Sort, the highest number will always be at the end of the dataset. Therefore, the code can be optimised (below) by disregarding this final comparison in the next iteration, making the number of comparisons per iteration decrementally fewer.

def bubble_sort(arr):
  for i in range(len(arr)):
    for idx in range(len(arr) - i - 1):
      if arr[idx] > arr[idx + 1]:
        arr[idx], arr[idx + 1] = arr[idx + 1], arr[idx]

Merge

Merge sorting essentially breaks the data down into individual elements and then reconstructs as a sorted list. It does this by splitting the dataset in half, repeating the split until left with single elements (lists with only one entry!). It then compares pairs of elements, reconstructing them into sorted lists of two elements. Pairs of these two-element lists are then reconstructed into sorted lists of four elements, and so on until a sorted list of the original length is reconstructed.

The Merge code below features two functions, the sorter and the merger.

def merge_sort(items):
  if len(items) <= 1:
    return items
  middle_index = len(items) // 2
  left_split = items[:middle_index]
  right_split = items[middle_index:]
  left_sorted = merge_sort(left_split)
  right_sorted = merge_sort(right_split)
  return merge(left_sorted, right_sorted)

def merge(left, right):
  result = []
  while (left and right):
    if left[0] < right[0]:
      result.append(left[0])
      left.pop(0)
    else:
      result.append(right[0])
      right.pop(0)
  if left:
    result += left
  if right:
    result += right
  return result

Quicksort

Similar to merge, but takes in a specified (or random) ‘pivot’ value to generate the ‘split point’ for each iteration. The algorithm must track the position of the pivot, and the point in the iterating dataset where all preceding values are lower than the pivot (for an ascending sort).

from random import randrange, shuffle

def quicksort(list, start, end):
  if start >= end:
    return
  # select random element to be pivot
  pivot_idx = randrange(start, end + 1)
  pivot_element = list[pivot_idx]
  # swap random element with last element in sub-lists
  list[end], list[pivot_idx] = list[pivot_idx], list[end]

  # tracks all elements left (less than) pivot
  less_than_pointer = start
  
  for i in range(start, end):
    if list[i] < pivot_element:
      # swap element to the right-most portion of lesser elements
      list[i], list[less_than_pointer] = list[less_than_pointer], list[i]
      # move less than pointer
      less_than_pointer += 1
  # move pivot element to the right-most portion of lesser elements
  list[end], list[less_than_pointer] = list[less_than_pointer], list[end]
 # recursively sort left and right sub-lists
  quicksort(list, start, less_than_pointer - 1)
  quicksort(list, less_than_pointer + 1, end)

Radix

Unlike the other searching algorithms, Radix does not perform any comparisons. It is a numeric sort, and operates by siloing data according to their least significant digit (or most, depending on sort). Having done that, the data can now be siloed again according to the next significant digit and so on until data has been siloed according to the most significant digit of the longest number in the data. At this point, the data will have been sorted!

def radix_sort(to_be_sorted):
  maximum_value = max(to_be_sorted)
  max_exponent = len(str(maximum_value))
  #operate on a copy of the original list as sort will clear data!
  being_sorted = to_be_sorted[:]

  for exponent in range(max_exponent):
    position = exponent + 1
    index = -position
    digits = [[] for i in range(10)]
    for number in being_sorted:
      number_as_a_string = str(number)
      try:
        digit = number_as_a_string[index]
        digit = int(digit)
      except IndexError:
        digit = 0
      digits[digit].append(number)
    being_sorted = []
    for numeral in digits:
      #appends every value of one list to another list
      being_sorted.extend(numeral)
  return being_sorted

Bubble vs Quicksort

The lesson project featured coding the sort algorithms to provide different sort orders. The dataset provided in the lesson demonstrated the noticeable difference in processing time for Bubble vs Quick sorts on a bulky dataset. There are three scripts.

utils.py – for importing the data

import csv

def load_books(filename):
  bookshelf = []
  with open(filename) as file:
      shelf = csv.DictReader(file)
      for book in shelf:
          book['author_lower'] = book['author'].lower()
          book['title_lower'] = book['title'].lower()
          bookshelf.append(book)
  return bookshelf

sorts.py – providing the core logic for the sort algorithms

import random

def bubble_sort(arr, comparison_function):
  swaps = 0
  sorted = False
  while not sorted:
    sorted = True
    for idx in range(len(arr) - 1):
      if comparison_function(arr[idx],arr[idx + 1]):
        sorted = False
        arr[idx], arr[idx + 1] = arr[idx + 1], arr[idx]
        swaps += 1
  print("Bubble sort: There were {0} swaps".format(swaps))
  return arr

def quicksort(list, start, end, comparison_function):
  if start >= end:
    return
  pivot_idx = random.randrange(start, end + 1)
  pivot_element = list[pivot_idx]
  list[end], list[pivot_idx] = list[pivot_idx], list[end]
  less_than_pointer = start
  for i in range(start, end):
    if comparison_function(pivot_element, list[i]):
      list[i], list[less_than_pointer] = list[less_than_pointer], list[i]
      less_than_pointer += 1
  list[end], list[less_than_pointer] = list[less_than_pointer], list[end]
  quicksort(list, start, less_than_pointer - 1, comparison_function)
  quicksort(list, less_than_pointer + 1, end, comparison_function)

script.py – where the comparator functions were defined

import utils
import sorts

bookshelf = utils.load_books('books_small.csv')
long_bookshelf = utils.load_books('books_large.csv')

def by_title_ascending(book_a, book_b):
  return book_a['title_lower'] > book_b['title_lower']

def by_author_ascending(book_a, book_b):
  return book_a['author_lower'] > book_b['author_lower']

def by_total_length(book_a, book_b):
  return (len(book_a['title']) + len(book_a['author'])) > (len(book_b['title']) + len(book_b['author']))

mrn00b0t

Interfacing between technophile and technophobe

Sort Algorithms in Python

Bubble

Merge

Quicksort

Radix

Bubble vs Quicksort

Leave a comment Cancel reply

Bubble

Merge

Quicksort

Radix

Bubble vs Quicksort

Share this:

Related

Leave a comment Cancel reply