As with search algorithms, I found learning these a little tricky so I’ve put my notes and the code I generated in interactive lessons in one place for easy reference.
Bubble
Bubble sort compares adjacent values in a dataset and then swaps them if required, iterating across the values in full until no more swaps are necessary (the data is sorted). This can be time/resource intensive with large datasets. Bubble sorting can be faster when applied to data which is already mostly ordered (‘a quick tidy up’).
After each iteration of (ascending value) Bubble Sort, the highest number will always be at the end of the dataset. Therefore, the code can be optimised (below) by disregarding this final comparison in the next iteration, making the number of comparisons per iteration decrementally fewer.
def bubble_sort(arr):
for i in range(len(arr)):
for idx in range(len(arr) - i - 1):
if arr[idx] > arr[idx + 1]:
arr[idx], arr[idx + 1] = arr[idx + 1], arr[idx]
Merge
Merge sorting essentially breaks the data down into individual elements and then reconstructs as a sorted list. It does this by splitting the dataset in half, repeating the split until left with single elements (lists with only one entry!). It then compares pairs of elements, reconstructing them into sorted lists of two elements. Pairs of these two-element lists are then reconstructed into sorted lists of four elements, and so on until a sorted list of the original length is reconstructed.
The Merge code below features two functions, the sorter and the merger.
def merge_sort(items):
if len(items) <= 1:
return items
middle_index = len(items) // 2
left_split = items[:middle_index]
right_split = items[middle_index:]
left_sorted = merge_sort(left_split)
right_sorted = merge_sort(right_split)
return merge(left_sorted, right_sorted)
def merge(left, right):
result = []
while (left and right):
if left[0] < right[0]:
result.append(left[0])
left.pop(0)
else:
result.append(right[0])
right.pop(0)
if left:
result += left
if right:
result += right
return result
Quicksort
Similar to merge, but takes in a specified (or random) ‘pivot’ value to generate the ‘split point’ for each iteration. The algorithm must track the position of the pivot, and the point in the iterating dataset where all preceding values are lower than the pivot (for an ascending sort).
from random import randrange, shuffle
def quicksort(list, start, end):
if start >= end:
return
# select random element to be pivot
pivot_idx = randrange(start, end + 1)
pivot_element = list[pivot_idx]
# swap random element with last element in sub-lists
list[end], list[pivot_idx] = list[pivot_idx], list[end]
# tracks all elements left (less than) pivot
less_than_pointer = start
for i in range(start, end):
if list[i] < pivot_element:
# swap element to the right-most portion of lesser elements
list[i], list[less_than_pointer] = list[less_than_pointer], list[i]
# move less than pointer
less_than_pointer += 1
# move pivot element to the right-most portion of lesser elements
list[end], list[less_than_pointer] = list[less_than_pointer], list[end]
# recursively sort left and right sub-lists
quicksort(list, start, less_than_pointer - 1)
quicksort(list, less_than_pointer + 1, end)
Radix
Unlike the other searching algorithms, Radix does not perform any comparisons. It is a numeric sort, and operates by siloing data according to their least significant digit (or most, depending on sort). Having done that, the data can now be siloed again according to the next significant digit and so on until data has been siloed according to the most significant digit of the longest number in the data. At this point, the data will have been sorted!
def radix_sort(to_be_sorted):
maximum_value = max(to_be_sorted)
max_exponent = len(str(maximum_value))
#operate on a copy of the original list as sort will clear data!
being_sorted = to_be_sorted[:]
for exponent in range(max_exponent):
position = exponent + 1
index = -position
digits = [[] for i in range(10)]
for number in being_sorted:
number_as_a_string = str(number)
try:
digit = number_as_a_string[index]
digit = int(digit)
except IndexError:
digit = 0
digits[digit].append(number)
being_sorted = []
for numeral in digits:
#appends every value of one list to another list
being_sorted.extend(numeral)
return being_sorted
Bubble vs Quicksort
The lesson project featured coding the sort algorithms to provide different sort orders. The dataset provided in the lesson demonstrated the noticeable difference in processing time for Bubble vs Quick sorts on a bulky dataset. There are three scripts.
utils.py – for importing the data
import csv
def load_books(filename):
bookshelf = []
with open(filename) as file:
shelf = csv.DictReader(file)
for book in shelf:
book['author_lower'] = book['author'].lower()
book['title_lower'] = book['title'].lower()
bookshelf.append(book)
return bookshelf
sorts.py – providing the core logic for the sort algorithms
import random
def bubble_sort(arr, comparison_function):
swaps = 0
sorted = False
while not sorted:
sorted = True
for idx in range(len(arr) - 1):
if comparison_function(arr[idx],arr[idx + 1]):
sorted = False
arr[idx], arr[idx + 1] = arr[idx + 1], arr[idx]
swaps += 1
print("Bubble sort: There were {0} swaps".format(swaps))
return arr
def quicksort(list, start, end, comparison_function):
if start >= end:
return
pivot_idx = random.randrange(start, end + 1)
pivot_element = list[pivot_idx]
list[end], list[pivot_idx] = list[pivot_idx], list[end]
less_than_pointer = start
for i in range(start, end):
if comparison_function(pivot_element, list[i]):
list[i], list[less_than_pointer] = list[less_than_pointer], list[i]
less_than_pointer += 1
list[end], list[less_than_pointer] = list[less_than_pointer], list[end]
quicksort(list, start, less_than_pointer - 1, comparison_function)
quicksort(list, less_than_pointer + 1, end, comparison_function)
script.py – where the comparator functions were defined
import utils
import sorts
bookshelf = utils.load_books('books_small.csv')
long_bookshelf = utils.load_books('books_large.csv')
def by_title_ascending(book_a, book_b):
return book_a['title_lower'] > book_b['title_lower']
def by_author_ascending(book_a, book_b):
return book_a['author_lower'] > book_b['author_lower']
def by_total_length(book_a, book_b):
return (len(book_a['title']) + len(book_a['author'])) > (len(book_b['title']) + len(book_b['author']))