Complete Guide on Sorting Techniques in Python [2024 Edition]

Nitika Sharma 08 Jan, 2024 • 13 min read

Introduction

Sorting is a fundamental operation in Python and plays a crucial role in various applications. Whether you are organizing data, searching for specific elements, or optimizing algorithms, a solid understanding of sorting techniques is essential. In this comprehensive guide, we will explore different sorting techniques in Python, understand their efficiency, implement them in code, compare their performance, and offer practical guidance for efficient sorting. By the end of this article, you will have a deep understanding of sorting techniques in Python and be able to choose the right one for your specific needs.

What is Sorting and Why is it Important?

Sorting refers to the process of arranging elements in a specific order, typically in ascending or descending order. It allows us to organize data in a structured manner, making it easier to search, analyze, and manipulate. Sorting is a fundamental operation in computer science and is used in various applications such as data analysis, database management, search algorithms, and more. Efficient sorting algorithms can significantly improve the performance of these applications, making it a crucial skill for any programmer.


Understanding the Efficiency of Sorting Algorithms

To choose the right sorting algorithm for a specific task, it is important to understand their efficiency. Let’s explore the efficiency of sorting algorithms in terms of time complexity, space complexity, and best, average, and worst-case scenarios.

Time Complexity Analysis

Time complexity measures the amount of time taken by an algorithm to run as a function of the input size. It provides an estimate of how the algorithm’s performance scales with larger datasets. The time complexity of sorting algorithms varies widely: Bubble Sort, Selection Sort, and Insertion Sort run in O(n^2); Merge Sort, Heap Sort, and (on average) Quick Sort run in O(n log n); Radix Sort runs in O(nk), where k is the number of digits; and Shell Sort falls between O(n log n) and O(n^2), depending on the gap sequence.

Space Complexity Analysis

Space complexity measures the amount of memory used by an algorithm to solve a problem as a function of the input size. It provides an estimate of how much memory the algorithm requires to store intermediate results and variables. Most sorting algorithms have a space complexity of O(1) or O(n), indicating that they either use a constant amount of memory or require additional memory proportional to the input size.

Best, Average, and Worst Case Scenarios

Sorting algorithms can have different performance characteristics depending on the input data. The best-case scenario represents the most favorable input that allows the algorithm to run with minimal comparisons and swaps. The average-case scenario represents a typical input that reflects real-world data. The worst-case scenario represents the most unfavorable input that causes the algorithm to perform the maximum number of comparisons and swaps. Understanding these scenarios helps in choosing the right algorithm for specific data distributions.
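
To make these scenarios concrete, here is a minimal sketch using the standard timeit module. It times Insertion Sort (the same implementation shown later in this guide, repeated here so the snippet is self-contained) on already sorted input, its best case, and on reverse-sorted input, its worst case. The exact numbers depend on your machine, but the gap between the two illustrates how strongly input order affects an O(n^2) algorithm.

import timeit

def insertion_sort(arr):
    # Same Insertion Sort implementation shown later in this guide
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

n = 2000
best_case = list(range(n))          # already sorted
worst_case = list(range(n, 0, -1))  # reverse sorted

print("best case: ", timeit.timeit(lambda: insertion_sort(best_case[:]), number=5))
print("worst case:", timeit.timeit(lambda: insertion_sort(worst_case[:]), number=5))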


Different Sorting Algorithms and Implementation in Python

Python provides several sorting techniques, each with its advantages and disadvantages. Let’s explore some of the most commonly used sorting techniques and their implementation in Python.

Bubble Sort

Bubble Sort is a simple sorting algorithm that repeatedly compares adjacent elements and swaps them if they are in the wrong order. It continues this process until the entire list is sorted. Although Bubble Sort is easy to understand and implement, it is not efficient for large datasets and has a time complexity of O(n^2).

Code:

def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        # After each pass, the largest remaining element settles at the end
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

Working:

  • Passes:
    • The algorithm performs multiple passes through the list.
  • Comparisons and Swaps:
    • In each pass, it compares adjacent elements in the list.
    • If the elements are in the wrong order, it swaps them.
  • Iteration:
    • The process is repeated for each pair of adjacent elements until the end of the list.
  • Pass Completion:
    • After each pass, the largest unsorted element is guaranteed to be at the end of the list.
  • Repeat:
    • The algorithm repeats the process until no more swaps are needed, indicating that the list is sorted.

Selection Sort

Selection Sort works by repeatedly finding the minimum element from the unsorted part of the list and placing it at the beginning. It continues this process until the entire list is sorted. Selection Sort has a time complexity of O(n^2) and is not suitable for large datasets.

Code:

def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        # Find the smallest element in the unsorted portion
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        # Swap it to the front of the unsorted portion
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr

Working:

  • Divide into Sorted and Unsorted:
    • The algorithm divides the list into a sorted part at the beginning and an unsorted part.
  • Minimum Element:
    • In each iteration, it finds the minimum element from the unsorted part.
  • Swap with First Unsorted Element:
    • The minimum element is swapped with the first element of the unsorted part, effectively expanding the sorted part.
  • Repeat:
    • The process is repeated, considering the remaining unsorted part, until the entire list is sorted.

Insertion Sort

Insertion Sort works by dividing the list into a sorted and an unsorted part. It iterates through the unsorted part, comparing each element with the elements in the sorted part and inserting it at the correct position. Insertion Sort has a time complexity of O(n^2) but performs well for small datasets and partially sorted lists.

Code:

def insertion_sort(arr):
    n = len(arr)
    for i in range(1, n):
        key = arr[i]
        j = i - 1
        # Shift larger elements of the sorted part one position to the right
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

Working:

  • Divide into Sorted and Unsorted:
    • The algorithm divides the list into a sorted part at the beginning and an unsorted part.
  • Element Selection:
    • It selects an element from the unsorted part.
  • Comparisons and Shifts:
    • Iterates through the sorted part, comparing the selected element with each element in the sorted part.
    • If an element in the sorted part is greater than the selected element, it shifts that element to the right.
  • Insertion:
    • Inserts the selected element at the correct position in the sorted part.
  • Repeat:
    • The process is repeated for each element in the unsorted part until the entire list is sorted.

Merge Sort

Merge Sort is a divide-and-conquer algorithm that divides the list into smaller sublists, sorts them individually, and then merges them back together. It has a time complexity of O(n log n) and is known for its stability and efficiency. Merge Sort is widely used in practice and is suitable for large datasets.

Code:

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    # Merge two already sorted lists into a single sorted list
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

Working:

  • Base Case: 
    • If the list has zero or one element, it is already considered sorted.
  • Divide:
    • Divide the list into two halves.
    • Calculate the midpoint of the list.
  • Conquer (Recursion):
    • Recursively apply Merge Sort to each half of the list.
    • This step continues until each sublist contains zero or one element.
  • Merge:
    • Merge the two sorted sublists into a single sorted list.
    • Create an auxiliary array to store the merged result.
    • Compare the elements from both sublists, selecting the smaller element first and advancing the pointer in the respective sublist.
    • Repeat this process until all elements are merged into the auxiliary array.
  • Copy Back:
    • Copy the sorted elements from the auxiliary array back to the original list (in the implementation above, the merged list is simply returned instead of being copied back in place).
  • Repeat:
    • Repeat the process for each level of recursion until the entire list is merged and sorted.

Quick Sort

Quick Sort is another divide-and-conquer algorithm that works by selecting a pivot element and partitioning the list around it. It recursively sorts the sublists on either side of the pivot. Quick Sort has an average time complexity of O(n log n) but can degrade to O(n^2) in the worst case. It is efficient for large datasets and is widely used in practice.

Code:

def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    # Partition around the middle element as the pivot
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

Working:

  • Base Case:
    • If the list has zero or one element, it is already considered sorted.
  • Pivot Selection:
    • Select a pivot element from the list. Common choices include the middle element, the first element, or a random element.
  • Partitioning:
    • Rearrange the elements in the list so that elements less than the pivot are on the left, and elements greater than the pivot are on the right.
    • The pivot is now in its final sorted position.
  • Recursion:
    • Recursively apply Quick Sort to the sublists on the left and right of the pivot.
  • Repeat:
    • Repeat the process for each level of recursion until the entire list is sorted.
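
Because the O(n^2) worst case arises when the chosen pivot repeatedly splits the list very unevenly, a common mitigation is to pick the pivot at random. The sketch below is a hypothetical variant of the implementation above (quick_sort_random is not part of the original code), differing only in how the pivot is chosen:

import random

def quick_sort_random(arr):
    if len(arr) <= 1:
        return arr
    # A randomly chosen pivot makes the O(n^2) worst case very unlikely in practice
    pivot = random.choice(arr)
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort_random(left) + middle + quick_sort_random(right)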


Heap Sort

Heap Sort is based on the concept of a binary heap, a complete binary tree where each parent node is greater (or smaller) than its children. It builds a max-heap (or min-heap) from the list and repeatedly extracts the maximum (or minimum) element, resulting in a sorted list. Heap Sort has a time complexity of O(n log n) and is efficient for large datasets.

Code:

def heap_sort(arr):
    n = len(arr)
    # Build a max-heap from the list
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Repeatedly move the current maximum to the end and re-heapify the rest
    for i in range(n - 1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        heapify(arr, i, 0)
    return arr

def heapify(arr, n, i):
    # Sift the element at index i down until the max-heap property holds
    largest = i
    left = 2 * i + 1
    right = 2 * i + 2
    if left < n and arr[i] < arr[left]:
        largest = left
    if right < n and arr[largest] < arr[right]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

Working:

  • Build Heap:
    • Build a max-heap from the elements in the list. This involves arranging the elements to satisfy the heap property (parent greater than its children).
  • Extract Maximum:
    • Repeatedly extract the maximum element from the heap and place it at the end of the list.
    • After each extraction, adjust the heap to maintain the heap property.
  • Repeat:
    • Repeat the process until the entire list is sorted.
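
For comparison, Python's standard library ships a binary min-heap in the heapq module, so a heap-based sort can also be written as the short sketch below (the helper name heapq_sort is ours). It produces ascending order by repeatedly popping the smallest element, rather than building a max-heap by hand.

import heapq

def heapq_sort(arr):
    # heapify builds a min-heap in O(n); each heappop is O(log n)
    heap = list(arr)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapq_sort([5, 2, 8, 1, 9]))  # [1, 2, 5, 8, 9]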

Radix Sort

Radix Sort is a non-comparative sorting algorithm that sorts elements by their digits or bits. It works by distributing the elements into buckets based on the least significant digit, then repeatedly redistributing them based on the next significant digit until the entire list is sorted. Radix Sort has a time complexity of O(nk), where k is the number of digits or bits in the largest element.

Code:

def radix_sort(arr):
    # Assumes non-negative integers
    max_value = max(arr)
    exp = 1
    # Sort by each digit, from least significant to most significant
    while max_value // exp > 0:
        counting_sort_by_digit(arr, exp)
        exp *= 10
    return arr

def counting_sort_by_digit(arr, exp):
    # Stable counting sort on the digit selected by exp (1, 10, 100, ...)
    n = len(arr)
    output = [0] * n
    count = [0] * 10
    for i in range(n):
        index = arr[i] // exp
        count[index % 10] += 1
    for i in range(1, 10):
        count[i] += count[i - 1]
    i = n - 1
    while i >= 0:
        index = arr[i] // exp
        output[count[index % 10] - 1] = arr[i]
        count[index % 10] -= 1
        i -= 1
    for i in range(n):
        arr[i] = output[i]
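
A quick usage sketch (note that this implementation assumes non-negative integers):

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]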

Working:

  • Determine Maximum Digits:
    • Find the maximum number of digits (or bits) among all elements in the list.
  • Bucket Distribution:
    • Iterate through each digit position (from the least significant to the most significant) and distribute elements into buckets based on that digit.
  • Bucket Collection:
    • Collect elements from the buckets in order, creating a new list.
  • Repeat:
    • Repeat the process for each digit position until the entire list is sorted.

Counting Sort

Counting Sort is a non-comparative sorting algorithm that works by counting the number of occurrences of each element and using this information to determine their positions in the sorted list. It has a time complexity of O(n+k), where n is the number of elements and k is the range of input values. Counting Sort is efficient when the range of values is small relative to the number of elements.

Code:

def counting_sort(arr):
    # Assumes non-negative integers
    max_value = max(arr)
    count = [0] * (max_value + 1)
    # Count the occurrences of each value
    for num in arr:
        count[num] += 1
    # Rebuild the list in order, repeating each value by its count
    sorted_arr = []
    for i in range(len(count)):
        sorted_arr.extend([i] * count[i])
    return sorted_arr

Working:

  • Counting Occurrences:
    • Find the maximum value in the list to determine the range.
    • Create an array (count) to store the count of each unique element in the list.
    • Iterate through the list, counting the occurrences of each element.
  • Prefix Sum:
    • Modify the count array to store the prefix sum. This step helps determine the position of each element in the sorted list.
  • Sorted List Construction:
    • Iterate through the original list and use the count array to determine the position of each element in the sorted list.
    • Place each element in its correct sorted position.

Bucket Sort

Bucket Sort is a distribution sorting algorithm that works by dividing the input into a set of buckets, each representing a range of values. It then sorts the elements within each bucket individually and concatenates them to obtain the sorted list. Bucket Sort has a time complexity of O(n+k), where n is the number of elements and k is the number of buckets. It is efficient for datasets with a uniform distribution.

Code:

def bucket_sort(arr):
    # Assumes values roughly uniformly distributed in the range [0, 1)
    n = len(arr)
    buckets = [[] for _ in range(n)]
    for num in arr:
        # Clamp the index so a value of exactly 1.0 still lands in the last bucket
        index = min(int(num * n), n - 1)
        buckets[index].append(num)
    sorted_arr = []
    for bucket in buckets:
        # Sort each bucket individually (reusing insertion_sort from above)
        sorted_arr.extend(insertion_sort(bucket))
    return sorted_arr
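
A quick usage sketch, assuming the insertion_sort function defined earlier in this guide is in scope and that the input values lie in the range [0, 1):

data = [0.42, 0.32, 0.23, 0.52, 0.25, 0.47, 0.51]
print(bucket_sort(data))  # [0.23, 0.25, 0.32, 0.42, 0.47, 0.51, 0.52]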

Working:

  • Bucket Creation:
    • Create an array of empty buckets, where each bucket represents a range of values.
  • Element Distribution:
    • Iterate through the original list and distribute each element into the corresponding bucket based on its value.
  • Individual Bucket Sorting:
    • Sort each bucket individually. This step can be done using any suitable sorting algorithm.
  • Concatenation:
    • Concatenate the sorted buckets to obtain the final sorted list.

Shell Sort

Shell Sort is an extension of Insertion Sort that works by sorting elements at a specific interval. It gradually reduces the interval until it becomes 1, effectively performing a final Insertion Sort. Shell Sort has a time complexity that depends on the chosen gap sequence and can range from O(n log n) to O(n^2). It is efficient for medium-sized datasets.

Code:

def shell_sort(arr):
    n = len(arr)
    gap = n // 2
    while gap > 0:
        # Gapped insertion sort for the current interval
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 2
    return arr

Working:

  • Gap Sequence:
    • Choose a gap sequence that determines the intervals between elements to be compared and swapped.
  • Insertion Sort with Gaps:
    • Perform Insertion Sort on sublists created by considering elements at the chosen intervals.
    • Initially, the gap is large, and the list is partially sorted within each gap.
  • Gradual Reduction of Gap:
    • Gradually reduce the gap until it becomes 1.
    • At this point, the algorithm performs a final Insertion Sort to completely sort the list.
  • Repeat:
    • Repeat the process for different gap sequences until the entire list is sorted.

Comparing Sorting Algorithms in Python

Sorting algorithms can vary significantly in terms of their performance and efficiency. It is crucial to understand the strengths and weaknesses of each algorithm to choose the right one for your specific use case. Let’s analyze the performance and compare the pros and cons of some commonly used sorting algorithms in Python.

Performance Analysis

To compare the performance of sorting algorithms, we can consider factors such as time complexity, space complexity, and stability. Time complexity refers to the amount of time it takes for an algorithm to execute, while space complexity refers to the amount of memory it requires. Stability refers to whether the algorithm maintains the relative order of elements with equal keys.
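
Stability is easy to demonstrate with Python's built-in sorted(), which is stable: in the sketch below, records that share the same age keep their original relative order after sorting by age.

records = [("Alice", 30), ("Bob", 25), ("Carol", 30), ("Dave", 25)]

# Sort by age only; entries with equal ages keep their original order
by_age = sorted(records, key=lambda record: record[1])
print(by_age)
# [('Bob', 25), ('Dave', 25), ('Alice', 30), ('Carol', 30)]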

One of the most popular sorting algorithms is Quicksort. It has an average time complexity of O(n log n) and a space complexity of O(log n) for its recursion stack. Quicksort is known for its efficiency and is widely used in practice. However, it is not a stable sorting algorithm.

Another commonly used algorithm is Mergesort, which has a time complexity of O(n log n) and a space complexity of O(n). Mergesort is a stable sorting algorithm, making it suitable for scenarios where maintaining the relative order of equal elements is important.

Heapsort is another efficient sorting algorithm with a time complexity of O(n log n) and a space complexity of O(1). However, Heapsort is not a stable sorting algorithm.

Choosing the Right Sorting Algorithm for Your Data

When choosing a sorting algorithm, consider the following factors:

  • Time complexity: If you are dealing with a large dataset, algorithms with lower time complexity, such as Quicksort or Mergesort, may be more suitable.
  • Space complexity: If memory usage is a concern, algorithms with lower space complexity, such as Heapsort, may be preferred.
  • Stability: If maintaining the relative order of equal elements is important, choose a stable sorting algorithm like Mergesort.
  • Application-specific requirements: Consider any specific requirements or constraints of your application that may influence the choice of sorting algorithm.

Sorting Algorithms in Python Libraries and Modules

Python provides several libraries and modules that offer sorting algorithms for different data structures and scenarios. Let’s explore some of them:

Sorting with the Built-in `sorted()` Function

The built-in `sorted()` function in Python can sort any iterable, including lists, tuples, and dictionaries (which are sorted by their keys). It uses the Timsort algorithm, a hybrid sorting algorithm derived from Merge Sort and Insertion Sort. Timsort is known for its stability and efficiency.

Code:

numbers = [5, 2, 8, 1, 9]
sorted_numbers = sorted(numbers)
print(sorted_numbers)

Output:

[1, 2, 5, 8, 9]
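
sorted() also accepts optional key and reverse arguments, which handle most everyday sorting needs without a hand-written algorithm. A few quick examples:

words = ["banana", "Apple", "cherry"]
print(sorted(words, key=str.lower))           # ['Apple', 'banana', 'cherry']

print(sorted([5, 2, 8, 1, 9], reverse=True))  # [9, 8, 5, 2, 1]

ages = {"John": 25, "Alice": 30, "Bob": 20}
print(sorted(ages, key=ages.get))             # ['Bob', 'John', 'Alice'] (keys sorted by value)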

Sorting with the `sort()` Method

The `sort()` method is available for lists in Python. It sorts the list in-place, modifying the original list. The `sort()` method also uses the Timsort algorithm.

Code:

numbers = [5, 2, 8, 1, 9]
numbers.sort()
print(numbers)

Output:

[1, 2, 5, 8, 9]

Sorting with the `numpy` Library

The `numpy` library provides efficient sorting functions for arrays and matrices. `np.sort()` uses an introsort-based quicksort by default, and other algorithms (such as mergesort, heapsort, or a stable sort) can be selected through its `kind` parameter.

Code:

import numpy as np

numbers = np.array([5, 2, 8, 1, 9])
sorted_numbers = np.sort(numbers)
print(sorted_numbers)

Output:

[1 2 5 8 9]

Sorting with the `pandas` Library

The `pandas` library offers sorting functions for data manipulation and analysis. It provides sorting capabilities for data frames and series, allowing you to sort data based on specific columns or indices.

Code:

import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 30, 20]}
df = pd.DataFrame(data)
sorted_df = df.sort_values(by='Age')
print(sorted_df)

Output:

    Name  Age
2    Bob   20
0   John   25
1  Alice   30

Sorting with the `collections` Module

The `collections` module in Python provides a `deque` data structure, a double-ended queue that allows efficient insertion and deletion from both ends. A `deque` has no sorting method of its own, but the built-in `sorted()` function accepts any iterable, including a `deque`, and returns a new sorted list.

Code:

from collections import deque

numbers = deque([5, 2, 8, 1, 9])
sorted_numbers = sorted(numbers)
print(sorted_numbers)

Output:

[1, 2, 5, 8, 9]

Conclusion

Sorting is a crucial operation in Python for organizing and analyzing data. In this comprehensive guide, we explored different sorting algorithms, compared their performance, discussed their pros and cons, and offered guidance on choosing the right algorithm for your data. We also delved into sorting algorithms available in popular Python libraries and modules. By mastering sorting techniques in Python, you can enhance your data manipulation and analysis capabilities.

So, go ahead and apply these techniques to your projects and unlock the power of sorting in Python!


Frequently Asked Questions

Q1. What is the most efficient way to sort in Python?

A. The most efficient way is to use the built-in sorted() function or the sort() method for lists, offering flexibility depending on whether you want a new sorted list or to modify the original list in-place.
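
A minimal illustration of the difference:

data = [3, 1, 2]
new_list = sorted(data)  # returns a new sorted list; data is unchanged
data.sort()              # sorts data in place and returns None
print(new_list)  # [1, 2, 3]
print(data)      # [1, 2, 3]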

Q2. What are sorting techniques in Python?

A. Python’s built-in sorting (sorted() and list.sort()) uses Timsort, while algorithms such as Quicksort and Mergesort can be implemented by hand or accessed through libraries like NumPy. Each has unique trade-offs in terms of performance and stability, allowing developers to choose based on specific requirements.

Q3. What are sorted Python methods?

A. Python provides built-in methods such as sorted() and sort() for arranging elements. sorted() returns a new sorted list, while sort() modifies the original list in-place, offering different options for sorting data.

Q4. What is the fastest sorting function in Python?

A. The fastest sorting function depends on the data size and characteristics. Timsort, the hybrid algorithm behind Python’s built-in sorted() and sort(), is often considered one of the fastest and most practical choices for a wide range of scenarios.
