Exploring Time and Space Complexities of Bucket Sort

This article is a complementary resource to the DSA with Python and DSA with C++ courses.

Bucket sort is a sorting algorithm that

  1. Distributes the elements into a number of buckets.
  2. Sorts each bucket individually (using another algorithm, such as insertion sort).
  3. Concatenates the sorted buckets into the final sorted array.
Python implementation:

def bucket_sort(arr):
    if not arr:
        return arr

    # Create buckets
    num_buckets = len(arr)
    max_value = max(arr)
    min_value = min(arr)
    value_range = max_value - min_value
    buckets = [[] for _ in range(num_buckets)]

    # Distribute elements into buckets
    for num in arr:
        if value_range == 0:
            index = 0  # all elements are equal
        else:
            # Scale into [0, num_buckets); clamp so max_value lands in the last bucket
            index = min(int((num - min_value) / value_range * num_buckets),
                        num_buckets - 1)
        buckets[index].append(num)

    # Sort each bucket and concatenate
    sorted_arr = []
    for bucket in buckets:
        sorted_arr.extend(sorted(bucket))

    return sorted_arr
C++ implementation:

#include <algorithm>
#include <vector>

using namespace std;

void bucket_sort(vector<float>& arr) {
    if (arr.empty())
        return;

    // Create buckets
    size_t num_buckets = arr.size();
    float max_value = *max_element(arr.begin(), arr.end());
    float min_value = *min_element(arr.begin(), arr.end());
    float value_range = max_value - min_value;
    vector<vector<float>> buckets(num_buckets);

    // Distribute elements into buckets
    for (float num : arr) {
        size_t index = 0;  // all elements equal => single bucket
        if (value_range > 0) {
            // Scale into [0, num_buckets); clamp so max_value lands in the last bucket
            index = min(static_cast<size_t>((num - min_value) / value_range * num_buckets),
                        num_buckets - 1);
        }
        buckets[index].push_back(num);
    }

    // Sort each bucket and concatenate
    arr.clear();
    for (auto& bucket : buckets) {
        sort(bucket.begin(), bucket.end());
        arr.insert(arr.end(), bucket.begin(), bucket.end());
    }
}

Time Complexity

Best Case Time Complexity: O(n + k)

The best case time complexity occurs when the elements are distributed uniformly across the buckets. Sorting within each bucket is minimal, leading to an overall linear complexity.

Suppose we have the following array to sort:

0.12, 0.25, 0.37, 0.48, 0.59, 0.61, 0.73, 0.85, 0.94

The elements are already uniformly distributed across the range [0, 1]. They are divided into buckets (e.g., 0 – 0.2, 0.2 – 0.4, etc.).

Each bucket has very few elements, and sorting them takes minimal time. Merging the buckets results in the sorted array.

This leads to a time complexity of O(n + k), as the distribution and sorting within buckets are efficient.

Here, n is the number of elements and k is the number of buckets.
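To see the uniform case concretely, here is a small sketch. It assumes the values lie in [0, 1), so the bucket index simplifies to int(value * num_buckets); it then counts how many elements land in each bucket:

```python
# Sketch: bucket occupancy for the uniformly distributed example array.
# Assumes all values lie in [0, 1), so int(num * num_buckets) picks the bucket.
arr = [0.12, 0.25, 0.37, 0.48, 0.59, 0.61, 0.73, 0.85, 0.94]
num_buckets = len(arr)

buckets = [[] for _ in range(num_buckets)]
for num in arr:
    buckets[int(num * num_buckets)].append(num)

occupancy = [len(b) for b in buckets]
print(occupancy)  # no bucket holds more than two elements
```

With at most a couple of elements per bucket, sorting each bucket is effectively constant work, so the total cost is the O(n) distribution pass plus O(k) for visiting the buckets.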

Worst Case Time Complexity: <katex>O(n^{2})</katex>

The worst case time complexity arises when all elements fall into a single bucket.

For example, consider the array:

0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99

Here, all elements fall into a single bucket because their values are concentrated within a narrow range (e.g., 0.9 – 1.0).

The single bucket now contains all n elements, which must be sorted with a comparison sort such as insertion sort.

In this scenario, sorting that one bucket takes <katex>O(n^{2})</katex>, which dominates the overall running time.
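Applying the same distribution rule (again assuming values in [0, 1), so the bucket index is int(value * num_buckets)) to the concentrated array shows the degeneration:

```python
# Sketch: the same bucketing rule applied to the concentrated example array.
# Assumes all values lie in [0, 1), so int(num * num_buckets) picks the bucket.
arr = [0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]
num_buckets = len(arr)

buckets = [[] for _ in range(num_buckets)]
for num in arr:
    buckets[int(num * num_buckets)].append(num)

occupancy = [len(b) for b in buckets]
print(occupancy)  # every element lands in the last bucket
```

All nine values map to the last bucket, so bucket sort gains nothing from the distribution step and inherits the inner sort's quadratic cost.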

Average Case Time Complexity: O(n + k)

On average, elements are distributed fairly evenly among the buckets. Sorting each bucket is efficient, and concatenation adds a negligible overhead.
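A quick sketch of the average case, using uniform random input (values in [0, 1), one bucket per element), shows why the expected load per bucket is constant:

```python
# Sketch: average bucket load under uniform random input.
import random

random.seed(42)  # for reproducibility
n = 1000
arr = [random.random() for _ in range(n)]  # uniform values in [0, 1)

buckets = [[] for _ in range(n)]
for num in arr:
    buckets[int(num * n)].append(num)

loads = [len(b) for b in buckets]
print(sum(loads) / n)  # 1.0 -- one element per bucket on average
```

With an expected load of one element per bucket, the per-bucket sorting cost is constant on average, giving the overall O(n + k) bound.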

Figure: Number of Operations vs Input Size

Space Complexity of Bucket Sort: O(n + k)

Bucket sort requires additional space for the buckets. The total memory used depends on the number of elements (n) and the number of buckets (k).
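The extra memory can be counted directly: k list headers for the buckets, plus n element slots spread across them. A small sketch (bucket index assumes values in [0, 1)):

```python
# Sketch: auxiliary storage is k bucket lists plus n stored elements.
arr = [0.42, 0.31, 0.27, 0.87, 0.55]
k = len(arr)  # here k equals n, but k can be chosen independently

buckets = [[] for _ in range(k)]      # O(k) list headers
for num in arr:
    buckets[int(num * k)].append(num)  # O(n) stored elements in total

total_stored = sum(len(b) for b in buckets)
print(len(buckets), total_stored)  # k lists holding n elements => O(n + k)
```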


Summary: Time and Space Complexity

Best Case Time Complexity: O(n + k)
Worst Case Time Complexity: <katex>O(n^{2})</katex>
Average Case Time Complexity: O(n + k)
Space Complexity: O(n + k)

When to Use Bucket Sort?

Bucket sort is highly efficient when the input is uniformly distributed and the number of buckets is appropriately chosen.

When to Use:

  1. Uniformly Distributed Data: Bucket sort works well for datasets where the elements are evenly spread over a range.
  2. Small Datasets: With fewer elements, the sorting within buckets is fast, making the algorithm efficient.

When to Avoid:

  1. Large and Skewed Datasets: The performance degrades significantly for datasets with many elements concentrated in one part of the range.
  2. Memory Constraints: The algorithm requires extra space for the buckets, which might be a limitation.

Scenario                       Comparisons   Swaps
Uniformly Distributed          Low           Few
Concentrated in One Bucket     High          Many

Hence, bucket sort is more suitable for uniformly distributed data.