Exploring Time and Space Complexities of Bucket Sort
This article is a complementary resource to the DSA with Python and DSA with C++ courses.
This article is a complementary resource to the DSA with Python and DSA with C++ courses.
Bucket sort is an algorithm that
def bucket_sort(arr):
if not arr:
return arr
# Create buckets
num_buckets = len(arr)
max_value = max(arr)
min_value = min(arr)
range_per_bucket = (max_value - min_value) / num_buckets + 1
buckets = [[] for _ in range(num_buckets)]
# Distribute elements into buckets
for num in arr:
index = int((num - min_value) // range_per_bucket)
buckets[index].append(num)
# Sort each bucket and concatenate
sorted_arr = []
for bucket in buckets:
sorted_arr.extend(sorted(bucket))
return sorted_arr
void bucket_sort(vector<float>& arr) {
if (arr.empty())
return;
// Create buckets
size_t num_buckets = arr.size();
float max_value = *max_element(arr.begin(), arr.end());
float min_value = *min_element(arr.begin(), arr.end());
float range_per_bucket = (max_value - min_value) / num_buckets + 1;
vector<vector<float>> buckets(num_buckets);
// Distribute elements into buckets
for (float num : arr) {
size_t index = floor((num - min_value) / range_per_bucket);
buckets[index].push_back(num);
}
// Sort each bucket and concatenate
arr.clear();
for (auto& bucket : buckets) {
sort(bucket.begin(), bucket.end());
arr.insert(arr.end(), bucket.begin(), bucket.end());
}
}
Best Case Time Complexity: O(n + k)
The best case time complexity occurs when the elements are distributed uniformly across the buckets. Sorting within each bucket is minimal, leading to an overall linear complexity.
Suppose we have the following array to sort:
0.12, 0.25, 0.37, 0.48, 0.59, 0.61, 0.73, 0.85, 0.94
The elements are already uniformly distributed across the range [0, 1]
. They are divided into buckets (e.g., 0 – 0.2
, 0.2 – 0.4
, etc.).
Each bucket has very few elements, and sorting them takes minimal time. Merging the buckets results in the sorted array.
This leads to a time complexity of O(n + k)
, as the distribution and sorting within buckets are efficient.
Here, n
is the number of elements and k
is the number of buckets.
Worst Case Time Complexity: <katex>O(n^{2})</katex>
The worst case time complexity arises when all elements fall into a single bucket. For example,
Consider the array:
0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99
Here, all elements fall into a single bucket because their values are concentrated within a narrow range (e.g., 0.9 – 1.0
).
The single bucket contains all the elements, which must be sorted using a sorting algorithm like insertion sort.
In this scenario, sorting the bucket using an algorithm like insertion sort takes <katex>O(n^{2})</katex>
.
Average Case Time Complexity: O(n + k)
On average, elements are distributed fairly evenly among the buckets. Sorting each bucket is efficient, and concatenation adds a negligible overhead.
Bucket sort requires additional space for the buckets. The total memory used depends on the number of elements (n
) and the number of buckets (k
).
Best Case Time Complexity | O(n + k) |
Worst Case Time Complexity | <katex>O(n^{2})</katex> |
Average Case Time Complexity | O(n + k) |
Space Complexity | O(n + k) |
Bucket sort is highly efficient when the input is uniformly distributed and the number of buckets is appropriately chosen.
When to Use:
When to Avoid:
Scenario | Comparisons | Swaps |
---|---|---|
Uniformly Distributed | Low | Few |
Concentrated in Buckets | High | Many |
Hence, bucket sort is more suitable for uniformly distributed data.