Exploring Time Complexity of Huffman Coding Algorithm
This article is a complementary resource to the DSA with C++ course.
Huffman coding is a widely used data compression algorithm.
In this article, we'll delve into the time complexity of this algorithm and understand its implications.
Huffman coding works by assigning shorter codes to more frequently occurring characters in a text, resulting in a more efficient binary representation of the text. For example, a frequent character such as 'e' might be encoded in 2 bits, while a rare one such as 'z' might receive 7 or more.
This is achieved by constructing a binary tree where the path from the root to each character's leaf node represents its binary code.
```python
class HuffmanTreeNode:
    def __init__(self, data, freq):
        self.data = data
        self.freq = freq
        self.left = None
        self.right = None

    def add_left_child(self, node):
        self.left = node

    def add_right_child(self, node):
        self.right = node


class HuffmanTree:
    def __init__(self):
        self.root = None

    def build_huffman_tree(self, s):
        # Count the frequency of each character in a single pass
        freq_map = {}
        for char in s:
            freq_map[char] = freq_map.get(char, 0) + 1

        huffman_tree = [HuffmanTreeNode(char, freq) for char, freq in freq_map.items()]

        # Build the Huffman Tree
        while len(huffman_tree) > 1:
            # Sort nodes based on their frequencies in ascending order
            huffman_tree = sorted(huffman_tree, key=lambda x: x.freq)

            # Pop the two nodes with the lowest frequencies
            left = huffman_tree.pop(0)
            right = huffman_tree.pop(0)

            # Merge the nodes
            merged_node = HuffmanTreeNode(None, left.freq + right.freq)
            merged_node.add_left_child(left)
            merged_node.add_right_child(right)

            # Add the merged node back to the list
            huffman_tree.append(merged_node)

        self.root = huffman_tree[0]

    def generate_codes(self, node=None, code="", huffman_code_map=None):
        # Avoid a mutable default argument, which would persist across calls
        if huffman_code_map is None:
            huffman_code_map = {}
        if node is None:
            node = self.root

        # If it's a leaf node, store the code
        if node.data is not None:
            huffman_code_map[node.data] = code
            return huffman_code_map

        # Traverse left and right
        self.generate_codes(node.left, code + "0", huffman_code_map)
        self.generate_codes(node.right, code + "1", huffman_code_map)
        return huffman_code_map

    def encode(self, s, huffman_codes):
        return ''.join(huffman_codes[char] for char in s)
```
```cpp
// Member functions of a HuffmanTree<T> class template (T is the character type)
void build_huffman_tree(const string& message) {
    // Count the frequency of each character
    unordered_map<T, int> freq_map;
    for (char c : message) {
        freq_map[c]++;
    }

    vector<HuffmanTreeNode<T>*> huffman_tree;
    for (const auto& pair : freq_map) {
        huffman_tree.push_back(new HuffmanTreeNode<T>(pair.first, pair.second));
    }

    // Build the Huffman Tree
    while (huffman_tree.size() > 1) {
        // Sort nodes based on their frequencies in ascending order
        sort(huffman_tree.begin(), huffman_tree.end(),
             [](HuffmanTreeNode<T>* a, HuffmanTreeNode<T>* b) {
                 return a->freq < b->freq;
             });

        // Remove the two nodes with the lowest frequencies
        HuffmanTreeNode<T>* left = huffman_tree[0];
        HuffmanTreeNode<T>* right = huffman_tree[1];
        huffman_tree.erase(huffman_tree.begin());
        huffman_tree.erase(huffman_tree.begin());

        // Merge the nodes
        int merged_freq = left->freq + right->freq;
        HuffmanTreeNode<T>* merged_node = new HuffmanTreeNode<T>(T(), merged_freq);
        merged_node->add_left_child(left);
        merged_node->add_right_child(right);
        huffman_tree.push_back(merged_node);
    }

    this->root = huffman_tree[0];
}

void generate_codes(HuffmanTreeNode<T>* node, string code, unordered_map<T, string>& huffman_code_map) {
    if (!node) {
        return;
    }

    // If it's a leaf node, store the code
    if (node->data) {
        huffman_code_map[node->data] = code;
        return;
    }

    // Traverse the left node
    generate_codes(node->left, code + "0", huffman_code_map);
    // Traverse the right node
    generate_codes(node->right, code + "1", huffman_code_map);
}

string encode(const string& message, const unordered_map<T, string>& huffman_codes) {
    string encoded_string;
    for (char c : message) {
        encoded_string += huffman_codes.at(c);
    }
    return encoded_string;
}
```
The time complexity of the above program can be divided into the following components:
1. Creating a Frequency Map
The program builds a frequency map of the characters in the input text by iterating over all the characters in it. So, the time complexity is proportional to the number of characters in the text.
Thus, the time complexity of this step is O(n), where n is the number of characters in the input text.
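As a quick illustration (using Python's standard library rather than the snippet above), `collections.Counter` builds the same frequency map in a single pass over the text:

```python
from collections import Counter

# One pass over the text: O(n), where n = len(text)
text = "huffman coding"
freq_map = Counter(text)

print(freq_map["f"])  # 'f' appears twice
```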
2. Creating a Huffman Tree
The program creates a Huffman tree by iterating over the frequency map and creating a node for each character. The time complexity of this step is proportional to the number of keys (which is equal to the number of unique characters in the input message) in the frequency map.
Thus, the time complexity of this step is O(m), where m is the number of unique characters in the input message.
3. Building the Huffman Tree
The program builds the Huffman tree by repeatedly selecting the two nodes with the lowest frequencies, merging them into a new node, and adding the new node back into the vector.
To select the nodes with the lowest frequencies, the program uses a sorting function. This operation takes O(m log m) time, where m is the number of unique characters in the input text.
The program repeats this process until only one node remains, which becomes the root of the Huffman tree. This means that the sorting operation is performed m - 1 times, where m is the number of unique characters in the input text.
Thus, this step takes O(m · m log m) = O(m² log m) time.
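As an aside (not part of the program above), re-sorting the whole collection on every iteration is what drives the cost to O(m² log m). Keeping the nodes in a min-heap instead makes each extraction and insertion O(log m), for O(m log m) overall. A minimal sketch of the heap-based merging, tracking only frequencies:

```python
import heapq

def huffman_merge_cost(freqs):
    """Sum of all merged frequencies, i.e. the total encoded length in bits."""
    heap = list(freqs)
    heapq.heapify(heap)  # O(m)
    total = 0
    while len(heap) > 1:
        # Each pop/push is O(log m); the loop runs m - 1 times
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = left + right
        total += merged
        heapq.heappush(heap, merged)
    return total

# Frequencies 5, 2, 1, 1: merges cost 2, then 4, then 9
print(huffman_merge_cost([5, 2, 1, 1]))  # 15
```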
4. Generating Huffman Codes
The program generates Huffman codes for each character by traversing the Huffman tree. A tree with m leaves has at most 2m - 1 nodes, and each node is visited once, so this step requires O(m) time.
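To see the traversal in isolation, here is a self-contained sketch with a hand-built three-leaf tree (rather than one produced by the code above). Each node is visited once, and a leaf at depth d receives a d-bit code:

```python
class Node:
    def __init__(self, data=None, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def generate_codes(node, code="", code_map=None):
    if code_map is None:
        code_map = {}
    if node.data is not None:  # leaf node: record its code
        code_map[node.data] = code
        return code_map
    generate_codes(node.left, code + "0", code_map)
    generate_codes(node.right, code + "1", code_map)
    return code_map

# Tree: root -> ('a', internal -> ('b', 'c'))
root = Node(left=Node("a"), right=Node(left=Node("b"), right=Node("c")))
print(generate_codes(root))  # {'a': '0', 'b': '10', 'c': '11'}
```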
5. Encoding the Message
The program encodes the input text using the generated Huffman codes. This step involves iterating over all the characters in the input message.
So, this step takes O(n) time, where n is the length of the input text.
The overall time complexity is the sum of all these steps:
O(n) + O(m) + O(m² log m) + O(m) + O(n) = O(n + m² log m)
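To tie the steps together, here is a compact, self-contained variant (a sketch, not the program above: it counts frequencies with `Counter` and replaces the repeated sorting with a heap, so the tree-building step is O(m log m) rather than O(m² log m)). A decode pass confirms the encoding is lossless:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    freq_map = Counter(text)  # O(n)
    tiebreak = count()        # keeps heap entries comparable when frequencies tie
    # Heap entries: (frequency, tiebreak, tree), where a tree is a char or a pair
    heap = [(f, next(tiebreak), ch) for ch, f in freq_map.items()]
    heapq.heapify(heap)
    while len(heap) > 1:      # m - 1 merges, each O(log m)
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):
            walk(tree[0], code + "0")
            walk(tree[1], code + "1")
        else:
            codes[tree] = code or "0"  # single-character edge case
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    # Greedy lookup works because Huffman codes are prefix-free
    inverse = {code: ch for ch, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

text = "BCAADDDCCACACAC"
codes = huffman_codes(text)
encoded = encode(text, codes)
assert decode(encoded, codes) == text
# More frequent characters get codes no longer than rarer ones
assert len(codes["C"]) <= len(codes["B"])
```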