Exploring Time Complexity of Huffman Coding Algorithm
This article is a complementary resource to the DSA with C++ course.
Huffman coding is a widely used data compression algorithm.
In this article, we'll delve into the time complexity of this algorithm and understand its implications.
Huffman coding works by assigning shorter codes to more frequently occurring characters in a text, resulting in a more efficient binary representation of the text. For example, a frequent character such as 'e' might be encoded in 2 bits, while a rare one such as 'z' might receive 7 or more.
This is achieved by constructing a binary tree where the path from the root to each character's leaf node represents its binary code.
```python
class HuffmanTreeNode:
    def __init__(self, data, freq):
        self.data = data
        self.freq = freq
        self.left = None
        self.right = None

    def add_left_child(self, node):
        self.left = node

    def add_right_child(self, node):
        self.right = node


class HuffmanTree:
    def __init__(self):
        self.root = None

    def build_huffman_tree(self, s):
        # Count the frequency of each character in a single pass
        freq_map = {}
        for char in s:
            freq_map[char] = freq_map.get(char, 0) + 1

        huffman_tree = [HuffmanTreeNode(char, freq) for char, freq in freq_map.items()]

        # Build the Huffman Tree
        while len(huffman_tree) > 1:
            # Sort nodes based on their frequencies in ascending order
            huffman_tree = sorted(huffman_tree, key=lambda x: x.freq)

            # Pop the two nodes with the lowest frequencies
            left = huffman_tree.pop(0)
            right = huffman_tree.pop(0)

            # Merge the nodes
            merged_node = HuffmanTreeNode(None, left.freq + right.freq)
            merged_node.add_left_child(left)
            merged_node.add_right_child(right)

            # Add the merged node back to the list
            huffman_tree.append(merged_node)

        self.root = huffman_tree[0]

    def generate_codes(self, node=None, code="", huffman_code_map=None):
        # Avoid a mutable default argument, which would persist across calls
        if huffman_code_map is None:
            huffman_code_map = {}
        if node is None:
            node = self.root

        # If it's a leaf node, store the code
        if node.data is not None:
            huffman_code_map[node.data] = code
            return huffman_code_map

        # Traverse left and right
        self.generate_codes(node.left, code + "0", huffman_code_map)
        self.generate_codes(node.right, code + "1", huffman_code_map)
        return huffman_code_map

    def encode(self, s, huffman_codes):
        return ''.join(huffman_codes[char] for char in s)
```
```cpp
// Member functions of a HuffmanTree<T> class template (T is the character type)
void build_huffman_tree(const string& message) {
    // Count the frequency of each character
    unordered_map<T, int> freq_map;
    for (char c : message) {
        freq_map[c]++;
    }

    vector<HuffmanTreeNode<T>*> huffman_tree;
    for (const auto& pair : freq_map) {
        huffman_tree.push_back(new HuffmanTreeNode<T>(pair.first, pair.second));
    }

    // Build the Huffman Tree
    while (huffman_tree.size() > 1) {
        // Sort nodes based on their frequencies in ascending order
        sort(huffman_tree.begin(), huffman_tree.end(),
             [](HuffmanTreeNode<T>* a, HuffmanTreeNode<T>* b) {
                 return a->freq < b->freq;
             });

        // Remove the two nodes with the lowest frequencies
        HuffmanTreeNode<T>* left = huffman_tree[0];
        HuffmanTreeNode<T>* right = huffman_tree[1];
        huffman_tree.erase(huffman_tree.begin());
        huffman_tree.erase(huffman_tree.begin());

        // Merge the nodes
        int merged_freq = left->freq + right->freq;
        HuffmanTreeNode<T>* merged_node = new HuffmanTreeNode<T>(T(), merged_freq);
        merged_node->add_left_child(left);
        merged_node->add_right_child(right);
        huffman_tree.push_back(merged_node);
    }

    this->root = huffman_tree[0];
}

void generate_codes(HuffmanTreeNode<T>* node, string code, unordered_map<T, string>& huffman_code_map) {
    if (!node) {
        return;
    }

    // If it's a leaf node, store the code
    if (node->data) {
        huffman_code_map[node->data] = code;
        return;
    }

    // Traverse the left node
    generate_codes(node->left, code + "0", huffman_code_map);
    // Traverse the right node
    generate_codes(node->right, code + "1", huffman_code_map);
}

string encode(const string& message, const unordered_map<T, string>& huffman_codes) {
    string encoded_string;
    for (char c : message) {
        encoded_string += huffman_codes.at(c);
    }
    return encoded_string;
}
```
The time complexity of the above program can be divided into the following components:
1. Creating a Frequency Map
The program builds a frequency map of the characters in the input text by iterating over all the characters in it. So, the time complexity is proportional to the number of characters in the text.
Thus, the time complexity of this step is O(n), where n is the number of characters in the input text.
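As a quick illustration (using Python's standard library rather than the snippet above), `collections.Counter` builds the same frequency map in a single pass over the text:

```python
from collections import Counter

# One pass over the text: O(n), where n = len(text)
text = "huffman coding"
freq_map = Counter(text)

print(freq_map["f"])  # 'f' appears twice
```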
2. Creating a Huffman Tree
The program creates a Huffman tree by iterating over the frequency map and creating a node for each character. The time complexity of this step is proportional to the number of keys (which is equal to the number of unique characters in the input message) in the frequency map.
Thus, the time complexity of this step is O(m), where m is the number of unique characters in the input message.
3. Building the Huffman Tree
The program builds the Huffman tree by repeatedly selecting the two nodes with the lowest frequencies, merging them into a new node, and adding the new node back into the vector.
To select the nodes with the lowest frequencies, the program uses a sorting function. This operation takes O(m log m) time, where m is the number of unique characters in the input text.
The program repeats this process until only one node remains, which becomes the root of the Huffman tree. This means that the sorting operation is performed m - 1 times, where m is the number of unique characters in the input text.
Thus, this step takes O(m · m log m) = O(m² log m) time.
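As an aside (not part of the program above), re-sorting the whole collection on every iteration is what drives the cost to O(m² log m). Keeping the nodes in a min-heap instead makes each extraction and insertion O(log m), for O(m log m) overall. A minimal sketch of the heap-based merging, tracking only frequencies:

```python
import heapq

def huffman_merge_cost(freqs):
    """Sum of all merged frequencies, i.e. the total encoded length in bits."""
    heap = list(freqs)
    heapq.heapify(heap)  # O(m)
    total = 0
    while len(heap) > 1:
        # Each pop/push is O(log m); the loop runs m - 1 times
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = left + right
        total += merged
        heapq.heappush(heap, merged)
    return total

# Frequencies 5, 2, 1, 1: merges cost 2, then 4, then 9
print(huffman_merge_cost([5, 2, 1, 1]))  # 15
```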
4. Generating Huffman Codes
The program generates Huffman codes for each character by traversing the Huffman tree. A tree with m leaves has at most 2m - 1 nodes, and each node is visited once, so this step requires O(m) time.
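To see the traversal in isolation, here is a self-contained sketch with a hand-built three-leaf tree (rather than one produced by the code above). Each node is visited once, and a leaf at depth d receives a d-bit code:

```python
class Node:
    def __init__(self, data=None, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def generate_codes(node, code="", code_map=None):
    if code_map is None:
        code_map = {}
    if node.data is not None:  # leaf node: record its code
        code_map[node.data] = code
        return code_map
    generate_codes(node.left, code + "0", code_map)
    generate_codes(node.right, code + "1", code_map)
    return code_map

# Tree: root -> ('a', internal -> ('b', 'c'))
root = Node(left=Node("a"), right=Node(left=Node("b"), right=Node("c")))
print(generate_codes(root))  # {'a': '0', 'b': '10', 'c': '11'}
```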
5. Encoding the Message
The program encodes the input text using the generated Huffman codes. This step involves iterating over all the characters in the input message.
So, this step takes O(n) time, where n is the length of the input text.
The overall time complexity is the sum of all these steps:
O(n) + O(m) + O(m² log m) + O(m) + O(n) = O(n + m² log m)
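To tie the steps together, here is a compact, self-contained variant (a sketch, not the program above: it counts frequencies with `Counter` and replaces the repeated sorting with a heap, so the tree-building step is O(m log m) rather than O(m² log m)). A decode pass confirms the encoding is lossless:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    freq_map = Counter(text)  # O(n)
    tiebreak = count()        # keeps heap entries comparable when frequencies tie
    # Heap entries: (frequency, tiebreak, tree), where a tree is a char or a pair
    heap = [(f, next(tiebreak), ch) for ch, f in freq_map.items()]
    heapq.heapify(heap)
    while len(heap) > 1:      # m - 1 merges, each O(log m)
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):
            walk(tree[0], code + "0")
            walk(tree[1], code + "1")
        else:
            codes[tree] = code or "0"  # single-character edge case
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    # Greedy lookup works because Huffman codes are prefix-free
    inverse = {code: ch for ch, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

text = "BCAADDDCCACACAC"
codes = huffman_codes(text)
encoded = encode(text, codes)
assert decode(encoded, codes) == text
# More frequent characters get codes no longer than rarer ones
assert len(codes["C"]) <= len(codes["B"])
```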