LRU Caching

This article is a complementary resource to the DSA with Python and DSA with C++ courses.

Caching is a technique for storing data temporarily in memory to enable quick retrieval of frequently used data.

LRU (Least Recently Used) is an algorithm employed in caching to manage data stored in the cache.

Before we dive deeper into LRU, let's first explore how caching works and why it's needed.


Why Caching?

Computers use fast memory types like RAM or cache memory to store frequently used data. These memories are much faster than other storage media such as SSDs (solid-state drives).

However, these high-speed memories are expensive and have limited capacity. You might be familiar with RAM sizes like 16 GB or a CPU cache of 6 MB, both far smaller than a typical hard disk.

As data accumulates in these memories, they eventually fill up, and we must replace existing data to make room for new information.

That's where the LRU algorithm comes in—it provides a simple yet effective method to determine which data to discard when the memory is full.

Let's explore how this works.


How Does an LRU Cache Work?

In LRU caching, we keep the data ordered by how recently it was used, so we always know which entry is the least recently used.

When the cache is full and we need to add new data, we follow a two-step process (a code sketch follows the steps):

  1. Remove the least recently used data from the cache.
  2. Insert the new data into the cache.
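
To make the two steps concrete, here is a minimal sketch in Python using collections.OrderedDict, which keeps its keys in insertion order (the full Python implementation later in this article is built on the same class). The put helper and the capacity of 2 are illustrative choices for this sketch, not a standard API:

from collections import OrderedDict

capacity = 2
cache = OrderedDict()  # keys ordered from least to most recently used

def put(key, value):
    if key not in cache and len(cache) >= capacity:
        cache.popitem(last=False)  # Step 1: evict the least recently used entry
    cache[key] = value             # Step 2: insert the new entry
    cache.move_to_end(key)         # the new entry is now the most recently used

put("a", 1)
put("b", 2)
put("c", 3)         # cache is full, so "a" is evicted
print(list(cache))  # ['b', 'c']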

Choosing the Right Data Structure

In LRU caching, the primary requirement is to identify the data that is least recently used and to maintain the order of the data based on its usage.

A singly linked list might seem to satisfy this requirement because it can track the order of data usage based on each item's position in the list.

Specifically, the data at the front of the list is the most recently used, while the data at the back is the least recently used.

However, an LRU cache requires us to delete the least recently used data, which means deleting the last node in the list.

Unfortunately, deleting the last node of a singly linked list takes linear time, O(n), because we must traverse from the head to find the second-to-last node. This is inefficient for a cache.

This problem can be solved using a doubly linked list (DLL), which maintains a tail pointer to the last node and a prev pointer in every node, reducing the deletion of the last node to O(1).

A doubly linked list structure
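
To see why the prev pointer matters, here is a minimal sketch using a hypothetical Node class and remove_tail helper. In a singly linked list, we would have to walk from the head just to find the second-to-last node; with a prev pointer, detaching the tail takes constant time:

class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None  # pointer to the previous node
        self.next = None  # pointer to the next node

def remove_tail(tail):
    # With a prev pointer, detaching the last node is O(1):
    # no traversal from the head is needed
    new_tail = tail.prev
    if new_tail:
        new_tail.next = None
    return new_tail

# Build a three-node list: a <-> b <-> c
a, b, c = Node(1), Node(2), Node(3)
a.next, b.prev = b, a
b.next, c.prev = c, b

tail = remove_tail(c)
print(tail.value)  # 2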

Optimizing Lookup Time

A cache also requires efficient lookups. Unfortunately, looking up a key in a doubly linked list takes O(n) time, since we may have to traverse the entire list to find it.

To achieve fast lookups as well as fast insertion and deletion, we can combine the strengths of two data structures.

We know that a hashmap offers fast lookups, while a doubly linked list enables fast deletion and insertion.

Thus, we can use a mix of a hashmap and a doubly linked list to implement an LRU cache.

  • The hashmap stores cache keys and their corresponding values, which are pointers to nodes in the DLL.
  • The DLL stores the cache entries in the order of usage, where the most recently used node is at the head and the least recently used node is at the tail.

Using a hashmap and a doubly linked list to implement an LRU cache
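
Before diving into the operations, here is a rough sketch of how the two structures point at each other. The names (DLLNode, hashmap, head, tail) are illustrative only; the Python implementation below hides this pairing inside OrderedDict, while the C++ version makes it explicit with std::list and std::unordered_map:

class DLLNode:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

hashmap = {}        # key -> DLLNode: O(1) lookup into the list
head = tail = None  # head = most recently used, tail = least recently used

# Insert one entry: the hashmap points at the node,
# and the node lives in the usage-ordered list
node = DLLNode(1, "one")
hashmap[1] = node
head = tail = node  # a single entry is both head and tail

print(hashmap[1].value)  # "one": found via the hashmap, stored in the node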

Cache Operations

Accessing Cache Entries

There are two conditions to handle when accessing a cache entry:

Condition 1:

  • If the key exists, move the corresponding node to the head of the DLL, marking it as the most recently used.

Condition 2:

  • If the key doesn't exist, return a miss (e.g., null or an error).

Accessing cache entries
def get(self, key):
    # Return the value for the key if it exists
    if key in self.cache:
        self.cache.move_to_end(key)  # Mark as recently used
        return self.cache[key]
    return None
// Function to access an entry in the cache
std::optional<int> get(int key) {

    // Condition 1: Key exists
    if (hashmap.find(key) != hashmap.end()) {
        // Get the iterator (pointer) to the node
        list<Node>::iterator node = hashmap[key];

        // Make node the head because
        // it is recently accessed
        dll.splice(dll.begin(), dll, node);

        // Return the value
        return node->value;
    }

    // Condition 2: Key doesn't exist
    return {};
}

Inserting (and Replacing) Cache Entries

There are two conditions to handle when inserting a new entry into the cache:

Condition 1:

  • If the cache is full, remove the least recently used node (tail) and add the new node to the head of the DLL.

Condition 2:

  • If the key already exists, update the value and move the node to the head of the DLL.

Inserting (and Replacing) cache entries.
// Function to insert (and replace) entries in the cache
void put(int key, int value) {

    // Condition 2: Key already exists
    // Remove the associated node      
    if (hashmap.find(key) != hashmap.end()) {
        dll.erase(hashmap[key]);
    }

    // Condition 1: Cache is full
    else if (dll.size() == capacity) {
        // Remove the tail
        // i.e. least recently-used node
        hashmap.erase(dll.back().key);
        dll.pop_back();
    }

    // Create a new node and make it the head
    dll.push_front(Node(key, value));

    // The new node is the head now
    hashmap[key] = dll.begin();
}
def put(self, key, value):
    # Insert or update the key-value pair in the cache
    if key in self.cache:
        self.cache.move_to_end(key)
    self.cache[key] = value
    if len(self.cache) > self.capacity:
        self.cache.popitem(last=False)  # Remove least recently used item

Source Code

#include <iostream>
#include <list>
#include <optional>
#include <unordered_map>

using namespace std;

class Node {
public:
    int key;
    int value;
    Node(int k, int v) : key(k), value(v) {}
};

class LRUCache {
private:
    int capacity;

    // C++ list is an implementation of a doubly linked list
    list<Node> dll;

    // C++ unordered_map is an implementation of a hash map
    unordered_map<int, list<Node>::iterator> hashmap;

public:
    LRUCache(int capacity) : capacity(capacity) {}

    std::optional<int> get(int key) {
        // Check if key exists
        if (hashmap.find(key) != hashmap.end()) {
            // Get the iterator (pointer) to the node
            list<Node>::iterator node = hashmap[key];

            // Make the node the head
            // because it is recently accessed
            dll.splice(dll.begin(), dll, node);

            // Return the value
            return node->value;
        }

        return {};
    }

    void put(int key, int value) {
        // Condition: Key already exists:
        // Remove the associated node
        if (hashmap.find(key) != hashmap.end()) {
            dll.erase(hashmap[key]);
        }

        // Condition: Cache is full
        else if (dll.size() == capacity) {
            // Remove the tail
            // i.e. least recently-used node
            hashmap.erase(dll.back().key);
            dll.pop_back();
        }

        // Create a new node and make it the head
        dll.push_front(Node(key, value));

        // The new node is the head now.
        hashmap[key] = dll.begin();
    }

    void display() {
        cout << "Cache Contents (Most Recently Used -> Least Recently Used):\n";
        cout << "Key\tValue\n";
        for (const auto& node : dll) {
            cout << node.key << "\t" << node.value << '\n';
        }
        cout << "----------------------\n";
    }
};

int main() {

    // Cache with a capacity of 2.
    LRUCache cache(2);

    // Add a value to cache
    cache.put(1, 1);
    cout << "Added (1,1):\n";
    cache.display();

    // Add another value to cache
    cache.put(2, 2);
    cout << "Added (2,2):\n";
    cache.display();

    // We are accessing 1, so 1 becomes recently used
    cout << "Accessed key 1: " << cache.get(1).value() << endl;
    cache.display();

    // Cache is full, so tail is replaced
    // i.e., removes value 2
    cache.put(3, 3);
    cout << "Added (3,3):\n";
    cache.display();

    // Cache miss: key 2 was evicted
    if (!cache.get(2).has_value()) {
        cout << "Key 2 not found (cache miss)" << endl;
    }
    cache.display();

    return 0;
}
from collections import OrderedDict

class LRUCache:

    def __init__(self, capacity):

        # Initialize the cache with a fixed capacity.
        self.capacity = capacity


        # OrderedDict combines a hashmap and a DLL:
        # it stores ordered key-value pairs like a hashmap and,
        # like a DLL, gives constant-time access
        # to the first and last elements

        self.cache = OrderedDict()

    def get(self, key):

        # Return the value for the key if it exists
        if key in self.cache:
            self.cache.move_to_end(key)  # Mark as recently used
            return self.cache[key]

        return None

    def put(self, key, value):

        # Insert or update the key-value pair in the cache
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value

        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # Remove least recently used item

    def __str__(self):

        # Return a string representation of the cache contents
        return " -> ".join(f"{key}: {value}" for key, value in self.cache.items())


# Driver code

if __name__ == "__main__":

    # Create a cache with a capacity of 2
    cache = LRUCache(2)

    # Add key-value pairs to the cache
    cache.put(1, 1)
    print("Added (1, 1):", cache)

    cache.put(2, 2)
    print("Added (2, 2):", cache)

    # Access key 1 to mark it as recently used
    print("Accessed key 1:", cache.get(1))
    print("Cache after accessing key 1:", cache)

    # Add another key-value pair, removing the least recently used key
    cache.put(3, 3)
    print("Added (3, 3)", cache)

    # Attempt to access a missing key (key 2)
    print("Accessed key 2 (should be None):", cache.get(2))

    # Display final cache contents
    print("Final cache contents:", cache)

Complexity Analysis

Time Complexity

The time complexity of our LRU cache is O(1) on average for both get() and put(), since each operation performs a constant number of steps: a hashmap lookup plus node insertion or deletion at a known position in the DLL.

Space Complexity

The space complexity is O(capacity), where capacity is the maximum number of elements the cache can hold, since both the DLL and the hashmap store at most capacity entries.