LRU Caching
This article is a complementary resource to the DSA with Python and DSA with C++ courses.
Caching is a technique for storing data temporarily in memory to enable quick retrieval of frequently used data.
LRU (Least Recently Used) is an algorithm employed in caching to manage data stored in the cache.
Before we dive deeper into LRU, let's first explore how caching works and why it's needed.
Computers use fast memory types like RAM or cache memory to store frequently used data. These memories are much faster than storage media such as SSDs (solid-state drives) and hard disks.
However, high-speed memory is expensive and has limited capacity. You might be familiar with RAM sizes like 16 GB or a cache memory of 6 MB, which are far smaller than a typical hard disk.
As these memories fill up, we eventually need to replace existing data with new information.
That's where the LRU algorithm comes in: it provides a simple yet effective method to determine which data to discard when the memory is full.
Let's explore how this works.
In LRU caching, we maintain the order of the data based on its recent usage, so that we always know which data is least recently used.
When the cache is full and we need to add new data to the cache, we follow a two-step process:
1. Remove the least recently used data from the cache.
2. Insert the new data and mark it as the most recently used.
For example, with a cache of capacity 2 and the access sequence 1, 2, 1, inserting 3 evicts 2, because 2 is the least recently used entry at that point.
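Before building our own, it helps to see this eviction behavior in action. Python's standard library ships a ready-made LRU cache as the functools.lru_cache decorator; the short sketch below uses it purely to demonstrate the process described above (the function fetch and the capacity of 2 are arbitrary choices for this example):

from functools import lru_cache

@lru_cache(maxsize=2)  # cache holds at most 2 results
def fetch(key):
    print(f"computing {key}")  # printed only on a cache miss
    return key * 10

fetch(1)  # miss: computes 1
fetch(2)  # miss: computes 2
fetch(1)  # hit: 1 becomes most recently used
fetch(3)  # miss: cache is full, evicts 2 (least recently used)
fetch(1)  # hit: 1 is still cached
fetch(2)  # miss: 2 was evicted, computed again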
The primary requirement, then, is a data structure that can identify the data that is least recently used and maintain the order of the data based on its usage.
A singly linked list could satisfy this requirement because it can track the order of data usage based on each item's position in the list.
Specifically, the data at the front of the list is the most recently used, while the data at the back is the least recently used.
However, an LRU cache requires us to delete the least recently used data, which means deleting the last node in the list.
Unfortunately, deleting the last node of a singly linked list takes linear time, O(n), because we must traverse the entire list to reach the node just before it, which is inefficient for this case.
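To see why, here is a minimal sketch of deleting the last node of a singly linked list (the Node class and delete_last function are our own names for illustration); note the traversal loop that makes the operation O(n):

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def delete_last(head):
    # Empty list or single node: nothing left after deletion
    if head is None or head.next is None:
        return None
    # Walk to the second-to-last node: this loop is what costs O(n)
    current = head
    while current.next.next is not None:
        current = current.next
    current.next = None  # drop the last node
    return head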
This problem can be solved using a doubly linked list (DLL), which maintains a tail pointer to the last node and backward links between nodes, reducing the time complexity of deletion to O(1).
A cache also requires efficient lookup time. Unfortunately, the lookup time for a doubly linked list is typically O(n).
To achieve fast lookups, insertion, and deletion all at once, we need an LRU cache that combines the benefits of different data structures.
We know that a hashmap offers fast lookups, while a doubly linked list enables fast insertion and deletion.
Thus, we can use a mix of a hashmap and a doubly linked list to implement an LRU cache: the hashmap maps each key to its node in the list, and the list keeps the nodes ordered from most to least recently used.
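Conceptually, the two structures reference each other. A minimal Python skeleton of this layout might look like the following (the full implementation comes later in this article; the class and attribute names here are our own):

class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None  # link toward the head (more recently used)
        self.next = None  # link toward the tail (less recently used)

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.hashmap = {}  # key -> Node, for O(1) lookup
        self.head = None   # most recently used node
        self.tail = None   # least recently used node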
The following conditions must be met when accessing a cache entry:
Condition 1: If the key exists in the cache, move its node to the front of the list (marking it as most recently used) and return its value.
Condition 2: If the key doesn't exist, it's a cache miss, so return nothing.
def get(self, key):
    # Return the value for the key if it exists
    if key in self.cache:
        self.cache.move_to_end(key)  # Mark as recently used
        return self.cache[key]
    return None
// Function to access an entry in the cache
std::optional<int> get(int key) {
    // Condition 1: Key exists
    if (hashmap.find(key) != hashmap.end()) {
        // Get the iterator (pointer) to the node
        list<Node>::iterator node = hashmap[key];

        // Make the node the head because
        // it is recently accessed
        dll.splice(dll.begin(), dll, node);

        // Return the value
        return node->value;
    }
    // Condition 2: Key doesn't exist
    return {};
}
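Note that std::list::splice relinks the node to the front of the list without copying or reallocating it, so moving an entry to the most recently used position costs O(1), and existing iterators (including the ones stored in the hashmap) remain valid.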
The following conditions must be met when inserting a new entry into the cache:
Condition 1: If the cache is full, remove the least recently used entry (the tail of the list) before inserting the new one.
Condition 2: If the key already exists, remove its old node first, then insert the updated entry at the front as the most recently used.
// Function to insert (and replace) entries in the cache
void put(int key, int value) {
    // Condition 2: Key already exists
    // Remove the associated node
    if (hashmap.find(key) != hashmap.end()) {
        dll.erase(hashmap[key]);
    }
    // Condition 1: Cache is full
    else if (dll.size() == capacity) {
        // Remove the tail
        // i.e. the least recently used node
        hashmap.erase(dll.back().key);
        dll.pop_back();
    }

    // Create a new node and make it the head
    dll.push_front(Node(key, value));

    // The new node is the head now
    hashmap[key] = dll.begin();
}
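Notice the order of the checks: an update to an existing key first removes the old node, and only a genuinely new key can trigger an eviction. This way, updating a key in a full cache never evicts another entry.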
def put(self, key, value):
    # Insert or update the key-value pair in the cache
    if key in self.cache:
        self.cache.move_to_end(key)
    self.cache[key] = value
    if len(self.cache) > self.capacity:
        self.cache.popitem(last=False)  # Remove least recently used item
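In the Python version, OrderedDict does the heavy lifting: it is backed by a doubly linked list internally, so move_to_end() and popitem(last=False) both run in O(1), mirroring the splice and pop_back operations of the C++ version.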
#include <iostream>
#include <list>
#include <optional>
#include <unordered_map>

using namespace std;

class Node {
public:
    int key;
    int value;
    Node(int k, int v) : key(k), value(v) {}
};

class LRUCache {
private:
    int capacity;

    // C++ list is an implementation of a doubly linked list
    list<Node> dll;

    // C++ unordered_map is an implementation of a hash map
    unordered_map<int, list<Node>::iterator> hashmap;

public:
    LRUCache(int capacity) : capacity(capacity) {}

    std::optional<int> get(int key) {
        // Check if key exists
        if (hashmap.find(key) != hashmap.end()) {
            // Get the iterator (pointer) to the node
            list<Node>::iterator node = hashmap[key];

            // Make the node the head
            // because it is recently accessed
            dll.splice(dll.begin(), dll, node);

            // Return the value
            return node->value;
        }
        return {};
    }

    void put(int key, int value) {
        // Condition: Key already exists
        // Remove the associated node
        if (hashmap.find(key) != hashmap.end()) {
            dll.erase(hashmap[key]);
        }
        // Condition: Cache is full
        else if (dll.size() == capacity) {
            // Remove the tail
            // i.e. the least recently used node
            hashmap.erase(dll.back().key);
            dll.pop_back();
        }

        // Create a new node and make it the head
        dll.push_front(Node(key, value));

        // The new node is the head now
        hashmap[key] = dll.begin();
    }

    void display() {
        cout << "Cache Contents (Most Recently Used -> Least Recently Used):\n";
        cout << "Key\tValue\n";
        for (const auto& node : dll) {
            cout << node.key << "\t" << node.value << '\n';
        }
        cout << "----------------------\n";
    }
};

int main() {
    // Cache with a capacity of 2
    LRUCache cache(2);

    // Add a value to the cache
    cache.put(1, 1);
    cout << "Added (1,1):\n";
    cache.display();

    // Add another value to the cache
    cache.put(2, 2);
    cout << "Added (2,2):\n";
    cache.display();

    // We are accessing 1, so 1 becomes recently used
    cout << "Accessed key 1: " << cache.get(1).value() << endl;
    cache.display();

    // Cache is full, so the tail is evicted
    // i.e., key 2 is removed
    cache.put(3, 3);
    cout << "Added (3,3):\n";
    cache.display();

    // Cache miss: key 2 is no longer in the cache,
    // so check has_value() instead of calling value()
    if (!cache.get(2).has_value()) {
        cout << "Accessed key 2: not found (cache miss)\n";
    }
    cache.display();

    return 0;
}
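Because the code uses std::optional, it needs C++17 or later; with GCC, for example, something like g++ -std=c++17 lru_cache.cpp should build it (the file name here is just a placeholder).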
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        # Initialize the cache with a fixed capacity
        self.capacity = capacity

        # OrderedDict provides a combination of a dictionary and a DLL
        # It stores ordered key-value pairs like a hashmap
        # And like a DLL, it provides constant time complexity
        # for accessing the first and last elements
        self.cache = OrderedDict()

    def get(self, key):
        # Return the value for the key if it exists
        if key in self.cache:
            self.cache.move_to_end(key)  # Mark as recently used
            return self.cache[key]
        return None

    def put(self, key, value):
        # Insert or update the key-value pair in the cache
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # Remove least recently used item

    def __str__(self):
        # Return the cache contents, least recently used first
        return " -> ".join(f"{key}: {value}" for key, value in self.cache.items())

# Driver code
if __name__ == "__main__":
    # Create a cache with a capacity of 2
    cache = LRUCache(2)

    # Add key-value pairs to the cache
    cache.put(1, 1)
    print("Added (1, 1):", cache)

    cache.put(2, 2)
    print("Added (2, 2):", cache)

    # Access key 1 to mark it as recently used
    print("Accessed key 1:", cache.get(1))
    print("Cache after accessing key 1:", cache)

    # Add another key-value pair, evicting the least recently used key
    cache.put(3, 3)
    print("Added (3, 3):", cache)

    # Attempt to access a missing key (key 2)
    print("Accessed key 2 (should be None):", cache.get(2))

    # Display final cache contents
    print("Final cache contents:", cache)
The time complexity of our LRU cache is O(1) for both the get() and put() operations, as they involve only constant-time steps: hashmap lookups (O(1) on average) and node insertion and deletion at known positions in the DLL.
The space complexity is O(capacity), where capacity represents the maximum number of elements the cache can hold, since the cache stores at most that many nodes in the DLL and entries in the hashmap.