The collections
module in Python provides several specialized container datatypes. Some of the important ones include:
The heapq
module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. A heap is a binary tree - based data structure where each node is smaller (or larger) than its children. The smallest (or largest) element can be accessed in constant time, and insertion and deletion operations take logarithmic time.
The bisect
module provides support for maintaining a list in sorted order without having to sort the list after each insertion. It uses the binary search algorithm to find the insertion point in a sorted list.
from collections import Counter, deque, OrderedDict, defaultdict
# Counter example
words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']
word_count = Counter(words)
print(word_count)
# deque example
d = deque([1, 2, 3])
d.appendleft(0)
d.append(4)
print(d)
# OrderedDict example
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
print(od)
# defaultdict example
def default_value():
return 'default'
dd = defaultdict(default_value)
dd['key1'] = 'value1'
print(dd['key1'])
print(dd['key2'])
import heapq
# Creating a heap
nums = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
heapq.heapify(nums)
print(nums)
# Pushing and popping elements
heapq.heappush(nums, 0)
smallest = heapq.heappop(nums)
print(smallest)
import bisect
sorted_list = [1, 3, 5, 7, 9]
# Find the insertion point
insertion_point = bisect.bisect(sorted_list, 4)
print(insertion_point)
# Insert an element in sorted order
bisect.insort(sorted_list, 4)
print(sorted_list)
The Counter
class is very useful when you need to count the occurrences of elements in a list or other iterable. For example, in text processing, you can use it to count the frequency of words in a document.
from collections import Counter
text = "This is a sample text. This text is for demonstration purposes."
words = text.split()
word_counter = Counter(words)
print(word_counter.most_common(3))
Priority queues are useful in many algorithms, such as Dijkstra’s shortest - path algorithm. You can use the heapq
module to implement a priority queue.
import heapq
priority_queue = []
heapq.heappush(priority_queue, (1, 'Task 1'))
heapq.heappush(priority_queue, (3, 'Task 3'))
heapq.heappush(priority_queue, (2, 'Task 2'))
while priority_queue:
priority, task = heapq.heappop(priority_queue)
print(f"Processing {task} with priority {priority}")
The bisect
module can be used to perform binary search in a sorted list. For example, if you want to find if an element exists in a sorted list, you can use the bisect
functions.
import bisect
sorted_list = [1, 3, 5, 7, 9]
index = bisect.bisect_left(sorted_list, 5)
if index < len(sorted_list) and sorted_list[index] == 5:
print("Element found")
else:
print("Element not found")
Counter
, keep in mind that it stores all unique elements and their counts. If the number of unique elements is very large, it can consume a significant amount of memory. For deque
, it has a relatively small constant overhead compared to lists for operations at both ends.bisect
to maintain a sorted list is more efficient than sorting the list after each insertion, especially for large lists.word_count
instead of wc
when using a Counter
for word counting.Python’s advanced data structures in the collections
, heapq
, and bisect
modules are powerful tools that can greatly enhance the efficiency and readability of your code. By understanding their fundamental concepts, usage methods, common practices, and best practices, you can leverage these “hidden gems” to solve a wide range of problems more effectively. Whether you are dealing with data processing, algorithm implementation, or performance optimization, these data structures can be valuable additions to your Python toolkit.