Merge sort is a divide - and - conquer algorithm. It divides the unsorted list into n sub - lists, each containing one element (a list of one element is considered sorted). Then it repeatedly merges sub - lists to produce new sorted sub - lists until there is only one sub - list remaining.
def merge_sort(arr):
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left_half = arr[:mid]
right_half = arr[mid:]
left_half = merge_sort(left_half)
right_half = merge_sort(right_half)
return merge(left_half, right_half)
def merge(left, right):
result = []
i = j = 0
while i < len(left) and j < len(right):
if left[i] < right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result.extend(left[i:])
result.extend(right[j:])
return result
arr = [38, 27, 43, 3, 9, 82, 10]
sorted_arr = merge_sort(arr)
print(sorted_arr)
Merge sort is often used when a stable sorting algorithm is required. A stable sort maintains the relative order of elements with equal values.
When dealing with large datasets, using the built - in sorted()
function in Python is more efficient as it is implemented in highly optimized C code.
Quick sort is also a divide - and - conquer algorithm. It selects a ‘pivot’ element from the array and partitions the other elements into two sub - arrays, according to whether they are less than or greater than the pivot. The sub - arrays are then sorted recursively.
def quick_sort(arr):
if len(arr) <= 1:
return arr
else:
pivot = arr[0]
less = [x for x in arr[1:] if x <= pivot]
greater = [x for x in arr[1:] if x > pivot]
return quick_sort(less)+[pivot]+quick_sort(greater)
arr = [38, 27, 43, 3, 9, 82, 10]
sorted_arr = quick_sort(arr)
print(sorted_arr)
Quick sort is widely used in practice because of its average case time complexity of $O(n log n)$.
Randomly selecting the pivot can avoid the worst - case scenario (when the array is already sorted) and improve the algorithm’s performance.
Binary search is an efficient algorithm for finding an item from a sorted list of items. It works by repeatedly dividing in half the portion of the list that could contain the item, until you’ve narrowed down the possible locations to just one.
def binary_search(arr, target):
low = 0
high = len(arr) - 1
while low <= high:
mid = (low + high) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
low = mid + 1
else:
high = mid - 1
return -1
arr = [2, 3, 4, 10, 40]
target = 10
result = binary_search(arr, target)
if result != -1:
print(f"Element found at index {result}")
else:
print("Element not found")
Binary search is used when you need to search for an element in a large sorted dataset, such as in a database index.
Make sure the input array is sorted before applying binary search. Otherwise, the algorithm will not work correctly.
Dijkstra’s algorithm is used to find the shortest path between nodes in a graph with non - negative edge weights. It starts at a given source node and explores the graph, gradually finding the shortest path to all other nodes.
import heapq
def dijkstra(graph, start):
distances = {node: float('inf') for node in graph}
distances[start] = 0
priority_queue = [(0, start)]
while priority_queue:
current_distance, current_node = heapq.heappop(priority_queue)
if current_distance > distances[current_node]:
continue
for neighbor, weight in graph[current_node].items():
distance = current_distance + weight
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(priority_queue, (distance, neighbor))
return distances
graph = {
'A': {'B': 1, 'C': 4},
'B': {'A': 1, 'C': 2, 'D': 5},
'C': {'A': 4, 'B': 2, 'D': 1},
'D': {'B': 5, 'C': 1}
}
start_node = 'A'
shortest_distances = dijkstra(graph, start_node)
print(shortest_distances)
Dijkstra’s algorithm is used in routing protocols, map applications to find the shortest route between two locations.
If the graph has negative edge weights, Dijkstra’s algorithm cannot be used. Instead, the Bellman - Ford algorithm should be used.
K - Means clustering is an unsupervised learning algorithm that partitions n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers are called centroids).
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters = 2, random_state = 0).fit(X)
print(kmeans.labels_)
print(kmeans.predict([[0, 0], [4, 4]]))
print(kmeans.cluster_centers_)
K - Means clustering is used in customer segmentation, image compression, and anomaly detection.
The choice of the number of clusters (k) is crucial. The elbow method can be used to determine the optimal value of k.
Advanced algorithms in Python are powerful tools for solving a wide range of complex problems. Whether it’s sorting large datasets, searching for elements efficiently, finding shortest paths in graphs, or performing clustering in machine learning, Python provides the necessary libraries and the ability to implement these algorithms from scratch. By understanding the fundamental concepts, usage methods, common practices, and best practices of these algorithms, developers and data scientists can make more informed decisions and write more efficient code.