At the core of NumPy is the ndarray (n-dimensional array) object. It is a homogeneous data structure, meaning every element in the array has the same data type (e.g., integers or floating-point numbers). Because each element has the same fixed size, NumPy can store and manipulate the data far more efficiently than native Python lists, which hold references to individually typed objects.
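To make the homogeneity point concrete, here is a minimal sketch. The variable names are illustrative only, and the printed dtype assumes NumPy's default upcasting of mixed int/float input.

```python
import numpy as np

# Mixed int and float input is upcast to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A Python list, by contrast, can hold unrelated types side by side.
py_list = [1, 2.5, "three"]
print([type(x).__name__ for x in py_list])  # ['int', 'float', 'str']

# Every element of an ndarray occupies the same number of bytes.
ints = np.array([1, 2, 3], dtype=np.int32)
print(ints.itemsize, ints.nbytes)  # 4 12
```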
Newly created NumPy arrays are stored in a single contiguous block of memory, so their elements sit adjacent to each other. (Views such as column slices or transposes may not be contiguous, although they still refer to the same underlying buffer.) This layout enables fast element access and efficient memory usage, which is crucial for high-performance computing.
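The memory layout can be inspected through an array's `flags` and `strides` attributes. The following is a small sketch with illustrative names; the stride values assume an explicit 8-byte `int64` dtype.

```python
import numpy as np

# A freshly created 3x4 array is laid out row by row (C order).
arr = np.arange(12, dtype=np.int64).reshape(3, 4)
print(arr.flags['C_CONTIGUOUS'])  # True
print(arr.strides)                # (32, 8): bytes to step per row, per column

# A column slice is a view that skips over elements, so it is not contiguous.
col = arr[:, 1]
print(col.flags['C_CONTIGUOUS'])  # False
print(col.strides)                # (32,)

# np.ascontiguousarray makes a packed copy when one is needed.
col_copy = np.ascontiguousarray(col)
print(col_copy.flags['C_CONTIGUOUS'])  # True
```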
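Broadcasting is a powerful mechanism in NumPy that allows arrays of different but compatible shapes to be used together in arithmetic operations. Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. Broadcasting eliminates the need for explicit loops over array elements, making the code more concise and faster.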
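As a brief illustration of broadcasting, the sketch below combines a 2-D array with a row vector and a column vector. The array names are chosen for this example only.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)
col = np.array([[100], [200]])             # shape (2, 1)

# The row is stretched across both rows of the matrix.
print(matrix + row)  # [[11 22 33] [14 25 36]]

# The column is stretched across all three columns.
print(matrix + col)  # [[101 102 103] [204 205 206]]

# Shapes (2, 3) and (2,) are incompatible: trailing dimensions 3 and 2 differ.
try:
    matrix + np.array([1, 2])
except ValueError as exc:
    print("Broadcast error:", exc)
```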
import numpy as np
# Create a 1-D array from a Python list
arr1 = np.array([1, 2, 3, 4, 5])
print("1-D array:", arr1)
# Create a 2-D array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("2-D array:\n", arr2)
# Create an array of zeros
zeros_arr = np.zeros((3, 3))
print("Array of zeros:\n", zeros_arr)
# Create an array of ones
ones_arr = np.ones((2, 4))
print("Array of ones:\n", ones_arr)
# Create an array with a range of values
range_arr = np.arange(0, 10, 2)
print("Array with a range of values:", range_arr)
# Addition of two arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b
print("Array addition:", c)
# Multiplication of an array by a scalar
d = 2 * a
print("Array multiplied by a scalar:", d)
# Dot product of two arrays
dot_product = np.dot(a, b)
print("Dot product:", dot_product)
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Access a single element
element = arr[1, 2]
print("Single element:", element)
# Slice a row
row_slice = arr[1, :]
print("Row slice:", row_slice)
# Slice a column
col_slice = arr[:, 2]
print("Column slice:", col_slice)
Vectorization is the process of performing operations on entire arrays at once, rather than iterating over individual elements in Python. This approach is typically much faster because the per-element loop runs inside NumPy’s compiled C implementation instead of the Python interpreter.
import time
# Using a Python-level loop (list comprehensions)
start_time = time.perf_counter()  # perf_counter is intended for timing short intervals
a = [i for i in range(1000000)]
b = [i * 2 for i in a]
end_time = time.perf_counter()
print("Time taken using loop:", end_time - start_time)
# Using NumPy vectorization
start_time = time.perf_counter()
a_np = np.arange(1000000)
b_np = a_np * 2
end_time = time.perf_counter()
print("Time taken using NumPy vectorization:", end_time - start_time)
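A single wall-clock measurement can be noisy. As a rough sketch, the same comparison can be repeated with the standard-library `timeit` module, taking the best of several runs. The helper names `double_with_loop` and `double_with_numpy` are made up for this example, and actual timings will vary by machine.

```python
import timeit

import numpy as np

n = 1_000_000
values = list(range(n))
values_np = np.arange(n)

def double_with_loop():
    # Element-by-element work in the Python interpreter.
    return [v * 2 for v in values]

def double_with_numpy():
    # One vectorized operation over the whole array.
    return values_np * 2

# Run each function once per repetition and keep the fastest run.
loop_time = min(timeit.repeat(double_with_loop, number=1, repeat=3))
numpy_time = min(timeit.repeat(double_with_numpy, number=1, repeat=3))
print(f"Loop:  {loop_time:.4f} s")
print(f"NumPy: {numpy_time:.4f} s")
```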
NumPy provides a variety of aggregation functions, such as sum, mean, min, and max, that can be applied to a whole array or along a chosen axis.
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Sum of all elements
total_sum = np.sum(arr)
print("Total sum:", total_sum)
# Mean of each column
col_mean = np.mean(arr, axis=0)
print("Mean of each column:", col_mean)
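The other aggregations follow the same pattern. A short sketch, reusing the same 2x3 array, shows how the `axis` argument changes the result (`axis=0` collapses rows, `axis=1` collapses columns):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Without an axis, the aggregation covers every element.
print(np.min(arr))  # 1

# axis=1 aggregates across the columns of each row.
print(np.max(arr, axis=1))  # [3 6]
print(arr.mean(axis=1))     # [2. 5.]

# axis=0 aggregates down the rows of each column.
print(np.sum(arr, axis=0))  # [5 7 9]

# keepdims=True keeps the reduced axis with length 1, which is handy for broadcasting.
row_totals = np.sum(arr, axis=1, keepdims=True)
print(row_totals.shape)  # (2, 1)
```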
When working with large NumPy arrays, it’s important to manage memory efficiently. Avoid creating unnecessary copies of arrays, and prefer in-place operations where they do not make the code harder to follow.
arr = np.array([1, 2, 3])
# In-place addition
arr += 1
print("Array after in-place addition:", arr)
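To check whether an operation copied data or reused an existing buffer, `np.shares_memory` can help. The sketch below also uses the `out=` argument of a ufunc to write a result into an existing array. The names are illustrative.

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])

# Basic slicing returns a view on the same buffer, not a copy.
view = arr[1:]
print(np.shares_memory(arr, view))  # True

# An explicit .copy() allocates new memory.
copied = arr[1:].copy()
print(np.shares_memory(arr, copied))  # False

# out= writes the result into an existing array instead of allocating a new one.
np.multiply(arr, 2, out=arr)
print(arr)   # [2. 4. 6.]
print(view)  # [4. 6.] (the view sees the change)

# In-place operations cannot change the dtype: adding a float to an
# integer array in place is rejected rather than silently truncated.
ints = np.array([1, 2, 3])
try:
    ints += 0.5
except TypeError as exc:
    print("In-place cast refused:", exc)
```

One caution with this design: because views share memory with their source, an in-place update changes every view of the array, so make an explicit copy when the original values must be preserved.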
Choose the appropriate data type for your arrays based on the range of values you need to store. For example, if you only need to store integers between 0 and 255, use the np.uint8 data type.
small_ints = np.array([10, 20, 30], dtype=np.uint8)
print("Array with appropriate data type:", small_ints)
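The memory saving, and the main risk of a narrow dtype, can be seen in a small sketch. The array names are illustrative; the key point is that integer arithmetic on arrays wraps around silently when a result exceeds the dtype's range.

```python
import numpy as np

# The same number of elements takes 8x less memory as uint8 than as int64.
big = np.zeros(1000, dtype=np.int64)
small = np.zeros(1000, dtype=np.uint8)
print(big.nbytes, small.nbytes)  # 8000 1000

# np.iinfo reports the representable range of an integer dtype.
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255

# Results outside that range wrap around modulo 256 without an error.
a = np.array([250, 255], dtype=np.uint8)
b = np.array([10, 1], dtype=np.uint8)
print(a + b)  # [4 0]
```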
NumPy is a powerful library for fast array computation in Python. Its efficient memory layout, broadcasting capabilities, and support for vectorized operations make it an essential tool for numerical and scientific computing. By understanding the fundamental concepts, usage methods, common practices, and best practices outlined in this post, readers can use NumPy effectively to write high-performance code.