NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. Arrays in NumPy are homogeneous, meaning all elements must be of the same data type.
import numpy as np
# Create a 1-D array
arr1 = np.array([1, 2, 3, 4, 5])
# Create a 2-D array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
# Perform element-wise operations
result = arr1 + 2
print(result)
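The element-wise addition above works because NumPy broadcasts the scalar `2` across every element. The same mechanism applies between arrays of compatible shapes. The sketch below, using the `arr2` array from above, centers each column by subtracting its mean and shows how the `axis` argument controls which dimension a reduction collapses.

```python
import numpy as np

# The same 2x3 array as arr2 above
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, leaving one mean per column: [2.5, 3.5, 4.5]
col_means = arr2.mean(axis=0)

# Broadcasting: the shape-(3,) vector is applied to both rows of the (2, 3) array
centered = arr2 - col_means

# axis=1 collapses the columns, leaving one sum per row: [6, 15]
row_sums = arr2.sum(axis=1)

print(centered)
```

After centering, every column has a mean of zero, which is a common preprocessing step before fitting many models.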
NumPy also provides built-in functions such as np.sum(), np.mean(), and np.std() for statistical calculations.

Pandas is a library for data manipulation and analysis. It provides two main data structures: Series (a 1-D labeled array) and DataFrame (a 2-D labeled data structure with columns of potentially different types).
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
# Access a column
ages = df['Age']
# Filter rows
filtered_df = df[df['Age'] > 28]
print(filtered_df)
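The filtering example above assumes complete data. The following is a minimal sketch of handling missing values with `dropna()` and `fillna()`, followed by a compound filter written with `.query()`. The DataFrame, including its `City` column and the `NaN`/`None` entries, is hypothetical and exists only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Age and one missing City
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, np.nan, 35, 40],
    'City': ['Paris', 'London', None, 'Paris'],
})

# dropna(): drop every row that contains at least one missing value
complete_rows = df.dropna()

# fillna(): fill missing ages with the column mean instead of dropping rows
filled = df.fillna({'Age': df['Age'].mean()})

# query(): a compound condition written as a string expression
older_parisians = filled.query("Age > 28 and City == 'Paris'")
print(older_parisians)
```

Whether to drop or fill depends on the data. Dropping is simpler, but filling keeps rows that are otherwise useful.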
Missing values can be handled with dropna() (remove them) or fillna() (replace them), and the .query() method is convenient for more complex filtering conditions.

Matplotlib is a plotting library for Python. It provides a wide range of visualization tools, including line plots, scatter plots, bar plots, and histograms.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Sine Wave')
plt.show()
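The example above uses the stateful `pyplot` interface. For figures with several curves or subplots, Matplotlib's object-oriented interface, which works with explicit `Figure` and `Axes` objects, is usually easier to manage. This sketch plots two curves with a legend. The `Agg` backend line is an assumption added so that the code also runs on a machine without a display; remove it to get an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a GUI window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Explicit Figure and Axes objects instead of pyplot's implicit "current axes"
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)', linestyle='--')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Sine and Cosine')
ax.legend()
fig.tight_layout()
```

To write the figure to disk instead of showing it, call `fig.savefig('waves.png', dpi=150)`. The file name here is only an example.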
Seaborn is a statistical data visualization library based on Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
tips = sns.load_dataset('tips')
sns.scatterplot(x='total_bill', y='tip', data=tips)
plt.show()
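`sns.load_dataset('tips')` downloads data over the network. The sketch below therefore builds a small synthetic DataFrame with the same column names, using random values rather than the real dataset. It then shows two common Seaborn options: `hue`, which colors points by a categorical column, and `regplot`, which overlays a fitted linear regression line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the 'tips' dataset (random values, not real data)
rng = np.random.default_rng(0)
bills = rng.uniform(10, 50, size=60)
df = pd.DataFrame({
    'total_bill': bills,
    'tip': bills * 0.15 + rng.normal(0, 1, size=60),
    'time': rng.choice(['Lunch', 'Dinner'], size=60),
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# Left: points colored by the categorical 'time' column
sns.scatterplot(data=df, x='total_bill', y='tip', hue='time', ax=axes[0])
# Right: scatter plus a fitted regression line with a confidence band
sns.regplot(data=df, x='total_bill', y='tip', ax=axes[1])
fig.tight_layout()
```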
Scikit-learn is a machine learning library in Python. It provides simple and efficient tools for data mining and data analysis, including classification, regression, clustering, and dimensionality reduction algorithms.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
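The code above produces predictions but does not evaluate them. The following sketch measures accuracy on the held-out split and compares it with a 5-fold `cross_val_score` estimate. It also wraps the classifier in a `StandardScaler` pipeline. That scaling step is an addition not present in the original example: k-NN is distance-based, so feature scaling often matters, and the pipeline ensures the scaler is fitted only on the training folds.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling inside the pipeline is refitted per CV fold, so no validation data leaks into it
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))

# Single hold-out estimate
model.fit(X_train, y_train)
test_acc = accuracy_score(y_test, model.predict(X_test))

# 5-fold cross-validation on the training set: five scores instead of one
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"hold-out accuracy: {test_acc:.3f}")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```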
Use the train_test_split() function to split the data into training and testing sets, and use cross-validation (for example, cross_val_score()) to get a more reliable estimate of model performance.

TensorFlow is an open-source machine learning library developed by Google. It is used for building and training deep learning models, including neural networks. Tensors, the fundamental data structure in TensorFlow, can be thought of as multi-dimensional arrays.
import tensorflow as tf
# Create a simple neural network model
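model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                     # 4 input features per sample
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')  # one probability per class
])
# sparse_categorical_crossentropy expects integer class labels (0, 1, 2), not one-hot vectors
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])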
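To make "tensors as multi-dimensional arrays" concrete, the sketch below creates tensors, runs a matrix product and a reduction, and then uses `tf.GradientTape` to compute a derivative automatically. That mechanism is what `model.fit` relies on internally to train networks like the one above. The function being differentiated is an arbitrary toy example.

```python
import tensorflow as tf

# Tensors carry a dtype and a shape
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # shape (2, 2), float32
b = tf.ones((2, 2))
product = tf.matmul(a, b)                  # [[3, 3], [7, 7]]
total = tf.reduce_sum(a)                   # 10.0

# GradientTape records operations on Variables so gradients can be computed
w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = w ** 2 + 2.0 * w                # d(loss)/dw = 2w + 2
grad = tape.gradient(loss, w)              # 8.0 at w = 3

print(product.numpy())
print(total.numpy(), grad.numpy())
```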
PyTorch is an open-source machine learning library originally developed by Facebook's AI Research lab (now Meta AI). It provides a dynamic computational graph, which makes it easier to build and train deep learning models, especially for research purposes.
import torch
import torch.nn as nn
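class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(4, 10)  # 4 input features -> 10 hidden units
        self.fc2 = nn.Linear(10, 3)  # 10 hidden units -> 3 class scores

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)              # raw logits; nn.CrossEntropyLoss applies log-softmax itself
        return x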
model = SimpleNet()
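Unlike Keras, PyTorch leaves the training loop to you. The following is a minimal sketch that trains the `SimpleNet` above on the Iris data, with `TensorDataset` and `DataLoader` handling batching and shuffling. The batch size, learning rate, and epoch count are arbitrary illustrative choices, and the model is evaluated on its own training data only, for brevity.

```python
import torch
import torch.nn as nn
from sklearn.datasets import load_iris
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 10)
        self.fc2 = nn.Linear(10, 3)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))  # raw logits

X, y = load_iris(return_X_y=True)
# nn.Linear needs float32 inputs; CrossEntropyLoss needs int64 class indices
dataset = TensorDataset(torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.long))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

model = SimpleNet()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(50):
    for xb, yb in loader:
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(model(xb), yb)
        loss.backward()                # backpropagate
        optimizer.step()               # update weights

model.eval()
with torch.no_grad():
    preds = model(dataset.tensors[0]).argmax(dim=1)
train_acc = (preds == dataset.tensors[1]).float().mean().item()
print(f"training accuracy: {train_acc:.3f}")
```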
Use torch.utils.data.DataLoader for efficient data handling, such as batching and shuffling.

Keras is a high-level neural networks API written in Python. Earlier versions could run on top of TensorFlow, CNTK, or Theano; Keras 3 runs on top of TensorFlow, JAX, or PyTorch. It is designed to enable fast experimentation with deep neural networks.
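from keras.models import Sequential
from keras.layers import Dense, Input
model = Sequential()
model.add(Input(shape=(4,)))  # 4 input features per sample
model.add(Dense(10, activation='relu'))
model.add(Dense(3, activation='softmax'))
# categorical_crossentropy expects one-hot labels; use sparse_categorical_crossentropy for integer labels
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])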
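The sketch below trains the model above on the Iris data with an `EarlyStopping` callback. Because the model uses `categorical_crossentropy`, the integer labels are first one-hot encoded with `to_categorical`. The patience, epoch limit, and validation split are illustrative assumptions, not tuned values.

```python
from keras.callbacks import EarlyStopping
from keras.layers import Dense, Input
from keras.models import Sequential
from keras.utils import to_categorical
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# One-hot targets for categorical_crossentropy, e.g. class 2 -> [0, 0, 1]
y_onehot = to_categorical(y, num_classes=3)
X_train, X_val, y_train, y_val = train_test_split(X, y_onehot, test_size=0.2, random_state=42)

model = Sequential()
model.add(Input(shape=(4,)))
model.add(Dense(10, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Stop once val_loss has not improved for 10 epochs, then restore the best weights seen
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=100, batch_size=16, callbacks=[early_stop], verbose=0)
print(f"trained for {len(history.history['loss'])} epochs")
```

`ModelCheckpoint` goes into the same `callbacks` list. In Keras 3, its `filepath` must end in `.keras`, or in `.weights.h5` when `save_weights_only=True`.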
Use callbacks such as EarlyStopping and ModelCheckpoint during training.

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be efficient, scalable, and fast, especially on large datasets.
import lightgbm as lgb
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X = pd.DataFrame(iris.data)
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_test = lgb.Dataset(X_test, y_test, reference=lgb_train)
params = {
'objective': 'multiclass',
'num_class': 3,
'metric': 'multi_logloss'
}
model = lgb.train(params, lgb_train, num_boost_round=100, valid_sets=[lgb_test])
XGBoost (Extreme Gradient Boosting) is another gradient boosting library that is known for its high performance and scalability. It has been used in many winning solutions in machine learning competitions.
import xgboost as xgb
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X = pd.DataFrame(iris.data)
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)
params = {
'objective': 'multi:softmax',
'num_class': 3
}
model = xgb.train(params, dtrain, num_boost_round=100)
predictions = model.predict(dtest)
In this blog, we have explored the top 10 Python frameworks for data science projects. Each framework has its own unique features and use cases. NumPy and Pandas are essential for data manipulation and numerical computations, while Matplotlib and Seaborn are great for data visualization. Scikit-learn provides a wide range of machine learning algorithms, and TensorFlow, PyTorch, and Keras are popular for deep learning. LightGBM and XGBoost are powerful gradient boosting frameworks. By understanding the fundamental concepts, usage methods, common practices, and best practices of these frameworks, data scientists can efficiently build and deploy data science projects.