You can download Python from the official website (https://www.python.org/downloads/). Follow the installation instructions for your operating system.
We can use pip, the Python package manager, to install the required libraries. Open your command prompt or terminal and run the following commands:
pip install pandas
pip install matplotlib
pip install seaborn
pip install scikit-learn
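Once the commands finish, we can check from Python that the libraries are visible to the interpreter. The snippet below is a minimal sketch using only the standard library; the helper name `find_missing` is hypothetical and introduced here for illustration. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names for the four libraries installed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["pandas", "matplotlib", "seaborn", "sklearn"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

If a library is reported as missing, the most likely cause is that pip installed it into a different Python environment than the one running the script.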
Lists are mutable, ordered collections of elements.
# Creating a list
my_list = [1, 2, 3, 4, 5]
print(my_list)
# Accessing elements
print(my_list[0])
# Modifying elements
my_list[0] = 10
print(my_list)
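Because lists are mutable and ordered, we can also grow them, slice them, and build new lists from them. The following is a small sketch of those operations; the variable names are illustrative only.

```python
numbers = [1, 2, 3, 4, 5]

# Adding an element to the end of the list
numbers.append(6)
print(numbers)

# Slicing returns a new list with the first three elements
first_three = numbers[:3]
print(first_three)

# A list comprehension builds a new list from an existing one
squares = [n * n for n in numbers]
print(squares)

# Negative indices count from the end of the list
print(numbers[-1])
```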
Tuples are immutable, ordered collections.
# Creating a tuple
my_tuple = (1, 2, 3)
print(my_tuple)
# Accessing elements
print(my_tuple[1])
# Tuples cannot be modified
# my_tuple[1] = 10 # This will raise an error
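To see the immutability in action without crashing the program, we can catch the error that the assignment raises. The sketch below also shows two things tuples are commonly used for: unpacking into variables, and building a new tuple when a changed value is needed. The names `point` and `updated` are illustrative.

```python
point = (1, 2, 3)

# Assigning to an index raises a TypeError because tuples are immutable
try:
    point[1] = 10
except TypeError as error:
    print(f"TypeError: {error}")

# Unpacking assigns each element to its own variable
x, y, z = point
print(x, y, z)

# To "change" a value, build a new tuple from slices of the old one
updated = point[:1] + (10,) + point[2:]
print(updated)
```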
Dictionaries are mutable collections of key-value pairs. Since Python 3.7, they preserve the order in which keys were inserted.
# Creating a dictionary
my_dict = {'name': 'John', 'age': 30}
print(my_dict)
# Accessing values
print(my_dict['name'])
# Modifying values
my_dict['age'] = 31
print(my_dict)
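A few more dictionary operations come up constantly in practice: adding a new key, reading a key that may not exist, and looping over the pairs. The sketch below illustrates them; the `'city'` and `'email'` keys and their values are made up for the example.

```python
person = {'name': 'John', 'age': 30}

# Adding a new key-value pair (the value here is only an example)
person['city'] = 'London'

# .get() returns a default instead of raising KeyError for a missing key
email = person.get('email', 'unknown')
print(email)

# Iterating over key-value pairs
for key, value in person.items():
    print(f"{key}: {value}")

# Keys come back in insertion order (Python 3.7+)
keys = list(person)
print(keys)
```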
Pandas is a powerful library for data manipulation and analysis.
import pandas as pd
# Reading a CSV file
data = pd.read_csv('data.csv')
print(data.head())
# Selecting columns
selected_columns = data[['column1', 'column2']]
# Filtering data
filtered_data = data[data['column1'] > 10]
# Grouping data (numeric_only=True skips text columns, which would raise an error in pandas 2.x)
grouped_data = data.groupby('column3').mean(numeric_only=True)
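The example above depends on a `data.csv` file that you may not have. As a sketch that runs without any file, we can build a small DataFrame in memory and apply the same three operations. The column names match the ones used above; the values are invented purely for illustration.

```python
import pandas as pd

# A small in-memory table standing in for data.csv (values are made up)
data = pd.DataFrame({
    'column1': [5, 12, 20, 8],
    'column2': [1.0, 2.5, 3.5, 4.0],
    'column3': ['a', 'b', 'a', 'b'],
})

# Selecting columns: a list of names returns a new DataFrame
selected_columns = data[['column1', 'column2']]

# Filtering data: keep only rows where column1 is greater than 10
filtered_data = data[data['column1'] > 10]
print(filtered_data)

# Grouping data: average the numeric columns within each column3 group
grouped_data = data.groupby('column3').mean(numeric_only=True)
print(grouped_data)
```

Here the filter keeps the two rows with `column1` values 12 and 20, and the grouped result has one row per distinct value in `column3`.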
Matplotlib is a basic plotting library in Python.
import matplotlib.pyplot as plt
# Creating a simple line plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Plot')
plt.show()
Seaborn is built on top of Matplotlib and provides a high-level interface for creating attractive statistical graphics.
import matplotlib.pyplot as plt
import seaborn as sns
# Loading a sample dataset (downloaded on first use, so this needs an internet connection)
tips = sns.load_dataset('tips')
# Creating a scatter plot
sns.scatterplot(x='total_bill', y='tip', data=tips)
plt.show()
Scikit-learn is a popular library for machine learning in Python.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Loading the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Creating a KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)
# Training the model
knn.fit(X_train, y_train)
# Making predictions
predictions = knn.predict(X_test)
# Evaluating the model
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")
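The accuracy reported above is simply the fraction of test samples whose predicted label matches the true label. The following plain-Python sketch shows that calculation on a handful of made-up labels; the function name `accuracy` is hypothetical and is not part of scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("Inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration: three of the four predictions are correct
true_labels = [0, 1, 2, 2]
predicted_labels = [0, 1, 1, 2]
print(f"Accuracy: {accuracy(true_labels, predicted_labels)}")
```

Accuracy is easy to interpret, but it can be misleading when one class is much more common than the others, so it is worth checking the class balance of your data before relying on it.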
Use meaningful variable names. For example, instead of a, use age if you are storing a person’s age.
Use try-except blocks to handle potential errors gracefully.
try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero!")
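In real code, error handling usually lives inside a function so the caller gets a predictable result. The sketch below combines both practices, descriptive names and a try-except block. The function `safe_divide` and the choice to return `None` on failure are assumptions made for this example, not a fixed convention.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers, returning None if the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        print("Cannot divide by zero!")
        return None


total_bill = 10
number_of_people = 0

share_per_person = safe_divide(total_bill, number_of_people)
if share_per_person is None:
    print("Could not compute a share per person.")
else:
    print(f"Each person pays {share_per_person}")
```

Catching only `ZeroDivisionError`, rather than using a bare `except`, keeps unrelated bugs such as a misspelled variable name visible instead of silently hiding them.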
Python is a powerful and versatile language for data science. With its rich ecosystem of libraries, it provides all the necessary tools for data manipulation, visualization, and machine learning. By following the fundamental concepts, usage methods, common practices, and best practices outlined in this tutorial, beginners can start their journey in data science with Python. As you gain more experience, you can explore more advanced topics such as deep learning, big data processing, and natural language processing.