Pandas offers two primary data structures: Series and DataFrame.
import pandas as pd
# Create a Series
data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)
# Create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]
}
df = pd.DataFrame(data)
print(df)
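As a small sketch of how the two structures relate, the example below builds a Series with explicit labels and pulls one DataFrame column out as a Series. The names `scores`, `people`, and `ages` and the sample values are illustrative assumptions.

```python
import pandas as pd

# A Series is a one-dimensional labeled array; these index labels are illustrative.
scores = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(scores['b'])  # access a value by its label

# A DataFrame is a two-dimensional table; each column is itself a Series.
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
ages = people['Age']
print(type(ages).__name__)
print(people.shape)  # (number of rows, number of columns)
```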
Indexing in Pandas allows you to access specific rows and columns in a Series or DataFrame.
# Indexing in a DataFrame
print(df['Name']) # Select a column
print(df.loc[0]) # Select a row by label
print(df.iloc[1]) # Select a row by integer position
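The difference between label-based and position-based selection is easier to see when the index is not the default 0, 1, 2. The sketch below assumes a hypothetical string index (`'x'`, `'y'`, `'z'`) and also shows boolean indexing, which filters rows by a condition.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['x', 'y', 'z'],  # hypothetical string labels for illustration
)

# loc selects by label; iloc selects by integer position.
by_label = people.loc['y', 'Name']
by_position = people.iloc[1, 0]
print(by_label, by_position)

# Boolean indexing keeps only the rows where the condition is True.
over_28 = people[people['Age'] > 28]
print(over_28)
```

With the default integer index used earlier, `df.loc[0]` and `df.iloc[0]` happen to return the same row, which can hide the distinction.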
Pandas can read data from various file formats and write data back to them.
# Reading a CSV file
csv_df = pd.read_csv('data.csv')
# Writing a DataFrame to an Excel file (requires an Excel engine such as openpyxl)
df.to_excel('output.xlsx', index=False)
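To try reading and writing without any files on disk, a round trip through an in-memory buffer can serve as a minimal sketch. The use of `io.StringIO` and the names `people`, `buffer`, and `restored` are assumptions made for illustration.

```python
import io
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write CSV text to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
people.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Passing `index=False` keeps the row index out of the output, so reading the data back does not produce an extra unnamed column.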
Data cleaning is an essential step in data analysis. Pandas provides functions for handling duplicate rows, incorrect data types, and similar problems.
# Removing duplicate rows
df = df.drop_duplicates()
# Changing data types
df['Age'] = df['Age'].astype(float)
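The sample DataFrame above has no duplicates, so the effect of these calls is easier to see on deliberately messy data. The following sketch assumes a hypothetical `raw` table with a repeated row and ages stored as strings.

```python
import pandas as pd

raw = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Charlie'],
    'Age': ['25', '30', '30', '35'],  # ages stored as strings (illustrative)
})

# Drop rows that are exact duplicates, then renumber the index.
clean = raw.drop_duplicates().reset_index(drop=True)

# Convert the string column to integers.
clean['Age'] = clean['Age'].astype(int)
print(clean)
print(clean.dtypes)
```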
Grouping data and performing aggregations is a powerful feature in Pandas.
# Grouping by a column and calculating the mean
grouped = df.groupby('Name').mean()
print(grouped)
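Grouping is more informative when a key repeats across rows. As a sketch under that assumption, the example below uses a hypothetical `sales` table and computes two aggregates per group with named aggregation.

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Amount': [100, 200, 50, 150],
})

# Named aggregation: each keyword argument becomes an output column.
summary = sales.groupby('Region')['Amount'].agg(total='sum', average='mean')
print(summary)
```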
Missing values are common in real-world data. Pandas provides methods to handle them.
# Checking for missing values
print(df.isnull().sum())
# Filling missing values
df = df.fillna(0)
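Filling with a constant such as 0 is not always appropriate. The sketch below, on a hypothetical `readings` table, shows two common alternatives: filling with the column mean and dropping incomplete rows.

```python
import numpy as np
import pandas as pd

readings = pd.DataFrame({
    'Sensor': ['a', 'b', 'c'],
    'Value': [1.0, np.nan, 3.0],
})

# Count missing entries per column.
missing_counts = readings.isnull().sum()
print(missing_counts)

# Fill gaps in 'Value' with the column mean rather than a constant.
filled = readings.fillna({'Value': readings['Value'].mean()})
print(filled)

# Alternatively, drop any row that contains a missing value.
dropped = readings.dropna()
print(len(dropped))
```

Which option fits depends on the data: filling preserves the row count, while dropping avoids introducing estimated values.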
Sorting data makes it easier to inspect and analyze.
# Sorting a DataFrame by a column
sorted_df = df.sort_values(by='Age')
print(sorted_df)
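Sorting can also use several columns with a different direction for each. This is a minimal sketch on a hypothetical `people` table with tied ages.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [30, 25, 30, 25],
})

# Sort by Age descending, then by Name ascending to break ties.
ordered = people.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(ordered)
```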
When working with large datasets, memory optimization is crucial.
# Downcasting numerical columns to save memory
df['Age'] = pd.to_numeric(df['Age'], downcast='integer')
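To check whether downcasting actually saves memory, the usage can be compared before and after. The sketch below assumes a hypothetical Series of small integers; the exact byte counts depend on the platform and pandas version, so none are shown.

```python
import pandas as pd

# 3000 small integers; the default integer dtype is 64-bit on most platforms.
ages = pd.Series([25, 30, 35] * 1000)

# downcast='integer' picks the smallest integer dtype that can hold the values.
small = pd.to_numeric(ages, downcast='integer')

print(ages.dtype, ages.memory_usage(deep=True))
print(small.dtype, small.memory_usage(deep=True))
```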
Chaining multiple Pandas operations together can make your code more concise and readable.
result = df.drop_duplicates().sort_values(by='Age').groupby('Name').mean()
print(result)
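Long chains tend to be easier to read when wrapped in parentheses with one step per line. The following sketch assumes a hypothetical `people` table and an illustrative derived column, `AgeInMonths`.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Charlie'],
    'Age': [25, 30, 30, 35],
})

# Wrapping the chain in parentheses allows one operation per line.
result = (
    people
    .drop_duplicates()                             # remove the repeated Bob row
    .query('Age >= 30')                            # keep rows matching the condition
    .assign(AgeInMonths=lambda d: d['Age'] * 12)   # add a derived column
    .sort_values(by='Age', ascending=False)        # oldest first
    .reset_index(drop=True)                        # renumber rows from 0
)
print(result)
```

Each method returns a new DataFrame, so the original `people` table is left unchanged.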
Pandas is a versatile and powerful library for data analysis in Python, built around two data structures, Series and DataFrame. With the fundamentals covered here (indexing, file input and output, data cleaning, grouping and aggregation, handling missing values, sorting, memory optimization, and method chaining), you can handle and analyze data efficiently.