Avocado Prices Analysis – Real-World Data Manipulation with Pandas 2026
The well-known Avocado dataset is an excellent example for practicing real-world data manipulation. It contains weekly avocado prices and sales volumes across US regions, split by type (conventional vs organic), starting in 2015. The widely circulated Kaggle version covers 2015 to 2018, so check the date range of your own copy. In this article, we'll explore practical Pandas techniques using this dataset.
1. Loading and Initial Exploration
import pandas as pd
# Load the avocado dataset
df = pd.read_csv("avocado.csv", parse_dates=["Date"])
print(df.shape)
# info() prints its own summary and returns None, so it needs no print()
df.info()
# Quick overview
print(df.head())
# Basic statistics
print(df.describe())
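Before aggregating, it is worth confirming that the Date column really parsed as a datetime, which period the file covers, and that there is one row per date, region and type. The sketch below runs these checks on a tiny hypothetical stand-in DataFrame (invented round numbers, same column names as this article) so it works without the CSV. With the real file, skip the stand-in and run the checks on the df loaded above.

```python
import pandas as pd

# Hypothetical stand-in for avocado.csv: invented values, same column names as the article
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04", "2015-01-04", "2015-01-11", "2015-01-11"]),
    "AveragePrice": [1.00, 1.50, 1.10, None],  # one deliberately missing price
    "type": ["conventional", "organic", "conventional", "organic"],
    "region": ["RegionA"] * 4,
})

# 1. Did parse_dates work? The dtype should be datetime64, not object
date_ok = pd.api.types.is_datetime64_any_dtype(df["Date"])

# 2. Which period does the file actually cover?
date_range = (df["Date"].min(), df["Date"].max())

# 3. Missing values per column
missing = df.isna().sum()

# 4. Expect exactly one row per (Date, region, type) combination
dupes = df.duplicated(subset=["Date", "region", "type"]).sum()

print("Date parsed:", date_ok)
print("Covers:", date_range)
print("Duplicate keys:", dupes)
print(missing)
```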
2. Common Data Manipulation Tasks
# Average price by type (organic vs conventional)
avg_price_by_type = df.groupby("type")["AveragePrice"].mean()
print(avg_price_by_type)
# Average price by region
avg_price_by_region = df.groupby("region")["AveragePrice"].mean().sort_values(ascending=False)
print(avg_price_by_region.head(10))
# Total volume by year
df["Year"] = df["Date"].dt.year
yearly_volume = df.groupby("Year")["Total Volume"].sum()
print(yearly_volume)
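Note that groupby(...).mean() on AveragePrice gives every region-week the same weight, so a small market counts as much as a large one. If you want a figure closer to the price of a typical avocado sold, one option (our suggestion, not part of the analysis above) is to weight each row by Total Volume. The sketch below uses hypothetical numbers; on the real file, also consider whether the region column contains aggregate rows (such as national totals), which would count the same volume twice.

```python
import pandas as pd

# Hypothetical stand-in rows: one large and one small market per type
df = pd.DataFrame({
    "type": ["conventional", "conventional", "organic", "organic"],
    "region": ["BigCity", "SmallTown", "BigCity", "SmallTown"],
    "AveragePrice": [1.00, 2.00, 1.50, 2.50],
    "Total Volume": [900.0, 100.0, 90.0, 10.0],
})

# Unweighted: each row counts equally
simple = df.groupby("type")["AveragePrice"].mean()

# Volume-weighted: sum(price * volume) / sum(volume) within each type
weighted = (
    df.assign(price_x_volume=df["AveragePrice"] * df["Total Volume"])
      .groupby("type")[["price_x_volume", "Total Volume"]]
      .sum()
      .pipe(lambda g: g["price_x_volume"] / g["Total Volume"])
)

print(pd.DataFrame({"simple_mean": simple, "volume_weighted": weighted}).round(3))
```

Here the large, cheaper market pulls the weighted figure well below the simple mean for both types.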
3. Advanced Analysis with Pivot Tables
# Average price by region and type
pivot = pd.pivot_table(
    df,
    values="AveragePrice",
    index="region",
    columns="type",
    aggfunc="mean",
).round(2)
print(pivot.head(10))
# Monthly average price trend for conventional avocados
# "ME" = month-end frequency (pandas 2.2+; older versions use "M", which is now deprecated)
monthly = df[df["type"] == "conventional"].resample("ME", on="Date")["AveragePrice"].mean()
print(monthly.head())
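The resample call above handles one type at a time. To get monthly averages for both types side by side in a single pass, one option is groupby with pd.Grouper followed by unstack. This sketch assumes the same Date, type and AveragePrice columns, uses hypothetical weekly values, and uses freq="MS" (month-start labels), which both older and newer pandas versions accept. The premium column is our addition for comparing the two types.

```python
import pandas as pd

# Hypothetical weekly rows for two types across two months
dates = pd.to_datetime(["2015-01-04", "2015-01-11", "2015-02-01", "2015-02-08"])
df = pd.DataFrame({
    "Date": list(dates) * 2,
    "type": ["conventional"] * 4 + ["organic"] * 4,
    "AveragePrice": [1.0, 1.2, 1.1, 1.3, 1.6, 1.8, 1.7, 1.9],
})

# Group by type and calendar month, then move the types into columns
monthly_by_type = (
    df.groupby(["type", pd.Grouper(key="Date", freq="MS")])["AveragePrice"]
      .mean()
      .unstack("type")
)

# Organic premium per month (organic minus conventional)
monthly_by_type["premium"] = monthly_by_type["organic"] - monthly_by_type["conventional"]
print(monthly_by_type.round(2))
```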
4. Visualization Examples
import matplotlib.pyplot as plt
import seaborn as sns
# Price distribution by type
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x="type", y="AveragePrice")
plt.title("Avocado Price Distribution by Type")
plt.show()
# Average price trend over time
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x="Date", y="AveragePrice", hue="type")
plt.title("Avocado Average Price Trend (Conventional vs Organic)")
plt.show()
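Because the raw data has many rows per date (one per region and type), sns.lineplot has to aggregate them, and by default it also computes a bootstrapped confidence band, which can be slow on the full file. An alternative is to aggregate first with pandas and plot the already-aggregated series. The sketch below uses hypothetical data and the non-interactive Agg backend so it runs without a display. The output filename is illustrative only; in a notebook you would drop the matplotlib.use line and call plt.show() instead of saving.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format rows: two regions per (Date, type)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04"] * 4 + ["2015-01-11"] * 4),
    "region": ["A", "B"] * 4,
    "type": (["conventional"] * 2 + ["organic"] * 2) * 2,
    "AveragePrice": [1.0, 1.2, 1.6, 1.8, 1.1, 1.3, 1.7, 1.9],
})

# Aggregate first: one mean price per date and type, with types as columns
weekly = df.pivot_table(index="Date", columns="type", values="AveragePrice", aggfunc="mean")

# Plot the pre-aggregated series (pandas draws one line per column)
fig, ax = plt.subplots(figsize=(12, 6))
weekly.plot(ax=ax)
ax.set_title("Avocado Average Price Trend (Conventional vs Organic)")
ax.set_ylabel("AveragePrice")
fig.savefig("avocado_price_trend.png")  # hypothetical output filename
```

If you prefer to stay in Seaborn, passing errorbar=None to sns.lineplot (Seaborn 0.12 and later) skips the confidence band.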
Best Practices Demonstrated
- Using parse_dates when loading date columns
- Creating new columns from datetime (.dt.year)
- Using groupby() with meaningful aggregations
- Creating pivot tables for cross-comparisons
- Using Seaborn for insightful visualizations
Conclusion
The Avocado dataset is perfect for practicing real-world data manipulation skills. Through grouping, pivoting, time-based analysis, and visualization, you can extract valuable insights such as price differences between organic and conventional avocados, regional variations, and seasonal trends.
Next steps:
- Download the avocado dataset and try to answer these questions:
  - Which region has the highest average avocado price?
  - How has the price gap between organic and conventional avocados changed over time?
  - When is the peak season for avocado sales?
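If you want a starting point, the sketch below shows one way to phrase each question as a pandas expression, using a small hypothetical stand-in DataFrame. It uses simple (unweighted) means and sums, and it does not filter out aggregate regions such as national totals, which you would probably want to exclude before ranking regions in the real file.

```python
import pandas as pd

# Hypothetical stand-in: two regions, two types, January and July in two years
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04", "2015-07-05", "2016-01-03", "2016-07-03"] * 4),
    "region": ["North"] * 8 + ["South"] * 8,
    "type": (["conventional"] * 4 + ["organic"] * 4) * 2,
    "AveragePrice": [1.0, 1.2, 1.1, 1.3, 1.5, 1.8, 1.7, 2.0,
                     0.9, 1.1, 1.0, 1.2, 1.4, 1.6, 1.6, 1.9],
    "Total Volume": [100, 150, 110, 160, 10, 15, 11, 16,
                     200, 260, 210, 270, 20, 25, 21, 27],
})

# Q1: region with the highest mean AveragePrice
top_region = df.groupby("region")["AveragePrice"].mean().idxmax()

# Q2: organic minus conventional mean price, per year
yearly = df.assign(Year=df["Date"].dt.year).pivot_table(
    index="Year", columns="type", values="AveragePrice", aggfunc="mean"
)
price_gap = yearly["organic"] - yearly["conventional"]

# Q3: calendar month with the largest total volume, summed over all years
peak_month = df.groupby(df["Date"].dt.month)["Total Volume"].sum().idxmax()

print("Highest-priced region:", top_region)
print(price_gap)
print("Peak month:", peak_month)
```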