GroupBy and Aggregation Functions
One of the most powerful capabilities in Pandas is grouping data and performing calculations on each subset.
This technique helps uncover patterns across categories — for example, sales by region, average scores per class, or revenue by product.
The groupby() method splits your data into groups based on the values in one or more columns.
Once grouped, you can apply aggregation functions such as:
- sum(): total value per group
- mean(): average value per group
- count(): number of non-missing values per group (use size() to count every row)
- max(): highest value per group
- min(): lowest value per group
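As a minimal sketch of the split-then-aggregate idea, the snippet below groups a small, made-up table of class scores (the `scores` DataFrame and its values are invented for illustration) and applies each of the functions listed above.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
scores = pd.DataFrame({
    "Class": ["A", "A", "B", "B", "B"],
    "Score": [80, 90, 70, 60, 95],
})

grouped = scores.groupby("Class")["Score"]

# Split step: each group is the subset of rows that share one key.
for name, group in grouped:
    print(name, group.tolist())

# Aggregate step: one result per group.
totals = grouped.sum()     # A: 170, B: 225
averages = grouped.mean()  # A: 85.0, B: 75.0
counts = grouped.count()   # A: 2, B: 3
highest = grouped.max()    # A: 90, B: 95
lowest = grouped.min()     # A: 80, B: 60
```

Each result is a Series indexed by the group key, so a single value can be looked up with, for example, `totals["A"]`.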
GroupBy example
Imagine you have a dataset of sales transactions from different cities. You might want to:
- Calculate total sales for each city
- Find the average transaction amount per store
- Count how many transactions happened in each region
Pandas makes this easy. For example, to calculate total sales per city, you can write:
    import pandas as pd

    df = pd.DataFrame({
        "City": ["New York", "New York", "Los Angeles", "Los Angeles", "Chicago", "Chicago"],
        "Sales": [100000, 150000, 200000, 250000, 300000, 350000]
    })

    # groupby() sorts the group keys by default, so the cities appear alphabetically
    df.groupby("City")["Sales"].sum()

    # Output:
    # City
    # Chicago        650000
    # Los Angeles    450000
    # New York       250000
    # Name: Sales, dtype: int64
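The other two tasks from the list above follow the same pattern. The sketch below assumes a transactions table with `Region`, `Store`, and `Amount` columns; these column names and all values are hypothetical, chosen only to illustrate the idea.

```python
import pandas as pd

# Hypothetical transactions; column names and values are made up.
tx = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "West"],
    "Store": ["S1", "S2", "S3", "S3", "S4"],
    "Amount": [120.0, 80.0, 200.0, 100.0, 50.0],
})

# Average transaction amount per store (S3 has two transactions).
avg_per_store = tx.groupby("Store")["Amount"].mean()

# Number of transactions per region.
count_per_region = tx.groupby("Region")["Amount"].count()

print(avg_per_store)
print(count_per_region)
```

Only the grouping column and the aggregation function change between the two calls; the overall shape of the expression stays the same.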
Syntax Overview
Here's a simple pattern:
df.groupby("ColumnName")["TargetColumn"].agg("aggregation_function")
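To make the pattern concrete, here is a small sketch showing that passing a function name to .agg() gives the same result as calling the matching method directly, and that .agg() also accepts a callable. The shortened data and the variable names are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["New York", "New York", "Chicago"],
    "Sales": [100000, 150000, 300000],
})

# These two lines compute the same per-city totals.
via_method = df.groupby("City")["Sales"].sum()
via_agg = df.groupby("City")["Sales"].agg("sum")

# A callable also works; here, the range (max minus min) within each group.
sales_range = df.groupby("City")["Sales"].agg(lambda s: s.max() - s.min())

print(via_agg.equals(via_method))
print(sales_range)
```

The string form is convenient when the function name is stored in a variable or when several functions are combined in one call.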
Passing a list of function names to .agg() applies several aggregation functions at once for a richer summary.
For example, to calculate the sum, mean, and count of the amounts in each category, you can write:
    import pandas as pd

    df = pd.DataFrame({
        "Category": ["A", "A", "B", "B", "C", "C"],
        "Amount": [100, 200, 300, 400, 500, 600]
    })

    df.groupby("Category")["Amount"].agg(["sum", "mean", "count"])

    # Output:
    #            sum   mean  count
    # Category
    # A          300  150.0      2
    # B          700  350.0      2
    # C         1100  550.0      2
The output shows that:
- sum = total of all values in the group
- mean = average value in the group
- count = number of non-missing values in the group
The categories (A, B, C) become the index of the result, because groupby() uses the group keys as row labels by default.
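If the group keys are more useful as an ordinary column, or the summary columns need friendlier names, the following sketch shows two options: reset_index() and named aggregation. The output names `total` and `average` are arbitrary choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "C", "C"],
    "Amount": [100, 200, 300, 400, 500, 600],
})

summary = df.groupby("Category")["Amount"].agg(["sum", "mean", "count"])

# Move the group keys out of the index and back into a regular column.
flat = summary.reset_index()

# Named aggregation: output_name=(source_column, function).
named = df.groupby("Category").agg(
    total=("Amount", "sum"),
    average=("Amount", "mean"),
)

print(flat)
print(named)
```

Named aggregation is handy when the result feeds a report or a later merge, since the column names describe what each value means.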
Using the groupby method in Pandas, you can apply aggregation functions like sum, mean, and count to grouped data.