Skip to main content

← All cheatsheets · Track →

📊Updated 2026-05

pandas cheatsheet — load, clean, group, join, plot

Hand-picked pandas patterns from the Data Science track. The 80% you actually reach for in real analytics work, not every API.

Load & inspect

Read CSV

Default loader; handles most messy files.

import pandas as pd
df = pd.read_csv("orders.csv")

Read Excel

Same, for spreadsheets.

df = pd.read_excel("orders.xlsx", sheet_name="Q1")

Inspect

First thing after loading.

df.head()
df.info()
df.describe()

Select columns

By name; list for many.

df["price"]
df[["name", "price"]]

Filter & transform

Filter rows

Boolean mask. Combine with & / |.

df[df["price"] > 100]
df[(df["price"] > 100) & (df["status"] == "paid")]

Add column

Vectorised — never loop.

df["price_with_tax"] = df["price"] * 1.2

.apply

Row-wise transform. Slower; use when vectorised won't do.

df["initials"] = df["name"].apply(lambda n: n[0])

Rename + drop

Cleanup before pipeline.

df = df.rename(columns={"old": "new"})
df = df.drop(columns=["unused"])

Tidy + missing

Drop NaN

Remove rows with missing values.

df.dropna(subset=["price"])

Fill NaN

Sensible defaults.

df["score"].fillna(0, inplace=False)

Cast types

Force a column to the right type.

df["price"] = df["price"].astype(float)

Parse dates

Convert string columns to datetime.

df["ts"] = pd.to_datetime(df["ts"])

Group & aggregate

groupby sum

Single metric per group.

df.groupby("country")["revenue"].sum()

groupby agg

Multiple metrics in one shot.

df.groupby("country").agg(
    revenue=("price", "sum"),
    orders=("price", "count"),
)

pivot table

Spreadsheet-style 2D aggregation.

df.pivot_table(index="country", columns="quarter", values="revenue", aggfunc="sum")

Join & combine

merge (SQL join)

Inner / left / outer joins.

pd.merge(orders, customers, on="customer_id", how="left")

concat

Stack two frames vertically or side-by-side.

pd.concat([df1, df2], ignore_index=True)

Time series & rolling

Set datetime index

Unlocks resample + rolling.

df = df.set_index("ts").sort_index()

Resample

Aggregate by time bucket.

df["revenue"].resample("M").sum()

Rolling window

Smooth + lag features.

df["revenue"].rolling(7).mean()

Out + plot

Save CSV

Default export.

df.to_csv("out.csv", index=False)

Save parquet

Compact, typed, fast to re-read.

df.to_parquet("out.parquet")

Quick plot

Get a feel for shape; matplotlib under the hood.

df["revenue"].plot(kind="line")

Want to actually learn these patterns, not just paste them? Open the pandas cheatsheet track — each snippet has a full lesson behind it.