Skip to main content
← All projects
L2CLI · finance · data· 10-16h total

Expense tracker CLI with CSV import + monthly reports

Tiny but interview-defensible CRUD project that teaches schema design, idempotent CSV import, deterministic rules engines, money math with Decimal, and PDF generation — all skills senior engineers reach for daily.

Resume bullet (when finished)

Built a SQLite-backed expense tracker with CSV bank-statement import, deterministic category-rules engine, monthly reports as PDF, and 87% pytest coverage.

Locked tech stack

No "choose your language" — analysis paralysis kills completion. Follow the stack to the letter on your first build.

Python 3.12clickSQLitepandasReportLabpytest

Milestones (5 · ~13h)

  1. M1~2h

    SQLite schema + first migration

    Tables `accounts`, `transactions`, `rules`. Migration via plain SQL files (no ORM).

    CHECK BEFORE MOVING ON:

    • Why SQLite, not Postgres, for this project?
    • Why migrations as plain SQL rather than Alembic?
    $ git commit -m "feat(db): initial SQLite schema + migrations"
  2. M2~3h

    CSV import — bank format

    `expenses import statement.csv --account checking` is idempotent (re-running doesn't duplicate). Uses Decimal for amounts.

    CHECK BEFORE MOVING ON:

    • Why never store money as float?
    • How do you make CSV import idempotent without a deduplication ID column?
    $ git commit -m "feat(import): idempotent CSV ingest with Decimal amounts"
  3. M3~2h

    Rules-based categorisation

    `rules add 'STARBUCKS' coffee`. On import, every uncategorised row matching the substring gets the category. First match wins, deterministic order.

    CHECK BEFORE MOVING ON:

    • Why deterministic order matters for a categorisation engine.
    • When would you reach for ML instead of rules — and why don't you here?
    $ git commit -m "feat(rules): substring-match categorisation engine"
  4. M4~4h

    Monthly PDF report

    `expenses report --month 2026-05 --pdf out.pdf` generates a PDF with totals by category, top-5 expenses, and a sparkline.

    CHECK BEFORE MOVING ON:

    • Why ReportLab instead of WeasyPrint or wkhtmltopdf?
    • What's the right way to test PDF generation without comparing bytes?
    $ git commit -m "feat(report): monthly PDF via ReportLab"
  5. M5~2h

    87% test coverage + CI

    pytest + coverage gate at 85% in GitHub Actions. Tests use an in-memory SQLite for speed.

    CHECK BEFORE MOVING ON:

    • Why a coverage *gate* in CI, not just a *report*?
    • What's wrong with chasing 100% coverage?
    $ git commit -m "test: 87% coverage + CI gate"

60-second demo storyboard

What you say in the recruiter screen when they ask "tell me about your latest project." Practice it out loud.

  1. 0-5s: 'CLI expense tracker — CSV in, PDF report out, deterministic rules engine.'
  2. 5-25s: live demo — import a CSV, add a rule, generate the monthly PDF.
  3. 25-45s: show the rules engine code + 'why first-match-wins matters'.
  4. 45-60s: pytest --cov; 87% in 1.4s.

STAR talking points for behavioral round

STAR — CORRECTNESS

Situation: first version stored amounts as float — `0.1 + 0.2 != 0.3` showed up in tests. Task: fix money math. Action: switched to Decimal everywhere, added a custom SQLite adapter, and made the schema TEXT for amounts. Result: arithmetic exact to the cent and tests went green permanently.

STAR — IDEMPOTENCY

Situation: re-importing a CSV duplicated rows. Task: make import safe to re-run. Action: introduced a synthetic dedup key — sha256 of (account, date, amount, raw_description) — added a unique constraint. Result: rerunning yesterday's import is a no-op, which is exactly what users expect.

Production references — how grown-up systems do this

YNAB

YNAB's import dedup logic is publicly described — same hash-of-row pattern.

ReportLab

ReportLab is the de-facto Python PDF library; their docs cover Platypus (flowable layouts) which is what monthly reports want.

Self-review rubric (before you claim done)

Correctness

  • Money math via Decimal, not float.
  • CSV re-import is a no-op.
  • Reports total to the cent.
  • Rules apply deterministically.

Code quality

  • DB code separated from CLI layer.
  • Migrations live in a versioned directory.
  • Decimal serialisation centralised, not scattered.

Testing

  • ≥85% line coverage on the `expenses/` package.
  • Migration tests verify forward + reverse.
  • PDF tests verify structure, not byte-exact output.

Docs

  • README explains the dedup-hash design.
  • Sample CSV + sample PDF committed for grading.
  • One-paragraph rationale: 'why SQLite for this'.

✱ AI code review

Get a senior-style review before you call it done

Push your finished work to GitHub, open a PR, paste the PR URL below. Claude reviews the diff against this project's rubric and replies with strengths, must-fix items, and one teachable principle.

Tick the rubric items honestly, write the README, push to GitHub, get the AI review above. Once it's clean, email support@learnpython.academy with the repo link — we feature the best ones on /success-stories.

Need Python first? Start Foundations →