Is CodeMentor AI free?

The first 15 Python lessons are free with no signup and no credit card. After that, the 7-day Pro trial unlocks every track; cancel anytime. Pro is $12/month or $89/year.

Can I learn Python without installing anything?

Yes. Every lesson runs Python in your browser — Skulpt for lightweight lessons and Pyodide (full CPython) on the playground. No Anaconda, no pyenv, no terminal commands. Open the page and hit Run.

Is CodeMentor AI good for complete beginners?

Yes — the Foundations track starts with print('Hello, World!') and assumes zero programming background. The first 15 lessons are free to verify the difficulty curve matches you before any signup.

Does the AI tutor replace a human mentor?

It replaces 80% of "I'm stuck at 21:00 and Stack Overflow scared me" moments. You get hints calibrated to your code + a chat for follow-up questions. For project review and career advice the team also answers support@learnpython.academy directly.

Can I learn Python for an AI engineering job?

Yes. The AI Engineering track covers production patterns the US dev community uses in 2026 — Claude/LLM APIs, tool use, RAG, agent loops, prompt caching, evals, voice agents. Build production AI features end-to-end.

Are the courses available in languages other than English?

Yes — the platform UI and most lessons are translated into 18 languages including Ukrainian, Russian, Polish, German, French, Spanish, Portuguese, and more. Pick yours in the language switcher.

← All projects

L3AI Engineering · Developer tools· 18-30h total

GitHub PR review bot powered by Claude

PR review is one of the two LLM use-cases (alongside support) that companies actively spend on in 2026. Building one teaches webhook handling, tool use, prompt iteration against real diffs, and the messy reality of 'is this comment useful?' evals.

Resume bullet (when finished)

“Built a GitHub App that posts Claude-authored PR review comments inline, calling tools to fetch diffs and file context; ran on 120 internal PRs with a 71% reviewer-thumbs-up rate.”

Locked tech stack

No "choose your language" — analysis paralysis kills completion. Follow the stack to the letter on your first build.

Python 3.12FastAPIAnthropic SDKPyGithubPostgreSQLDockerGitHub Webhooks

Milestones (6 · ~25h)

M1~3h
GitHub App + webhook receiver
FastAPI endpoint verifies HMAC signature, parses `pull_request` events.
CHECK BEFORE MOVING ON:
- Why HMAC verification first, before anything else?
- What's the difference between GitHub Apps and OAuth Apps for this use case?
$ git commit -m "feat(bot): webhook receiver with HMAC verification"
M2~3h
Fetch PR diff + context
Use PyGithub to fetch the unified diff + the files touched + the README/CONTRIBUTING for repo context.
CHECK BEFORE MOVING ON:
- Why include CONTRIBUTING.md in context, not just the diff?
- What's the size limit you should enforce and why?
$ git commit -m "feat(context): diff + repo context fetcher"
M3~6h
Claude tool use loop
Two tools: `fetch_file(path)` and `search_code(query)`. Claude can call them while drafting comments.
CHECK BEFORE MOVING ON:
- Why give Claude tools rather than dumping the whole repo into context?
- What's the failure mode if a tool returns 30k tokens?
$ git commit -m "feat(claude): tool use loop with fetch/search"
M4~4h
Post inline review comments
Each Claude finding becomes an inline GitHub review comment via the PullsAPI review endpoint.
CHECK BEFORE MOVING ON:
- Inline vs general review comments — when each?
- What happens if Claude points at a line that no longer exists?
$ git commit -m "feat(review): inline review comments"
M5~5h
Golden eval suite
30 fixture PRs with hand-graded expected comments. Suite gives a 0–1 score per PR. CI runs it on every change to the prompt.
CHECK BEFORE MOVING ON:
- What does 'golden eval' mean for a non-deterministic LLM?
- Why does the prompt need a CI gate?
$ git commit -m "test(eval): 30-PR golden eval suite"
M6~4h
Feedback loop
Reviewers can 👍 / 👎 the bot's comments. Aggregate score lands in a Postgres table read by the eval suite.
CHECK BEFORE MOVING ON:
- Why save the reaction, not just the count?
- What's the right place to surface the score back to the prompt author?
$ git commit -m "feat: thumbs-up/down feedback loop + metrics"

60-second demo storyboard

What you say in the recruiter screen when they ask "tell me about your latest project." Practice it out loud.

0-5s: 'GitHub PR bot — Claude reads the diff, calls tools, posts inline comments.'
5-25s: open a real PR, watch the bot leave 4 inline comments live.
25-45s: show the golden eval suite + CI run gating prompt changes.
45-60s: 71% thumbs-up rate, 120 PRs.

STAR talking points for behavioral round

STAR — EVAL DISCIPLINE

Situation: every prompt tweak felt right in spot-checks but I had no idea if I was actually improving. Task: build an offline eval. Action: assembled 30 fixture PRs, hand-graded the ideal comment set, scored each prompt version. Result: I caught a regression on iteration #14 that would have shipped silently — the 'helpful'-feeling prompt scored 0.42 vs the previous 0.61.

STAR — TOOL DESIGN

Situation: Claude was guessing at file contents instead of asking. Task: make 'ask' easier than 'guess'. Action: gave it `fetch_file` + `search_code` and explicitly listed examples of when to use them. Result: hallucinated-context comments dropped from 18% to under 4%.

Production references — how grown-up systems do this

GitHub Copilot →

Copilot for PRs is the consumer-facing example of this exact pattern — the public docs describe the tool-use approach.

Anthropic →

Anthropic's tool use cookbook is the canonical reference for safe agentic loops.

Greptile →

Greptile (YC W24) is the leading 'AI code review' product — their public posts on eval design are required reading.

Self-review rubric (before you claim done)

Correctness

Webhook signature verification mandatory.
Reviews post inline on the right line/file.
Bot never reviews its own commits.
Failure modes (rate limit, expired token) handled gracefully.

Code quality

Prompt lives in a versioned file, not a string in code.
Tool definitions and handlers separated.
Async throughout — webhooks return in <2s, work done in background.
Eval suite is committed and reproducible.

Testing

Golden eval suite committed.
Webhook handler tested with fixture payloads.
Integration test exercises the full diff→tool-loop→comment path.

Docs

README explains the eval methodology.
Architecture diagram with the tool-use loop.
Section: 'how would you adapt this for a private GHE instance?'

✱ AI code review

Get a senior-style review before you call it done

Push your finished work to GitHub, open a PR, paste the PR URL below. Claude reviews the diff against this project's rubric and replies with strengths, must-fix items, and one teachable principle.

Tick the rubric items honestly, write the README, push to GitHub, get the AI review above. Once it's clean, email support@learnpython.academy with the repo link — we feature the best ones on /success-stories.

Need Python first? Start Foundations →