Is CodeMentor AI free?

The first 15 Python lessons are free with no signup and no credit card. After that, the 7-day Pro trial unlocks every track; cancel anytime. Pro is $12/month or $89/year.

Can I learn Python without installing anything?

Yes. Every lesson runs Python in your browser — Skulpt for lightweight lessons and Pyodide (full CPython) on the playground. No Anaconda, no pyenv, no terminal commands. Open the page and hit Run.

Is CodeMentor AI good for complete beginners?

Yes. The Foundations track starts with print('Hello, World!') and assumes zero programming background. The first 15 lessons are free, so you can check that the difficulty curve suits you before signing up.

Does the AI tutor replace a human mentor?

It replaces 80% of those "I'm stuck at 21:00 and Stack Overflow scared me" moments. You get hints calibrated to your code plus a chat for follow-up questions. For project review and career advice, the team also answers directly at support@learnpython.academy.

Can I learn Python for an AI engineering job?

Yes. The AI Engineering track covers production patterns the US dev community uses in 2026 — Claude/LLM APIs, tool use, RAG, agent loops, prompt caching, evals, voice agents. Build production AI features end-to-end.

Are the courses available in languages other than English?

Yes — the platform UI and most lessons are translated into 18 languages including Ukrainian, Russian, Polish, German, French, Spanish, Portuguese, and more. Pick yours in the language switcher.

AI Engineering with Python in 2026: the stack production teams actually ship

If you opened a Python repo at any decent AI-using company today, here's what you'd find. Not what tutorials say, not what conference talks promise — what teams actually run in production. This is a 2026 snapshot.

The default backbone

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    system="...",
    messages=[{"role": "user", "content": "..."}],
    thinking={"type": "adaptive"},  # the model decides how much to think
)
```

The Anthropic SDK is the default for serious work. `claude-opus-4-6` is the workhorse model, with Sonnet 4.6 for cheaper bulk work and Haiku 4.5 for triage. Adaptive thinking is on for anything non-trivial: the model decides how much to think per request, so you don't need to guess a budget.

What changed from 2025: `budget_tokens` is deprecated. Don't pass it on 4.6 models; adaptive thinking replaces it. If a tutorial tells you to set `budget_tokens=10000`, that tutorial predates the 4.6 models.

Prompt caching: the easiest 90% cost cut

Any prefix used more than once in 5 minutes should be cached:

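```python
resp = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,  # required by the Messages API
    system=[
        {"type": "text", "text": LONG_SYSTEM_PROMPT,
         # cache breakpoint: the prefix up to and including this block is cached
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=[...],
)
```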

Verify with `resp.usage.cache_read_input_tokens > 0`. If it's zero across repeated requests, you have a silent cache invalidator. Usually it is a timestamp in the system prompt, dict-iteration order changing across runs, or a logger adding trailing whitespace. Audit by hashing the rendered (tools, system) blob across two consecutive identical requests; if the hashes differ, you've found your bug.

The minimum cacheable prefix is ~1024 tokens. Below that, the cache silently doesn't engage.
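You can run the hash audit described above with the standard library alone. This is a minimal sketch, not an SDK feature. `prefix_fingerprint`, `render_system`, and the prompt text are hypothetical, and `json.dumps` only approximates how the SDK serializes the request. Keys are deliberately not sorted, so a dict rebuilt in a different key order yields a different fingerprint.

```python
import hashlib
import json
from datetime import datetime
from typing import Optional

def prefix_fingerprint(tools: list, system: list) -> str:
    """SHA-256 of the (tools, system) prefix, serialized in its current key order."""
    # No sort_keys on purpose: a dict rebuilt in a different key order is a
    # different serialized prefix, so it should produce a different fingerprint.
    blob = json.dumps([tools, system], ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

STATIC_PROMPT = "You are a billing-support assistant for ExampleCo."  # hypothetical

def render_system(now: Optional[datetime] = None) -> list:
    text = STATIC_PROMPT
    if now is not None:
        # The classic silent invalidator: a timestamp rendered into the cached prefix.
        text += f"\nCurrent time: {now.isoformat()}"
    return [{"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}]

# Two identical requests without a timestamp: same fingerprint, so the cache can hit.
stable_a = prefix_fingerprint([], render_system())
stable_b = prefix_fingerprint([], render_system())
# The same request rendered a minute apart with a timestamp: the fingerprint changes.
stamped_a = prefix_fingerprint([], render_system(datetime(2026, 1, 1, 12, 0)))
stamped_b = prefix_fingerprint([], render_system(datetime(2026, 1, 1, 12, 1)))
print(stable_a == stable_b, stamped_a == stamped_b)  # True False
```

To use it, run it over the exact `tools` and `system` objects your app builds for two back-to-back requests. If the fingerprints differ, diff the two serialized blobs to find the moving part.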

Tool use: where most apps spend their time

Most AI apps in 2026 aren't chat apps. They're agents that call your tools to do real work. The shape:

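```python
import json

from anthropic import beta_tool

@beta_tool  # SDK helper: builds the tool definition from the signature + docstring
def lookup_customer(customer_id: str) -> str:
    """Look up a customer by id. Returns the full profile as JSON."""
    return json.dumps(db.customers.get(customer_id))  # `db` is your own data layer

# The tool runner executes requested tools and loops until Claude is done.
runner = client.beta.messages.tool_runner(
    model="claude-opus-4-6",
    max_tokens=4096,
    tools=[lookup_customer, refund, escalate_to_human],  # each also @beta_tool-decorated
    messages=[...],
)
for message in runner:  # iterating drives the loop
    print(message.content)
```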

The SDK's tool runner handles the agent loop — calls the function, sends the result back, repeats until Claude is done. For maximum control you can write the loop manually, but for 90% of apps the runner is what you want.
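For apps that need maximum control, here is a minimal, SDK-free sketch of the loop the runner automates. It rests on a few assumptions. The model call is injected as a plain function that returns dicts shaped like Messages API responses: a `stop_reason`, plus `content` blocks of type `tool_use`, with results sent back as `tool_result` blocks. `fake_model` and `lookup_customer` are hypothetical stand-ins so the loop runs offline.

```python
import json
from typing import Callable, Dict

# Hypothetical seam: takes the message list, returns a Messages-API-shaped dict.
# In production this would wrap client.messages.create(...).
ModelCall = Callable[[list], dict]

def run_agent(call_model: ModelCall, tools: Dict[str, Callable], messages: list,
              max_iterations: int = 10) -> dict:
    for _ in range(max_iterations):
        resp = call_model(messages)
        # Keep every assistant block (text and tool_use), not just the text.
        messages.append({"role": "assistant", "content": resp["content"]})
        if resp["stop_reason"] != "tool_use":
            return resp  # Claude is done
        results = []
        for block in resp["content"]:
            if block["type"] != "tool_use":
                continue
            result = {"type": "tool_result", "tool_use_id": block["id"]}
            try:
                result["content"] = json.dumps(tools[block["name"]](**block["input"]))
            except Exception as exc:  # unknown tool or tool failure: tell Claude, don't crash
                result["content"] = f"{type(exc).__name__}: {exc}"
                result["is_error"] = True
            results.append(result)
        # All tool results for one assistant turn go back in a single user message.
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"agent did not finish within {max_iterations} iterations")

def lookup_customer(customer_id: str) -> dict:
    return {"id": customer_id, "plan": "pro"}  # hypothetical stand-in for a DB lookup

def fake_model(messages: list) -> dict:
    # Scripted stand-in for Claude: request a tool first, answer once a result is back.
    if isinstance(messages[-1]["content"], str):
        return {"stop_reason": "tool_use", "content": [
            {"type": "tool_use", "id": "toolu_1", "name": "lookup_customer",
             "input": {"customer_id": "c42"}}]}
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": "Customer c42 is on the pro plan."}]}

history = [{"role": "user", "content": "What plan is customer c42 on?"}]
final = run_agent(fake_model, {"lookup_customer": lookup_customer}, history)
print(final["content"][0]["text"])  # Customer c42 is on the pro plan.
```

The `max_iterations` cap turns a model that keeps requesting tools into a clear error instead of an unbounded bill.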

Common mistake: treating tools like RPC. A tool's docstring is the prompt. Write it like a senior engineer explaining the API to a smart junior — what it does, what each param means, what edge cases return. Sloppy docstrings = Claude calling the wrong tool with the wrong args.
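To see why the docstring matters, look at what the model actually receives. It gets a tool definition with a `name`, a `description`, and a JSON Schema `input_schema`. The sketch below builds that definition from a function's signature and a Google-style `Args:` docstring. It approximates what decorators such as `@beta_tool` do; it is not the SDK's implementation, and `refund` is a hypothetical tool.

```python
import inspect

_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", list: "array", dict: "object"}

def tool_schema(fn) -> dict:
    """Build a Messages-API-style tool definition from a function (one-line Args entries only)."""
    doc = inspect.getdoc(fn) or ""
    summary, _, args_section = doc.partition("\nArgs:\n")
    param_docs = {}
    for line in args_section.splitlines():
        name, sep, desc = line.strip().partition(":")
        if sep:
            param_docs[name.strip()] = desc.strip()
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string".
        prop = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if name in param_docs:
            prop["description"] = param_docs[name]
        properties[name] = prop
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"name": fn.__name__, "description": summary.strip(),
            "input_schema": {"type": "object", "properties": properties, "required": required}}

def refund(order_id: str, amount_cents: int, reason: str = "customer_request") -> dict:
    """Refund part or all of a paid order.

    Use only after confirming the order belongs to the customer in this
    conversation. Returns the refund record; raises if the order is unpaid.

    Args:
        order_id: Order identifier such as "ord_123", not the customer id.
        amount_cents: Amount to refund in cents; must not exceed the order total.
        reason: Short machine-readable reason code.
    """
    raise NotImplementedError  # hypothetical tool, never called here

print(tool_schema(refund))
```

Every word of the summary and the `Args:` lines ends up in the prompt. That is why "not the customer id" prevents a wrong-argument call that a bare `order_id: str` would invite.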

RAG: when retrieval helps, when it doesn't

The 2026 RAG recipe most teams converge on:

1. Chunk at 400 tokens, 80-token overlap. PDFs need layout-aware splitting; plain text doesn't.

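2. Embed with voyage-3 through Voyage AI's own API. Anthropic doesn't serve embedding models, so its Batches API (50% discount, 24h SLA) doesn't apply to this step. Offline indexing has no latency pressure, so send many texts per embedding request.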

3. Store in pgvector — works at 10M chunks on a single Postgres, beats most dedicated vector DBs in ops simplicity.

4. Retrieve BM25 top-50 + dense top-50, fuse via Reciprocal Rank Fusion (RRF, k=60), rerank top-10 with a cross-encoder, return top-5 to the LLM.

5. MUST filter on tenant_id server-side if you're multi-tenant. One forgotten filter = one cross-tenant leak = breach.
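Steps 1 and 4 are small enough to sketch directly. This is a simplified illustration under a few assumptions. Whitespace-split words stand in for real tokenizer output, so "400 tokens" here means 400 list items. The retriever outputs are hypothetical doc ids, and the cross-encoder rerank and tenant filter are not shown. RRF scores each document as the sum of 1 / (k + rank) over every ranking it appears in, with ranks starting at 1.

```python
from collections import defaultdict

def chunk_tokens(tokens: list, size: int = 400, overlap: int = 80) -> list:
    """Sliding window: each chunk repeats the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window already reaches the end
            break
    return chunks

def rrf(rankings: list, k: int = 60) -> list:
    """Reciprocal Rank Fusion over several ranked lists of doc ids, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

words = [f"w{i}" for i in range(1000)]  # stand-in for tokenizer output
chunks = chunk_tokens(words)
print(len(chunks), [c[0] for c in chunks])  # 3 ['w0', 'w320', 'w640']

bm25_top = ["doc_a", "doc_b", "doc_c"]   # in practice: BM25 top-50
dense_top = ["doc_b", "doc_c", "doc_d"]  # in practice: dense top-50
print(rrf([bm25_top, dense_top]))  # ['doc_b', 'doc_c', 'doc_a', 'doc_d']
```

RRF uses only ranks, so BM25 scores and cosine similarities never need to be put on a common scale. Documents that both retrievers rank highly rise to the top.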

When RAG hurts: if the answer needs cross-document synthesis, RAG retrieves chunks independently and the LLM can't see the relationships. For those cases, 1M context windows + prompt caching often beat RAG — and you don't get the "lost in the middle" problem because Claude 4.6 attention is good across the full window.

Files API: chat with your document

When the same user asks 10 questions about the same 50-page PDF, you don't want to re-upload it 10 times.

```python
# Upload the document once.
with open("contract.pdf", "rb") as f:
    file = client.beta.files.upload(file=("contract.pdf", f, "application/pdf"))

# Reference by file_id in subsequent messages
resp = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    betas=["files-api-2025-04-14"],  # Files API beta header
    messages=[{"role": "user", "content": [
        {"type": "document", "source": {"type": "file", "file_id": file.id}},
        {"type": "text", "text": "Summarize section 3."},
    ]}],
)
```

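Pairs well with prompt caching. Put `cache_control` on the document block, and the system prompt, glossary, and document reference become a single cacheable prefix. Over a multi-question session, you pay for that prefix once at the cache-write rate (1.25x the base input price). Each follow-up then costs about 10% of the base price.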
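Run the caching arithmetic on your own numbers. This is a minimal sketch with stated assumptions: the article's Sonnet input price of $3 per million tokens, the 5-minute cache's 1.25x write and 0.1x read multipliers, and a hypothetical 40,000-token document prefix. Per-question tokens and output tokens are ignored.

```python
BASE_INPUT_PER_MTOK = 3.00  # $ per million input tokens (Sonnet price quoted in this article)
CACHE_WRITE_MULT = 1.25     # 5-minute cache write: 1.25x base input price
CACHE_READ_MULT = 0.10      # cache hit: 0.1x base input price

def prefix_cost(prefix_tokens: int, questions: int, use_cache: bool) -> float:
    """Dollars spent re-sending the same document prefix once per question."""
    per_call = prefix_tokens * BASE_INPUT_PER_MTOK / 1_000_000
    if not use_cache or questions == 0:
        return per_call * questions
    # The first question writes the cache; every follow-up reads it.
    return per_call * CACHE_WRITE_MULT + per_call * CACHE_READ_MULT * (questions - 1)

DOC_TOKENS = 40_000  # hypothetical size of a ~50-page PDF plus system prompt
no_cache = prefix_cost(DOC_TOKENS, 10, use_cache=False)
with_cache = prefix_cost(DOC_TOKENS, 10, use_cache=True)
print(f"10 questions: ${no_cache:.3f} uncached vs ${with_cache:.3f} cached")
```

With a single question, caching costs slightly more because of the write premium. As follow-ups accumulate, the saving on the prefix approaches 90%.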

Streaming: not optional past 1 second

If a request takes more than 1 second of perceived latency, stream it:

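```python
with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=[...],
) as stream:
    for text in stream.text_stream:
        yield_to_user(text)  # your transport, e.g. an SSE response
    final = stream.get_final_message()  # same shape as a messages.create() result
```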

Use `get_final_message()` to get the same shape as a non-streamed response, so your downstream code doesn't need to branch on streaming vs not. The helper correctly handles interleaved text and tool blocks.

Server-Sent Events is the wire format. For voice agents, time-to-first-token is the perceived-quality metric. Streaming + prompt caching + low effort takes Claude TTFT from ~2s to <700ms — the gap between "feels alive" and "feels broken."
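TTFT is easy to measure yourself. This is a minimal sketch built on a hypothetical `timed_stream` wrapper around any iterator of text chunks, such as the SDK's `stream.text_stream`. Take the start timestamp before the request is sent; otherwise the measurement misses network and queueing time. `fake_text_stream` simulates a slow first token so the sketch runs offline.

```python
import time
from typing import Callable, Iterable, Iterator

def timed_stream(chunks: Iterable[str], started_at: float,
                 on_first_token: Callable[[float], None]) -> Iterator[str]:
    """Yield chunks unchanged; report seconds from `started_at` to the first chunk."""
    reported = False
    for chunk in chunks:
        if not reported:
            on_first_token(time.perf_counter() - started_at)
            reported = True
        yield chunk

# With the SDK, take the timestamp just before opening the stream, e.g.:
#   started = time.perf_counter()
#   with client.messages.stream(...) as stream:
#       for text in timed_stream(stream.text_stream, started, record_ttft): ...

def fake_text_stream() -> Iterator[str]:
    time.sleep(0.05)  # pretend the first token takes ~50 ms
    yield "Hello"
    yield ", world"

ttfts = []
started = time.perf_counter()
text = "".join(timed_stream(fake_text_stream(), started, ttfts.append))
print(text, f"(TTFT {ttfts[0] * 1000:.0f} ms)")
```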

Effort + adaptive thinking: cost knob you can actually tune

```python
client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=[...],
    output_config={"effort": "medium"},  # low / medium / high / max
    thinking={"type": "adaptive"},
)
```

Effort levels: low / medium / high / max. Default is high. medium is often the sweet spot — quality usually unchanged, 30-50% fewer tokens. Run your eval suite against effort=high vs effort=medium; if scores match, ship medium and save the money.

`max` is Opus-only and reserved for correctness-critical paths: code review, legal doc analysis, escalated support.

Multi-agent: fan-out + fan-in

The pattern that comes up over and over:

```python
import asyncio

# planner, worker, synthesizer: your own async wrappers around client calls
async def orchestrate(user_request):
    # Planner: 1 call, Sonnet, deep thinking
    plan = await planner.run(user_request)

    # Workers: N parallel, Haiku, low effort
    results = await asyncio.gather(*[
        worker.run(subtask, effort="low") for subtask in plan.subtasks
    ])

    # Synthesizer: 1 call, Sonnet, medium effort
    return await synthesizer.run(user_request, results)
```

Use cases: process 500 customer-support tickets, summarize 50 source documents, write 100 product descriptions. The planner picks the structure (worth Sonnet's quality), Haiku workers do the parallelizable work (cheap, fast), Sonnet synthesizes (worth the quality again).

Guards: `max_iterations` on each worker, a total `max_tokens` per request, and a `timeout` on the whole orchestration. One stuck worker shouldn't melt the budget.
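You can sketch the orchestration-level guards with plain `asyncio`. `fan_out` is a hypothetical helper, and `worker` is your own async function that calls Claude (here replaced by `demo_worker`). Per-worker `max_iterations` and token limits would live inside that worker. A semaphore caps parallel calls, and `asyncio.wait_for` bounds both each worker and the whole job. `return_exceptions=True` keeps one failure from cancelling its siblings.

```python
import asyncio
from typing import Awaitable, Callable, List

async def fan_out(subtasks: List[str], worker: Callable[[str], Awaitable[str]],
                  max_concurrency: int = 8, worker_timeout: float = 30.0,
                  total_timeout: float = 120.0) -> list:
    """Run worker(subtask) for each subtask under per-worker and whole-job time limits.

    A failed or timed-out worker comes back as an exception object in its slot
    instead of cancelling its siblings; exceeding total_timeout raises
    asyncio.TimeoutError.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(subtask: str) -> str:
        async with sem:  # cap parallel calls (rate limits, spend)
            return await asyncio.wait_for(worker(subtask), timeout=worker_timeout)

    jobs = asyncio.gather(*(guarded(s) for s in subtasks), return_exceptions=True)
    return await asyncio.wait_for(jobs, timeout=total_timeout)

async def demo_worker(subtask: str) -> str:
    # Hypothetical stand-in for a low-effort Haiku call; "stuck" never finishes in time.
    await asyncio.sleep(10 if subtask == "stuck" else 0.01)
    return subtask.upper()

results = asyncio.run(fan_out(["a", "stuck", "b"], demo_worker, worker_timeout=0.2))
print(results)  # the stuck slot holds an asyncio.TimeoutError; "A" and "B" survive
```

The synthesizer can then decide whether a missing slot is acceptable or whether the whole request should fail.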

Evals: the only thing that prevents quality drift

"We tweaked the prompt and it feels better" is how senior teams ship regressions. The fix is a versioned eval suite gated in CI:

1. Golden set — 50-300 labeled examples covering happy + adversarial cases.

2. Metric — exact match for classification, embedding similarity for paraphrase, LLM-as-judge for open-ended (calibrated against humans for a subset).

3. Run on every PR. Block merges that regress by >2% without explicit override.

4. Track over time. Score should be flat or rising; flat is fine.

Without this, prompts silently regress over months. Tutorials skip it because it's boring infrastructure. The teams that ship most reliably do nothing else differently — they just actually have evals.
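Here is a minimal sketch of eval steps 2 and 3 for the classification case. It makes a few assumptions. The metric is exact match after trimming and case-folding. "Regress by >2%" is read as a drop of more than 2 percentage points; use a relative threshold if that matches your policy better. Embedding-similarity and LLM-as-judge metrics are not shown, and the labels are hypothetical.

```python
from typing import Sequence

def exact_match(preds: Sequence[str], golds: Sequence[str]) -> float:
    """Fraction of examples whose prediction equals the label after trimming and case-folding."""
    if not golds or len(preds) != len(golds):
        raise ValueError("need non-empty, equal-length prediction and label lists")
    hits = sum(p.strip().casefold() == g.strip().casefold() for p, g in zip(preds, golds))
    return hits / len(golds)

def gate(baseline: float, candidate: float, max_drop: float = 0.02,
         override: bool = False) -> bool:
    """True if the PR may merge: the score fell by at most max_drop (absolute), or a human overrode."""
    return override or (baseline - candidate) <= max_drop

golds = ["refund", "billing", "bug", "refund", "other"]
baseline = exact_match(["refund", "billing", "bug", "refund", "bug"], golds)   # 4 of 5 correct
candidate = exact_match(["Refund", "billing", "bug", "other", "bug"], golds)   # 3 of 5 correct
print(gate(baseline, candidate))                 # False: a 20-point drop blocks the merge
print(gate(baseline, candidate, override=True))  # True: explicit override
```

In CI, compute `candidate` on the PR branch and load `baseline` from the last merged run. Fail the job when `gate(...)` returns False.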

Compaction: agents that run for hours

Beta feature on 4.6 models: set the `compact-2026-01-12` beta header, and the API summarizes earlier context when total tokens approach 150K. Critical: append the full `response.content` (not just the text string) back into your messages. Compaction blocks are opaque server-side markers. If you strip them, the next turn re-sends the uncompacted history and overflows context.

With compaction on, a research agent or coding agent can run for 500+ turns without you implementing manual summarization. Without it, you implement summarization yourself and it's never as good as Claude's.

What we left out (and why)

- LangChain. Many teams have removed it; the SDK plus its tool runner does the same thing with less indirection.
- OpenAI compatibility shims. Either you're an Anthropic shop or you're not. Mixed wrappers paper over real differences (Anthropic content blocks vs OpenAI strings, tool result shape, etc.), and you find out the hard way.
- Dedicated vector DBs. Pinecone, Weaviate, and Qdrant are fine, but pgvector is enough for most teams up to ~50M vectors, and the ops story is "you already run Postgres."
- Self-hosted models. Llama 4 is great, but the combined price, latency, quality, and eval-ops cost is hard to beat with Claude Sonnet at $3/$15 per million input/output tokens.

The full stack, in one diagram

| Layer | Choice |
|---|---|
| Model routing | Haiku 4.5 (triage) → Sonnet 4.6 (bulk) → Opus 4.6 (hard) |
| Thinking | adaptive (no budget_tokens) |
| Effort | medium default, max for correctness-critical |
| Caching | cache_control at end of system prompt |
| Tools | SDK tool runner, well-documented signatures |
| RAG | pgvector + BM25 + RRF + cross-encoder rerank |
| Documents | Files API + prompt caching |
| Streaming | always, with get_final_message() |
| Multi-step | fan-out planner + Haiku workers + Sonnet synthesizer |
| Long sessions | compaction (beta) |
| Quality | versioned eval suite in CI |
| Offline jobs | Batches API (50% discount, 24h SLA) |
| Voice | STT + streaming Claude + TTS, <700ms TTFT |

Want to learn this end-to-end?

Our AI Engineering track covers 100 lessons across 6 modules — API fundamentals, tool use, RAG, agent loops, production AI, and the frontier topics above (vision, Files API, Skills API, Batches API, voice agents, multi-agent orchestration). Capstone is a multi-modal customer-support agent with all the production constraints in this article. First 15 lessons free, no signup.
