

Updated 2026-05

AI Engineering cheatsheet — Claude API, tools, RAG, evals

AI engineering patterns from the dedicated track, focused on building real systems with Claude rather than generic LLM trivia.

Claude API basics

First call

Single-turn message via the Python SDK. Anthropic() reads the API key from the ANTHROPIC_API_KEY environment variable.

from anthropic import Anthropic
client = Anthropic()
resp = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(resp.content[0].text)

System prompt

Set the persona and rules once through the top-level system parameter, not as a message in the list.

client.messages.create(
    model="claude-opus-4-6",
    system="You are a senior Python tutor. Be terse.",
    messages=[{"role": "user", "content": "What does GIL stand for?"}],
    max_tokens=200,
)

Multi-turn

The API is stateless: conversation state lives in the messages list, which you resend in full on every call.

history = [
    {"role": "user",      "content": "Define recursion."},
    {"role": "assistant", "content": "..."},
    {"role": "user",      "content": "Now show an example."},
]
resp = client.messages.create(model="claude-opus-4-6", messages=history, max_tokens=400)
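A minimal sketch of the bookkeeping this implies: append the user turn, send the whole list, then append the assistant's reply before the next turn. The send argument is a hypothetical stand-in for a real client.messages.create(...) call that returns the reply text, so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(history: list[Message], user_text: str,
              send: Callable[[list[Message]], str]) -> str:
    """Append a user turn, send the full history, then append and return the reply."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)  # hypothetical stand-in for client.messages.create(...).content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply


# Usage with a fake model that reports how many messages it was sent.
def fake_send(msgs: list[Message]) -> str:
    return f"seen {len(msgs)}"


history: list[Message] = []
chat_turn(history, "Define recursion.", fake_send)
chat_turn(history, "Now show an example.", fake_send)
print([m["content"] for m in history])
# ['Define recursion.', 'seen 1', 'Now show an example.', 'seen 3']
```

The second call sees three messages, which is the point: the model only knows about earlier turns because you sent them again.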

Streaming

Render tokens as they arrive — better UX for long outputs.

with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[...],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
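In a multi-turn app you still need the complete reply to append to the history after streaming. The SDK's stream object can hand back the assembled message with stream.get_final_message(); the plain-Python sketch below shows the same "print as you go, keep the full text" idea over any iterable of text chunks. The list passed in is a stand-in for stream.text_stream, and render_stream is a name chosen for this example.

```python
import sys
from typing import Iterable, TextIO


def render_stream(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each text chunk as it arrives and return the full reply."""
    parts: list[str] = []
    for text in chunks:
        out.write(text)
        out.flush()  # make partial output visible immediately
        parts.append(text)
    out.write("\n")
    return "".join(parts)


# Usage: a plain list stands in for stream.text_stream.
full = render_stream(["RAG ", "retrieves ", "context ", "first."])
reply_msg = {"role": "assistant", "content": full}  # ready to append to history
```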

Tool use

Define a tool

Each tool is a name, a description, and a JSON Schema for its input. Claude decides when to request a call; your code runs it.

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]
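Claude generates tool input to match input_schema, but checking it before running anything with side effects is cheap insurance. A deliberately tiny sketch, assuming only required keys and top-level primitive types matter; check_tool_input and JSON_TYPES are names invented here, and for full JSON Schema validation a library such as jsonschema (jsonschema.validate) is the usual choice.

```python
# JSON Schema primitive type names -> Python types (top-level properties only).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_tool_input(schema: dict, args: dict) -> list[str]:
    """Return problems found in a tool_use input; an empty list means it passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required field: {key}")
    properties = schema.get("properties", {})
    for key, value in args.items():
        expected_name = properties.get(key, {}).get("type")
        if not isinstance(expected_name, str) or expected_name not in JSON_TYPES:
            continue  # unknown field, untyped, or union type: not checked in this sketch
        expected = JSON_TYPES[expected_name]
        # bool is a subclass of int in Python, but JSON Schema treats them as distinct.
        bool_as_number = isinstance(value, bool) and expected_name in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"{key} should be {expected_name}")
    return problems


weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
print(check_tool_input(weather_schema, {"city": "Paris"}))  # []
print(check_tool_input(weather_schema, {"city": 3}))  # ['city should be string']
```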

Tool-use loop

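Keep calling while stop_reason is "tool_use"; the loop ends on any other stop reason (normally "end_turn"). Every tool_use block in a turn needs a matching tool_result in the next user message.

while True:
    resp = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=history,
    )
    history.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason != "tool_use":
        break  # usually "end_turn": the final answer is in resp.content
    # Claude can request several tools in one turn; answer each tool_use block.
    results = []
    for block in resp.content:
        if block.type == "tool_use":
            output = call_tool(block.name, block.input)  # your own dispatcher
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": str(output),
            })
    history.append({"role": "user", "content": results})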
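call_tool in the loop is left to you. One common shape is a registry mapping each tool name to a Python function, always returning a string and turning exceptions into an error message Claude can read instead of crashing the loop (the API also accepts "is_error": True on a tool_result block to flag failures). A sketch, assuming a hypothetical get_weather implementation that returns canned data:

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict:
    """Hypothetical tool implementation; a real one would call a weather service."""
    return {"city": city, "temp_c": 18, "conditions": "cloudy"}


# Registry: tool name (as declared in `tools`) -> Python function.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def call_tool(name: str, tool_input: dict) -> str:
    """Run the named tool with Claude's input and return text for the tool_result."""
    fn = TOOLS.get(name)
    if fn is None:
        return f"Error: unknown tool {name!r}"
    try:
        result = fn(**tool_input)
    except Exception as exc:  # report the failure to Claude instead of raising
        return f"Error: {type(exc).__name__}: {exc}"
    return result if isinstance(result, str) else json.dumps(result)


print(call_tool("get_weather", {"city": "Oslo"}))
```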

RAG essentials

Chunk + embed

Split docs into chunks, embed each.

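from voyageai import Client
vo = Client()  # reads VOYAGE_API_KEY from the environment
chunks = split_into_chunks(text, target_chars=1500)  # your own chunker, not a library call
embs = vo.embed(chunks, model="voyage-3", input_type="document").embeddings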
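split_into_chunks is not a library function. A minimal paragraph-aware sketch, assuming paragraphs are separated by blank lines and should be packed greedily up to target_chars characters, with any single oversized paragraph hard-split:

```python
def split_into_chunks(text: str, target_chars: int = 1500) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of <= target_chars."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split paragraphs that are too long on their own.
        pieces = [para[i:i + target_chars] for i in range(0, len(para), target_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= target_chars:
                current = candidate
            else:
                chunks.append(current)  # current is non-empty here
                current = piece
    if current:
        chunks.append(current)
    return chunks


chunks = split_into_chunks("a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 10, target_chars=25)
print([len(c) for c in chunks])  # [22, 10]
```

Production chunkers often also overlap neighbouring chunks and split on sentence or token boundaries; this sketch does neither.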

Cosine search

Top-K relevant chunks for a query.

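import numpy as np
q = vo.embed([question], model="voyage-3", input_type="query").embeddings[0]
# Voyage embeddings are normalized to length 1, so the dot product is the cosine similarity.
sims = np.array(embs) @ np.array(q)
top = np.argsort(sims)[::-1][:5]  # indices of the 5 most similar chunks
context = "\n\n".join(chunks[i] for i in top)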
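If your embedding model does not return unit-length vectors, normalize explicitly. A model-agnostic sketch of the same top-K search; top_k_cosine is a name chosen here, and toy 2-D vectors stand in for real embeddings. np.argpartition picks the K best without sorting every score, then only those K are sorted.

```python
import numpy as np


def top_k_cosine(doc_embs: np.ndarray, query_emb: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of doc_embs most cosine-similar to query_emb, best first."""
    docs = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)  # unit-length rows
    q = query_emb / np.linalg.norm(query_emb)
    sims = docs @ q  # cosine similarity per document
    k = min(k, len(sims))
    top = np.argpartition(-sims, k - 1)[:k]  # the k best, in arbitrary order
    return top[np.argsort(-sims[top])]  # sorted best-first


embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
print(top_k_cosine(embs, np.array([1.0, 0.1]), k=2))  # [0 2]
```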

Anchor in prompt

Tell the model to answer only from the retrieved context and to cite the part of it that informed the answer.

prompt = f"""Answer using ONLY the context below. Quote the relevant sentence verbatim.

Context:
{context}

Question: {question}"""
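Citations are easier to check when each chunk carries a label the model can point to. A sketch that numbers the retrieved chunks and asks for [n] citations; build_grounded_prompt, the bracket format, and the wording are illustrative choices, not a format the API requires.

```python
def build_grounded_prompt(chunks: list[str], question: str) -> str:
    """Number each retrieved chunk so the answer can cite it as [n]."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using ONLY the context below. After each claim, cite the chunk "
        "number in brackets, e.g. [2], and quote the supporting sentence verbatim. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


prompt = build_grounded_prompt(
    ["RAG retrieves documents.", "Embeddings map text to vectors."], "What is RAG?"
)
print(prompt)
```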

Evals

Golden eval suite

A frozen set of questions with required keywords, re-scored on every prompt iteration.

GOLD = [
    {"q": "What is RAG?", "must_include": ["retrieval", "augment"]},
    {"q": "How does the tool-use loop terminate?", "must_include": ["end_turn", "stop_reason"]},
]
for case in GOLD:
    out = ask(case["q"])  # your own wrapper: prompt under test + model call, returns text
    case["pass"] = all(kw.lower() in out.lower() for kw in case["must_include"])

Score = pass rate

Track this number per prompt version.

score = sum(c["pass"] for c in GOLD) / len(GOLD)
print(f"Eval score: {score:.0%}")
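To compare prompt versions side by side, the same keyword check can take the ask function as a parameter and return the pass rate without mutating the gold set. A sketch; run_evals is a name chosen here, and prompt_v1 and prompt_v2 are hypothetical stubs returning canned answers in place of real model calls.

```python
from typing import Callable


def run_evals(gold: list[dict], ask: Callable[[str], str]) -> float:
    """Return the fraction of cases whose answer contains every required keyword."""
    passed = 0
    for case in gold:
        out = ask(case["q"]).lower()
        if all(kw.lower() in out for kw in case["must_include"]):
            passed += 1
    return passed / len(gold)


GOLD = [
    {"q": "What is RAG?", "must_include": ["retrieval", "augment"]},
    {"q": "How does the tool-use loop terminate?", "must_include": ["end_turn", "stop_reason"]},
]


def prompt_v1(q: str) -> str:  # stub: always gives the RAG answer
    return "Retrieval-augmented generation."


def prompt_v2(q: str) -> str:  # stub: answers both questions
    if "RAG" in q:
        return "Retrieval-augmented generation."
    return "It stops when stop_reason is end_turn."


for name, ask in [("v1", prompt_v1), ("v2", prompt_v2)]:
    print(f"{name}: {run_evals(GOLD, ask):.0%}")
# v1: 50%
# v2: 100%
```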

Production

Prompt caching

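Mark the long, stable prefix (system prompt, tool definitions, reference documents) with cache_control so repeated calls reuse it. Cache reads are billed at a fraction of the normal input price (roughly 10× cheaper), but only for the cached tokens: cache writes cost somewhat more, output tokens are billed as usual, and prefixes below a model-specific minimum length are not cached.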

client.messages.create(
    model="claude-opus-4-6",
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[...],
    max_tokens=1024,
)
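A back-of-the-envelope sketch of what caching saves on input tokens, assuming Anthropic's published multipliers for the default 5-minute cache (writes at 1.25× the base input price, reads at 0.1×) and ignoring output tokens and cache expiry. relative_input_cost is a name chosen here, not an SDK function.

```python
def relative_input_cost(prefix_tokens: int, suffix_tokens: int, calls: int,
                        write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Input cost with caching divided by input cost without it (same base price)."""
    uncached = calls * (prefix_tokens + suffix_tokens)
    # First call writes the prefix to the cache; later calls read it.
    cached = (prefix_tokens * write_mult
              + (calls - 1) * prefix_tokens * read_mult
              + calls * suffix_tokens)
    return cached / uncached


ratio = relative_input_cost(prefix_tokens=10_000, suffix_tokens=200, calls=50)
print(f"{ratio:.2f}")  # 0.14
```

Under these assumptions, 50 calls over a 10k-token prefix cost about 14% of the uncached input bill, while a single call costs 25% more than not caching at all.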

Retry on overload

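Exponential backoff for 429 (rate limited) and 529 (overloaded) responses. The Python SDK already retries these twice by default; pass Anthropic(max_retries=...) to change that.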

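import random
import time

import anthropic

def create_with_retry(client, max_attempts=5, **kwargs):
    """messages.create with exponential backoff on 429 (rate limit) and 529 (overloaded)."""
    for attempt in range(max_attempts):
        try:
            return client.messages.create(**kwargs)
        except anthropic.APIStatusError as e:
            if e.status_code not in (429, 529) or attempt == max_attempts - 1:
                raise  # other errors, or out of attempts
            time.sleep(2 ** attempt + random.random())  # ~1s, 2s, 4s, ... plus jitter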
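A generic, testable variant: capped exponential backoff with "full jitter" (a random delay up to the exponential bound) so many clients do not retry in lockstep, and an injectable sleep so the retry logic can be unit-tested without waiting. retry_call, backoff_delay, and the base/cap values are choices made for this sketch, not SDK APIs.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def retry_call(fn: Callable[[], T], is_retryable: Callable[[Exception], bool],
               max_attempts: int = 5, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying retryable errors with backoff; re-raise the last error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: a fake call that fails twice, then succeeds; sleeps are recorded, not taken.
attempts: list[int] = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated overload")
    return "ok"


delays: list[float] = []
result = retry_call(flaky, lambda e: isinstance(e, ConnectionError), sleep=delays.append)
print(result, len(attempts), len(delays))  # ok 3 2
```

With the SDK, is_retryable would check isinstance(e, anthropic.APIStatusError) and e.status_code in (429, 529).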
