AI Engineering cheatsheet — Claude API, tools, RAG, evals
AI Engineering patterns from the dedicated track. Focused on building real systems with Claude — not generic LLM trivia.
Claude API basics
First call
Single-turn message via the Python SDK.
from anthropic import Anthropic
client = Anthropic()
resp = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(resp.content[0].text)
System prompt
Set persona / rules once at the top.
client.messages.create(
    model="claude-opus-4-6",
    system="You are a senior Python tutor. Be terse.",
    messages=[{"role": "user", "content": "What does GIL stand for?"}],
    max_tokens=200,
)
Multi-turn
Conversation state lives in the messages list.
history = [
    {"role": "user", "content": "Define recursion."},
    {"role": "assistant", "content": "..."},
    {"role": "user", "content": "Now show an example."},
]
resp = client.messages.create(model="claude-opus-4-6", messages=history, max_tokens=400)
Streaming
Render tokens as they arrive — better UX for long outputs.
with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[...],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
Tool use
Define a tool
Tools are JSON schemas Claude can request.
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]
Tool-use loop
Keep calling until stop_reason is no longer "tool_use" (normally "end_turn"). Answer every tool_use block in the response, since Claude may request several tools at once.
while True:
    resp = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=history,
    )
    history.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason != "tool_use":
        break
    # Return one tool_result per tool_use block, all in a single user message.
    results = [
        {"type": "tool_result", "tool_use_id": b.id, "content": call_tool(b.name, b.input)}
        for b in resp.content if b.type == "tool_use"
    ]
    history.append({"role": "user", "content": results})
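The loop relies on a call_tool helper that the cheatsheet does not define. A minimal sketch of one way to write it, assuming a plain dict registry; TOOL_REGISTRY and the canned get_weather body are hypothetical placeholders, not part of any SDK:

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical stand-in: a real implementation would call a weather service.
    return json.dumps({"city": city, "temp_c": 21, "conditions": "clear"})

# Map tool names (as declared in the tools list) to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}

def call_tool(name: str, tool_input: dict) -> str:
    """Run the named tool and return its result as a string for a tool_result block."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return f"Error: unknown tool {name!r}"
    try:
        return handler(**tool_input)
    except Exception as exc:
        # Report the failure to the model instead of crashing the loop.
        return f"Error: {exc}"
```

Returning errors as text keeps the conversation going, so the model can retry with corrected arguments or explain the failure to the user.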
RAG essentials
Chunk + embed
Split docs into chunks, embed each.
from voyageai import Client
vo = Client()
chunks = split_into_chunks(text, target_chars=1500)
embs = vo.embed(chunks, model="voyage-3").embeddings
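split_into_chunks is not a library function; it is left for you to write. One possible sketch, assuming paragraphs are separated by blank lines and that greedy packing up to target_chars is good enough for your documents:

```python
def split_into_chunks(text: str, target_chars: int = 1500) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly target_chars characters."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the target.
        if current and len(current) + len(para) + 2 > target_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

A single paragraph longer than target_chars is kept whole in this sketch, so real pipelines may want a second pass that splits oversized paragraphs on sentence boundaries.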
Cosine search
Top-K relevant chunks for a query.
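import numpy as np
E = np.array(embs)  # the embeddings come back as plain lists; shape (n_chunks, dim)
q = np.array(vo.embed([question], model="voyage-3").embeddings[0])
# Divide by the norms so this is true cosine similarity even if vectors are not unit-length.
sims = (E @ q) / (np.linalg.norm(E, axis=1) * np.linalg.norm(q))
top = np.argsort(sims)[::-1][:5]
context = "\n\n".join(chunks[i] for i in top)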
Anchor in prompt
Always cite what part of context informed the answer.
prompt = f"""Answer using ONLY the context below. Quote the relevant sentence verbatim.
Context:
{context}
Question: {question}"""
Evals
Golden eval suite
Frozen Q&A pairs, re-scored on every prompt iteration.
GOLD = [
    {"q": "What is RAG?", "must_include": ["retrieval", "augment"]},
    {"q": "How does the tool-use loop terminate?", "must_include": ["end_turn", "stop_reason"]},
]
for case in GOLD:
    out = ask(case["q"])  # ask() is your own wrapper around client.messages.create
    case["pass"] = all(kw.lower() in out.lower() for kw in case["must_include"])
Score = pass rate
Track this number per prompt version.
score = sum(c["pass"] for c in GOLD) / len(GOLD)
print(f"Eval score: {score:.0%}")
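To track the score per prompt version, it can help to run the same frozen suite against several variants side by side. A rough sketch, assuming each variant is a callable that takes a question and returns the model's text; run_suite and compare_versions are illustrative names, not part of any library:

```python
from typing import Callable

def run_suite(ask_fn: Callable[[str], str], gold: list[dict]) -> float:
    """Return the fraction of cases whose output contains every required keyword."""
    passed = 0
    for case in gold:
        out = ask_fn(case["q"]).lower()
        if all(kw.lower() in out for kw in case["must_include"]):
            passed += 1
    return passed / len(gold)

def compare_versions(variants: dict[str, Callable[[str], str]], gold: list[dict]) -> dict[str, float]:
    """Score each named prompt variant against the same frozen suite."""
    return {name: run_suite(fn, gold) for name, fn in variants.items()}
```

Because the suite is passed in rather than mutated, the same GOLD list can be reused across versions without stale pass flags carrying over.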
Production
Prompt caching
Cache the stable prefix so repeated calls read it back at roughly a tenth of the normal input-token price.
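A minimal sketch of marking a stable prefix as cacheable, assuming the prefix lives in the system prompt. build_cached_request is a hypothetical helper that only assembles keyword arguments; the cache_control marker on the system block is what requests caching:

```python
def build_cached_request(stable_prefix: str, question: str, model: str = "claude-opus-4-6") -> dict:
    """Build keyword arguments for client.messages.create with a cacheable system prefix."""
    return {
        "model": model,
        "max_tokens": 1024,
        # cache_control asks the API to cache everything up to and including this block.
        "system": [{
            "type": "text",
            "text": stable_prefix,
            "cache_control": {"type": "ephemeral"},
        }],
        # Only the part after the cached prefix changes between calls.
        "messages": [{"role": "user", "content": question}],
    }

# Usage (needs a configured client; LONG_REFERENCE_TEXT is a placeholder):
# resp = client.messages.create(**build_cached_request(LONG_REFERENCE_TEXT, "..."))
# resp.usage.cache_read_input_tokens reports how many input tokens came from cache.
```

Prefixes shorter than the model's minimum cacheable length are processed normally, so checking the usage fields is the reliable way to confirm a cache hit.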
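Retry with backoff
Retry rate-limit (429) and overloaded (529) errors with exponential backoff.
import time
import anthropic
def create_with_retry(**kwargs):
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except anthropic.APIStatusError as e:
            # Give up on other status codes, or once the last attempt has failed.
            if e.status_code not in (429, 529) or attempt == 4:
                raise
            time.sleep(2 ** attempt)  # waits 1s, 2s, 4s, 8s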