AI Engineering Patterns · the catalogue
Vol. I · No. 01 · May 2026 · CC BY 4.0

Patterns for building with agents.

Eight sections, thirty-seven pages. The durable lessons from building agentic systems — how to structure context, when to trust which model, how multiple agents coordinate without falling over, and how to instrument the whole thing so you can tell when it's broken. Nothing here is theoretical.

— start here

5 minutes to a working mental model.

"Eight sections is a lot. The quickstart routes you to the three pages worth reading first."

Not sure where to start? The QUICKSTART gives you four reading paths based on who you are: engineer, engineering leader, platform builder, or safety-focused. Each path is exactly three pages, ~15 minutes, calibrated to give you a working model of what's happening before you commit to deeper reading.

Need a term defined? See the GLOSSARY — single-page lookup for every load-bearing term used across the repo.

— for leaders

A leader's memo.

"The bottleneck moved from typing to specifying. Most teams have not yet adjusted."

A one-page primer for managers, directors, and execs who are not coding day-to-day but need to make decisions about tooling, hiring, and strategy. Five questions to ask your teams. Where the leverage is. Where the risk is. Read the memo →

Or read it as markdown.

— what's new in v0.5

Seven new pages from unmined source material.

"The repo's strength is being focused. The right move was surgical additions, not a content dump."

Multi-agent (sec 03): Reflection Loops (generate → critique → revise) · Framework Selection (LangGraph / CrewAI / AutoGen decision matrix).

Automation (sec 04): Prompt Injection — the structural vulnerability every agentic system has. Required reading before letting an agent run with real authority.

Tools (sec 07): Tool Description as Prompt — the most load-bearing sub-pattern of tool design, promoted to its own page.

Platform (sec 02): Cross-Platform Parity — Mac / Linux / Windows / WSL discipline.

Resources (sec 08): Shell and Terminal Tools · Cloud and Deployment Tools — deep references for the tooling that hosts agentic work.

Earlier additions (v0.4): 6 Mermaid diagrams + 5 war stories. See CHANGELOG for the full history.

— 01

Concepts — the vocabulary

If you've sat in a meeting and people were saying "MCP" and "context window" and you weren't sure exactly what they meant, this is the section that pins the words down.

01.1
What is an Agent
The loop · Tools · Where they fail
An LLM running in a loop with tools. Everything sophisticated comes from context, tool design, and guardrails — not the model itself.
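The loop itself is small. A minimal sketch — the `model` and `tools` arguments here are stand-in stubs, not any real provider API:

```python
def run_agent(model, tools, task, max_steps=10):
    """Minimal agent loop: ask the model, run any tool it calls, feed the
    result back, repeat. Everything else is context, tools, and guardrails."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):                 # guardrail: bound the loop
        action = model(history)                # stub: returns an action dict
        if action["type"] == "final":
            return action["content"]
        result = tools[action["tool"]](action["args"])  # execute the tool
        history.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")  # guardrail: never loop forever
```

The `max_steps` cap is the first guardrail most teams forget.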
01.2
Context and Memory
What an agent "remembers"
Context windows, persistent memory, the difference between what's in the prompt and what's stored. Why agents "forget" things.
01.3
MCP Explained
The Model Context Protocol
USB for agents. A standard way for any agent to plug into any tool. What it is, what it doesn't solve, and why it matters at the platform level.
01.4
Trust and Specs
Calibration · Centaur model
How much autonomy to give agents, where specs sit in the workflow, and why "humans judge, agents build" is the right default.
— 02

Platform — the boring infrastructure that makes agents productive

Setting up a project so agents can be productive in it. Teams that skip this end up with agents that flail — they don't know how to run tests, find the build script, or follow conventions.

02.1
The Agentic OS Spine
CLAUDE.md · AGENTS.md · specs/ · .mcp.json
The small set of files and conventions that turn a regular project into one any agent can navigate productively. Includes a file-layout diagram.
02.2
Dev Environment
Abstract patterns
The patterns that make a development environment productive for agentic work. Concrete worked examples live in section 08.
02.3
Project Template
Scaffolding for new projects
A starting structure for new projects that includes the spine from day one. Less to retrofit later.
02.4
Cross-Platform Parity
Mac · Linux · Windows · WSL
The discipline that prevents "works on my machine" failures at the OS layer. Line endings, paths, fonts, shells, and what WSL is good for.
— 03

Multi-agent workflows — where the real leverage lives

Single agents have structural limits — no internal critic, single perspective, sequential thinking. Multi-agent workflows close all three. The single highest-leverage move in agentic engineering.

03.1
Plan / Execute / Judge
The workhorse pattern
Three roles, three agent runs. Includes a flow diagram, a real example prompt set, and a war story about scaling this pattern to a 150-book extraction job.
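The three roles wire together in a few lines. A sketch under the assumption that `planner`, `executor`, and `judge` are three separate agent runs (stubbed here as plain callables):

```python
def plan_execute_judge(planner, executor, judge, task, max_rounds=3):
    """Three roles, three agent runs: plan once, then execute and judge
    until the judge approves or the round budget runs out."""
    plan = planner(task)
    work = executor(plan)
    for _ in range(max_rounds):
        verdict = judge(task, work)          # judge sees the task, not the plan
        if verdict["approved"]:
            return work
        work = executor(plan + "\nFix: " + verdict["feedback"])
    return work                              # best effort after budget exhausted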
03.2
Reflection Loops
Generate → critique → revise
The single most under-applied agentic pattern. A small structural change that produces noticeably better output than single-pass generation.
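The structural change really is small. A sketch, assuming `generate`, `critique`, and `revise` are separate model calls (stubbed as callables here):

```python
def reflect(generate, critique, revise, prompt, rounds=2):
    """Reflection loop: generate once, then alternate critique -> revise
    for a fixed number of rounds."""
    draft = generate(prompt)
    for _ in range(rounds):
        notes = critique(prompt, draft)
        if not notes:                 # empty critique: nothing left to fix
            break
        draft = revise(prompt, draft, notes)
    return draft
```

A fixed round count keeps the loop from becoming its own runaway cost.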
03.3
Spec-Driven
Specs as the contract
Why specs are the connective tissue of multi-agent work, and what a useful spec actually looks like.
03.4
Multi-LLM Review
Same task across providers
Run the same critique across two providers. Differences highlight risks no single model would catch.
03.5
Agent Council
Personas debating decisions
For design decisions and architectural questions where multiple perspectives surface what one agent would miss.
03.6
Handoff
Long tasks across sessions
The discipline that makes multi-session and multi-agent work survive context loss.
03.7
Framework Selection
LangGraph · CrewAI · AutoGen · LangChain
Decision matrix for picking a multi-agent framework. The single most important question: do you need cycles?
03.8
Failure Modes
What goes wrong
The recurring failure modes of multi-agent setups, and how to recognize them before they ship.
— 04

Automation — letting agents run unattended

Where you have to think hard about trust and blast radius. Levels of autonomy, sandboxing, overnight runs, and the failure modes of leaving agents on by themselves.

04.1
Autonomous Control Levels
L0 to L7 spectrum
Eight levels of autonomy from "suggest only" to "fully autonomous." Includes a color-coded spectrum diagram and a war story about a runaway extraction caught by a per-task budget.
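The per-task budget from that war story is a few lines of code. A sketch of one way such a guard could look — the class and cap are illustrative, not the repo's implementation:

```python
class BudgetExceeded(Exception):
    """Raised before a call that would cross the per-task spend cap."""

class TaskBudget:
    """Per-task spend guard: refuse the charge that would exceed the cap,
    so a runaway loop stops at the budget line, not at the invoice."""
    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.spent = 0.0

    def charge(self, cost_usd):
        if self.spent + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"{self.spent + cost_usd:.2f} would exceed cap {self.cap_usd:.2f}"
            )
        self.spent += cost_usd
```

Charging *before* the call, not after, is what makes this a guardrail rather than a report.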
04.2
Sandbox Environments
Where unattended runs should live
Containment patterns for letting agents run with real authority without putting your environment at risk.
04.3
Overnight Runs
Discipline for letting it cook
When to use overnight runs, what to capture, what kills the run cleanly when something's gone wrong, and what makes the morning review fast.
04.4
Prompt Injection
The structural vulnerability
Direct vs indirect injection, attack vectors, defenses that work, defenses that don't even though they sound like they should. Required reading before letting an agent run with real authority.
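One defense that does work is dropping authority once untrusted content enters the session. A sketch of that policy — the tool names are illustrative, not a complete defense:

```python
# Tools that can cause irreversible harm if an injected instruction fires.
SENSITIVE_TOOLS = {"send_email", "delete_file", "shell"}

def allowed_tools(session_has_untrusted_content, requested):
    """Capability-dropping policy: once the agent has read untrusted content
    (a web page, an inbound email), sensitive tools are off the table for
    the rest of the session, whatever the model now 'wants' to do."""
    if session_has_untrusted_content:
        return [t for t in requested if t not in SENSITIVE_TOOLS]
    return list(requested)
```

The enforcement lives in the harness, which is the point: an injected prompt can change what the model asks for, not what the policy grants.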
— 05

Local LLMs — when running models on your own hardware makes sense

The capability gap between local and frontier is real, and shrinking, but not closed. When local is the right call, when it isn't, and how to evaluate which models actually work for your tasks.

05.1
Model Selection
How to pick a local model for a task
Matching open-model capability to task needs without paying the frontier-tier cost. The criteria that actually matter beyond the leaderboard rank.
05.2
Benchmarking
Your evals over vendor numbers
Vendor benchmarks age in days. How to design and run evals that predict real performance on your tasks.
— 06

Token efficiency — controlling cost and latency without sacrificing quality

A 5–10x cost reduction is achievable with no quality loss if you apply prompt caching, model tiering, and context discipline correctly. Most teams capture none of it.

06.1
Prompt Caching
Single biggest cost lever
Stable prefix first; variable content last. Cuts input cost 70-90% on cached calls. Most teams structure prompts to defeat the cache without realizing it.
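The ordering rule is mechanical. A sketch — provider cache semantics vary, but every prefix cache rewards the same shape:

```python
def build_prompt(system, tools_desc, examples, user_query):
    """Cache-friendly ordering: everything stable goes first and is
    byte-identical across calls, so the provider's prefix cache can reuse
    it. Only the user query varies, and it goes last."""
    stable_prefix = "\n\n".join([system, tools_desc, *examples])
    return stable_prefix + "\n\n" + user_query
```

The common way teams defeat the cache is interleaving something per-call (a timestamp, a session ID) into that stable prefix.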
06.2
Model Tiering
Match capability to task
Frontier for hard reasoning, mid-tier for execution, small for batch. Defaulting to the best model on every call is the most common and most expensive mistake.
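The routing itself can be a table and an if-chain. A sketch — the model names are placeholders, and the task taxonomy is one you'd tune to your own workload:

```python
TIERS = {
    "frontier": "model-large",   # placeholder IDs, not real model names
    "mid": "model-medium",
    "small": "model-small",
}

def pick_model(task_kind):
    """Route by task kind: hard reasoning goes up-tier, bulk work goes down."""
    if task_kind in ("architecture", "debugging", "novel_reasoning"):
        return TIERS["frontier"]
    if task_kind in ("refactor", "codegen", "summarize"):
        return TIERS["mid"]
    return TIERS["small"]        # batch classification, extraction, etc.
```

Even a crude router like this beats the default-to-frontier habit on cost.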
06.3
Context Management
Three layers · Caching · Compaction · Retrieval
Includes a three-layer diagram and a war story about a 100× compression ratio on a real bash cookbook synthesis.
06.4
Cost Estimation
Budgeting and forecasting
The unit economics of agentic work, what "expensive" actually looks like, and the alerts that protect against runaway loops.
— 07

Tools and MCP — the plumbing

Tool design is where most agentic workflows succeed or fail. The model is downstream of the tools you give it. Hooks add guardrails the model can't forget. Skills package recurring expertise.

07.1
MCP Server Patterns
Tool design discipline
The longest read in this section because tool design has the most subtle craft. Naming, scoping, descriptions, errors — the choices that determine whether an agent uses a tool well or badly.
07.2
Tool Description as Prompt
The most load-bearing sub-pattern
The model decides whether to call your tool by reading the description. Tool descriptions are prompt content, not documentation. The single highest-ROI sub-pattern of tool design.
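What "description as prompt" looks like in practice — a hypothetical tool in the generic JSON-schema shape many harnesses accept:

```python
# The description is the part the model actually reads when deciding
# whether (and how) to call the tool. Write it like a prompt: when to use
# it, when NOT to use it, and what comes back.
search_tool = {
    "name": "search_orders",
    "description": (
        "Look up customer orders by order ID or email. "
        "Use this whenever the user asks about an order's status, contents, "
        "or shipping. Do NOT use it for refunds; use refund_order instead. "
        "Returns a JSON list of matching orders, newest first."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Order ID or customer email.",
            },
        },
        "required": ["query"],
    },
}
```

Compare "Search orders" as a description: same tool, and the model now has to guess all three of those things.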
07.3
Hooks
Event-driven harness automation
Pre/post tool-use guardrails and session-start context loading: discipline enforced by the harness rather than left to the model's memory.
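A sketch of a pre-tool-use hook — the hook signature and the blocked patterns are illustrative, not any particular harness's API:

```python
import re

def pre_tool_use_hook(tool_name, args):
    """Runs before every tool call; returns (allow, reason).
    Blocks destructive shell commands regardless of what the model
    remembers or was told mid-session."""
    if tool_name == "shell":
        cmd = args.get("command", "")
        if re.search(r"\brm\s+-rf\b|\bgit\s+push\s+--force\b", cmd):
            return False, "blocked: destructive command"
    return True, "ok"
```

Because the hook fires on every call, the guardrail survives context compaction, long sessions, and prompt injection alike.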
07.4
Skills
Discoverable, reusable capabilities
The slot between a tool and a workflow. How organizational knowledge accrues into the agent over time.
07.5
Observability
Tracing · Metrics · Audit
Three layers: per-session traces, aggregate metrics, audit trail. Includes a war story about a 30-line audit script that paid back across the whole repo.
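The audit-trail layer can start as an append-only JSONL file. A sketch in that spirit — the field names are illustrative, not the repo's script:

```python
import json
import time

def audit(log_path, session_id, event, **fields):
    """Append-only JSONL audit trail: one line per tool call or model call.
    Greppable, diffable, and cheap enough to leave on everywhere."""
    record = {"ts": time.time(), "session": session_id, "event": event, **fields}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

One line per event is the whole trick: `grep`, `jq`, and `wc -l` become your first observability stack.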
— 08

Resources — the practical apparatus

What dev environment to set up, what to read, which tools to actually pick. Opinionated, dated, and re-evaluated annually — the picks change but the criteria hold up.

08.1
Dev Environment
Six-layer reference setup
A productive cross-platform setup for agentic work, with links to working public examples (terminal-stack, dotfiles, macOS counterpart, fresh-machine env synth).
08.2
Shell and Terminal Tools
Modern Unix toolchain · Multiplexers · fzf widgets
Deep reference for shell/terminal tooling for AI engineering. Specific picks per category, security-first scripting headers, agent-friendly output patterns, anti-patterns to avoid.
08.3
Cloud and Deployment Tools
VMs · VPN · Reverse proxy · Containers · Serverless
Patterns for hosting agents and MCP servers cost-effectively. Always-free gateway, sleeping compute, VPN-only access, container orchestration, audit/observability.
08.4
Recommended Reading
Curated by section topic
A curated reading list mapped to sections 01-07, distilled from a personal reference library of ~150 technical books. Plus shell/devops and engineering leadership picks.
08.5
Tool Evaluations
Frameworks · Harnesses · Runtimes
Diplomatic but technically direct assessments of agent harnesses (Claude Code, Cursor, Aider), multi-agent frameworks (LangGraph, CrewAI, AutoGen), local LLM runtimes (Ollama, vLLM), and token-saving utilities (rtk-ai, caveman).
08.6
Ecosystem and Plugins
Reconnaissance map
Notable open-source repos worth knowing about: Everything Claude Code, Graphify, Graphiti/Mem0/Letta, observability platforms, MCP server catalogs, structured-output libraries, DSPy.
— for non-github readers

Send a quick note.

Found something useful? Disagree with a recommendation? Have a war story from your own practice? This form goes straight to the maintainer.

If you have a GitHub account, opening an issue is preferred — it's public and lets others weigh in. This form is the path for everyone else.