AI Engineering Patterns · the catalogue
Vol. I · No. 01 · May 2026 · CC BY 4.0

Patterns for building with agents.

Eight sections, thirty-seven pages. The durable lessons from building agentic systems — how to structure context, when to trust which model, how multiple agents coordinate without falling over, and how to instrument the whole thing so you can tell when it's broken. Nothing here is theoretical.

— start here

5 minutes to a working mental model.

"Eight sections is a lot. The quickstart routes you to the three pages worth reading first."

Not sure where to start? The QUICKSTART gives you four reading paths based on who you are: engineer, engineering leader, platform builder, or safety-focused. Each path is exactly three pages, ~15 minutes, calibrated to give you a working model of what's happening before you commit to deeper reading.

Need a term defined? See the GLOSSARY — single-page lookup for every load-bearing term used across the repo.

— for leaders

A leader's memo.

"The bottleneck moved from typing to specifying. Most teams have not yet adjusted."

A one-page primer for managers, directors, and execs who are not coding day-to-day but need to make decisions about tooling, hiring, and strategy. Five questions to ask your teams. Where the leverage is. Where the risk is. Read the memo →

Or read it as markdown.

— what's new in v0.5

Seven new pages from unmined source material.

"The repo's strength is being focused. The right move was surgical additions, not a content dump."

Multi-agent (sec 03): Reflection Loops (generate → critique → revise) · Framework Selection (LangGraph / CrewAI / AutoGen decision matrix).

Automation (sec 04): Prompt Injection — the structural vulnerability every agentic system has. Required reading before letting an agent run with real authority.

Tools (sec 07): Tool Description as Prompt — the most load-bearing sub-pattern of tool design, promoted to its own page.

Platform (sec 02): Cross-Platform Parity — Mac / Linux / Windows / WSL discipline.

Resources (sec 08): Shell and Terminal Tools · Cloud and Deployment Tools — deep references for the tooling that hosts agentic work.

Earlier additions (v0.4): 6 Mermaid diagrams + 5 war stories. See CHANGELOG for the full history.

— 01

Concepts — the vocabulary

If you've sat in a meeting and people were saying "MCP" and "context window" and you weren't sure exactly what they meant, this is the section that pins the words down.

01.1
What is an Agent
The loop · Tools · Where they fail
An LLM running in a loop with tools. Everything sophisticated comes from context, tool design, and guardrails — not the model itself.
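The loop itself is small. A minimal sketch — the `model` and `tools` arguments here are stand-in stubs, not any real provider API:

```python
def run_agent(model, tools, task, max_steps=10):
    """Minimal agent loop: ask the model, run any tool it calls, feed the
    result back, repeat. Everything else is context, tools, and guardrails."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):                 # guardrail: bound the loop
        action = model(history)                # stub: returns an action dict
        if action["type"] == "final":
            return action["content"]
        result = tools[action["tool"]](action["args"])  # execute the tool
        history.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")  # guardrail: never loop forever
```

The `max_steps` cap is the first guardrail most teams forget.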
01.2
Context and Memory
What an agent "remembers"
Context windows, persistent memory, the difference between what's in the prompt and what's stored. Why agents "forget" things.
01.3
MCP Explained
The Model Context Protocol
USB for agents. A standard way for any agent to plug into any tool. What it is, what it doesn't solve, and why it matters at the platform level.
01.4
Trust and Specs
Calibration · Centaur model
How much autonomy to give agents, where specs sit in the workflow, and why "humans judge, agents build" is the right default.
— 02

Platform — the boring infrastructure that makes agents productive

Setting up a project so agents can be productive in it. Teams that skip this end up with agents that flail — they don't know how to run tests, find the build script, or follow conventions.

02.1
The Agentic OS Spine
CLAUDE.md · AGENTS.md · specs/ · .mcp.json
The small set of files and conventions that turn a regular project into one any agent can navigate productively. Includes a file-layout diagram.
02.2
Dev Environment
Abstract patterns
The patterns that make a development environment productive for agentic work. Concrete worked examples live in section 08.
02.3
Project Template
Scaffolding for new projects
A starting structure for new projects that includes the spine from day one. Less to retrofit later.
02.4
Cross-Platform Parity
Mac · Linux · Windows · WSL
The discipline that prevents "works on my machine" failures at the OS layer. Line endings, paths, fonts, shells, and what WSL is good for.
— 03

Multi-agent workflows — where the real leverage lives

Single agents have structural limits — no internal critic, single perspective, sequential thinking. Multi-agent workflows close all three. The single highest-leverage move in agentic engineering.

03.1
Plan / Execute / Judge
The workhorse pattern
Three roles, three agent runs. Includes a flow diagram, a real example prompt set, and a war story about scaling this pattern to a 150-book extraction job.
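The three roles wire together in a few lines. A sketch under the assumption that `planner`, `executor`, and `judge` are three separate agent runs (stubbed here as plain callables):

```python
def plan_execute_judge(planner, executor, judge, task, max_rounds=3):
    """Three roles, three agent runs: plan once, then execute and judge
    until the judge approves or the round budget runs out."""
    plan = planner(task)
    work = executor(plan)
    for _ in range(max_rounds):
        verdict = judge(task, work)          # judge sees the task, not the plan
        if verdict["approved"]:
            return work
        work = executor(plan + "\nFix: " + verdict["feedback"])
    return work                              # best effort after budget exhausted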
03.2
Reflection Loops
Generate → critique → revise
The single most under-applied agentic pattern. A small structural change that produces noticeably better output than single-pass generation.
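The structural change really is small. A sketch, assuming `generate`, `critique`, and `revise` are separate model calls (stubbed as callables here):

```python
def reflect(generate, critique, revise, prompt, rounds=2):
    """Reflection loop: generate once, then alternate critique -> revise
    for a fixed number of rounds."""
    draft = generate(prompt)
    for _ in range(rounds):
        notes = critique(prompt, draft)
        if not notes:                 # empty critique: nothing left to fix
            break
        draft = revise(prompt, draft, notes)
    return draft
```

A fixed round count keeps the loop from becoming its own runaway cost.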
03.3
Spec-Driven
Specs as the contract
Why specs are the connective tissue of multi-agent work, and what a useful spec actually looks like.
03.4
Multi-LLM Review
Same task across providers
Run the same critique across two providers. Differences highlight risks no single model would catch.
03.5
Agent Council
Personas debating decisions
For design decisions and architectural questions where multiple perspectives surface what one agent would miss.
03.6
Handoff
Long tasks across sessions
The discipline that makes multi-session and multi-agent work survive context loss.
03.7
Framework Selection
LangGraph · CrewAI · AutoGen · LangChain
Decision matrix for picking a multi-agent framework. The single most important question: do you need cycles?
03.8
Failure Modes
What goes wrong
The recurring failure modes of multi-agent setups, and how to recognize them before they ship.
— 04

Automation — letting agents run unattended

Where you have to think hard about trust and blast radius. Levels of autonomy, sandboxing, overnight runs, and the failure modes of leaving agents on by themselves.

04.1
Autonomous Control Levels
L0 to L7 spectrum
Eight levels of autonomy from "suggest only" to "fully autonomous." Includes a color-coded spectrum diagram and a war story about a runaway extraction caught by a per-task budget.
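The per-task budget from that war story is a few lines of code. A sketch of one way such a guard could look — the class and cap are illustrative, not the repo's implementation:

```python
class BudgetExceeded(Exception):
    """Raised before a call that would cross the per-task spend cap."""

class TaskBudget:
    """Per-task spend guard: refuse the charge that would exceed the cap,
    so a runaway loop stops at the budget line, not at the invoice."""
    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.spent = 0.0

    def charge(self, cost_usd):
        if self.spent + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"{self.spent + cost_usd:.2f} would exceed cap {self.cap_usd:.2f}"
            )
        self.spent += cost_usd
```

Charging *before* the call, not after, is what makes this a guardrail rather than a report.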
04.2
Sandbox Environments
Where unattended runs should live
Containment patterns for letting agents run with real authority without putting your environment at risk.
04.3
Overnight Runs
Discipline for letting it cook
When to use overnight runs, what to capture, what kills the run cleanly when something's gone wrong, and what makes the morning review fast.
04.4
Prompt Injection
The structural vulnerability
Direct vs indirect injection, attack vectors, defenses that work, defenses that don't even though they sound like they should. Required reading before letting an agent run with real authority.
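One defense that does work is dropping authority once untrusted content enters the session. A sketch of that policy — the tool names are illustrative, not a complete defense:

```python
# Tools that can cause irreversible harm if an injected instruction fires.
SENSITIVE_TOOLS = {"send_email", "delete_file", "shell"}

def allowed_tools(session_has_untrusted_content, requested):
    """Capability-dropping policy: once the agent has read untrusted content
    (a web page, an inbound email), sensitive tools are off the table for
    the rest of the session, whatever the model now 'wants' to do."""
    if session_has_untrusted_content:
        return [t for t in requested if t not in SENSITIVE_TOOLS]
    return list(requested)
```

The enforcement lives in the harness, which is the point: an injected prompt can change what the model asks for, not what the policy grants.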
— 05

Local LLMs — when running models on your own hardware makes sense

The capability gap between local and frontier is real, and shrinking, but not closed. When local is the right call, when it isn't, and how to evaluate which models actually work for your tasks.

05.1
Model Selection
How to pick a local model for a task
Matching open-model capability to task needs without paying the frontier-tier cost. The criteria that actually matter beyond the leaderboard rank.
05.2
Benchmarking
Your evals over vendor numbers
Vendor benchmarks age in days. How to design and run evals that predict real performance on your tasks.
— 06

Token efficiency — controlling cost and latency without sacrificing quality

A 5–10x cost reduction is achievable with no quality loss if you apply prompt caching, model tiering, and context discipline correctly. Most teams capture none of it.

06.1
Prompt Caching
Single biggest cost lever
Stable prefix first; variable content last. Cuts input cost 70-90% on cached calls. Most teams structure prompts to defeat the cache without realizing it.
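The ordering rule is mechanical. A sketch — provider cache semantics vary, but every prefix cache rewards the same shape:

```python
def build_prompt(system, tools_desc, examples, user_query):
    """Cache-friendly ordering: everything stable goes first and is
    byte-identical across calls, so the provider's prefix cache can reuse
    it. Only the user query varies, and it goes last."""
    stable_prefix = "\n\n".join([system, tools_desc, *examples])
    return stable_prefix + "\n\n" + user_query
```

The common way teams defeat the cache is interleaving something per-call (a timestamp, a session ID) into that stable prefix.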
06.2
Model Tiering
Match capability to task
Frontier for hard reasoning, mid-tier for execution, small for batch. Defaulting to the best model on every call is the most common and most expensive mistake.
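The routing itself can be a table and an if-chain. A sketch — the model names are placeholders, and the task taxonomy is one you'd tune to your own workload:

```python
TIERS = {
    "frontier": "model-large",   # placeholder IDs, not real model names
    "mid": "model-medium",
    "small": "model-small",
}

def pick_model(task_kind):
    """Route by task kind: hard reasoning goes up-tier, bulk work goes down."""
    if task_kind in ("architecture", "debugging", "novel_reasoning"):
        return TIERS["frontier"]
    if task_kind in ("refactor", "codegen", "summarize"):
        return TIERS["mid"]
    return TIERS["small"]        # batch classification, extraction, etc.
```

Even a crude router like this beats the default-to-frontier habit on cost.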
06.3
Context Management
Three layers · Caching · Compaction · Retrieval
Includes a three-layer diagram and a war story about a 100× compression ratio on a real bash cookbook synthesis.
06.4
Cost Estimation
Budgeting and forecasting
The unit economics of agentic work, what "expensive" actually looks like, and the alerts that protect against runaway loops.
— 07

Tools and MCP — the plumbing

Tool design is where most agentic workflows succeed or fail. The model is downstream of the tools you give it. Hooks add guardrails the model can't forget. Skills package recurring expertise.

07.1
MCP Server Patterns
Tool design discipline
The longest read in this section because tool design has the most subtle craft. Naming, scoping, descriptions, errors — the choices that determine whether an agent uses a tool well or badly.
07.2
Tool Description as Prompt
The most load-bearing sub-pattern
The model decides whether to call your tool by reading the description. Tool descriptions are prompt content, not documentation. The single highest-ROI sub-pattern of tool design.
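What "description as prompt" looks like in practice — a hypothetical tool in the generic JSON-schema shape many harnesses accept:

```python
# The description is the part the model actually reads when deciding
# whether (and how) to call the tool. Write it like a prompt: when to use
# it, when NOT to use it, and what comes back.
search_tool = {
    "name": "search_orders",
    "description": (
        "Look up customer orders by order ID or email. "
        "Use this whenever the user asks about an order's status, contents, "
        "or shipping. Do NOT use it for refunds; use refund_order instead. "
        "Returns a JSON list of matching orders, newest first."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Order ID or customer email.",
            },
        },
        "required": ["query"],
    },
}
```

Compare "Search orders" as a description: same tool, and the model now has to guess all three of those things.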
07.3
Hooks
Event-driven harness automation
Pre/post tool-use guardrails and session-start context loading: discipline enforced by the harness rather than left to the model's memory.
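A sketch of a pre-tool-use hook — the hook signature and the blocked patterns are illustrative, not any particular harness's API:

```python
import re

def pre_tool_use_hook(tool_name, args):
    """Runs before every tool call; returns (allow, reason).
    Blocks destructive shell commands regardless of what the model
    remembers or was told mid-session."""
    if tool_name == "shell":
        cmd = args.get("command", "")
        if re.search(r"\brm\s+-rf\b|\bgit\s+push\s+--force\b", cmd):
            return False, "blocked: destructive command"
    return True, "ok"
```

Because the hook fires on every call, the guardrail survives context compaction, long sessions, and prompt injection alike.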
07.4
Skills
Discoverable, reusable capabilities
The slot between a tool and a workflow. How organizational knowledge accrues into the agent over time.
07.5
Observability
Tracing · Metrics · Audit
Three layers: per-session traces, aggregate metrics, audit trail. Includes a war story about a 30-line audit script that paid back across the whole repo.
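The audit-trail layer can start as an append-only JSONL file. A sketch in that spirit — the field names are illustrative, not the repo's script:

```python
import json
import time

def audit(log_path, session_id, event, **fields):
    """Append-only JSONL audit trail: one line per tool call or model call.
    Greppable, diffable, and cheap enough to leave on everywhere."""
    record = {"ts": time.time(), "session": session_id, "event": event, **fields}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

One line per event is the whole trick: `grep`, `jq`, and `wc -l` become your first observability stack.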
— 08

Resources — the practical apparatus

What dev environment to set up, what to read, which tools to actually pick. Opinionated, dated, and re-evaluated annually — the picks change but the criteria hold up.

08.1
Dev Environment
Six-layer reference setup
A productive cross-platform setup for agentic work, with links to working public examples (terminal-stack, dotfiles, macOS counterpart, fresh-machine env synth).
08.2
Shell and Terminal Tools
Modern Unix toolchain · Multiplexers · fzf widgets
Deep reference for shell/terminal tooling for AI engineering. Specific picks per category, security-first scripting headers, agent-friendly output patterns, anti-patterns to avoid.
08.3
Cloud and Deployment Tools
VMs · VPN · Reverse proxy · Containers · Serverless
Patterns for hosting agents and MCP servers cost-effectively. Always-free gateway, sleeping compute, VPN-only access, container orchestration, audit/observability.
08.4
Recommended Reading
Curated by section topic
A curated reading list mapped to sections 01-07, distilled from a personal reference library of ~150 technical books. Plus shell/devops and engineering leadership picks.
08.5
Tool Evaluations
Frameworks · Harnesses · Runtimes
Diplomatic but technically direct assessments of agent harnesses (Claude Code, Cursor, Aider), multi-agent frameworks (LangGraph, CrewAI, AutoGen), local LLM runtimes (Ollama, vLLM), and token-saving utilities (rtk-ai, caveman).
08.6
Ecosystem and Plugins
Reconnaissance map
Notable open-source repos worth knowing about: Everything Claude Code, Graphify, Graphiti/Mem0/Letta, observability platforms, MCP server catalogs, structured-output libraries, DSPy.
— for non-github readers

Send a quick note.

Found something useful? Disagree with a recommendation? Have a war story from your own practice? This form goes straight to the maintainer.

If you have a GitHub account, opening an issue is preferred — it's public and lets others weigh in. This form is the path for everyone else.