To: Engineering leaders
From: Ryan Gosnell
Date: 5 May 2026
Re: How AI is actually being used for engineering work — and what to ask your teams
The bottleneck moved from typing to specifying. Most teams have not yet adjusted.
AI in engineering used to mean autocomplete. Today it means agents — LLMs running in loops with tools — that implement entire tasks against written specs. The teams capturing real leverage are the ones investing in spec discipline, multi-agent workflows, and platform-layer infrastructure. The teams that aren't are leaving most of the value on the table while paying for the same tooling.
01 The shift
Three years ago, AI in engineering meant a developer typed code and the IDE suggested the next few lines. Useful, but the human drove every keystroke.
Today the work pattern is different. A developer writes a description of what they want — a paragraph, a checklist, a spec — and an agent does the implementation. The developer reviews the result, adjusts the spec, runs again. The unit of work moved from lines of code to tasks. The engineers getting the most leverage are the ones who have gotten good at writing precise specs and reviewing AI-generated work efficiently. Engineers still using AI as autocomplete are working at the old unit and capturing a fraction of the value.
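To make the pattern concrete, here is a minimal sketch of the spec-dispatch-review loop in Python. The spec contents, the specs/ path, and the `agent` CLI are hypothetical placeholders; substitute whatever runner your team actually uses.

```python
# A minimal spec -> dispatch -> review loop. The spec contents, the
# specs/ path, and the `agent` CLI are hypothetical; substitute your
# team's runner.
import subprocess
from pathlib import Path

SPEC = """\
# Task: add retry logic to the payments client

## Context
- src/payments/client.py wraps the external payments API
- transient 5xx errors currently fail the whole request

## Acceptance criteria
- retry up to 3 times with exponential backoff on 5xx, never on 4xx
- unit tests cover the retry and no-retry paths
"""

spec_path = Path("specs/payments-retry.md")
spec_path.parent.mkdir(parents=True, exist_ok=True)
spec_path.write_text(SPEC)

# Dispatch the agent against the spec, then review the diff by hand;
# adjust the spec and rerun until the diff is right.
subprocess.run(["agent", "run", "--spec", str(spec_path)], check=True)
subprocess.run(["git", "diff"], check=True)
```

Note that the spec carries both context and acceptance criteria: that is what lets the developer judge the diff quickly instead of reverse-engineering intent from code.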
02 What an "agent" actually is
An agent is an LLM running in a loop with access to tools. It reads the task, decides what to do, calls a tool (read a file, run a command, post a comment), reads the result, and decides the next step. It repeats until the task is done or it gets stuck. Everything sophisticated comes from how the context is structured, which tools are available, and what guardrails exist on the loop.
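A minimal sketch of that loop, in Python. The `llm()` call is a hypothetical stand-in for a real model API; the loop structure, the tool allow-list, and the step budget are the point, and they correspond to the guardrails mentioned above.

```python
# A minimal agent loop. `llm()` is a hypothetical stand-in for a real
# model call; the loop structure, tool allow-list, and step budget are
# the point.
import subprocess
from pathlib import Path

def read_file(path: str) -> str:
    return Path(path).read_text()

def run_command(cmd: str) -> str:
    r = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return r.stdout + r.stderr

TOOLS = {"read_file": read_file, "run_command": run_command}

def llm(messages: list[dict]) -> dict:
    """Hypothetical model call. Returns {"tool": name, "args": {...}}
    to act, or {"done": summary} to finish."""
    raise NotImplementedError("wire in your model API here")

def run_agent(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):              # guardrail: bounded loop
        decision = llm(messages)
        if "done" in decision:              # model decided the task is done
            return decision["done"]
        tool = TOOLS[decision["tool"]]      # guardrail: allow-listed tools only
        observation = tool(**decision["args"])
        messages.append({"role": "tool", "content": observation})
    return "stopped: step budget exhausted"  # got stuck or ran too long
```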
03 Where the leverage is
Two moves dominate. First: multi-agent review workflows. One agent drafts, a second agent critiques, a third agent reconciles. This surfaces errors that no single-agent setup catches and parallelizes well. Second: investing in the platform layer. A well-structured repo with clear context files, working scripts, and known conventions is one where any agent can be productive immediately. A messy repo is one where every agent run starts from zero.
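A minimal sketch of the draft/critique/reconcile pattern, again with a hypothetical `llm(system, prompt)` call standing in for any model API. In practice each role can run on a different model, and several independent drafts can be critiqued in parallel before reconciliation.

```python
# A minimal draft/critique/reconcile pipeline. `llm(system, prompt)` is
# a hypothetical stand-in for any model API; each role can run on a
# different model, and several drafts can be critiqued in parallel.
def llm(system: str, prompt: str) -> str:
    raise NotImplementedError("wire in your model API here")

def review_workflow(spec: str) -> str:
    draft = llm("Implement the task described in the spec.", spec)
    critique = llm(
        "Review the draft for errors and spec violations.",
        f"Spec:\n{spec}\n\nDraft:\n{draft}",
    )
    return llm(
        "Reconcile the draft with the critique into a final version.",
        f"Spec:\n{spec}\n\nDraft:\n{draft}\n\nCritique:\n{critique}",
    )
```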
04 Where the risk is
- Trust drift. Teams that start by reviewing every line slowly stop. The quality is high enough that careful review feels redundant. It isn't. Set explicit norms about what gets reviewed and stick to them.
- Cost surprises. Without monitoring, costs can 10x in a week when someone leaves a long-running loop unattended. Monthly budget alerts are the minimum.
- Skill atrophy. Junior engineers who only ever direct agents may not develop the deep mechanical skill that makes them good at directing agents. Think about how juniors get reps on the actual mechanics, not just on supervision.
- Context staleness. Agents work off context files. Those files rot. A team that hasn't updated its context file in three months is a team whose agents are working off bad assumptions. A lightweight freshness check, sketched after this list, makes the rot visible.
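A minimal freshness check, assuming context files named AGENTS.md or CONTEXT.md at the repo root; the names and the 90-day threshold are illustrative. It reads the last git commit that touched each file, so it works in CI checkouts, where file modification times reflect the clone rather than the edit.

```python
# A minimal context-file staleness check. File names and the 90-day
# threshold are illustrative; run it in CI and treat a failure as a
# chore ticket.
import subprocess
import sys
import time

MAX_AGE_DAYS = 90
CONTEXT_FILES = ["AGENTS.md", "CONTEXT.md"]

stale = []
for name in CONTEXT_FILES:
    # Timestamp of the last commit that touched this file (empty if untracked).
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", "--", name],
        capture_output=True, text=True,
    ).stdout.strip()
    if not out:
        continue
    age_days = (time.time() - int(out)) / 86400
    if age_days > MAX_AGE_DAYS:
        stale.append(f"{name}: last commit {age_days:.0f} days ago")

if stale:
    print("Stale context files:")
    print("\n".join(stale))
    sys.exit(1)
```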
05 Five questions to ask your teams
Not "how much AI are you using" — that question doesn't reveal anything. Instead:
- What's your spec workflow? "I just type questions into chat" is early. "I write a markdown file, dispatch an agent against it, then review the diff" is the modern pattern.
- Are you using multiple agents or just one? Single-agent is fine for small tasks. Multi-agent (plan/execute/judge, council, multi-LLM review) is where leverage lives for non-trivial work.
- What's your trust calibration? Where on the spectrum from "reviews every line" to "lets it run overnight in a sandbox" does your team operate, and why?
- What does it cost? A rough monthly per-developer number. If they have no idea, they're either over-spending or under-using. (A minimal cost rollup is sketched after this list.)
- What broke recently? Every team running serious agentic workflows has stories about an agent doing something dumb. Teams that don't have these stories aren't pushing hard enough. Teams that have too many aren't using guardrails.
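A minimal per-developer cost rollup, assuming a hypothetical usage log of JSON lines with developer, date, and cost fields; the log format and the budget threshold are illustrative stand-ins for whatever your provider's billing export actually gives you.

```python
# A minimal monthly per-developer cost rollup, assuming a hypothetical
# usage log of JSON lines like
#   {"dev": "alice", "date": "2026-05-04", "cost_usd": 1.73}
# The log format and budget threshold are illustrative.
import json
from collections import defaultdict
from datetime import date
from pathlib import Path

BUDGET_PER_DEV_USD = 500.0
month = date.today().strftime("%Y-%m")

totals: dict[str, float] = defaultdict(float)
for line in Path("usage.jsonl").read_text().splitlines():
    record = json.loads(line)
    if record["date"].startswith(month):
        totals[record["dev"]] += record["cost_usd"]

for dev, spend in sorted(totals.items(), key=lambda kv: -kv[1]):
    flag = "  <-- over budget" if spend > BUDGET_PER_DEV_USD else ""
    print(f"{dev}: ${spend:,.2f}{flag}")
```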
06 Recommendation
Start by reading the seven section overviews in the companion repo (about an hour, end to end). Then pick one of two starting moves: introduce a multi-agent review workflow on a real project (section 03), or invest in the platform layer for an existing repo (section 02). Either compounds. Both is better.
The patterns here are not theoretical. Each one came out of running into the failure mode and figuring out the way around it. The repo is openly licensed (CC BY 4.0) and meant to be shared and adapted.