AI & Automation · June 9, 2026 · 13 min read

Loop Engineering: Designing Systems That Prompt AI Agents

Loop engineering is the shift from prompting AI agents by hand to designing the systems that prompt them. Here's what the term means, the five building blocks (plus memory) of an agent loop, how Claude Code and OpenAI Codex implement each piece, a realistic end-to-end loop, and the risks that get sharper as the loop improves.

Lushbinary Team

AI & Cloud Solutions

For about two years, the way you got value out of a coding agent was simple: write a good prompt, share enough context, read what came back, and type the next thing. You held the tool the entire time, one turn after another. In June 2026 that posture started to change. "Loop engineering" is the name now attached to the shift, and it captures a real change in where the leverage lives: you stop being the person who prompts the agent and start being the person who designs the system that prompts it.

The phrase was popularized by Google engineer Addy Osmani, echoing Peter Steinberger's line that "you should be designing loops that prompt your agents" and Anthropic Claude Code lead Boris Cherny's comment that his job is now to write loops rather than to prompt the model directly. It is still early, the token economics can swing wildly, and verification is harder than ever. But the building blocks now ship inside the products you already use, so the pattern is worth understanding whether or not you adopt it fully.

This guide breaks down what loop engineering actually is, how it differs from prompt and context engineering, the five building blocks (plus memory) that make a loop hold together, how Claude Code and OpenAI Codex implement each piece, what one realistic loop looks like end to end, and the failure modes that get sharper, not easier, as the loop improves. If you write code with agents, this is the next layer of the craft.

1. What Loop Engineering Actually Means

Loop engineering is replacing yourself as the person who prompts the agent, and designing the system that does it instead. A loop here is a recursive goal: you define a purpose once, and the agent iterates until the work is actually complete. Instead of you typing the next instruction after every response, a small system finds the work, hands it out, checks the result, writes down what is done, and decides the next thing to do. You let that system poke the agent instead of poking it yourself.

The mental model that helps most: a coding agent already runs an inner loop on every turn. It reasons about what to do, takes an action (calls a tool, edits a file, runs a test), observes the result, and loops back to reason again against the new state. That reason, act, observe cycle is the agentic loop. Loop engineering sits one floor above it. You are no longer steering each turn by hand. You are building an outer loop that runs on a schedule, spawns helpers, feeds itself work, and keeps going across many of those inner cycles without you in the seat for each one.
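To make the two levels concrete, here is a minimal Python sketch of an outer loop wrapped around an inner one. It is an illustration, not how any particular agent is implemented: `inner_loop`, `outer_loop`, `find_work`, and `act` are hypothetical names, and `act` stands in for one tool call plus observing its result.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InnerResult:
    task: str
    steps: int
    done: bool


def inner_loop(task: str, act: Callable[[str, int], bool], max_steps: int = 5) -> InnerResult:
    """Reason, act, observe: repeat until the action reports success or steps run out."""
    for step in range(1, max_steps + 1):
        # `act` is a stand-in for one tool call and the observation of its result.
        if act(task, step):
            return InnerResult(task, step, True)
    return InnerResult(task, max_steps, False)


def outer_loop(find_work: Callable[[], List[str]], act: Callable[[str, int], bool]) -> List[InnerResult]:
    """One scheduled cycle: discover the work, then run an inner loop per task."""
    return [inner_loop(task, act) for task in find_work()]


# Hypothetical stand-ins: two tasks, each "succeeding" on its second step.
results = outer_loop(lambda: ["fix flaky test", "bump dependency"],
                     lambda task, step: step >= 2)
for result in results:
    print(result)
```

The point of the shape is that a human writes `find_work` and the stop condition once, and the outer function decides what the inner loop is asked to do.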

💡 The one-sentence definition

Loop engineering is building a system that prompts your agent on a schedule and against a goal, instead of typing each prompt yourself. The leverage moves from the quality of a single prompt to the design of the system that generates and verifies prompts.

What surprised early adopters is that this is no longer a build-it-yourself effort. A year ago, a loop meant a pile of bash scripts you maintained forever and that only you understood. As of mid 2026, the pieces ship inside the products. Peter Steinberger's checklist of what a loop needs maps almost exactly onto the OpenAI Codex app, and nearly the same list onto Anthropic's Claude Code. Once you notice the shape is identical across tools, you stop arguing about which agent is best and start designing a loop that works no matter which one you happen to be sitting in.

Loop engineering is the agentic sibling of two ideas you have probably already met. If you are coming from vibe coding, loop engineering is what happens when the vibes need to run unattended and survive your laptop closing. If you have read about multi-agent development with Claude Code agent teams, loop engineering is the discipline of wiring those agents into a self-running cycle.

2. From Prompt Engineering to Loop Engineering

It helps to see loop engineering as the third layer in a stack that has been building for a few years. Each layer wraps the one inside it, and each one moves the leverage point a little further away from the raw model call.

Layer | What you optimize | Unit of work
Prompt engineering | How you phrase a single instruction | One turn you type by hand
Context engineering | What else goes in the window: docs, history, tool definitions | The conditions around one answer
Loop engineering | The system that decides what to prompt and when, and whether the result is acceptable | A self-running cycle across many turns

Prompt engineering never goes away. A loop is built out of prompts, and a sloppy prompt inside a loop just produces sloppy work faster. Context engineering does not go away either: the loop still has to put the right files, history, and tool definitions in front of the model on each turn. What loop engineering adds is the autonomous control structure around all of that. The harness runs the single agent. The loop runs the harness on a timer, spawns helpers, and feeds itself.

⚠️ The leverage moved, the work did not get easier

Boris Cherny's point is not that coding got easier. It is that the highest-value thing you can do shifted from writing prompts to designing loops. A well-designed loop multiplies a good engineer. A badly designed loop multiplies a bad decision just as fast, with less of you watching.

For a deeper look at the day-to-day craft of working turn by turn with agents, see our comparison of AI coding agents. Loop engineering is what you reach for once a single agent is no longer the bottleneck.

3. The Five Building Blocks (Plus Memory)

A working loop needs five things, and then one place to remember state. The names differ slightly between tools, but the capability is the same. Here is the list, and then the diagram of how the pieces fit into a single self-running cycle.

  1. Automations that fire on a schedule and do discovery and triage by themselves.
  2. Worktrees so two agents working in parallel do not step on each other's files.
  3. Skills to write down the project knowledge the agent would otherwise guess at every session.
  4. Plugins and connectors to plug the agent into the tools you already use.
  5. Sub-agents so one of them has the idea and a different one checks it.

The sixth piece is memory: a markdown file, a Linear or GitHub board, anything that lives outside a single conversation and holds what is done and what is next. It sounds too simple to matter, but it is the trick every long-running agent depends on. The model forgets everything between runs, so the state has to live on disk, not in the context window. The agent forgets. The repo does not.

Figure: The Anatomy of One Agent Loop. Automation fires on schedule → Discover & triage work → Sub-agent drafts the change → Verifier sub-agent checks it → Connectors open PR & ticket → next cycle. Memory on disk persists state.

The reassuring part of mid-2026 is that you no longer assemble this from scratch. Both Claude Code and OpenAI Codex ship all five blocks plus durable memory, with different command names but the same shape. The next sections walk through each block, what it does, and how the two leading tools implement it, so you can design a loop that survives a switch between them.

If you want the per-tool mechanics of splitting work across multiple autonomous agents, our guide to OpenAI Codex sub-agents and autonomous coding teams covers the orchestration layer in depth.

4. Automations: The Heartbeat of a Loop

Automations are what make a loop an actual loop and not just one run you did once. They are the heartbeat: a recurring trigger that surfaces work without you asking. Everything else in the loop reacts to what the automation finds.

In OpenAI Codex

The Codex app has an Automations tab where you pick the project, the prompt to run, the cadence, and whether it runs on your local checkout or a background worktree. Runs that find something land in a Triage inbox; runs that find nothing archive themselves. OpenAI describes using Automations internally for routine work like issue triage, alert monitoring, summarizing CI failures, and writing commit briefings. An automation can call a skill, so the recurring instruction stays maintainable: you fire a named skill instead of pasting a wall of instructions into a schedule nobody will ever update.

In Claude Code

Claude Code reaches the same place through scheduling and hooks. The /loop command schedules a recurring prompt on an interval (it turns your cadence into a cron job and confirms a job ID), hooks fire shell commands at points in the agent lifecycle, and you can push the whole thing to GitHub Actions so it keeps running after you close the laptop. The second in-session primitive is the one closest to loop engineering: /goal keeps working across turns until a condition you wrote is verifiably true. After every turn, a separate, smaller model checks whether that condition holds, so the agent that wrote the code is not the one grading it.

# Claude Code: run a recurring triage prompt every weekday at 9am
/loop "Read yesterday's CI failures and open issues, write findings
       to TODO.md, and draft fixes for anything labeled quick-win"
       --schedule "0 9 * * 1-5"

# Claude Code: run until a verifiable stopping condition holds
/goal "All tests in test/auth pass and lint is clean"

# OpenAI Codex: persisted long-running objective (CLI 0.128.0+)
codex /goal "Migrate the billing module to the new pricing API,
             keep all existing tests green"

⚠️ Watch the token bill

A scheduled loop with a verifier model running after every turn can burn tokens fast, and usage varies wildly depending on how often the automation fires and how many sub-agents it spawns. Start with a slow cadence and a tight goal condition, watch the cost for a few days, and scale up only once the loop is producing work you actually merge.
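One way to keep that bill bounded is a hard daily cap that a cycle has to fit under before it runs. The sketch below is an assumption about how you might wire this yourself, not a feature of either tool: `TokenBudget`, `run_cycle`, and the numbers in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks tokens spent in one day against a hard cap (illustrative only)."""
    daily_cap: int
    spent: int = 0

    def can_afford(self, estimate: int) -> bool:
        return self.spent + estimate <= self.daily_cap

    def record(self, used: int) -> None:
        self.spent += used


def run_cycle(budget: TokenBudget, estimated_tokens: int, actual_tokens: int) -> bool:
    """Run one cycle only if the estimate fits; return True when the cycle ran."""
    if not budget.can_afford(estimated_tokens):
        return False  # skip this firing instead of overspending
    # The real agent call would happen here; we only record what it cost.
    budget.record(actual_tokens)
    return True
```

With a cap of 1,000 tokens, a cycle estimated at 400 runs, and a later cycle is skipped once the running total plus its estimate would exceed the cap. Skipping a firing is usually cheaper than discovering the overrun on the invoice.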

The shape is identical across both tools: define an autonomous task, give it a cadence, and let the findings come to you instead of going around checking yourself. The /goal primitive in particular has become one of the most discussed agent primitives of 2026, largely because it is the piece that lets a loop decide it is finished without a human in the seat.

5. Worktrees: Parallel Agents Without Collisions

The moment you run more than one agent, files start colliding. Two agents writing the same file is the same headache as two engineers committing to the same lines without talking first. A git worktree solves it: a separate working directory on its own branch that shares the same repo history, so one agent's edits literally cannot touch another agent's checkout.

Codex builds worktree support in directly, so several threads can hit the same repo at once without bumping into each other. Claude Code gives you the same isolation through git worktree, a --worktree flag to open a session in its own checkout, and an isolation: worktree setting you put on a subagent so each helper gets a fresh checkout that cleans itself up afterward.
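If you script the isolation yourself rather than relying on the tool, the underlying git commands are small. This sketch builds them in Python. The `agent/<task_id>` branch naming and the sibling-directory layout are assumptions made for illustration, and the helper names are hypothetical; only `git worktree add -b` and `git worktree remove` are real git commands.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_path(repo: Path, task_id: str) -> Path:
    """Place each task's checkout next to the main repo (an assumed layout)."""
    return repo.parent / f"{repo.name}-{task_id}"


def worktree_add_cmd(repo: Path, task_id: str) -> List[str]:
    """Build `git worktree add` with a fresh per-task branch."""
    branch = f"agent/{task_id}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch,
            str(worktree_path(repo, task_id))]


def worktree_remove_cmd(repo: Path, task_id: str) -> List[str]:
    """Build `git worktree remove` to clean up once the task is merged or dropped."""
    return ["git", "-C", str(repo), "worktree", "remove",
            str(worktree_path(repo, task_id))]


def create_worktree(repo: Path, task_id: str) -> None:
    """Run the add command; raises CalledProcessError if git reports a failure."""
    subprocess.run(worktree_add_cmd(repo, task_id), check=True)
```

Keeping command construction separate from execution makes the naming scheme easy to test without touching a real repository.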

💡 You are still the ceiling

Worktrees remove the mechanical collision, but they do not remove the review bottleneck. Your bandwidth to read and approve merged work decides how many parallel agents you can actually run, not the number of worktrees the tool will spin up. Ten agents producing changes you cannot review is worse than two you can.

6. Skills & Memory: Stop Re-Explaining Your Project

A skill is how you stop re-explaining the same project context every session. Both tools use the same format: a folder with a SKILL.md file holding instructions and metadata, plus optional scripts, references, and assets. Codex runs a skill when you call it with $ or /skills, or on its own when a task matches the skill description, which is why a tight, boring description beats a clever one. Claude Code works the same way. When you want to share a skill across repos or bundle several together, you package them as a plugin: the skill is the authoring format, the plugin is how you ship it.

Skills are where intent stops costing you over and over. An agent starts every session cold and will fill any gap in your intent with a confident guess. A skill is that intent written down on the outside: the conventions, the build steps, the "we do not do it this way because of that one incident." Without skills, the loop re-derives your whole project from zero every cycle. With skills, knowledge compounds across runs.

Memory is the close cousin of skills, and it is the spine of any loop that runs longer than a single conversation. Skills hold durable knowledge (how we build, what our conventions are). Memory holds changing state (what got tried, what passed, what is still open). It can be a markdown file, a Linear board, or a GitHub issue list. The only requirement is that it lives outside the context window, because the model forgets everything between runs. Tomorrow morning's run reads the state file and picks up exactly where today stopped.

Mechanism | Holds | Lives in
Skill | Durable project knowledge and conventions | SKILL.md in the repo
Memory | Changing state: what is done, what is next | Markdown file or issue board
Connector | Access to external tools and data | MCP server config
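To show how little machinery the memory piece needs, here is a Python sketch that reads and rewrites a markdown checklist as loop state. It assumes a file with "## Open" and "## Done" sections and standard "- [ ]" / "- [x]" items; the function names are hypothetical, and a real loop might let the agent edit the file directly instead.

```python
import re
from typing import Dict, List

# Matches "- [ ] text" (open) and "- [x] text" (done).
ITEM = re.compile(r"^- \[( |x)\] (.+)$")


def parse_state(text: str) -> Dict[str, List[str]]:
    """Split a markdown checklist into open and done items."""
    state: Dict[str, List[str]] = {"open": [], "done": []}
    for line in text.splitlines():
        match = ITEM.match(line.strip())
        if match:
            key = "done" if match.group(1) == "x" else "open"
            state[key].append(match.group(2))
    return state


def mark_done(state: Dict[str, List[str]], item: str) -> None:
    """Move an item from open to done; raises ValueError if it is not open."""
    state["open"].remove(item)
    state["done"].append(item)


def render_state(state: Dict[str, List[str]]) -> str:
    """Write the state back in the same two-section layout."""
    lines = ["## Open"]
    lines += [f"- [ ] {item}" for item in state["open"]]
    lines.append("## Done")
    lines += [f"- [x] {item}" for item in state["done"]]
    return "\n".join(lines) + "\n"
```

Each run parses the file, does its work, marks items done, and writes the file back, so the next run starts from the state on disk and not from the context window.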

Plugins and Connectors: The Loop Touches Your Real Tools

A loop that can only see the filesystem is a tiny loop. Connectors, which are built on the Model Context Protocol (MCP), let the agent read your issue tracker, query a database, hit a staging API, or drop a message in Slack. Codex and Claude Code both speak MCP, so a connector you wrote for one usually works in the other. This is the difference between an agent that says "here is the fix" and a loop that opens the PR, links the ticket, and pings the channel once CI is green, all on its own.

If you are new to MCP, our MCP developer guide covers how connectors are built and secured, which matters a great deal once a loop can act inside your real environment unattended.

7. Sub-Agents: Separate the Maker From the Checker

The single most useful structural move in a loop is splitting the agent that writes from the agent that checks. The model that wrote the code is far too generous grading its own homework. A second agent with different instructions, and sometimes a different model, catches the things the first one talked itself into.

In Codex, you define sub-agents as TOML files in .codex/agents/, each with a name, description, instructions, and optional model and reasoning effort. Your security reviewer can be a strong model on high effort while your explorer is a fast, read-only one. Codex spawns sub-agents when you ask, runs them in parallel, and folds the results back into a single answer. Claude Code does the same with sub-agents in .claude/agents/ and agent teams that pass work between them. The usual split in both: one agent explores, one implements, one verifies against the spec.

💡 Why the split matters inside a loop

The loop runs while you are not watching, so a verifier you actually trust is the only reason you can walk away. This is also what Claude Code's /goal does under the hood: a fresh model decides whether the loop is done, not the one that did the work. The maker-checker split is applied to the stop condition itself. Sub-agents do burn more tokens, since each runs its own model and tool calls, so spend them where a second opinion is worth paying for.
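The maker-checker split can be sketched as a small control function in Python. This is a simplified illustration under stated assumptions, not the mechanism either tool uses internally: `make` and `check` are hypothetical callables standing in for two separately configured agents, and the attempt limit is arbitrary.

```python
from typing import Callable, Optional, Tuple


def maker_checker(
    make: Callable[[int, Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask the maker for a draft and let an independent checker accept or reject it.

    Returns (accepted_draft_or_None, attempts_used).
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        draft = make(attempt, feedback)   # the maker sees the last rejection reason
        ok, feedback = check(draft)       # the checker never sees the maker's reasoning
        if ok:
            return draft, attempt
    return None, max_attempts             # escalate to a human instead of self-approving


# Hypothetical stand-ins: the checker only accepts the second draft.
accepted, attempts = maker_checker(
    lambda attempt, feedback: f"draft v{attempt}",
    lambda draft: (draft.endswith("v2"), "tests failing"),
)
print(accepted, attempts)
```

The design choice that matters is the last return: when the checker never accepts, the function hands back nothing, so an exhausted loop lands in front of a person and is not quietly merged.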

For the full mechanics of organizing several agents into a reviewing team, see our guide to Claude Code agent teams.

8. What One Loop Looks Like, End to End

Put the pieces together and a single thread turns into a small control panel. Here is a shape that works well as a first loop, and that maps cleanly onto both Codex and Claude Code because the primitives are the same.

  1. An automation runs every weekday morning on the repo. Its prompt calls a triage skill.
  2. The skill reads yesterday's CI failures, the open issues, and recent commits, then writes findings into a memory file or a Linear board.
  3. For each finding worth doing, the thread opens an isolated worktree and sends a sub-agent to draft the fix.
  4. A second sub-agent reviews that draft against the project skills and the existing tests.
  5. Connectors open the PR and update the ticket. Anything the loop cannot handle lands in the triage inbox for you.

The state file is the spine of the whole thing. It remembers what got tried, what passed, and what is still open, so tomorrow morning the run picks up where today stopped. Look at what you actually did: you designed the system once. You did not prompt any of those steps by hand. That is the entire point of loop engineering, and it is the same loop whether you run it in Codex or Claude Code.
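The five steps above can be sketched as one Python function that routes each finding. Everything here is a stand-in: `Finding`, `run_morning_cycle`, and the three callables are hypothetical names, and in a real loop they would be a triage skill, a maker sub-agent, and a checker sub-agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    title: str
    worth_doing: bool


def run_morning_cycle(
    triage: Callable[[], List[Finding]],
    draft: Callable[[Finding], str],
    review: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """One scheduled run: triage, draft, review, then route each result."""
    outcome: Dict[str, List[str]] = {"pr": [], "inbox": [], "skipped": []}
    for finding in triage():                          # steps 1-2: automation + triage skill
        if not finding.worth_doing:
            outcome["skipped"].append(finding.title)
            continue
        change = draft(finding)                       # step 3: maker drafts in a worktree
        if review(change):                            # step 4: checker reviews the draft
            outcome["pr"].append(finding.title)       # step 5: connector opens the PR
        else:
            outcome["inbox"].append(finding.title)    # step 5: left for human triage
    return outcome
```

The returned dictionary is the kind of record the state file would hold: what went out as a PR, what was left for you, and what was not worth doing.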

# .claude/agents/reviewer.md  (the checker, separate from the maker)
---
name: spec-reviewer
description: Reviews a draft change against project skills and tests.
model: opus            # strong model for the verifier
isolation: worktree    # fresh checkout, no collisions
---
You are an adversarial reviewer. Run the test suite, check the diff
against CONVENTIONS.md, and reject anything that is not verifiably done.

# TODO.md  (the memory: survives every run)
## Open
- [ ] flaky test in test/auth/login.spec.ts (CI run #4821)
## Done
- [x] bump axios to patched version (PR #312, merged)

Start smaller than this if it is your first loop. A single automation that triages CI failures into a markdown file every morning, with no auto-merge, already removes a recurring chore and lets you watch how the loop behaves before you trust it to open PRs.

9. The Risks Loop Engineering Does Not Solve

A loop changes the work; it does not delete you from it. Three problems get sharper, not easier, as the loop gets better. Loop engineering is partly the discipline of designing around them.

1. Verification is still on you

A loop running unattended is also a loop making mistakes unattended. The whole reason you split the verifier sub-agent from the maker is to make the loop's "it is done" mean something. Even then, "done" is a claim, not a proof. Your job is to ship code you confirmed works, which is why human review of merged changes stays in the loop no matter how good the verifier gets.

2. Comprehension debt grows faster

The faster the loop ships code you did not write, the bigger the gap between what exists in the repo and what you actually understand. A smooth loop just makes that gap grow faster, unless you read what the loop produced. This is the same comprehension debt that AI-assisted coding has always carried, accelerated.

3. Cognitive surrender is the comfortable failure

When the loop runs itself, it is tempting to stop having an opinion and accept whatever it returns. Designing the loop is the cure when you do it with judgment, and the accelerant when you do it to avoid thinking. Same action, opposite result. Two people can build the exact same loop and get opposite outcomes: one moves faster on work they understand deeply, the other avoids understanding the work at all. The loop does not know the difference. You do.

⚠️ Build the loop, stay the engineer

Loop engineering is still early, and prompting agents directly by hand is still effective. The goal is balance: set up loops for the recurring, verifiable work, and keep direct control for the parts where your judgment is the value. Build the loop like someone who intends to stay the engineer, not just the person who presses go.

The security dimension deserves its own attention: an autonomous loop with connector access can touch production systems. Our AI agent security guide covers the guardrails, permissions, and audit logging a loop needs before you let it run unattended against real infrastructure.

10. Why Lushbinary for Agentic Engineering

Loop engineering is powerful, but designing a loop that ships reliable code without quietly accumulating risk is genuinely hard. It takes good skills authoring, a verifier you can trust, sensible cadence and cost controls, and the security plumbing to let a loop touch real systems safely. Lushbinary has been building production AI integrations and agentic workflows since the GPT-4 era, across healthcare, fintech, SaaS, and e-commerce.

Here is what we bring to an agentic engineering setup:

  • Loop and harness design - we set up automations, worktrees, skills, connectors, and maker-checker sub-agent splits that work across Codex and Claude Code.
  • Skills and memory authoring - we capture your conventions, build steps, and project knowledge so the loop stops guessing and starts compounding.
  • Verifier and evaluation design - we build the verifiable stop conditions and adversarial review agents that make "done" mean something.
  • Cost and cadence tuning - we instrument token usage and tune schedules so the loop pays for itself instead of surprising you on the invoice.
  • Security and AWS infrastructure - connector permissions, audit logging, and production deployment on AWS with the guardrails an unattended loop demands.

🚀 Free Agentic Workflow Consultation

Want to put loop engineering to work without the runaway token bills or the unreviewed merges? Lushbinary will review your current agent setup, design a loop tuned to your codebase, and recommend the verification and cost controls to run it safely - no obligation.

11. Frequently Asked Questions

What is loop engineering?

Loop engineering is the practice of designing the system that prompts an AI agent on a schedule, instead of typing each prompt yourself. You define a recursive goal, give the agent a way to find work, act on it, verify the result, and remember what is done, then let that system drive the agent. The term was popularized in June 2026 by Addy Osmani, building on points from Peter Steinberger and Anthropic's Boris Cherny.

How is loop engineering different from prompt engineering?

Prompt engineering optimizes a single instruction you type by hand, one turn at a time. Loop engineering optimizes the autonomous system that decides what to prompt, when to prompt it, and whether the result is acceptable. Prompt engineering treats the agent as a tool you hold; loop engineering treats it as a long-running process with memory, scheduling, evaluation, and orchestration.

What are the building blocks of an agent loop?

A practical loop has five pieces plus a memory store: scheduled automations that do discovery and triage, git worktrees so parallel agents do not collide, skills that capture project knowledge, plugins and MCP connectors that wire the agent into your real tools, and sub-agents that split the maker from the checker. The sixth piece is durable memory on disk (a markdown file or an issue board) that survives between runs.

Do Claude Code and OpenAI Codex support loop engineering?

Yes. Both ship the core primitives. Claude Code has /loop for recurring scheduled prompts and /goal to run until a verifiable condition holds, plus hooks, subagents, and worktree isolation. OpenAI Codex has Automations for unprompted recurring work, a /goal command (added in Codex CLI 0.128.0 on April 30, 2026), built-in worktrees, skills, and TOML-defined subagents. Connectors in both are built on MCP.

What are the risks of loop engineering?

An unattended loop is also a loop that makes mistakes unattended. The main risks are weak verification (the agent claims done without proof), comprehension debt (code ships faster than you can understand it), and cognitive surrender (accepting whatever the loop returns without judgment). Mitigations include a separate verifier sub-agent, human review of merged code, and keeping the engineer in the design and review path.

📚 Sources

Content was rephrased for compliance with licensing restrictions. Definitions, command behavior, and product capabilities sourced from official Anthropic and OpenAI documentation and from Addy Osmani's writing as of June 2026. Tool commands and pricing may change - always verify on the vendor's website before relying on a specific capability.
