The Best Agent Harness for Claude Code and Codex
A practitioner's guide to building an agent harness for Claude Code and Codex in 2026, what a harness actually is, and how to pick one that survives model churn.
A year ago “agent harness” was an inside-baseball term heard mostly at AI labs. In 2026 it has become one of the most important decisions a serious AI-coding shop makes, and most teams are making it accidentally.
A harness is everything around the model that helps it do the right thing when it needs to. The model itself is interchangeable. The harness is not. As frontier models keep flipping the leaderboard every few weeks, the harness is increasingly where your real investment lives.
This post is a practical look at what a good agent harness is, what it is made of, and how to think about picking or building one for Claude Code and Codex specifically.
What an agent harness actually is
A harness is context plus restraint plus empowerment.
- Context: the things the model needs to know to do good work in your codebase. Your conventions, your past decisions, the way you build React components, the shape of your data model, the open tracker items related to what is being worked on right now.
- Restraint: the rules that keep the model inside the lines. Do not use dynamic imports here. Never write to the D1 database from this path. Always ask before running this kind of command.
- Empowerment: the tools the model can reach for. Direct access to log files, the ability to query the running app’s state, a sandboxed browser for end-to-end testing, screenshots of the UI it just changed.
Strip those three out and what is left is a chat box pointed at a fast autocomplete engine. Put them in and you have something that can iterate on real work and get measurably better over time.
The parts of a real harness
A harness is not one file. In a working setup, it is at least these:
- A root instruction file (a CLAUDE.md, an AGENTS.md, or the equivalent startup file your agent reads). The first thing every agent reads. Project conventions, critical rules, the map to the rest of the harness.
- Path-scoped rules. Files that activate when the agent touches a particular area. “When you are working on IPC handlers, read this.” “When you are styling components, follow these Tailwind conventions.”
- Skills, examples, and recipes. Worked examples that show, not just tell. The model is much better at imitating a good example than at parsing prose.
- Tools that touch live state. Read the log file. Query the local database. Take a screenshot of the UI. Run the end-to-end test suite in a loop until it passes. These are what turn a code generator into something that can verify its own work.
- A linked workspace. Tracker items, sessions, commits, files, and decisions that are all addressable, so the agent can see that this bug is linked to that session, which is linked to those files, which are linked to that commit history.
If you only have the first one, you have a notes file. If you have all five, you have a harness that compounds in value every week.
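To make path-scoped rules concrete, here is a minimal sketch of the idea in Python. It is not how Claude Code or Codex implement scoping internally; the glob patterns and rule file names are hypothetical, chosen only to mirror the IPC and Tailwind examples above.

```python
from fnmatch import fnmatch

# Hypothetical mapping from path globs to rule files (illustrative names only).
PATH_RULES = {
    "src/ipc/*": "rules/ipc-handlers.md",
    "src/components/*": "rules/tailwind-conventions.md",
}


def rules_for(touched_path: str) -> list[str]:
    """Return the rule files whose glob matches the file being edited."""
    return [
        rule_file
        for pattern, rule_file in PATH_RULES.items()
        if fnmatch(touched_path, pattern)
    ]
```

Calling rules_for("src/ipc/handlers/auth.ts") would return only the IPC rule file, so the agent loads that guidance when it touches IPC code and nothing extra when it edits an unrelated file.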
Why this matters more for Claude Code and Codex than for any single agent
Many teams that use Claude Code are also testing Codex. Claude is better at some kinds of work, Codex is better at others, and frontier models keep trading positions. The teams getting the most out of agentic coding right now are routing different tasks to different agents.
That changes the harness requirements. A harness that only works inside one vendor’s app does not survive the next model swap. The valuable harness is the one that is portable across agents: the same context, the same rules, the same tools, available to whichever agent you point at the work.
Concretely, that means:
- Instruction files in the repo, not in the vendor’s UI. CLAUDE.md and AGENTS.md checked into git, readable by any agent that respects them.
- Tools exposed through an open protocol. MCP (Model Context Protocol) is the current best answer. Tools written once, reachable by Claude Code, Codex, and whatever lands next.
- A workspace surface that is not owned by one model vendor. If the surface is from the same company as the model, it may eventually optimize for keeping you on that model.
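One way to keep a tool "written once" is to write its body as a plain function that knows nothing about which agent calls it. The sketch below is a hypothetical log-reading tool in Python; the function name and defaults are assumptions, and registering it with an MCP server is a separate step that is deliberately not shown.

```python
from collections import deque
from pathlib import Path


def tail_log(log_path: str, max_lines: int = 50) -> str:
    """Return the last `max_lines` lines of a log file as one string."""
    path = Path(log_path)
    if not path.is_file():
        # Return a readable message rather than raising, so the agent
        # gets something it can act on.
        return f"log file not found: {log_path}"
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the final lines without loading
        # the whole file into memory.
        return "".join(deque(handle, maxlen=max_lines))
```

Because the function has no agent-specific dependencies, the same code can sit behind whatever protocol layer the team settles on.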
What the “best” harness looks like
For Claude Code plus Codex specifically, the best harness in 2026 has four properties.
Open and inspectable. You can read every file the agent reads. No hidden system prompts owned by a vendor. If a rule is firing, you can find it. If a tool is wrong, you can fix it.
Multi-agent by design. The same harness drives Claude Code, Codex, and any other agent your team wants to try. Switching agents on a task is a one-click decision, not a migration.
Workspace-aware. The harness can see across sessions, tasks, files, and decisions. An agent fixing a bug can read the linked tracker item, the related session transcripts, and the commit history without you copy-pasting any of it.
Loopable. The agent can run, observe, evaluate, and try again, using real tools (Playwright, log queries, screenshots) rather than guessing. This is the difference between agents that need a human in the loop on every step and agents that can grind through a long task while you do something else.
Any harness missing one of those four properties is going to feel limiting within months.
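The loopable property can be sketched as a small run, observe, retry loop. This is a minimal illustration in Python, not a description of any agent's internals: run_until_green and attempt_fix are hypothetical names, and attempt_fix stands in for whatever hands the failure output back to the agent.

```python
import subprocess
from typing import Callable


def run_until_green(
    check_cmd: list[str],
    attempt_fix: Callable[[str], None],
    max_attempts: int = 5,
) -> bool:
    """Run a check command up to `max_attempts` times, calling
    `attempt_fix` with the captured output after each failure."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(check_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True  # the check passed, so stop looping
        if attempt < max_attempts:
            # Feed real output back instead of letting the agent guess.
            attempt_fix(result.stdout + result.stderr)
    return False
```

The check command could be an end-to-end suite, for example ["npx", "playwright", "test"] if Playwright is installed in the project. The cap on attempts matters: without it, an agent that cannot fix the failure will grind indefinitely.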
Where Nimbalyst fits
I build Nimbalyst, so this is the part where I tell you what we are doing about it. Nimbalyst is an open-source visual workspace for AI coding that is built to be a harness across Claude Code and Codex.
Concretely:
- Project-level instruction files (CLAUDE.md, AGENTS.md, scoped rule files) are first-class. Any agent in the workspace reads them.
- MCP tools are a first-class part of the system. Agents can query live state, read log files, drive the UI, take screenshots, and run verification loops against the app.
- Sessions, tasks, decisions, mockups, diagrams, and code all live in the same workspace and are linkable. An agent can see the bug, the related sessions, the related files, and the history that connects them.
- Claude Code and Codex are first-class agents today and the agent layer is pluggable for the next one.
- Most of the repo, including the desktop and iOS apps, is MIT-licensed by default. The collaboration server is AGPL-3.0, with a separate commercial option.
It is not the only valid answer. A determined team can hand-roll a harness with a careful repo layout, a shared MCP server, and discipline about which app each developer uses. The reason I think a workspace-shaped harness wins is that the linking is the part that does not exist in a pile of files.
How to pick (or build) yours this quarter
If you are about to invest serious time in agentic coding, three concrete moves:
- Move your harness into the repo. Whatever is in your head about how the codebase should be written needs to be in a CLAUDE.md and an AGENTS.md, both checked in.
- Write at least one path-scoped rule and one tool that reads live state. Any path-scoped rule. Any live-state tool. The first one is the hardest. The second is when the harness starts to compound.
- Pick a surface that does not lock you to one model vendor. Whether that is Nimbalyst, a careful CLI setup, or something else, the test is the same. Can a new agent that ships next month run against the same harness with one config change?
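The first move is easy to check mechanically. The sketch below, in Python, reports which instruction files are missing from a repo root. The function name is hypothetical, and it only checks that the files exist on disk, not that they are tracked in git.

```python
from pathlib import Path

# The two instruction files named in this post.
REQUIRED_FILES = ("CLAUDE.md", "AGENTS.md")


def missing_instruction_files(repo_root: str) -> list[str]:
    """Return the required instruction files absent from `repo_root`."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Run as a pre-commit or CI step, a non-empty result is an early signal that part of the harness still lives in someone's head.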
The model and agent layers will keep moving. The harness is the part you own.