Five pillars of the agent harness above Claude Code and Codex
A progress report on the agent harness we are building above coding agents like Claude Code and Codex, organized around five pillars: context, context graph, restraint, empowerment, and visual interface.
What we built
A harness is everything an AI model has access to that helps it produce a useful response to a prompt or series of prompts.
Claude Code and Codex are themselves agent harnesses. Each one wraps a frontier model with a system prompt, a tool set, a permission system, and an execution loop.
We have been building a second harness on top of that one. It is the workspace where coding agents do product work alongside files, tasks, diagrams, diffs, and people. We organize that work around five pillars: context, context graph, restraint, empowerment, and visual interface. The rest of this post describes what we built in each.
1. Context
Context covers everything specific to a project: code, specs, design docs, tracker items, data models, past decisions, repo conventions, examples, and recipes.
We built:
- Loaders for root instruction files (CLAUDE.md, AGENTS.md) that an agent reads at the start of every session.
- Path-scoped rule files that load only when the agent touches a specific directory.
- A skill system for reusable, scoped instructions.
- An import system so context files compose without duplication.
Each session starts with the team’s accumulated decisions already in scope, rather than re-derived from the prompt.
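As a rough illustration of path-scoped loading, the sketch below selects which rule files apply to a path the agent touches. It is not the harness's actual loader: the `RuleFile` shape, the rule names, and the glob patterns are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class RuleFile:
    """A rule file plus the glob that activates it (hypothetical shape)."""
    name: str
    pattern: str  # matched against repo-relative POSIX paths


# Hypothetical rule set; names and patterns are illustrative only.
RULES = [
    RuleFile("root-instructions", "*"),              # always in scope
    RuleFile("api-conventions", "src/api/*"),        # only under src/api/
    RuleFile("migration-safety", "db/migrations/*"), # only under db/migrations/
]


def rules_for(touched_path: str, rules: list = RULES) -> list:
    """Return the names of rule files whose pattern matches the touched path."""
    # fnmatchcase lets "*" span "/" so a directory pattern covers nested files.
    return [r.name for r in rules if fnmatchcase(touched_path, r.pattern)]


if __name__ == "__main__":
    print(rules_for("src/api/users.py"))
    print(rules_for("README.md"))
```

Under these assumptions, a session that only edits `README.md` carries the root instructions and nothing else, which keeps directory-specific rules out of the context window until they are relevant.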
2. Context graph
We were tired of having a pile of tabs: tickets in one tool, planning docs in another, design diagrams somewhere else, diffs in the IDE, and working sessions in Claude Code or Codex. We were tired of keeping the connections between those artifacts in our heads. Our agents couldn’t traverse relationships that were never recorded.
We built:
- A persistent, typed link graph between artifacts: tracker item, plan, spec, diagram, mockup, session, diff, files, commit, decision.
- First-class editors for those artifacts inside the workspace, so links resolve to actual content.
- An MCP surface so any agent (Claude Code, Codex, OpenCode) can traverse the graph during a session.
An agent prompt can pull in the related plan, prior session, design diagram, and affected files in a single traversal.
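To make "a single traversal" concrete, here is a minimal sketch of a typed link graph and a breadth-first walk over it. The artifact ids, link kinds, and the `traverse` function are hypothetical; the real graph schema and MCP surface may look quite different.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A typed, directed link between two artifacts (hypothetical schema)."""
    source: str
    kind: str
    target: str


# Illustrative data only; ids and link kinds are made up for this sketch.
LINKS = [
    Link("tracker:142", "planned_by", "plan:checkout-redesign"),
    Link("plan:checkout-redesign", "specified_by", "spec:checkout-v2"),
    Link("plan:checkout-redesign", "illustrated_by", "diagram:checkout-flow"),
    Link("tracker:142", "worked_in", "session:2031"),
    Link("session:2031", "produced", "diff:2031-a"),
    Link("diff:2031-a", "touches", "file:src/checkout/cart.ts"),
]


def traverse(start: str, links: list, max_depth: int = 3) -> list:
    """Breadth-first walk over outgoing links; returns reachable ids in visit order."""
    outgoing = {}
    for link in links:
        outgoing.setdefault(link.source, []).append(link)

    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for link in outgoing.get(node, []):
            if link.target not in seen:
                seen.add(link.target)
                order.append(link.target)
                queue.append((link.target, depth + 1))
    return order


if __name__ == "__main__":
    for artifact in traverse("tracker:142", LINKS):
        print(artifact)
```

Starting from one tracker item, the walk reaches the plan, the session, the spec, the diagram, the diff, and the affected file. The depth limit is a design choice in this sketch to keep a traversal from pulling the whole workspace into a prompt.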
3. Restraint
Restraint covers hard rules, approval boundaries, permission scopes, tool allowlists, and an audit trail: the guardrails that keep an agent from doing the wrong thing quickly.
We built:
- Per-tool permission scopes and allowlists.
- Approval flows for actions that touch shared state (push to main, drop a table, hit a paid API, run a destructive shell command).
- A durable audit trail of every approval, tool call, and file change.
- Path-scoped rules that block agents from editing specific files or directories.
A capable agent without restraint will eventually push to main, drop a table, or burn through a paid API. The restraint pillar is what makes capability safe to leave running.
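One way to picture how allowlists, approval boundaries, blocked paths, and the audit trail fit together is a single policy check that every tool call passes through. This is a sketch under assumed names (`Policy`, `Decision`, the tool names, and the blocked path are all hypothetical), not the harness's actual permission system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass
class Policy:
    """Hypothetical per-session policy: allowlist, approval set, blocked paths."""
    allowed_tools: set
    approval_required: set
    blocked_prefixes: tuple = ()
    audit_log: list = field(default_factory=list)

    def check(self, tool: str, path: Optional[str] = None) -> Decision:
        """Decide on a tool call and record the decision in the audit log."""
        if tool not in self.allowed_tools:
            decision = Decision.DENY            # not on the allowlist
        elif path is not None and path.startswith(self.blocked_prefixes):
            decision = Decision.DENY            # path-scoped block
        elif tool in self.approval_required:
            decision = Decision.NEEDS_APPROVAL  # touches shared state
        else:
            decision = Decision.ALLOW
        self.audit_log.append({"tool": tool, "path": path, "decision": decision.value})
        return decision


if __name__ == "__main__":
    policy = Policy(
        allowed_tools={"read_file", "edit_file", "git_push"},
        approval_required={"git_push"},
        blocked_prefixes=("infra/secrets/",),
    )
    print(policy.check("edit_file", "src/app.py"))
    print(policy.check("git_push"))
    print(policy.check("edit_file", "infra/secrets/prod.env"))
```

Logging denied and pending calls alongside allowed ones is deliberate here: the trail is only useful for after-the-fact review if it shows what the agent tried, not just what it was permitted to do.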
4. Empowerment
Empowerment covers tools that touch live state: reading the log file, querying the running database, driving the UI and taking a screenshot to check the result, running an end-to-end test loop until the agent’s code passes.
We built:
- MCP tools that read live application logs.
- MCP tools that query the running database.
- A Playwright-driven UI loop so an agent can interact with a running app, take a screenshot, and verify the result.
- An extension SDK so teams can write their own MCP tools and ship them inside the workspace.
An agent that can verify its own output closes the loop without a human in the middle of every step.
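The loop described above can be sketched as a bounded change-then-check cycle. The function and parameter names below are hypothetical, and the callbacks stand in for whatever the agent and the checks really are (an end-to-end test run, a screenshot comparison, a log query).

```python
from typing import Callable, Tuple


def verify_loop(
    apply_change: Callable[[int], None],
    run_checks: Callable[[], bool],
    max_attempts: int = 3,
) -> Tuple[bool, int]:
    """Apply a change, run the checks, and repeat until they pass or attempts run out.

    Returns (passed, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        apply_change(attempt)   # e.g. the agent edits code
        if run_checks():        # e.g. the end-to-end tests pass
            return True, attempt
    return False, max_attempts


if __name__ == "__main__":
    # Stand-in state: two failing checks, each attempt fixes one of them.
    state = {"failing": 2}

    def fix_one(_attempt: int) -> None:
        state["failing"] = max(0, state["failing"] - 1)

    def all_green() -> bool:
        return state["failing"] == 0

    print(verify_loop(fix_one, all_green))
```

The attempt cap matters as much as the loop itself in this sketch: when the checks never go green, the agent stops and hands the failure to a person instead of retrying indefinitely.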
5. Visual interface
Visual work is part of the input to the agent and therefore part of the harness: markdown review and editing, UI mockups, architecture diagrams, data models, red and green diffs of a proposed change, screenshots of a running app, a canvas to sketch against.
We built:
- A markdown editor with red and green AI diff visualization.
- Mockup, diagram, and data model editors as first-class file types.
- Diff review across every file the agent touched in a session.
- Image, screenshot, and diagram inputs that agents can read and produce directly.
The visual surface and the agent’s working surface are the same surface. An agent can render a mockup, take a screenshot, look at the screenshot, and decide whether what it built matches the request.
A harness in action
Here is what the five pillars look like filled in for a single concrete prompt:
Results
What we observed once these five pillars were in place in our own workflow:
- Sessions resume from prior context without re-prompting. The agent picks up from where the last session left off.
- A single prompt pulls in the linked plan, prior session, spec, and affected files through one traversal of the graph.
- Permission scopes and the audit trail let us leave agents running on multi-step work and review the changes after the fact, instead of approving each step.
- Agents verify their own UI and backend changes through screenshots, log queries, and end-to-end test loops before asking for review.
- We switch the same task between Claude Code and Codex by changing a setting. The rules, tools, and graph stay the same.
Investment
We treat our harness as a product we ship to ourselves. A meaningful share of the time and tokens we spend on AI work goes into improving the harness itself rather than only consuming completions: writing better rules, building better MCP tools, recording better decisions, and tightening the verification loop.
Every release cycle includes at least one improvement to one of the five pillars. We log decisions as they happen so future sessions can read them, rather than re-deriving the same answer. The investment compounds. Every rule, tool, and link makes every future session a little better.
Ownership
We want to own our harness. Every part of it (instructions, rules, tool definitions, links between work items, audit logs, skills) lives in files on our own machines, in formats we control. We can read them, edit them, version them, point any agent at them, and take them with us if we change tools.
That posture shows up in how we built Nimbalyst itself: MIT-licensed desktop app, open file formats, data on the local filesystem, no required server.
Portability across coding agents
We built the harness so it runs across coding agents. Same files, same rules, same tools, same graph underneath. A team can point a session at Claude Code, Codex, or whatever lands next without rebuilding the workflow above it.
Open source
Nimbalyst is the open-source harness we are building across all five pillars. The visual interface, context graph, empowerment tools, cross-model CLAUDE.md, and skills are in the open. Study how it is wired, copy what is useful, or run it as a workspace.