Five pillars of the agent harness above Claude Code and Codex

A progress report on the agent harness we are building above coding agents like Claude Code and Codex, organized around five pillars: context, context graph, restraint, empowerment, and visual interface.

Karl Wirth

What we built

A harness is everything an AI model has access to that helps it produce a useful response to a prompt or series of prompts.

Claude Code and Codex are themselves agent harnesses. Each one wraps a frontier model with a system prompt, a tool set, a permission system, and an execution loop.

We have been building a second harness on top of those agent harnesses. It is the workspace where coding agents do product work alongside files, tasks, diagrams, diffs, and people. We organize that work around five pillars: context, context graph, restraint, empowerment, and visual interface. The rest of this post describes what we built in each.

Figure: A second harness above Claude Code and Codex: prompt, five-pillar harness, agent harness, and the underlying model

1. Context

Context covers everything specific to a project: code, specs, design docs, tracker items, data models, past decisions, repo conventions, examples, and recipes.

We built:

  • Loaders for root instruction files (CLAUDE.md, AGENTS.md) that an agent reads at the start of every session.
  • Path-scoped rule files that load only when the agent touches a specific directory.
  • A skill system for reusable, scoped instructions.
  • An import system so context files compose without duplication.

Each session starts with the team’s accumulated decisions already in scope, rather than re-derived from the prompt.
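To make the path-scoped rules and the import system concrete, here is a minimal sketch. It is not the actual loader: the `RuleFile` shape, the glob-style scopes, the `@import` line syntax, and the file names are all assumptions chosen for illustration, and files are held in a dictionary so the example runs on its own.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class RuleFile:
    """A rule file that applies only to paths matching `scope` (hypothetical shape)."""
    scope: str  # glob pattern, e.g. "src/api/*"
    text: str


def rules_for(touched_path: str, rules: list[RuleFile]) -> list[str]:
    """Return the text of every rule whose scope matches the touched path."""
    return [r.text for r in rules if fnmatch(touched_path, r.scope)]


def expand_imports(name: str, files: dict[str, str],
                   seen: frozenset[str] = frozenset()) -> str:
    """Inline lines of the form '@import <file>' (an assumed syntax), skipping cycles."""
    if name in seen:
        return ""  # this file is already being expanded higher up; stop the cycle
    out = []
    for line in files[name].splitlines():
        if line.startswith("@import "):
            target = line[len("@import "):].strip()
            out.append(expand_imports(target, files, seen | {name}))
        else:
            out.append(line)
    return "\n".join(out)


rules = [
    RuleFile("src/api/*", "Validate every request body."),
    RuleFile("migrations/*", "Never edit an applied migration."),
]
files = {
    "ROOT.md": "Use TypeScript strict mode.\n@import testing.md",
    "testing.md": "Write a test for every bug fix.",
}

print(rules_for("src/api/users.ts", rules))  # ['Validate every request body.']
print(expand_imports("ROOT.md", files))
```

The point of the split is that the root file stays short: directory-specific guidance loads only when relevant, and shared guidance is written once and imported.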

Figure: Context pillar: CLAUDE.md, rule files, examples, data model, skills

2. Context graph

We were tired of having a pile of tabs: tickets in one tool, planning docs in another, design diagrams somewhere else, diffs in the IDE, and working sessions in Claude Code or Codex. We were tired of keeping the connections between those artifacts in our heads. Our agents couldn’t traverse relationships that were never recorded.

We built:

  • A persistent, typed link graph between artifacts: tracker item, plan, spec, diagram, mockup, session, diff, file, commit, decision.
  • First-class editors for those artifacts inside the workspace, so links resolve to actual content.
  • An MCP surface so any agent (Claude Code, Codex, OpenCode) can traverse the graph during a session.

An agent prompt can pull in the related plan, prior session, design diagram, and affected files in a single traversal.
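A single traversal of this kind can be sketched as a breadth-first walk over typed links. This is an illustrative model only: the `LinkGraph` class, the `kind:id` naming, and the relation names are hypothetical, and the real graph is persistent and exposed over MCP rather than held in memory.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class LinkGraph:
    """Typed links between artifacts identified by 'kind:id' strings (assumed format)."""
    edges: dict[str, list[tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def link(self, a: str, relation: str, b: str) -> None:
        # Record the link in both directions so traversal can start from either end.
        self.edges[a].append((relation, b))
        self.edges[b].append((relation, a))

    def related(self, start: str, max_hops: int = 2) -> list[str]:
        """Breadth-first traversal returning artifacts within max_hops of start."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for _relation, neighbor in self.edges.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    found.append(neighbor)
                    queue.append((neighbor, hops + 1))
        return found


graph = LinkGraph()
graph.link("tracker:142", "planned_by", "plan:checkout-redesign")
graph.link("plan:checkout-redesign", "illustrated_by", "diagram:checkout-flow")
graph.link("tracker:142", "worked_in", "session:prior")
graph.link("session:prior", "changed", "file:src/checkout.ts")

print(graph.related("tracker:142"))
# ['plan:checkout-redesign', 'session:prior', 'diagram:checkout-flow', 'file:src/checkout.ts']
```

Starting from one tracker item, two hops are enough to reach the plan, the prior session, the diagram, and the affected file.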

Figure: Context graph pillar: tracker, sessions and files, commits and tasks, decision log, memory graph

3. Restraint

Restraint covers hard rules, approval boundaries, permission scopes, tool allowlists, and an audit trail. These are the guardrails that keep an agent from doing the wrong thing quickly.

We built:

  • Per-tool permission scopes and allowlists.
  • Approval flows for actions that touch shared state (push to main, drop a table, hit a paid API, run a destructive shell command).
  • A durable audit trail of every approval, tool call, and file change.
  • Path-scoped rules that block agents from editing specific files or directories.

A capable agent without restraint will eventually push to main, drop a table, or burn through a paid API. The restraint pillar is what makes capability safe to leave running.
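One way to picture how these pieces fit together is a single gate that every tool call passes through. The sketch below is an assumption-laden illustration, not the shipped permission system: the `Gate` class, the tool names, and the three decisions (`allow`, `ask`, `deny`) are hypothetical, and the in-memory list stands in for a durable audit log.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Gate:
    """Decides whether a tool call runs, needs approval, or is blocked; logs each decision."""
    allowed_tools: set[str]
    needs_approval: set[str]
    blocked_paths: list[str]
    audit: list[dict] = field(default_factory=list)

    def check(self, tool: str, target: str = "") -> str:
        if tool not in self.allowed_tools:
            decision = "deny"   # not on the allowlist at all
        elif any(fnmatch(target, pattern) for pattern in self.blocked_paths):
            decision = "deny"   # a path-scoped rule protects this file or directory
        elif tool in self.needs_approval:
            decision = "ask"    # touches shared state, so a human must approve
        else:
            decision = "allow"
        # Every decision is recorded, including denials.
        self.audit.append({"tool": tool, "target": target, "decision": decision})
        return decision


gate = Gate(
    allowed_tools={"read_file", "edit_file", "git_push"},
    needs_approval={"git_push"},
    blocked_paths=["migrations/*", ".env"],
)

print(gate.check("edit_file", "src/app.ts"))          # allow
print(gate.check("edit_file", "migrations/001.sql"))  # deny
print(gate.check("git_push", "main"))                 # ask
print(gate.check("drop_table", "users"))              # deny
```

Checking the allowlist first means an unknown tool is denied by default instead of falling through to `allow`.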

Figure: Restraint pillar: path-scoped rules, hard rules, permission scopes, tool allowlists, audit trail

4. Empowerment

Empowerment covers tools that touch live state: reading the log file, querying the running database, driving the UI and taking a screenshot to check the result, running an end-to-end test loop until the agent’s code passes.

We built:

  • MCP tools that read live application logs.
  • MCP tools that query the running database.
  • A Playwright-driven UI loop so an agent can interact with a running app, take a screenshot, and verify the result.
  • An extension SDK so teams can write their own MCP tools and ship them inside the workspace.

An agent that can verify its own output closes the loop without a human in the middle of every step.
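The shape of that loop is simple: run a check, feed any failure back to the agent, and repeat up to a limit. The sketch below uses stand-in functions in place of a real agent and a real test run; `verify_loop`, `fake_fix`, and `fake_check` are hypothetical names, and the round limit is an assumed safeguard against looping forever.

```python
from typing import Callable


def verify_loop(
    attempt_fix: Callable[[str], None],
    run_check: Callable[[], tuple[bool, str]],
    max_rounds: int = 5,
) -> bool:
    """Run the check; on failure, hand the failure output to the agent and retry.

    Returns True once the check passes, or False after max_rounds failed checks.
    """
    for _ in range(max_rounds):
        passed, output = run_check()
        if passed:
            return True
        attempt_fix(output)  # the agent sees the failing output and revises its change
    return False


# Stand-ins for a real agent and a real test run: the check passes after two fixes.
state = {"fixes": 0}


def fake_fix(failure_output: str) -> None:
    state["fixes"] += 1


def fake_check() -> tuple[bool, str]:
    ok = state["fixes"] >= 2
    return ok, ("" if ok else "expected 200, got 500")


print(verify_loop(fake_fix, fake_check))  # True
```

In practice `run_check` could wrap an end-to-end test run, a log query, or a screenshot comparison; the loop itself does not need to know which.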

Figure: Empowerment pillar: logs and DB queries, UI and vision, Playwright loop, MCP tools, sandbox and shell

5. Visual interface

Visual work is part of the input to the agent and therefore part of the harness: markdown review and editing, UI mockups, architecture diagrams, data models, red and green diffs of a proposed change, screenshots of a running app, a canvas to sketch against.

We built:

  • A markdown editor with red and green AI diff visualization.
  • Mockup, diagram, and data model editors as first-class file types.
  • Diff review across every file the agent touched in a session.
  • Image, screenshot, and diagram inputs that agents can read and produce directly.

The visual surface and the agent’s working surface are the same surface. An agent can render a mockup, take a screenshot, look at the screenshot, and decide whether what it built matches the request.
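The red and green diff view comes down to labeling each changed line as removed or added. As a rough illustration using Python's standard `difflib` (the editor itself may work quite differently), the hypothetical `classify_diff` function below produces labels that a renderer could color.

```python
import difflib


def classify_diff(before: str, after: str) -> list[tuple[str, str]]:
    """Label each changed line as 'removed' (red) or 'added' (green)."""
    a, b = before.splitlines(), after.splitlines()
    labeled = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if tag in ("delete", "replace"):
            labeled += [("removed", line) for line in a[i1:i2]]
        if tag in ("insert", "replace"):
            labeled += [("added", line) for line in b[j1:j2]]
        # 'equal' spans are unchanged and carry no color
    return labeled


before = "# Plan\nShip on Friday.\nNotify support."
after = "# Plan\nShip on Monday.\nNotify support."

print(classify_diff(before, after))
# [('removed', 'Ship on Friday.'), ('added', 'Ship on Monday.')]
```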

Figure: Visual interface pillar: visual workspace, diffs and review, approvals, team handoffs, discussions

A harness in action

Here is what the five pillars look like filled in for a single concrete prompt:

Figure: A harness in action: a prompt, the harness pillars populated with project-specific entries, the agent harness, the model, and the resulting outcome

Results

What we observed once these five pillars were in place in our own workflow:

  • Sessions resume from prior context without re-prompting. The agent picks up where the last session left off.
  • A single prompt pulls in the linked plan, prior session, spec, and affected files through one traversal of the graph.
  • Permission scopes and the audit trail let us leave agents running on multi-step work and review the changes after the fact, instead of approving each step.
  • Agents verify their own UI and backend changes through screenshots, log queries, and end-to-end test loops before asking for review.
  • We switch the same task between Claude Code and Codex by changing a setting. The rules, tools, and graph stay the same.

Investment

We treat our harness as a product we ship to ourselves. A meaningful share of the time and tokens we spend on AI work goes into improving the harness itself, not only into consuming completions: writing better rules, building better MCP tools, recording better decisions, and tightening the verification loop.

Every release cycle includes at least one improvement to one of the five pillars. We log decisions as they happen so future sessions can read them, rather than re-deriving the same answer. The investment compounds. Every rule, tool, and link makes every future session a little better.

Ownership

We want to own our harness. Every part of it (instructions, rules, tool definitions, links between work items, audit logs, skills) lives in files on our own machines, in formats we control. We can read them, edit them, version them, point any agent at them, and take them with us if we change tools.

That posture shows up in how we built Nimbalyst, our harness: an MIT-licensed desktop app with open file formats, data on the local filesystem, and no required server.

Portability across coding agents

We built the harness so it runs across coding agents. Same files, same rules, same tools, same graph underneath. A team can point a session at Claude Code, Codex, or whatever lands next without rebuilding the workflow above it.
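Conceptually, portability means the coding agent is one field in a session's configuration and everything else is shared. The sketch below is a hypothetical model of that idea: `SessionConfig`, its field names, and the agent labels are assumptions for illustration, not the actual settings format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SessionConfig:
    """Everything a session needs; only `agent` names the underlying coding agent."""
    agent: str  # illustrative labels such as "claude-code" or "codex"
    instruction_files: tuple[str, ...]
    tools: tuple[str, ...]
    graph_root: str


base = SessionConfig(
    agent="claude-code",
    instruction_files=("CLAUDE.md", "AGENTS.md"),
    tools=("read_logs", "query_db"),
    graph_root="tracker:142",
)

# Switching agents changes one field; rules, tools, and the graph entry point carry over.
switched = replace(base, agent="codex")

print(switched.agent)                # codex
print(switched.tools == base.tools)  # True
```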

Open source

Nimbalyst is the open-source harness we are building across all five pillars. The visual interface, context graph, empowerment tools, cross-model CLAUDE.md, and skills are in the open. Study how it is wired, copy what is useful, or run it as a workspace.