preloader
post-thumb

Last Update: June 13, 2026


BYauthor-thumberic

|Loading...

Keywords

Put OpenAI's Codex, Pi from pi.dev, and Nous Research's Hermes Agent side by side and the first reaction is: why are these three even in the same article? One is a commercial product from the biggest AI lab in the world. One is a minimalist open-source toolkit largely written by a single developer. One is a sprawling Python personal assistant that lives in your Telegram.

That is exactly why they belong together. We recently spent some time reading the source code of Pi and Hermes (both are open source — we cloned and analysed them properly, not just the READMEs) and the public documentation of Codex (whose CLI is open, but whose brains are not). The three of them are so different that the differences themselves become a map. If you understand why these three agents are built the way they are, you understand most of the AI agent landscape — what each type of agent is, what it can do, and which one can actually help you.

First, what is an "agent" anyway?

Strip away the hype and every AI agent is the same machine: a loop that sends context to a language model, reads back text and tool calls, executes those tools (run a command, edit a file, search the web), feeds the results back in, and repeats until the job is done.

The model supplies the reasoning. Everything else — what the model sees, what it is allowed to touch, what survives between sessions, what happens when things go wrong — is supplied by the software wrapped around it. That wrapper is called the harness, and the three agents in this article are, more than anything, three very different opinions about what the harness should be.

Codex: the vertically-integrated product

Codex is what you get when a frontier lab builds the whole stack: proprietary models (the GPT-5.x-Codex family) co-designed with the harness, delivered through every surface you might work in — terminal CLI, IDE extensions, a desktop app, mobile, a cloud service that runs tasks in parallel containers, and a GitHub bot that reviews your pull requests when you tag @codex.

Its strengths come from that integration:

  • Raw coding capability. The Codex models lead or sit near the top of agentic coding benchmarks (SWE-bench Verified, Terminal-Bench), and they are tuned for long autonomous runs — hundreds of tool calls without a human nudge.
  • A real security model. Codex is the only one of the three that ships OS-enforced sandboxing by default: Seatbelt on macOS, bubblewrap on Linux, network access denied unless you opt in. There is a clean separation between the sandbox (a technical boundary) and the approval policy (when to ask a human). This is what a production-grade safety story looks like.
  • Token efficiency. Community comparisons consistently report it doing comparable work with two to three times fewer tokens than its rivals — which matters when you pay by subscription tier.

The trade-off is just as clear: the models are closed, deprecations are forced on OpenAI's schedule, the best models are sometimes gated behind expensive tiers, and the extension ecosystem (hooks, skills, plugins arrived only in 2026) is young. You are renting a very good employee from a company that can change the terms at any time.

Codex is the answer to: "I want the strongest coding agent today and I don't need to own it."

Pi: the harness as a library

Pi (from pi.dev) is the opposite philosophy taken seriously. It ships zero models. It is a TypeScript monorepo of four packages — a unified LLM API over roughly forty providers, a tiny agent runtime (~2,700 lines), the coding-agent CLI, and a terminal UI framework — and its core design rule is to refuse features. No MCP. No built-in sub-agents. No permission popups. No plan mode. No to-do lists. The author has written publicly about why each of these is excluded; everything on that list can be added back as a TypeScript extension in about a hundred lines.

Reading Pi's source is the best education in agent engineering we have found anywhere:

  • Sessions are an append-only tree. Every entry in the session file has an id and a parentId, so branching, forking, and time-travelling through a conversation cost nothing — no database, no file rewrites.
  • Compaction is engineered, not hand-waved. When the context window fills, older history is summarised into a structured format (goal, progress, decisions, next steps) with cumulative lists of every file read and modified — so the agent never forgets what it has touched, no matter how many times the history is squashed.
  • The provider layer absorbs chaos. Thinking levels, tool-call ID formats, cache semantics and reasoning quirks of ~30 providers are normalised behind one streaming interface, so you can switch models mid-conversation without breaking anything.

The trade-offs are deliberate: there is no sandbox at all (you are expected to containerise it yourself), and the project carries a bus factor of roughly one. It is a power tool, with everything that implies.

Pi is the answer to: "I want to own my agent, choose my models, and understand every moving part — or build my own agent on proven foundations." (Not a hypothetical: OpenClaw, one of the most popular open personal assistants, is built on Pi's packages.)

Hermes: the agent that lives with you

Hermes Agent, by Nous Research, is not really a coding agent at all — and that is the point. It is a persistent personal assistant: a Python daemon that connects to more than twenty messaging platforms (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, WeChat...), executes work on any of six backends (your machine, Docker, SSH, or serverless clouds like Modal where an idle agent hibernates for fractions of a cent), runs scheduled jobs while you sleep, and — its signature feature — improves itself.

Hermes closes a loop the other two don't have:

  • It writes its own skills (procedural knowledge captured as markdown files) from experience.
  • A background curator periodically reviews those skills — consolidating, patching, archiving — like a librarian maintaining the agent's own operations manual.
  • A pluggable memory system builds a model of who you are across sessions, and full-text search over every past conversation means it can ask itself "have I solved this before?"

Where Codex and Pi start every session from zero (plus whatever instructions you hand them), Hermes compounds. The costs are equally visible in the code: a huge attack surface (twenty-three platform adapters all parsing untrusted input), a single SQLite file as the global state store, and general-purpose breadth instead of coding depth — no IDE integration, no native git-worktree workflow. Its SECURITY.md is refreshingly honest: OS-level isolation is the only real boundary; everything else is best-effort.

Hermes is the answer to: "I want an assistant that is always on, reachable from my chat apps, and gets better the longer it works for me."

The same machine, three species

Codex
Pi
Hermes
What it is
Commercial coding agent
Open agent toolkit
Persistent personal assistant
Lives in
Terminal, IDE, cloud, GitHub
Your terminal
Your chat apps, your servers
Models
OpenAI's, closed
Any of ~40 providers
Any OpenAI-compatible provider
Memory between sessions
Preview feature
None (by design)
Its defining feature
Safety
OS sandbox, default on
None — bring your own container
OS isolation via exec backends
Extensibility
Config, skills, MCP, plugins
Full TypeScript runtime
Skills, plugins, MCP server
Biggest risk
Vendor lock-in
Bus factor of one
Security surface
Best at
Hard coding tasks, today
Being understood and owned
Compounding over months

Three lessons fall out of this comparison, and they generalise well beyond these three projects.

1. The model is a commodity; the harness is the product. All three can call the same class of frontier models. What you are actually choosing between is harness decisions: how context is managed, what tools look like, what persists, what is sandboxed. Nearly every agent failure you will ever observe — the agent "forgetting" the goal, going off the rails late in a session, missing the error that contained the answer — is a harness failure, not a model failure.

2. "Agent" is not one product category. A coding agent optimises for depth on a workstation: tight tool loops, repository awareness, review workflows. An assistant agent optimises for breadth and time: reachability, scheduling, memory, identity. The harness decisions that make one excellent make the other worse. Asking "which agent is best?" without specifying the species is like asking whether a truck is better than a motorbike.

3. Memory and skills are where the next differentiation happens. Codex represents peak per-session capability; Hermes represents the bet that across-session learning matters more. A skill — a piece of captured procedural knowledge the agent can load on demand — lets a mid-tier model with the right manual outperform a frontier model improvising. The agent that remembers your environment, your conventions, and its own past mistakes eventually beats the smarter agent with amnesia.

So which one helps you?

  • You write code for a living and want results now: Codex (or its direct competitors). Bundled with a ChatGPT subscription, sandboxed by default, strongest models.
  • You are a tinkerer, you care about model choice and data ownership, or you are building your own agent: Pi. Read its source even if you never run it — the session-tree and compaction designs alone are worth the afternoon.
  • You want a personal assistant rather than a pair programmer — something that triages, schedules, researches, and remembers: Hermes, or its TypeScript cousin OpenClaw (which, fittingly, is built on Pi).

And if you are building agents rather than just using them, the real takeaway is to read Pi and Hermes side by side: one shows you how to build the engine, the other shows you how to build the life around it. The whole field is currently busy discovering that both halves matter.

This post is the first in a series from our research into AI agent architecture. Next up: a deep dive into what an agent harness actually does, and why it — not the model — is where the engineering lives.

Comments (0)

Leave a Comment
Your email won't be published. We'll only use it to notify you of replies to your comment.
Loading comments...
Previous Article
post-thumb

Jun 13, 2026

What an AI Agent Harness Actually Does

We read the source code of six AI agent harnesses — Pi, Hermes, OpenCode, Kimi Code, OpenAI's Codex CLI and Google's Gemini CLI. This is what the harness really does for the model, why almost every agent failure is a harness failure, and why the engineering lives there, not in the model.

Next Article
post-thumb

Jun 12, 2026

Tynion v2: See What Every Prompt Term Looks Like

Tynion, our text-to-image prompt helper, just got its biggest update yet: nearly 1,000 prompt terms, and every one of them now explains itself. Hover over any term — Golden Hour, Rule of Thirds, Contrapposto — and a card pops up with a plain-English definition and a real example image showing exactly what that term does to a picture.

agico

We transform visions into reality. We specializes in crafting digital experiences that captivate, engage, and innovate. With a fusion of creativity and expertise, we bring your ideas to life, one pixel at a time. Let's build the future together.

Copyright ©  2026  TYO Lab