OpenAI Codex CLI vs Claude Code: A Practical Harness Comparison for Real Repos



Terminal coding agents are suddenly everywhere, and they all promise the same thing: “Point me at a repo and I will fix it.”

I have been using these tools on real repos: the ones with half-finished refactors, flaky tests, questionable scripts, and a README that lies to you. In that world, the model matters, but the harness matters more.

By harness I mean the boring, high-leverage details around the model: how the tool reads code, runs commands, asks for confirmation, handles secrets, and integrates with your workflow. Those choices determine whether you can trust it on a real codebase, not a demo project.

This post compares OpenAI Codex CLI and Claude Code through that lens. Claude Code is not open-source in the same way as Codex CLI (and its GitHub repo does not currently list an open-source license), so where I cannot verify internals I say so and stick to what is documented. (I am writing this in January 2026.) https://github.com/anthropics/claude-code

TL;DR (the take people actually want)

  • Both harnesses are capable; the real differences are sandboxing, approvals, context control, extensibility, and auditability.
  • Claude Code is the polished, low-configuration default, but parts of the runtime are closed and the lock-in signals are real.
  • Codex CLI is Apache-2.0, MCP-extensible, and inspectable end-to-end, which is why it is my current pick.

What each tool is (one paragraph each)

Codex CLI is OpenAI’s open-source terminal agent for coding, built for an agentic workflow with tool calls, diffs, and configurable approval modes. It supports MCP for extending context and tool access. Source and docs: https://github.com/openai/codex and https://developers.openai.com/codex/cli.

Claude Code is Anthropic’s coding agent experience that runs in your terminal and IDE, optimized for navigating and editing repos with human approvals. It includes an integrations story and a GitHub workflow. Docs and repo: https://www.anthropic.com/claude-code and https://github.com/anthropics/claude-code.

What the harness decides for you

When these tools feel “good”, it is usually because the harness is doing a bunch of invisible work for you. When they feel dangerous, it is also the harness.

In my day-to-day use, these are the decisions that matter most:

  1. What it can execute and when it asks me first.
  2. What it can read and how it chooses what to ingest.
  3. What I can extend without gluing together fragile scripts.
  4. What I can audit after the fact when something looks off.

Codex CLI vs Claude Code: the practical comparison

1) Sandboxing and permissions

The first time an agent offers to run a script, you realize why sandboxing is not optional.

Codex CLI is explicit about sandboxing and approvals as first-class concepts. It has configurable approval modes and can run with more or less autonomy depending on your risk tolerance. (See the CLI docs: https://developers.openai.com/codex/cli.)
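
To make that concrete, here is a sketch of the relevant knobs in Codex CLI's `~/.codex/config.toml`. The key names below reflect what the docs describe at the time of writing, but treat them as an assumption and verify against `codex --help` and the config reference before relying on them:

```toml
# ~/.codex/config.toml -- sketch, not authoritative; verify key names against the docs
approval_policy = "untrusted"       # ask before running anything not explicitly trusted
sandbox_mode = "workspace-write"    # read broadly, write only inside the workspace
```

The point is less the exact values and more that both dials are visible, versionable text you can review and diff like any other config.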

Claude Code also emphasizes approvals and safety for code execution and edits, but because parts of the runtime are not as transparent, I always validate how it behaves in my environment (what is executed automatically, what is proposed, what is gated). Start with Anthropic’s docs and confirm with a local dry run. (Overview: https://www.anthropic.com/claude-code.)

My take: I want a hard, inspectable boundary between “read” and “execute”, and I am more likely to trust the tool that makes those knobs obvious and keeps a paper trail.

2) Approval model: when does it ask you first?

This is where similar tools feel wildly different in practice.

Codex CLI has multiple approval modes, including modes that require confirmation before shell commands and edits. This is useful when you want a tight loop: you can keep the agent moving while still reviewing any risky steps. (Docs: https://developers.openai.com/codex/cli.)

Claude Code’s UX is built around review and iteration. The most important question is whether it is consistent about prompting before risky actions (especially command execution and dependency installation). Anthropic positions it as a coding agent with human-in-the-loop control. (Docs: https://www.anthropic.com/claude-code.)

What I do:

  • I start in the most restrictive mode, even if it is annoying.
  • I keep a tiny allowlist of commands I can approve without thinking.
  • I treat installs, scripts, and codegen as untrusted until I read them.
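
In Claude Code, that allowlist can live in project settings, e.g. `.claude/settings.json`. A sketch, assuming the `permissions` allow/deny shape from Anthropic's settings docs (verify the exact matcher syntax before relying on it):

```json
{
  "permissions": {
    "allow": [
      "Bash(git status)",
      "Bash(git diff:*)",
      "Bash(npm test:*)"
    ],
    "deny": [
      "Bash(npm install:*)",
      "Bash(curl:*)"
    ]
  }
}
```

I keep the allow side boring (status, diff, tests) and put installs and network access on the deny side until I have read what the agent wants to run.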

3) Context and repo navigation

Context loading is where accidental leakage happens, and it is also where “smart” agents quietly get scary.

Codex CLI exposes an extensibility layer via MCP, which is a practical way to add “safe context” without dumping internal systems into the prompt. You can put a narrow server in front of your sensitive stuff and log what was requested. (Codex CLI docs mention MCP support: https://developers.openai.com/codex/cli.)
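
As a sketch, registering such a narrow server in Codex CLI's config might look like this. The `mcp_servers` table is documented for Codex CLI, but the server name and command here are hypothetical:

```toml
# Hypothetical entry in ~/.codex/config.toml; "runbooks" and the script path are illustrative
[mcp_servers.runbooks]
command = "python"
args = ["tools/runbook_server.py"]  # a narrow, read-only server you control and can log
```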

Claude Code also supports integrations and extensions, but exact mechanics vary by setup. I treat it as “capable by default”, then explicitly constrain it to the sources I want it to use. (See Anthropic’s Claude Code docs and repo: https://www.anthropic.com/claude-code and https://github.com/anthropics/claude-code.)

My take: the best harness is the one that makes it easy to give the agent less power, not more.

4) Extensibility: MCP vs plugins

Codex CLI leans into MCP. If your team already has internal search, CI status, or runbooks, MCP is a clean interface for exposing those capabilities as tools. That is how you get an agent that behaves like “your engineer”, not “a smart autocomplete”. (MCP is documented in the Codex CLI docs: https://developers.openai.com/codex/cli.)

Claude Code emphasizes integrations and a workflow that includes IDE usage and GitHub. If your process is already built around PR review, code owners, and CI gates, that can be a strong fit. (Anthropic overview: https://www.anthropic.com/claude-code.)

My take: once you use agents at team scale, extensibility is not a nice-to-have. It is how you keep the agent inside your process instead of letting it invent one.

5) Instruction file conventions: AGENTS.md vs CLAUDE.md

This sounds petty until you are operating at team scale.

Codex and other agent workflows commonly use an AGENTS.md file in the repo to tell the agent how to behave, what to run, and what not to touch. OpenAI even uses it in the Codex repo itself. https://github.com/openai/codex/blob/main/AGENTS.md

Anthropic has pushed a CLAUDE.md convention for “project instructions” in the Claude Code ecosystem. Some developers have asked for AGENTS.md support directly in the Claude Code repo. https://github.com/anthropics/claude-code/issues/722

My take: I do not care which filename wins long-term. I care that it is one more place where the ecosystem splits into incompatible conventions, and teams pay the tax.
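
Whichever filename wins, the content is the same idea: a short, checked-in contract the agent reads before acting. A minimal sketch (the sections and rules here are illustrative, not from any official template):

```markdown
# AGENTS.md

## Build and test
- Install deps with `npm ci`; run `npm test` before proposing a diff.

## Boundaries
- Do not touch `migrations/` or `vendor/`.
- Do not add dependencies without calling it out in the summary.

## Style
- Match existing formatting; never reformat unrelated files.
```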

6) Observability and audit trails

If you cannot answer “what changed and why” in five minutes, you cannot use this at scale.

For Codex CLI, auditability benefits from the harness being open-source and oriented around explicit tool calls, diffs, and approvals.

For Claude Code, treat the agent session as a work artifact: keep changes in small diffs, require PRs, and make sure the tool history is preserved in your workflow.
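
One cheap way to get that history regardless of tool: route approved commands through a tiny wrapper that writes an append-only log. This is my own sketch, not a feature of either product; the file name and log format are arbitrary:

```python
# audit_run.py: hypothetical wrapper that logs each approved command, then runs it.
# Not part of Codex CLI or Claude Code; purely an illustration of keeping a paper trail.
import datetime
import subprocess

AUDIT_LOG = "agent-audit.log"

def audit_run(cmd: list[str]) -> subprocess.CompletedProcess:
    """Append a UTC timestamp and the command to the audit log, then execute it."""
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    with open(AUDIT_LOG, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{' '.join(cmd)}\n")
    return subprocess.run(cmd, capture_output=True, text=True)
```

Now "what ran and when" is a grep away, independent of whatever session history the tool itself keeps.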

7) Open-source and transparency (what can you actually inspect?)

Codex CLI is Apache-2.0 licensed and the harness is in the open. When I hit weird behavior, I can follow the implementation and decide whether to work around it, fix it, or just not use that capability. https://github.com/openai/codex

Claude Code has a public repo, but as of writing it does not present itself as open-source with a standard license identifier. In practice, I treat the product behavior as “documented and observed”, not “auditable end-to-end”. https://github.com/anthropics/claude-code

8) Billing and credential friction (this matters more than you think)

This is the awkward part of the “agent as a product” era: the harness is great, but the credentialing and plan rules can change what is feasible.

Anthropic documents how Claude Code works with Pro/Max plans. https://support.claude.com/en/articles/11278770-using-claude-code-with-a-pro-max-plan

Developers have also reported hitting errors when trying to use Claude credentials outside Claude Code, with messages like “This credential is only authorized for use with Claude Code.” https://github.com/anthropics/claude-code/issues/8042

My take: I understand why a vendor would tie credentials to an official client. It is within their rights (and likely within their terms), and it reduces abuse. But as a user, it feels like clamping down on flexibility in a way that burns goodwill. It also nudges teams toward vendor-specific workflows, even when they want to standardize.

9) Cost and throughput control

Neither tool magically fixes cost. The harness can help, but only if you decide what you optimize for.

In my own use, I try to keep agents pointed at the work that is expensive for humans and cheap for machines: design docs, refactor plans, test-writing, and the annoying glue code that I would otherwise procrastinate. When the task turns into “touch 40 files”, I cap the budget and insist on tests.

How I compare them in practice

I keep it simple: I run both tools on the same kinds of tasks I actually ship, and I pay attention to the moments where I have to stop and think, “Do I trust this next step?”

If a tool makes it easy to stay in control, explain what it is doing, and recover when it goes sideways, I keep using it. If I find myself babysitting it, I stop.

Where OpenCode fits (and why it matters)

If you want an open-source terminal agent that is not backed by a single vendor’s product strategy, you will keep running into projects like OpenCode. It is an open-source alternative in the same “CLI agent” category, and it is useful as a reference point even if you do not adopt it. https://github.com/opencode-ai/opencode

My take: the ecosystem gets healthier when there are credible open options. It forces the commercial tools to stay honest.

The call

If you want my honest answer: the best tool is the one whose harness matches your constraints, but I still pick a favorite.

Claude Code is the safe bet, and if your priority is a polished workflow with minimal configuration, it is a reasonable default. https://www.anthropic.com/claude-code

Codex is the one I am more excited to bet on, because the harness is open and the community support is real. When I run into an edge case, I can usually find context in public issues, and the maintainers are present in the same place the work happens. That matters more than any marketing page. https://github.com/openai/codex

The wrong move is picking based on model hype. Pick based on the boundaries, the review flow, and whether you can safely integrate it into your engineering system.

My verdict (why I am favoring Codex right now)

Claude Code is well established and a safe bet. If you want to onboard a team and minimize experimentation, I get why it is the default choice. https://www.anthropic.com/claude-code

But in my own use, I keep getting pulled toward Codex for one simple reason: when I am stuck, I can see daylight.

Codex is shipped in the open, and the whole harness lives in a public repo. That means the community can actually rally around it, and you are not waiting on a black box to change. More practically, it means support shows up where I already work: in issues, PRs, and docs updates. https://github.com/openai/codex

I also like that the people building Codex are visible. I have found it easier to get signal, context, and direction from the Codex repo than from closed or partially closed tooling ecosystems. That is a real advantage when you are betting on a tool for production work. https://github.com/openai/codex

Codex also gives me a cleaner path to making the agent behave like our engineering org, not like the vendor’s product. MCP is a big part of that. https://developers.openai.com/codex/cli

On the flip side, I do not love the direction of vendor lock-in signals in the Claude Code ecosystem. The CLAUDE.md convention and the credential restrictions are defensible choices, but they add friction and burn goodwill for people trying to standardize. https://github.com/anthropics/claude-code/issues/722 and https://github.com/anthropics/claude-code/issues/8042

Sources