Human Approvals for Agents: Where to Put the Breakpoints


The approval problem in agent systems is usually phrased backwards.

Teams ask, “Where should we require a human?” That sounds reasonable, but it leads to blanket approvals, vague prompts, and people clicking through the same confirmation over and over. The better question is, “Where does human judgment change the outcome more than the model already can?”

OpenAI’s agent safety docs keep the answer simple: keep tool approvals on when tools can act, and treat human approval as part of the safety design, not a visual flourish.[1] OWASP’s LLM Top 10 names the risk pattern too, under Excessive Agency.[2] And NIST’s AI RMF treats human oversight as something you define, assess, and document, which is the right bar for production systems.[3]

Approvals are not there to make the product feel cautious. They are there to stop a specific class of mistake from becoming a side effect.

TL;DR

  • Approve side effects, not routine reads.
  • Tie each approval to one object, one payload, and one time window.
  • Show the exact diff or action, not a paraphrase.
  • Re-check the approval if the underlying state changes before commit.
  • Record the approver, the reviewed payload, and the final executed payload.

The breakpoint is the product

If the breakpoint is wrong, the whole control collapses.

Ask for approval too early, and the user learns to ignore it. Ask too late, and the system has already changed the world. Ask in vague language, and the approval becomes theater.

The useful breakpoint is where the human sees something the model cannot guarantee on its own: intent, policy, or consent.

That is why approval design should follow the action graph, not the UI graph.

Three breakpoints that actually matter

1. Before irreversible side effects

If the next step will create, modify, delete, send, charge, or publish something real, that is the default approval boundary.

Examples:

  • sending an external email
  • opening or closing a support ticket
  • changing a customer record
  • provisioning infrastructure
  • issuing a refund

This is the cleanest breakpoint because the effect is legible.

2. When the agent crosses a trust boundary

Some actions are dangerous not because they are irreversible, but because they cross from one authority domain to another.

Examples:

  • moving from read-only lookup to write access
  • using broader credentials than the session started with
  • passing from internal data to an external system
  • turning retrieved text into an outbound action

If a system crosses trust boundaries without a human noticing, the approval was too late or too narrow.
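These boundaries can be made checkable before the tool call runs. A minimal sketch, assuming a hypothetical ascending scope order; the scope names and function signatures here are illustrative, not any particular framework’s API:

```python
# Hypothetical scope ladder, ascending authority. An action that needs
# broader authority than the session started with should pause for a
# human rather than proceed silently.
SCOPE_ORDER = ["read", "write", "admin"]

def crosses_trust_boundary(session_scope: str, action_scope: str) -> bool:
    """True when the action needs broader authority than the session has."""
    return SCOPE_ORDER.index(action_scope) > SCOPE_ORDER.index(session_scope)

def requires_approval(session_scope: str, action_scope: str, external: bool) -> bool:
    # Escalating scope or leaving the internal boundary both need a human.
    return crosses_trust_boundary(session_scope, action_scope) or external
```

The point of the check is that it runs on the action graph, before execution, not as a UI afterthought.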

3. When uncertainty becomes operational

Agents are often uncertain in ways that sound confident.

If the model is choosing between ambiguous targets, incomplete fields, or multiple plausible actions, the human should usually correct the ambiguity, not merely approve the attempt.

Examples:

  • two customers share the same name
  • the deployment target is not unique
  • a required field was inferred rather than supplied

In those cases, the best approval is often a clarification.

What a useful approval payload contains

The approval surface should show the user enough to make a real decision.

Each field earns its place on the card:

  • Action: tells the human what will happen
  • Target: makes the object or account explicit
  • Diff or payload: shows the exact change
  • Side effects: surfaces what else may happen
  • Scope: tells the human what authority is being used
  • Expiry: prevents blanket session approval
  • Idempotency key: ties the approval to one replayable operation

If the UI cannot show the exact action, the system does not have a reviewable approval. It has a hope.

Concrete example:

Send [email protected] invoice reminder using template late-payment-v2 and attach invoice INV-2041.

That is better than “Proceed?” because it tells the human what is about to leave the system.
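Rendered as data, that card might look like the following. The field names follow the list above; the values and key names are illustrative only, not a schema any particular product uses:

```python
# A hypothetical approval card for the email example above.
approval_card = {
    "action": "send_email",
    "target": "[email protected]",
    "payload": {
        "template": "late-payment-v2",
        "attachment": "INV-2041",
    },
    "side_effects": ["email leaves the system", "customer is notified"],
    "scope": "outbound:email",
    "expires_at": "2025-01-01T12:05:00Z",  # short window, not a session grant
    "idempotency_key": "send-INV-2041-reminder-001",
}
```

Every field is concrete enough to dispute: a reviewer can say “wrong template” or “wrong invoice”, which is impossible against “Proceed?”.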

Two-phase execution makes approvals real

The cleanest pattern is prepare then commit.

  1. prepare_* returns a machine-readable summary of the intended effect.
  2. The human reviews that summary.
  3. commit_* executes against the same payload hash or confirmation token.

This matters because an approval is only meaningful if the action that executed is the same one that was reviewed.

If the state changes between prepare and commit, re-run the prepare step. That is the difference between a control and a checkbox.
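The pattern can be sketched in a few lines. This assumes a hypothetical refund tool; the hashing approach (canonical JSON, then SHA-256) is one common way to tie a commit to a reviewed payload, not the only one:

```python
import hashlib
import json

def payload_hash(payload: dict) -> str:
    """Stable hash so the committed action can be tied to the reviewed one."""
    canonical = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def prepare_refund(order_id: str, amount_cents: int) -> dict:
    # Returns a machine-readable summary; nothing has executed yet.
    payload = {"action": "refund", "order_id": order_id, "amount_cents": amount_cents}
    return {"payload": payload, "hash": payload_hash(payload)}

def commit_refund(payload: dict, approved_hash: str) -> str:
    # Refuse to execute anything other than what the human reviewed.
    if payload_hash(payload) != approved_hash:
        raise ValueError("payload drifted since approval; re-run prepare")
    return f"refunded {payload['amount_cents']} on {payload['order_id']}"
```

If anything mutates the payload between review and commit, the hash check fails closed and the flow returns to prepare.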

Where approvals go wrong

The failure modes are consistent:

  • approving a whole session instead of one action
  • approving read-only lookups that do not need it
  • approving after the tool call already happened
  • approving a paraphrase when the exact payload matters
  • approving without an audit record

Those bugs do not just annoy users. They teach the system to treat approvals as noise.

Another one to avoid: letting a human approve a result without seeing the source data that drove it. If the agent says “I found the right record,” the reviewer needs the record id or the diff, not just the model’s confidence.

Treat approvals as audit events

Every approval should emit an auditable event with:

  • approver id
  • time
  • action type
  • target id
  • reviewed payload hash
  • expiry
  • decision
  • whether the executed payload matched the reviewed payload

That last field catches drift. If the system commits something different from what was approved, the event stream should show it immediately.
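A minimal event record makes that check mechanical. This is a sketch under the assumption that both the reviewed and executed payloads are hashed the same way; the class and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ApprovalEvent:
    approver_id: str
    time: str
    action_type: str
    target_id: str
    reviewed_hash: str   # hash of the payload the human saw
    expiry: str
    decision: str        # e.g. "approved" or "rejected"
    executed_hash: str   # hash of the payload that actually ran

    @property
    def drifted(self) -> bool:
        # The field that catches drift: executed differs from reviewed.
        return self.executed_hash != self.reviewed_hash
```

Any monitor watching the event stream can alert on `drifted` without reconstructing the payloads themselves.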

NIST’s AI RMF is useful here because it frames oversight as a defined process, not a one-off interaction.[3]

What I would ship first

For a new agent product, I would start with approvals on exactly four things:

  1. external writes
  2. outbound communication
  3. access to broader credentials
  4. ambiguity that cannot be resolved automatically

Everything else can be tightened later. Those four are where the expensive mistakes usually live.
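As a starting policy, that list fits in a handful of lines. The category names below are assumptions for illustration, not a standard taxonomy; the useful property is that the set is small, explicit, and easy to tighten later:

```python
# Hypothetical action categories that block on a human, per the list above.
APPROVAL_REQUIRED = {
    "external_write",
    "outbound_communication",
    "credential_escalation",
    "unresolved_ambiguity",
}

def needs_human(action_category: str) -> bool:
    """True when this category of action must pause for approval."""
    return action_category in APPROVAL_REQUIRED
```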

What not to do

Do not require approval for every tool call just because the system “feels agentic”. That makes users click through until the approval loses meaning.

Do not rely on the model to summarize its own risky action and then treat that summary as the thing the human approved. The human should review the actual action, not the model’s paraphrase.

And do not hide the approval in a separate admin log that the operator will never see. If it matters enough to block action, it matters enough to show in the workflow.

The point

Good approval design slows down the right thing at the right moment and nothing else.

That is the standard. Not “be safe”. Not “ask a lot”. Be precise about where consent and authority diverge, then make that divergence visible in the UI and in the trace.

Footnotes

  1. OpenAI, “Safety in building agents”: https://platform.openai.com/docs/guides/agent-builder-safety

  2. OWASP Top 10 for Large Language Model Applications: https://owasp.org/www-project-top-10-for-large-language-model-applications/

  3. NIST AI RMF Core: https://airc.nist.gov/airmf-resources/airmf/5-sec-core/