# Human Approvals for Agents: Where to Put the Breakpoints
The approval problem in agent systems is usually phrased backwards.
Teams ask, “Where should we require a human?” That sounds reasonable, but it leads to blanket approvals, vague prompts, and people clicking through the same confirmation over and over. The better question is, “Where does human judgment change the outcome more than the model already can?”
OpenAI’s agent safety docs keep the answer simple: keep tool approvals on when tools can act, and treat human approval as part of the safety design, not a visual flourish.[^1] OWASP’s LLM Top 10 names the risk pattern too, under excessive agency.[^2] And NIST’s AI RMF treats human oversight as something you define, assess, and document, which is the right bar for production systems.[^3]
Approvals are not there to make the product feel cautious. They are there to stop a specific class of mistake from becoming a side effect.
## TL;DR
- Approve side effects, not routine reads.
- Tie each approval to one object, one payload, and one time window.
- Show the exact diff or action, not a paraphrase.
- Re-check the approval if the underlying state changes before commit.
- Record the approver, the reviewed payload, and the final executed payload.
## The breakpoint is the product
If the breakpoint is wrong, the whole control collapses.
Ask for approval too early, and the user learns to ignore it. Ask too late, and the system has already changed the world. Ask in vague language, and the approval becomes theater.
The useful breakpoint is where the human sees something the model cannot guarantee on its own: intent, policy, or consent.
That is why approval design should follow the action graph, not the UI graph.
## Three breakpoints that actually matter
### 1. Before irreversible side effects
If the next step will create, modify, delete, send, charge, or publish something real, that is the default approval boundary.
Examples:
- sending an external email
- opening or closing a support ticket
- changing a customer record
- provisioning infrastructure
- issuing a refund
This is the cleanest breakpoint because the effect is legible.
### 2. When the agent crosses a trust boundary
Some actions are dangerous not because they are irreversible, but because they cross from one authority domain to another.
Examples:
- moving from read-only lookup to write access
- using broader credentials than the session started with
- passing from internal data to an external system
- turning retrieved text into an outbound action
If a system crosses trust boundaries without a human noticing, the approval was too late or too narrow.
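One way to make that boundary mechanical is to compare the authority a tool needs against the authority the session started with. This is a minimal sketch; the scope names, tool table, and function are illustrative assumptions, not a real API:

```python
# Sketch: flag tool calls that need authority the session did not start
# with. Scope strings and the tool-to-scope table are assumptions.
SESSION_SCOPE = {"read:crm"}  # authority the session started with

TOOL_SCOPES = {
    "lookup_customer": {"read:crm"},                  # routine read
    "update_customer": {"write:crm"},                 # write access
    "send_email": {"write:crm", "send:external"},     # outbound action
}

def crosses_trust_boundary(tool: str, session_scope: set) -> bool:
    """True if the tool's required scopes are not a subset of the session's."""
    return not TOOL_SCOPES[tool] <= session_scope

# A read stays inside the session's scope; the others should surface
# an approval before they run.
assert not crosses_trust_boundary("lookup_customer", SESSION_SCOPE)
assert crosses_trust_boundary("update_customer", SESSION_SCOPE)
assert crosses_trust_boundary("send_email", SESSION_SCOPE)
```

Checking scopes at the call site, rather than trusting the plan, means escalation is visible before it happens rather than after.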
### 3. When uncertainty becomes operational
Agents are often uncertain in ways that sound confident.
If the model is choosing between ambiguous targets, incomplete fields, or multiple plausible actions, the human should usually correct the ambiguity, not merely approve the attempt.
Examples:
- two customers share the same name
- the deployment target is not unique
- a required field was inferred rather than supplied
In those cases, the best approval is often a clarification.
## What a useful approval payload contains
The approval surface should show the user enough to make a real decision.
| Field | Why it belongs on the card |
|---|---|
| Action | Tells the human what will happen |
| Target | Makes the object or account explicit |
| Diff or payload | Shows the exact change |
| Side effects | Surfaces what else may happen |
| Scope | Tells the human what authority is being used |
| Expiry | Prevents blanket session approval |
| Idempotency key | Ties the approval to one replayable operation |
If the UI cannot show the exact action, the system does not have a reviewable approval. It has a hope.
Concrete example:
> Send `[email protected]` the invoice reminder using template `late-payment-v2` and attach invoice `INV-2041`.
That is better than “Proceed?” because it tells the human what is about to leave the system.
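One way to carry those fields is a small structured card that the UI renders and the commit step later verifies. This is a sketch; the field names mirror the table above and are assumptions, not a standard schema:

```python
# Sketch of an approval card carrying the fields from the table above.
# The schema is illustrative, not a standard.
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class ApprovalCard:
    action: str            # what will happen
    target: str            # the explicit object or account
    payload: dict          # the exact change, diff, or message body
    side_effects: list     # what else may happen
    scope: str             # authority being used
    expires_at: str        # prevents blanket session approval
    idempotency_key: str   # ties the approval to one replayable operation

    def payload_hash(self) -> str:
        """Stable hash the commit step can check against what was reviewed."""
        blob = json.dumps(self.payload, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

card = ApprovalCard(
    action="send_email",
    target="billing@example.com",   # illustrative address
    payload={"template": "late-payment-v2", "attachment": "INV-2041"},
    side_effects=["customer receives an email"],
    scope="send:external",
    expires_at="2025-06-01T00:00:00Z",
    idempotency_key="send-INV-2041-reminder-1",
)
assert len(card.payload_hash()) == 64  # sha256 hex digest
```

Hashing the canonicalized payload, rather than the rendered summary, is what lets the commit step prove it is executing the thing the human actually saw.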
## Two-phase execution makes approvals real
The cleanest pattern is prepare then commit.
- `prepare_*` returns a machine-readable summary of the intended effect.
- The human reviews that summary.
- `commit_*` executes against the same payload hash or confirmation token.
This matters because an approval is only meaningful if the action that executed is the same one that was reviewed.
If the state changes between prepare and commit, re-run the prepare step. That is the difference between a control and a checkbox.
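A minimal sketch of the pattern, assuming hypothetical `prepare_send_email` and `commit_send_email` tools: the approval binds to a payload hash, and commit refuses anything that has drifted since review:

```python
# Sketch of prepare-then-commit. Commit refuses to run unless the
# payload it receives hashes to the value the human reviewed.
# Tool names and payload shape are illustrative assumptions.
import hashlib
import json

def _digest(payload: dict) -> str:
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def prepare_send_email(payload: dict) -> dict:
    """Return a reviewable summary plus the hash the approval binds to."""
    return {
        "summary": f"Send {payload['template']} to {payload['to']}",
        "approved_hash": _digest(payload),
    }

def commit_send_email(payload: dict, approved_hash: str) -> str:
    if _digest(payload) != approved_hash:
        raise ValueError("payload drifted since approval; re-run prepare")
    return "sent"  # the real side effect would happen here

payload = {"to": "billing@example.com", "template": "late-payment-v2"}
ticket = prepare_send_email(payload)
assert commit_send_email(payload, ticket["approved_hash"]) == "sent"

payload["to"] = "other@example.com"  # state changed before commit
try:
    commit_send_email(payload, ticket["approved_hash"])
except ValueError:
    pass  # drift caught: the approval must be refreshed, not reused
```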
## Where approvals go wrong
The failure modes are consistent:
- approving a whole session instead of one action
- approving read-only lookups that do not need it
- approving after the tool call already happened
- approving a paraphrase when the exact payload matters
- approving without an audit record
Those bugs do not just annoy users. They teach the system to treat approvals as noise.
Another one to avoid: letting a human approve a result without seeing the source data that drove it. If the agent says “I found the right record,” the reviewer needs the record id or the diff, not just the model’s confidence.
## Treat approvals as audit events
Every approval should emit an auditable event with:
- approver id
- time
- action type
- target id
- reviewed payload hash
- expiry
- decision
- whether the executed payload matched the reviewed payload
That last field catches drift. If the system commits something different from what was approved, the event stream should show it immediately.
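As a sketch, the event can be assembled at commit time from values the approval flow already has. Field names mirror the list above; `sink` is a stand-in for whatever event bus or log you use:

```python
# Sketch of the audit event emitted at commit time. Field names mirror
# the list above; `sink` is a stand-in for a real event bus.
import json
import time

def emit_approval_event(sink, approver_id, action, target,
                        reviewed_hash, executed_hash, decision, expiry):
    event = {
        "approver_id": approver_id,
        "time": time.time(),
        "action": action,
        "target": target,
        "reviewed_payload_hash": reviewed_hash,
        "expiry": expiry,
        "decision": decision,
        # the drift-catching field: did we execute what was reviewed?
        "executed_matches_reviewed": reviewed_hash == executed_hash,
    }
    sink.append(json.dumps(event))
    return event

events = []
evt = emit_approval_event(events, "user_42", "send_email", "INV-2041",
                          "abc123", "abc123", "approved",
                          "2025-06-01T00:00:00Z")
assert evt["executed_matches_reviewed"] is True
assert len(events) == 1
```

When the reviewed and executed hashes differ, the event still emits; it just carries `executed_matches_reviewed: false`, which is the signal the monitoring side should alert on.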
NIST’s AI RMF is useful here because it frames oversight as a defined process, not a one-off interaction.[^3]
## What I would ship first
For a new agent product, I would start with approvals on exactly four things:
- external writes
- outbound communication
- access to broader credentials
- ambiguity that cannot be resolved automatically
Everything else can be tightened later. Those four are where the expensive mistakes usually live.
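That starting policy can be a single predicate over labeled tool calls. The category labels below are assumptions about how calls are tagged, not a standard taxonomy:

```python
# Sketch: gate exactly the four categories listed above.
# Category labels are illustrative assumptions.
APPROVAL_REQUIRED = {
    "external_write",
    "outbound_comm",
    "credential_escalation",
    "unresolved_ambiguity",
}

def needs_approval(call: dict) -> bool:
    """True if any of the call's categories fall in the gated set."""
    return bool(APPROVAL_REQUIRED & set(call.get("categories", [])))

assert needs_approval({"tool": "send_email", "categories": ["outbound_comm"]})
assert not needs_approval({"tool": "lookup_customer", "categories": ["read_only"]})
```

Keeping the gated set explicit makes tightening later a one-line diff instead of a redesign.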
## What not to do
Do not require approval for every tool call just because the system “feels agentic”. That makes users click through until the approval loses meaning.
Do not rely on the model to summarize its own risky action and then treat that summary as the thing the human approved. The human should review the actual action, not the model’s paraphrase.
And do not hide the approval in a separate admin log that the operator will never see. If it matters enough to block action, it matters enough to show in the workflow.
## The point
Good approval design slows down the right thing at the right moment and nothing else.
That is the standard. Not “be safe”. Not “ask a lot”. Be precise about where consent and authority diverge, then make that divergence visible in the UI and in the trace.
## Footnotes

[^1]: OpenAI, “Safety in building agents”: https://platform.openai.com/docs/guides/agent-builder-safety
[^2]: OWASP Top 10 for Large Language Model Applications: https://owasp.org/www-project-top-10-for-large-language-model-applications/
[^3]: NIST AI RMF Core: https://airc.nist.gov/airmf-resources/airmf/5-sec-core/