Shipping Safe Tooling: Schemas, Validation, and Failure Modes in Tool Calling

The first time I watched a model call a tool in production, I felt the same kind of relief you feel when a flaky job finally goes green.

We had a workflow that used to require a human to copy data between systems. Then one day, it did not. The agent read the request, pulled the right record, updated the right fields, and posted a clean summary back to the user. A minute of work disappeared.

Two weeks later, that same tool path became the source of our first real incident.

Nothing “hacked” us. Nobody found a jailbreak. The model just did what models do: it made a confident decision based on an incomplete picture. It retried after a timeout. It guessed a missing parameter. It used the right tool with the wrong identity context. The end result was not catastrophic, but it was embarrassing and expensive. We had to unwind actions that looked legitimate on paper.

That is the reality of tool calling: once language turns into actions, you are no longer debating style or helpfulness. You are shipping a distributed system with an unreliable planner at the center.

This post is a guide to making that system safe enough to ship. Not perfect. Not “secure by prompt”. Safe enough that when the model is wrong, your product fails in a way you can live with.

Why tool calling changes the risk shape

Plain chat failures are reputational. Tool calling failures are operational.

If a chat assistant hallucinates, you get a wrong answer. If a tool-using agent hallucinates, you can get:

  • wrong state written to a system of record
  • duplicate actions (because retries are normal)
  • accidental data exposure (because authorization is subtle)
  • expensive cascades (because agents compound mistakes)

OWASP calls out the relevant buckets directly: prompt injection, insecure output handling, insecure plugin design, and excessive agency.1

If you want one phrase that captures why this is hard: tool calling turns your model into a confused deputy. It can be manipulated by inputs, and it can also simply misunderstand. Either way, it is still holding credentials.

A practical threat model for tool calling

Tool calling failures are not all “security” failures, but the mitigations often overlap. This table is how I frame it during design reviews:

| Failure mode | What it looks like in production | What usually fixes it |
| --- | --- | --- |
| Prompt injection (direct or indirect) | Tool arguments that reflect instructions found in a document, email, or web page | Treat tool outputs and retrieved content as untrusted, limit scope, require confirmations for side effects |
| Insecure output handling | You execute model-generated commands, queries, or templates without validation | Strict schemas, server-side validation, safe encoders, parameterization |
| Excessive agency | A model has permissions it does not need, and a mistake becomes a real action | Least privilege, scoped tokens, separate read vs write tools, human approvals |
| Ambiguity and guessing | Model fills in missing parameters because "it seems obvious" | Make tools demand explicit fields, train the system to ask clarifying questions |
| Retry chaos | Timeouts cause duplicate actions, out-of-order effects, stuck loops | Idempotency keys, timeouts, bounded retries, state machines |
| Tool result poisoning | A tool returns untrusted text that the model treats as instructions | Put tool results in a "data-only" lane, strip/transform, and never feed raw tool output into "system" authority |

The main idea is simple: your mitigations cannot live only in prompts. They need to exist at the tool contract layer, the authorization layer, and the execution layer.

Design tools like public APIs (because they are)

The quickest way to hurt yourself is to treat tools as internal helper functions that only your model will call.

In practice, any tool that can be triggered by language should be treated like a public API:

  • adversarial inputs exist, even if you do not want to think about them
  • retries happen
  • timeouts happen
  • partial failure happens
  • logs get audited later, by someone who was not in the room

That mindset changes the shape of your interfaces.

The schema is your first security boundary

Tool schemas are not just a convenience for parsing. They are a boundary. A good schema makes it hard for the model to “invent” dangerous flexibility.

This is the core pattern:

  1. constrain what the model can ask the tool to do
  2. validate that request as if it came from an untrusted client
  3. authorize it as the user, not as the model

Most tool APIs already use JSON Schema or something close to it. Anthropic’s tool use API takes JSON Schema directly (input_schema).2 OpenAI’s Structured Outputs can enforce that a tool call matches the schema you provide when you opt into strict mode, but it does not make the values correct.3

That last clause matters. Schema adherence is not correctness. It is just a higher quality failure mode.

Schema patterns that reduce real incidents

When I look at tool calling incidents, the root cause is often “we gave the model too much room”. Here are patterns that reduce that room:

| Pattern | Do this | Avoid this |
| --- | --- | --- |
| Separate read vs write tools | search_tickets, get_ticket, update_ticket | One tool that does everything with a mode string |
| Bound strings | max length, allowed chars, known prefixes | "Free-form text" for identifiers |
| Enumerate choices | enums for priority, status, currency, region | "any string" for fields that drive state |
| Make units explicit | amount_cents, not amount | "amount": 20 (is it dollars, cents, euros?) |
| Require stable identifiers | ticket_id, customer_id, account_id | Names that can collide |
| Reject unknown fields | fail closed | permissive parsing that quietly drops junk |

If you only take one recommendation from this post, take this: do not design schemas that encourage the model to smuggle intent in a text field.

A concrete tool schema example (language agnostic)

Here is what “less room” looks like as a tool contract. Imagine a support agent that can update a ticket.

The unsafe design is an update_ticket tool that takes a ticket_id and a freeform_update_text. That invites the model to blend intent, justification, policy, and state change into a single field.

The safer design forces the model to name the exact changes, and it gives you a place to validate invariants:

{
  "name": "update_ticket",
  "description": "Update specific fields on a ticket. Requires a stable id, explicit changes, and an idempotency key.",
  "input_schema": {
    "type": "object",
    "additionalProperties": false,
    "required": ["ticket_id", "patch", "idempotency_key"],
    "properties": {
      "ticket_id": { "type": "string", "pattern": "^TICK_[A-Z0-9]{6,}$" },
      "idempotency_key": { "type": "string", "minLength": 16, "maxLength": 128 },
      "patch": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "status": { "type": "string", "enum": ["open", "pending", "resolved"] },
          "priority": { "type": "string", "enum": ["p0", "p1", "p2", "p3"] },
          "assignee_id": { "type": "string" }
        }
      }
    }
  }
}

This does not guarantee the model will make the correct choices, but it does force it to be explicit. It also gives your server a clean place to enforce: only certain roles can set priority, only certain users can assign, and resolved requires a resolution note on the ticket itself.
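To make "validate as if it came from an untrusted client" concrete, here is a minimal hand-rolled server-side check for the update_ticket contract above, in Python. In practice you would likely use a JSON Schema validator library; the function name and error strings here are illustrative, not a real API:

```python
import re

ALLOWED_TOP_LEVEL = {"ticket_id", "patch", "idempotency_key"}
ALLOWED_PATCH_FIELDS = {"status", "priority", "assignee_id"}
STATUS_ENUM = {"open", "pending", "resolved"}
PRIORITY_ENUM = {"p0", "p1", "p2", "p3"}
TICKET_ID_RE = re.compile(r"^TICK_[A-Z0-9]{6,}$")

def validate_update_ticket(args: dict) -> list[str]:
    """Return a list of validation errors; empty means the payload is acceptable."""
    errors = []
    # Reject unknown fields: fail closed instead of silently dropping junk.
    unknown = set(args) - ALLOWED_TOP_LEVEL
    if unknown:
        errors.append(f"unknown fields: {sorted(unknown)}")
    # Bound and pattern-check the identifier instead of accepting free-form text.
    if not TICKET_ID_RE.match(str(args.get("ticket_id", ""))):
        errors.append("ticket_id must match TICK_[A-Z0-9]{6,}")
    key = args.get("idempotency_key", "")
    if not isinstance(key, str) or not (16 <= len(key) <= 128):
        errors.append("idempotency_key must be a string of 16-128 chars")
    patch = args.get("patch")
    if not isinstance(patch, dict):
        errors.append("patch must be an object")
        return errors
    unknown_patch = set(patch) - ALLOWED_PATCH_FIELDS
    if unknown_patch:
        errors.append(f"unknown patch fields: {sorted(unknown_patch)}")
    # Enumerate choices: no "any string" for fields that drive state.
    if "status" in patch and patch["status"] not in STATUS_ENUM:
        errors.append("status must be one of open/pending/resolved")
    if "priority" in patch and patch["priority"] not in PRIORITY_ENUM:
        errors.append("priority must be one of p0-p3")
    return errors
```

The point is less the specific checks than where they live: on the tool server, after parsing, before any side effect.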

Validation: do not trust the model even when it is “structured”

Even if you use strict structured outputs, your tool server should validate like it is speaking to the internet.

That means:

  • parse and validate arguments server-side
  • clamp sizes (strings, arrays, nested objects)
  • reject unknown fields
  • validate cross-field invariants
  • validate against current state (not just the payload)

Example: a transfer_funds tool that validates a JSON schema but does not validate “source account belongs to this user” is not safe. It is just neatly formatted.

OWASP’s “Insecure Output Handling” is basically this warning in a category label.1

Validation invariants that matter in practice

Schema validation catches shape errors. Production incidents often come from invariant errors:

  • cross-field rules: refund_amount must be <= captured_amount
  • state machine rules: you cannot go from resolved back to open without a reason
  • freshness rules: the update is based on stale state, so you need an ETag or version check
  • ownership rules: the target resource belongs to a different tenant

These checks belong on the tool server. If you push them into the prompt, you are turning policy into a suggestion.
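As a sketch of what invariant checks look like on the tool server, here is a refund example in Python. The field names (refund_amount_cents, expected_version, and so on) are hypothetical, and `current` stands in for state loaded from your database:

```python
def check_refund_invariants(request: dict, current: dict) -> list[str]:
    """Validate a refund request against current server-side state, not just its shape."""
    errors = []
    # Cross-field rule: you cannot refund more than was captured.
    if request["refund_amount_cents"] > current["captured_amount_cents"]:
        errors.append("refund exceeds captured amount")
    # Ownership rule: the target charge must belong to the caller's tenant.
    if request["tenant_id"] != current["tenant_id"]:
        errors.append("cross-tenant access denied")
    # Freshness rule: optimistic concurrency via a version (or ETag) check.
    if request["expected_version"] != current["version"]:
        errors.append("stale state: reload and retry")
    return errors
```

Schema validation would pass all of the bad cases this function catches, which is exactly why both layers exist.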

Authorization: the model should not hold the keys

The most dangerous pattern I see is a tool executor that runs as a broad service account and relies on prompt instructions for policy.

If you do that, prompt injection becomes privilege escalation. So does ordinary model confusion.

Instead:

  • issue scoped, per-user tokens for tool calls where possible
  • enforce RBAC and tenant boundaries at the API layer
  • keep “admin” operations on a separate path with stronger approvals
  • default tools to read-only, then explicitly grant write scope

This is the principle of least privilege. It is not new. Tool calling just makes it easy to forget because the model looks like “your code”.

Budgets and circuit breakers: put a ceiling on agency

Even with perfect schemas, a model can still be wrong in a costly way. A safe system has ceilings.

The ceilings I reach for first:

  • max tool calls per run (and per tool)
  • max wall-clock time per run
  • max spend per run (tokens plus tool fees)
  • rate limits per user and per tenant
  • a kill switch that turns write tools off without redeploying

These are not just cost controls. They are safety controls. They turn “agent spiraled” into “agent stopped”.
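A minimal sketch of those ceilings as a per-run budget object, in Python. The class name and the default numbers are illustrative, not recommendations:

```python
import time

class RunBudget:
    """Ceilings for a single agent run: max tool calls, wall-clock time, and spend."""

    def __init__(self, max_calls: int = 20, max_seconds: float = 120.0,
                 max_spend_cents: int = 500):
        self.max_calls = max_calls
        self.max_seconds = max_seconds
        self.max_spend_cents = max_spend_cents
        self.calls = 0
        self.spend_cents = 0
        self.started = time.monotonic()

    def charge(self, cost_cents: int = 0) -> None:
        """Call before every tool invocation; raises to stop the run at a ceiling."""
        self.calls += 1
        self.spend_cents += cost_cents
        if self.calls > self.max_calls:
            raise RuntimeError("budget exceeded: too many tool calls")
        if self.spend_cents > self.max_spend_cents:
            raise RuntimeError("budget exceeded: spend ceiling")
        if time.monotonic() - self.started > self.max_seconds:
            raise RuntimeError("budget exceeded: wall-clock ceiling")
```

The kill switch for write tools is separate: a feature flag the executor checks before any side-effecting call, so you can turn writes off without redeploying.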

Sandboxing: assume tools will be used in ways you did not intend

If any of your tools execute code, fetch URLs, open files, or touch customer data, you should assume they will be used in surprising ways. Sometimes that is adversarial. More often it is accidental, and the model is just creatively wrong.

My preference is to treat “powerful tools” like you would treat production cron: run them in constrained environments with a clear blast radius.

Examples of practical constraints:

  • Code execution tools: container sandbox, read-only filesystem by default, no network by default, strict time and memory limits.
  • Browser or fetch tools: domain allowlist, size limits, and a “download quarantine” step instead of direct ingestion.
  • Data tools: field-level redaction, row limits, and explicit query budgets.

You can still build impressive agents inside these constraints. You just stop trusting the model to enforce them.

Retries are normal. Make them safe with idempotency.

Agents retry. They retry because networks fail, providers rate limit, and long-running operations time out. If your tools are not designed for retries, you will eventually double-charge, double-email, double-delete, or double-create.

You can borrow a pattern from payments: idempotency keys.

Stripe’s write-up on idempotency is worth reading even if you never touch payments. It describes the fundamental distributed-systems problem: if you do not know whether a request succeeded, you will retry, and the server needs a way to treat retries as the same operation.4

For tool calling, the practical translation is:

  • Every side-effecting tool takes an idempotency_key.
  • The tool server stores the outcome keyed by that value.
  • Retries return the stored outcome instead of repeating the action.

This does not eliminate all complexity, but it turns “oops, we created five tickets” into “we returned the same ticket id five times”.
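Here is the dedupe pattern as an in-memory sketch in Python. A real tool server would store outcomes in a durable database with a TTL; the class name is illustrative:

```python
class IdempotentExecutor:
    """Store tool outcomes keyed by idempotency_key so retries replay the result."""

    def __init__(self) -> None:
        self._outcomes: dict[str, object] = {}

    def execute(self, idempotency_key: str, action):
        # A retry with the same key returns the stored outcome; the action never re-runs.
        if idempotency_key in self._outcomes:
            return self._outcomes[idempotency_key]
        result = action()
        self._outcomes[idempotency_key] = result
        return result
```

Note that this stores the outcome, not just a "seen" flag, so the retry gets the same ticket id back instead of an error.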

A retry policy table you can actually implement

| Tool class | Example | Safe to auto-retry? | Requirements |
| --- | --- | --- | --- |
| Read-only, deterministic | get_order_status | Yes | Cache, timeouts, exponential backoff |
| Read-only, expensive | search_logs | Yes, but bounded | Tight timeouts, pagination, budgets |
| Write with idempotency | create_refund | Yes | Idempotency key, audit log |
| Write without idempotency | send_email | No | Either make it idempotent or require a confirmation step |
| Irreversible destructive | delete_user | No | Two-phase commit, human confirmation, strong auth |

If you hear yourself saying “the model will probably not retry that”, assume the opposite.
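The table translates into a small retry wrapper. This sketch, in Python, hard-codes the retryable tool names for clarity; in a real system that classification would live in the tool registry, and the names here are just the examples from the table:

```python
import time

# Reads and idempotent writes from the table above; everything else gets one attempt.
RETRYABLE_TOOLS = {"get_order_status", "search_logs", "create_refund"}

def call_with_retries(tool_name: str, fn, max_attempts: int = 3,
                      base_delay: float = 0.1):
    """Bounded retries with exponential backoff; never auto-retry non-idempotent writes."""
    if tool_name not in RETRYABLE_TOOLS:
        # One attempt only: a failure surfaces to a human or a confirmation step.
        return fn()
    for attempt in range(max_attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

The important property is the default: a tool is non-retryable until someone proves it is safe, not the other way around.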

A useful idempotency key convention for agents

If you already have a run_id, a practical pattern is:

idempotency_key = run_id + ":" + tool_name + ":" + step_index

It is not cryptography. It is just a stable name for “this exact intended action”, which is what the server needs to deduplicate.

Two-phase commit for agents: plan, then execute

In finance, we learn early that irreversible actions should have an approval step. Tool calling needs the same shape.

The common implementation pattern is a two-phase tool:

  1. prepare_* returns a summary of intended effects and a short-lived confirmation_token.
  2. commit_* executes using that token.

This gives you a place to insert:

  • user confirmation in the UI
  • policy checks (rate limits, fraud signals, business rules)
  • a final sanity check (is the target still the same, is state still valid?)

It also makes incidents more recoverable. When the model confuses “cancel subscription” with “refund last invoice”, you want to catch that before it becomes a write.
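A minimal two-phase sketch in Python, using a refund as the example. The class and method names are illustrative; a real implementation would persist pending intents and re-validate target state inside commit:

```python
import secrets
import time

class TwoPhaseRefundTool:
    """prepare_* returns a summary plus a short-lived token; commit_* requires it."""

    TOKEN_TTL_SECONDS = 300

    def __init__(self) -> None:
        self._pending: dict[str, tuple[dict, float]] = {}

    def prepare_refund(self, charge_id: str, amount_cents: int) -> dict:
        token = secrets.token_urlsafe(16)
        intent = {"charge_id": charge_id, "amount_cents": amount_cents}
        self._pending[token] = (intent, time.monotonic())
        # The caller shows this summary to the user before committing.
        return {"summary": f"Refund {amount_cents} cents on {charge_id}",
                "confirmation_token": token}

    def commit_refund(self, confirmation_token: str) -> dict:
        # pop() makes the token single-use, which also blocks accidental replays.
        intent, created = self._pending.pop(confirmation_token, (None, 0.0))
        if intent is None or time.monotonic() - created > self.TOKEN_TTL_SECONDS:
            raise PermissionError("unknown or expired confirmation token")
        # Final sanity checks (target unchanged, state still valid) would run here.
        return {"refunded": True, **intent}
```

The token is the seam where UI confirmation, policy checks, and fraud signals all plug in.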

An approvals table that helps teams align quickly

Teams often fight about where the approval line should be, because the model feels like it “should be able to handle it”.

This is a simple default that keeps you out of trouble:

| Action type | Examples | Suggested default |
| --- | --- | --- |
| Reversible, low impact | add an internal comment, tag a ticket | auto-execute |
| Reversible, medium impact | assign ownership, change priority | auto-execute with tight auth and audit logs |
| Financial or customer-impacting | refunds, cancellations, outbound emails | two-phase commit with user confirmation |
| Destructive or irreversible | deletes, compliance exports, account bans | human approval plus stronger auth |

You can move the line over time. Start conservative and earn autonomy with data.

Tool results are data, not instructions

One of the most common self-inflicted wounds is piping raw tool output back into the model with full authority.

Tool output is untrusted input. Treat it like you would treat HTML from the internet:

  • limit its size
  • strip or transform markup
  • avoid letting it control tool selection
  • keep it in a “data-only” lane
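The first two rules can be sketched as a small sanitizer, in Python. The regex-based markup stripping and the size limit here are illustrative heuristics; they reduce risk, they do not eliminate it:

```python
import re

MAX_RESULT_CHARS = 4000  # illustrative ceiling; tune per tool

def sanitize_tool_result(text: str) -> str:
    """Clamp size and strip markup so tool output stays in a data-only lane."""
    text = text[:MAX_RESULT_CHARS]       # bound the size before anything else
    text = re.sub(r"<[^>]+>", "", text)  # strip HTML-like markup
    text = text.replace("\x00", "")      # drop control junk
    return text
```

The other two rules are architectural, not string-level: sanitized output still goes back to the model as a tool result, never as a system-authority message, and it never selects the next tool on its own.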

This is also where vendor features can help. Anthropic’s advanced tool use work includes controls like tool search and caller constraints, which are signs that tool ecosystems are moving toward explicit boundaries instead of “anything goes”.5

Prefer structured tool results over raw text

If you can choose, have tools return structured data and short summaries, not a giant blob of text.

Two reasons:

  1. It is easier to validate and redact.
  2. It is harder for “instruction-like” strings to sneak into the agent loop.

If you need to return long text, consider returning a handle and requiring a separate get_* tool with explicit size limits and pagination.

A note on tool libraries (what I actually reach for)

This post is language agnostic on purpose, but I will admit a bias here: for tool schemas, I prefer “types you can run”.

In TypeScript, that often means a runtime validator like Zod that you can use to validate tool arguments on the server. In Python, Pydantic plays a similar role. The specific library matters less than the habit: validate at the boundary, and keep the boundary tight.

Observability: treat tool calls like money movement

If you cannot answer “who did what, and why” after an incident, you will end up turning off the agent.

At minimum, log these fields for every tool call:

| Field | Why you need it |
| --- | --- |
| trace_id / run_id | tie tool calls to a single agent run |
| user_id / tenant_id | enforce and audit authorization boundaries |
| tool_name and version | schemas evolve, you need to know which contract applied |
| arguments (redacted) | debug and postmortem analysis |
| idempotency key | safe retries and deduplication |
| outcome | success, error class, partial |
| latency and provider errors | retry behavior and cost control |

This is the part that makes tooling feel like infrastructure work, because it is.
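As a sketch, here is one way to build that record in Python. The field names mirror the table; the redaction list is a stand-in for whatever your data classification actually requires:

```python
import time
import uuid

REDACTED_ARG_NAMES = {"email", "card_number"}  # illustrative; use your real PII list

def tool_call_log_record(*, run_id: str, user_id: str, tenant_id: str,
                         tool_name: str, tool_version: str, arguments: dict,
                         idempotency_key: str, outcome: str, latency_ms: int) -> dict:
    """Build the minimal audit record for one tool call."""
    redacted = {k: ("<redacted>" if k in REDACTED_ARG_NAMES else v)
                for k, v in arguments.items()}
    return {
        "trace_id": str(uuid.uuid4()),
        "run_id": run_id,
        "user_id": user_id,
        "tenant_id": tenant_id,
        "tool_name": tool_name,
        "tool_version": tool_version,
        "arguments": redacted,
        "idempotency_key": idempotency_key,
        "outcome": outcome,
        "latency_ms": latency_ms,
        "ts": time.time(),
    }
```

Emit this on every tool call, success or failure, and the postmortem writes itself.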

The uncomfortable truth: you cannot fully patch prompt injection in the model

I want to be careful here: we should absolutely keep improving model-level defenses. There are promising research directions like training models to resist injected instructions.6

But you cannot wait for a perfect model before shipping safe tools. You have to assume:

  • the model will misunderstand
  • the model will be tricked
  • the model will occasionally do the wrong thing with a valid-looking request

So you design the system to minimize harm when the model is wrong. That is what “safe tooling” means in practice.

Opinion: safe tool calling is a tax, and it is worth paying

Tool calling makes demos impressive, and it makes incidents interesting. The safe, boring middle is where you want to live.

If you treat tools as public APIs, keep the model inside tight contracts, and rely on server-side validation and authorization, you can ship tool-using agents without holding your breath. You will still have failures, but you will have failures you can explain, debug, and recover from.

And if you are building in a litigious or regulated environment, or you want enterprise customers, this is not optional. It is the price of turning language into actions.

Footnotes

  1. https://owasp.org/www-project-top-10-for-large-language-model-applications/

  2. https://platform.claude.com/docs/en/docs/agents-and-tools/tool-use/implement-tool-use

  3. https://openai.com/index/introducing-structured-outputs-in-the-api/

  4. https://stripe.com/blog/idempotency

  5. https://www.anthropic.com/engineering/advanced-tool-use

  6. https://arxiv.org/abs/2410.05451