An AI agent sandbox is the runtime boundary around tool execution. What process, identity, and capability isolation actually require in production agents.

When teams say they want an AI agent sandbox, they usually mean one of two different things. Some mean a safe scratch environment where a model can experiment without touching production. Others mean a hardened runtime boundary that holds every tool call to a defined set of permissions, even when the agent is in production and acting on real customer data. The two are not the same, and conflating them is how agent projects either ship something unsafe or fail to ship at all.
Pontil's view is straightforward: the useful definition of an agent sandbox is the second one. A sandbox is the runtime boundary around tool execution. It's the thing that decides what an agent can reach, as whom, and with what blast radius — and it has to hold in production, not just in a notebook.
This piece works through five things. What an agent sandbox actually is once you strip away the marketing. The three layers of isolation that matter (process, identity, capability). Where current approaches fall short. What agent runtime safety looks like as a set of concrete properties. And what to evaluate when you're picking or building one.
The term "sandbox" comes from two older traditions. In security, a sandbox is a constrained execution environment — think browser tabs, mobile app permissions, or seccomp filters on Linux. In software development, a sandbox is a non-production environment where you can break things without consequence. Both meanings have leaked into the agent conversation, and the leakage is causing problems.
When a foundation model provider talks about a "code interpreter sandbox," they mean a constrained execution environment — a container or VM the model can run Python in, isolated from the host. When a SaaS company talks about "sandboxing an agent against our dev tenant," they usually mean a separate dataset the agent can poke at. When a security team talks about "sandboxing agent tools," they mean something closer to the original security definition: a runtime boundary that holds across every tool call, in production, against real user data.
For the rest of this article, sandbox means the third thing. Not a dev environment. Not a Python container. The production runtime boundary that constrains what a tool call can do, on whose behalf, and with what side effects.
That boundary has to answer three questions on every call: who is this agent acting as, what is it allowed to reach, and what happens if it tries to reach something else. If any of those answers is "we trust the model to behave," there is no sandbox. There is a hope.
Agent execution isolation isn't one thing. It's three layers that have to hold simultaneously, and most discussions of sandboxing only address one of them.
Process isolation is the oldest and most familiar layer. The tool call runs in a container, a VM, a serverless function, or some other unit that can't read the memory of the calling process or the host. If the tool generates code, the code runs somewhere that can't reach the orchestrator's secrets or the foundation model provider's API keys. Process isolation prevents a compromised or hallucinating agent from exfiltrating credentials sideways.
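As a rough illustration of one small piece of this layer, the sketch below runs generated code in a child Python process that receives only an allowlisted environment, so secrets held in the orchestrator's environment are not inherited. The names `run_generated_code`, `SAFE_ENV_KEYS`, and `ORCHESTRATOR_API_KEY` are hypothetical. Environment scrubbing alone is not a sandbox: it does not restrict filesystem or network access, which is what a container or VM is for.

```python
import os
import subprocess
import sys

# Assumption: the only variables the child needs in order to start.
SAFE_ENV_KEYS = ("PATH", "LD_LIBRARY_PATH", "SYSTEMROOT")


def run_generated_code(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run untrusted code in a child process with an allowlisted environment.

    This demonstrates environment scrubbing only. It does NOT limit file or
    network access; a container or VM is still required for that.
    """
    child_env = {k: os.environ[k] for k in SAFE_ENV_KEYS if k in os.environ}
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* variables
        env=child_env,          # nothing outside the allowlist is inherited
        capture_output=True,
        text=True,
        timeout=timeout_s,      # a hung tool raises subprocess.TimeoutExpired
    )


# Hypothetical secret living in the orchestrator's environment.
os.environ["ORCHESTRATOR_API_KEY"] = "sk-example"

probe = "import os; print(os.environ.get('ORCHESTRATOR_API_KEY'))"
result = run_generated_code(probe)
print(result.stdout.strip())  # the child cannot see the parent's secret
```

The allowlist is the design choice worth noting: the child gets what it is explicitly given rather than everything minus a blocklist.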
This is the layer most "agent sandbox" products from foundation model providers solve. Anthropic's code execution tool runs in a sandboxed execution environment, as does OpenAI's code interpreter. Useful, well-understood, and not sufficient on its own.
Identity isolation is the layer most projects get wrong. When an agent calls a tool (say, "update the deal stage in the CRM"), whose identity is that call made under? If it's a shared service account with broad permissions, every user of the agent effectively has the union of every other user's permissions. The CRM has no way to know which human caused the change. The audit log is useless. The blast radius of a prompt injection is the entire dataset.
Identity isolation means tool calls execute as the authenticated end user, with that user's real permissions. If the user can't see deal X in the UI, the agent can't read or modify deal X via a tool call. The identity propagates from the human, through the agent, into the tool runtime, into the downstream API.
This is hard. It requires real OAuth flows or equivalent delegated-access mechanics, per-user token storage, token refresh handling, and the discipline to never fall back to a service account when the user token expires. We've written more on the mechanics in OAuth for AI agents: a practical setup guide for delegated access.
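A minimal sketch of the "never fall back" discipline, assuming a hypothetical in-memory `TokenStore` and a `refresh` callback that stands in for an OAuth refresh-token exchange. None of these names come from a real library. The point is structural: the store has no service-account code path, so an expired token that cannot be refreshed can only produce an error.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class IdentityError(Exception):
    """Raised when no valid token exists for the end user."""


@dataclass
class UserToken:
    user_id: str
    access_token: str
    expires_at: float  # Unix timestamp


class TokenStore:
    """Hypothetical per-user token store with no service-account path."""

    def __init__(self, refresh: Callable[[str], Optional[UserToken]]):
        self._tokens: Dict[str, UserToken] = {}
        self._refresh = refresh  # stands in for an OAuth refresh-token exchange

    def put(self, token: UserToken) -> None:
        self._tokens[token.user_id] = token

    def token_for(self, user_id: str, now: Optional[float] = None) -> UserToken:
        now = time.time() if now is None else now
        token = self._tokens.get(user_id)
        if token is not None and token.expires_at > now:
            return token
        # Only users who authenticated at some point are eligible for a refresh.
        refreshed = self._refresh(user_id) if token is not None else None
        if refreshed is None or refreshed.expires_at <= now:
            # Fail instead of quietly switching to a broader credential.
            raise IdentityError(f"no valid token for user {user_id!r}")
        self._tokens[user_id] = refreshed
        return refreshed


def auth_headers(store: TokenStore, user_id: str, now: Optional[float] = None) -> Dict[str, str]:
    """Build the downstream Authorization header from the user's own token."""
    token = store.token_for(user_id, now)
    return {"Authorization": f"Bearer {token.access_token}"}


store = TokenStore(refresh=lambda user_id: None)  # refresh always fails in this demo
store.put(UserToken("alice", "tok-alice", expires_at=1_000.0))
headers = auth_headers(store, "alice", now=500.0)  # valid: uses alice's token
```

Calling `auth_headers(store, "alice", now=2_000.0)` raises `IdentityError`, because the token has expired and the refresh fails.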
Capability isolation is the layer most projects don't even attempt. Even with the right user identity, an agent shouldn't have access to every tool the user theoretically could call. A customer support agent doesn't need write access to billing. A research agent doesn't need to send emails on the user's behalf. Capability isolation means the set of tools available to a given agent is explicitly scoped (by role, by task, by context) and enforced at the runtime, not at the prompt.
"Enforced at the prompt" is the failure mode here. Telling the model "do not call the delete_customer tool" is not isolation. It's a suggestion. A jailbreak, a prompt injection, or a plain model error will route around it. The tool simply has to not be in the call surface the runtime exposes for that agent's session.
All three layers have to hold. A sandbox with process isolation but no identity isolation lets a compromised prompt drain the database. Identity isolation without capability scoping lets a confused agent send the deal-close email when it was supposed to look up the deal stage. Capability scoping without process isolation lets a generated-code tool break out and read the orchestrator's environment variables. The combination is the sandbox. Any one alone is a partial answer.
The honest picture: most things shipping under the "AI agent sandbox" banner today address one layer well, gesture at a second, and ignore the third.
Foundation model sandboxes — the code-interpreter style — are excellent at what they do. They keep generated code from breaking the host. They were not designed to be the boundary around "agent calls CRM, ticketing, billing, and analytics on behalf of a real user." The auth model is wrong for that. The API key calling out belongs to the developer, not the user.
Agent frameworks like LangChain and the OpenAI Agents SDK ship with tool registries and some hooks for scoping. The registry is a real improvement over "the model can call anything in the prompt." But the identity story is usually "the framework holds the API key," which collapses identity isolation. And the capability scoping is at the registration level, not enforced per-session against the authenticated user's actual permissions.
Browser automation — driving the product through its UI rather than its API — gets identity right by accident, because the agent logs in as the user. It gets process isolation right per session. But there's no capability boundary at all (the agent can do whatever the UI exposes, including dangerous combinations), and it's silently fragile when the UI changes. We've covered this trade-off in more depth in browser automation vs API-native agent tooling.
The gap nobody fills cleanly is the production runtime for SaaS tool calls — where the agent has to act as a specific authenticated user, with that user's permissions, against a scoped set of tools, with audit, rate limiting, and observability that holds up to a security review.
If the sandbox is the production runtime boundary, then agent runtime safety is the set of properties that boundary has to deliver. Five of them matter most.
Execute as the authenticated user, every time. Every tool call carries a user identity that originated with a real human authentication event. The downstream API sees that user's token, that user's permissions, that user's audit trail. There is no service-account fallback. If the token expires and can't be refreshed, the call fails — it doesn't quietly elevate.
Expose only the tools scoped to this agent session. The set of callable tools is determined at session start by the agent's purpose and the user's role. Tools outside that set are not in the model's call surface. Not denied at runtime — not present at all. This is the difference between a closed door and a missing door.
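One way to picture the "missing door" is a session object that builds its tool table once, from the intersection of what the agent's purpose needs and what the user's role permits. This is a sketch under assumed names: `TOOLS`, `TOOLS_BY_PURPOSE`, `TOOLS_BY_ROLE`, and `Session` are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical tool implementations and scoping tables; all names are illustrative.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_deal": lambda deal_id: f"deal {deal_id}: stage=negotiation",
    "send_email": lambda to: f"email sent to {to}",
    "delete_customer": lambda cid: f"deleted {cid}",
}
TOOLS_BY_PURPOSE: Dict[str, FrozenSet[str]] = {
    "deal_research": frozenset({"lookup_deal", "send_email"}),
}
TOOLS_BY_ROLE: Dict[str, FrozenSet[str]] = {
    "sales_rep": frozenset({"lookup_deal", "delete_customer"}),
}


class ToolNotExposed(Exception):
    """The requested tool is not part of this session's call surface."""


class Session:
    def __init__(self, purpose: str, role: str):
        allowed = TOOLS_BY_PURPOSE[purpose] & TOOLS_BY_ROLE[role]
        # Fixed at session start; tools outside the intersection are never copied in.
        self._tools = {name: TOOLS[name] for name in allowed}

    def tool_names(self) -> List[str]:
        """The only tool names that would be serialized into the model's tool list."""
        return sorted(self._tools)

    def call(self, name: str, arg: str) -> str:
        if name not in self._tools:
            # Reached only if the model invents a name it was never shown.
            raise ToolNotExposed(name)
        return self._tools[name](arg)


session = Session(purpose="deal_research", role="sales_rep")
print(session.tool_names())  # ['lookup_deal']
```

Here `send_email` is dropped because the role does not permit it, and `delete_customer` is dropped because the purpose does not need it. Neither appears in `tool_names()`, so the model never sees a definition for them.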
Make every call observable and attributable. Each tool call produces a structured log entry: who, what, when, with what inputs, with what result. Not just for debugging — for the security review that will eventually ask "who modified this record on March 14." Without per-user attribution, the answer is "the agent did," which is not an answer.
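As an illustration, a structured entry could be as simple as one JSON line per call. The field names and the `audit_entry` helper are assumptions made for this sketch, not a prescribed schema, and the date is an arbitrary example.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def audit_entry(user_id: str, tool: str, inputs: Dict[str, Any],
                outcome: str, at: datetime) -> str:
    """Serialize one tool call as a JSON line: who, what, when, inputs, result."""
    record = {
        "user_id": user_id,                              # who
        "tool": tool,                                    # what
        "at": at.astimezone(timezone.utc).isoformat(),   # when, normalized to UTC
        "inputs": inputs,
        "outcome": outcome,
    }
    return json.dumps(record, sort_keys=True)


line = audit_entry(
    user_id="alice",
    tool="update_deal_stage",
    inputs={"deal_id": "D-17", "stage": "closed_won"},
    outcome="ok",
    at=datetime(2025, 3, 14, 9, 30, tzinfo=timezone.utc),
)
print(line)
```

Because every entry carries `user_id`, "who modified this record" becomes a filter over log lines rather than a forensic exercise.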
Hold rate and resource limits at the runtime, not the prompt. A runaway loop should hit a wall at the boundary, not a request to please be reasonable. Per-user, per-tool, per-session limits with deterministic enforcement.
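A sliding-window limiter keyed by user and tool is one simple way to get deterministic enforcement. The sketch below is illustrative: `SlidingWindowLimiter` is a hypothetical name, it is in-memory and single-process, and the clock is passed in so the behavior is reproducible.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class RateLimitExceeded(Exception):
    pass


class SlidingWindowLimiter:
    """Allow at most `max_calls` per `window_s` seconds for each (user, tool) pair."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        self._calls: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def check(self, user_id: str, tool: str, now: float) -> None:
        calls = self._calls[(user_id, tool)]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.max_calls:
            raise RateLimitExceeded(f"{user_id} exceeded {self.max_calls} calls to {tool}")
        calls.append(now)


limiter = SlidingWindowLimiter(max_calls=2, window_s=60.0)
limiter.check("alice", "lookup_deal", now=0.0)
limiter.check("alice", "lookup_deal", now=1.0)
# A third call by alice inside the window would raise RateLimitExceeded,
# while bob's budget is untouched:
limiter.check("bob", "lookup_deal", now=2.0)
```

Keying on the `(user, tool)` pair means one user's runaway loop exhausts only that user's budget for that tool.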
Fail closed and fail loud. When the runtime can't determine identity, can't refresh a token, or can't reach a downstream system, the call fails with a structured error the agent can reason about. Silent fallbacks to lower-privilege accounts, cached responses presented as fresh, or partial results without error signals are how trust dies in production.
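One possible shape for "a structured error the agent can reason about" is a result type with a machine-readable code and a retry hint. The `ToolResult` type, the error codes, and the mapping from exception types to codes are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    data: Optional[Dict[str, Any]] = None
    error_code: Optional[str] = None  # machine-readable, e.g. "downstream_unreachable"
    retryable: bool = False


def guarded_call(fetch: Callable[[], Dict[str, Any]]) -> ToolResult:
    """Run a downstream call and turn known failures into explicit errors.

    There is no cache and no alternate credential here, so a failure can
    never be papered over with stale data or a different identity.
    Unexpected exception types propagate, which is also loud.
    """
    try:
        return ToolResult(ok=True, data=fetch())
    except PermissionError:
        return ToolResult(ok=False, error_code="identity_unavailable", retryable=False)
    except (ConnectionError, TimeoutError):
        return ToolResult(ok=False, error_code="downstream_unreachable", retryable=True)


def unreachable_crm() -> Dict[str, Any]:
    # Hypothetical downstream call that cannot reach the CRM.
    raise ConnectionError("CRM did not respond")


result = guarded_call(unreachable_crm)
print(result.error_code, result.retryable)
```

The agent receives `ok=False` with a code it can branch on, instead of an empty or cached payload that looks like success.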
These aren't novel. Each one has analogues in mature backend systems. The work is bringing all five together in a runtime that an agent calls, consistently, across every tool, across every product surface. For deeper background on why this is the layer that breaks production agents, see why AI agents fail in production.
When you're picking a sandbox approach — buying one, building one, or assessing what your current stack actually does — these are the questions worth asking. Concretely. With code paths, not slide decks.
Does a tool call propagate the end user's identity all the way to the downstream API, or does it terminate at a service account somewhere in the middle? Walk one trace. If the CRM's audit log shows "api-bot-prod" as the actor, the answer is no.
When a tool isn't in the agent's scoped capability set, is it absent from the model's call surface, or is it present-but-denied? If the model can see the tool definition and only fails on invocation, you're relying on the model to make good choices. That's not isolation.
What happens when a user's access is revoked mid-session? Does the next tool call fail closed within seconds, or does it succeed because the runtime cached the token? Token revocation propagation is a useful test of how seriously identity is taken.
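The revocation question can be turned into a small test. In the sketch below, a shared set stands in for a live check against the identity provider (in a real system that might be an OAuth token introspection call). `RevocationAwareRuntime` and the tool name are hypothetical.

```python
from typing import Set


class AccessRevoked(Exception):
    pass


class RevocationAwareRuntime:
    """Hypothetical runtime that consults revocation state on every call."""

    def __init__(self, revoked_users: Set[str]):
        # Shared reference, not a copy: stands in for a live identity-provider lookup.
        self._revoked = revoked_users

    def call_tool(self, user_id: str, tool: str) -> str:
        if user_id in self._revoked:  # checked per call, not cached at session start
            raise AccessRevoked(user_id)
        return f"{tool} executed as {user_id}"


revoked: Set[str] = set()
runtime = RevocationAwareRuntime(revoked)

first = runtime.call_tool("alice", "lookup_deal")  # succeeds
revoked.add("alice")                               # access revoked mid-session
try:
    runtime.call_tool("alice", "lookup_deal")
    second = "succeeded"
except AccessRevoked:
    second = "failed closed"
print(first, "|", second)
```

A runtime that snapshots permissions at session start would return "succeeded" for the second call, which is exactly the behavior this test is meant to expose.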
Can you produce a per-user audit trail for tool calls — by user, by date range, by tool, by outcome — without writing custom code? If the answer involves grepping orchestrator logs, the audit layer isn't really there.
When a downstream API changes its contract, how does the sandbox notice and what does it do? Silent breakage is the worst failure mode. The sandbox should be aware of the contract its tools depend on. We've written about the mechanics of catching that drift in how to detect API breaking changes.
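As a simplified picture of contract awareness, a tool can declare the response fields it depends on and check each response against that declaration. This is a sketch only: `DEAL_CONTRACT` and `contract_violations` are hypothetical, and real drift detection would more likely compare versions of an API schema than inspect individual responses.

```python
from typing import Any, Dict, List

# Hypothetical contract: fields a tool expects in a downstream response, with types.
DEAL_CONTRACT: Dict[str, type] = {"id": str, "stage": str, "amount": float}


def contract_violations(response: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return a human-readable list of ways the response departs from the contract."""
    problems: List[str] = []
    for field, expected in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected):
            actual = type(response[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    return problems


ok_response = {"id": "D-17", "stage": "negotiation", "amount": 1200.0}
drifted_response = {"id": "D-17", "amount": "1200"}  # stage removed, amount now a string

print(contract_violations(ok_response, DEAL_CONTRACT))
print(contract_violations(drifted_response, DEAL_CONTRACT))
```

A non-empty result is a signal to fail the call and raise an alert rather than pass a malformed payload on to the agent.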
Does the runtime enforce per-user rate limits, or only global ones? Global limits let one user's runaway loop deny service to everyone else.
None of these questions are exotic. They're the questions your security team will ask before they let the agent touch production. Better to answer them while you're picking the runtime than during the review.
Pontil is a Tools-as-a-Service platform, and the runtime is the part of the platform this article is really about. We generate tools from a SaaS product's existing codebase, and then we run those tools on behalf of the agent — as the authenticated user, against a scoped capability set, with per-call audit and observability.
The runtime is the sandbox. Tool calls execute under the user's real identity, never a shared service account. The set of callable tools is determined per session, not negotiated with the model in a prompt. Rate limits, failure handling, and audit are properties of the runtime, not of whatever framework happens to be orchestrating. That's what we mean when we say sandboxing agent tools should be a runtime concern, not a prompt concern.
If you're working through what an agent runtime boundary should look like for your own product, the product page covers the three components in more detail.
The deeper shift here is that the sandbox stops being a development-time convenience and becomes a production-time contract. The contract reads something like: this agent can act as this user, on this set of tools, with this blast radius, with this audit trail, with these failure modes. The model is downstream of the contract, not upstream of it.
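To make the idea of a contract concrete, it could be written down as an immutable value created before the model sees anything. The `SessionContract` type and its fields are hypothetical and only mirror the clauses listed above.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)  # frozen: the contract cannot be changed after session start
class SessionContract:
    user_id: str                     # this agent can act as this user
    tools: FrozenSet[str]            # on this set of tools
    max_calls_per_minute: int        # with this blast radius
    audit_sink: str                  # with this audit trail
    on_identity_failure: str = "fail_closed"  # with these failure modes

    def permits(self, user_id: str, tool: str) -> bool:
        return user_id == self.user_id and tool in self.tools


contract = SessionContract(
    user_id="alice",
    tools=frozenset({"lookup_deal"}),
    max_calls_per_minute=30,
    audit_sink="audit-log",  # hypothetical destination name
)
print(contract.permits("alice", "lookup_deal"))      # True
print(contract.permits("alice", "delete_customer"))  # False
```

Nothing the model produces can alter a frozen contract, which is the sense in which the model sits downstream of it.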
That's a different posture from the current default, which treats the agent as the trusted actor and the sandbox as a polite request for restraint. The polite-request model worked while agents were demos. It doesn't work when agents are calling tools that move money, change records, send emails, and touch customer data. At that point the sandbox is the thing standing between a prompt injection and a regulator.
The teams that ship production agents in 2026 will be the ones who treat the runtime boundary as a first-class part of the system — designed, tested, and reviewed with the same seriousness as the model choice and the orchestration layer. The sandbox is not the boring infrastructure underneath the interesting AI work. For SaaS companies building agents on their own products, it's the part that decides whether the project gets to production at all.