Agent infrastructure
Agents in production
Agentic workflows look simple on a whiteboard and stall in production. A deep-dive on patterns, architecture, and why the tools layer is the real bottleneck.

Agentic workflows are the part of the AI conversation where the demo stops being a demo. A model picks a goal, decides what to do next, calls a tool, reads the result, and adjusts. That loop is straightforward to draw on a whiteboard and brutal to run in production at a real B2B SaaS company.
This piece is for the engineering and AI leaders whose pilot worked and whose rollout didn't. Pontil's view: agentic workflows aren't blocked by reasoning quality or orchestrator choice. They're blocked at the tools layer — the surface where the agent has to reach into a product and actually do work. Below: how to define agentic workflows precisely, the patterns that show up across real deployments, the architecture that has to sit underneath them, where they fail, and how to tell whether yours is genuinely production-grade.
An agentic workflow is a process where a model decides the next action, takes it, observes the outcome, and adjusts — rather than following a fixed script. The decision loop is the defining feature. If every step is pre-wired, you have automation. If the model can branch, retry, escalate, or pick a different tool based on what it just learned, you have an agent.
The loop has four moving parts: a goal, a set of tools the agent can call, a memory of what's happened so far, and a model that picks the next move. Strip any one of those and the workflow stops being agentic. A chatbot with no tools is a chatbot. A pipeline with tools but no model-driven branching is a Zapier flow. We covered the strict definition in "agentic, defined"; the short version is that decide, act, and adjust all have to be present.
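To make the four parts concrete, here is a minimal sketch of the decide, act, adjust loop. Everything in it is illustrative: the `Agent` class, the `scripted_model` function standing in for a real model call, and the `lookup_account` tool are assumptions made for the example, not part of any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A decision is (tool_name, argument); the name "done" ends the loop.
Decision = Tuple[str, str]


@dataclass
class Agent:
    goal: str                                        # 1. the goal
    tools: Dict[str, Callable[[str], str]]           # 2. the tools it can call
    decide: Callable[[str, List[str]], Decision]     # 4. stand-in for the model
    memory: List[str] = field(default_factory=list)  # 3. what has happened so far
    max_steps: int = 10                              # hard cap so the loop terminates

    def run(self) -> List[str]:
        for _ in range(self.max_steps):
            tool_name, arg = self.decide(self.goal, self.memory)  # decide
            if tool_name == "done":
                break
            tool = self.tools.get(tool_name)
            if tool is None:
                # No such tool: report it instead of improvising.
                observation = f"error: unknown tool {tool_name!r}"
            else:
                observation = tool(arg)                           # act
            # Adjust: the next call to decide() sees this observation.
            self.memory.append(f"{tool_name}({arg}) -> {observation}")
        return self.memory


def scripted_model(goal: str, memory: List[str]) -> Decision:
    """Toy policy: look the account up once, then stop."""
    if not memory:
        return ("lookup_account", "acme")
    return ("done", "")


agent = Agent(
    goal="find the plan for acme",
    tools={"lookup_account": lambda name: f"{name}: enterprise plan"},
    decide=scripted_model,
)
history = agent.run()
```

Removing `decide` and hard-coding the sequence of tool calls would turn this into plain automation, which is the distinction drawn above.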
This matters for scope. Most teams calling their project an agentic workflow are really building one of three things: a single-turn assistant with tool calls, a multi-step automation with an LLM in the middle, or a genuine agent with a real planning loop. The three need very different infrastructure. Conflating them is how budgets get set against the wrong problem.
The useful examples aren't the consumer demos. They're the workflows established SaaS companies are quietly trying to ship inside their own products. A few that recur across the teams we've spoken with:
What unites the four: each one needs the agent to act on the company's own product, not just answer questions about it. That's where the access problem starts. The agent has a model good enough to plan the work. It doesn't have a way to reach the surface it needs to operate on. The structural reason agent projects stall is the same in all four cases.
Under the marketing language, most agentic workflows fit one of four patterns. Naming them helps with design decisions because each pattern stresses different parts of the stack.
Most production deployments are router + specialists or plan-then-execute. Multi-agent collaboration is overused in demos and underused in production for a simple reason: every additional agent multiplies the tools surface area, and the tools layer is already the weak link. Adding agents before fixing tools is solving the wrong problem first — a pattern we've written about in the orchestrator obsession.
The trade-off no one talks about: the more sophisticated the pattern, the more it depends on the tools layer being boring. A plan-then-execute agent that calls fifteen tools across a plan needs all fifteen to behave predictably, return structured errors, respect rate limits, and execute under the right identity. Reasoning sophistication does not rescue a flaky tools layer. It exposes it.
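One way to picture a "boring" tool is a uniform result type, so the orchestrator never has to parse free-text failures. The sketch below is a hypothetical shape, not a standard: `ToolResult`, `ErrorKind`, `RateLimited`, and the `get_invoice` tool are all names invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class ErrorKind(Enum):
    RATE_LIMITED = "rate_limited"
    FORBIDDEN = "forbidden"
    INVALID_INPUT = "invalid_input"
    UPSTREAM = "upstream"


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    data: Any = None
    error: Optional[ErrorKind] = None
    retry_after_s: Optional[float] = None  # set only for RATE_LIMITED


class RateLimited(Exception):
    """Hypothetical exception a tool raises when it is throttled."""

    def __init__(self, retry_after_s: float):
        super().__init__(f"retry after {retry_after_s}s")
        self.retry_after_s = retry_after_s


def call_tool(fn: Callable[..., Any], **kwargs: Any) -> ToolResult:
    """Run a tool and convert any exception into a structured result."""
    try:
        return ToolResult(ok=True, data=fn(**kwargs))
    except RateLimited as exc:
        return ToolResult(ok=False, error=ErrorKind.RATE_LIMITED,
                          retry_after_s=exc.retry_after_s)
    except PermissionError:
        return ToolResult(ok=False, error=ErrorKind.FORBIDDEN)
    except (TypeError, ValueError):
        # Wrong or malformed arguments supplied by the agent.
        return ToolResult(ok=False, error=ErrorKind.INVALID_INPUT)
    except Exception:
        return ToolResult(ok=False, error=ErrorKind.UPSTREAM)


def get_invoice(invoice_id: str) -> dict:
    """Toy tool used only to exercise call_tool."""
    if not invoice_id.startswith("inv_"):
        raise ValueError("invoice ids start with 'inv_'")
    return {"id": invoice_id, "status": "paid"}


def throttled() -> dict:
    raise RateLimited(retry_after_s=2.0)
```

With a shape like this, a plan that spans fifteen tools can branch on `error` and `retry_after_s` the same way for every one of them.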
A production agentic workflow needs four layers working together. Get any one wrong and the whole thing fails in ways that look like model problems but aren't.
The model layer is the foundation model (Claude, GPT, Gemini) doing the reasoning. Providers like Anthropic and OpenAI have made this layer commodity-good for most production workflows. Increasingly, it is not where projects fail.
The orchestration layer is the state machine that runs the loop: it holds conversation history, passes tool results back to the model, manages retries, and enforces step limits. Frameworks like LangChain/LangGraph and CrewAI live here. They are useful, well-trodden, and rarely the blocker.
The tools layer is the surface the agent actually invokes: the connectors, the runtime that executes them, and the auth lifecycle that keeps them safe. This is where most established-SaaS agent projects stall. Your APIs were built for the UI, not for an agent, and most of what your product can do isn't exposed. The model can plan a five-step workflow; the tools layer can only deliver step one.
The identity layer answers one question: who is the agent acting as? Most pilots cheat here by giving the agent a service account. In production, that's a non-starter: permissions blow up, audit trails are wrong, and data leakage is one prompt injection away. Tool calls have to execute as the authenticated user, with the user's actual permissions. This is the layer that gets retrofitted painfully if it isn't designed in.
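As an illustration of the retry part of that job, here is a small sketch of bounded retries with exponential backoff. It is not taken from LangGraph or CrewAI; `with_retries` and `TransientError` are hypothetical names, and the delays are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay_s * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff schedule can be checked in a test without actually waiting.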
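A minimal sketch of what "execute as the authenticated user" can mean in code: every call is checked against the calling user's own permissions, and every attempt is recorded under that user's identity. The `User` type, the `REQUIRED_PERMISSION` table, and the permission strings are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class User:
    user_id: str
    permissions: FrozenSet[str]


# Hypothetical mapping from tool name to the permission it requires.
REQUIRED_PERMISSION: Dict[str, str] = {
    "read_invoice": "invoices:read",
    "refund_invoice": "invoices:refund",
}

# Every attempt is logged with the real actor, whether allowed or denied.
AUDIT_LOG: List[dict] = []


def execute_as(user: User, tool_name: str) -> str:
    """Run a tool under the user's identity, enforcing the user's permissions."""
    needed = REQUIRED_PERMISSION[tool_name]
    allowed = needed in user.permissions
    AUDIT_LOG.append({"actor": user.user_id, "tool": tool_name, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{user.user_id} lacks {needed}")
    return f"{tool_name} ran as {user.user_id}"
```

A shared service account would pass every check and show up as the same actor in every audit entry, which is the problem described above.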
For a deeper read on how the layers connect, see the agent stack map for platform teams.
Three failure modes account for most stalled projects. None of them are model failures.
Tool coverage gaps. The agent can reason about an action it has no tool for. So it improvises — fabricates an API call, tries a workaround, or gives up. The fix isn't a better prompt; it's more tools. And generating more tools by hand, product by product, is its own trap.
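A coverage gap can be measured before the agent ever runs. The sketch below assumes you can list the actions a workflow needs and the tools that are registered; `coverage_report` and the action names are hypothetical.

```python
from typing import Iterable, List, Tuple


def coverage_report(
    required_actions: Iterable[str],
    registered_tools: Iterable[str],
) -> Tuple[float, List[str]]:
    """Return (fraction of required actions covered, sorted missing actions)."""
    required = set(required_actions)
    missing = sorted(required - set(registered_tools))
    if not required:
        return 1.0, []
    return (len(required) - len(missing)) / len(required), missing


# Illustrative workflow: five actions needed, four of them available as tools.
required = ["find_customer", "read_invoice", "issue_refund",
            "send_email", "update_ticket"]
registered = ["find_customer", "read_invoice", "send_email",
              "update_ticket", "export_report"]
ratio, missing = coverage_report(required, registered)
```

Here `ratio` is 0.8 and `missing` is `["issue_refund"]`: the one action the agent would otherwise have to improvise.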
Silent tool drift. A tool worked yesterday; today the underlying API changed and the tool definition didn't. The agent calls it, gets a wrong result, and confidently proceeds. There's no test in the loop. This is the failure mode that kills browser automation in production — UI selectors drift, no contract catches it, and the agent fails silently.
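Drift of this kind can be caught by a contract check that compares what the tool definition declares with what the live API currently accepts. The sketch below uses plain dictionaries as a stand-in for both sides; in practice the API side might come from an OpenAPI document. `Param`, `detect_drift`, and the parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Param:
    type: str
    required: bool = False


def detect_drift(tool: Dict[str, Param], api: Dict[str, Param]) -> List[str]:
    """List the ways the live API no longer matches the tool definition."""
    problems: List[str] = []
    for name, declared in tool.items():
        live = api.get(name)
        if live is None:
            problems.append(f"{name}: removed from API")
        elif live.type != declared.type:
            problems.append(f"{name}: type {declared.type} -> {live.type}")
    for name, live in api.items():
        # New optional parameters are harmless; new required ones break calls.
        if name not in tool and live.required:
            problems.append(f"{name}: new required parameter")
    return problems


tool_definition = {
    "customer_id": Param("string", required=True),
    "amount": Param("integer", required=True),
}
live_api = {
    "customer_id": Param("string", required=True),
    "amount": Param("number", required=True),
    "currency": Param("string", required=True),
}
drift = detect_drift(tool_definition, live_api)
```

Run on every product release, a check like this puts a test in the loop, so the mismatch surfaces in a pipeline rather than in front of a customer.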
Identity collapse under load. Workflows that worked for one user start leaking data when ten users hit the same agent. Service-account auth doesn't scale. Permission boundaries blur. The compliance team finds it before the engineering team does.
One external pattern worth naming: Gartner has projected that over 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. The middle reason is a strategy problem; the first and third are architectural. Inadequate risk controls sit at the tools and identity layers — not the model layer — and escalating costs are usually downstream of bespoke tool work that has to be redone every time the product changes. Either way, the cancellations are downstream of architecture decisions made eighteen months earlier.
Pontil is a Tools-as-a-Service platform. We make SaaS products accessible to AI agents — which is the layer the rest of this article keeps pointing at. We generate tools from the codebase that already exists, maintain them as the product changes, and run them in a managed runtime that executes each call as the authenticated user.
The practical effect for an agentic workflow: the tools layer stops being the part of the stack that drifts. Coverage grows as fast as the product surface allows, without a portfolio-wide API rewrite. Identity and audit are correct by default. The agent's reasoning, the orchestrator's plan, and the tool that runs all agree on who the user is and what they're allowed to do.
If the workflows in this article look like the ones stalled inside your platform, a short demo is the fastest way to see whether the shape matches.
Production-grade does not mean the demo. It means the version that runs every day, for every customer, without a human in the loop catching errors.
Production-grade means three things. First, the tools layer covers the workflow's full surface — not 80%, because the 20% gap is where the agent improvises and fails. Second, every tool call is observable, retryable, and executed as a real user with real permissions. Third, when the product changes, the tools change with it, automatically, before the agent notices.
The field is converging on this answer slowly because the model layer is more interesting to write about. But the teams shipping real agentic workflows in 2026 aren't the ones with the best prompts. They're the ones who stopped treating the tools layer as plumbing and started treating it as the product. The question worth holding open: how much of your current agent roadmap assumes the tools layer will sort itself out — and what changes when you stop assuming that?