Agentic workflows: what they are, how they break, and what makes them production-grade

Agentic workflows look simple on a whiteboard and stall in production. A deep-dive on patterns, architecture, and why the tools layer is the real bottleneck.



8 minute read

Agentic workflows are the part of the AI conversation where the demo stops being a demo. A model takes a goal, decides what to do next, calls a tool, reads the result, and adjusts. That loop is straightforward to draw on a whiteboard and brutal to run in production at a real B2B SaaS company.

This piece is for the engineering and AI leaders whose pilot worked and whose rollout didn't. Pontil's view: agentic workflows aren't blocked by reasoning quality or orchestrator choice. They're blocked at the tools layer — the surface where the agent has to reach into a product and actually do work. Below: how to define agentic workflows precisely, the patterns that show up across real deployments, the architecture that has to sit underneath them, where they fail, and how to tell whether yours is genuinely production-grade.

What counts as an agentic workflow

An agentic workflow is a process where a model decides the next action, takes it, observes the outcome, and adjusts — rather than following a fixed script. The decision loop is the defining feature. If every step is pre-wired, you have automation. If the model can branch, retry, escalate, or pick a different tool based on what it just learned, you have an agent.

The loop has four moving parts: a goal, a set of tools the agent can call, a memory of what's happened so far, and a model that picks the next move. Strip any one of those and the workflow stops being agentic. A chatbot with no tools is a chatbot. A pipeline with tools but no model-driven branching is a Zapier flow. We covered the strict definition in "Agentic, defined"; the short version is that decide, act, and adjust all have to be present.
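A minimal sketch of that loop in Python follows. It uses a stubbed `choose_next_action` function in place of a real model call, and the function name, tool names, and action format are all hypothetical. The four parts map directly onto the code: a goal string, a dict of tools, a list that serves as memory, and a decision function that picks the next move from what has happened so far.

```python
from typing import Callable

# Hypothetical tools; real ones would call the product's API.
def lookup_account(user_id: str) -> str:
    return f"account {user_id}: plan=pro, status=active"

def file_ticket(summary: str) -> str:
    return f"ticket filed: {summary}"

TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_account": lookup_account,
    "file_ticket": file_ticket,
}

def choose_next_action(goal: str, memory: list[tuple[str, str, str]]) -> tuple[str, str]:
    """Stand-in for the model: picks the next tool based on what it has observed."""
    if not memory:
        return ("lookup_account", "u-42")
    last_tool, _, last_result = memory[-1]
    if last_tool == "lookup_account" and "active" in last_result:
        return ("file_ticket", goal)
    return ("finish", last_result)

def run_agent(goal: str, max_steps: int = 5) -> list[tuple[str, str, str]]:
    memory: list[tuple[str, str, str]] = []  # (tool, argument, observation)
    for _ in range(max_steps):
        tool, arg = choose_next_action(goal, memory)  # decide
        if tool == "finish":
            break
        observation = TOOLS[tool](arg)                # act
        memory.append((tool, arg, observation))      # observe; the next decision adjusts
    return memory
```

The branch in `choose_next_action` is what makes this agentic rather than a pipeline. If the account lookup had not returned an active status, the same function would stop instead of filing a ticket.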

This matters for scope. Most teams calling their project an agentic workflow are really building one of three things: a single-turn assistant with tool calls, a multi-step automation with an LLM in the middle, or a genuine agent with a real planning loop. The three need very different infrastructure. Conflating them is how budgets get set against the wrong problem.

Agentic workflow examples that actually run

The useful examples aren't the consumer demos. They're the workflows established SaaS companies are quietly trying to ship inside their own products. A few that recur across the teams we've spoken with:

Customer-facing support agent inside a product. The agent reads the user's question, pulls account state, checks recent activity, files a ticket or executes a remediation, and confirms back. Decide-act-adjust runs across four or five tool calls per session.
Internal ops agent for a multi-product platform. A RevOps or customer success team asks a question that spans CRM, billing, and the product itself. The agent fans out across systems, reconciles, and returns a single answer with citations.
Configuration agent for complex setup. A new customer describes what they want; the agent reads their existing config, proposes changes across multiple modules, applies them on approval, and rolls back on failure.
Data-quality agent. Runs continuously, scans the product's own data surface, flags anomalies, opens tickets, and in some cases auto-corrects within a permission boundary.

What unites the four: each one needs the agent to act on the company's own product, not just answer questions about it. That's where the access problem starts. The agent has a model good enough to plan the work. It doesn't have a way to reach the surface it needs to operate on. The structural reason agent projects stall is the same in all four cases.

Agentic workflow patterns: the architectural shapes

Under the marketing language, most agentic workflows fit one of four patterns. Naming them helps with design decisions because each pattern stresses different parts of the stack.

Single-agent loop. Decision-making: one model, one loop. Tool surface needed: medium. Failure mode: wanders on long tasks. When it fits: bounded tasks, short horizons.
Router + specialists. Decision-making: a router picks a specialist per turn. Tool surface needed: wide, because specialists need depth in their domain. Failure mode: the router misroutes. When it fits: mixed-domain workflows.
Plan-then-execute. Decision-making: a planner writes a plan and an executor runs it. Tool surface needed: wide and stable. Failure mode: the plan goes stale mid-execution. When it fits: tasks with clear sub-goals.
Multi-agent collaboration. Decision-making: multiple agents negotiate. Tool surface needed: wide, with shared state. Failure mode: agents disagree and loop forever. When it fits: genuinely parallel work.

Most production deployments are router + specialists or plan-then-execute. Multi-agent collaboration is overused in demos and underused in production for a simple reason: every additional agent multiplies the tool surface area, and the tools layer is already the weak link. Adding agents before fixing tools solves the wrong problem first; we've written about this pattern in "The orchestrator obsession is hiding the real bottleneck."
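To make the router + specialists shape concrete, here is a minimal sketch. A keyword router stands in for a model-based router; this is an assumption, since production routers usually ask the model to classify the request. The specialists and their tools are hypothetical. Each specialist owns only the tools for its domain, which is why the pattern needs a wide tool surface overall. It is also where misroutes surface: a request that matches no specialist has nowhere to go.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Specialist:
    name: str
    keywords: set[str]
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

# Hypothetical specialists and tools, for illustration only.
BILLING = Specialist("billing", {"invoice", "refund", "charge"},
                     {"get_invoice": lambda q: f"invoice lookup for: {q}"})
SUPPORT = Specialist("support", {"error", "bug", "login"},
                     {"open_ticket": lambda q: f"ticket opened for: {q}"})
SPECIALISTS = [BILLING, SUPPORT]

def route(request: str) -> Optional[Specialist]:
    """Keyword router standing in for a model call: picks one specialist per turn."""
    words = set(request.lower().split())
    scored = [(len(s.keywords & words), s) for s in SPECIALISTS]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score > 0 else None

def handle(request: str) -> str:
    specialist = route(request)
    if specialist is None:
        return "no specialist matched; escalate to a human"
    # Each specialist calls only its own tools; this sketch simply uses the first one.
    tool = next(iter(specialist.tools.values()))
    return f"[{specialist.name}] {tool(request)}"
```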

The trade-off no one talks about: the more sophisticated the pattern, the more it depends on the tools layer being boring. A plan-then-execute agent that calls fifteen tools across a plan needs all fifteen to behave predictably, return structured errors, respect rate limits, and execute under the right identity. Reasoning sophistication does not rescue a flaky tools layer. It exposes it.
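The sketch below shows why plan-then-execute depends on predictable tools. The executor runs a fixed plan and stops at the first step whose tool returns a structured error. It reports which step failed and whether the error is retryable, so the planner can replan instead of proceeding on a bad result. The `ToolResult` shape, the error codes, and the step names are assumptions for illustration, not a standard.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolResult:
    ok: bool
    data: Optional[dict] = None
    error_code: Optional[str] = None   # e.g. "rate_limited", "permission_denied"
    retryable: bool = False

Step = tuple[str, Callable[[], ToolResult]]

def execute_plan(plan: list[Step]) -> dict:
    completed = []
    for name, call in plan:
        result = call()
        if not result.ok:
            # Structured error: the planner learns exactly where and why the plan broke.
            return {"status": "needs_replan", "failed_step": name,
                    "error_code": result.error_code, "retryable": result.retryable,
                    "completed": completed}
        completed.append(name)
    return {"status": "done", "completed": completed}

# Hypothetical three-step plan where the second tool hits a rate limit.
plan = [
    ("read_config", lambda: ToolResult(ok=True, data={"modules": 3})),
    ("apply_changes", lambda: ToolResult(ok=False, error_code="rate_limited", retryable=True)),
    ("confirm", lambda: ToolResult(ok=True)),
]
```

Suppose a flaky tool returned free text instead of an error code, or returned `ok=True` alongside bad data. The executor would then carry on into `confirm`, which is exactly the exposure described above.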

Agent workflow architecture: the four layers that have to hold

A production agentic workflow needs four layers working together. Get any one wrong and the whole thing fails in ways that look like model problems but aren't.

The model layer

This is the foundation model — Claude, GPT, Gemini — doing the reasoning. Foundation model providers like Anthropic and OpenAI have made this layer commodity-good for most production workflows. It is, increasingly, not where projects fail.

The orchestration layer

The state machine that runs the loop: hold conversation history, pass tool results back to the model, manage retries, enforce step limits. Frameworks like LangChain/LangGraph and CrewAI live here. Useful, well-trodden, rarely the blocker.
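The retry and step-limit duties can be sketched as a small wrapper. It assumes tools signal transient failures by raising a hypothetical `TransientToolError`. The wrapper retries those failures with exponential backoff and enforces a per-run step budget, so a wandering loop halts. Frameworks in this layer ship their own versions of these controls; this sketch is not any framework's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientToolError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, rate limits)."""

class StepLimitExceeded(Exception):
    pass

class Orchestrator:
    def __init__(self, max_steps: int = 10, max_retries: int = 3, base_delay: float = 0.01):
        self.max_steps = max_steps
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.steps_taken = 0

    def call_tool(self, fn: Callable[[], T]) -> T:
        # Step limit: each tool call consumes one step, however many retries it needs.
        if self.steps_taken >= self.max_steps:
            raise StepLimitExceeded(f"step budget of {self.max_steps} exhausted")
        self.steps_taken += 1
        for attempt in range(self.max_retries + 1):
            try:
                return fn()
            except TransientToolError:
                if attempt == self.max_retries:
                    raise  # out of retries: surface the failure to the loop
                time.sleep(self.base_delay * 2 ** attempt)  # exponential backoff
        raise AssertionError("unreachable")  # the loop always returns or raises
```

Counting steps per tool call, rather than per attempt, is one design choice among several. It keeps retries from eating the budget meant for the agent's actual decisions.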

The tools layer

The surface the agent actually invokes — the connectors, the runtime that executes them, the auth lifecycle that keeps them safe. This is where most established-SaaS agent projects stall. Your APIs were built for the UI, not for an agent, and most of what your product can do isn't exposed. The model can plan a five-step workflow; the tools layer can only deliver step one.
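One way to picture a tool designed for an agent rather than a UI is as four pieces: a name, a description the model reads, a declared parameter schema, and validation. The validation returns a structured error before anything touches the product. The sketch below uses a hand-rolled schema that maps parameter names to Python types; this is an assumption for brevity. Real tool-calling formats typically describe parameters with JSON Schema and are richer. The `update_seat_count` tool is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolSpec:
    name: str
    description: str              # what the model reads when deciding to call the tool
    params: dict[str, type]       # simplified schema: parameter name -> expected type
    handler: Callable[..., Any]

    def invoke(self, args: dict[str, Any]) -> dict[str, Any]:
        missing = [p for p in self.params if p not in args]
        unexpected = [a for a in args if a not in self.params]
        wrong_type = [p for p, t in self.params.items()
                      if p in args and not isinstance(args[p], t)]
        if missing or unexpected or wrong_type:
            # Structured, machine-readable error the model can correct against.
            return {"ok": False, "error": "invalid_arguments", "missing": missing,
                    "unexpected": unexpected, "wrong_type": wrong_type}
        return {"ok": True, "result": self.handler(**args)}

# Hypothetical tool wrapping an existing product capability.
update_seat_count = ToolSpec(
    name="update_seat_count",
    description="Set the number of paid seats on a workspace.",
    params={"workspace_id": str, "seats": int},
    handler=lambda workspace_id, seats: f"{workspace_id} now has {seats} seats",
)
```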

The identity and audit layer

Who is the agent acting as? Most pilots cheat here by giving the agent a service account. In production, that's a non-starter. The service account's permissions are broader than any single user's, audit trails record the wrong actor, and data leakage is one prompt injection away. Tool calls have to execute as the authenticated user, with the user's actual permissions. This is the layer that gets retrofitted painfully if it isn't designed in.
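Here is a minimal sketch of that difference. It assumes each request carries a hypothetical `UserContext` holding the authenticated user's ID and permission scopes. The tool checks the caller's own scopes and records the real user in the audit entry, instead of running everything through one broadly permissioned service account. The scope names, tool name, and audit format are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserContext:
    user_id: str
    scopes: frozenset[str]

class PermissionDenied(Exception):
    pass

AUDIT_LOG: list[dict] = []

def export_customer_data(ctx: UserContext, account_id: str) -> str:
    required = "customers:export"
    allowed = required in ctx.scopes
    # The audit entry names the real user, whether or not the call is allowed.
    AUDIT_LOG.append({"user": ctx.user_id, "tool": "export_customer_data",
                      "account": account_id, "allowed": allowed})
    if not allowed:
        raise PermissionDenied(f"{ctx.user_id} lacks {required}")
    return f"export of {account_id} started for {ctx.user_id}"
```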

For a deeper read on how the layers connect, see "The agent stack: a map for platform teams."

Where agentic workflows fail in production

Three failure modes account for most stalled projects. None of them are model failures.

Tool coverage gaps. The agent can reason about an action it has no tool for. So it improvises — fabricates an API call, tries a workaround, or gives up. The fix isn't a better prompt; it's more tools. And generating more tools by hand, product by product, is its own trap.
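When the model names a tool that does not exist, the runtime should say so explicitly rather than guess. The minimal sketch below uses a hypothetical registry keyed by tool name. An unknown name returns a structured `unknown_tool` error listing what is available, and the registry counts each gap. That count shows which missing tools agents keep reaching for, which is the coverage signal the fix depends on.

```python
from collections import Counter
from typing import Any, Callable

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}
        self.missing_requests: Counter[str] = Counter()  # absent tools agents asked for

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> dict[str, Any]:
        if name not in self._tools:
            self.missing_requests[name] += 1
            # Refuse explicitly instead of letting the agent improvise a workaround.
            return {"ok": False, "error": "unknown_tool", "requested": name,
                    "available": sorted(self._tools)}
        return {"ok": True, "result": self._tools[name](**kwargs)}
```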

Silent tool drift. A tool worked yesterday; today the underlying API changed and the tool definition didn't. The agent calls it, gets a wrong result, and confidently proceeds. There's no test in the loop. This is the failure mode that kills browser automation in production — UI selectors drift, no contract catches it, and the agent fails silently.
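A contract check supplies the missing test. The sketch below assumes each tool declares the fields it expects back from the underlying API; the `EXPECTED` declaration and the `get_subscription` tool are hypothetical. The check compares that declaration against a sample response, run in CI or on a schedule, for example. A renamed or retyped field then fails loudly instead of flowing into the agent's next step.

```python
from typing import Any

def check_contract(expected_fields: dict[str, type], response: dict[str, Any]) -> list[str]:
    """Return human-readable drift problems; an empty list means the contract holds."""
    problems = []
    for field_name, expected_type in expected_fields.items():
        if field_name not in response:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(response[field_name], expected_type):
            actual = type(response[field_name]).__name__
            problems.append(f"{field_name}: expected {expected_type.__name__}, got {actual}")
    return problems

# Hypothetical declaration for a "get_subscription" tool.
EXPECTED = {"plan": str, "seats": int, "renews_at": str}
```

Suppose an API change renames `plan` to `plan_name` and turns `seats` into a string. The check then reports two explicit problems rather than letting the agent proceed on a wrong result.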

Identity collapse under load. Workflows that worked for one user start leaking data when ten users hit the same agent. Service-account auth doesn't scale. Permission boundaries blur. The compliance team finds it before the engineering team does.
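One common source of that leakage is identity held in shared, process-wide state. The minimal sketch below shows the alternative using Python's standard `contextvars` module. Each concurrent request sets its own user, and the tool reads identity from the current context, so simultaneous users do not see each other's data. The records and user IDs are hypothetical.

```python
import asyncio
import contextvars

# Identity for the current request; each asyncio task runs in its own copy of the context.
current_user: contextvars.ContextVar[str] = contextvars.ContextVar("current_user")

# Hypothetical per-user records standing in for the product's data.
RECORDS = {"alice": ["a-1", "a-2"], "bob": ["b-1"], "carol": []}

async def list_my_records() -> list[str]:
    await asyncio.sleep(0)  # yield to other requests, as a real I/O call would
    return RECORDS[current_user.get()]

async def handle_request(user_id: str) -> tuple[str, list[str]]:
    current_user.set(user_id)  # visible only within this task's context
    records = await list_my_records()
    return user_id, records

async def main() -> list[tuple[str, list[str]]]:
    return await asyncio.gather(*(handle_request(u) for u in RECORDS))
```

Try swapping the `ContextVar` for a module-level global in this sketch. Every request would then read whichever user was set last before the `sleep(0)` yielded, which is a small-scale version of the collapse described above.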

One external data point worth noting: Gartner has projected that over 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. The middle reason is a strategy problem; the first and third are architectural. Inadequate risk controls sit at the tools and identity layers, not the model layer. Escalating costs usually trace back to bespoke tool work that has to be redone every time the product changes. In both cases, the cancellations are downstream of architecture decisions made eighteen months earlier.

How Pontil fits

Pontil is a Tools-as-a-Service platform. We make SaaS products accessible to AI agents — which is the layer the rest of this article keeps pointing at. We generate tools from the codebase that already exists, maintain them as the product changes, and run them in a managed runtime that executes each call as the authenticated user.

The practical effect for an agentic workflow: the tools layer stops being the part of the stack that drifts. Coverage grows as fast as the product surface allows, without a portfolio-wide API rewrite. Identity and audit are correct by default. The agent's reasoning, the orchestrator's plan, and the tool that runs all agree on who the user is and what they're allowed to do.

If the workflows in this article look like the ones stalled inside your platform, a short demo is the fastest way to see whether the shape matches.

What does a production-grade agentic workflow actually look like?

Not the demo. The version that runs every day, for every customer, without a human in the loop catching errors.

Production-grade means three things. First, the tools layer covers the workflow's full surface — not 80%, because the 20% gap is where the agent improvises and fails. Second, every tool call is observable, retryable, and executed as a real user with real permissions. Third, when the product changes, the tools change with it, automatically, before the agent notices.

The field is converging on this answer slowly because the model layer is more interesting to write about. But the teams shipping real agentic workflows in 2026 aren't the ones with the best prompts. They're the ones who stopped treating the tools layer as plumbing and started treating it as the product. The question worth holding open: how much of your current agent roadmap assumes the tools layer will sort itself out — and what changes when you stop assuming that?
