How to detect API breaking changes: a practical guide for agent-era teams

How to detect API breaking changes with oasdiff in CI, snapshot diffs for third-party APIs, and runtime canaries — a practical guide for agent-era teams.

If your agents depend on a third-party API — or your own — a silent schema change can take production down before the next release note lands. This guide walks through a working pipeline to detect API breaking changes before they reach your users. By the end you'll have an oasdiff-based CI check for your own APIs, a snapshot-and-diff loop for third-party APIs you don't control, and a runtime canary that catches drift the spec misses.

Prerequisites: a service with an OpenAPI 3.x spec (or the ability to generate one), a CI system (GitHub Actions in the examples below), and roughly two hours.

Step 1 — Baseline your current API surface

You can't detect drift against nothing. Start by committing the current spec to your repo as the source of truth.

For your own APIs, generate an OpenAPI spec from your framework. Most modern frameworks have a generator: FastAPI ships one, NestJS has @nestjs/swagger, and Spring has springdoc-openapi. Run it and commit the output:

# Example: FastAPI
python -c "from app.main import app; import json; print(json.dumps(app.openapi()))" > openapi.json
git add openapi.json && git commit -m "chore: baseline openapi spec"

For third-party APIs, fetch the published spec or scrape one from their docs portal. Save it under specs/third-party/<vendor>.json with the date and version in the commit message. If the vendor doesn't publish a spec, see Step 4.

Expected result: a versioned openapi.json (or set of them) in your repo. This is what every future check compares against.

Step 2 — Add oasdiff to CI for your own APIs

oasdiff is a widely used open-source tool for diffing two OpenAPI specs and classifying the changes. It knows what counts as breaking (a removed endpoint, a required field added to a request, a removed response field) versus non-breaking (a new optional field, a new endpoint).
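
To make the breaking versus non-breaking distinction concrete, here is a toy classifier. It is not how oasdiff is implemented and covers only two rules (removed operations and newly required request fields); the function names are made up for illustration, and it assumes request schemas are written inline rather than via $ref.

```python
"""Toy illustration of breaking vs non-breaking classification (not oasdiff's implementation)."""

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def operations(spec: dict) -> set[tuple[str, str]]:
    """Return every (path, method) pair declared in an OpenAPI document."""
    return {
        (path, method)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS
    }


def required_request_fields(spec: dict, path: str, method: str) -> set[str]:
    """Required top-level JSON body fields for one operation (inline schemas only)."""
    schema = (
        spec["paths"][path][method]
        .get("requestBody", {})
        .get("content", {})
        .get("application/json", {})
        .get("schema", {})
    )
    return set(schema.get("required", []))


def classify(base: dict, revision: dict) -> dict[str, list[str]]:
    """Sort differences between two specs into breaking and non-breaking."""
    base_ops, new_ops = operations(base), operations(revision)
    # An operation that disappears breaks every caller; a new one breaks nobody.
    breaking = [f"removed {m.upper()} {p}" for p, m in sorted(base_ops - new_ops)]
    non_breaking = [f"added {m.upper()} {p}" for p, m in sorted(new_ops - base_ops)]
    for path, method in sorted(base_ops & new_ops):
        added = required_request_fields(revision, path, method) - required_request_fields(base, path, method)
        breaking += [f"new required field '{f}' on {method.upper()} {path}" for f in sorted(added)]
    return {"breaking": breaking, "non_breaking": non_breaking}
```

The real tool applies many more rules than this, which is why the CI check delegates to it rather than to hand-rolled logic.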

Add a check that runs on every pull request:

# .github/workflows/api-breaking-changes.yml
name: api-breaking-changes
on: [pull_request]
jobs:
  oasdiff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      # Add your own setup steps here (Python plus the app's dependencies)
      # before regenerating the spec.
      - name: Regenerate spec from PR branch
        run: python -c "from app.main import app; import json; print(json.dumps(app.openapi()))" > new.json
      - name: Fetch base spec
        run: git show origin/main:openapi.json > base.json
      - name: Run oasdiff
        uses: oasdiff/oasdiff-action/breaking@v0.0.47
        with:
          base: base.json
          revision: new.json
          fail-on: ERR

Pin the action to a released tag (and let Dependabot or Renovate bump it) rather than tracking @main — that's both safer and what the oasdiff-action README recommends. The job fails if oasdiff finds any breaking-severity change. Set fail-on: WARN instead if you want to block on borderline cases too (deprecations, format tightening).

Expected result: PRs that introduce a removed endpoint, a new required field, or a changed response type will fail CI with a list of exactly which paths and which changes triggered the failure.

Step 3 — Decide what counts as breaking for you

oasdiff's defaults are conservative. They flag changes that can break consumers, not changes that definitely will. Tune the ruleset before your team learns to ignore the warnings.

A practical default:

| Area | Fail the build | Warn only | Ignore |
|---|---|---|---|
| Endpoint behaviour | Removed endpoint, changed HTTP method, removed response field | Added required header, tightened enum values | Added optional response field |
| Request shape | New required field, removed field, type change on required field | Type change on optional field | New optional field |
| Auth and errors | Added auth requirement, removed error code | New error code | Reworded description |

Write these into an oasdiff.yaml config and reference it from the workflow. The exact rule names live in the oasdiff breaking-changes docs. The point isn't to copy this table — it's to make the decision once, write it down, and stop arguing about it per PR.
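
One way to "write it down" before translating it into tool config is to keep the policy as data. The sketch below encodes the table above in Python. The change labels are invented for this example and are not oasdiff rule IDs, and treating unknown change types as failures is an assumption you may want to relax.

```python
from enum import Enum


class Action(Enum):
    FAIL = "fail"
    WARN = "warn"
    IGNORE = "ignore"


# Hypothetical change labels mirroring the table; these are NOT oasdiff rule IDs.
POLICY = {
    "endpoint-removed": Action.FAIL,
    "http-method-changed": Action.FAIL,
    "response-field-removed": Action.FAIL,
    "required-header-added": Action.WARN,
    "enum-values-tightened": Action.WARN,
    "optional-response-field-added": Action.IGNORE,
    "required-request-field-added": Action.FAIL,
    "request-field-removed": Action.FAIL,
    "required-field-type-changed": Action.FAIL,
    "optional-field-type-changed": Action.WARN,
    "optional-request-field-added": Action.IGNORE,
    "auth-requirement-added": Action.FAIL,
    "error-code-removed": Action.FAIL,
    "error-code-added": Action.WARN,
    "description-reworded": Action.IGNORE,
}


def decide(change: str) -> Action:
    """Look up the agreed action; unknown change types fail closed (an assumption)."""
    return POLICY.get(change, Action.FAIL)


def should_block(changes: list[str]) -> bool:
    """True if any change in the PR maps to FAIL."""
    return any(decide(c) is Action.FAIL for c in changes)
```

Keeping the policy in one reviewed file means a disagreement becomes a one-line diff to that file instead of a debate on every PR.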

Expected result: every developer on the team knows what will fail CI before they push.

Step 4 — Snapshot third-party APIs you don't control

Third-party APIs are the harder problem. Vendors change response shapes without bumping versions. Sometimes without a changelog entry. Your CI can't catch what the vendor ships at midnight.

Run a scheduled job that hits each third-party endpoint with a known input and stores the response shape. Diff it against yesterday's snapshot.

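# .github/workflows/third-party-drift.yml
name: third-party-drift
on:
  schedule:
    - cron: '0 6 * * *'
permissions:
  contents: write   # push the daily snapshot
  issues: write     # open the drift issue
jobs:
  snapshot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Probe vendor endpoints
        env:
          VENDOR_TOKEN: ${{ secrets.VENDOR_TOKEN }}
        run: ./scripts/snapshot-third-party.sh > snapshots/$(date +%F).json
      - name: Diff against yesterday
        run: node scripts/diff-snapshots.js
      # Persist today's snapshot so tomorrow's run has something to diff against.
      - name: Commit snapshot
        if: always()
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add snapshots/
          git diff --cached --quiet || (git commit -m "chore: third-party snapshot $(date +%F)" && git push)
      - name: Open issue on drift
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `Third-party API drift detected: ${new Date().toISOString().slice(0,10)}`,
              body: 'See workflow logs for the diff.'
            })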

The snapshot script should record the JSON keys present at each path of the response, not the values. Values change every run; structure shouldn't. A drift is a new key, a missing key, or a type change at a known path.
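
The workflow above leaves the snapshot and diff scripts to you. As a minimal sketch of the idea in Python (the function names `shape` and `diff_shapes` are invented here, and the path notation is just one possible choice), recording structure without values could look like this:

```python
import json


def shape(value, path="$") -> dict[str, str]:
    """Map every JSON path in `value` to a type name, ignoring the values themselves."""
    if isinstance(value, dict):
        out = {path: "object"}
        for key, child in value.items():
            out.update(shape(child, f"{path}.{key}"))
        return out
    if isinstance(value, list):
        out = {path: "array"}
        for item in value:  # all items share one wildcard path; the last item's type wins
            out.update(shape(item, f"{path}[]"))
        return out
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {path: "boolean"}
    if isinstance(value, (int, float)):
        return {path: "number"}
    if value is None:
        return {path: "null"}
    return {path: "string"}


def diff_shapes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Report missing keys, added keys, and type changes between two shapes."""
    drift = [f"missing: {p}" for p in sorted(old.keys() - new.keys())]
    drift += [f"added: {p}" for p in sorted(new.keys() - old.keys())]
    drift += [
        f"type changed: {p} ({old[p]} -> {new[p]})"
        for p in sorted(old.keys() & new.keys())
        if old[p] != new[p]
    ]
    return drift


yesterday = shape(json.loads('{"id": 1, "status": "pending", "tags": ["a"]}'))
today = shape(json.loads('{"id": "1", "status": "pending", "items": []}'))
for line in diff_shapes(yesterday, today):
    print(line)
```

Nullable fields will flip between "null" and a real type from one day to the next, so expect to allow-list a few paths before this is quiet enough to alert on.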

For APIs that are read-heavy and idempotent, a real GET is fine. For APIs where every call costs money or creates a record, use the vendor's sandbox or a recorded fixture.

Expected result: an issue (or Slack ping) the morning after a vendor changes a response shape, usually before a customer reports it.

Step 5 — Add a runtime canary for the changes specs can't catch

Schema diffs catch shape changes. They miss semantic changes — the field is still called status, but pending now means something different. They also miss undocumented endpoints your code relies on, and behaviour changes inside fields the spec marks as free-text.

Run a small set of end-to-end probes on a schedule: real calls, assertions on the meaning of the response, not just its shape.

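# probes/test_vendor_search.py
# `vendor_client` is a placeholder for your own wrapper around the vendor API.
from vendor_client import vendor

# ID of the result that should rank first for the fixture query (placeholder value).
EXPECTED_TOP_RESULT_ID = "replace-with-known-id"

def test_search_returns_results_in_relevance_order():
    results = vendor.search(query="known fixture term")
    assert len(results) >= 3
    assert results[0].score >= results[1].score
    assert results[0].id == EXPECTED_TOP_RESULT_ID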

Run these through your normal test runner against the vendor's production API (or its sandbox, if the vendor offers one) on a 15-minute cron. When they fail, page someone.

This is the layer where things like "the vendor silently changed pagination defaults from 20 to 10" or "status: cancelled now means status: voided" get caught. Specs say the response is well-formed. Your agent says the answer is wrong. Only a behaviour probe knows the difference.
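
The two examples in that paragraph can be turned into small checks. This is a sketch under assumptions: the status vocabulary and the default page size are placeholders for whatever your integration was written against, and the functions take already-fetched data so they stay independent of any particular client.

```python
KNOWN_STATUSES = {"pending", "paid", "cancelled"}  # hypothetical vocabulary your code handles
EXPECTED_DEFAULT_PAGE_SIZE = 20  # the default your code was written against


def check_page_size(first_page: list, total_available: int) -> list[str]:
    """Flag a changed pagination default when no page size was requested."""
    expected = min(EXPECTED_DEFAULT_PAGE_SIZE, total_available)
    if len(first_page) != expected:
        return [f"page size {len(first_page)}, expected {expected}"]
    return []


def check_status_vocabulary(records: list[dict]) -> list[str]:
    """Flag status values your code has never been taught to handle."""
    unseen = {r["status"] for r in records} - KNOWN_STATUSES
    return [f"unknown status value: {s}" for s in sorted(unseen)]
```

A vocabulary check catches renamed or newly introduced values. It cannot catch a value whose string is unchanged but whose meaning moved, which is why the fixture-based assertion on a known record still matters.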

Expected result: as a rough rule of thumb, Steps 2–4 catch about 80% of drift (the shape changes), and the remaining 20% or so (the semantic changes) is caught here.

Step 6 — Wire alerts into the team that actually fixes things

Detection without ownership is noise. Decide upfront who acts on which signal:

  • oasdiff CI failures → the PR author. Block the merge.
  • Third-party snapshot drift → the team that owns the integration. Auto-open a ticket, assign by CODEOWNERS on the integration directory.
  • Runtime probe failures → on-call. Page if a probe fails three runs in a row (filters out flake).
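
The "three runs in a row" rule for probes is simple to get subtly wrong (re-paging on every later failure, or never resetting after a pass). A minimal sketch, with the class name and in-memory storage as assumptions; a real cron-driven probe would need to persist the streak between runs:

```python
from collections import defaultdict

FAILURES_BEFORE_PAGE = 3


class ProbeTracker:
    """Counts consecutive failures per probe and says when to page."""

    def __init__(self, threshold: int = FAILURES_BEFORE_PAGE):
        self.threshold = threshold
        self.streaks = defaultdict(int)

    def record(self, probe: str, passed: bool) -> bool:
        """Record one run; return True only on the run where the streak first hits the threshold."""
        if passed:
            self.streaks[probe] = 0  # any pass resets the streak, which filters out flake
            return False
        self.streaks[probe] += 1
        return self.streaks[probe] == self.threshold
```

Returning True only on the third consecutive failure means on-call gets one page per incident rather than one every 15 minutes.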

If the same vendor drifts repeatedly, that's a signal: not about the detector, but about whether that integration is worth the maintenance. The article on bespoke connector costs covers that pattern.

Expected result: a documented runbook entry per signal type. No silent dashboards.

Common pitfalls

Specs that lie. A handwritten OpenAPI spec drifts from the implementation faster than the third-party APIs you're trying to monitor. If you can't generate the spec from code, your baseline is fiction. Fix the generator before fixing detection.

Treating warnings as failures from day one. You'll get a flood of warnings on first run — old optional fields with vague types, missing examples, inconsistent error shapes. Triage once. Suppress what you don't care about. Then turn the screws.

Diffing only the spec. Specs describe the contract. Agents care about behaviour. If you're only running Step 2, you'll catch the loud changes and miss the expensive ones. The runtime canary in Step 5 isn't optional for any integration your business depends on.

Forgetting the auth surface. A vendor changing Authorization: Bearer to a custom header scheme is technically a breaking change, but it often slips past schema diffs because the security scheme block looks superficially similar. Probe an authenticated endpoint in Step 5, not just a public one.
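
Alongside the authenticated probe, a focused comparison of the security scheme blocks makes this kind of change hard to overlook. The sketch below is an illustration rather than a replacement for a full spec diff. It reads `components.securitySchemes` from two OpenAPI 3.x documents and reports differences in the fields that change how a client authenticates; the function names are invented for this example.

```python
def security_schemes(spec: dict) -> dict:
    """Return components.securitySchemes from an OpenAPI 3.x document."""
    return spec.get("components", {}).get("securitySchemes", {})


# Security Scheme Object fields whose change alters how a client must authenticate.
AUTH_FIELDS = ("type", "scheme", "in", "name", "bearerFormat")


def auth_changes(base: dict, revision: dict) -> list[str]:
    """List added, removed, and modified security schemes between two specs."""
    old, new = security_schemes(base), security_schemes(revision)
    changes = [f"removed scheme: {n}" for n in sorted(old.keys() - new.keys())]
    changes += [f"added scheme: {n}" for n in sorted(new.keys() - old.keys())]
    for name in sorted(old.keys() & new.keys()):
        for field in AUTH_FIELDS:
            before, after = old[name].get(field), new[name].get(field)
            if before != after:
                changes.append(f"{name}.{field}: {before!r} -> {after!r}")
    return changes
```

Any non-empty result is worth treating as a failure in its own right, since a client that cannot authenticate cannot reach anything else.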

Monitoring everything equally. A team that integrates with thirty-plus third-party APIs doesn't need behaviour probes on all of them. Run schema snapshots across the long tail; spend probe budget on the five or so integrations that would page someone at 2am if they broke. The same prioritisation logic applies when you're calculating agent integration debt: find the surface that matters and instrument that.

What to do next

Once detection is in place, the next problem is response time. A breaking change you detect at 6am is still a breaking change in production until someone ships a fix. Look at how your team handles vendor drift today — manual code edit, deploy, hotfix — and decide whether that loop is fast enough for the agent workloads you're running. For most teams building agents on top of third-party APIs, it isn't, and the gap is what makes connector maintenance compound instead of staying flat.

Related content

  • The hidden cost of bespoke agent connectors
  • Connector maintenance cost: the integration engineering tax nobody budgets for
  • How to calculate your agent integration debt