How to detect API breaking changes with oasdiff in CI, snapshot diffs for third-party APIs, and runtime canaries — a practical guide for agent-era teams.

If your agents depend on a third-party API — or your own — a silent schema change can take production down before the next release note lands. This guide walks through a working pipeline to detect API breaking changes before they reach your users. By the end you'll have an oasdiff-based CI check for your own APIs, a snapshot-and-diff loop for third-party APIs you don't control, and a runtime canary that catches drift the spec misses.
Prerequisites: a service with an OpenAPI 3.x spec (or the ability to generate one), a CI system (GitHub Actions in the examples below), and roughly two hours.
Step 1: Commit a baseline spec
You can't detect drift against nothing. Start by committing the current spec to your repo as the source of truth.
For your own APIs, generate an OpenAPI spec from your framework. Most modern frameworks have a generator: FastAPI ships one, NestJS has @nestjs/swagger, and Spring has springdoc-openapi. Run it and commit the output:
# Example: FastAPI
python -c "from app.main import app; import json; print(json.dumps(app.openapi()))" > openapi.json
git add openapi.json && git commit -m "chore: baseline openapi spec"
For third-party APIs, fetch the published spec or scrape one from their docs portal. Save it under specs/third-party/<vendor>.json with the date and version in the commit message. If the vendor doesn't publish a spec, see Step 4.
Expected result: a versioned openapi.json (or set of them) in your repo. This is what every future check compares against.
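A committed baseline is easier to review when the file is serialized the same way every time. The sketch below is one way to do that, assuming the spec is available as a Python dict (for example from `app.openapi()` in the FastAPI case above). The helper name `canonical_spec` is made up for illustration.

```python
import json


def canonical_spec(spec: dict) -> str:
    """Serialize an OpenAPI document with sorted keys and fixed indentation.

    Two specs with the same content always produce the same text, so a
    git diff on the committed file only shows real changes.
    """
    return json.dumps(spec, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


# Hypothetical usage, mirroring the FastAPI one-liner above:
#   from app.main import app
#   with open("openapi.json", "w", encoding="utf-8") as fh:
#       fh.write(canonical_spec(app.openapi()))
```

Sorting keys trades away the generator's original ordering for stable diffs, which is usually the better deal for a file that only exists to be compared.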
Step 2: Run oasdiff in CI
oasdiff is a widely used open-source tool for diffing two OpenAPI specs and classifying the changes. It knows what counts as breaking (removed endpoint, required field added to request, response field removed) versus non-breaking (new optional field, new endpoint).
Add a check that runs on every pull request:
# .github/workflows/api-breaking-changes.yml
name: api-breaking-changes
on: [pull_request]
jobs:
  oasdiff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      # Assumes Python and the app's dependencies are already available on the runner;
      # add setup and install steps before this one if they are not.
      - name: Regenerate spec from PR branch
        run: python -c "from app.main import app; import json; print(json.dumps(app.openapi()))" > new.json
      - name: Fetch base spec
        run: git show origin/main:openapi.json > base.json
      - name: Run oasdiff
        uses: oasdiff/oasdiff-action/breaking@v0.0.47
        with:
          base: base.json
          revision: new.json
          fail-on: ERR
Pin the action to a released tag (and let Dependabot or Renovate bump it) rather than tracking @main — that's both safer and what the oasdiff-action README recommends. The job fails if oasdiff finds any breaking-severity change. Set fail-on: WARN instead if you want to block on borderline cases too (deprecations, format tightening).
Expected result: PRs that introduce a removed endpoint, a new required field, or a changed response type will fail CI with a list of exactly which paths and which changes triggered the failure.
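To make "breaking" concrete, here is a deliberately small sketch of two of the checks described above: removed operations and newly required request fields. It is not how oasdiff works internally and covers far fewer cases. It assumes both specs are already loaded as dicts and that request schemas are inline `application/json` schemas (no `$ref` resolution). The function names are illustrative.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def removed_operations(base: dict, revision: dict) -> list[str]:
    """Return 'METHOD /path' for every operation in base that revision lacks."""
    removed = []
    new_paths = revision.get("paths", {})
    for path, item in base.get("paths", {}).items():
        for method in item:
            if method in HTTP_METHODS and method not in new_paths.get(path, {}):
                removed.append(f"{method.upper()} {path}")
    return sorted(removed)


def _required(operation: dict) -> set[str]:
    """Required top-level fields of an inline JSON request body schema."""
    schema = (
        operation.get("requestBody", {})
        .get("content", {})
        .get("application/json", {})
        .get("schema", {})
    )
    return set(schema.get("required", []))


def newly_required_fields(base: dict, revision: dict) -> list[str]:
    """Return request fields that became required on operations that already existed."""
    found = []
    for path, item in revision.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue
            old_operation = base.get("paths", {}).get(path, {}).get(method)
            if old_operation is None:
                continue  # a brand-new operation cannot break existing callers
            for field in sorted(_required(operation) - _required(old_operation)):
                found.append(f"{method.upper()} {path}: {field}")
    return sorted(found)
```

A real check also has to handle references, parameters, response schemas, and type changes, which is the reason to use a dedicated tool in CI rather than a script like this.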
Step 3: Tune the ruleset
oasdiff's defaults are conservative. They flag changes that can break consumers, not changes that definitely will. Tune the ruleset before your team learns to ignore the warnings.
A practical default:
Write these into an oasdiff.yaml config and reference it from the workflow. The exact rule names live in the oasdiff breaking-changes docs. The point isn't to copy this table — it's to make the decision once, write it down, and stop arguing about it per PR.
Expected result: every developer on the team knows what will fail CI before they push.
Step 4: Snapshot and diff third-party APIs
Third-party APIs are the harder problem. Vendors change response shapes without bumping versions. Sometimes without a changelog entry. Your CI can't catch what the vendor ships at midnight.
Run a scheduled job that hits each third-party endpoint with a known input and stores the response shape. Diff it against yesterday's snapshot.
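# .github/workflows/third-party-drift.yml
name: third-party-drift
on:
  schedule:
    - cron: '0 6 * * *'
# The job pushes snapshots and opens issues, so the token needs write access to both.
permissions:
  contents: write
  issues: write
jobs:
  snapshot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Probe vendor endpoints
        env:
          VENDOR_TOKEN: ${{ secrets.VENDOR_TOKEN }}
        run: ./scripts/snapshot-third-party.sh > snapshots/$(date +%F).json
      # Persist today's snapshot so tomorrow's run has something to diff against.
      - name: Commit snapshot
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add snapshots/
          git commit -m "chore: third-party snapshot $(date +%F)" || echo "nothing to commit"
          git push
      - name: Diff against yesterday
        run: node scripts/diff-snapshots.js
      - name: Open issue on drift
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `Third-party API drift detected: ${new Date().toISOString().slice(0,10)}`,
              body: 'See workflow logs for the diff.'
            })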
The snapshot script should record the JSON keys present at each path of the response, not the values. Values change every run; structure shouldn't. A drift is a new key, a missing key, or a type change at a known path.
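As a rough illustration of "keys, not values", the sketch below flattens a JSON response into a map from path to type name and then compares two such maps. It is written in Python for brevity. The workflow above calls a shell script and a Node script, which are not shown in this guide, so treat the names `shape` and `diff_shapes` and the `$.a.b[]` path notation as assumptions.

```python
def shape(value, path="$"):
    """Map every path in a JSON value to its type name, ignoring the values.

    Array elements share one path ("$.items[]"), so a list of objects
    contributes the union of its elements' keys. If elements disagree on
    a type, the last one seen wins.
    """
    if isinstance(value, dict):
        out = {path: "object"}
        for key, child in value.items():
            out.update(shape(child, f"{path}.{key}"))
        return out
    if isinstance(value, list):
        out = {path: "array"}
        for child in value:
            out.update(shape(child, f"{path}[]"))
        return out
    if value is None:
        return {path: "null"}
    if isinstance(value, bool):  # must come before the number check: bool is an int
        return {path: "boolean"}
    if isinstance(value, (int, float)):
        return {path: "number"}
    return {path: "string"}


def diff_shapes(old: dict, new: dict) -> dict:
    """Report new keys, missing keys, and type changes at known paths."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Storing the output of `shape` as the daily snapshot keeps the committed files free of customer data and makes the diff quiet on days when only values changed.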
For APIs that are read-heavy and idempotent, a real GET is fine. For APIs where every call costs money or creates a record, use the vendor's sandbox or a recorded fixture.
Expected result: a daily issue (or Slack ping) the morning after a vendor changes a response shape — usually before a customer reports it.
Step 5: Add a runtime canary
Schema diffs catch shape changes. They miss semantic changes — the field is still called status, but pending now means something different. They also miss undocumented endpoints your code relies on, and behaviour changes inside fields the spec marks as free-text.
Run a small set of end-to-end probes on a schedule: real calls, assertions on the meaning of the response, not just its shape.
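# probes/test_vendor_search.py
# `vendor` is your own client wrapper and EXPECTED_TOP_RESULT_ID a known fixture value.
# Both import paths below are placeholders for wherever you define them.
from probes.client import vendor
from probes.fixtures import EXPECTED_TOP_RESULT_ID


def test_search_returns_results_in_relevance_order():
    results = vendor.search(query="known fixture term")
    assert len(results) >= 3
    assert results[0].score >= results[1].score
    assert results[0].id == EXPECTED_TOP_RESULT_ID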
Run these through your normal test runner against production (or staging if the vendor offers it) on a 15-minute cron. When they fail, page someone.
This is the layer where things like "the vendor silently changed pagination defaults from 20 to 10" or "status: cancelled now means status: voided" get caught. Specs say the response is well-formed. Your agent says the answer is wrong. Only a behaviour probe knows the difference.
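The pagination example can be probed the same way. This is a minimal sketch, assuming you can pass in a callable that requests the first page without an explicit page size and returns its items. `fetch_first_page` is a stand-in for your own client call, and the fixture query is assumed to match more results than one page holds.

```python
from typing import Callable, Sequence


def check_default_page_size(
    fetch_first_page: Callable[[], Sequence], expected: int
) -> None:
    """Fail loudly if a call with no explicit page size returns a different count."""
    items = fetch_first_page()
    assert len(items) == expected, (
        f"default page size changed: expected {expected}, got {len(items)}"
    )


# Hypothetical usage inside a pytest probe:
#   def test_default_page_size():
#       check_default_page_size(lambda: vendor.search(query="known fixture term"), 20)
```

Passing the fetch in as a callable keeps the check testable with a fake client, so the probe logic itself can be verified without touching the vendor.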
Expected result: as a rule of thumb, Steps 2–4 catch roughly 80% of drift, and the remaining 20% or so, the semantic changes, gets caught here.
Step 6: Assign ownership
Detection without ownership is noise. Decide upfront who acts on which signal:
CODEOWNERS on the integration directory.
If the same vendor drifts repeatedly, that's a signal: not about the detector, but about whether that integration is worth the maintenance. The pattern is what the bespoke connector cost article gets into.
Expected result: a documented runbook entry per signal type. No silent dashboards.
Specs that lie. A handwritten OpenAPI spec drifts from the implementation faster than the third-party APIs you're trying to monitor. If you can't generate the spec from code, your baseline is fiction. Fix the generator before fixing detection.
Treating warnings as failures from day one. You'll get a flood of warnings on first run — old optional fields with vague types, missing examples, inconsistent error shapes. Triage once. Suppress what you don't care about. Then turn the screws.
Diffing only the spec. Specs describe the contract. Agents care about behaviour. If you're only running Step 2, you'll catch the loud changes and miss the expensive ones. The runtime canary in Step 5 isn't optional for any integration your business depends on.
Forgetting the auth surface. A vendor changing Authorization: Bearer to a custom header scheme is technically a breaking change, but it often slips past schema diffs because the security scheme block looks superficially similar. Probe an authenticated endpoint in Step 5, not just a public one.
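Alongside the authenticated probe, a strict comparison of the security scheme block is cheap to add when the vendor publishes a spec. The sketch below assumes two OpenAPI 3.x documents loaded as dicts and compares `components.securitySchemes` by exact equality, so any change is reported however similar the blocks look. The function name is illustrative.

```python
def changed_security_schemes(base: dict, revision: dict) -> dict[str, str]:
    """Return scheme name -> 'added' | 'removed' | 'modified' for every difference."""
    old = base.get("components", {}).get("securitySchemes", {})
    new = revision.get("components", {}).get("securitySchemes", {})
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes[name] = "removed"
        elif name not in old:
            changes[name] = "added"
        elif old[name] != new[name]:
            # Exact dict comparison: a bearer scheme replaced by a header
            # API key under the same name still counts as a change.
            changes[name] = "modified"
    return changes
```

Any non-empty result is worth a human look, because a client that sends the old credentials will fail on every call, not just some of them.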
Monitoring everything equally. You probably integrate with thirty-plus third-party APIs. You don't need behaviour probes on all of them. Run schema snapshots across the long tail; spend probe budget on the five integrations that would page someone at 2am if they broke. The same prioritisation logic shows up when you're calculating agent integration debt — find the surface that matters, instrument that.
Once detection is in place, the next problem is response time. A breaking change you detect at 6am is still a breaking change in production until someone ships a fix. Look at how your team handles vendor drift today — manual code edit, deploy, hotfix — and decide whether that loop is fast enough for the agent workloads you're running. For most teams building agents on top of third-party APIs, it isn't, and the gap is what makes connector maintenance compound instead of staying flat.