Monitor AI App Uptime When OpenAI or Anthropic Degrades

May 19, 2026 · 12 min read

Bottom line: If your core feature is an AI call, a degraded provider is a degraded product, and a naive ping of api.openai.com will mostly return 200 while your users watch the feature fail. The check that actually catches this is a multi-step API monitor that signs in and calls your own in-app AI endpoint, asserting the response shape, the HTTP status, and a response_time_ms budget, paired with a browser login monitor that signs in as a real user and confirms the AI feature actually rendered. Velprove proves your AI endpoint responded fast and in the expected shape. It does not and cannot judge whether the model's answer was good or correct.

The 15 hours OpenAI's API ran at 75%

On June 9 and 10, 2024, OpenAI had an incident that lasted roughly 15.5 hours. It was not a clean outage. According to OpenAI's June 2024 postmortem, "ChatGPT users experienced elevated error rates reaching ~35% errors at peak, while API users experienced error rates peaking at ~25%," and for the API, "Availability dropped to 75% during the incident." The root cause was mundane: "a daily scheduled system update inadvertently restarted the network management service (systemd-networkd) on affected nodes, causing a conflict with a networking agent."

Read the 75% number again, because it is the whole point of this post. The API was not down. It was up, and serving correct responses, roughly three times out of four, for fifteen hours. A monitor that asks "is api.openai.com reachable, does it return 200" would have passed most of the time, because most of the time it genuinely did. Meanwhile an app that calls that API on every user action, without retry-with-jitter, was surfacing roughly one in four AI-feature requests as a failure to real users. The provider endpoint was nominally up. The product was not.
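To put rough numbers on the retry point, here is a minimal sketch. It assumes each attempt fails independently at the roughly 25% peak API error rate quoted above, which is an idealization: failures during a real incident tend to cluster, so treat the result as a best case. The function names and the base and cap delays are hypothetical, not taken from any provider SDK.

```python
import random
from typing import Optional


def surfaced_failure_rate(per_attempt_failure: float, attempts: int) -> float:
    """Chance that every attempt fails, ASSUMING attempts fail independently."""
    return per_attempt_failure ** attempts


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)].

    base and cap are illustrative values in seconds, not recommendations.
    """
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


if __name__ == "__main__":
    for attempts in (1, 2, 3):
        rate = surfaced_failure_rate(0.25, attempts)
        print(f"{attempts} attempt(s): {rate:.2%} of requests still fail")
```

Under that independence assumption, one attempt surfaces 25% of requests as failures, two attempts about 6%, and three under 2%. Retries narrow the gap; they do not tell you it is happening, which is the monitor's job.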

That gap did not show up at api.openai.com. It showed up inside your own AI feature, as elevated errors, latency blowout, 429s and 529s, or a stream that returned HTTP 200 and then died mid-completion. The general case of what an HTTP 200 misses is its own subject, covered in why uptime monitors miss real outages. This post is the AI-provider-specific case, and it has failure modes that the general catalogue does not.

How an LLM provider actually degrades to your app

When a provider degrades, it does not politely return a single clean error code. Here is the detectable surface, by primary source.

Latency blowout. The most common partial degradation is not an error at all. Time-to-first-token climbs, the call still completes, the status is still 200, and your users sit watching a spinner. This is invisible to a status-code check and visible to a latency assertion.

HTTP 429, which means two different things. This distinction is underused and it matters. A rate_limit_exceeded 429 means request frequency exceeded your account or tier limit. It is transient; retry with backoff. An insufficient_quota 429 means billing or credits are exhausted. It is not transient. Retrying never helps, and your AI feature is silently dead until you top up. No provider status page will ever show this, because it is account-scoped, not a provider outage.
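The two 429s can be told apart in code. The sketch below assumes the 429 body is a JSON envelope of the form {"error": {"type": ..., "code": ...}} and that the strings rate_limit_exceeded or insufficient_quota appear in one of those two fields. Those field names are an assumption to verify against the responses your account actually receives, and classify_429 is a hypothetical helper.

```python
import json


def classify_429(body_text: str) -> str:
    """Return 'quota', 'rate_limit', or 'unknown' for a 429 response body.

    ASSUMPTION: the body looks like {"error": {"type": ..., "code": ...}}.
    """
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return "unknown"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "unknown"
    markers = (error.get("type"), error.get("code"))
    if "insufficient_quota" in markers:
        return "quota"        # not transient: retrying never helps
    if "rate_limit_exceeded" in markers:
        return "rate_limit"   # transient: retry with backoff
    return "unknown"
```

Only the rate_limit branch should ever feed a retry loop. The quota branch should page someone who can top up the account.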

5xx and timeouts during incidents. During the documented OpenAI incidents, traffic returned 500s, 503s, and timeouts. Anthropic's errors documentation lists "500 - api_error" and "504 - timeout_error" explicitly.

Anthropic 529, overloaded_error. Anthropic's docs define "529 - overloaded_error: The API is temporarily overloaded" and warn that "529 errors can occur when APIs experience high traffic across all users. In rare cases, if your organization has a sharp increase in usage, you might see 429 errors because of acceleration limits on the API." The error body shape is {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}, which is exactly the kind of shape you can assert against.
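As an illustration of asserting against that shape, here is a small sketch that pulls the nested error type out of a body in the format quoted above. The helper name is hypothetical, and it only handles the one envelope shown; anything else returns None.

```python
import json
from typing import Optional


def anthropic_error_type(body_text: str) -> Optional[str]:
    """Return the nested error type from {"type":"error","error":{"type":...}}, else None."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or payload.get("type") != "error":
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    value = error.get("type")
    return value if isinstance(value, str) else None


if __name__ == "__main__":
    body = '{"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}'
    print(anthropic_error_type(body) == "overloaded_error")
```

If your own endpoint passes the provider's error body through to the client, a body_not_contains rule on overloaded_error does the same job without any parsing.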

The AI-specific one: a stream that errors after a 200. Anthropic documents this directly: "When receiving a streaming response over SSE, it's possible that an error can occur after returning a 200 response, in which case error handling wouldn't follow these standard mechanisms." A status-code-only check passes, because the status really was 200, while the user gets a truncated or errored completion. This failure mode does not exist for a static REST endpoint, and it is the reason the rest of this post is not just the silent-outage argument with an LLM example. Providers can also serve a slower or lower-tier path under load; treat that qualitatively, as a latency signal, not as a documented behavior with a name.

What Velprove can and cannot see here

This goes before the setup on purpose, because if you misunderstand the boundary, you will build the wrong check and trust it for the wrong thing.

Velprove can prove your AI endpoint responded, with the expected JSON shape, the expected HTTP status, and within a latency budget. It cannot judge whether the model's answer was good, correct, relevant, or not hallucinated. It asserts that the AI feature responded correctly-shaped and fast, not that the answer was right. Output-quality, evals, and hallucination testing are a different tool category and explicitly out of scope. There is no semantic or answer-quality assertion in the product, and there is no way to construct one. The only assertion types that exist are status_code, body_contains, body_not_contains, json_path, response_time_ms, and header_contains. None of those reads meaning.

Here is the turn, and it is an honest one. Shape, latency, status, and the stream-error-after-200 case catch the overwhelming majority of provider-degradation incidents anyway, precisely because those failures change the shape, status, or latency of the response, not just its quality. A timeout is a latency failure. A 529 is a status failure. A quota-dead account is a body failure. A stalled stream is a completion-marker failure. The class of failure that a Velprove check genuinely cannot see, a confidently-worded but wrong answer returned fast and in the right shape, is real, but it is an evals problem, and conflating the two is how monitoring tools lose credibility. Velprove will not claim that ground.

The monitor: a multi-step API check on your own AI endpoint

The useful check is a Velprove multi-step API monitor pointed at your own AI endpoint, not a ping. It has two steps. Step one authenticates as a dedicated low-privilege synthetic test account. Step two calls your in-app AI endpoint with a fixed synthetic test prompt. Use a generic endpoint shape to make this concrete: POST /api/ai/generate returning { "answer": "..." }.

On step two, set these success conditions. A status_code assertion equal to 200, which fails the check on 429, 500, 503, 504, and 529 (if your endpoint streams, the 200 assertion alone is not enough; see the next section). A response_time_ms threshold sized to your real time-to-first-token budget, because latency blowout is the partial degradation a status check misses entirely. A json_path assertion with the exists operator on the answer field, which catches a 200 wrapped around a malformed or error-shaped body that is missing the answer field entirely. And two body_not_contains rules, one for overloaded_error and one for insufficient_quota, so the two failures that no provider status page will ever show fail your check loudly.
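To make the logic of those five conditions concrete, here is a sketch of the same checks as plain Python. This is not Velprove's implementation or its configuration format; evaluate_step is a hypothetical function, and the 8000 ms default is only an example budget. In the product you set these as Success Conditions rows, not code.

```python
import json
from typing import List

# The two account-scoped or overload markers the body must not contain.
FORBIDDEN_MARKERS = ("overloaded_error", "insufficient_quota")


def evaluate_step(status: int, elapsed_ms: float, body_text: str,
                  budget_ms: float = 8000) -> List[str]:
    """Return the failed conditions; an empty list means the step passed."""
    failures = []
    if status != 200:
        failures.append(f"status_code: expected 200, got {status}")
    if elapsed_ms >= budget_ms:
        failures.append(
            f"response_time_ms: {elapsed_ms:.0f} is not under {budget_ms:.0f}"
        )
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        payload = None
    if not (isinstance(payload, dict) and "answer" in payload):
        failures.append("json_path: $.answer does not exist")
    for marker in FORBIDDEN_MARKERS:
        if marker in body_text:
            failures.append(f"body_not_contains: found {marker}")
    return failures


if __name__ == "__main__":
    print(evaluate_step(200, 1200, '{"answer": "ok"}'))
    print(evaluate_step(200, 100, '{"error": {"code": "insufficient_quota"}}'))
```

Note that the second example returns HTTP 200 quickly and still fails twice, on the missing answer field and on the quota marker. That is the case a status-only check waves through.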

I am deliberately not re-explaining how multi-step monitors chain requests, extract a token, and carry it forward. That is its own walkthrough; see the multi-step API monitoring guide for the mechanics and come back. The plan math here: the Free plan caps multi-step API monitors at 3 steps, so a 2-step check fits Free comfortably. Starter at 19 dollars lifts the cap to 5 steps and the interval to 1 minute; Pro at 49 dollars goes to 10 steps and a 30-second interval. Each monitor runs from one of 5 global regions; if you want regional coverage, run a separate monitor per region, because providers can degrade asymmetrically by region.

This is the same pattern our guide to monitoring Stripe API health applies to your payment provider: your app depends on a third-party API that degrades in ways the vendor status page does not show, so you monitor your own integration point synthetically rather than trusting the dependency to tell you. The dependency is different. The pattern is the same.

Screenshot: Velprove Create New Monitor wizard on the Configure Multi-Step Flow step. Step one is a POST to https://api.example.com/auth/login. Step two is a POST to https://api.example.com/api/ai/generate with five Success Conditions rows: Status Code with the Equals operator and value 200, Response Time (ms) with the Less Than operator and value 8000, JSON Path with the Exists operator on path $.answer, Response Body with the Not Contains operator and value overloaded_error, and Response Body with the Not Contains operator and value insufficient_quota.

Step two of the multi-step API monitor against a generic /api/ai/generate endpoint. The Status Code condition catches 429, 5xx, and 529, the Response Time (ms) Less Than condition catches latency blowout, the JSON Path Exists condition on $.answer catches a 200 wrapped around a body missing the answer field, and the two Response Body Not Contains conditions fail the check on the two account-scoped errors no provider status page shows.

The stream that returns 200 and then dies

If your AI endpoint streams, a status_code assertion alone is not enough, and Anthropic's own docs are why. An error can occur after a 200 has already been returned. The HTTP status was 200. It was honestly 200. The stream then errored, stalled, or truncated, and your user got half an answer or a broken one. A monitor that only checks the status code records a pass and tells you everything is fine.

The fix is to assert on a stable end-of-stream marker, not on the status. If your endpoint emits a final structured event or sets a completion field once the full answer is assembled, assert it with a json_path or body_contains rule: for example, a done: true field, or a sentinel token your server only writes after the stream closes cleanly. A 200-then-broken stream will not contain that marker, so the check fails on exactly the failure a status-code check waves through. This is the single beat that separates monitoring an AI feature from monitoring any other endpoint, and it is the reason the boundary section above is honest rather than defensive: this failure changes the response shape, so Velprove can see it.
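Here is a sketch of what the marker check amounts to, under stated assumptions: the endpoint streams SSE-style lines prefixed with data:, each carrying a JSON object, and your server writes a final {"done": true} event only after the stream closes cleanly. The delta field and the done marker are hypothetical; substitute whatever your endpoint really emits.

```python
import json


def stream_completed(raw_stream: str) -> bool:
    """True only if a data event with "done": true arrived and no error event did."""
    saw_done = False
    for line in raw_stream.splitlines():
        line = line.strip()
        # An explicit SSE error event means the stream broke after the 200.
        if line.startswith("event:") and line[len("event:"):].strip() == "error":
            return False
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # skip keep-alives and non-JSON payloads
        if isinstance(event, dict):
            if event.get("type") == "error":
                return False
            if event.get("done") is True:
                saw_done = True
    return saw_done


if __name__ == "__main__":
    truncated = 'data: {"delta": "Hel"}\n\n'
    print(stream_completed(truncated))
```

A truncated stream never reaches the marker, so the function returns False even though the HTTP status was 200. In the monitor, the equivalent is a body_contains or json_path rule on the marker itself.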

The browser login monitor: the AI feature as a signed-in user

The API check proves the endpoint answers correctly-shaped and fast. It does not prove a signed-in user can actually use the AI feature through your real interface, and that is where Velprove's strongest differentiator lives. A browser login monitor opens a real browser, signs in as the dedicated low-privilege test account, and verifies the post-login page rendered correctly. To make it watch the AI feature, point its login URL at an account whose post-login landing surfaces AI-dependent content, then open Customize detection and set Success verification to Page contains text, matching a string that only renders when a real AI result actually loaded. By default this monitor only checks that the URL changed after login, which would pass even if the AI content never rendered, so the default is not enough here. One honest limit: the browser login monitor logs in and checks a single success condition on the resulting page. It does not script clicking into a feature and submitting a prompt. Driving the AI endpoint itself is the multi-step API monitor's job, above.

This catches failures the API check structurally cannot. A front-end that swallows a 500 and shows a generic toast. A spinner that never resolves because the stream stalled client-side. An auth-gated AI route that the API check authenticated into directly but a real browser session cannot reach because a session or CSRF step broke. A client-side error boundary that renders an empty panel while the network tab shows a clean 200. The API monitor sees a healthy endpoint; the user sees a dead feature; the browser login monitor sees what the user sees.

Free includes 1 browser login monitor at a 15-minute interval, which is enough to catch a multi-hour provider degradation and a UI regression within one window. Starter includes 3 at a 10-minute interval, and Pro 10 at a 5-minute interval. Point it at the dedicated test account with the smallest permissions that still renders a real AI result, and use a fixed synthetic prompt, never real user data and never a prod-mutating or expensive call, because it runs on every interval.

Screenshot: Velprove Create New Monitor wizard on the Configure Browser Login Monitor step. Login Page URL is set to https://app.example.com/login, the Username / Email field is filled with a dedicated low-privilege monitoring account, the Password field is filled, and the Customize detection panel is expanded so Success verification is set to Page contains text with an expected string that only renders when a real AI result has loaded.

The browser login monitor, the differentiator here. It opens a real browser, signs in as a dedicated low-privilege test account, and Success verification is set to Page contains text instead of the default URL change, so the check passes only when a string that depends on a real AI result actually renders. It logs in and checks that one post-login condition; calling the AI endpoint itself is the multi-step monitor's job.

Why your provider's status page is not the monitor

Start with the argument that is fully sourced and not a matter of timing at all. An OpenAI insufficient_quota 429 and a per-account acceleration-limit 429 will never appear on status.openai.com or status.anthropic.com, because they are not provider outages. They are account-scoped. The provider is fine. Your account is out of credits or over an acceleration limit, and a synthetic monitor of your own AI endpoint is the only thing that catches them, because there is no public incident to subscribe to.

Then the timing argument, kept qualitative, because no defensible minute-count exists. OpenAI's December 11, 2024 postmortem describes a control-plane cascade: from 3:16 PM PST to 7:38 PM PST, about 4 hours 22 minutes, after "a new telemetry service deployment that unintentionally overwhelmed the Kubernetes control plane." DNS caching held stale-but-working records for a while, which delayed when services visibly started failing, and OpenAI states plainly that "Remediation was very slow because of the locked out effect." You do not need an invented number to take the point: impact and provider-side acknowledgement and recovery are not the same clock. A monitor of your own endpoint runs on the impact clock. The status page runs on the acknowledgement clock.

This is not a substitute for error tracking or evals

One last honest boundary, because credibility is the only thing this post is selling. Velprove tells you fast that your AI endpoint stopped responding correctly-shaped, fast, and with the right status. It does not replace application error tracking, which owns the stack traces and the per-request diagnostics when you go to fix the failure. It does not replace model-output evals, which own whether the answers are actually any good. Three different layers, three different jobs. A synthetic uptime monitor is the layer that tells you the AI feature is failing for users right now, which is the layer most teams launching AI features do not have wired and the one a degraded provider exposes first. Keep the eval suite. Keep the error tracker. Add the monitor that watches the endpoint the way a user hits it.

Frequently Asked Questions

Can Velprove tell me if the AI gave a wrong or bad answer?

No. Velprove asserts that your AI endpoint responded, in the expected JSON shape, with the expected HTTP status, inside a latency budget. It does not judge whether the answer was correct, relevant, or hallucinated. That is an evals and output-quality tool category, and it is explicitly out of scope. What Velprove does instead is catch the failure modes that change the shape, status, or latency of the response: timeouts, 429, 500, 503, Anthropic 529, a response missing the answer field, and a stream that returned 200 and then died. Those cover the overwhelming majority of provider-degradation incidents.

How do I monitor my AI feature without sending real user data to the model?

Use a dedicated low-privilege synthetic test account and a fixed synthetic test prompt that you control. Never send real user data through the monitor, and never point it at a prod-mutating or expensive AI call. The prompt should be short, deterministic in shape, and cheap, because it runs on every probe interval. The point of the check is to prove the endpoint responds correctly-shaped and fast, not to exercise real customer content.

Will an uptime monitor catch an OpenAI or Anthropic outage before their status page does?

It catches the class of failures a provider status page structurally cannot show, because some of them are account-scoped, not provider outages. An OpenAI insufficient_quota 429 or a per-account acceleration-limit 429 will never appear on status.openai.com or status.anthropic.com because they are not platform incidents. A synthetic monitor of your own AI endpoint sees the impact where it actually lands, at your request, without waiting for the provider to detect, confirm, and post. OpenAI's own December 11, 2024 postmortem says "Remediation was very slow because of the locked out effect," which is a sourced way of saying impact and acknowledgement are not the same clock.

What does a 429 from OpenAI actually mean for my app?

Two different things. A rate_limit_exceeded 429 means request frequency exceeded your account or tier limit. It is transient, and retrying with backoff is the right response. An insufficient_quota 429 means billing or credits are exhausted. It is not transient. Retrying never helps, and no provider status page will ever show it because it is account-scoped. Assert body_not_contains on insufficient_quota so a quota-dead AI feature fails the check loudly instead of degrading silently until a customer notices.

My AI endpoint returns 200 but the answer is cut off. Why doesn't my monitor catch it?

Streaming responses can return HTTP 200 and then error mid-stream. Anthropic documents this directly: when receiving a streaming response over SSE, an error can occur after a 200 response has already been returned. A status-code-only assertion passes because the status really was 200. Assert a stable end-of-stream or completion marker with a json_path or body_contains rule so a 200-then-broken-stream still fails the check.

Can I do this on the free plan?

Yes. A 2-step API monitor (authenticate, then call your AI endpoint) fits the Free plan, which caps multi-step API monitors at 3 steps. Free also includes 1 browser login monitor at a 15-minute interval and email alerts, with a 5-minute HTTP interval and commercial use allowed. Starter at 19 dollars lifts multi-step to 5 steps, drops the interval to 1 minute, and adds Slack, Discord, Teams, and webhook alerts. Pro at 49 dollars goes to 10 steps and a 30-second interval. Start with the free plan. No credit card required.

Which region does the AI endpoint check run from?

From any one of 5 global regions. Each monitor runs from a single region you pick, not all of them at once. If you want regional coverage of your AI endpoint, create separate monitors per region. This matters for AI features because a provider can degrade asymmetrically by region, and a single-region monitor only sees its own region's path.