Monitor a Supabase App: Auth, RLS, Edge Functions, Realtime
Diagnosis. Supabase's status page can be green while your customer's post-login dashboard renders an empty list. RLS-protected reads that hit a missing or misconfigured policy return an empty array with HTTP 200, which a status-code probe cannot see. A Supabase Auth session can pass its first check, then fail the next one with "Invalid Refresh Token" because refresh tokens are single-use. An Edge Function that boots in 3 ms on a warm isolate can spike into a multi-second tail on the cold path. Realtime can keep its websocket open and stop delivering events. None of those show up on a 200 OK against your app URL. Velprove's free browser login monitor signs into the actual app your users open, the one that reads through Supabase RLS, and a multi-step API monitor re-authenticates against /auth/v1/token every check and asserts a known row comes back, not just a 200.
Your Supabase project returns 200 OK. Your RLS reads return [].
PostgREST is the HTTP layer that fronts every Supabase table. When you query it with a publishable key and a user JWT, it consults the row-level security policies on the target table and returns only the rows the policies allow. If the policies are missing or misconfigured, PostgREST does not return a 403 or a 500. It returns an empty JSON array and a 200 status code. A monitor that asserts status_code = 200 sees a happy response. Your customer sees a blank screen.
From Supabase's Row Level Security docs, the behavior is documented verbatim:
"Once you have enabled RLS, no data will be accessible via the API when using a publishable key, until you create policies."
And, on the default scope:
"RLS must always be enabled on any tables stored in an exposed schema. By default, this is the public schema."
Read those sentences together. RLS is supposed to be on. With RLS on and no policy, the API returns nothing. The failure mode is not an HTTP error code. The failure mode is an empty array dressed as a success. This is the load-bearing reason a status-code probe is not enough for a Supabase-backed app: the platform's primary authorization layer fails by returning success. The same family of silent-success failures shows up across cloud platforms in our silent outages HTTP misses writeup; the Supabase version is just the cleanest case.
A migration that drops a policy, a CI step that disables RLS on a table by accident, a refactor that renames the column the policy references: all three produce the same outside-visible shape. 200 OK, empty array, customer dashboard blank. The rest of this post is what to assert on top of the 200 so the empty array trips an alert.
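To make the gap concrete, here is a minimal sketch of how a check could classify a PostgREST read once it looks past the status code. The function name classifyRead and the verdict labels are hypothetical and exist only for illustration; the point is that an empty array on a 200 gets its own failing verdict instead of passing.

```typescript
// Hypothetical helper (not part of Supabase or Velprove): classify a
// PostgREST read by looking at the body as well as the status code.
type ReadVerdict = "ok" | "http_error" | "not_json_array" | "empty_result";

function classifyRead(status: number, body: string): ReadVerdict {
  // A non-200 is the easy case that a status-code probe already catches.
  if (status !== 200) return "http_error";

  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return "not_json_array";
  }
  if (!Array.isArray(parsed)) return "not_json_array";

  // The silent failure: every row was filtered out, yet the status is 200.
  return parsed.length === 0 ? "empty_result" : "ok";
}
```

A status-only probe treats classifyRead(200, "[]") as healthy; this sketch reports it as empty_result, which is the condition the assertions in the rest of the post are built to catch.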
Feb 12 2026: 3h 42m of us-east-2 dark, every surface at once
On Thursday February 12 2026 at 21:12 UTC, Supabase deployed a new internal monitoring service that took every Supabase project in one AWS region offline. The post-mortem, signed by CEO Paul Copplestone, names the change as the root cause:
"We deployed a new internal monitoring service on February 12th that inadvertently enabled AWS's VPC Block Public Access feature at the regional level in us-east-2. This blocked all internet gateway traffic across every VPC in the region."
The blast radius was the whole region. Verbatim, again:
"All Supabase customers with projects hosted in the us-east-2 region were affected."
The incident started at 21:12 UTC and resolved at 00:54 UTC the next morning. 3 hours and 42 minutes. And the affected-surface list is the punchline for this post:
"Postgres databases, Auth, Data APIs, Edge Functions, Storage, Realtime, and any other Supabase service in that region"
That list is every Supabase primitive a customer's application reads through. A status-code probe on your own app URL during those 3h 42m would have caught the most visible surface, your web app erroring on its first database read, but it would not have told you which Supabase primitive was failing, whether Auth was issuing tokens, whether Edge Functions were accepting invocations, or whether Realtime channels were open. Each of those needs its own assertion. The pattern is the same pattern we apply to what /healthz should return on any backed-by-something API: assert on the read, not on the handler.
One incident does not justify a monitoring strategy by itself. The Feb 12 2026 outage is the recent reminder that a BaaS-backed app has a vendor surface area that lives outside your control, and the vendor's own status page is not a substitute for probing your own customer-visible path. The next four sections are the assertions, one per Supabase primitive.
Auth + RLS read in one multi-step (the core wedge)
A multi-step API monitor lets you chain two HTTP calls in sequence and extract a value from the first response into the second request. For a Supabase-backed app, the chain that matters is: get a real user's access token, then use it to read a row the user is supposed to be able to read. If either step fails, your customers cannot use the app. If both steps pass, the auth-and-authorization path is verified from outside, end to end, on every probe interval. The general mechanic is in the multi-step mechanic reference; this section is the Supabase-specific shape.
Step 1 hits the auth endpoint with a dedicated test user's credentials. Step 2 reads from a table protected by RLS, using the access token captured in step 1, and asserts a known row marker appears in the response body. Re-authenticate fresh on every check; do not try to cache a refresh token, because Supabase refresh tokens are single-use (the last FAQ entry below gives the full reason). The 1-hour default access-token lifetime gives you all the headroom a 5-minute monitor interval needs.
The shape is small enough to describe in a paragraph. Step 1: POST /auth/v1/token?grant_type=password with the publishable key in the apikey header, and the test user's email and password in the body. Assert status_code = 200, then assert that $.access_token exists in the JSON response. Capture $.access_token for the next step. Step 2: GET /rest/v1/canary_rows with the captured token as Authorization: Bearer and the same apikey header. Three assertions: status_code = 200, body_contains the user-A-canary marker, and body_not_contains the user-B-canary marker. Screenshot 1 shows the wizard with the chain built end-to-end.
A few details earn their place. The apikey header carries the publishable key on both steps; PostgREST requires it on every request and the Auth endpoint requires it on the token call. The Authorization: Bearer header on step 2 carries the user JWT extracted from step 1. The body_not_contains assertion is the RLS-enforcement half of the wedge: if a policy got dropped and the read now returns rows owned by user B, the assertion fails. The two assertions together verify both that RLS lets the right rows through AND that RLS blocks the wrong rows.
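For readers who want to see the chain outside the wizard, the sketch below reproduces the same two steps and the same assertions with plain fetch calls. It is an illustration under assumptions, not Velprove's implementation: runAuthRlsChain, ChainConfig, and FetchLike are hypothetical names, and the canary_rows table and marker strings are the placeholders used in the prose above. The fetch implementation is injectable so the logic can be exercised without a live project.

```typescript
// Sketch of the auth-then-read chain. All names here are illustrative.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface ChainConfig {
  supabaseUrl: string;    // e.g. https://<project-ref>.supabase.co
  publishableKey: string; // publishable (anon) key, never the service-role key
  email: string;          // dedicated test user A
  password: string;
  ownMarker: string;      // e.g. "user-A-canary"
  foreignMarker: string;  // e.g. "user-B-canary"
}

// Returns a list of failed assertions; an empty list means the check passed.
async function runAuthRlsChain(
  cfg: ChainConfig,
  fetchImpl: FetchLike = fetch
): Promise<string[]> {
  // Step 1: password grant, fresh on every run (no cached refresh token).
  const tokenRes = await fetchImpl(
    `${cfg.supabaseUrl}/auth/v1/token?grant_type=password`,
    {
      method: "POST",
      headers: { apikey: cfg.publishableKey, "Content-Type": "application/json" },
      body: JSON.stringify({ email: cfg.email, password: cfg.password }),
    }
  );
  if (tokenRes.status !== 200) return [`step 1: status ${tokenRes.status}`];

  const tokenJson = (await tokenRes.json().catch(() => ({}))) as {
    access_token?: unknown;
  };
  const accessToken = tokenJson.access_token;
  if (typeof accessToken !== "string" || accessToken.length === 0) {
    return ["step 1: $.access_token missing"];
  }

  // Step 2: RLS-protected read as the test user.
  const readRes = await fetchImpl(`${cfg.supabaseUrl}/rest/v1/canary_rows`, {
    headers: { apikey: cfg.publishableKey, Authorization: `Bearer ${accessToken}` },
  });
  const body = await readRes.text();

  const failures: string[] = [];
  if (readRes.status !== 200) failures.push(`step 2: status ${readRes.status}`);
  // body_contains: RLS lets the right rows through.
  if (!body.includes(cfg.ownMarker)) failures.push("step 2: own marker missing");
  // body_not_contains: RLS blocks the wrong rows.
  if (body.includes(cfg.foreignMarker)) failures.push("step 2: foreign marker present");
  return failures;
}
```

Nothing is carried between runs: the access token lives only inside one call, which mirrors the re-authenticate-every-check posture described earlier in this section.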
Velprove's multi-step monitor runs each step once in sequence with the same 6 assertion types HTTP monitors use: status_code, body_contains, body_not_contains, json_path, response_time_ms, and header_contains. No conditional branching, no per-step retry, no wait-for-condition. That is enough to express the Supabase auth-then-read pattern cleanly, and the simplicity is what keeps the monitor deterministic across thousands of runs.
The test user posture matters. Provision a dedicated monitoring user with read-only access to a small canary table whose rows carry a stable marker string. Do not point this monitor at a real customer's account, do not give the user write permission, and rotate the password from your monitor's secret vault on the same cadence as your other service credentials. Two test users (one for body_contains, one to source the body_not_contains marker) are the discipline that turns this monitor from a liveness check into an RLS-enforcement check.
Edge Functions: cold start as a response-time tail, not a binary
Supabase Edge Functions run on V8 isolates inside Deno, with code packaged in ESZip format for fast boot. The architecture puts cold starts in the single-digit millisecond range under normal conditions; warm invocations return in roughly the same time as the function's own work. The Supabase team shipped a 2025 fix that moved workers performing initial script evaluation onto a dedicated blocking pool, which measurably reduced boot-time spikes in the long tail.
That is the right shape to monitor as a response_time_ms assertion rather than a binary up/down signal. Cold starts happen, they recover quickly, and a single slow cold start is not an outage. A sustained shift in the p95 tail is. Set the threshold at 1.5x to 2x your warm p95 measured over a real traffic window, not at a round number pulled from the docs.
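One way to turn that rule into a number is to compute the warm p95 from a sample of real response times and take 1.5x and 2x of it as the band. The sketch below assumes a nearest-rank percentile and hypothetical helper names (percentile, thresholdBand); how the warm samples are collected is left to whatever logs or traces you already have.

```typescript
// Nearest-rank percentile over response times in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Integer arithmetic first, so the rank does not drift from float rounding.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Band for the response_time_ms assertion: 1.5x to 2x the warm p95.
function thresholdBand(warmSamplesMs: number[]): { low: number; high: number } {
  const p95 = percentile(warmSamplesMs, 95);
  return { low: Math.round(p95 * 1.5), high: Math.round(p95 * 2) };
}
```

Picking a value inside the band is a judgment call: the low end pages earlier on a shifted tail, the high end tolerates more cold-start noise.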
A standalone HTTP monitor against the function URL is the right primitive. The function exposes a public HTTPS endpoint, and a Velprove HTTP monitor can probe it from any of the 5 global regions. Configure three success conditions: status_code = 200, response_time_ms under your warm-p95 threshold (1500 ms is a reasonable starting point for most Edge Function workloads), and body_contains a known string the function emits on the happy path. Screenshot 2 shows the Success Conditions step with all three assertions stacked.
The body_contains assertion is the part most Edge Function monitors skip and shouldn't. A 200 OK from a function that silently swapped its handler (a deploy that shipped the wrong build, an env var that flipped a feature flag) is still a 200. Asserting on a static string the function's real code path emits turns the same probe into a deploy-correctness check.
One trap to avoid: do not invoke functions that mutate state on the monitor path. The monitor runs on the configured interval from every region the monitor is configured in, forever. A function that writes a row on each call will accumulate millions of rows over a year. Use a read-only Edge Function path for the canary, or pass a request flag your function honors as a dry-run.
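A dry-run flag could look roughly like the following. This is a sketch under assumptions: the header name x-monitor-dry-run, the makeHandler factory, and the injected write function are all hypothetical, and a real function would probably also want to authenticate the flag so arbitrary callers cannot skip writes.

```typescript
// Hypothetical header the monitor sends to request a dry run.
const DRY_RUN_HEADER = "x-monitor-dry-run";

// The state-mutating part of the function, injected so it can be skipped.
type WriteFn = (payload: unknown) => Promise<void>;

function makeHandler(write: WriteFn) {
  return async (req: Request): Promise<Response> => {
    // Parse the body the same way on both paths, so the probe still
    // exercises the function's real request handling.
    const payload = await req.json().catch(() => null);
    const dryRun = req.headers.get(DRY_RUN_HEADER) === "1";

    // Only the write is skipped on the monitor path.
    if (!dryRun) await write(payload);

    return new Response(JSON.stringify({ ok: true, dryRun }), {
      status: 200,
      headers: { "Content-Type": "application/json" },
    });
  };
}
```

In an Edge Function the returned handler could be passed to Deno.serve, with the monitor configured to send the header on every probe. Keeping the happy-path response body identical on both paths means a body_contains assertion still checks the real code path.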
Browser-login on YOUR signed-in surface (not Supabase Studio)
A browser login monitor opens a real browser, navigates to a login page, fills the form with a test user's credentials, waits for the post-login route, and asserts that the page rendered the data it was supposed to render. For a Supabase-backed app, the page that renders post-login is the one that reads through RLS. If RLS is broken or the Data API is down, the page returns 200, renders the chrome, and shows an empty state. The browser login monitor catches that, because it asserts on the data, not on the response code. The general pattern is in our browser login monitor on your signed-in surface guide; the Supabase-specific detail is below.
Hard rule: never point this monitor at supabase.com, the Supabase dashboard, or Supabase Studio. The target is the customer's own application: the URL your real users open to sign in. Supabase's own UIs sit behind device verification, captchas, and account-level protections the monitor cannot complete. The monitor's job is to verify the path your customer takes through your app, which happens to be backed by Supabase Auth and the Supabase Data APIs.
The Supabase-distinguishing detail in the assertion: set the monitor's post-login success check to a string that only renders after the dashboard's first RLS-protected read completes. A customer name pulled from the profiles table, an invoice ID from the invoices table, or the label of a plan the user is actually on all work. If the RLS policy on that table is dropped, the page loads but the string never renders, and the monitor fails. For a stronger signal, set the success check to selector_visible on a DOM element that only renders after that read completes. That catches the case where the page renders cached chrome but the user's data layer underneath has gone empty.
The free plan covers 1 browser login monitor at a 15-minute interval. That is enough for the production signed-in path of a single application. Paid plans add more browser monitors and tighter intervals. The browser monitor is the one assertion in this post that catches a class of failures the API-only monitors cannot: an authenticated page that renders a 200 but is functionally broken because the data layer underneath it returned nothing.
Realtime: probe a customer-side /realtime-health endpoint
Supabase Realtime runs on Elixir with the Phoenix Framework and delivers three primitives over WebSockets: Broadcast, Presence, and Postgres Changes. Postgres Changes adheres to RLS policies on the tables you subscribe to. The whole thing is fast and well-engineered. The whole thing also has no probe surface Velprove can directly assert on, because we have no websocket primitive in any of our monitor types (http, api, multi_step, browser).
The right response is to push the freshness window into the customer's own infrastructure. Stand up a tiny server-side process that subscribes to the channel you care about, records the timestamp of the last event it received, and exposes an HTTP endpoint that returns 200 or 503 based on how stale that timestamp is. The endpoint owns the freshness logic, and Velprove asserts status_code = 200 on it from outside. This is the same /healthz pattern for compound dependencies: compute the verdict server-side, expose a binary endpoint, probe the binary.
A minimal implementation in Node with the supabase-js client:
    // /realtime-health: server-side subscriber + endpoint
    import { createClient } from "@supabase/supabase-js";

    const supabase = createClient(
      process.env.SUPABASE_URL!,
      process.env.SUPABASE_SERVICE_ROLE_KEY!
    );

    // Timestamp of the most recent Realtime event; starts at process boot.
    let lastEventAt = Date.now();

    supabase
      .channel("public:orders")
      .on(
        "postgres_changes",
        { event: "*", schema: "public", table: "orders" },
        () => {
          lastEventAt = Date.now();
        }
      )
      .subscribe();

    // HTTP handler for GET /realtime-health, written against the Fetch API
    // Request/Response types; adapt the signature for Express, Hono, or node:http.
    export function handler(_req: Request): Response {
      const ageMs = Date.now() - lastEventAt;
      const STALE_MS = 5 * 60 * 1000; // 5-minute tolerance for an active channel
      const stale = ageMs > STALE_MS;
      return new Response(stale ? "stale" : "ok", {
        status: stale ? 503 : 200,
        headers: { "Cache-Control": "no-store" },
      });
    }

A few notes on the shape. The endpoint computes the verdict on each request from a server-local timestamp; nothing about the request itself drives the calculation. The Velprove monitor that probes it is the simplest HTTP monitor in this whole post: a GET against the /realtime-health URL on a 60-second interval with a single status_code = 200 assertion. No body assertion, no response-time threshold; the customer endpoint already encoded all of those concerns in its 200-versus-503 return.
That is it. The endpoint owns "what counts as stale," Velprove owns "tell me when it isn't 200," and the customer's real Realtime delivery path is being exercised continuously by the server-side subscriber. If the channel falls silent, the timestamp stops advancing, the endpoint flips to 503, and the next probe pages you.
Two disciplines matter. First, set the staleness tolerance to the real-world cadence of events on the channel: a 5-minute window on a channel that fires every few seconds catches a real silence quickly; a 5-minute window on a channel that fires once an hour will false-positive constantly. Second, the subscriber process needs its own uptime story; if it crashes the timestamp also stops advancing, which is correct alerting behavior but means you should keep the subscriber simple and run it as part of your normal application deployment, not as a one-off script.
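Both disciplines get easier if the freshness decision is a pure function that can be unit tested apart from the subscriber. The sketch below is one possible shape, not a prescribed one: freshnessVerdict and suggestStaleMs are hypothetical names, and the default safety factor of 2 over the longest observed gap is an assumption to tune against your channel's real cadence.

```typescript
interface Verdict {
  status: 200 | 503;
  body: "ok" | "stale";
}

// Pure version of the endpoint's decision: no clock reads, no globals.
function freshnessVerdict(lastEventAtMs: number, nowMs: number, staleMs: number): Verdict {
  const stale = nowMs - lastEventAtMs > staleMs;
  return stale ? { status: 503, body: "stale" } : { status: 200, body: "ok" };
}

// Assumed heuristic: tolerance = longest observed gap between events,
// multiplied by a safety factor, so a normal quiet spell does not page.
function suggestStaleMs(observedGapsMs: number[], safetyFactor = 2): number {
  if (observedGapsMs.length === 0) throw new Error("no observed gaps");
  return Math.max(...observedGapsMs) * safetyFactor;
}
```

With gaps of 1 s, 4 s, and 2 s the suggested tolerance is 8 s; the same helper fed hourly gaps produces a window measured in hours, which avoids the constant false positives described above.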
When this post is the wrong one
Supabase is one BaaS, and this post is scoped to that surface. If you got here looking for something else, three sibling posts probably fit better.
If your question is platform-shaped, not BaaS-shaped. A Supabase-backed app still has a host that serves its frontend and a runtime that serves its backend. For platform-side monitoring of those hosts, the per-platform guides are Vercel, Render, Railway, Cloudflare Workers + Pages, and Heroku. Each covers the platform's own failure modes (cold starts, release-phase failures, Eco dyno sleep, regional outages) which are orthogonal to Supabase's.
If your question is about choosing between a browser monitor and an HTTP monitor. The general rule of thumb is in the browser vs HTTP decision tree. The short version for Supabase: use an HTTP or multi-step monitor for the API surface, and use a browser login monitor for the customer-facing signed-in path that reads through RLS. Both, not either.
If you are not sure which Velprove plan covers this. The four-monitor Supabase set in this post (multi-step auth+RLS, HTTP Edge Function, HTTP Realtime freshness, browser login) fits inside the free plan: 10 monitors total, 1 browser login monitor, multi-step up to 3 steps. If you need more browser monitors, multi-step chains longer than 3 steps, Slack/Discord/Teams/Webhooks delivery (Starter), or PagerDuty (Pro), see which Velprove plan fits your shape.
Frequently Asked Questions
How do I assert that RLS is actually enforced and not silently disabled?
Provision two low-privilege test users in your Supabase project, A and B, with disjoint row ownership: a row only A can read carries a marker string user-A-canary, and a row only B can read carries user-B-canary. Run the Auth + RLS multi-step monitor as user A. On the RLS read step, assert two conditions in this order: body_contains the user-A-canary marker, and body_not_contains the user-B-canary marker. If RLS is enforced, both pass. If RLS was disabled on the table (or the policy got dropped during a migration), the read returns both rows, the second assertion fails, and Velprove pages you. Velprove cannot tell you that the policy is misconfigured, only that the expected row scope changed. That symptom-not-cause framing is enough to put a human on the database within minutes.
Can Velprove monitor a Supabase Edge Function cold start?
Yes. Create a Velprove HTTP monitor against your function URL and add a response_time_ms assertion at roughly 1.5x to 2x your warm p95. Edge Functions run on V8 isolates with ESZip cold starts in the single-digit-millisecond range under normal conditions, but boot-time spikes still happen on first invocation after idle. Setting the threshold above warm p95 catches the long tail without paging on every warm request.
What if my Supabase Realtime channel stops delivering events?
Velprove has no websocket primitive, so the realtime channel itself is not directly probeable. Move the freshness window into your own infrastructure: expose a /realtime-health endpoint that subscribes to the channel server-side, records the timestamp of the last delivered event, and returns 200 when the gap is below your tolerance or 503 when it exceeds it. A Velprove HTTP monitor asserts status_code = 200 on that endpoint on your normal interval. The endpoint owns the freshness logic, and Velprove owns the alerting and the global probe origins.
Should the multi-step monitor use the service-role key or the anon/publishable key?
The publishable (anon) key plus a real test-user JWT obtained at the first step. The service-role key bypasses RLS by design. A monitor authenticated with the service-role key will return rows whether or not the policy enforces correctly, so the entire RLS-enforcement wedge collapses to noise. Use the publishable key in the apikey header and the test user's access_token in the Authorization: Bearer header. That is the same posture your real customer's browser uses, which is the posture you want to be monitoring.
Why does my Supabase Auth multi-step fail every other check with "Invalid Refresh Token"?
Supabase refresh tokens are single-use. From the Sessions docs: a refresh token can only be used once to exchange for a new access-and-refresh-token pair. If your monitor caches a refresh token between checks and tries to refresh on the second run, the first refresh consumed the token and the second call gets "Invalid Refresh Token." The fix is to not refresh at all. Call POST /auth/v1/token?grant_type=password fresh on every check, get a brand new access token, and discard everything when the check completes. A 5-minute monitor interval against a 1-hour access-token lifetime means you never approach expiry anyway, and the refresh-token consumption problem stops existing.