Monitor a Heroku App: Eco Sleep, Release Phase, Scheduler
The short version: On June 10 2025 Heroku went down for up to 24 hours and Heroku's own status page went down with it, because both ran on the same affected infrastructure. External monitoring is not an extra. For 7 hours and 42 minutes it was the only signal anyone had. Beyond named incidents, Heroku has three platform primitives a 200 OK on your dyno URL cannot see: Eco web dynos sleep after 30 minutes of inbound idle, Release Phase failures email the deployer but leave your public URL serving yesterday's code, and Heroku Scheduler is documented as "expected but not guaranteed." Classic Cedar Eco ($5) and Basic ($7) dynos get zero native threshold alerting. You upgrade to a Standard-1x dyno at $25 per dyno per month just to unlock email alerts on response time. The Velprove free plan covers the same gap with 10 monitors total, 1 browser login monitor, and multi-step API monitors, no credit card, commercial use allowed.
Heroku's own status page went down with the platform on June 10 2025
On Tuesday June 10 2025, Heroku went down for up to 24 hours, and Heroku's own status page went down with it. The incident started at 06:00 UTC when an automated operating system update ran against production infrastructure that was meant to have automated upgrades disabled. The update restarted host networking, the routes did not reapply, and outbound connectivity for every dyno on every affected host severed at once. Heroku identified root cause at 13:42 UTC, seven hours and forty-two minutes after the first dynos failed. Customer impact persisted for up to 24 hours on the long tail.
From Heroku's own postmortem, published 2025-06-15:
"Our internal tools and the Heroku Status Page were running on this same affected infrastructure. This meant that as your applications failed, our ability to respond and communicate with you was also severely impaired."
That sentence is the entire reason this post exists. For nearly eight hours, until root cause was identified at 13:42 UTC, the vendor status page that every Heroku customer reflexively refreshes during an outage could not tell them anything, because the status page was inside the outage. External monitoring stopped being theoretical insurance and became the only signal anyone had. Heroku has since said no system changes will occur outside its controlled deployment process going forward. That is the right corrective action. It does not change the structural lesson: a status page sitting on the same platform as the product it reports on is a single point of failure, and external monitoring is the second point.
The rest of this post is what an external monitor should watch on a Heroku app between outages of that scale, which is most of the time. Three platform primitives, one Standard-only alerting wedge, and four concrete Velprove monitors.
Why a 200 OK on your Heroku app URL is not enough
A single GET on your Heroku app URL watches one thing: the web dyno answering on port $PORT. That is the smallest part of most real Heroku deployments. Behind the web dyno sit Release Phase steps that run migrations and asset compiles before a new release promotes, Scheduler jobs that fire on cron and run as one-off dynos, worker dynos that drain queues with no inbound traffic at all, and an auto-restart cycle that bounces every dyno at least once every 24 hours. None of those have a public URL, and a status-code probe pointed at / cannot see any of them.
Free-tier Heroku ended on 2022-11-28. Eco dynos at $5 a month for a shared 1,000-hour pool replaced the old free tier, and most indie-hacker Heroku apps now run on Eco or Basic. The hosting-economics half of that decision is its own conversation, covered in the indie-hacker free-stack guide. This post assumes you have already made that call and is about the platform surface a URL monitor cannot see.
The reason the distinction matters: page-level failures change the page, so a URL monitor catches them. Platform-level failures degrade the product without changing the page. Your marketing site can keep serving a clean 200 for hours after the Scheduler job that bills your customers skipped a run, after Release Phase failed and left you on yesterday's code, or after the database the public URL queries silently lost its connection pool. The rest of this post is the platform layer.
Eco dyno sleep is inbound-idle, the opposite of Railway
Heroku Eco web dynos sleep when no web traffic arrives, not when the dyno stops sending outbound traffic. Heroku's Eco Dyno Hours docs state the rule verbatim:
"If an app has an Eco web dyno and that dyno receives no web traffic in a 30-minute period, it sleeps. Eco web dynos do not consume Eco dyno hours while sleeping."
And on wake:
"the dyno becomes active again after a short delay."
Heroku does not publish a cold-start latency number. Community observation puts it at a handful of seconds for Node, a few more for Rails or Django, longer for the JVM. The honest framing is a short delay, qualitatively, with the actual number determined by your stack.
The mechanic runs in the opposite direction from Railway. On Railway, a service sleeps when it has not sent outbound traffic for 10 minutes, and an external probe arrives as inbound traffic that does not reset the sleep clock (covered in the Railway platform-layer guide). On Heroku Eco the clock is inbound. A Velprove HTTP probe on a 5-minute interval arrives as inbound web traffic six times every 30 minutes, so the Eco web dyno never sees a full 30 minutes of silence, so it never sleeps. The probe is keeping the dyno warm whether or not you want it to.
That comes with a cost. The Eco dyno-hour pool is 1,000 hours shared across every Eco dyno on your account. A single Eco web dyno held awake 24/7 burns about 720 hours in a 30-day month (24 hours times 30 days) and 744 hours in a 31-day month. One always-awake Eco web dyno fits comfortably in the pool with headroom for a second small Eco service. Two always-awake Eco web dynos need roughly 1,440 hours and overflow into billed dyno time at the Basic per-second rate. The Velprove pattern on Eco is honest about that tradeoff: a 5-minute probe interval keeps your one production Eco web dyno warm and observable. If you have a second Eco service, note that a 10-minute probe still resets the 30-minute sleep clock. Only an interval longer than 30 minutes lets that dyno sleep between probes, so either slow the second probe past that window or accept the overflow.
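The pool arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming only the figures quoted in this section (a 1,000-hour pool and a 30-minute idle window); the function names are hypothetical and not part of any Heroku or Velprove API.

```javascript
// Pool size and idle window are the figures quoted in this post.
const ECO_POOL_HOURS = 1000;

// Hours consumed by `dynoCount` Eco web dynos that never sleep.
function alwaysAwakeHours(dynoCount, daysInMonth) {
  return dynoCount * 24 * daysInMonth;
}

// Positive result = hours left in the pool; negative = hours over the pool.
function poolHeadroom(dynoCount, daysInMonth) {
  return ECO_POOL_HOURS - alwaysAwakeHours(dynoCount, daysInMonth);
}

// A dyno sleeps only after a full idle window with no inbound request,
// so any probe interval shorter than the window keeps it awake.
function probeKeepsDynoAwake(probeIntervalMinutes, idleWindowMinutes = 30) {
  return probeIntervalMinutes < idleWindowMinutes;
}

console.log(alwaysAwakeHours(1, 30)); // 720
console.log(poolHeadroom(1, 31));     // 256
console.log(poolHeadroom(2, 30));     // -440
console.log(probeKeepsDynoAwake(5));  // true
console.log(probeKeepsDynoAwake(10)); // true
```

The last two lines are the point worth noticing: both a 5-minute and a 10-minute probe land inside the 30-minute window, so neither lets the dyno sleep.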
For a Heroku-hosted SaaS where cold-start latency matters, an always-warm Eco web dyno is the correct configuration. For a side project that genuinely does not care about a few seconds of cold-start delay on first request, a probe interval longer than the 30-minute sleep window saves pool hours at the cost of detection lag. The rule is a principle, not a number: match the probe interval to how fast the failure matters.
Release Phase failures leave your public URL on yesterday's code
Release Phase is the lifecycle stage that runs after a build and before a release is promoted to the dyno formation. It is where database migrations and asset compile steps typically live. When the release command fails, the new release does not promote. The public URL keeps serving the previous release.
From Heroku's Release Phase docs:
"If the release command exits with a non-zero exit status, or if it's shut down by the dyno manager, the release fails. In this case, the release is not deployed to the app's dyno formation."
Heroku does send an email when this happens:
"An email notification is generated in the event of a release phase failure."
The honest wedge is sharper than "Release Phase fails silently." The email arrives. The problem is what the email covers and what it does not.
The email goes to the deployer, the developer who pushed the release. On a one-developer indie project that is the same person who would have configured an external monitor. On a small team, the deployer is whichever developer last pushed, not the on-call engineer. On a larger team with a deploy bot or a CI/CD pipeline, the email may be going to a shared inbox no human reads. The notification is real, but it is point-to-point email to a known address, not a routable alert into PagerDuty or Slack.
The bigger problem is what the URL looks like. From the outside, a Heroku app whose Release Phase just failed looks identical to a Heroku app where the release succeeded and did not break anything: the public URL returns 200, the HTML looks correct, and the responses are consistent. The previous release is still running. If the migration that just failed was the one that adds a column three new code paths depend on, the next time those code paths run they will 500, but right now the URL is fine. Nothing has actually deployed, and the URL still looks normal.
The pattern that closes the gap is a build-version probe. Expose a /version endpoint that returns the current git SHA, wired from an environment variable that Heroku sets at build time. The recommended form is a multi-step API monitor: Step 1 hits /version and captures $.build_sha into a variable, Step 2 calls a second route that compares its own runtime SHA against the captured value and returns non-2xx on mismatch. The comparison lives server-side in your app, the monitor just orchestrates, and the setup survives every future deploy unchanged. Multi-step monitors are available on every plan, including free, where they run up to 3 steps.
The shorter setup is a plain HTTP monitor with body_contains set to the SHA your build just produced. It works for the current deploy and stales on the next, because the deployed app starts returning the new SHA while the monitor keeps asserting the old one. Use this form only when your CI/CD pipeline updates the assertion on every deploy via Velprove's PUT /api/checks/<id> API. When Release Phase fails and the previous release keeps serving, either assertion fails and the monitor pages you. Velprove does not provide a native deploy-skew detector; your /version assertion is the detector. The full assertion pattern is in our API health check patterns reference; this section is the Release Phase framing on top of it.
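The two routes described above might look like the following. This is a minimal sketch rather than a definitive implementation, and it makes several assumptions: BUILD_SHA is a hypothetical environment variable name standing in for whatever your build exposes to the running dyno, /version/verify is a hypothetical path for the second route, and the captured SHA is assumed to arrive in an X-Expected-SHA request header. The handler is written as a plain function so it can be mounted in any Node web framework.

```javascript
// BUILD_SHA is a hypothetical variable name; substitute whatever your
// build pipeline actually exposes to the running dyno.
function handleVersionRoutes(path, headers, env) {
  const runtimeSha = env.BUILD_SHA || "";

  if (path === "/version") {
    // Step 1 target: the monitor captures $.build_sha from this body.
    return { status: 200, body: JSON.stringify({ build_sha: runtimeSha }) };
  }

  if (path === "/version/verify") {
    // Step 2 target: the monitor sends the captured SHA back in a header.
    // Node lowercases incoming header names, hence the lowercase key.
    const expected = headers["x-expected-sha"] || "";
    const matches = runtimeSha !== "" && expected === runtimeSha;
    return {
      status: matches ? 200 : 409, // any non-2xx status fails the step
      body: JSON.stringify({ match: matches, build_sha: runtimeSha }),
    };
  }

  return { status: 404, body: JSON.stringify({ error: "not found" }) };
}

const demoEnv = { BUILD_SHA: "abc123" };
console.log(handleVersionRoutes("/version", {}, demoEnv).body);
console.log(
  handleVersionRoutes("/version/verify", { "x-expected-sha": "abc123" }, demoEnv).status
);
```

An empty BUILD_SHA is treated as a mismatch on purpose, so a dyno that boots without the variable fails the check instead of matching an empty string.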
Recovery on Heroku is two commands. heroku releases:retry reruns the release without a new build, useful when the failure was an external dependency such as a Postgres instance that was briefly unavailable. heroku rollback promotes a prior release if the failed release uncovered something that needs a code fix. Either way, the monitor told you to run them.
Heroku Scheduler is "expected but not guaranteed"
Heroku Scheduler is a free add-on that runs jobs on a cron-like schedule by spawning a one-off dyno that executes the configured command. The killer detail is in Heroku's own Scheduler docs:
"Scheduler job execution is expected but not guaranteed. Scheduler is known to occasionally (but rarely) miss the execution of scheduled jobs."
And, in the same article:
"In very rare instances, a job may be skipped. In very rare instances, a job may run twice."
Read those sentences twice. Heroku is documenting that Scheduler can skip a run and can run a job twice, with no native alert for either case. If you bill customers from a Scheduler job, settle balances from a Scheduler job, or send a daily report from a Scheduler job, the platform has formally disclaimed the guarantee. The contract is best-effort.
Both failure modes are silent from outside. A job that runs and exits non-zero produces logs in your platform-aggregated log stream, which you have to be looking at. A job that Scheduler skips produces nothing, because from Scheduler's side nothing happened. The Render counterpart (covered in the Render platform-layer guide) at least emails on a failed run; Heroku Scheduler does not even emit that signal for the skip case.
The pattern that works is a heartbeat URL the job hits at the end of its successful run, paired with a freshness endpoint a probe asserts against. The job writes a timestamp to Postgres or a key-value store on durable success, after the work is done, not on entry. A small companion route reads that timestamp, computes its age, and returns 503 when the age exceeds the job cadence plus a grace window, 200 otherwise:
// companion web service, /jobs/billing/freshness
// `db` is your app's own store client; the job writes an ISO timestamp on success.
const last = await db.get("scheduler:billing:lastRun");
const ageMs = Date.now() - new Date(last).getTime();
const STALE_MS = 25 * 60 * 60 * 1000; // 25h: 24h cadence plus 1h grace for a daily job
// A missing or unparseable timestamp yields NaN, which must count as stale.
const stale = !Number.isFinite(ageMs) || ageMs > STALE_MS;
return new Response(stale ? "stale" : "ok", {
  status: stale ? 503 : 200,
});

A Velprove HTTP monitor asserts status_code = 200 on that endpoint. The endpoint flips to 503 the moment the job goes stale, so a 200 is the whole check. Both Scheduler failure modes, skipped run and run-that-exited-non-zero-without-writing-the-stamp, collapse into one signal: the timestamp did not advance. The detection lag is bounded by the probe interval, not by Scheduler.
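The staleness decision is easier to unit test when pulled out as a pure function. The sketch below is one possible shape, assuming the job stores an ISO 8601 timestamp string; the function name and parameters are illustrative. It treats a missing or unparseable timestamp as stale, so a job that has never completed cannot read as fresh.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Returns true when the last successful run is older than cadence + grace,
// or when no usable timestamp exists at all.
function isStale(lastRunIso, nowMs, cadenceMs, graceMs) {
  const lastMs = typeof lastRunIso === "string" ? Date.parse(lastRunIso) : NaN;
  if (!Number.isFinite(lastMs)) return true; // never ran, or unparseable value
  return nowMs - lastMs > cadenceMs + graceMs;
}

const now = Date.parse("2025-06-10T12:00:00Z");
// Daily job, 1h grace: last run 6h ago is fresh, 26h ago is stale.
console.log(isStale("2025-06-10T06:00:00Z", now, 24 * HOUR_MS, HOUR_MS)); // false
console.log(isStale("2025-06-09T10:00:00Z", now, 24 * HOUR_MS, HOUR_MS)); // true
// Hourly job, 15-minute grace: last run 90 minutes ago is stale.
console.log(isStale("2025-06-10T10:30:00Z", now, HOUR_MS, 15 * 60 * 1000)); // true
// No timestamp recorded yet counts as stale.
console.log(isStale(undefined, now, 24 * HOUR_MS, HOUR_MS)); // true
```

Passing the clock in as nowMs keeps the function deterministic, and separating cadence from grace makes it straightforward to tighten the window for an hourly job.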
One discipline matters: the job must write the timestamp on real progress, not on entry. A job that fails partway through and exits non-zero before the final write looks correctly stale from the freshness endpoint. A job that writes the timestamp before doing the work would look fresh while never actually completing.
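On the job side, the write-after-success discipline can be enforced with a small wrapper. This is a sketch under stated assumptions: `store` stands for any client exposing a set(key, value) method (a Map works for local testing; a real job would use Postgres or a key-value store), and runWithHeartbeat is a hypothetical helper name.

```javascript
// `store` is any client with set(key, value); sync or async both work.
// `now` is injectable so tests can pin the clock.
async function runWithHeartbeat(work, store, key, now = () => new Date()) {
  await work(); // throws on failure, so the stamp below is never written
  await store.set(key, now().toISOString());
}

// Example wiring with an in-memory store standing in for the real one.
const memoryStore = new Map();
runWithHeartbeat(
  async () => {
    // the actual billing work would go here
  },
  memoryStore,
  "scheduler:billing:lastRun"
).then(() => {
  console.log("stamped:", memoryStore.has("scheduler:billing:lastRun"));
});
```

Because the timestamp write is the last statement, a job that throws partway through exits without advancing the stamp, and the error still propagates so the one-off dyno exits non-zero.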
Eco and Basic dynos get no native threshold alerting
This is the load-bearing economic wedge of the post. Heroku has a first-party alerting feature called Threshold Alerting that emails you or pages PagerDuty when response time or failed-response rate crosses a configured threshold. It runs on top of App Metrics, which is the dashboard view of your dyno's performance over time.
Two quotes from Heroku's Application Metrics docs define the tier boundary:
"Application metrics aren't available for apps using eco dynos."
"The Threshold Alerting feature is available to apps running on Professional dynos (standard-1x, standard-2x and performance) and all Fir dynos."
The economic shape is this: classic Cedar Eco dynos at $5 a month and classic Cedar Basic dynos at $7 a month do not have App Metrics, so they cannot have Threshold Alerting, so they have zero native uptime alerting from Heroku at all. The cheapest classic dyno that includes Threshold Alerting is Standard-1x at $25 per dyno per month. That is a 5x jump in dyno cost for an Eco shop and a roughly 3.6x jump for a Basic shop, paid not for more compute but for the right to receive an email when response time crosses a threshold.
The Fir-generation entry-tier dyno (Heroku's next-generation Kubernetes-based runtime) does include alerting on its low-cost tier per the Threshold Alerting tier quote. The Eco-no-alerting claim scopes specifically to classic Cedar Eco dynos. If you are running on Fir already, your alerting story is different and worth checking against the current Fir docs.
The third option is external. A Velprove free plan covers 10 HTTP monitors at a 5-minute interval, 1 browser login monitor at a 15-minute interval, multi-step API monitors up to 3 steps, and 1 status page, with email alerts on every plan including free. That gives a Cedar Eco shop response-time and failed-response alerting without changing dyno tier, plus the Release Phase and Scheduler coverage Threshold Alerting cannot give you even on a Standard dyno. The math is straightforward: $0 for external alerting versus $300 a year per Standard-1x dyno, which is $240 a year more than an Eco dyno, to unlock native alerting. The right answer is both, for most teams, but the cost of starting with external is zero and the marginal benefit is high.
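For anyone who wants to check the tier arithmetic, here is a small sketch using only the monthly prices quoted in this post. The object and function names are illustrative.

```javascript
// Monthly per-dyno prices as quoted in this post, in US dollars.
const MONTHLY_USD = { eco: 5, basic: 7, standard1x: 25 };

// How many times more expensive `to` is than `from`.
function costMultiple(from, to) {
  return MONTHLY_USD[to] / MONTHLY_USD[from];
}

// Extra dollars per year for moving one dyno from `from` to `to`.
function annualDelta(from, to) {
  return (MONTHLY_USD[to] - MONTHLY_USD[from]) * 12;
}

console.log(costMultiple("eco", "standard1x"));              // 5
console.log(costMultiple("basic", "standard1x").toFixed(2)); // "3.57"
console.log(MONTHLY_USD.standard1x * 12);                    // 300
console.log(annualDelta("eco", "standard1x"));               // 240
console.log(annualDelta("basic", "standard1x"));             // 216
```

The absolute cost of a Standard-1x dyno is $300 a year; the $240 figure is the increase over an Eco dyno, and the increase over Basic is $216.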
Setting up the 4 Velprove monitors for a Heroku app
Put the patterns above together and the Heroku-side coverage lands in four concrete monitors. All four fit inside the Velprove free plan: 10 monitors total at a 5-minute interval, 1 browser login monitor at a 15-minute interval, multi-step API monitors up to 3 steps, email alerts, SSL expiry monitoring, and 1 status page with a Velprove badge. Each monitor probes from one of 5 global regions you pick at setup time. Every plan picks from the same 5 regions; to cover multiple regions, you create multiple monitors.
- Create a browser login monitor on the signed-in path. This is the Velprove differentiator and the monitor that catches the most subtle Heroku failures, so it goes first in the canonical order. Create a new browser login monitor against your Heroku app's login URL with a dedicated low-privilege test user. The monitor drives a real browser, signs in as the test user, follows the post-login redirect, and asserts on the landing page. Under Customize detection, switch Success verification from the default URL-change to Page contains text and set it to a string that only renders when a real database read succeeded: a customer name, an invoice ID, a known plan label. A Release Phase that fails the migration adding a column the login flow depends on will land the user on an error page with a 200 status, which a URL probe would miss and the browser login monitor catches. The monitor is free on every plan, including the free plan, at a 15-minute interval.
- Add an HTTP monitor on the public URL with a build-SHA assertion. Create an HTTP monitor against your public Heroku app URL on a 5-minute interval. On the Verify step, add two Success Conditions in order: status_code = 200 and body_contains set to the build SHA exposed at /version. The body_contains rule turns the same probe into a Release Phase detector when your CI/CD pipeline updates the assertion value via Velprove's PUT /api/checks/<id> API after each successful build. Without that CI integration, body_contains stales on the next deploy because the app starts returning the new SHA while the monitor keeps asserting the old one. The deploy-survives-unchanged alternative is the multi-step monitor in step 4. Pick the region closest to your real users. On Eco, this probe also keeps the dyno warm by arriving as inbound web traffic every 5 minutes, which resets the 30-minute sleep clock.
- Add an API monitor on the Scheduler heartbeat endpoint. Create an HTTP monitor (API-shaped) against the freshness endpoint your Scheduler job updates on real success. Assert status_code = 200. The endpoint returns 503 when the timestamp goes stale, so a 200 is the whole check. Match the probe interval to the job cadence: a daily Scheduler job is comfortable on a 5-minute probe with a 25-hour grace window in the endpoint logic; an hourly job wants a tighter grace window. The detection lag is bounded by your probe interval, not by Scheduler.
- Add a multi-step API monitor for deploy verification. Create a multi-step API monitor. Step 1 hits /version and captures $.build_sha into a variable using a json_path assertion. Step 2 hits a second route that compares its own runtime SHA against the captured value and returns non-2xx on mismatch. Multi-step monitors run each step once in sequential order, with the same 6 assertion types HTTP monitors use: status_code, body_contains, body_not_contains, json_path, response_time_ms, and header_contains. No polling, no retry-until, no wait-for-condition. The free plan covers multi-step up to 3 steps; Starter covers up to 5 and Pro up to 10. This monitor is the upgrade path from the body_contains assertion in monitor (2): the SHA comparison lives server-side in your app, so the setup survives every future deploy unchanged.
That is four monitors out of your ten total slots: one browser login monitor, two HTTP monitors, and one multi-step monitor. The remaining six slots are room for a database health endpoint, a third-party API dependency, a second region on a critical path, or a second environment such as staging.
Email alerts are included on every plan, including free. Slack, Discord, Microsoft Teams, and webhook alerts unlock on Starter. PagerDuty integration is on Pro for teams that route alerts into an on-call rotation. The free plan's status page carries a Velprove badge; the badge comes off on paid plans.
What Velprove cannot catch on Heroku
A monitor that pretends to catch everything is lying. The honest boundary on a Heroku app has four parts.
Most multi-factor authentication flavors on the browser login monitor. If your Heroku app's login flow requires an SMS code, an email code, a magic link, a push approval, or a passkey, the browser login monitor cannot complete it. Velprove cannot read your phone, your inbox, or your authenticator app. The monitor works on login flows where the dedicated test user can sign in with a username and password. For consumer SaaS where every user is forced through SMS or email-code MFA, the browser login monitor pattern is not the right tool; an HTTP monitor on a post-login API endpoint with a service token is. This is not a Heroku-specific limit, but it bites Heroku-hosted apps the same way it bites apps anywhere else.
Heroku platform internals that are invisible to an external probe. Velprove sees what the public URL returns. Velprove does not see dyno-level CPU or memory pressure before the request reaches the dyno, the state of the router queue, or the internal health of Heroku Postgres beyond what your application code exposes. App Metrics on Standard-1x and up sees those; an external probe sees the consequences. The two views complement each other, and on a small Eco app the external view is the only view available.
Fir-generation entry-tier dyno alerting. The Eco-no-alerting framing in this post scopes to classic Cedar Eco dynos. The Fir generation, Heroku's next-generation Kubernetes-based runtime, includes Threshold Alerting on its equivalent low-cost tier. If you are on Fir, your native alerting story is meaningfully different and worth checking against current Heroku docs before assuming this post's wedge applies to you. The external pattern still helps for the Scheduler skip and Release Phase build-SHA cases on Fir, because those are not threshold-shaped signals.
Dyno restart skew. Per Heroku's Dyno Restarts docs, the dyno manager restarts every dyno at least once per day, on a cycle of 24 hours plus up to 216 random minutes of jitter so that an app's dynos do not all restart at once. During a deploy, some dynos can be on the new release and some on the previous release for a short window. This is documented, intentional, and customer-tolerated behavior, not a failure mode you should wire alerting around.
Getting started
The Velprove free plan covers 10 monitors total at a 5-minute interval, 1 browser login monitor at a 15-minute interval, multi-step API monitors up to 3 steps, 5 global regions to choose from (one per monitor), email alerts, SSL expiry monitoring, and 1 status page with a Velprove badge. Commercial use is allowed on every plan, including free. No credit card required.
That is enough to land the four-monitor Heroku set described above for a single production app: a browser login monitor on the signed-in path, an HTTP monitor on the public URL with a build-SHA assertion, an HTTP monitor on a Scheduler heartbeat endpoint, and a multi-step API monitor for deploy verification. Start with the free plan. The first monitor takes about three minutes to configure.
Frequently Asked Questions
How do I monitor a Heroku Eco dyno that sleeps after 30 minutes?
Point a Velprove HTTP monitor at your public Heroku app URL on a 5-minute interval and assert status_code = 200 plus body_contains on a static marker your real app emits. The probe arrives as inbound web traffic, which resets the 30-minute Eco sleep clock and wakes the dyno on the first hit after a sleep. The tradeoff is dyno-hour pool burn: a single Eco dyno held awake 24/7 consumes about 720 hours of the 1,000-hour Eco pool every month, which is fine for one app and tight if you run two. If you have a second Eco app on the same account, slow its probe to an interval longer than 30 minutes so that dyno can sleep between probes, or accept overflow billing.
How do I detect a Heroku Release Phase failure when the public URL still returns 200?
Expose a /version endpoint that returns the current git SHA from a build-time environment variable, then assert it with Velprove. The recommended form is a multi-step API monitor: Step 1 captures $.build_sha from /version into a variable, Step 2 hits a second route that compares its own runtime SHA against the captured value and returns non-2xx on mismatch. The comparison lives in your app, the monitor just orchestrates, and the setup survives every deploy unchanged. The lighter alternative is a plain HTTP monitor with body_contains set to the current SHA, but body_contains goes stale on your next deploy unless your CI/CD updates it via Velprove's PUT /api/checks/<id> API. When Release Phase fails and the previous release keeps serving, either assertion fails on the next probe and Velprove pages you. The two-step X-Expected-SHA chain that does this server-side is walked through in the /healthz + /version multi-step pattern.
Does Heroku Scheduler alert me when a job does not fire?
No. Heroku's own Scheduler docs say job execution is expected but not guaranteed and that jobs may occasionally be skipped or run twice. Heroku sends no email and no webhook when a scheduled job fails to fire. The pattern that closes the gap is a heartbeat URL the job hits on real success, with a companion freshness endpoint that returns 503 when the timestamp goes stale.
Why do I need external monitoring if I am paying for Heroku Standard dynos?
Heroku's Threshold Alerting on Standard-1x and above watches response time and failed responses on the web dyno. It does not watch Scheduler runs, and it does not catch a Release Phase failure that leaves you on the previous build SHA. It is also an inside-the-platform signal: on June 10 2025, Heroku's internal tools and its own status page went down with the platform because they ran on the same affected infrastructure, so you could not count on the platform to tell you it was failing. Threshold Alerting is useful for what it covers. An external probe from outside Heroku is what gives you signal when Heroku itself is the failure. The two complement each other; one does not replace the other.
Can I monitor a Heroku app on the Velprove free plan?
Yes. The Velprove free plan covers 10 monitors total at a 5-minute interval, one browser login monitor at a 15-minute interval, multi-step API monitors up to 3 steps, email alerts, SSL expiry monitoring, and 1 status page. Commercial use is allowed and no credit card is required. That is enough to land an HTTP probe on your web URL, a build-SHA assertion on /version, a Scheduler heartbeat on a freshness endpoint, and a browser login monitor on the signed-in path of a Heroku-hosted SaaS.
Does a Velprove probe keep my Heroku Eco dyno from sleeping?
Yes, and that is the opposite of how Railway works. Heroku's Eco sleep clock is inbound-idle: if an Eco web dyno receives no web traffic in a 30-minute period, it sleeps. A Velprove HTTP probe arrives as inbound web traffic, so a 5-minute probe interval resets the sleep clock every 5 minutes and the dyno stays warm. The cost side of that decision is the 1,000-hour Eco pool shared across all Eco dynos on the account. One always-awake Eco web dyno burns about 720 hours per month, leaving room for one more small Eco service; two always-awake Eco dynos overflow into billed time. The Railway inverse, where service sleep is outbound-idle and an inbound probe does not reset the clock, is covered in the Railway outbound-idle sleep writeup.