API Health Check Patterns: What /healthz Should Return

In one paragraph: Liveness, readiness, and startup are three different probes that do three different jobs. Conflating them is how a single database hiccup turns into a full fleet restart. Your load balancer reads the HTTP status code, not your JSON body, so a 200 with "status":"degraded" is invisible. Keep liveness shallow, keep dependency checks in readiness, hide internal hostnames from public probes, and watch your /healthz from outside the cluster with a free Velprove API monitor. Start for free. No credit card required.

Most outage stories that start with /healthz end with someone conflating two of the three probes. The endpoint is a single path, but it can be asked three different questions, and those questions do not share a right answer. This post sits one layer above our existing tutorial on how to monitor your /health endpoint with response validation. That post is "how to point a monitor at /health." This one is "what /healthz should actually return, why, and what it should never expose."

The three probes, defined the way Kubernetes defines them

The cleanest probe taxonomy in production use is the Kubernetes one, even if you do not run Kubernetes. Three probes, three jobs, three different consumers. Use the same separation on ECS, on Nomad, on a fleet of VMs behind an AWS ALB. The names change. The shape does not.

Liveness: is this process deadlocked?

Per the Kubernetes probe documentation: "Liveness probes determine when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress." A failed liveness probe restarts the container. That is the only thing it does. The contract is narrow on purpose: the probe should answer one question, "is this process so wedged that the only recovery is killing it," and the answer should be yes only when killing it is genuinely the right move.

Readiness: should this instance be in the load-balancer pool?

The same docs: "Readiness probes determine when a container is ready to accept traffic." A failed readiness probe does not restart anything. It de-routes traffic. From the docs: "If the readiness probe returns a failed state, the EndpointSlice controller removes the Pod's IP address from the EndpointSlices of all Services that match the Pod." This is the probe where you can legitimately check whether the database is reachable, whether your in-process cache is warm, whether the upstream identity provider is responding. A blip in any of those should pull this instance out of the pool, not kill it.
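To make the split concrete, here is what a readiness handler might look like in the same Node style as the liveness example later in this post. It is a sketch: checkDb and checkCache are hypothetical stand-ins for your real dependency checks (say, a SELECT 1 with a short timeout, or a Redis PING).

```javascript
// readiness.js: readiness is where dependency checks belong (sketch).
// checkDb and checkCache are hypothetical; swap in your real checks.
async function checkDb() { return true; }
async function checkCache() { return true; }

export async function readinessHandler(req, res) {
  const checks = {
    db: await checkDb().catch(() => false),
    cache: await checkCache().catch(() => false),
  };
  const ready = Object.values(checks).every(Boolean);
  // 503 pulls this instance from the pool; it does NOT restart anything.
  res.statusCode = ready ? 200 : 503;
  res.setHeader("content-type", "application/json");
  res.end(JSON.stringify({ status: ready ? "ok" : "unavailable", checks }));
}
```

A failed check here de-routes traffic and nothing more, which is exactly the behavior you want for a transient dependency blip.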

Startup: has the app finished initializing?

Startup is the gate during boot: "Startup probes verify whether the application within a container is started. If a startup probe is configured, Kubernetes does not execute liveness or readiness probes until the startup probe succeeds." This is the probe that exists so a slow JVM warm-up or a long migration job does not get killed by a 30-second liveness deadline three seconds after launch.
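On Kubernetes the startup gate is its own probe block. A sketch, assuming the app exposes /livez on port 8080 (as in the probe config later in this post) and can take up to five minutes to finish booting:

```yaml
startupProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # 30 checks x 10s = up to 5 minutes of boot time
```

Until this probe succeeds, liveness and readiness are held off, so the slow boot never trips the liveness deadline.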

The taxonomy is not Kubernetes-only. The same three jobs exist on every orchestrator and every load balancer. Even on the Kubernetes API server itself, the unified /healthz endpoint is on the way out. Per the Kubernetes API server health endpoint docs: "healthz is deprecated (since Kubernetes v1.16), and you should use the more specific livez and readyz endpoints instead." The reason is exactly the conflation problem above. One path that pretends to answer three questions cannot tell its caller which question failed.

Why HTTP status codes matter more than your JSON body

The number one source of false confidence from a 200 OK on a health endpoint is a body that says one thing and a status code that says another. Your load balancer does not read JSON. It reads three digits.

The 200-with-degraded-body anti-pattern

Per the Kubernetes probe configuration docs: "Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure." Returning a 200 with {"status":"degraded"} in the body is functionally identical to returning a 200 with {"status":"ok"}. The orchestrator does not care. Neither does your load balancer.

AWS ALB target-group defaults

The defaults on an AWS Application Load Balancer target group, from the AWS ALB target group health checks docs: 30-second interval, 5-second timeout, 5 healthy checks to bring an instance into rotation, 2 unhealthy checks to take it out. Default path is /. Default matcher is the literal string 200. Anything outside that exact code is a failure. (GCP and Cloudflare both expose similar interval, timeout, and threshold knobs with different defaults; check your specific LB before assuming.)

When to actually return 503

Per RFC 9110 §15.6.4: "The 503 (Service Unavailable) status code indicates that the server is currently unable to handle the request due to a temporary overload or scheduled maintenance, which will likely be alleviated after some delay." The rule of thumb is short. Return 200 if this instance can still serve traffic. Return 503 if it should be pulled out of rotation. Anything else hides the signal.
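The rule is mechanical enough to write down. A sketch, with illustrative state names (these are not a standard, just one way to label the internal states):

```javascript
// Map internal health state to the status code the probe returns (sketch).
// The rule: 200 while this instance can still serve, 503 when it should be pulled.
function healthStatusCode(state) {
  switch (state) {
    case "ok":
    case "degraded":    // still serving, just slower: stay in rotation, alert separately
      return 200;
    case "draining":    // shutting down: stop sending new traffic here
    case "unavailable": // hard dependency down: pull from the pool
      return 503;
    default:
      return 503;       // unknown state: fail safe, out of rotation
  }
}
```

Note that "degraded" maps to 200 on purpose: the load balancer should keep routing, and the degradation belongs in your alerting, not in the routing decision.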

One ALB-specific nuance worth knowing about, from the same AWS docs: "If a target group contains only unhealthy registered targets, the load balancer routes requests to all those targets, regardless of their health status." That is the fail-open backstop. It is not a feature you should rely on. It is the safety net that prevents a bad health check from taking the entire fleet dark in one zone.

The deep liveness anti-pattern

"Incorrect implementation of liveness probes can lead to cascading failures." That is verbatim from the Kubernetes probe documentation, and it is the most-cited engineering rule on this topic. The most common incorrect implementation is the one where someone writes a /healthz handler that opens a connection to the database, runs SELECT 1, and returns 200 only if the query succeeds.

Why a liveness probe that hits the database will eat your fleet

Henning Jacobs, in the canonical engineer reference on this topic (Liveness Probes are Dangerous): "A Liveness Probe in combination with an external DB health check dependency is the worst situation: a single DB hiccup will restart all your containers!" Colin Breck (Kubernetes Liveness and Readiness Probes) draws the rule directly: "Avoid checking dependencies in liveness probes. Liveness probes should be inexpensive and have response times with minimal variance."

The mechanism is simple. A 200ms database blip is normal. A liveness probe with a 5-second timeout treats it as healthy. Now make the blip 6 seconds. Every pod in every region fails its liveness check at the same moment. Every pod gets killed at the same moment. Every pod restarts at the same moment, hits the still-recovering database with a fresh connection storm, and fails liveness again. You have turned a 6-second database hiccup into a 5-minute app-wide cold start.

Container vs dependency, the right separation

The Kubernetes docs are explicit about how to split this correctly: "When your app has a strict dependency on back-end services, you can implement both a liveness and a readiness probe. The liveness probe passes when the app itself is healthy, but the readiness probe additionally checks that each required back-end service is available." The liveness handler should answer "is this Node.js process responsive," nothing more. The readiness handler is allowed to be opinionated about dependencies.

What "shallow" actually means

A shallow liveness handler in Node looks like this. No I/O. No async fetches. No database connection.

// liveness.js, shallow on purpose
export function livenessHandler(req, res) {
  res.statusCode = 200;
  res.setHeader("content-type", "application/json");
  res.end(JSON.stringify({ status: "ok" }));
}

And a short Kubernetes probe config that points at separate paths so the two probes never get confused:

livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 1
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 2

The principle holds even if you do not run Kubernetes: keep two paths, point the orchestrator at the cheap one, point the load balancer at the more expressive one. Silent outages that HTTP monitors miss almost always start when those two paths get collapsed into one.

What to put in the response body

There is no IETF standard for the response body. The expired draft you may have seen cited in older posts is, per the IETF Datatracker page itself: "This Internet-Draft is no longer active." (Last revision October 16, 2021. Expired April 19, 2022.) Anything still treating application/health+json as a standard is wrong by four years.

What does exist is three reference shapes that production frameworks actually use. Pick the one your tooling already speaks.

Spring Boot Actuator returns a top-level body of {"status":"UP"} by default. Per the Actuator endpoint docs: "These indicators are shown on the global health endpoint (/actuator/health). They are also exposed as separate HTTP Probes by using health groups: /actuator/health/liveness and /actuator/health/readiness."

ASP.NET Core returns plain text by default per the ASP.NET Core health checks docs: the literal string Healthy, Degraded, or Unhealthy, with no JSON body at all unless you add a custom response writer.

The Kubernetes API server returns a bare plain-text ok with a 200 on success on /livez and /readyz. That is the most defensible shape: two bytes is close to zero info-disclosure surface.

A pragmatic body for an internal-facing detailed endpoint, when you want one, looks like this:

{
  "status": "ok",
  "version": "1.42.0",
  "build_sha": "a1b2c3d",
  "time": "2026-05-10T14:00:00Z",
  "checks": { "db": "ok", "redis": "ok", "auth_provider": "ok" }
}

Five fields, no hostnames, no library versions, no stack traces. A short build SHA is enough to identify what is running. The detailed dependency block belongs behind auth, not on the public probe.

What to leak and what to hide

Your /healthz is reachable from anywhere your app is. Treat it as untrusted-caller-visible. The principle is least exposure on the public probe; dependency detail behind auth or on a separate internal-only path.

Safe to expose on a public probe: a top-level status string, a short build SHA or version tag, a service name, a server timestamp. Nothing else.

Do not expose on a public probe: dependency hostnames or connection strings (recon material), internal IPs (lateral movement material), library or runtime versions (CVE targeting material), stack traces or partial errors (logic disclosure), database identifiers, or queue depths (capacity reconnaissance).

The two-tier pattern is what Spring Actuator and ASP.NET Core both default to: a public, minimal /healthz for the load balancer, and an authenticated /healthz/details for humans and internal tooling. Spring exposes the verbose body only when management.endpoint.health.show-details is set; ASP.NET Core defaults to a single status string and forces you to opt in to a custom response writer. Both defaults are secure on purpose. If you write your own framework integration, copy the pattern.
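A minimal sketch of the two-tier pattern on bare node:http; the paths and the bearer-token check are illustrative, not a prescribed scheme, and HEALTH_TOKEN is an assumed environment variable:

```javascript
// Two-tier health routes (sketch): public shallow probe + auth-gated details.
const HEALTH_TOKEN = process.env.HEALTH_TOKEN ?? "test-token"; // assumption: token via env

export function healthRouter(req, res) {
  if (req.url === "/healthz") {
    // Public and shallow: status only, no dependency detail.
    res.writeHead(200, { "content-type": "application/json" });
    return res.end(JSON.stringify({ status: "ok", version: "1.42.0" }));
  }
  if (req.url === "/healthz/details") {
    // Auth-gated: verbose dependency block for humans and internal tooling.
    if (req.headers.authorization !== `Bearer ${HEALTH_TOKEN}`) {
      res.writeHead(401);
      return res.end();
    }
    res.writeHead(200, { "content-type": "application/json" });
    return res.end(
      JSON.stringify({ status: "ok", checks: { db: "ok", redis: "ok", auth_provider: "ok" } })
    );
  }
  res.writeHead(404);
  return res.end();
}

// Wire up with: import http from "node:http"; http.createServer(healthRouter).listen(8080);
```

The load balancer and the orchestrator only ever see the first route; nothing they can reach leaks dependency names.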

Framework cheat sheet

Five common stacks, the canonical setup, and the gotcha each one trips on most often.

Spring Boot Actuator (Java)

// pom.xml: spring-boot-starter-actuator
// application.properties:
// management.endpoints.web.exposure.include=health
// management.endpoint.health.probes.enabled=true

Spring exposes /actuator/health/liveness and /actuator/health/readiness as separate paths once probes are enabled. Gotcha: leaving show-details on always in production turns the public probe into an info-disclosure endpoint. Default is never; keep it that way unless the path is auth-gated.
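A locked-down configuration sketch; property names are per Spring Boot Actuator, and the commented-out variant is the opt-in path if you do need details behind auth:

```
# application.properties: probes on, details off the public body
management.endpoints.web.exposure.include=health
management.endpoint.health.probes.enabled=true
management.endpoint.health.show-details=never
# opt-in alternative, gated by authenticated roles:
# management.endpoint.health.show-details=when-authorized
# management.endpoint.health.roles=OPS
```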

ASP.NET Core (C#)

builder.Services.AddHealthChecks();
app.MapHealthChecks("/healthz");

Two lines. Microsoft's own docs use /healthz as the example path. Gotcha: registering an AddDbContextCheck on a MapHealthChecks("/healthz") endpoint that is wired into the orchestrator's liveness probe is the textbook deep-liveness anti-pattern. Wire dependency checks to a separate readiness path, not the liveness one.

Express + terminus (Node.js)

import { createTerminus } from "@godaddy/terminus";
import http from "node:http";

const server = http.createServer(app);
createTerminus(server, {
  healthChecks: { "/livez": () => Promise.resolve(), "/readyz": readyCheck },
  signal: "SIGTERM",
});

Gotcha: terminus also handles graceful shutdown on SIGTERM. Your readiness check should start failing the moment SIGTERM arrives, so the load balancer stops sending new traffic while in-flight connections drain. Most teams forget this and serve 200 from /readyz right up until the process exits.
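A minimal sketch of that flip, with readyCheck shaped the way terminus expects (a rejected promise counts as a failed check):

```javascript
// Flip readiness to failing the moment SIGTERM arrives (sketch).
let shuttingDown = false;
process.on("SIGTERM", () => { shuttingDown = true; });

export function readyCheck() {
  // A rejected promise makes /readyz return a failure while
  // in-flight requests keep draining on the open connections.
  return shuttingDown
    ? Promise.reject(new Error("shutting down"))
    : Promise.resolve();
}
```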

FastAPI (Python)

from fastapi import FastAPI

app = FastAPI()

@app.get("/livez")
def livez():
    return {"status": "ok"}

Three lines. Gotcha: FastAPI's dependency injection makes it easy to Depends(get_db_session) on a liveness route. Do not. The whole point of the deep-liveness section is keeping the database out of the liveness path.

Next.js App Router

// app/api/healthz/route.ts
export const dynamic = "force-dynamic";

export async function GET() {
  return Response.json({ status: "ok" });
}

Gotcha: without dynamic = "force-dynamic", the route can be cached at build time and stop reflecting the running process. Cold starts on serverless platforms add a separate signal-vs-noise problem; the dedicated Next.js production monitoring guide covers that piece in depth.

Watching the probe from outside the cluster

An internal probe sees what is internal. It does not see DNS misconfiguration. It does not see a regional CDN outage. It does not see a TLS certificate that expired at 03:00 because the renewal cron failed. None of those flip a Kubernetes liveness check, because the kubelet is connecting to the pod over its internal network. Your customers are connecting over the public internet. The two paths are not the same.

An external monitor against your /healthz, hitting the same URL your customers do, catches what the orchestrator cannot. Add a multi-step API monitor that hits /healthz first, then a protected route with a test bearer token, and you cover both reachability and authentication in one check. The 3-step pattern fits inside the free plan; the longer chain is what the multi-step API monitoring guide walks through.

A two-step Velprove monitor that hits the public /healthz, extracts the build SHA from the response, then verifies a protected /version route on the same deploy. The full chain catches deploy skew between the public probe and the authenticated path.
Velprove multi-step API monitor builder showing a two-step chain: Step 1 GET https://api.yourapp.com/healthz with a JSON path extractor on $.build_sha into a variable named sha and assertions for Status Code equals 200 and Response Time under 3000 milliseconds, and Step 2 GET https://api.yourapp.com/version with Authorization: Bearer test-token and X-Expected-SHA: {{sha}} headers asserting Status Code equals 200.

The full setup, four steps:

  1. Sign up for a free Velprove account. No credit card required. The free plan includes 10 monitors total, 1 browser login monitor, multi-step API monitors with up to 3 steps, 5-minute check intervals, and email alerts. Every plan, including free, runs checks from all five regions: North America, Europe, UK, Asia, and Oceania.
  2. Add a multi-step API monitor with /healthz as Step 1. Method GET, URL https://api.yourapp.com/healthz. Under Save values from response, extract $.build_sha into a variable named sha. Add a Status Code assertion equals 200 plus a Response Time assertion under 3000ms. This is the cheap, public, shallow probe, and the extracted build SHA is what Step 2 will verify against.
  3. Add a Step 2 GET against a protected route with a test bearer token. URL https://api.yourapp.com/version. Pass two headers: Authorization: Bearer <test-token> for auth, and X-Expected-SHA: {{sha}} so the protected route can compare its own build SHA against the one the public probe reported. Assert Status Code equals 200. Use a dedicated test account, not a real one. If your public probe and your private route ever disagree on build SHA, you have a deploy-skew bug, and this chain catches it.
  4. Configure your check interval and alert channel. The free plan runs every 5 minutes with email alerts. Starter at $19 per month adds Slack, Discord, Microsoft Teams, and webhooks at 1-minute intervals. Pro at $49 per month adds PagerDuty at 30-second intervals and 1-year dashboard incident history.

That is the whole thing. A 2-step monitor that watches the public probe, extracts the build SHA, and verifies the protected route is on the same deploy, from five regions, on a free plan, with commercial use allowed.

Frequently Asked Questions

What's the difference between liveness, readiness, and startup probes?

Liveness asks "should the orchestrator restart this container?" Readiness asks "should the load balancer send traffic here?" Startup asks "has the app finished initializing yet?" They run on different schedules and trigger different actions. Mixing them up is the most common cause of cascading restart failures on Kubernetes and on any orchestrator that copies the pattern.

Why is /healthz deprecated on the Kubernetes API server?

Kubernetes deprecated /healthz on its own API server in v1.16 in favor of /livez and /readyz, because a single endpoint that conflates liveness and readiness cannot tell an orchestrator when to restart versus when to stop routing traffic. The deprecation is a Kubernetes-internal API-server change. Your application can still serve /healthz if you want; what matters is keeping the underlying probes separated.

Can a single endpoint serve both liveness and readiness?

Yes, technically, but you should not. Any failure that should only affect routing (a slow database, a flaky upstream) ends up triggering a restart instead. If the endpoint cannot distinguish "the process is wedged" from "the database is slow," every readiness blip becomes a fleet-wide restart loop. Splitting them is a five-line refactor.

Should /healthz require authentication?

The orchestrator probe should not, because adding auth to liveness checks creates a new failure mode (auth service down equals all probes fail). The detailed dependency endpoint should, because dependency hostnames and version strings are recon material. The two-tier pattern (/healthz public and shallow, /healthz/details auth-gated) is what Spring Actuator and ASP.NET Core both default to.

What status code should a degraded service return?

Return 200 if the instance is still serving real traffic, even if degraded. Return 503 if the instance should not receive traffic. Most load balancers, including AWS ALB by default, route on the 200 range and pull traffic on anything else. Putting "status":"degraded" in the body of a 200 response is invisible to the load balancer. If you want the LB to act, change the status code.

Three probes, three jobs, three different right answers. Keep liveness shallow, keep dependency checks in readiness, return 503 when you want traffic pulled, and never put hostnames or stack traces on a public probe. Then watch the result from outside your cluster, because the orchestrator cannot see DNS, certs, or the public internet. Start for free, point a 2-step monitor at /healthz, and have it running in five minutes. No credit card required.

Start monitoring for free

Free browser login monitors. Multi-step API chains. No credit card required.

Start for free