The Solo Founder On-Call Rotation Playbook
Quick take: Google's SRE Book is clear about the minimum number of engineers you need to run a sustainable on-call rotation, and the answer is eight. You have one. The question is not whether you can build what Google built. The question is what scales down. This is the structural playbook for solo founder on-call rotation design when the rotation is you, two of you, or the three of you who showed up last quarter.
The solo founder outage playbook covered incident response in the moment: who to be at 3 AM, what to check first, what to say to customers. This post is the layer underneath. It is the on-call schedule design you decide on a calm Tuesday so that the page routes to the right person, fires for the right reasons, and does not arrive during the one week of the year you booked off. On-call rotation for a small team starts with treating founder coverage as a real question, not a default. If you are reading this mid-incident, close the tab and read the outage playbook. If you are reading this on a Tuesday, this is the one to read now.
The SRE book floor of 8 engineers, scaled to 1, 2, and 3
The relevant chapter is Chapter 11 of the Google SRE Book, Being On-Call. The team sizing math is explicit: the minimum number of engineers needed for on-call duty from a single-site team is eight, with a primary and secondary rotating week-long shifts. Dual-site follow-the-sun lets you drop to six per site, because each site covers the other's nights. Below those floors, Google says, you will burn the rotation out.
Google also puts a hard ceiling on how much SRE time goes to on-call. At least 50% of SRE time goes to engineering. Of the remaining 50%, no more than 25% can be spent on-call. That leaves another 25% for non-project operational work. The arithmetic assumes the team can absorb a paged-awake night every few weeks without the rest of the week collapsing. The headcount floor and the time cap are two sides of the same constraint.
You are not Google. What survives the translation from 8 engineers with two awake on every shift to 1 founder asleep until the phone rings is the underlying principle, not the constants. Reduce the rate of alerts. Make every page worth waking for. Pre-decide who gets it and when. The mechanism scales down. The 8-engineer floor and the 25% cap do not.
The rest of this post is the scaled-down version at three headcounts: 1, 2, and 3. Plus the labor-law question that shows up when you try to add a part-time backup, and the structural question of what to walk back when a co-founder leaves.
The one-person on-call rotation: what you actually have when you are solo
At 1 person, the rotation is your phone. That is not a euphemism. The schedule is whatever notification settings you put on your email and Slack apps, which channels are allowed through Do Not Disturb, and what ringtone wakes you up. There is no escalation. There is no secondary. There is you, and there is the alert that either arrives or does not.
Consider a 14-hour transpacific flight. First paying customer signed last month. The monitor will keep checking from 5 global regions every 5 minutes whether you are conscious or not, and it will send email if something breaks. The question is not whether the monitor works during the flight. It does. The question is whether you can accept a 14-hour coverage gap, whether you have told your customers about it, whether the failure mode during that gap is recoverable, and whether the alternative (never taking a flight again) is a worse outcome for the business than the controlled gap.
This is the central trade of a rotation-of-one. You are not choosing between coverage and no coverage. You are choosing which coverage gaps you will accept on purpose. The Indie Hackers community has documented this for years. Sergey Kyune wrote about facing one outage on a train returning from a holiday and handling two urgent customer requests during the break. That is the rotation-of-one in one sentence. The vacation does not stop the pager. The pager goes wherever your phone goes.
The principles that hold at 1 person are these. First, the only two questions worth optimizing are alert quality and recovery gap. Alert quality is whether the page is worth waking for. The recovery gap is how long the system can run alone before a failure becomes unrecoverable. Make the first as high as you can and the second as long as you can. Second, pre-announce coverage windows. Customers tolerate a stated 12-hour silence vastly better than an unstated 30-minute one. Third, you will not escape the cognitive cost of broken sleep. The clinical literature on this is decades old and still cited. Deary and Tait's 1987 BMJ study on medical house officers found that short-term recall was measurably impaired after a night of emergency admissions. The founder who answers the page at 3 AM and stays up debugging is not the same founder who should be making irreversible release decisions.
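The recovery gap rewards a concrete estimate rather than a feeling. A minimal sketch, with entirely hypothetical numbers, of the back-of-envelope arithmetic that tells you whether a given absence fits inside the gap:

```python
# Rough recovery-gap estimate: how long can the system run unattended
# before a slow failure becomes unrecoverable? All numbers hypothetical.

DISK_FREE_GB = 40             # free space on the database volume
LOG_GROWTH_GB_PER_DAY = 2.5   # observed growth rate if log rotation fails
CERT_DAYS_TO_EXPIRY = 21      # days until the TLS certificate must renew

disk_gap_days = DISK_FREE_GB / LOG_GROWTH_GB_PER_DAY
recovery_gap_days = min(disk_gap_days, CERT_DAYS_TO_EXPIRY)

print(f"Disk fills in {disk_gap_days:.0f} days; cert expires in {CERT_DAYS_TO_EXPIRY} days.")
print(f"Unattended recovery gap: about {recovery_gap_days:.0f} days.")
# A 14-hour flight sits well inside this gap; a 3-week absence does not.
```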
Reducing the alert surface so a rotation-of-one is survivable
A rotation-of-one survives if and only if it almost never fires. That is the whole game. Headcount math says you need 8 to absorb a 24x7 rotation. The way 1 person survives the same coverage requirement is by making the rate of pages low enough that the interruption is rare. The mechanism is alert quality, not headcount.
PagerDuty's own internal benchmark is the cleanest number to anchor on. At PagerDuty, if a team logs more than 15 alerts per week, they trigger an alert-quality debrief. For a team of 8 with primary and secondary, 15 alerts a week is the warning line. For a solo founder, treat 15 as a hard ceiling and aim for fewer than 5 per week steady state. More than that and the rotation-of-one is no longer survivable. You will sleep through one of them eventually, and the one you sleep through will be the one that mattered.
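If you want to hold yourself to those thresholds, the check is a few lines against whatever alert history you can export. A minimal sketch; the log format here is an assumption, not any tool's real export:

```python
from collections import Counter
from datetime import datetime

# Hypothetical alert log: (timestamp, monitor name). In practice this comes
# from your alerting tool's export or your alert-email archive.
alerts = [
    (datetime(2026, 1, 5, 3, 12), "api-latency"),
    (datetime(2026, 1, 5, 9, 40), "checkout-login"),
    (datetime(2026, 1, 7, 2, 55), "api-latency"),
]

SOLO_TARGET = 5    # steady-state goal for a rotation-of-one
HARD_CEILING = 15  # PagerDuty's own debrief trigger for a full team

weekly = Counter(ts.isocalendar()[:2] for ts, _ in alerts)  # key: (year, week)
for week, count in sorted(weekly.items()):
    status = ("ok" if count < SOLO_TARGET
              else "review" if count <= HARD_CEILING
              else "unsustainable")
    print(f"week {week}: {count} alerts -> {status}")
```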
What gets you under 5 alerts a week is structural. The piece on the four layers your monitoring should cover handles the depth question: HTTP, browser login monitor, multi-step API, status page. Both that piece and the configuration walk-through are upstream of this post. Get monitoring right first. Page rate is a downstream property of monitoring quality, and the most common pre-incident mistakes are covered separately.
The single largest source of false alerts at the solo-founder stage is single-region HTTP probes that flap on transient network conditions. A 200 OK from one city is the noisiest signal in the stack. The fix is the pairing most uptime monitors lack when they run alone: multi-region probes with consensus, plus a real browser login monitor that sees what users see. The browser login monitor catches the silent failures that HTTP misses. The multi-region consensus filters the noise that HTTP creates. Together they cut the alert rate by an order of magnitude versus a naive single-region uptime probe, and the configuration mistakes that small businesses make on the way there are documented separately.
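The consensus rule itself is small enough to state as code. A sketch of the quorum logic, with hypothetical regions and probe results; the point is that one flapping region never pages anyone:

```python
# Multi-region consensus: page only when a quorum of regions agrees the
# endpoint is down. A single flapping region is noise, not an incident.
# Region names and results are hypothetical.

probe_results = {
    "us-east": False,       # False = probe failed
    "eu-west": True,
    "ap-southeast": True,
    "us-west": True,
    "sa-east": True,
}

QUORUM = 2  # at least 2 of 5 regions must fail before anyone is paged

failed = [region for region, up in probe_results.items() if not up]
if len(failed) >= QUORUM:
    print(f"PAGE: endpoint down from {len(failed)} regions ({', '.join(failed)})")
else:
    print(f"suppress: {len(failed)} of {len(probe_results)} regions failing, below quorum")
```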
The 2-person rotation: never both away at once
At 2 people, you have a rotation. It is the simplest possible rotation, and it is enormously more humane than the rotation-of-one. The one rule is the entire spec: never both away at once.
Max Al Farakh from Jitbit wrote the canonical 2-founder version of this rule: "While one of us is away on a trip the other one stays home and online. We just have to plan our vacations ahead." That is the whole pattern. A shared calendar where you mark off your travel weeks. A shared Slack, Discord, or Microsoft Teams channel where the alert lands. A norm that whoever is "on" that week acknowledges the page within a defined window.
A partner's surgery, three weeks out, is a sharper version of the same question. Hospital all day, then recliner-sleeping for two nights afterward. Phone on silent. 20 customers depend on you. In a rotation-of-one, this is the kind of week you announce a coverage gap and hold your breath. In a 2-person rotation, you tell your co-founder which 96 hours you are dark and they pick up the pager. The mechanism is a Google Calendar event with their name on it. The tooling cost is zero. The structural cost is the conversation you had four months ago about how this would work the first time it came up.
What 2 people does not give you is severity routing, escalation, or quiet hours. You both get every alert. If alert volume is high, both of you are buzzing all night, and 2 people sleeping poorly is not better than 1 person sleeping poorly. The 2-person rotation works because the previous section worked: you got alert volume down first, then you added the second person to absorb vacation and surgery and weddings. Going from 1 to 2 without fixing the alert volume just doubles the misery.
The shared-channel pattern is the entire tooling story at 2 people. Slack with phone notifications enabled and Do Not Disturb allow-listed to the channel, or Discord with the same, or Microsoft Teams. Whoever is on for the week claims the page with an emoji or a one-word reply. The other person is the implicit safety net if the on-call person does not respond inside 15 minutes. There is no formal escalation policy because the escalation is "the other one of us." You are not yet at the headcount where a tool that schedules this is cheaper than the shared channel that does it implicitly.
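For the curious, the entire 2-person "rotation engine" fits in a dozen lines. A sketch that uses ISO week parity to pick who gets @-mentioned in the shared channel; the member IDs and webhook URL are placeholders, and the JSON body with a text field is Slack's standard incoming-webhook payload:

```python
import json
import urllib.request
from datetime import date

# Week-parity rotation for two founders: even ISO weeks page founder A,
# odd weeks page founder B. Member IDs and webhook URL are placeholders.
FOUNDERS = ["U01AAAAAA", "U02BBBBBB"]
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def post_alert(message: str, today: date | None = None) -> None:
    week = (today or date.today()).isocalendar()[1]
    on_call = FOUNDERS[week % 2]
    body = json.dumps({"text": f"<@{on_call}> {message}"}).encode()
    req = urllib.request.Request(
        WEBHOOK_URL, data=body, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

post_alert("checkout login monitor failing from 3 of 5 regions")
```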
The 3-person rotation: the first real schedule
At 3 people, you have something that resembles an actual on-call rotation as the SRE world recognizes it. You can do weekly shifts where one person is primary and the other two are off. You can do primary plus secondary, where two people are on and one is off. You can rotate evenly enough that no one carries more than 1 in 3 weeks on the pager.
The mechanics that show up here are real schedule mechanics. Opsgenie's documentation is a clean reference for the underlying patterns: daily, weekly, and custom rotations, day-and-time restrictions, and the multi-schedule pattern where primary and secondary are separate rotations that index against the same calendar. The Google SRE Book formalizes the same pattern: many teams have both a primary and a secondary on-call rotation, and the distribution of duties between the primary and the secondary varies from team to team. At 3 people, you are picking which variant.
A 9-day wedding 11 timezones away with unreliable resort Wi-Fi is the dreaded coverage-gap announcement at 1 person. At 2 people, the other one of you is covering and you should not have to think about the pager. At 3 people, you are not even the backup. You are off the schedule entirely for those 9 days, and the primary plus secondary slots are filled by the other two members. The wedding is just a wedding. Follow-the-sun in the SRE sense is still out of reach because you do not have the headcount per timezone for it. But the local version, where the timezones you live in cover most of your customers' waking hours and the gaps are short enough not to matter, is achievable.
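The 3-person schedule with an out-of-office override is equally small to express, which is part of why the tool decision below is about maintenance rather than capability. A sketch with hypothetical names and dates, including a 9-day wedding window:

```python
from datetime import date

# Weekly primary/secondary rotation for three people, with out-of-office
# overrides. Names and the OOO window are hypothetical.
TEAM = ["ana", "ben", "cho"]
OUT_OF_OFFICE = {"cho": (date(2026, 6, 5), date(2026, 6, 14))}  # the 9-day wedding

def on_call(day: date) -> tuple[str, str]:
    week = day.isocalendar()[1]
    available = [p for p in TEAM
                 if not (p in OUT_OF_OFFICE
                         and OUT_OF_OFFICE[p][0] <= day <= OUT_OF_OFFICE[p][1])]
    # Rotate the available roster by week number: index 0 is primary,
    # the next slot (if anyone is left) is secondary.
    primary = available[week % len(available)]
    secondary = available[(week + 1) % len(available)] if len(available) > 1 else primary
    return primary, secondary

print(on_call(date(2026, 6, 8)))   # cho is off: the other two fill both slots
print(on_call(date(2026, 6, 22)))  # full roster, normal weekly rotation
```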
The tool decision shifts at 3 people. Below 3, a tool that schedules rotations is overhead you will not maintain. At 3, a tool starts paying for itself. The two practical options are PagerDuty (Professional at $21 per user per month, billed annually) or Opsgenie's equivalent. Both replace the shared-channel pattern with explicit primary and secondary slots, escalation policies, and a calendar that anyone can read. Three seats of PagerDuty is roughly $63 per month at list price. The $0 floor is the shared channel. Pick based on whether the shared channel has started to fail, not on what the SRE blogs say you should have.
The contractor or part-time backup question
Sooner or later, a solo founder thinks about hiring a part-time contractor to take the pager during vacation. This is a reasonable thought. It is also the moment the answer leaves the territory of engineering and enters the territory of employment law, and the right framing here is that this is a question, not a rule.
The first technical backup is usually a part-time contractor in another country, on a 7-day rotation for the months you travel. The question that determines whether you owe them standby pay for hours they are awake-and-available but not actively working is jurisdictional, not universal.
In Canada, under the federal Labour Code, the framing has been summarized cleanly by Canadian HR Reporter: while stand-by or on-call employees are common to many industries, the time spent waiting for a call is not considered work. Provincial law layered on top can change this. The contract you write with the contractor can also change this. The provincial CNESST rules in Quebec, the Employment Standards Act in Ontario, and the equivalents in other provinces all deserve direct reading before you commit to a structure.
In the United States, this is harder. The Fair Labor Standards Act and state law diverge sharply. California and a handful of other states have reporting-time and on-call pay rules that federal FLSA does not. Whether your contractor is properly classified as a contractor versus an employee under both federal and state tests is a separate, larger question that predates the on-call pay question. Nothing in this paragraph is legal advice. The actionable version is: before you put a contractor on a recurring on-call rotation, ask an employment lawyer in their jurisdiction what the rules are. Pay the few hundred dollars for the consultation. It is cheaper than the ruling.
There is a structural version of this question too, separate from the legal one. A contractor on-call for one week of every eight is on a worse rotation than your own rotation-of-one, because they have less context and a worse chance of fixing what they get paged for. The rotation that works at 3+ dedicated people does not transplant cleanly onto 1 full-time founder plus 1 occasional contractor. The contractor backup is real, but it works best as a coverage-gap filler during announced absences, not as a permanent member of an ongoing rotation. Use them for the weeks you travel. Do not use them as the secondary on every alert.
What to walk back when a co-founder leaves
The hardest rotation transition is not 1 to 2 or 2 to 3. It is 2 back to 1. Until last month you had a 2-person rotation. Your co-founder left. You still have customer commitments the two of you made together, and a product that two people built that you alone now operate. The first instinct is to maintain everything and absorb the load. That instinct is wrong.
What to walk back is the part of the operational surface that required two humans. The 24x7 response window goes back to a stated business-hours window plus best-effort overnight. The 15-minute first-comms commitment, if you made one, goes back to a 30-minute one. The list of channels you accept incoming support on shrinks. The list of monitors that page you at 3 AM shrinks to the ones that genuinely require a 3 AM response, not the ones that were nice-to-have when you had a co-founder who could absorb them.
What does not walk back is customer trust. Tell customers what changed and what it means for them. The pattern from the 5-minute first-comms rule applies to structural changes too: state the change, state what the customer will experience, state what they can rely on going forward. A short, direct note to active customers that response windows have changed because the team has changed is read as professionalism. Silence followed by a missed 3 AM alert is read as abandonment.
The pricing question that often follows the co-founder departure is whether you can still offer the same tier of support you offered as a two-person company. The honest answer is usually no, and the honest move is to grandfather existing customers at the old terms while updating the public commitments on new sign-ups. Trying to maintain a two-person SLA as a one-person company is the fastest path to burning out the one person left.
How Velprove fits
Velprove is the detection layer, not the rotation layer. That is the entire honest answer, and it is worth stating directly because most monitoring vendors imply otherwise.
What Velprove does. Free, Starter, and Pro all route alerts to a single recipient via email plus account-level channels. Free ($0, no credit card required) includes 10 monitors at a 5-minute interval floor, 1 browser login monitor at 15 minutes, email alerts only, and multi-step monitors of up to 3 steps. Starter ($19 per month) adds Slack, Discord, Microsoft Teams, and webhook channels, opens the interval floor to 1 minute on HTTP and API monitors, raises the monitor cap to 25, includes 3 browser login monitors at a 10-minute floor, and raises multi-step monitors to 5 steps. Pro ($49 per month) adds PagerDuty as a 6th channel, opens HTTP and API intervals to 30 seconds, raises the cap to 100 monitors, includes 10 browser login monitors at a 5-minute floor, and raises multi-step monitors to 10 steps.
What Velprove does not do. No native on-call rotation scheduling. No quiet hours. No severity tiers or severity-conditional routing. No secondary contact or backup recipient field. No snooze or mute window on a monitor. No follow-the-sun multi-recipient routing. Velprove forwards to a single recipient per channel. The channel is shared (an account-level Slack webhook, an account-level Discord webhook, an account-level PagerDuty routing key), so the rotation pattern at 2 people is "both of you watch the channel, whoever is on this week answers." That pattern works. It is not a feature. It is a usage pattern.
What this means at each headcount. At 1 person, Velprove is your rotation, because the rotation is your phone and Velprove puts the alert on it. At 2 people, the shared Slack, Discord, or Teams channel is the rotation surface, and Velprove delivers into it. At 3+ with a real rotation requirement, Pro forwards to PagerDuty Professional (or Opsgenie via webhook, or incident.io, or Better Stack), and the rotation lives in the downstream tool. Velprove is the detector. The downstream tool is the scheduler. If you have outgrown forwarding, you have outgrown Velprove's natural use case as a rotation surface; the right architecture is to keep Velprove as the detector and let the rotation tool do the scheduling.
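If you are on Starter, which has the generic webhook channel but not the native PagerDuty one, the forwarding layer is small enough to run yourself. A minimal sketch of a webhook-to-PagerDuty bridge; the incoming payload shape is an assumption, while routing_key, event_action, and the payload fields follow PagerDuty's documented Events API v2 schema:

```python
import json
import urllib.request

# Minimal bridge from a detector's webhook into PagerDuty's Events API v2.
# The incoming payload shape is an assumption; the outgoing fields are
# PagerDuty's (routing_key, event_action, payload.summary/source/severity).
ROUTING_KEY = "YOUR_PAGERDUTY_INTEGRATION_KEY"  # placeholder

def forward_to_pagerduty(incoming: dict) -> None:
    event = {
        "routing_key": ROUTING_KEY,
        "event_action": "trigger",
        "dedup_key": incoming.get("monitor_id", "unknown"),  # one incident per monitor
        "payload": {
            "summary": incoming.get("message", "monitor failing"),
            "source": incoming.get("monitor_name", "uptime-monitor"),
            "severity": "critical",
        },
    }
    req = urllib.request.Request(
        "https://events.pagerduty.com/v2/enqueue",
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

forward_to_pagerduty({"monitor_id": "m-42", "monitor_name": "checkout-login",
                      "message": "login flow failing from 3 of 5 regions"})
```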
If you are weighing the broader vendor landscape rather than the Velprove plan ladder, the 2026 uptime monitoring tool comparison covers the matrix.
Frequently Asked Questions
Can a solo founder run an on-call rotation?
Yes, but it is a rotation of one. Reduce alert surface so the page rarely fires, accept controlled coverage gaps, and pre-announce them. Google's SRE Book sets the on-call floor at 8 engineers per single-site team, so at 1 person you trade coverage for survivability and tell customers what that trade looks like. The work shifts from scheduling to alert quality.
When do you need a PagerDuty-style tool for on-call?
At three people, roughly. At 1 person, the rotation lives on your phone. At 2 people, a shared Slack, Discord, or Teams channel plus a calendar covers it. At 3 people you need primary and secondary slots, and PagerDuty's Professional tier at $21 per user per month or an Opsgenie equivalent starts paying for itself. Below 3, the tool is overhead you do not have time to maintain.
How many on-call alerts per week is too many?
PagerDuty's internal benchmark is 15 alerts per week per on-call team before they trigger an alert-quality debrief. For a solo founder, treat 15 as a ceiling, not a target. If you are paged more than twice a day, the fix is fewer or smarter alerts, not a more resilient on-call. The cheapest rotation improvement at any team size is removing alerts that do not need a human.
Do you owe a part-time on-call backup standby pay?
In Canada, under the federal Labour Code, time spent waiting for a call is not considered work, so standby pay is not automatic. Provincial law and the specific contract can change that. In the United States, FLSA and state law diverge sharply on standby pay rules. This is a question for an employment lawyer in the backup's jurisdiction, not a rule you can resolve from a blog post.
How do solo founders take vacation while on-call?
Three options exist. Announce a coverage window and accept the gap, with proactive customer communication. Relocate temporarily to a timezone where peak-load hours are your daytime hours. Or pre-negotiate with a contractor or peer founder to take the pager for the week. The Indie Hackers community has documented this trade-off repeatedly, and none of the three options is free. You are choosing which cost to pay.
Does Velprove schedule on-call rotations?
No. Velprove handles detection and forwards to whatever rotation tool you already use. Free, Starter, and Pro all route alerts to a single recipient via email plus account-level Slack, Discord, Microsoft Teams, and webhook channels. Pro adds PagerDuty forwarding for teams that need a real schedule. If you have outgrown forwarding into a downstream rotation tool, you have outgrown Velprove's natural scope, and the right move is to keep Velprove as the detector and let the rotation tool do scheduling.
The rotation question does not get easier by ignoring it until the first vacation. It gets easier by deciding now what coverage you can sustain at your current headcount, what gaps you will pre-announce, and which alerts are worth waking a human for in the first place. Start a free Velprove account. 1 browser login monitor, 10 HTTP monitors, 5 global regions, email alerts, status page. No credit card required. The setup is five minutes. The rotation design is the part that takes a Tuesday afternoon, and this post is the outline for that afternoon.