An on-call handoff checklist should give the incoming responder enough context to avoid starting the shift blind. The handoff should cover open incidents, unstable systems, monitoring gaps, and any unusual operational risk. If the new primary has to reconstruct context from Slack threads and alerts, the handoff is not good enough.
This guide is about handoff discipline. For broader rotation design, see the On-call rotation guide.
The minimum handoff checklist
Every handoff should answer:
- are there active incidents?
- are there unresolved degradations?
- are any alerts noisy or unreliable right now?
- are there planned changes or maintenance windows soon?
- is any escalation context already in progress?
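One way to enforce this list is to treat each question as a required field that must be answered explicitly, even when the answer is "none". The following is a minimal sketch under that assumption; the field names and the `missing_answers` helper are hypothetical and not tied to any particular on-call tool.

```python
# Hypothetical field names, one per checklist question above.
REQUIRED_FIELDS = (
    "active_incidents",
    "unresolved_degradations",
    "noisy_alerts",
    "planned_changes",
    "escalation_context",
)


def missing_answers(handoff: dict) -> list[str]:
    """Return the checklist fields that the handoff leaves unanswered.

    An empty list or the string "none" counts as an explicit answer;
    a missing key, None, or a blank string does not.
    """
    missing = []
    for field in REQUIRED_FIELDS:
        value = handoff.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


draft = {
    "active_incidents": ["API write degradation in EU"],
    "unresolved_degradations": [],   # explicitly "nothing to report"
    "noisy_alerts": "none",
    "planned_changes": "",           # left blank, so still unanswered
}
print(missing_answers(draft))
```

The point of the design is that "nothing to report" is written down rather than implied, so the incoming responder can tell a quiet shift from an incomplete handoff.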
1. Active incidents
List any incident that is still active or still being monitored after mitigation.
Include:
- incident title
- current severity
- current status
- next expected update or review point
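As an illustration, the four fields above can be captured in a small record so that no entry is handed over without a severity or a next review point. This is only a sketch: the `ActiveIncident` class and its `summary` method are hypothetical names, and the review time in the example is made up.

```python
from dataclasses import dataclass


@dataclass
class ActiveIncident:
    """One active-incident entry in a handoff note (hypothetical structure)."""

    title: str
    severity: str      # e.g. "Sev 2"
    status: str        # e.g. "monitoring recovery"
    next_update: str   # next expected update or review point

    def summary(self) -> str:
        """Render the entry as a single line for the handoff note."""
        return f"{self.title}, {self.severity}, {self.status} (next: {self.next_update})"


incident = ActiveIncident(
    title="API write degradation in EU",
    severity="Sev 2",
    status="monitoring recovery",
    next_update="review at 23:00 UTC",  # illustrative value only
)
print(incident.summary())
```

Because every field is required, constructing an entry without a next update raises a `TypeError` instead of silently producing a partial handoff.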
2. Known unstable systems
Not every risk is a declared incident yet.
Examples:
- one region showing intermittent latency
- dependency provider with ongoing degraded status
- queue depth growing but not yet customer-visible
These belong in handoff because they may become incidents during the next shift.
3. Monitoring gaps or noisy alerts
The incoming responder should know if:
- one monitor is flapping
- a timeout threshold is too aggressive
- a monitoring region is temporarily disabled
- an alert is currently unactionable noise
That context prevents wasted response effort.
4. Planned changes and maintenance
Any near-term risky event should be included.
Examples:
- deploy scheduled in the next few hours
- maintenance window tonight
- certificate change
- infrastructure migration in progress
5. Escalation context
If leadership, support, or customers are already involved, the incoming responder should know:
- who is expecting updates
- where updates are being posted
- whether a status page incident already exists
For communication structure, see the Incident communication templates.
A simple handoff format
| Section | Example |
|---|---|
| Active incidents | API write degradation in EU, Sev 2, monitoring recovery |
| Open risks | Provider latency elevated, not customer-visible yet |
| Monitoring issues | Webhook monitor flapping from one region |
| Planned work | DB maintenance at 01:00 UTC |
| Notes | Support team already has customer-facing summary |
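If handoffs are written from a script or chat command, the table above could be generated so that every section always appears, even when empty. The sketch below assumes that approach; `SECTION_ORDER`, `render_handoff`, and the "Details" column header are hypothetical choices, not an established format.

```python
# Section names taken from the table above; the fixed order is an assumption.
SECTION_ORDER = (
    "Active incidents",
    "Open risks",
    "Monitoring issues",
    "Planned work",
    "Notes",
)


def render_handoff(sections: dict[str, str]) -> str:
    """Render a handoff note as a markdown table with one row per section.

    Sections that are missing or blank are rendered as "None" so the
    reader can see they were considered, not forgotten.
    """
    rows = ["| Section | Details |", "|---|---|"]
    for name in SECTION_ORDER:
        detail = sections.get(name, "").strip() or "None"
        rows.append(f"| {name} | {detail} |")
    return "\n".join(rows)


note = render_handoff({
    "Monitoring issues": "Webhook monitor flapping from one region",
    "Planned work": "DB maintenance at 01:00 UTC",
})
print(note)
```

Keeping the section order fixed means responders learn where to look, which matters more than the exact wording of any one row.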
Practical rule
If the next responder can read the handoff in under two minutes and come away knowing what to watch first, the handoff is probably good.
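The length half of this rule can be checked roughly by estimating reading time from word count. The sketch below assumes a reading speed of 200 words per minute, which is a placeholder rather than a measured figure, and the function names are hypothetical. It cannot tell whether the note makes clear what to watch first; that still needs a human read.

```python
ASSUMED_WORDS_PER_MINUTE = 200  # assumption; tune for your team


def estimated_read_seconds(text: str, wpm: int = ASSUMED_WORDS_PER_MINUTE) -> float:
    """Estimate reading time in seconds from a whitespace word count."""
    words = len(text.split())
    return words * 60 / wpm


def fits_two_minute_rule(text: str, wpm: int = ASSUMED_WORDS_PER_MINUTE) -> bool:
    """Return True if the estimated reading time is under two minutes."""
    return estimated_read_seconds(text, wpm) < 120


handoff_note = "Provider latency elevated, not customer-visible yet. " * 10
print(fits_two_minute_rule(handoff_note))
```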
FAQ
What is the goal of an on-call handoff?
The goal is to transfer operational context clearly enough that the incoming responder can spot risk quickly and avoid losing time reconstructing the current state.
How detailed should an on-call handoff be?
Detailed enough to highlight real risk, but short enough that the incoming responder will actually read it. It should be concise, not a full retrospective.
Should noisy monitors be mentioned in handoff?
Yes. Known monitoring noise is operational context and should be handed over clearly.