Incident severity levels classify operational problems by customer impact and urgency. A useful severity model helps teams decide who gets paged, how often updates go out, and what response process to follow. If teams cannot classify incidents consistently, they tend to over-escalate minor issues and underreact to serious ones.
If you want the product workflow side, see Incident management. This guide focuses on the severity model itself.
Why severity levels matter
Severity levels are not just labels. They affect:
- who joins the response
- which communication channel is used
- how often updates are published
- when leadership or customers are notified
- when work is treated as business-critical
Without a clear model, every incident becomes a debate.
A simple severity framework
Many teams do well with four levels.
| Severity | Meaning | Example |
|---|---|---|
| Sev 1 | Major outage or major business impact | Login is down for all customers |
| Sev 2 | Significant partial outage | API writes fail for one region |
| Sev 3 | Degradation with workaround | Email notifications delayed |
| Sev 4 | Low-impact issue | Cosmetic dashboard bug noticed during incident review |
The exact labels matter less than clear definitions.
Define severity by impact, not by technical drama
The most common mistake is classifying incidents by how interesting they are technically.
Classify by:
- customer impact
- scope (how many users, regions, or features are affected)
- duration risk
- business criticality
Do not classify by:
- the number of internal systems involved
- how noisy the logs look
- whether the root cause seems complex
A practical severity matrix
Use three questions:
- How many users are affected?
- What can they no longer do?
- Is there a workaround?
Example:
| Scenario | Recommended severity |
|---|---|
| Entire login flow unavailable | Sev 1 |
| API latency doubled, but requests still succeed | Sev 3 |
| Webhook delivery delayed for some customers | Sev 2 or Sev 3 depending on duration and scope |
| One admin-only reporting page broken | Sev 4 |
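The three questions above can be sketched as a small classifier. This is a minimal Python illustration rather than a standard: the `Impact` fields, the `classify` function, and the 50% and 10% thresholds are all assumptions, chosen only so that the unambiguous examples in this guide come out as listed. Borderline cases, such as delayed webhooks, still need human judgment about duration and scope.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # major outage or major business impact
    SEV2 = 2  # significant partial outage
    SEV3 = 3  # degradation with workaround
    SEV4 = 4  # low-impact issue


@dataclass(frozen=True)
class Impact:
    affected_fraction: float     # How many users are affected? (0.0 to 1.0)
    core_function_blocked: bool  # What can they no longer do?
    has_workaround: bool         # Is there a workaround?


def classify(impact: Impact) -> Severity:
    """Map the three questions to a severity level.

    The 0.5 and 0.1 thresholds are hypothetical; tune them to your product.
    """
    if impact.core_function_blocked and not impact.has_workaround:
        # A blocked core function with no workaround is an outage;
        # scope decides whether it is major or partial.
        return Severity.SEV1 if impact.affected_fraction >= 0.5 else Severity.SEV2
    if impact.core_function_blocked or impact.affected_fraction >= 0.1:
        # Either a workaround exists, or the service is degraded for many users.
        return Severity.SEV3
    return Severity.SEV4


login_down = Impact(affected_fraction=1.0, core_function_blocked=True, has_workaround=False)
slow_api = Impact(affected_fraction=1.0, core_function_blocked=False, has_workaround=True)
admin_page = Impact(affected_fraction=0.01, core_function_blocked=False, has_workaround=False)

print(classify(login_down).name, classify(slow_api).name, classify(admin_page).name)
```

Note that the inputs describe customer impact only. Nothing about log volume or the number of internal systems involved enters the decision.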
Tie severity to communication cadence
Severity should also define update expectations.
| Severity | Typical update cadence |
|---|---|
| Sev 1 | Every 10-15 minutes |
| Sev 2 | Every 15-30 minutes |
| Sev 3 | Every 30-60 minutes |
| Sev 4 | As needed |
That makes the model operational instead of theoretical.
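One way to make the cadence enforceable is to store it as data and compute when the next update is due. The sketch below is a minimal Python example; the `UPDATE_CADENCE_MINUTES` mapping and the `next_update_due` helper are hypothetical names, and using the upper bound of each range as the deadline is an assumption, not a rule.

```python
from datetime import datetime, timedelta
from typing import Optional

# (earliest, latest) minutes between updates, taken from the cadence table.
# None means "as needed": no fixed deadline.
UPDATE_CADENCE_MINUTES = {
    "sev1": (10, 15),
    "sev2": (15, 30),
    "sev3": (30, 60),
    "sev4": None,
}


def next_update_due(severity: str, last_update: datetime) -> Optional[datetime]:
    """Return the latest acceptable time for the next update, or None."""
    cadence = UPDATE_CADENCE_MINUTES[severity]
    if cadence is None:
        return None
    _, latest_minutes = cadence
    return last_update + timedelta(minutes=latest_minutes)


last = datetime(2024, 1, 1, 12, 0)
print(next_update_due("sev1", last))  # 2024-01-01 12:15:00
print(next_update_due("sev4", last))  # None
```

A reminder bot or on-call tool could compare this deadline with the current time and nudge the incident lead when an update is overdue.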
Keep the model small
Too many severity levels create ambiguity.
For most SaaS teams, 3 to 4 levels are enough. If responders cannot explain the difference between Sev 2 and Sev 2.5 in 30 seconds, the model is too complex.
Example: classifying a payment incident
Scenario:
- checkouts fail for 40% of customers
- account logins still work
- status page and docs remain available
This is usually Sev 1 or Sev 2, depending on how much the business depends on checkout. Either way, it should almost never be treated as a low-priority degradation, because the customer-facing impact is too direct.
Once severity is clear, use Incident communication templates to keep updates consistent.
FAQ
How many incident severity levels should a SaaS team use?
Most teams should use 3 or 4 levels. That is usually enough to separate major outages from partial outages, degradations, and low-impact issues.
Should severity be based on technical root cause?
No. Severity should be based primarily on customer impact, service scope, and business risk.
Can an incident change severity during the response?
Yes. If impact expands or recovery takes longer than expected, the incident should be reclassified and communication cadence should change with it.
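Reclassification is easier to audit when each change is recorded with a reason. The following is a minimal Python sketch under stated assumptions: the `Incident` class, its fields, and the `reclassify` method are hypothetical and do not come from any specific incident tool.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Incident:
    title: str
    severity: int  # 1 = most severe, 4 = least severe
    # Each entry is (old_severity, new_severity, reason).
    history: List[Tuple[int, int, str]] = field(default_factory=list)

    def reclassify(self, new_severity: int, reason: str) -> bool:
        """Change severity and log why. Returns True if the level changed."""
        if not 1 <= new_severity <= 4:
            raise ValueError("severity must be between 1 and 4")
        if new_severity == self.severity:
            return False
        self.history.append((self.severity, new_severity, reason))
        self.severity = new_severity
        return True


incident = Incident(title="Webhook delivery delayed", severity=3)
if incident.reclassify(2, "delay now affects all customers"):
    # The caller would switch to the update cadence of the new level here.
    print(f"{incident.title}: now Sev {incident.severity}")
```

Returning a boolean lets the caller change the communication cadence only when the level actually changed.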