Incidents
Incidents in PixoMonitor track service disruptions from detection to resolution. They provide a timeline of what happened, who was involved, and how long it took to resolve. Incidents can be created automatically by monitors or manually when you're aware of an issue.
Overview
Every incident tracks:
- Title — Description of the incident
- Status — Current state (Investigating, Identified, Monitoring, Resolved)
- Severity — Impact level (Critical, High, Medium, Low)
- Timeline — When it started and was resolved
- Acknowledgement — Who responded and when
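As a rough mental model, the tracked fields above can be pictured as a small record type. The sketch below is illustrative only: the class and field names (`Incident`, `started_at`, `acknowledged_by`, and so on) are assumptions made for this example and are not PixoMonitor's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    INVESTIGATING = "Investigating"
    IDENTIFIED = "Identified"
    MONITORING = "Monitoring"
    RESOLVED = "Resolved"


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass
class Incident:
    # Hypothetical field names; the real data model may differ.
    title: str
    status: Status
    severity: Severity
    started_at: datetime
    resolved_at: Optional[datetime] = None
    acknowledged_by: Optional[str] = None
    acknowledged_at: Optional[datetime] = None

    @property
    def is_active(self) -> bool:
        """An incident counts as active until its status is Resolved."""
        return self.status is not Status.RESOLVED
```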
Automatic Incidents
When a monitor detects a failure (status DOWN), PixoMonitor automatically:
- Creates a new incident linked to the monitor
- Sets status to "Investigating"
- Triggers alerts through configured channels
- Starts escalation policy (if configured)
When the monitor recovers (status UP):
- Updates incident status to "Resolved"
- Records the resolution time
- Notifies subscribers of the recovery
Automatic incidents use the monitor name as the incident title. You can edit the title and add details at any time.
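The open-on-DOWN, resolve-on-UP behavior described above can be sketched as a small state handler. This is a simplified illustration, not PixoMonitor's implementation: `on_monitor_status`, the in-memory `open_incidents` store, and the choice to ignore repeated DOWN events while an incident is already open are all assumptions. Alerts, escalation, and subscriber notifications are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Incident:
    monitor: str
    title: str
    status: str
    started_at: datetime
    resolved_at: Optional[datetime] = None


# Open incidents keyed by monitor name (in-memory stand-in for real storage).
open_incidents: Dict[str, Incident] = {}


def on_monitor_status(monitor: str, status: str, at: datetime) -> Optional[Incident]:
    """Open an incident on DOWN and resolve the open one on UP.

    Returns the incident that was created or resolved, or None if nothing changed.
    """
    if status == "DOWN" and monitor not in open_incidents:
        # The monitor name doubles as the initial incident title.
        incident = Incident(monitor, title=monitor, status="Investigating", started_at=at)
        open_incidents[monitor] = incident
        return incident
    if status == "UP" and monitor in open_incidents:
        incident = open_incidents.pop(monitor)
        incident.status = "Resolved"
        incident.resolved_at = at  # resolution time is the recovery time
        return incident
    return None
```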
Manual Incidents
Create incidents manually when you're aware of an issue that monitors haven't detected, or for issues that span multiple services.
Create a Manual Incident
Navigate to Incidents
Go to Incidents in the sidebar and click New Incident.
Fill in details
- Title — Clear description of the issue
- Monitor — Associate with a monitor (required)
- Severity — Select the impact level
Create
Click Create Incident. The incident is now active and visible on your status page.
Incident Statuses
Track incident progress through standard status levels:
| Status | Description | When to Use |
|---|---|---|
| Investigating | Issue detected, investigating cause | Initial status |
| Identified | Root cause identified, working on fix | After diagnosis |
| Monitoring | Fix deployed, monitoring for stability | After fix deployed |
| Resolved | Issue fully resolved | When complete |
Status Flow
The typical status progression:
Investigating → Identified → Monitoring → Resolved
You can skip statuses if appropriate (e.g., go directly from Investigating to Resolved for quick fixes).
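To make the idea of skipping concrete, here is a minimal sketch that lists which statuses a given change passes over. The helper name `skipped_statuses` is hypothetical and exists only for this example.

```python
from typing import List

# The typical progression, in order.
STATUS_ORDER = ["Investigating", "Identified", "Monitoring", "Resolved"]


def skipped_statuses(current: str, new: str) -> List[str]:
    """Return the statuses strictly between `current` and `new` in the typical flow."""
    start = STATUS_ORDER.index(current)
    end = STATUS_ORDER.index(new)
    return STATUS_ORDER[start + 1 : end]
```

For example, `skipped_statuses("Investigating", "Resolved")` returns `["Identified", "Monitoring"]`, while a single step forward returns an empty list.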
Severity Levels
Severity indicates the impact of an incident:
| Severity | Impact | Examples |
|---|---|---|
| Critical | Complete outage, major data loss | All services down, security breach |
| High | Significant impact, major feature unavailable | Primary function broken |
| Medium | Moderate impact, workaround available | Degraded performance |
| Low | Minor issue, minimal user impact | Minor bug, cosmetic issue |
Severity and Escalation
Severity affects escalation behavior:
- Critical incidents trigger immediate escalation
- High incidents may have shorter escalation delays
- Medium/Low incidents follow normal escalation timing
Set severity based on user impact, not technical complexity. A "simple" bug that affects all users is more severe than a complex bug affecting few.
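One way to picture how severity could shape escalation timing is a lookup from severity to delay. Every number here is invented for illustration: the 15-minute default and the halved delay for High are assumptions, and real timing comes from your escalation policy configuration.

```python
from datetime import timedelta

# Hypothetical default delay; not a PixoMonitor setting.
NORMAL_DELAY = timedelta(minutes=15)


def escalation_delay(severity: str, normal: timedelta = NORMAL_DELAY) -> timedelta:
    """Return an illustrative escalation delay for a severity level."""
    if severity == "Critical":
        return timedelta(0)  # escalate immediately
    if severity == "High":
        return normal / 2  # assumed "shorter" delay; the real factor is not documented
    if severity in ("Medium", "Low"):
        return normal  # normal escalation timing
    raise ValueError(f"unknown severity: {severity!r}")
```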
Acknowledging Incidents
Acknowledging an incident indicates someone is actively working on it. This:
- Stops escalation to additional responders
- Records who acknowledged and when
- Shows the team that the incident is being handled
Acknowledge an Incident
Open the incident
Go to Incidents and click on the active incident.
Click Acknowledge
Click the Acknowledge button. This records:
- Your user account
- Timestamp
- Acknowledgement method (web, API, etc.)
Acknowledgement Information
After acknowledgement, the incident shows:
- Who acknowledged
- When they acknowledged
- How they acknowledged (web, API, SMS link, etc.)
Only one person needs to acknowledge an incident. Once an incident is acknowledged, escalation stops for that incident.
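The acknowledgement record and its effect on escalation can be sketched as follows. The names (`Acknowledgement`, `acknowledge`, `escalation_active`) are hypothetical, and keeping the first acknowledgement when a second person acknowledges is an assumption of this sketch, not documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Acknowledgement:
    user: str      # who acknowledged
    at: datetime   # when they acknowledged
    method: str    # how they acknowledged, e.g. "web", "API", "SMS link"


@dataclass
class Incident:
    title: str
    acknowledgement: Optional[Acknowledgement] = None

    @property
    def escalation_active(self) -> bool:
        """Escalation continues only while nobody has acknowledged."""
        return self.acknowledgement is None


def acknowledge(incident: Incident, user: str, method: str, at: datetime) -> Acknowledgement:
    """Record the first acknowledgement; later calls leave it unchanged (assumed)."""
    if incident.acknowledgement is None:
        incident.acknowledgement = Acknowledgement(user, at, method)
    return incident.acknowledgement
```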
Updating Incidents
Keep stakeholders informed by updating incidents as you learn more.
Update Status
Open the incident
Go to Incidents and select the incident to update.
Change status
Select the new status from the dropdown.
Save
Changes are saved automatically, with no separate save action, and are reflected on the status page.
Update Severity
If the impact turns out to be worse or better than first thought, update the severity level to match the current impact.
Update Title
Edit the title to better describe the incident as you learn more about it.
Resolving Incidents
Mark an incident as resolved when the issue is fully fixed and confirmed stable.
Resolve an Incident
Verify the fix
Confirm the issue is resolved and monitors show UP status.
Update status to Resolved
Change the incident status to Resolved.
Confirm resolution
PixoMonitor records the resolution time and calculates total duration.
Automatic Resolution
When a monitor recovers from DOWN to UP, associated incidents are automatically resolved. The resolution timestamp is set to the time the monitor recovered.
Incident Timeline
Every incident maintains a timeline showing:
- When it started
- Status changes
- Who made changes
- When it was resolved
Viewing the Timeline
Open any incident to see its complete history. The timeline shows:
- Initial detection or creation
- All status updates
- Acknowledgement events
- Resolution
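A timeline like this can be thought of as a list of timestamped events shown in chronological order. The sketch below is only a model of that idea: the `TimelineEvent` fields, the event kind strings, and the "system" label for automatic events are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(order=True)
class TimelineEvent:
    at: datetime                                    # only field used for ordering
    kind: str = field(compare=False)                # e.g. "created", "status_change"
    actor: Optional[str] = field(default=None, compare=False)
    detail: str = field(default="", compare=False)


def render_timeline(events: Iterable[TimelineEvent]) -> List[str]:
    """Format events oldest first, one line per event."""
    lines = []
    for event in sorted(events):
        who = event.actor or "system"  # no actor means an automatic event
        suffix = f": {event.detail}" if event.detail else ""
        lines.append(f"{event.at:%H:%M} {event.kind} by {who}{suffix}")
    return lines
```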
Incident Metrics
Track incident performance over time:
Key Metrics
| Metric | Description |
|---|---|
| Time to Acknowledge (TTA) | Time from incident start to acknowledgement |
| Time to Resolve (TTR) | Total time from start to resolution |
| Downtime Duration | How long the service was unavailable |
| Incident Count | Number of incidents per monitor/period |
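The time-based metrics in the table are differences between timestamps, and incident count is a tally per monitor. A minimal sketch, assuming you have the start, acknowledgement, and resolution times available (the helper names `elapsed` and `incident_counts` are hypothetical):

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Optional


def elapsed(start: datetime, end: Optional[datetime]) -> Optional[timedelta]:
    """Duration from start to end, or None if the end event has not happened yet."""
    return None if end is None else end - start


def incident_counts(monitor_names: Iterable[str]) -> Counter:
    """Tally incidents per monitor, given one monitor name per incident."""
    return Counter(monitor_names)


started = datetime(2024, 1, 1, 12, 0)
acknowledged = datetime(2024, 1, 1, 12, 4)
resolved = datetime(2024, 1, 1, 12, 45)

tta = elapsed(started, acknowledged)  # 4 minutes
ttr = elapsed(started, resolved)      # 45 minutes
```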
Using Metrics
These metrics help you:
- Identify monitors that fail frequently
- Track response time improvements
- Meet SLA commitments
- Plan capacity improvements
Filtering Incidents
Find specific incidents using filters:
Filter by Severity
View only Critical, High, Medium, or Low severity incidents to focus on what matters most.
Filter by Status
- Active — All unresolved incidents
- Resolved — Past incidents
Filter by Monitor
View incidents for a specific monitor to understand its history.
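The three filters combine naturally: an incident is shown only if it matches every filter that is set. The sketch below illustrates that logic on plain records. `filter_incidents` and the `Incident` fields are hypothetical names for this example, not a PixoMonitor API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Incident:
    title: str
    monitor: str
    severity: str
    status: str


def filter_incidents(
    incidents: Iterable[Incident],
    severity: Optional[str] = None,
    active: Optional[bool] = None,
    monitor: Optional[str] = None,
) -> List[Incident]:
    """Return incidents matching every filter that is not None."""
    result = []
    for inc in incidents:
        if severity is not None and inc.severity != severity:
            continue
        # "Active" means any status other than Resolved.
        if active is not None and (inc.status != "Resolved") != active:
            continue
        if monitor is not None and inc.monitor != monitor:
            continue
        result.append(inc)
    return result
```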
Linking to Post-Mortems
After resolving significant incidents, create a post-mortem to document:
- What happened
- Root cause
- Impact
- Action items to prevent recurrence
See the Post-Mortems guide for details.
Create post-mortems for all Critical and High severity incidents, plus any incident that teaches valuable lessons.
Status Page Integration
Active incidents automatically appear on your public status page:
- Incident title is visible to users
- Status updates show progress
- Users can see that you are aware of the issue and working on it
- Resolution is announced when complete
What Users See
| Incident Status | Status Page Display |
|---|---|
| Investigating | "We are investigating this issue" |
| Identified | "The issue has been identified" |
| Monitoring | "A fix has been implemented" |
| Resolved | "This incident has been resolved" |
Subscriber Notifications
If you have subscribers enabled, they receive automatic emails:
| Event | Notification |
|---|---|
| New Incident | "New incident affecting [monitor]" |
| Status Update | "Update: [incident title]" |
| Resolved | "Resolved: [incident title]" |
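The notification text in the table follows a simple template per event. As an illustration only, the function below fills in those templates. `notification_text` is a hypothetical helper, and the event keys are taken directly from the table.

```python
# Templates copied from the table above; placeholders are filled per incident.
TEMPLATES = {
    "New Incident": "New incident affecting {monitor}",
    "Status Update": "Update: {title}",
    "Resolved": "Resolved: {title}",
}


def notification_text(event: str, title: str, monitor: str) -> str:
    """Build the notification text for an event (hypothetical helper)."""
    if event not in TEMPLATES:
        raise ValueError(f"unknown event: {event!r}")
    return TEMPLATES[event].format(title=title, monitor=monitor)
```

For example, `notification_text("Resolved", "API returning 500 errors", "api")` produces `"Resolved: API returning 500 errors"`.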
Best Practices
- Acknowledge quickly — Stop escalation and show the team it's being handled
- Update regularly — Keep stakeholders informed, even if just "still investigating"
- Use appropriate severity — Based on user impact, not technical complexity
- Write clear titles — "API returning 500 errors" is better than "API down"
- Document everything — Future you will thank present you
- Create post-mortems — Learn from incidents to prevent recurrence
- Track metrics — Measure and improve response times
Don't delete incidents to "clean up" history. The incident record is valuable for understanding patterns and meeting compliance requirements.
