# KPI Source of Truth
This page is the human-readable source for every KPI we track. The database tables and API mirror this page; nothing ships without an entry here.
## What “authoritative” means

- One definition: Each KPI has a unique `slug` and a single definition that feeds all scorecards (exec, stakeholder, responder readiness).
- One formula: The calculation is written once, stored in the Worker/D1 layer, and reused by every consumer.
- One ownership path: Every KPI has an owner and a review cadence.
## Fields to capture (per KPI)

| Field | Description |
|---|---|
| `slug` | Lowercase identifier used in the API and DB (`mttr`, `cadence-adherence`). |
| `name` | Friendly display name. |
| `description` | What the KPI measures and why it matters. |
| `formula` | Human-readable formula; the implementation lives in the Worker. |
| `units` | e.g., minutes, percentage, count. |
| `targets` | Optional thresholds by severity or persona (e.g., P1 target 60m). |
| `data_sources` | Which events/fields power it (timeline events, milestones, alerts). |
| `visibility` | `internal` or `exec`; controls where it renders. |
| `owner` | Role or person accountable for accuracy. |
| `review_cadence` | How often the definition is re-validated. |
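As a rough sketch, these fields might map to a record type in the Worker layer like the following (the type name and exact D1 column shapes are assumptions, not the shipped code):

```ts
// Illustrative shape of a KPI definition row; field names mirror the table above.
interface KpiDefinition {
  slug: string;                      // lowercase identifier, e.g. "mttr"
  name: string;                      // friendly display name
  description: string;               // what it measures and why it matters
  formula: string;                   // human-readable; computation lives in the Worker
  units: "minutes" | "percentage" | "count" | "score";
  targets?: Record<string, string>;  // e.g. { P1: "120m", P2: "240m" }
  data_sources: string[];            // events/fields that power it
  visibility: "internal" | "exec";   // controls where it renders
  owner: string;                     // role or person accountable for accuracy
  review_cadence: string;            // how often the definition is re-validated
}
```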
## Current KPIs

| slug | name | description | units | formula (conceptual) | targets | visibility | owner |
|---|---|---|---|---|---|---|---|
| `mttr` | Mean Time to Resolve | Average time from `incident.detected_at` to `resolved_at`. | minutes | `avg(resolved_at - detected_at)` | P1: 120m, P2: 240m | exec | Ops Leader |
| `mtta` | Mean Time to Acknowledge | Average time from alert issuance to first MIM action (phase 1 → 2). | minutes | `avg(phase_entered_at(gather) - alert.issueTime)` | P1: 10m, P2: 20m | internal | MIM Lead |
| `cadence-adherence` | Update Cadence Adherence | Percent of milestones delivered on/before the committed ETA. | % | `on_time_milestones / total_milestones` | ≥90% | exec | Comms Lead |
| `stakeholder-satisfaction` | Stakeholder Signal Score | Rolling post-incident survey score (1–5). | score | `avg(survey_score)` | ≥4.5 | exec | Customer Ops |
| `team-page-to-bridge` | Team Bridge Arrival Time | Time from page sent (`team_pages.paged_at`) to team on bridge (`team_pages.arrived_at`). Measured per team dispatch, per incident. | minutes | `avg(arrived_at - paged_at)` per team | Box3: ≤5m, Box2: ≤10m, Box1: ≤20m | internal | MIM Lead |
| `team-page-to-ack` | Team Acknowledgment Time | Time from page sent to acknowledgment. Only computed where `acknowledged_at` is present. | minutes | `avg(acknowledged_at - paged_at)` per team | Box3: ≤2m, Box2: ≤5m | internal | MIM Lead |
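To make the conceptual formula column concrete, here is a minimal sketch of the `mttr` computation as a pure function (the record shape is an assumption; the real implementation lives in the Worker):

```ts
// Hypothetical MTTR: average of (resolved_at - detected_at) in minutes.
function mttrMinutes(incidents: { detected_at: Date; resolved_at: Date }[]): number {
  if (incidents.length === 0) return 0;
  const totalMinutes = incidents.reduce(
    (sum, i) => sum + (i.resolved_at.getTime() - i.detected_at.getTime()) / 60_000,
    0
  );
  return totalMinutes / incidents.length;
}

// Example: two incidents resolved in 90m and 150m average to 120m, the P1 target.
```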
Add new KPIs by appending to this table and adding a `KpiDefinition` row via a migration or the admin tool. The API rejects observations for unknown slugs.
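A minimal sketch of that guard in the Worker (the handler shape and the `kpi_definitions` table name are assumptions):

```ts
// Hypothetical ingest guard: observations for unknown slugs are rejected with a 400.
// D1Database comes from @cloudflare/workers-types.
async function postObservation(
  env: { DB: D1Database },
  body: { kpi_slug: string; value: number }
): Promise<Response> {
  const def = await env.DB
    .prepare("SELECT slug FROM kpi_definitions WHERE slug = ?")
    .bind(body.kpi_slug)
    .first();
  if (!def) {
    return new Response(`Unknown KPI slug: ${body.kpi_slug}`, { status: 400 });
  }
  // ...insert the KpiObservation row here...
  return new Response("created", { status: 201 });
}
```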
## Team Dispatch Credit (Response Readiness Score)
The long-term goal of team-page-to-bridge and team-page-to-ack is not just to track a single incident — it is to build a per-team response record that accumulates across incidents into something meaningful.
The reference model is the fire service's ISO Public Protection Classification (PPC) — a 1–10 score that evaluates a fire department's infrastructure, staffing, training, and water supply. That score determines homeowner insurance rates in the department's coverage area. A department with a class 1 rating (best) can demonstrably justify its budget. A class 8 rating tells the insurer something different.
MajorOps's equivalent is not a single incident's page-to-bridge time. It is the rolling trend: how consistently does a team respond to dispatch at the alarm level they're rated for?
A team that consistently arrives in under 5 minutes on Box 3 incidents has a strong dispatch credit. A team that averages 18 minutes has a story to tell — and that story belongs in training planning, on-call rotation decisions, and staffing conversations.
### MVP (Phase 1)
What gets captured:
- Team name: structured, from the pre-configured teams list — no free text drift
- Contact name: free text, e.g. Alex Kim (on-call) — no account verification required
- Page time: when the MIM sent the page (paged_at)
- Bridge arrival time: when the MIM observes the team on the call (arrived_at) — manually recorded
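Under these assumptions, the captured fields might land in a D1 table like the following sketch (column names echo the fields above; constraints are illustrative, not the shipped schema):

```ts
// Hypothetical D1 migration for the Phase 1 capture fields.
const CREATE_TEAM_PAGES = `
CREATE TABLE IF NOT EXISTS team_pages (
  id           INTEGER PRIMARY KEY,
  incident_id  TEXT NOT NULL,
  team_name    TEXT NOT NULL, -- structured, from the pre-configured teams list
  contact_name TEXT,          -- free text, e.g. "Alex Kim (on-call)"
  paged_at     TEXT NOT NULL, -- when the MIM sent the page (ISO-8601)
  arrived_at   TEXT           -- manually recorded bridge arrival; null until observed
);`;
```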
What gets computed (per team, per alarm level):
- team-page-to-bridge — arrival latency, queryable across incidents
- Trend over time: rolling average for the last 30/60/90 days
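A sketch of the rolling-average query over the assumed `team_pages` table (SQLite date math via `julianday`; the window length is parameterized):

```ts
// Hypothetical trend query: per-team average page-to-bridge minutes over a window.
async function pageToBridgeTrend(env: { DB: D1Database }, windowDays: number) {
  return env.DB
    .prepare(
      `SELECT team_name,
              AVG((julianday(arrived_at) - julianday(paged_at)) * 24 * 60) AS avg_minutes
         FROM team_pages
        WHERE arrived_at IS NOT NULL
          AND paged_at >= datetime('now', ?)
        GROUP BY team_name`
    )
    .bind(`-${windowDays} days`) // e.g. 30, 60, or 90
    .all();
}
```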
### Phase 2 (Engineer-Level Tracking)
When an engineer has a MajorOps account and clicks "I'm here" during an incident:
- `team_pages.arrived_at` is set precisely (not manually)
- `team_pages.contact_name` becomes a link to a User record
- `incident_participants.joined_at` is the individual-level timestamp
This unlocks individual-level scoring: not just "the team arrived in 8 minutes" but "Alex arrives in 4 minutes on average; the secondary on-call averages 14 minutes." That data should inform rotation assignments — not as punishment, but as a structurally honest input to staffing.
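As a sketch of that flow (the endpoint wiring, `user_id` column, and `incident_participants` shape are assumptions):

```ts
// Hypothetical "I'm here" confirmation: precise arrival stamp plus individual record.
async function confirmArrival(env: { DB: D1Database }, pageId: string, userId: string) {
  const now = new Date().toISOString();
  // Stamp team_pages.arrived_at from the click, not from manual MIM observation;
  // only the first confirmation for a given page wins.
  await env.DB
    .prepare(
      "UPDATE team_pages SET arrived_at = ?, user_id = ? WHERE id = ? AND arrived_at IS NULL"
    )
    .bind(now, userId, pageId)
    .run();
  // Record the individual-level timestamp for Phase 2 scoring.
  await env.DB
    .prepare(
      "INSERT INTO incident_participants (user_id, page_id, joined_at) VALUES (?, ?, ?)"
    )
    .bind(userId, pageId, now)
    .run();
}
```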
Phase 2 KPI to add (not yet in schema):
| slug | name | description | units | formula |
|---|---|---|---|---|
| `engineer-response-time` | Engineer Response Time | Time from page to "I'm here" confirmation, per individual. Requires Phase 2 account linkage. | minutes | `avg(arrived_at - paged_at)` per user |
Note: Individual scoring requires careful UX treatment. This is operational data, not performance review data. The goal is readiness awareness and rotation optimization — not ranking or blame.
## How it flows

1) Define here → `KpiDefinition` row in D1 (slug, formula, targets, visibility, owner).
2) Compute/ingest → the Worker writes `KpiObservation` rows (per incident: timestamp, value, source, confidence).
3) Consume → scorecards call `GET /kpis` (definitions) and `GET /incidents/:id/kpis` (observations).
4) Render → Exec/Stakeholder/Responder views pick the subset they are allowed to see based on `visibility`.
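A minimal consumer-side sketch of steps 3–4 (the response shape and the visibility rule are assumptions):

```ts
// Hypothetical scorecard load: fetch definitions, keep only what this view may render.
type Visibility = "internal" | "exec";

async function loadScorecard(baseUrl: string, viewer: Visibility) {
  const res = await fetch(`${baseUrl}/kpis`);
  const defs: { slug: string; visibility: Visibility }[] = await res.json();
  // Assumed rule: internal views render everything; exec views only exec KPIs.
  const allowed: Visibility[] = viewer === "internal" ? ["internal", "exec"] : ["exec"];
  return defs.filter((d) => allowed.includes(d.visibility));
}
```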
## Backfill & QA

- Run `mim kpi backfill --kpi <slug>` to recompute observations from timeline/milestones.
- Observations are append-only in normal operation; backfill is the exception, overwriting by the `(kpi_slug, incident_id, timestamp)` primary key so re-runs are deterministic.
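A sketch of that deterministic overwrite as a SQLite upsert on the composite key (the column list is assumed):

```ts
// Hypothetical backfill write: upsert so re-runs replace rather than duplicate.
async function writeObservation(
  env: { DB: D1Database },
  obs: { kpi_slug: string; incident_id: string; timestamp: string; value: number; source: string }
) {
  await env.DB
    .prepare(
      `INSERT INTO kpi_observations (kpi_slug, incident_id, timestamp, value, source)
       VALUES (?, ?, ?, ?, ?)
       ON CONFLICT(kpi_slug, incident_id, timestamp) DO UPDATE SET
         value = excluded.value,
         source = excluded.source`
    )
    .bind(obs.kpi_slug, obs.incident_id, obs.timestamp, obs.value, obs.source)
    .run();
}
```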
## Change control
- Update definitions here first; then update Worker formulas and run a migration if fields change.
- Tag each update with the date and owner in git history; this page is the audit trail.