
KPI Source of Truth

This page is the human-readable source for every KPI we track. The database tables and API mirror this page; nothing ships without an entry here.

What “authoritative” means

  • One definition: Each KPI has a unique slug and single definition that feeds all scorecards (exec, stakeholder, responder readiness).
  • One formula: The calculation is written once, stored in the Worker/D1 layer, and reused by every consumer.
  • One ownership path: Every KPI has an owner and review cadence.

Fields to capture (per KPI)

| Field | Description |
| --- | --- |
| slug | Lowercase identifier used in the API and DB (mttr, cadence-adherence). |
| name | Friendly display name. |
| description | What the KPI measures and why it matters. |
| formula | Human-readable formula; the implementation lives in the Worker. |
| units | e.g., minutes, percentage, count. |
| targets | Optional thresholds by severity or persona (e.g., a P1 target of 60m). |
| data_sources | Which events/fields power it (timeline events, milestones, alerts). |
| visibility | internal or exec; controls where the KPI renders. |
| owner | Role or person accountable for accuracy. |
| review_cadence | How often the definition is re-validated. |
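
For implementers, this field list maps naturally onto a typed record in the Worker layer. A minimal sketch in TypeScript; the names mirror the table above, but the exact D1 schema is authoritative:

```ts
// Sketch of a KPI definition record; field names mirror the table above.
// The authoritative schema is the KpiDefinition table in D1.
interface KpiDefinition {
  slug: string;                     // lowercase identifier, e.g. "mttr"
  name: string;                     // friendly display name
  description: string;              // what it measures and why it matters
  formula: string;                  // human-readable; implementation lives in the Worker
  units: "minutes" | "percentage" | "count" | "score";
  targets?: Record<string, string>; // e.g. { P1: "120m", P2: "240m" }
  data_sources: string[];           // events/fields that power it
  visibility: "internal" | "exec";  // controls where it renders
  owner: string;                    // role or person accountable
  review_cadence: string;           // e.g. "quarterly"
}
```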

Current KPIs

| slug | name | description | units | formula (conceptual) | targets | visibility | owner |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mttr | Mean Time to Resolve | Average time from incident.detected_at to resolved_at. | minutes | avg(resolved_at - detected_at) | P1: 120m, P2: 240m | exec | Ops Leader |
| mtta | Mean Time to Acknowledge | Average time from alert issuance to first MIM action (phase 1 → 2). | minutes | avg(phase_entered_at(gather) - alert.issueTime) | P1: 10m, P2: 20m | internal | MIM Lead |
| cadence-adherence | Update Cadence Adherence | Percent of milestones delivered on or before the committed ETA. | % | on_time_milestones / total_milestones | ≥90% | exec | Comms Lead |
| stakeholder-satisfaction | Stakeholder Signal Score | Rolling post-incident survey score (1–5). | score | avg(survey_score) | ≥4.5 | exec | Customer Ops |
| team-page-to-bridge | Team Bridge Arrival Time | Time from page sent (team_pages.paged_at) to team on bridge (team_pages.arrived_at), measured per team dispatch, per incident. | minutes | avg(arrived_at - paged_at) per team | Box 3: ≤5m, Box 2: ≤10m, Box 1: ≤20m | internal | MIM Lead |
| team-page-to-ack | Team Acknowledgment Time | Time from page sent to acknowledgment; computed only where acknowledged_at is present. | minutes | avg(acknowledged_at - paged_at) per team | Box 3: ≤2m, Box 2: ≤5m | internal | MIM Lead |
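
As a worked example of how one of these conceptual formulas might be implemented in the Worker/D1 layer, here is a sketch for mttr. The incidents table name, column names, and ISO-8601 timestamp encoding are assumptions based on the fields referenced above, not the shipped implementation:

```ts
// A minimal mttr sketch, assuming an incidents table with detected_at /
// resolved_at stored as ISO-8601 strings in D1 (SQLite).
// D1Database is the ambient type from @cloudflare/workers-types.
async function computeMttrMinutes(db: D1Database): Promise<number | null> {
  const row = await db
    .prepare(
      `SELECT avg((julianday(resolved_at) - julianday(detected_at)) * 24 * 60) AS mttr
         FROM incidents
        WHERE resolved_at IS NOT NULL`
    )
    .first<{ mttr: number | null }>();
  return row?.mttr ?? null;
}
```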

Add new KPIs by appending to this table and adding a KpiDefinition row via a migration or the admin tool (a sketch follows). The API rejects observations for unknown slugs.
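
A hedged sketch of what that row insertion could look like; in practice this would live in a versioned migration, and the table and column names here are assumptions mirroring the field list above, with placeholder values throughout:

```ts
// Hypothetical admin-tool write for a brand-new KPI. Table and column names
// mirror the field list above and are assumptions, not the actual D1 schema.
async function addKpiDefinition(db: D1Database): Promise<void> {
  await db
    .prepare(
      `INSERT INTO KpiDefinition
         (slug, name, description, formula, units, targets, visibility, owner, review_cadence)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
    )
    .bind(
      "example-kpi",                    // placeholder slug
      "Example KPI",                    // placeholder display name
      "What it measures and why.",      // placeholder description
      "numerator / denominator",        // human-readable formula
      "%",
      JSON.stringify({ all: ">=90%" }), // targets serialized as JSON
      "internal",
      "Ops Leader",                     // placeholder owner
      "quarterly"                       // placeholder review cadence
    )
    .run();
}
```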


Team Dispatch Credit (Response Readiness Score)

The long-term goal of team-page-to-bridge and team-page-to-ack is not just to track a single incident — it is to build a per-team response record that accumulates across incidents into something meaningful.

The reference model is the fire service's ISO Public Protection Classification (PPC) — a 1–10 score that evaluates a fire department's infrastructure, staffing, training, and water supply. That score determines homeowner insurance rates in the department's coverage area. A department with a class 1 rating (best) can demonstrably justify its budget. A class 8 rating tells the insurer something different.

MajorOps's equivalent is not a single incident's page-to-bridge time. It is the rolling trend: how consistently does a team respond to dispatch at the alarm level they're rated for?

A team that consistently arrives in under 5 minutes on Box 3 incidents has a strong dispatch credit. A team that averages 18 minutes has a story to tell — and that story belongs in training planning, on-call rotation decisions, and staffing conversations.

MVP (Phase 1)

What gets captured:

  • Team name: structured, from the pre-configured teams list; no free-text drift
  • Contact name: free text, e.g. Alex Kim (on-call); no account verification required
  • Page time: when the MIM sent the page (paged_at)
  • Bridge arrival time: when the MIM observes the team on the call (arrived_at); manually recorded (see the sketch below)
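
A sketch of the Phase 1 capture record: the column names (paged_at, arrived_at, contact_name) come from this page, while the overall shape is an illustrative assumption:

```ts
// Sketch of a Phase 1 dispatch record. Column names (paged_at, arrived_at,
// contact_name) are from this page; the rest of the shape is an assumption.
interface TeamPageRecord {
  incident_id: string;
  team_name: string;    // structured: must match the pre-configured teams list
  contact_name: string; // free text, e.g. "Alex Kim (on-call)"
  paged_at: string;     // ISO-8601: when the MIM sent the page
  arrived_at?: string;  // ISO-8601: manually recorded by the MIM on bridge arrival
}
```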

What gets computed (per team, per alarm level):

  • team-page-to-bridge: arrival latency, queryable across incidents
  • Trend over time: rolling average for the last 30/60/90 days (see the query sketch below)
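
A sketch of the rolling-trend query, assuming a team_pages table with the columns above plus an alarm_level column (the table shape is an assumption; the 30/60/90-day windows are from this page):

```ts
// Hypothetical rolling average of team-page-to-bridge per team and alarm
// level over the last N days. Assumes team_pages(team_name, alarm_level,
// paged_at, arrived_at) with ISO-8601 timestamps in D1 (SQLite).
async function pageToBridgeTrend(db: D1Database, days: 30 | 60 | 90) {
  return db
    .prepare(
      `SELECT team_name,
              alarm_level,
              avg((julianday(arrived_at) - julianday(paged_at)) * 24 * 60) AS avg_minutes
         FROM team_pages
        WHERE arrived_at IS NOT NULL
          AND paged_at >= datetime('now', ?)
        GROUP BY team_name, alarm_level`
    )
    .bind(`-${days} days`)
    .all();
}
```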

Phase 2 (Engineer-Level Tracking)

When an engineer has a MajorOps account and clicks "I'm here" during an incident:

  • team_pages.arrived_at is set precisely (not manually)
  • team_pages.contact_name becomes a link to a User record
  • incident_participants.joined_at is the individual-level timestamp (see the handler sketch below)
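
A sketch of that confirmation flow in the Worker. The columns updated come from the list above; the function shape, the id parameters, and the contact_user_id column (standing in for the contact_name-to-User link) are assumptions:

```ts
// Hypothetical "I'm here" handler: sets the precise arrival timestamp and
// records the individual-level join. Lookup keys and the contact_user_id
// column are illustrative assumptions.
async function confirmArrival(
  db: D1Database,
  incidentId: string,
  userId: string,
  teamPageId: string
): Promise<void> {
  const now = new Date().toISOString();
  await db.batch([
    db.prepare(
      `UPDATE team_pages SET arrived_at = ?, contact_user_id = ? WHERE id = ?`
    ).bind(now, userId, teamPageId),
    db.prepare(
      `INSERT INTO incident_participants (incident_id, user_id, joined_at)
       VALUES (?, ?, ?)`
    ).bind(incidentId, userId, now),
  ]);
}
```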

This unlocks individual-level scoring: not just "the team arrived in 8 minutes" but "Alex arrives in 4 minutes on average; the secondary on-call averages 14 minutes." That data should inform rotation assignments — not as punishment, but as a structurally honest input to staffing.

Phase 2 KPI to add (not yet in schema):

| slug | name | description | units | formula |
| --- | --- | --- | --- | --- |
| engineer-response-time | Engineer Response Time | Time from page to "I'm here" confirmation, per individual. Requires Phase 2 account linkage. | minutes | avg(arrived_at - paged_at) per user |

Note: Individual scoring requires careful UX treatment. This is operational data, not performance review data. The goal is readiness awareness and rotation optimization — not ranking or blame.

How it flows

1) Define here → KpiDefinition row in D1 (slug, formula, targets, visibility, owner).
2) Compute/ingest → Worker writes KpiObservation rows (per incident, timestamp, value, source, confidence).
3) Consume → Scorecards call GET /kpis (definitions) and GET /incidents/:id/kpis (observations); see the fetch sketch after this list.
4) Render → Exec/Stakeholder/Responder views pick the subset they are allowed to see based on visibility.
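
A consumer-side sketch of steps 3–4. The endpoints are from this page; the response shapes and the visibility rule (internal viewers see everything, exec viewers see exec KPIs only) are assumptions:

```ts
// Hypothetical scorecard consumer: fetch definitions, fetch observations for
// one incident, and keep only the KPIs the viewer is allowed to see.
async function loadScorecard(
  baseUrl: string,
  incidentId: string,
  viewer: "internal" | "exec"
) {
  const defs: Array<{ slug: string; visibility: "internal" | "exec" }> =
    await (await fetch(`${baseUrl}/kpis`)).json();
  const obs: Array<{ kpi_slug: string; value: number }> =
    await (await fetch(`${baseUrl}/incidents/${incidentId}/kpis`)).json();
  // Assumption: internal viewers see everything; exec viewers see exec KPIs only.
  const allowed = new Set(
    defs
      .filter(d => viewer === "internal" || d.visibility === "exec")
      .map(d => d.slug)
  );
  return obs.filter(o => allowed.has(o.kpi_slug));
}
```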

Backfill & QA

  • Run mim kpi backfill --kpi <slug> to recompute observations from timeline/milestone data.
  • Observations are append-only during normal ingestion; re-running backfill overwrites rows keyed by the (kpi_slug, incident_id, timestamp) primary key, so results are deterministic (see the upsert sketch below).
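
That overwrite-by-primary-key behavior maps onto a SQLite upsert. A sketch, assuming a KpiObservation table keyed as described above (the table and column names mirror this page but are assumptions):

```ts
// Hypothetical deterministic backfill write: re-running replaces the row for
// the same (kpi_slug, incident_id, timestamp) key instead of duplicating it.
async function writeObservation(
  db: D1Database,
  o: {
    kpi_slug: string;
    incident_id: string;
    timestamp: string;
    value: number;
    source: string;
    confidence: number;
  }
): Promise<void> {
  await db
    .prepare(
      `INSERT INTO KpiObservation (kpi_slug, incident_id, timestamp, value, source, confidence)
       VALUES (?, ?, ?, ?, ?, ?)
       ON CONFLICT (kpi_slug, incident_id, timestamp)
       DO UPDATE SET value = excluded.value,
                     source = excluded.source,
                     confidence = excluded.confidence`
    )
    .bind(o.kpi_slug, o.incident_id, o.timestamp, o.value, o.source, o.confidence)
    .run();
}
```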

Change control

  • Update definitions here first; then update Worker formulas and run a migration if fields change.
  • Tag each update with the date and owner in git history; this page is the audit trail.