After Action
The question is never "who made a mistake." The question is always "what did we learn, and what do we change?"
Why We Changed the Name
"Post-Incident Review" sounds like a compliance requirement. A box you check so a ticket can close. In most organizations, that is exactly how it is treated — scheduled, rescheduled, attended without preparation, documented by no one, forgotten by the next incident.
We call it an After Action because that is what the fire service and the military call it. You have just been through something. You come off the fireground, you debrief while it is fresh, and you come out with specific things to do differently next time. It is not a review. It is a continuation of the response.
The written output is the Learning Review. Not "lessons learned" — because lessons are not learned until behavior changes. The Learning Review documents what we captured. Whether it becomes a lesson depends on what happens to the action items.
This framing is borrowed directly from how Google SRE approaches post-mortems. Their shift was deliberate: rename the artifact, change the question at the top. The old question: "What went wrong?" The new question: "What did we learn, and what do we change?" The first question finds fault. The second question finds improvements. Only one of them makes the next incident better.
What an After Action Produces
- A documented timeline of the incident (reconstructed from the MajorOps record — not re-narrated from memory)
- The Learning Review document: root cause, contributing factors, what went well, what we change
- Improvement tasks with owners and due dates, tracked in MajorOps
- A signal to the team that every incident makes the organization better — not just closes a ticket
The Problem with How Most Organizations Do This
The following list is preserved from the original operational analysis. It describes what happens without structure, and what this process is designed to prevent:
- Hour-long calls where multiple people re-explain the same timeline.
- Long-running meetings with no agenda and no decisions.
- Follow-up meetings scheduled because the first one produced nothing.
- Rescheduling cycles because no one protected the calendar.
- Key people not invited because the invitation process is manual.
- No defined expectations beyond "show up and talk."
- Action items tracked in a spreadsheet no one maintains.
- Teams arrive without evidence, delaying everything while people locate data.
- Vendor status unknown — no one tracked the vendor's RCA commitment.
- The actual theme of the meeting becomes: explaining the technology and avoiding accountability.
MajorOps After Actions are different because the data exists before the meeting starts. Milestones, phase logs, timeline events, recovery track outcomes — all of it was captured during the incident. The After Action is analysis, not reconstruction.
If the data does not exist because it was not logged during the incident, that gap is itself the first finding.
When an After Action Is Required
| Condition | Required |
|---|---|
| Critical (P1), any duration | Yes — mandatory |
| High (P2) > 2 hours | Yes — mandatory |
| High (P2) ≤ 2 hours with customer impact | Recommended |
| High (P2) ≤ 2 hours, internal only | Optional |
| Medium (P3) with unusual contributing factors | Optional |
| Repeat incident (same root cause as a prior incident) | Yes — regardless of severity |
Repeat incidents trigger a mandatory After Action regardless of severity because repetition means a prior learning was not implemented. That is a process failure.
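The table above collapses to a small decision rule. A minimal Python sketch; the `Incident` shape and its field names are illustrative, not part of any MajorOps schema:

```python
from dataclasses import dataclass

@dataclass
class Incident:
    severity: str            # "P1", "P2", or "P3"
    duration_hours: float
    customer_impact: bool
    repeat_root_cause: bool  # same root cause as a prior incident

def after_action_requirement(inc: Incident) -> str:
    """Apply the requirement table: 'mandatory', 'recommended', or 'optional'."""
    if inc.repeat_root_cause:
        # Repetition means a prior learning was not implemented.
        return "mandatory"
    if inc.severity == "P1":
        return "mandatory"
    if inc.severity == "P2":
        if inc.duration_hours > 2:
            return "mandatory"
        return "recommended" if inc.customer_impact else "optional"
    # P3 with unusual contributing factors
    return "optional"
```

Note that the repeat-root-cause check comes first: it overrides severity, matching the rule above.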
Timeline
| Step | When |
|---|---|
| MIM completes resolution milestone | At incident close |
| After Action scheduled | Within 24 hours of incident close |
| Learning Review draft distributed | 48 hours before the meeting |
| After Action held | Within 5 business days (Critical), 10 business days (High) |
| Learning Review published | Within 48 hours of the meeting |
| Action items tracked | In MajorOps, assigned owners and due dates |
| Action item completion reviewed | 30 days after publication |
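The schedule above reduces to date arithmetic from incident close. A sketch under the simplifying assumption of calendar days throughout; the meeting deadline is really counted in business days:

```python
from datetime import datetime, timedelta

def after_action_deadlines(incident_close: datetime, severity: str) -> dict:
    """Compute the key deadlines from the timeline table.
    Calendar days are used for brevity; a real scheduler would count
    business days for the meeting deadline."""
    meeting_by = incident_close + timedelta(days=5 if severity == "P1" else 10)
    publish_by = meeting_by + timedelta(hours=48)
    return {
        "schedule_by": incident_close + timedelta(hours=24),
        "draft_distributed_by": meeting_by - timedelta(hours=48),
        "meeting_by": meeting_by,
        "publish_by": publish_by,
        "action_item_review_on": publish_by + timedelta(days=30),
    }
```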
Who Attends
Required:

- MIM — chairs the After Action
- Technical Recovery Lead — owns the technical findings
- SMEs from all active recovery tracks
- Vendor representative — if a vendor was involved and the root cause is vendor-related
Optional / Situational:

- Customer Communications Lead — if customer comms are a finding
- Security or Compliance — if regulatory exposure was involved
- Engineering Manager or Director — if findings require organizational change
Not Required:

- Executive leadership — unless a finding requires executive action
- Everyone who was on the bridge — the After Action is not a group debrief, it is a structured review
The MIM decides the attendee list based on the incident record. Attendance is not based on seniority or org chart proximity.
The Learning Review Document
The Learning Review is drafted by the MIM before the meeting. Attendees read it before arriving. The meeting is for discussion, challenge, and decisions — not for writing.
This distinction matters. If attendees are reading for the first time in the meeting, you are in a retell. That is an anti-pattern.
Structure
1. Incident Summary
- Incident ID, title, severity, duration
- Business and customer impact (affected users, revenue exposure, SLA status)
- MIM and key responders
2. Timeline
Reconstructed from MajorOps phase logs and milestones. Timestamps only — not narrative.
Key metrics derived from the timeline:
| Metric | Definition |
|---|---|
| Time to Detect (TTD) | Alert fired → incident confirmed |
| Time to Declare | Detection → Major Incident opened |
| Time to Mitigate (MTTM) | Incident opened → mitigation applied |
| Time to Resolve (MTTR) | Incident opened → validated recovery |
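These metrics are pure timestamp subtraction over the incident record. A minimal sketch; the parameter names are illustrative:

```python
from datetime import datetime

def timeline_metrics(alert_fired: datetime, incident_confirmed: datetime,
                     incident_opened: datetime, mitigation_applied: datetime,
                     recovery_validated: datetime) -> dict:
    """Derive the four timeline metrics (as timedeltas) from the
    timestamps captured in the incident record."""
    return {
        "time_to_detect": incident_confirmed - alert_fired,
        "time_to_declare": incident_opened - incident_confirmed,
        "time_to_mitigate": mitigation_applied - incident_opened,
        "time_to_resolve": recovery_validated - incident_opened,
    }
```

Because every input comes from logged timestamps, the metrics are reproducible; nothing here depends on anyone's memory of the incident.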
3. Root Cause
The technical finding. One or more contributing factors.
Format: "The incident was caused by [specific failure]. Contributing factors include [list]."
No names in root cause. Systems and processes only. If a person made an error, the question is: what system allowed that error to occur?
4. What Went Well
Actions, tools, and communications that worked. These are as important as the failures. They should be reinforced, documented, and replicated.
Questions that surface this:

- What decision made a difference in this incident?
- What worked faster or better than expected?
- What process held up under pressure?
5. What We Are Changing
Not "what could be improved" — that framing is passive. This section names specific changes, each with an owner and a date.
Questions that surface this:

- What slowed us down that a process change would fix?
- What information did we not have that we should have had?
- What would have changed the outcome if we had caught it earlier?
6. Learning Statements
Specific, named, actionable. Not "communicate better." Examples:
"Runbook step 4 does not account for the case where the primary DB is unreachable. SRE lead updating by [date]."
"Vendor escalation contact was missing from the run card. Service Manager adding before next quarter."
"Recovery track for Application team took 40 minutes to start — no pre-assigned track lead. Run card updated with named standby."
7. Action Items
| Item | Owner | Due Date | Status |
|---|---|---|---|
| Update runbook step 4 | [Name] | [Date] | Open |
Action items from After Actions are not optional. They are tracked in MajorOps. Unresolved items at the 30-day review are escalated.
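The 30-day review logic is simple to make mechanical: anything not closed gets escalated, and overdue items are called out separately. A sketch; the item shape here is illustrative, not the MajorOps schema:

```python
from datetime import date

def review_action_items(items: list[dict], review_date: date) -> dict:
    """Partition action items at the 30-day review.
    Each item is a dict with 'item', 'owner', 'due' (a date), and
    'status' keys -- shape is illustrative."""
    open_items = [i for i in items if i["status"] != "Closed"]
    return {
        "escalate": open_items,  # unresolved at review -> escalated
        "overdue": [i for i in open_items if i["due"] < review_date],
    }
```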
The Major Technical Meeting (MTM)
For Critical incidents with extended duration (> 2 hours), the MIM may call a Major Technical Meeting during the active incident — a structured touchpoint separate from the bridge, focused on executive alignment and action item coordination.
The MTM is not the After Action. It happens during the incident. The After Action happens after.
MTM Agenda:
- Attendance and intro — MIM
- Establish core roles (first MTM only) — MIM
- Recovery status — Technical Recovery Lead
- Impact statement — Customer Success / Service Delivery
- Business and client impact — MIM + Recovery Director
- Regulatory exposure (if applicable)
- Review open actions from prior MTM — MIM
- Confirm severity posture is correct — MIM + Recovery Lead
- Set next MTM time and update cadence — MIM
The MTM produces structured action items. The MIM publishes a post-MTM summary immediately after. MTM notes feed the Learning Review.
Vendor After Actions
When a vendor is involved, the MajorOps After Action is not dependent on the vendor's RCA arriving on time.
The Learning Review documents:

1. When the vendor was engaged, and their response time against the stated SLA
2. The impact of the vendor system's failure on the incident timeline
3. Open items pending from the vendor — with a committed date, not "waiting on them"
The vendor RCA is attached to the Learning Review as an appendix when received. "We're waiting on them" is not a closed item. Set a follow-up date. Escalate when it passes.
Know your vendor RCA SLA before the incident, not during it. It is in the contract.
Anti-Patterns
These failure modes make After Actions ineffective. Name them when they occur.
The Blame Session — Time spent identifying who made a mistake. Names belong in action item owners. They do not belong in root cause. A blame session is a sign the culture has not internalized the NTSB model. The MIM chairs the After Action and is responsible for redirecting it.
The Retell — Teams re-narrate the incident from scratch because no Learning Review draft was prepared. This is a process failure, not a meeting failure. The MIM owns the draft. If the draft is not ready, the After Action should be rescheduled until it is — not converted into an improv session.
The Vanishing Tasks — Action items produced in the meeting that no one checks on at 30 days. Every item needs an owner, a date, and visibility in a tracked system. Vanishing tasks mean the learning loop is broken.
The Missing Vendor — Closing without a plan to get the vendor RCA. Set a date. Assign it to the Service Manager. If it passes without delivery, escalate. The vendor's RCA is your evidence. You need it.
The Pre-Scheduled Cancel — The After Action is on the calendar, then rescheduled, then rescheduled again, and eventually never held because "things settled down." Every Critical incident gets an After Action. There is no "resolved cleanly enough to skip it" threshold.
Data Available in MajorOps Before the After Action
The Learning Review draft should be built from the incident record, not from memory. What MajorOps provides:
- Full milestone log — stakeholder communications, timestamped
- Phase transition log — when each phase was entered and by whom
- Timeline events — all logged actions during the incident
- Command assignment history — who held each role, when they were assigned
- Alert info — detection time, customer count, external impact
- Status updates — full comms record, public and internal
If any of this data is absent, the gap is the first finding. Every empty field in the After Action record means something was not captured during the incident. Fix the capture, not the record.
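A completeness check over the record makes the "gap is the first finding" rule mechanical. A sketch: the section names mirror the list above, but the real MajorOps field names may differ.

```python
# Section names mirror the data list above; real MajorOps field names may differ.
REQUIRED_SECTIONS = [
    "milestone_log", "phase_transitions", "timeline_events",
    "command_assignments", "alert_info", "status_updates",
]

def capture_gaps(incident_record: dict) -> list[str]:
    """Return each required section that is missing or empty.
    Every name returned is a finding: capture failed during the incident."""
    return [s for s in REQUIRED_SECTIONS if not incident_record.get(s)]
```

Running this before drafting the Learning Review turns missing data into the first line of the findings section rather than a surprise in the meeting.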
After Action process adapted from U.S. military After Action Review methodology, NTSB investigation standards, and Google SRE Learning Review practices. The distinction between "what went wrong" and "what did we learn" is borrowed directly from the Google SRE team's public writing on blameless post-mortems.