RC-DB-001 v1.0 TIER 3 · SPECIFIC APP BOX 3 P1 — CRITICAL
Specific App Run Card  ·  IT Infrastructure › Database Services
Production Database Unreachable
ALARM LEVEL: BOX 3 · FULL RESPONSE
Initial Exposure
T3C(~)R(~)
Typical at open: 3 teams engaged (T3), confidence (C) and recovery path (R) not yet assessed (~).
Update every 15 minutes. Publish on bridge and in incident chat.
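The exposure line can be assembled mechanically so every 15-minute update uses the same shape. The following is a minimal sketch, not part of the run card itself: the function name `exposure_line` is hypothetical, and only the state values that appear on this card (`~` for not yet assessed, `U` for unknown) are assumed to exist.

```python
def exposure_line(teams: int, confidence: str = "~", recovery: str = "~") -> str:
    """Build an exposure line in the T<n>C(<state>)R(<state>) shape.

    teams      -- number of teams currently engaged
    confidence -- confidence state; "~" means not yet assessed
    recovery   -- recovery path state; "~" not yet assessed, "U" unknown
    Any other state values are outside what this card defines.
    """
    if teams < 0:
        raise ValueError("teams must be non-negative")
    return f"T{teams}C({confidence})R({recovery})"


# Typical line at incident open: three teams, nothing assessed yet.
print(exposure_line(3))
```

Calling `exposure_line(3, recovery="U")` would yield the `R(U)` form that the escalation rules on this card refer to.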
Required Responders
Role | Who | Required | First Action
MIM / IC | Certified Major Incident Manager | Required | Open incident, assign command, declare Box 3
Operations Chief | Senior DBA or Infrastructure Lead | Required | Triage DB connectivity, confirm scope
App SME | Application owner(s) for affected services | Required | Confirm app-layer impact, check retry behavior
Network/Infra | Network or cloud infrastructure engineer | Required | Check network path, DNS, firewall rules
PIO / Comms | Customer Communications Lead | Required | Draft initial customer alert, update status page
Security / SO | InfoSec or Compliance Lead | If data exposure risk | Assess data breach risk, advise on recovery path safety
Vendor / LO | Cloud provider or DB vendor contact | If cloud/vendor DB | Open P1 support ticket, get on vendor bridge
C · Conditions to Gather
  • Which database(s)? Primary, replica, both? Specific host?
  • When did it go unreachable? First alert time vs. confirmed outage time
  • What can connect and what can't? App servers? Read replicas? Admin access?
  • Any recent changes? Deployments, schema migrations, infra changes, cert rotations?
  • Error message or log snippet? Connection timeout? Auth failure? Port unreachable?
  • Customer impact confirmed? Which features / customers / regions?
  • Is this a cloud-managed DB? RDS, Cloud SQL, Atlas, Azure SQL, etc.?
A · Initial Actions
  • MIM opens incident: Severity P1, Alarm Level Box 3. Assign command.
  • Ping DB host directly: quickly distinguish a network fault from a DB process fault.
  • Check DB process status: is the DB process up? Has it crashed or hung?
  • Check connection pool exhaustion: max connections hit? Zombie connections?
  • Review DB and system logs: OOM kill, disk full, auth errors, deadlock?
  • Open vendor P1 ticket if cloud DB: do not wait; open in parallel with triage.
  • MIM posts first milestone: T+15 minutes, CAN format. Stakeholder view updated.
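The "network vs. DB process" distinction in the actions above can be approximated with a plain TCP probe against the database listener. The sketch below is illustrative only: `probe_db` and its result labels are hypothetical names, it uses only the Python standard library, and it does not attempt authentication, so it cannot detect auth failures or pool exhaustion.

```python
import socket


def probe_db(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a single TCP connection attempt to the database listener."""
    # Step 1: name resolution. A failure here points at DNS, not the DB.
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: TCP handshake with the listener.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "port_open"
    except ConnectionRefusedError:
        # Host answered, but nothing accepted the connection on that port.
        return "connection_refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError:
        # Anything else, e.g. no route to host.
        return "network_error"
```

A rough reading of the results, to be confirmed against logs: `port_open` suggests the network path and listener are fine, so look at auth, connection limits, or a hung process; `connection_refused` often means the host is up but the DB process is not listening; `timeout` and `network_error` point toward routing, firewall rules, or a host that is down. A call might look like `probe_db("db-primary.example.internal", 5432)`, where both the host name and the port are placeholders.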
N · Needs / Escalation
  • DB admin access confirmed? If not — who has it? Get them on the bridge now.
  • Failover runbook located? Where is it? Who owns it? Is it current?
  • Leadership aware? E2 posture if customer-facing. Exec SMS trigger.
  • Backup availability confirmed? Last backup time? Restore tested? RTO/RPO known?
  • Security / compliance notified? If data may be compromised or inaccessible to regulators.
  • Read replica available? Can read traffic be redirected to the read replica temporarily? (Serving writes requires promoting it.)
  • MIM updates exposure line: every 15 min. If R(U) → escalate immediately.
Escalation Path
App SME → DB / Ops Chief → MIM → Engineering Director → CTO / Exec
Trigger escalation up the chain when: recovery path is unknown (R(U)) for >15 minutes, vendor has not engaged within 30 minutes, or data integrity risk is identified.
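The three escalation triggers can be expressed as a small check that the MIM or a bridge bot evaluates at each update. This is a sketch under stated assumptions: `IncidentState` and `escalation_reasons` are hypothetical names, and the 30-minute vendor clock is assumed to start when the vendor ticket is opened, which the card does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IncidentState:
    minutes_recovery_unknown: float    # how long the exposure line has shown R(U)
    vendor_involved: bool              # True for a cloud or vendor-managed DB
    vendor_engaged: bool               # vendor has responded or joined the bridge
    minutes_since_vendor_ticket: float # assumption: clock starts at ticket open
    data_integrity_risk: bool          # any identified risk to data integrity


def escalation_reasons(state: IncidentState) -> list[str]:
    """Return the escalation triggers that currently apply (empty if none)."""
    reasons = []
    if state.minutes_recovery_unknown > 15:
        reasons.append("recovery path unknown (R(U)) for more than 15 minutes")
    if (state.vendor_involved and not state.vendor_engaged
            and state.minutes_since_vendor_ticket >= 30):
        reasons.append("vendor has not engaged within 30 minutes")
    if state.data_integrity_risk:
        reasons.append("data integrity risk identified")
    return reasons
```

Any non-empty result means escalating to the next level in the path above; returning the reasons rather than a bare boolean lets the text go straight into the milestone update.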
Response Timeboxes
T+5 | Bridge Open / Command Assigned | MIM active. All required responders on call or paged.
T+10 | Conditions Gathered | All teams have reported CAN to MIM. Scope confirmed.
T+15 | First Milestone Published | Stakeholders updated. Exposure notation published.
T+20 | Recovery Track Opened | Primary recovery path identified. Assignee and timebox set.
T+30 | Customer Alert Decision | PIO issues customer alert or MIM documents why not.
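Because every timebox is an offset from incident open, the wall-clock deadlines can be computed once and checked at each update. The sketch below is one way to do that; `TIMEBOXES`, `timebox_deadlines`, and `overdue` are hypothetical names, and T+0 is assumed to be the moment the MIM opens the incident.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Offsets in minutes from incident open, taken from the timeboxes above.
TIMEBOXES = [
    (5, "Bridge Open / Command Assigned"),
    (10, "Conditions Gathered"),
    (15, "First Milestone Published"),
    (20, "Recovery Track Opened"),
    (30, "Customer Alert Decision"),
]


def timebox_deadlines(opened_at: datetime) -> list[tuple[datetime, str]]:
    """Return (deadline, milestone) pairs for an incident opened at opened_at."""
    return [(opened_at + timedelta(minutes=m), label) for m, label in TIMEBOXES]


def overdue(opened_at: datetime, now: datetime, done: set[str]) -> list[str]:
    """List milestones whose deadline has passed and that are not marked done."""
    return [
        label
        for deadline, label in timebox_deadlines(opened_at)
        if now > deadline and label not in done
    ]
```

For example, with an incident opened at 12:00 and only the bridge milestone complete, a check at 12:16 would report "Conditions Gathered" and "First Milestone Published" as overdue.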
Release Criteria
Responders are not released until MIM confirms all of the following:
  • Database is confirmed reachable from all application servers
  • Application teams have validated their services are operating normally
  • Monitoring alerts have cleared or been acknowledged with explanation
  • Customer impact has been assessed — alert issued or documented as not triggered
  • Root cause known or formally deferred to Learning Review
  • MIM has posted the resolution milestone with duration and known cause
  • After Action scheduled (within 5 business days for P1)