AI Security Incident Response: A Comprehensive Playbook
When an AI security incident occurs, traditional incident response procedures often fall short. The unique characteristics of AI systems (non-deterministic behavior, complex data dependencies, and novel attack vectors) demand specialized handling.
This playbook provides a comprehensive framework for responding to AI-specific security incidents.
Why AI Needs a Specialized IR Playbook
AI incidents differ from traditional security incidents in several critical ways:
| Traditional Incident | AI Incident |
|---------------------|-------------|
| Clear attack indicators | Subtle behavioral changes |
| Static malware signatures | Dynamic prompt-based attacks |
| Binary state (compromised/clean) | Degraded or biased behavior |
| Isolated system impact | Downstream decision cascade |
| Deterministic reproduction | Non-deterministic behavior |
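Because compromise often shows up as gradual drift in output behavior rather than a discrete indicator, detection tooling should track output statistics over time. The sketch below illustrates one minimal approach; the metric, window size, and threshold are illustrative assumptions, not recommendations:

```python
from collections import deque

class OutputDriftMonitor:
    """Minimal sketch: flag drift when a rolling output metric
    (e.g., refusal rate or toxicity score) departs from a baseline."""

    def __init__(self, baseline_mean: float, threshold: float = 0.15,
                 window: int = 500):
        self.baseline_mean = baseline_mean   # assumed from offline evaluation
        self.threshold = threshold           # tolerated absolute deviation
        self.scores = deque(maxlen=window)   # rolling window of recent scores

    def observe(self, score: float) -> bool:
        """Record one scored output; return True if drift is suspected."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data to compare against baseline yet
        current_mean = sum(self.scores) / len(self.scores)
        return abs(current_mean - self.baseline_mean) > self.threshold
```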
Incident Classification Framework
Severity Levels
```yaml
Severity Levels:
  Critical (P1):
    definition: "Immediate threat to business operations or safety"
    examples:
      - Active prompt injection causing harmful outputs to users
      - Model theft confirmed with data exfiltration
      - AI making critical decisions with poisoned data
      - Public-facing AI producing harmful or illegal content
    response_time: 15 minutes
    escalation: Immediate executive notification
  High (P2):
    definition: "Significant security breach with potential for escalation"
    examples:
      - Detected prompt injection attempts with partial success
      - Suspected training data poisoning
      - Unauthorized access to model weights or training data
      - AI output quality degradation indicating compromise
    response_time: 1 hour
    escalation: Security leadership within 2 hours
  Medium (P3):
    definition: "Security concern requiring investigation"
    examples:
      - Anomalous query patterns suggesting probing
      - Minor prompt injection attempts blocked by guardrails
      - Unexpected model behavior changes
      - Access control policy violations
    response_time: 4 hours
    escalation: Team lead within 1 business day
  Low (P4):
    definition: "Security observation for awareness"
    examples:
      - Failed authentication attempts on AI endpoints
      - Minor policy violations detected and blocked
      - Routine security scanning findings
    response_time: 1 business day
    escalation: Weekly security review
```
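These targets are easier to enforce when they exist in code as well as in policy. A minimal sketch of an SLA lookup (the table simply mirrors the matrix above; business-day handling is simplified to calendar days for brevity):

```python
from datetime import datetime, timedelta

# Mirrors the severity matrix above; '1 business day' is approximated
# as one calendar day in this sketch.
SEVERITY_SLA = {
    'P1': timedelta(minutes=15),
    'P2': timedelta(hours=1),
    'P3': timedelta(hours=4),
    'P4': timedelta(days=1),
}

def response_deadline(severity: str, detected_at: datetime) -> datetime:
    """Return the latest acceptable response start time for an incident."""
    return detected_at + SEVERITY_SLA[severity]
```

Alerting on `response_deadline` rather than on raw timestamps keeps escalation logic consistent as the matrix evolves.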
Incident Categories

```python
class AIIncidentCategory:
    """Classification of AI-specific security incidents."""

    categories = {
        'prompt_injection': {
            'description': 'Attempts to manipulate AI behavior through crafted inputs',
            'subcategories': [
                'direct_injection',    # User input manipulation
                'indirect_injection',  # Via external data sources
                'stored_injection',    # Persistent payloads
            ],
            'typical_severity': 'P1-P2',
            'indicators': [
                'Unusual output patterns',
                'Safety guardrail triggers',
                'Context window anomalies',
                'Instruction-like content in user messages',
            ],
        },
        'model_theft': {
            'description': 'Attempts to extract model weights or replicate capabilities',
            'subcategories': [
                'weight_extraction',    # Direct model stealing
                'distillation_attack',  # Training surrogate model
                'api_abuse',            # Systematic querying
            ],
            'typical_severity': 'P1-P2',
            'indicators': [
                'High query volumes from single source',
                'Systematic input patterns',
                'Unusual API access patterns',
                'Model serving infrastructure anomalies',
            ],
        },
        'data_poisoning': {
            'description': 'Corruption of training data or knowledge bases',
            'subcategories': [
                'training_data_poisoning',
                'rag_knowledge_poisoning',
                'feedback_loop_poisoning',
            ],
            'typical_severity': 'P1-P2',
            'indicators': [
                'Output bias or drift',
                'Unexpected model behavior',
                'Data integrity check failures',
                'User complaints about accuracy',
            ],
        },
        'privacy_breach': {
            'description': 'Unauthorized disclosure of sensitive information',
            'subcategories': [
                'training_data_leakage',
                'pii_exposure',
                'system_prompt_disclosure',
                'conversation_leakage',
            ],
            'typical_severity': 'P2-P3',
            'indicators': [
                'Sensitive patterns in outputs',
                'User reports of information exposure',
                'Audit log anomalies',
            ],
        },
        'adversarial_attack': {
            'description': 'Crafted inputs designed to cause misclassification or errors',
            'subcategories': [
                'evasion_attack',
                'model_confusion',
                'output_manipulation',
            ],
            'typical_severity': 'P2-P3',
            'indicators': [
                'Confidence score anomalies',
                'Decision boundary probing',
                'Unusual input characteristics',
            ],
        },
        'availability_attack': {
            'description': 'Attacks targeting AI system availability',
            'subcategories': [
                'resource_exhaustion',
                'model_degradation',
                'inference_dos',
            ],
            'typical_severity': 'P2-P3',
            'indicators': [
                'Latency spikes',
                'Resource utilization anomalies',
                'Error rate increases',
            ],
        },
    }
```
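One way to put this taxonomy to work during triage is to score an alert's observed indicators against each category and surface the best match. A simple sketch using the class above (exact string matching is an assumption; a real deployment would normalize indicator vocabularies first):

```python
def best_category_match(observed_indicators: list) -> tuple:
    """Score observed indicators against each category's known indicators
    and return (category_name, match_count). Exact-match scoring only."""
    best = ('unknown', 0)
    for name, spec in AIIncidentCategory.categories.items():
        known = {i.lower() for i in spec['indicators']}
        hits = sum(1 for ind in observed_indicators if ind.lower() in known)
        if hits > best[1]:
            best = (name, hits)
    return best
```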
Incident Response Phases

Phase 1: Detection and Triage
```python
from datetime import datetime


class AIIncidentDetection:
    """Detection and initial triage procedures."""

    def __init__(self):
        self.detection_sources = [
            'automated_monitoring',
            'user_reports',
            'security_tools',
            'model_performance_alerts',
            'external_notification',
        ]

    def initial_triage(self, alert: dict) -> dict:
        """Perform initial triage of a potential AI security incident."""
        triage_result = {
            'timestamp': datetime.utcnow().isoformat(),
            'alert_id': alert['id'],
            'triage_analyst': self.get_on_call_analyst(),
        }

        # Step 1: Validate alert authenticity
        if not self._validate_alert_source(alert):
            triage_result['status'] = 'dismissed'
            triage_result['reason'] = 'Invalid alert source'
            return triage_result

        # Step 2: Determine incident category
        category = self._classify_incident(alert)
        triage_result['category'] = category

        # Step 3: Assess initial severity
        severity = self._assess_severity(alert, category)
        triage_result['severity'] = severity

        # Step 4: Check for related incidents
        related = self._find_related_incidents(alert)
        triage_result['related_incidents'] = related

        # Step 5: Determine if this is a true incident
        is_incident = self._determine_incident_status(alert, category)
        triage_result['is_incident'] = is_incident

        if is_incident:
            triage_result['status'] = 'escalated'
            triage_result['incident_id'] = self._create_incident(triage_result)
        else:
            triage_result['status'] = 'monitoring'

        return triage_result

    def _classify_incident(self, alert: dict) -> str:
        """Classify the incident type based on alert indicators."""
        indicators = alert.get('indicators', [])

        # Check for prompt injection indicators
        injection_indicators = [
            'unusual_output_pattern',
            'guardrail_trigger',
            'instruction_in_input',
        ]
        if any(ind in indicators for ind in injection_indicators):
            return 'prompt_injection'

        # Check for model theft indicators
        theft_indicators = [
            'high_query_volume',
            'systematic_queries',
            'api_abuse_detected',
        ]
        if any(ind in indicators for ind in theft_indicators):
            return 'model_theft'

        # Check for data poisoning indicators
        poisoning_indicators = [
            'output_drift',
            'accuracy_degradation',
            'unexpected_behavior',
        ]
        if any(ind in indicators for ind in poisoning_indicators):
            return 'data_poisoning'

        return 'unknown'
```
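For reference, a triage call might look like the following. The alert fields are illustrative, and the helper methods referenced above (`_validate_alert_source`, `_assess_severity`, and so on) are assumed to be implemented:

```python
alert = {
    'id': 'alrt-2024-0042',  # hypothetical alert identifier
    'source': 'automated_monitoring',
    'indicators': ['guardrail_trigger', 'unusual_output_pattern'],
}

triage = AIIncidentDetection().initial_triage(alert)
print(triage['category'], triage['severity'], triage['status'])
# Expected: a 'prompt_injection' classification routed for escalation
```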
Phase 2: Containment

Containment strategies vary by incident type:
```python
from datetime import datetime


class AIIncidentContainment:
    """Containment procedures for AI security incidents."""

    def contain_prompt_injection(self, incident: dict) -> dict:
        """Contain an active prompt injection attack."""
        containment_actions = []

        # Immediate: Block attacking user/IP
        if incident.get('source_identified'):
            self._block_source(incident['source'])
            containment_actions.append('source_blocked')

        # Short-term: Enhance input filtering
        self._enable_strict_filtering()
        containment_actions.append('strict_filtering_enabled')

        # If the attack is succeeding: Circuit breaker
        if incident['severity'] == 'P1':
            self._enable_circuit_breaker()
            containment_actions.append('circuit_breaker_enabled')

        # Consider taking the system offline if critical
        if incident.get('harmful_outputs_confirmed'):
            self._disable_public_access()
            containment_actions.append('public_access_disabled')

        # Preserve evidence
        self._snapshot_conversation_logs(incident)
        self._preserve_model_state()
        containment_actions.append('evidence_preserved')

        return {
            'incident_id': incident['id'],
            'containment_status': 'contained',
            'actions_taken': containment_actions,
            'timestamp': datetime.utcnow().isoformat(),
        }

    def contain_model_theft(self, incident: dict) -> dict:
        """Contain suspected model theft."""
        containment_actions = []

        # Block suspicious access
        suspicious_accounts = incident.get('suspicious_accounts', [])
        for account in suspicious_accounts:
            self._revoke_api_access(account)
        containment_actions.append(
            f'revoked_access_{len(suspicious_accounts)}_accounts'
        )

        # Implement strict rate limiting
        self._enable_emergency_rate_limits()
        containment_actions.append('emergency_rate_limits_enabled')

        # Require additional authentication for model access
        self._enable_enhanced_auth()
        containment_actions.append('enhanced_auth_enabled')

        # Monitor for continued attempts
        self._enable_enhanced_monitoring(incident['patterns'])
        containment_actions.append('enhanced_monitoring_enabled')

        return {
            'incident_id': incident['id'],
            'containment_status': 'contained',
            'actions_taken': containment_actions,
            'timestamp': datetime.utcnow().isoformat(),
        }

    def contain_data_poisoning(self, incident: dict) -> dict:
        """Contain a data poisoning incident."""
        containment_actions = []

        # Stop the ingestion pipeline
        self._pause_data_ingestion()
        containment_actions.append('ingestion_paused')

        # Identify potentially poisoned data
        suspected_data = self._identify_suspicious_data(incident)
        containment_actions.append(
            f'identified_{len(suspected_data)}_suspicious_records'
        )

        # If model behavior is affected, prepare a rollback to the
        # last known good checkpoint
        if incident.get('model_behavior_affected'):
            self._prepare_model_rollback()
            containment_actions.append('rollback_prepared')

        # Quarantine suspected data
        for data_id in suspected_data:
            self._quarantine_data(data_id)
        containment_actions.append('suspicious_data_quarantined')

        return {
            'incident_id': incident['id'],
            'containment_status': 'contained',
            'actions_taken': containment_actions,
            'suspected_data': suspected_data,
            'timestamp': datetime.utcnow().isoformat(),
        }
```
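The `_enable_circuit_breaker` helper above is left abstract on purpose, since its shape depends on the serving stack. One plausible sketch, assuming the serving layer routes each response through a hook, trips after repeated guardrail violations and substitutes a static safe response:

```python
class GuardrailCircuitBreaker:
    """Sketch: trip after consecutive guardrail violations and
    short-circuit model output with a static safe response."""

    def __init__(self, max_violations: int = 3):
        self.max_violations = max_violations
        self.consecutive_violations = 0
        self.open = False  # when open, model output is suppressed

    def record(self, violated_guardrail: bool) -> None:
        """Update breaker state after each scored response."""
        if violated_guardrail:
            self.consecutive_violations += 1
            if self.consecutive_violations >= self.max_violations:
                self.open = True
        else:
            self.consecutive_violations = 0

    def respond(self, model_output: str) -> str:
        """Return the model output, or a safe fallback if tripped."""
        if self.open:
            return "This service is temporarily limited. Please try again later."
        return model_output
```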
Phase 3: Investigation

```python
from datetime import datetime


class AIIncidentInvestigation:
    """Investigation procedures for AI security incidents."""

    def investigate_prompt_injection(self, incident: dict) -> dict:
        """Deep investigation of a prompt injection incident."""
        investigation = {
            'incident_id': incident['id'],
            'started_at': datetime.utcnow().isoformat(),
            'findings': [],
        }

        # 1. Analyze attack vectors
        attack_analysis = self._analyze_attack_vectors(incident)
        investigation['attack_analysis'] = attack_analysis

        # 2. Determine scope of compromise
        scope = self._determine_compromise_scope(incident)
        investigation['scope'] = scope

        # 3. Identify affected users/data
        affected = self._identify_affected_entities(incident)
        investigation['affected_entities'] = affected

        # 4. Analyze attacker techniques
        techniques = self._map_to_attack_framework(attack_analysis)
        investigation['techniques'] = techniques

        # 5. Assess impact
        impact = self._assess_impact(incident, affected)
        investigation['impact_assessment'] = impact

        # 6. Determine root cause
        root_cause = self._root_cause_analysis(incident)
        investigation['root_cause'] = root_cause

        return investigation

    def _analyze_attack_vectors(self, incident: dict) -> dict:
        """Analyze how the attack was conducted."""
        # Retrieve relevant logs
        logs = self._get_incident_logs(
            incident['id'],
            time_range=incident.get('time_range', '24h'),
        )

        analysis = {
            'attack_type': None,
            'payloads': [],
            'success_rate': 0,
            'evolution': [],
        }

        # Analyze each suspicious interaction
        for log in logs:
            if self._is_attack_attempt(log):
                payload_analysis = self._analyze_payload(log['input'])
                analysis['payloads'].append(payload_analysis)
                if log.get('attack_succeeded'):
                    analysis['success_rate'] += 1

        # Determine attack type
        if analysis['payloads']:
            analysis['attack_type'] = self._classify_attack_type(
                analysis['payloads']
            )

        # Track attack evolution
        analysis['evolution'] = self._track_attack_evolution(
            analysis['payloads']
        )

        # Convert the raw success count into a rate
        if len(analysis['payloads']) > 0:
            analysis['success_rate'] /= len(analysis['payloads'])

        return analysis

    def _determine_compromise_scope(self, incident: dict) -> dict:
        """Determine the full scope of the compromise."""
        scope = {
            'systems_affected': [],
            'data_exposed': [],
            'users_impacted': [],
            'time_window': {},
        }

        # Identify first and last attack timestamps
        attack_logs = self._get_attack_logs(incident['id'])
        if attack_logs:
            scope['time_window'] = {
                'first_detected': min(l['timestamp'] for l in attack_logs),
                'last_detected': max(l['timestamp'] for l in attack_logs),
                'duration': self._calculate_duration(attack_logs),
            }

        # Identify affected systems
        scope['systems_affected'] = self._identify_affected_systems(
            incident, attack_logs
        )

        # Identify exposed data
        scope['data_exposed'] = self._identify_exposed_data(
            incident, attack_logs
        )

        # Identify impacted users
        scope['users_impacted'] = self._identify_impacted_users(
            incident, attack_logs
        )

        return scope

    def _root_cause_analysis(self, incident: dict) -> dict:
        """Perform root cause analysis."""
        root_cause = {
            'primary_cause': None,
            'contributing_factors': [],
            'timeline': [],
            'recommendations': [],
        }

        # Build the incident timeline
        root_cause['timeline'] = self._build_incident_timeline(incident)

        # Identify the primary cause by incident category
        if incident['category'] == 'prompt_injection':
            root_cause['primary_cause'] = self._analyze_injection_cause(incident)
        elif incident['category'] == 'model_theft':
            root_cause['primary_cause'] = self._analyze_theft_cause(incident)
        elif incident['category'] == 'data_poisoning':
            root_cause['primary_cause'] = self._analyze_poisoning_cause(incident)

        # Identify contributing factors
        root_cause['contributing_factors'] = self._identify_contributing_factors(
            incident
        )

        # Generate recommendations
        root_cause['recommendations'] = self._generate_recommendations(
            root_cause['primary_cause'],
            root_cause['contributing_factors'],
        )

        return root_cause
```
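Much of the investigative weight above sits in `_analyze_payload`. A deliberately simple sketch that flags instruction-like phrasing is shown below; the pattern list is illustrative rather than exhaustive, and production analysis would combine classifiers, canary-token checks, and provenance analysis:

```python
import re

# Illustrative injection-style phrasings only; not a complete detector.
INJECTION_PATTERNS = [
    r'ignore (all|any|previous) instructions',
    r'you are now',
    r'system prompt',
    r'disregard .{0,40}(rules|guidelines|policies)',
]

def analyze_payload(text: str) -> dict:
    """Return which injection-style patterns a payload matches."""
    matches = [p for p in INJECTION_PATTERNS
               if re.search(p, text, flags=re.IGNORECASE)]
    return {
        'suspicious': bool(matches),
        'matched_patterns': matches,
        'length': len(text),
    }
```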
Phase 4: Eradication and Recovery

```python
class AIIncidentRecovery:
    """Recovery procedures for AI security incidents."""

    def recover_from_prompt_injection(self, incident: dict,
                                      investigation: dict) -> dict:
        """Recover from a prompt injection incident."""
        recovery_plan = {
            'incident_id': incident['id'],
            'steps': [],
            'status': 'in_progress',
        }

        # Step 1: Deploy enhanced input filters
        filter_update = self._deploy_enhanced_filters(
            investigation['attack_analysis']['payloads']
        )
        recovery_plan['steps'].append({
            'action': 'deploy_enhanced_filters',
            'status': 'completed' if filter_update['success'] else 'failed',
            'details': filter_update,
        })

        # Step 2: Update system prompts if needed
        if investigation['root_cause'].get('prompt_vulnerability'):
            prompt_update = self._update_system_prompts(
                investigation['root_cause']['prompt_vulnerability']
            )
            recovery_plan['steps'].append({
                'action': 'update_system_prompts',
                'status': 'completed' if prompt_update['success'] else 'failed',
                'details': prompt_update,
            })

        # Step 3: Notify affected users if needed
        if investigation['scope']['users_impacted']:
            notification = self._notify_affected_users(
                investigation['scope']['users_impacted'],
                incident,
            )
            recovery_plan['steps'].append({
                'action': 'user_notification',
                'status': 'completed',
                'users_notified': len(investigation['scope']['users_impacted']),
                'details': notification,
            })

        # Step 4: Gradually restore service
        restoration = self._gradual_service_restoration(incident)
        recovery_plan['steps'].append({
            'action': 'service_restoration',
            'status': restoration['status'],
            'details': restoration,
        })

        # Step 5: Enhanced monitoring period
        self._enable_enhanced_monitoring_period(incident, duration_days=7)
        recovery_plan['steps'].append({
            'action': 'enhanced_monitoring',
            'status': 'active',
            'duration': '7 days',
        })

        recovery_plan['status'] = 'completed'
        return recovery_plan

    def recover_from_data_poisoning(self, incident: dict,
                                    investigation: dict) -> dict:
        """Recover from a data poisoning incident."""
        recovery_plan = {
            'incident_id': incident['id'],
            'steps': [],
            'status': 'in_progress',
        }

        # Step 1: Remove poisoned data
        removal = self._remove_poisoned_data(
            investigation['scope']['data_exposed']
        )
        recovery_plan['steps'].append({
            'action': 'remove_poisoned_data',
            'status': 'completed',
            'records_removed': removal['count'],
        })

        # Step 2: Roll back the model if necessary
        if investigation['impact_assessment'].get('model_compromised'):
            rollback = self._rollback_model(
                investigation['impact_assessment']['last_clean_checkpoint']
            )
            recovery_plan['steps'].append({
                'action': 'model_rollback',
                'status': 'completed',
                'rollback_point': rollback['checkpoint_id'],
            })

        # Step 3: Retrain or fine-tune with clean data
        if investigation['impact_assessment'].get('retraining_needed'):
            retraining = self._initiate_retraining(
                investigation['scope']['data_exposed']
            )
            recovery_plan['steps'].append({
                'action': 'model_retraining',
                'status': 'in_progress',
                'estimated_completion': retraining['eta'],
            })

        # Step 4: Strengthen the ingestion pipeline
        pipeline_update = self._strengthen_ingestion_pipeline(
            investigation['root_cause']
        )
        recovery_plan['steps'].append({
            'action': 'pipeline_strengthening',
            'status': 'completed',
            'improvements': pipeline_update['changes'],
        })

        return recovery_plan
```
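The `_gradual_service_restoration` step is typically a staged traffic ramp with health checks between stages. A sketch, assuming the gateway exposes callables to set the recovered system's traffic share and to run a health probe:

```python
import time

def gradual_service_restoration(set_traffic_share, health_check,
                                stages=(0.05, 0.25, 0.50, 1.0),
                                soak_seconds=600):
    """Ramp traffic in stages, rolling back if a health check fails.
    `set_traffic_share` and `health_check` are assumed callables
    provided by the serving infrastructure."""
    for share in stages:
        set_traffic_share(share)
        time.sleep(soak_seconds)  # soak period at each stage
        if not health_check():
            set_traffic_share(0.0)  # roll back on regression
            return {'status': 'rolled_back', 'failed_at': share}
    return {'status': 'completed'}
```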
Phase 5: Post-Incident Activities

```python
from datetime import datetime


class PostIncidentActivities:
    """Post-incident review and improvement procedures."""

    def conduct_post_incident_review(self, incident: dict,
                                     investigation: dict,
                                     recovery: dict) -> dict:
        """Conduct a comprehensive post-incident review."""
        review = {
            'incident_id': incident['id'],
            'review_date': datetime.utcnow().isoformat(),
            'participants': [],
            'sections': {},
        }

        # Section 1: Incident Summary
        review['sections']['summary'] = {
            'category': incident['category'],
            'severity': incident['severity'],
            'duration': self._calculate_incident_duration(incident),
            'impact': investigation['impact_assessment'],
        }

        # Section 2: Timeline Analysis
        review['sections']['timeline'] = {
            'detection_time': incident['detected_at'],
            'response_time': incident['response_started_at'],
            'containment_time': incident['contained_at'],
            'resolution_time': incident['resolved_at'],
            'key_events': investigation['root_cause']['timeline'],
        }

        # Section 3: What Worked Well
        review['sections']['successes'] = self._identify_successes(
            incident, investigation, recovery
        )

        # Section 4: Areas for Improvement
        review['sections']['improvements'] = self._identify_improvements(
            incident, investigation, recovery
        )

        # Section 5: Action Items
        review['sections']['action_items'] = self._generate_action_items(
            investigation['root_cause']['recommendations'],
            review['sections']['improvements'],
        )

        # Section 6: Metrics
        review['sections']['metrics'] = {
            'mttr': self._calculate_mttr(incident),
            'blast_radius': investigation['scope'],
            'detection_effectiveness': self._assess_detection(incident),
        }

        return review

    def _generate_action_items(self, recommendations: list,
                               improvements: list) -> list:
        """Generate prioritized action items."""
        action_items = []

        # High priority: Prevent recurrence
        for rec in recommendations:
            action_items.append({
                'title': rec['action'],
                'priority': 'high',
                'owner': self._assign_owner(rec),
                'due_date': self._calculate_due_date(rec['urgency']),
                'status': 'open',
                'source': 'root_cause_analysis',
            })

        # Medium priority: Process improvements
        for imp in improvements:
            action_items.append({
                'title': imp['improvement'],
                'priority': 'medium',
                'owner': self._assign_owner(imp),
                'due_date': self._calculate_due_date('medium'),
                'status': 'open',
                'source': 'post_incident_review',
            })

        # Sort by an explicit priority order; sorting the raw strings
        # only works for 'high'/'medium' by alphabetical coincidence
        priority_order = {'high': 0, 'medium': 1, 'low': 2}
        return sorted(action_items,
                      key=lambda x: priority_order[x['priority']])

    def update_playbooks(self, review: dict) -> None:
        """Update incident response playbooks based on learnings."""
        # Document new attack patterns
        if review['sections'].get('new_attack_patterns'):
            self._add_attack_patterns(review['sections']['new_attack_patterns'])

        # Update detection rules
        if review['sections']['improvements'].get('detection_gaps'):
            self._update_detection_rules(
                review['sections']['improvements']['detection_gaps']
            )

        # Update containment procedures
        if review['sections']['improvements'].get('containment_improvements'):
            self._update_containment_procedures(
                review['sections']['improvements']['containment_improvements']
            )
```
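Of the metrics above, MTTR is the easiest to pin down precisely. A sketch of `_calculate_mttr`, assuming the incident record carries the same ISO 8601 timestamps used elsewhere in this playbook:

```python
from datetime import datetime

def calculate_mttr(incident: dict) -> float:
    """Time to resolution for one incident, in hours,
    measured from detection to resolution."""
    detected = datetime.fromisoformat(incident['detected_at'])
    resolved = datetime.fromisoformat(incident['resolved_at'])
    return (resolved - detected).total_seconds() / 3600
```

Averaging this value across incidents in a reporting period yields the MTTR trend used in the review's metrics section.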
Communication Templates

Internal Escalation
# AI Security Incident Escalation
**Incident ID:** [INCIDENT_ID]
**Severity:** [P1/P2/P3/P4]
**Category:** [prompt_injection/model_theft/data_poisoning/etc.]
## Summary
[2-3 sentence description of the incident]
## Current Status
- Detection Time: [TIMESTAMP]
- Current Phase: [Detection/Containment/Investigation/Recovery]
- Containment Status: [Contained/In Progress/Not Contained]
## Impact Assessment
- Systems Affected: [LIST]
- Users Impacted: [COUNT/SCOPE]
- Data at Risk: [DESCRIPTION]
## Actions Taken
1. [ACTION 1]
2. [ACTION 2]
3. [ACTION 3]
## Required Decisions
- [ ] [DECISION NEEDED 1]
- [ ] [DECISION NEEDED 2]
## Next Update
[TIMESTAMP]

External Communication (if required)
# Security Notice
We are currently investigating a security matter affecting our AI services.
## What Happened
[Brief, factual description appropriate for external audience]
## What We're Doing
- We have contained the issue and are investigating
- Our security team is working to understand the full scope
- We are implementing additional safeguards
## What You Can Do
[Specific guidance for affected users]
## Contact
For questions, please contact: [SECURITY_EMAIL]
We will provide updates as our investigation progresses.

Conclusion
AI security incidents require specialized response procedures that account for the unique characteristics of machine learning systems. This playbook provides a foundation, but should be customized for your organization's specific AI deployments and risk profile.
Key takeaways:
- **Prepare in advance:** Have runbooks ready before incidents occur
- **Classify accurately:** AI incidents require different responses than traditional security incidents
- **Preserve evidence:** Model states and conversation logs are critical for investigation
- **Communicate clearly:** AI incidents may be difficult for stakeholders to understand
- **Learn and improve:** Each incident is an opportunity to strengthen defenses
At DeviDevs, we help organizations develop and test AI-specific incident response capabilities. Contact us to discuss building your AI security program.