AI Security Incident Response: A Comprehensive Playbook

DeviDevs Team
#incident-response #ai-security #playbook #security-operations #enterprise-security

When an AI security incident occurs, traditional incident response procedures often fall short. The unique characteristics of AI systems - non-deterministic behavior, complex data dependencies, and novel attack vectors - require specialized response procedures.

This playbook provides a comprehensive framework for responding to AI-specific security incidents.

Why AI Needs a Specialized IR Playbook

AI incidents differ from traditional security incidents in several critical ways:

| Traditional Incident | AI Incident |
|----------------------|-------------|
| Clear attack indicators | Subtle behavioral changes |
| Static malware signatures | Dynamic prompt-based attacks |
| Binary state (compromised/clean) | Degraded or biased behavior |
| Isolated system impact | Downstream decision cascade |
| Deterministic reproduction | Non-deterministic behavior |

Incident Classification Framework

Severity Levels

severity_levels:
 
  Critical (P1):
    definition: "Immediate threat to business operations or safety"
    examples:
      - Active prompt injection causing harmful outputs to users
      - Model theft confirmed with data exfiltration
      - AI making critical decisions with poisoned data
      - Public-facing AI producing harmful or illegal content
    response_time: 15 minutes
    escalation: Immediate executive notification
 
  High (P2):
    definition: "Significant security breach with potential for escalation"
    examples:
      - Detected prompt injection attempts with partial success
      - Suspected training data poisoning
      - Unauthorized access to model weights or training data
      - AI output quality degradation indicating compromise
    response_time: 1 hour
    escalation: Security leadership within 2 hours
 
  Medium (P3):
    definition: "Security concern requiring investigation"
    examples:
      - Anomalous query patterns suggesting probing
      - Minor prompt injection attempts blocked by guardrails
      - Unexpected model behavior changes
      - Access control policy violations
    response_time: 4 hours
    escalation: Team lead within 1 business day
 
  Low (P4):
    definition: "Security observation for awareness"
    examples:
      - Failed authentication attempts on AI endpoints
      - Minor policy violations detected and blocked
      - Routine security scanning findings
    response_time: 1 business day
    escalation: Weekly security review
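
To make these SLAs actionable, the severity matrix can be wired straight into alert handling. The sketch below is a minimal illustration rather than a prescribed implementation; the SEVERITY_SLA table and escalation_deadline helper are hypothetical names introduced here:

from datetime import datetime, timedelta

# Response-time SLAs mirroring the severity matrix above (illustrative values)
SEVERITY_SLA = {
    'P1': timedelta(minutes=15),
    'P2': timedelta(hours=1),
    'P3': timedelta(hours=4),
    'P4': timedelta(days=1),
}

def escalation_deadline(severity: str, detected_at: datetime) -> datetime:
    """Return the latest acceptable first-response time for an incident."""
    return detected_at + SEVERITY_SLA[severity]

# Example: a P2 incident detected now must get a response within one hour
deadline = escalation_deadline('P2', datetime.utcnow())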

Incident Categories

class AIIncidentCategory:
    """Classification of AI-specific security incidents."""
 
    categories = {
        'prompt_injection': {
            'description': 'Attempts to manipulate AI behavior through crafted inputs',
            'subcategories': [
                'direct_injection',      # User input manipulation
                'indirect_injection',    # Via external data sources
                'stored_injection',      # Persistent payloads
            ],
            'typical_severity': 'P1-P2',
            'indicators': [
                'Unusual output patterns',
                'Safety guardrail triggers',
                'Context window anomalies',
                'Instruction-like content in user messages',
            ],
        },
 
        'model_theft': {
            'description': 'Attempts to extract model weights or replicate capabilities',
            'subcategories': [
                'weight_extraction',     # Direct model stealing
                'distillation_attack',   # Training surrogate model
                'api_abuse',             # Systematic querying
            ],
            'typical_severity': 'P1-P2',
            'indicators': [
                'High query volumes from single source',
                'Systematic input patterns',
                'Unusual API access patterns',
                'Model serving infrastructure anomalies',
            ],
        },
 
        'data_poisoning': {
            'description': 'Corruption of training data or knowledge bases',
            'subcategories': [
                'training_data_poisoning',
                'rag_knowledge_poisoning',
                'feedback_loop_poisoning',
            ],
            'typical_severity': 'P1-P2',
            'indicators': [
                'Output bias or drift',
                'Unexpected model behavior',
                'Data integrity check failures',
                'User complaints about accuracy',
            ],
        },
 
        'privacy_breach': {
            'description': 'Unauthorized disclosure of sensitive information',
            'subcategories': [
                'training_data_leakage',
                'pii_exposure',
                'system_prompt_disclosure',
                'conversation_leakage',
            ],
            'typical_severity': 'P2-P3',
            'indicators': [
                'Sensitive patterns in outputs',
                'User reports of information exposure',
                'Audit log anomalies',
            ],
        },
 
        'adversarial_attack': {
            'description': 'Crafted inputs designed to cause misclassification or errors',
            'subcategories': [
                'evasion_attack',
                'model_confusion',
                'output_manipulation',
            ],
            'typical_severity': 'P2-P3',
            'indicators': [
                'Confidence score anomalies',
                'Decision boundary probing',
                'Unusual input characteristics',
            ],
        },
 
        'availability_attack': {
            'description': 'Attacks targeting AI system availability',
            'subcategories': [
                'resource_exhaustion',
                'model_degradation',
                'inference_dos',
            ],
            'typical_severity': 'P2-P3',
            'indicators': [
                'Latency spikes',
                'Resource utilization anomalies',
                'Error rate increases',
            ],
        },
    }

Incident Response Phases

Phase 1: Detection and Triage

from datetime import datetime

class AIIncidentDetection:
    """Detection and initial triage procedures."""
 
    def __init__(self):
        self.detection_sources = [
            'automated_monitoring',
            'user_reports',
            'security_tools',
            'model_performance_alerts',
            'external_notification',
        ]
 
    def initial_triage(self, alert: dict) -> dict:
        """
        Perform initial triage of potential AI security incident.
        """
 
        triage_result = {
            'timestamp': datetime.utcnow().isoformat(),
            'alert_id': alert['id'],
            'triage_analyst': self.get_on_call_analyst(),
        }
 
        # Step 1: Validate alert authenticity
        if not self._validate_alert_source(alert):
            triage_result['status'] = 'dismissed'
            triage_result['reason'] = 'Invalid alert source'
            return triage_result
 
        # Step 2: Determine incident category
        category = self._classify_incident(alert)
        triage_result['category'] = category
 
        # Step 3: Assess initial severity
        severity = self._assess_severity(alert, category)
        triage_result['severity'] = severity
 
        # Step 4: Check for related incidents
        related = self._find_related_incidents(alert)
        triage_result['related_incidents'] = related
 
        # Step 5: Determine if this is a true incident
        is_incident = self._determine_incident_status(alert, category)
        triage_result['is_incident'] = is_incident
 
        if is_incident:
            triage_result['status'] = 'escalated'
            triage_result['incident_id'] = self._create_incident(triage_result)
        else:
            triage_result['status'] = 'monitoring'
 
        return triage_result
 
    def _classify_incident(self, alert: dict) -> str:
        """Classify the incident type based on alert indicators."""
 
        indicators = alert.get('indicators', [])
 
        # Check for prompt injection indicators
        injection_indicators = [
            'unusual_output_pattern',
            'guardrail_trigger',
            'instruction_in_input',
        ]
        if any(ind in indicators for ind in injection_indicators):
            return 'prompt_injection'
 
        # Check for model theft indicators
        theft_indicators = [
            'high_query_volume',
            'systematic_queries',
            'api_abuse_detected',
        ]
        if any(ind in indicators for ind in theft_indicators):
            return 'model_theft'
 
        # Check for data poisoning indicators
        poisoning_indicators = [
            'output_drift',
            'accuracy_degradation',
            'unexpected_behavior',
        ]
        if any(ind in indicators for ind in poisoning_indicators):
            return 'data_poisoning'
 
        return 'unknown'
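
As a rough sketch of how an alert flows through triage (the alert fields here are assumptions based on the indicator names above, not a fixed schema):

# Hypothetical alert emitted by automated monitoring
alert = {
    'id': 'alert-0042',
    'source': 'automated_monitoring',
    'indicators': ['guardrail_trigger', 'unusual_output_pattern'],
}

detector = AIIncidentDetection()
result = detector.initial_triage(alert)
# result['category'] -> 'prompt_injection'
# result['status']   -> 'escalated' or 'monitoring'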

Phase 2: Containment

Containment strategies vary by incident type:

from datetime import datetime

class AIIncidentContainment:
    """Containment procedures for AI security incidents."""
 
    def contain_prompt_injection(self, incident: dict) -> dict:
        """Contain an active prompt injection attack."""
 
        containment_actions = []
 
        # Immediate: Block attacking user/IP
        if incident.get('source_identified'):
            self._block_source(incident['source'])
            containment_actions.append('source_blocked')
 
        # Short-term: Enhance input filtering
        self._enable_strict_filtering()
        containment_actions.append('strict_filtering_enabled')
 
        # If attack is succeeding: Circuit breaker
        if incident['severity'] == 'P1':
            self._enable_circuit_breaker()
            containment_actions.append('circuit_breaker_enabled')
 
            # Consider taking system offline if critical
            if incident.get('harmful_outputs_confirmed'):
                self._disable_public_access()
                containment_actions.append('public_access_disabled')
 
        # Preserve evidence
        self._snapshot_conversation_logs(incident)
        self._preserve_model_state()
        containment_actions.append('evidence_preserved')
 
        return {
            'incident_id': incident['id'],
            'containment_status': 'contained',
            'actions_taken': containment_actions,
            'timestamp': datetime.utcnow().isoformat()
        }
 
    def contain_model_theft(self, incident: dict) -> dict:
        """Contain suspected model theft."""
 
        containment_actions = []
 
        # Block suspicious access
        suspicious_accounts = incident.get('suspicious_accounts', [])
        for account in suspicious_accounts:
            self._revoke_api_access(account)
        containment_actions.append(f'revoked_access_{len(suspicious_accounts)}_accounts')
 
        # Implement strict rate limiting
        self._enable_emergency_rate_limits()
        containment_actions.append('emergency_rate_limits_enabled')
 
        # Add additional authentication for model access
        self._enable_enhanced_auth()
        containment_actions.append('enhanced_auth_enabled')
 
        # Monitor for continued attempts
        self._enable_enhanced_monitoring(incident['patterns'])
        containment_actions.append('enhanced_monitoring_enabled')
 
        return {
            'incident_id': incident['id'],
            'containment_status': 'contained',
            'actions_taken': containment_actions,
            'timestamp': datetime.utcnow().isoformat()
        }
 
    def contain_data_poisoning(self, incident: dict) -> dict:
        """Contain data poisoning incident."""
 
        containment_actions = []
 
        # Stop ingestion pipeline
        self._pause_data_ingestion()
        containment_actions.append('ingestion_paused')
 
        # Identify potentially poisoned data
        suspected_data = self._identify_suspicious_data(incident)
        containment_actions.append(f'identified_{len(suspected_data)}_suspicious_records')
 
        # If model compromised, consider rollback
        if incident.get('model_behavior_affected'):
            # Rollback to last known good checkpoint
            self._prepare_model_rollback()
            containment_actions.append('rollback_prepared')
 
        # Quarantine suspected data
        for data_id in suspected_data:
            self._quarantine_data(data_id)
        containment_actions.append('suspicious_data_quarantined')
 
        return {
            'incident_id': incident['id'],
            'containment_status': 'contained',
            'actions_taken': containment_actions,
            'suspected_data': suspected_data,
            'timestamp': datetime.utcnow().isoformat()
        }
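
Because containment differs by category, a small dispatch table keeps the on-call path unambiguous. This is one way to wire it together, shown here as a sketch:

def contain_incident(incident: dict) -> dict:
    """Route an incident to its category-specific containment procedure."""
    containment = AIIncidentContainment()
    handlers = {
        'prompt_injection': containment.contain_prompt_injection,
        'model_theft': containment.contain_model_theft,
        'data_poisoning': containment.contain_data_poisoning,
    }
    handler = handlers.get(incident['category'])
    if handler is None:
        # Categories without a runbook yet fall back to manual handling
        raise ValueError(f"No containment handler for {incident['category']}")
    return handler(incident)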

Phase 3: Investigation

from datetime import datetime

class AIIncidentInvestigation:
    """Investigation procedures for AI security incidents."""
 
    def investigate_prompt_injection(self, incident: dict) -> dict:
        """Deep investigation of prompt injection incident."""
 
        investigation = {
            'incident_id': incident['id'],
            'started_at': datetime.utcnow().isoformat(),
            'findings': []
        }
 
        # 1. Analyze attack vectors
        attack_analysis = self._analyze_attack_vectors(incident)
        investigation['attack_analysis'] = attack_analysis
 
        # 2. Determine scope of compromise
        scope = self._determine_compromise_scope(incident)
        investigation['scope'] = scope
 
        # 3. Identify affected users/data
        affected = self._identify_affected_entities(incident)
        investigation['affected_entities'] = affected
 
        # 4. Analyze attacker techniques
        techniques = self._map_to_attack_framework(attack_analysis)
        investigation['techniques'] = techniques
 
        # 5. Assess impact
        impact = self._assess_impact(incident, affected)
        investigation['impact_assessment'] = impact
 
        # 6. Determine root cause
        root_cause = self._root_cause_analysis(incident)
        investigation['root_cause'] = root_cause
 
        return investigation
 
    def _analyze_attack_vectors(self, incident: dict) -> dict:
        """Analyze how the attack was conducted."""
 
        # Retrieve relevant logs
        logs = self._get_incident_logs(
            incident['id'],
            time_range=incident.get('time_range', '24h')
        )
 
        analysis = {
            'attack_type': None,
            'payloads': [],
            'success_rate': 0,
            'evolution': []
        }
 
        # Analyze each suspicious interaction
        for log in logs:
            if self._is_attack_attempt(log):
                payload_analysis = self._analyze_payload(log['input'])
                analysis['payloads'].append(payload_analysis)
 
                if log.get('attack_succeeded'):
                    analysis['success_rate'] += 1
 
        # Determine attack type and normalize the success counter
        if analysis['payloads']:
            analysis['attack_type'] = self._classify_attack_type(
                analysis['payloads']
            )

            # Track how the attack evolved across successive payloads
            analysis['evolution'] = self._track_attack_evolution(
                analysis['payloads']
            )

            # Convert the raw success count into a rate between 0 and 1
            analysis['success_rate'] /= len(analysis['payloads'])
 
        return analysis
 
    def _determine_compromise_scope(self, incident: dict) -> dict:
        """Determine the full scope of the compromise."""
 
        scope = {
            'systems_affected': [],
            'data_exposed': [],
            'users_impacted': [],
            'time_window': {},
        }
 
        # Identify first and last attack timestamps
        attack_logs = self._get_attack_logs(incident['id'])
        if attack_logs:
            scope['time_window'] = {
                'first_detected': min(l['timestamp'] for l in attack_logs),
                'last_detected': max(l['timestamp'] for l in attack_logs),
                'duration': self._calculate_duration(attack_logs)
            }
 
        # Identify affected systems
        scope['systems_affected'] = self._identify_affected_systems(
            incident, attack_logs
        )
 
        # Identify exposed data
        scope['data_exposed'] = self._identify_exposed_data(
            incident, attack_logs
        )
 
        # Identify impacted users
        scope['users_impacted'] = self._identify_impacted_users(
            incident, attack_logs
        )
 
        return scope
 
    def _root_cause_analysis(self, incident: dict) -> dict:
        """Perform root cause analysis."""
 
        root_cause = {
            'primary_cause': None,
            'contributing_factors': [],
            'timeline': [],
            'recommendations': []
        }
 
        # Build incident timeline
        root_cause['timeline'] = self._build_incident_timeline(incident)
 
        # Identify primary cause
        if incident['category'] == 'prompt_injection':
            root_cause['primary_cause'] = self._analyze_injection_cause(incident)
        elif incident['category'] == 'model_theft':
            root_cause['primary_cause'] = self._analyze_theft_cause(incident)
        elif incident['category'] == 'data_poisoning':
            root_cause['primary_cause'] = self._analyze_poisoning_cause(incident)
 
        # Identify contributing factors
        root_cause['contributing_factors'] = self._identify_contributing_factors(
            incident
        )
 
        # Generate recommendations
        root_cause['recommendations'] = self._generate_recommendations(
            root_cause['primary_cause'],
            root_cause['contributing_factors']
        )
 
        return root_cause
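
The _build_incident_timeline helper referenced above is left abstract. A minimal sketch, assuming each log entry carries timestamp and event fields:

def build_incident_timeline(logs: list) -> list:
    """Order raw log entries into a chronological incident timeline."""
    events = [
        {'timestamp': log['timestamp'], 'event': log['event']}
        for log in logs
        if 'timestamp' in log and 'event' in log
    ]
    # Chronological ordering makes cause and effect easier to reason about
    return sorted(events, key=lambda e: e['timestamp'])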

Phase 4: Eradication and Recovery

class AIIncidentRecovery:
    """Recovery procedures for AI security incidents."""
 
    def recover_from_prompt_injection(self, incident: dict,
                                      investigation: dict) -> dict:
        """Recover from prompt injection incident."""
 
        recovery_plan = {
            'incident_id': incident['id'],
            'steps': [],
            'status': 'in_progress'
        }
 
        # Step 1: Deploy enhanced input filters
        filter_update = self._deploy_enhanced_filters(
            investigation['attack_analysis']['payloads']
        )
        recovery_plan['steps'].append({
            'action': 'deploy_enhanced_filters',
            'status': 'completed' if filter_update['success'] else 'failed',
            'details': filter_update
        })
 
        # Step 2: Update system prompts if needed
        if investigation['root_cause'].get('prompt_vulnerability'):
            prompt_update = self._update_system_prompts(
                investigation['root_cause']['prompt_vulnerability']
            )
            recovery_plan['steps'].append({
                'action': 'update_system_prompts',
                'status': 'completed' if prompt_update['success'] else 'failed',
                'details': prompt_update
            })
 
        # Step 3: Notify affected users if needed
        if investigation['scope']['users_impacted']:
            notification = self._notify_affected_users(
                investigation['scope']['users_impacted'],
                incident
            )
            recovery_plan['steps'].append({
                'action': 'user_notification',
                'status': 'completed',
                'users_notified': len(investigation['scope']['users_impacted'])
            })
 
        # Step 4: Gradually restore service
        restoration = self._gradual_service_restoration(incident)
        recovery_plan['steps'].append({
            'action': 'service_restoration',
            'status': restoration['status'],
            'details': restoration
        })
 
        # Step 5: Enhanced monitoring period
        monitoring = self._enable_enhanced_monitoring_period(
            incident, duration_days=7
        )
        recovery_plan['steps'].append({
            'action': 'enhanced_monitoring',
            'status': 'active',
            'duration': '7 days'
        })
 
        recovery_plan['status'] = 'completed'
        return recovery_plan
 
    def recover_from_data_poisoning(self, incident: dict,
                                   investigation: dict) -> dict:
        """Recover from data poisoning incident."""
 
        recovery_plan = {
            'incident_id': incident['id'],
            'steps': [],
            'status': 'in_progress'
        }
 
        # Step 1: Remove poisoned data
        removal = self._remove_poisoned_data(
            investigation['scope']['data_exposed']
        )
        recovery_plan['steps'].append({
            'action': 'remove_poisoned_data',
            'status': 'completed',
            'records_removed': removal['count']
        })
 
        # Step 2: Rollback model if necessary
        if investigation['impact_assessment'].get('model_compromised'):
            rollback = self._rollback_model(
                investigation['impact_assessment']['last_clean_checkpoint']
            )
            recovery_plan['steps'].append({
                'action': 'model_rollback',
                'status': 'completed',
                'rollback_point': rollback['checkpoint_id']
            })
 
        # Step 3: Retrain or fine-tune with clean data
        if investigation['impact_assessment'].get('retraining_needed'):
            retraining = self._initiate_retraining(
                investigation['scope']['data_exposed']
            )
            recovery_plan['steps'].append({
                'action': 'model_retraining',
                'status': 'in_progress',
                'estimated_completion': retraining['eta']
            })
 
        # Step 4: Strengthen ingestion pipeline
        pipeline_update = self._strengthen_ingestion_pipeline(
            investigation['root_cause']
        )
        recovery_plan['steps'].append({
            'action': 'pipeline_strengthening',
            'status': 'completed',
            'improvements': pipeline_update['changes']
        })
 
        return recovery_plan
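
The _gradual_service_restoration step above is worth spelling out. A common pattern is a staged traffic ramp with a health check between stages; the sketch below assumes hypothetical set_traffic_percentage and health_check_passes hooks into your serving layer:

import time

# Illustrative ramp schedule: percent of traffic restored at each stage
RESTORATION_STAGES = [5, 25, 50, 100]

def gradual_service_restoration(set_traffic_percentage, health_check_passes,
                                soak_seconds: int = 600) -> dict:
    """Restore traffic in stages, re-containing if health checks regress."""
    for pct in RESTORATION_STAGES:
        set_traffic_percentage(pct)
        time.sleep(soak_seconds)  # let the stage soak before evaluating
        if not health_check_passes():
            set_traffic_percentage(0)  # re-contain on any regression
            return {'status': 'rolled_back', 'failed_at_percentage': pct}
    return {'status': 'restored'}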

Phase 5: Post-Incident Activities

from datetime import datetime

class PostIncidentActivities:
    """Post-incident review and improvement procedures."""
 
    def conduct_post_incident_review(self, incident: dict,
                                    investigation: dict,
                                    recovery: dict) -> dict:
        """Conduct comprehensive post-incident review."""
 
        review = {
            'incident_id': incident['id'],
            'review_date': datetime.utcnow().isoformat(),
            'participants': [],
            'sections': {}
        }
 
        # Section 1: Incident Summary
        review['sections']['summary'] = {
            'category': incident['category'],
            'severity': incident['severity'],
            'duration': self._calculate_incident_duration(incident),
            'impact': investigation['impact_assessment'],
        }
 
        # Section 2: Timeline Analysis
        review['sections']['timeline'] = {
            'detection_time': incident['detected_at'],
            'response_time': incident['response_started_at'],
            'containment_time': incident['contained_at'],
            'resolution_time': incident['resolved_at'],
            'key_events': investigation['root_cause']['timeline'],
        }
 
        # Section 3: What Worked Well
        review['sections']['successes'] = self._identify_successes(
            incident, investigation, recovery
        )
 
        # Section 4: Areas for Improvement
        review['sections']['improvements'] = self._identify_improvements(
            incident, investigation, recovery
        )
 
        # Section 5: Action Items
        review['sections']['action_items'] = self._generate_action_items(
            investigation['root_cause']['recommendations'],
            review['sections']['improvements']
        )
 
        # Section 6: Metrics
        review['sections']['metrics'] = {
            'mttr': self._calculate_mttr(incident),
            'blast_radius': investigation['scope'],
            'detection_effectiveness': self._assess_detection(incident),
        }
 
        return review
 
    def _generate_action_items(self, recommendations: list,
                              improvements: list) -> list:
        """Generate prioritized action items."""
 
        action_items = []
 
        # High priority: Prevent recurrence
        for rec in recommendations:
            action_items.append({
                'title': rec['action'],
                'priority': 'high',
                'owner': self._assign_owner(rec),
                'due_date': self._calculate_due_date(rec['urgency']),
                'status': 'open',
                'source': 'root_cause_analysis'
            })
 
        # Medium priority: Process improvements
        for imp in improvements:
            action_items.append({
                'title': imp['improvement'],
                'priority': 'medium',
                'owner': self._assign_owner(imp),
                'due_date': self._calculate_due_date('medium'),
                'status': 'open',
                'source': 'post_incident_review'
            })
 
        # Sort by an explicit priority ranking; alphabetical order only
        # happens to work for 'high'/'medium' and breaks once 'low' is added
        priority_order = {'high': 0, 'medium': 1, 'low': 2}
        return sorted(action_items, key=lambda x: priority_order[x['priority']])
 
    def update_playbooks(self, review: dict) -> None:
        """Update incident response playbooks based on learnings."""
 
        # Document new attack patterns
        if review['sections'].get('new_attack_patterns'):
            self._add_attack_patterns(review['sections']['new_attack_patterns'])
 
        # Update detection rules
        if review['sections']['improvements'].get('detection_gaps'):
            self._update_detection_rules(
                review['sections']['improvements']['detection_gaps']
            )
 
        # Update containment procedures
        if review['sections']['improvements'].get('containment_improvements'):
            self._update_containment_procedures(
                review['sections']['improvements']['containment_improvements']
            )
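
The MTTR metric referenced in the review can be computed directly from the incident record's timestamps. A minimal sketch, assuming the ISO 8601 strings used elsewhere in this playbook:

from datetime import datetime

def calculate_mttr(incident: dict) -> float:
    """Time to resolve a single incident, in hours."""
    detected = datetime.fromisoformat(incident['detected_at'])
    resolved = datetime.fromisoformat(incident['resolved_at'])
    return (resolved - detected).total_seconds() / 3600

# Across many incidents, mean time to resolve is the average of these values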

Communication Templates

Internal Escalation

# AI Security Incident Escalation
 
**Incident ID:** [INCIDENT_ID]
**Severity:** [P1/P2/P3/P4]
**Category:** [prompt_injection/model_theft/data_poisoning/etc.]
 
## Summary
[2-3 sentence description of the incident]
 
## Current Status
- Detection Time: [TIMESTAMP]
- Current Phase: [Detection/Containment/Investigation/Recovery]
- Containment Status: [Contained/In Progress/Not Contained]
 
## Impact Assessment
- Systems Affected: [LIST]
- Users Impacted: [COUNT/SCOPE]
- Data at Risk: [DESCRIPTION]
 
## Actions Taken
1. [ACTION 1]
2. [ACTION 2]
3. [ACTION 3]
 
## Required Decisions
- [ ] [DECISION NEEDED 1]
- [ ] [DECISION NEEDED 2]
 
## Next Update
[TIMESTAMP]

External Communication (if required)

# Security Notice
 
We are currently investigating a security matter affecting our AI services.
 
## What Happened
[Brief, factual description appropriate for external audience]
 
## What We're Doing
- We have contained the issue and are investigating
- Our security team is working to understand the full scope
- We are implementing additional safeguards
 
## What You Can Do
[Specific guidance for affected users]
 
## Contact
For questions, please contact: [SECURITY_EMAIL]
 
We will provide updates as our investigation progresses.

Conclusion

AI security incidents require specialized response procedures that account for the unique characteristics of machine learning systems. This playbook provides a foundation, but should be customized for your organization's specific AI deployments and risk profile.

Key takeaways:

  1. Prepare in advance - Have runbooks ready before incidents occur
  2. Classify accurately - AI incidents require different responses than traditional security incidents
  3. Preserve evidence - Model states and conversation logs are critical for investigation
  4. Communicate clearly - AI incidents may be difficult for stakeholders to understand
  5. Learn and improve - Each incident is an opportunity to strengthen defenses

At DeviDevs, we help organizations develop and test AI-specific incident response capabilities. Contact us to discuss building your AI security program.
