1. Input Validation for Crisis Content

Any system that accepts user input in a mental health context must validate that input for crisis signals before processing.

Multi-Layer Detection Architecture

🔍 Layer 1: Pattern Matching (Fast, Reliable)

Keyword and pattern matching for explicit crisis content. Should execute in <10ms before any other processing.

EXPLICIT_PATTERNS = [
    # Suicidal ideation - direct
    r'\b(kill|end)\s*(my)?self\b',
    r'\b(want(ing)?|going)\s*to\s*die\b',
    r'\bsuicid(e|al)\b',
    r'\b(don\'?t|do\s*not)\s*want\s*to\s*(live|be\s*alive)\b',
    
    # Self-harm - direct
    r'\bcut(ting)?\s*(my)?self\b',
    r'\bself[\s-]?harm\b',
    r'\bhurt(ing)?\s*myself\b',
    
    # Methods
    r'\b(pills?|overdose|hang(ing)?|jump(ing)?)\b',  # Context required
    
    # Crisis indicators
    r'\b(crisis|emergency|911)\b',
    r'\b(no\s*(one|body)\s*(cares?|would\s*miss))\b'
]
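
The Layer 1 check above can be sketched as a compiled-pattern scan. The pattern list here is a repeated subset of EXPLICIT_PATTERNS so the sketch is self-contained, and the 0/1 scoring convention for `check_explicit_patterns` is an illustrative assumption, not a prescribed API:

```python
import re

# Subset of EXPLICIT_PATTERNS above, repeated so this sketch is self-contained.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in [
    r'\b(kill|end)\s*(my)?self\b',
    r'\bsuicid(e|al)\b',
    r'\bself[\s-]?harm\b',
]]

def check_explicit_patterns(message: str) -> float:
    """Return 1.0 on any explicit match, else 0.0.

    Compiling once at import keeps the check well under the <10ms budget.
    """
    return 1.0 if any(p.search(message) for p in _COMPILED) else 0.0
```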
🧠 Layer 2: Semantic Analysis (Implicit Detection)

ML-based semantic analysis for implicit crisis signals. Many users express distress indirectly.

| Implicit Signal | Examples | Detection Approach |
|---|---|---|
| Hopelessness | "Nothing will ever change," "What's the point" | Sentiment + temporal markers |
| Farewell language | "Just want to say goodbye," "You've been a good friend" | Farewell pattern classifier |
| Giving away possessions | "I want you to have my..." | Context-specific patterns |
| Feeling like a burden | "Everyone would be better off without me" | Burden + absence classifier |
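
A production Layer 2 would use a trained classifier; as a crude illustration of the signal categories above, a cue-phrase heuristic might look like this (the cue lists, function name, and 0.5-per-hit scoring are all illustrative assumptions):

```python
# Illustrative heuristic only; a real semantic layer uses trained ML classifiers.
FAREWELL_CUES = ["goodbye", "you've been a good friend", "want you to have my"]
HOPELESS_CUES = ["nothing will ever change", "what's the point", "better off without me"]

def implicit_signal_score(message: str) -> float:
    """Crude stand-in for the semantic layer: count cue-phrase hits,
    saturating at 1.0 once two or more cues appear."""
    text = message.lower()
    hits = sum(cue in text for cue in FAREWELL_CUES + HOPELESS_CUES)
    return min(1.0, 0.5 * hits)
```
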
📊 Layer 3: Contextual Assessment

Analysis of conversation history and patterns over time:

  • Escalating distress across messages
  • Sudden mood changes
  • Time-of-day patterns (late night distress)
  • Repeated themes of worthlessness/hopelessness
  • Disengagement from previously engaging topics
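
One way to operationalize the escalation signal above: compare recent per-message distress scores against the earlier baseline. The function name, window size, and the assumption that each message already carries a distress score in [0, 1] are illustrative:

```python
from statistics import mean

def escalation_score(history_scores: list[float], window: int = 3) -> float:
    """Escalation as (recent mean distress) minus (baseline mean distress),
    clamped to [0, 1]. history_scores: per-message distress, oldest first."""
    if len(history_scores) <= window:
        return 0.0  # not enough history to establish a baseline
    recent = mean(history_scores[-window:])
    baseline = mean(history_scores[:-window])
    return max(0.0, min(1.0, recent - baseline))
```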

Confidence Scoring and Thresholds

Conservative Thresholds

Crisis detection should favor false positives over false negatives. A threshold of 0.4-0.5 (rather than typical 0.5+) is recommended for escalation triggers. The cost of missing a crisis far exceeds the cost of unnecessary resource provision.

def crisis_score(message, context):
    """
    Returns crisis score 0-1 and recommended action
    """
    scores = {
        'explicit_match': check_explicit_patterns(message),
        'semantic_risk': semantic_classifier(message),
        'context_escalation': context_analysis(context),
        'temporal_risk': time_pattern_analysis(context)
    }
    
    # Explicit match always triggers
    if scores['explicit_match'] > 0.8:
        return 1.0, 'IMMEDIATE_ESCALATION'
    
    # Weighted combination for implicit
    combined = (
        scores['semantic_risk'] * 0.4 +
        scores['context_escalation'] * 0.3 +
        scores['temporal_risk'] * 0.3
    )
    
    if combined > 0.4:  # Conservative threshold
        return combined, 'ESCALATION_REQUIRED'
    elif combined > 0.2:
        return combined, 'ELEVATED_MONITORING'
    else:
        return combined, 'NORMAL'

2. Output Filtering Requirements

All AI-generated output must be filtered before delivery to users.

Prohibited Output Categories

| Category | Examples | Implementation |
|---|---|---|
| Method information | Specific self-harm methods, lethal doses, etc. | Hard block; never generate |
| Diagnostic statements | "You have depression," "This sounds like BPD" | Pattern detection; redirect to professional |
| Treatment advice | "You should take medication," "Stop taking your meds" | Hard block; defer to prescriber |
| Delusional validation | Agreeing with paranoid or delusional content | Reality-testing classifier; neutral response |
| Relationship claims | "I love you," "I'll always be here for you" | Intimacy pattern detection; constrained language |

Crisis Response Templates

When crisis is detected, output should follow tested, clinically reviewed templates rather than relying on free-form generative responses.
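
A minimal sketch of a template registry, keyed by the actions returned from crisis_score. The template wording and names here are placeholders, not clinically reviewed text, and real resource strings must come from the verified database rather than being hardcoded:

```python
# Placeholder templates; real wording requires clinical review and testing.
CRISIS_TEMPLATES = {
    'IMMEDIATE_ESCALATION': (
        "It sounds like you're going through something really difficult. "
        "You deserve support right now: {resource}"
    ),
    'ESCALATION_REQUIRED': (
        "I'm concerned about what you've shared. "
        "Here are some support options: {resource}"
    ),
}

def render_crisis_response(action: str, verified_resource: str) -> str:
    """Fill a reviewed template; never free-generate crisis text."""
    return CRISIS_TEMPLATES[action].format(resource=verified_resource)
```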

Hallucination Prevention

  • Resource verification: All crisis resources (numbers, links) must be verified from a maintained database, never generated
  • Citation requirements: Claims about evidence must reference verified sources
  • Factual grounding: Use retrieval-augmented generation (RAG) with verified content
  • Uncertainty marking: When confidence is low, output should explicitly acknowledge uncertainty
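
The resource-verification rule above can be enforced with a strict lookup: resources exist only in a maintained store, and a missing key raises rather than falling back to generated text. The store contents and function name here are illustrative:

```python
# Illustrative maintained database; real entries require a verification process.
VERIFIED_RESOURCES = {
    'us_crisis_line': {'label': '988 Suicide & Crisis Lifeline', 'contact': '988'},
}

def get_resource(key: str) -> dict:
    """Return a resource only if present in the maintained database.
    Raising (instead of generating a fallback) fails toward safety."""
    if key not in VERIFIED_RESOURCES:
        raise KeyError(f"Unverified resource requested: {key}")
    return VERIFIED_RESOURCES[key]
```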

3. Confidence Thresholds for Intervention

Different actions require different confidence levels. Low-cost actions (such as showing resources) should trigger at low thresholds, while more disruptive or clinical actions warrant somewhat higher confidence; all thresholds remain deliberately conservative.

| Action | Confidence Threshold | Rationale |
|---|---|---|
| Show crisis resources | 0.3 | Low cost of false positive; resources always helpful |
| Escalate to human review | 0.4 | Human can disambiguate; better safe than sorry |
| Interrupt AI conversation | 0.5 | More disruptive; but crisis takes priority |
| Alert care team | 0.5 | Clinical action requires reasonable confidence |
| Suggest specific treatment | N/A | AI should never suggest specific treatment |
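
The thresholds above can be encoded as data so that a single confidence score triggers every action whose threshold it meets (the constant and function names are illustrative):

```python
# (threshold, action) pairs from the table above; a score triggers every
# action whose threshold it meets or exceeds.
THRESHOLDS = [
    (0.3, 'SHOW_CRISIS_RESOURCES'),
    (0.4, 'ESCALATE_TO_HUMAN_REVIEW'),
    (0.5, 'INTERRUPT_CONVERSATION'),
    (0.5, 'ALERT_CARE_TEAM'),
]

def actions_for(confidence: float) -> list[str]:
    """Return all actions warranted at the given confidence level."""
    return [action for t, action in THRESHOLDS if confidence >= t]
```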

4. Graceful Degradation Patterns

Systems must handle failure modes safely. When components fail, the system should fail toward safety.

Failure Mode Handling

| Failure | Degraded Behavior | User Communication |
|---|---|---|
| LLM API unavailable | Fall back to rule-based responses | "I'm having technical difficulties. Here are some resources..." |
| Crisis detection uncertain | Assume elevated risk; show resources | Provide crisis resources proactively |
| Human oversight unavailable | Limit AI capability; direct to crisis line | "For the support you need right now, please reach out to..." |
| Output filter fails | Block all generative output | Display static safety content only |

Design Principle

When uncertain, the system should fail toward providing more resources, more human involvement, and less AI autonomy—not the reverse.
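
A minimal sketch of this fail-toward-safety wrapper: any component failure, in generation or filtering, yields static pre-approved content instead of unfiltered output. The function names and the catch-all exception handling are illustrative assumptions:

```python
# Placeholder static content; real text would be clinically reviewed.
STATIC_SAFETY_CONTENT = "I'm having technical difficulties. Here are some resources: ..."

def safe_respond(message: str, llm_call, output_filter) -> str:
    """Fail toward safety: if the LLM or the output filter raises,
    return static, pre-approved content rather than anything unfiltered."""
    try:
        draft = llm_call(message)
        return output_filter(draft)
    except Exception:
        return STATIC_SAFETY_CONTENT
```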

5. Audit Trail Requirements

Comprehensive logging is essential for safety monitoring, incident investigation, and quality improvement.

Required Logging

| Event Type | Data to Log | Retention |
|---|---|---|
| All interactions | Timestamp, session ID, input hash, output hash | 90 days minimum |
| Crisis detection events | Full input/output, confidence scores, action taken | 7 years (medical record) |
| Escalation events | Escalation path, response time, resolution | 7 years |
| Filter triggers | What was blocked, why, what was shown instead | 90 days |
| System failures | Component, failure mode, degraded behavior activated | 1 year |

Privacy-Preserving Logging

  • Log hashes of content when full content not required
  • Separate PII from interaction logs
  • Implement access controls on sensitive logs
  • Enable audit trail for who accessed logs
  • Support data deletion requests while maintaining safety records

6. Human-in-the-Loop Architecture

Required Human Oversight Points

Real-Time Review Queue

  • All Tier 1/2 crisis detections
  • Ambiguous content flagged by classifiers
  • User-reported concerns

SLA: Tier 1 <15 min, Tier 2 <4 hours
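
One way to operationalize these SLAs is a queue ordered by review deadline, so the item closest to breaching its SLA surfaces first. The class and method names are illustrative:

```python
import heapq
import time

# SLA windows from above: Tier 1 within 15 minutes, Tier 2 within 4 hours.
SLA_SECONDS = {1: 15 * 60, 2: 4 * 60 * 60}

class ReviewQueue:
    """Priority queue ordering flagged items by SLA deadline (illustrative)."""

    def __init__(self):
        self._heap = []

    def add(self, tier: int, item_id: str, now=None):
        now = time.time() if now is None else now
        heapq.heappush(self._heap, (now + SLA_SECONDS[tier], item_id))

    def next_due(self):
        """Return the (deadline, item_id) pair due soonest, or None."""
        return self._heap[0] if self._heap else None
```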

Periodic Sampling

  • Random sample of all interactions
  • Stratified by risk level, user characteristics
  • Clinical review for appropriateness

Target: 5% of interactions reviewed weekly

Algorithm Change Review

  • All changes to crisis detection
  • All changes to output filtering
  • All changes to escalation protocols

Approval: Clinical lead sign-off required

Incident Investigation

  • Root cause analysis for adverse events
  • Review of related interactions
  • System improvements tracking

Requirement: Documented process, improvement loop

7. Testing Requirements

Required Test Suites

Crisis Detection Tests

  • Explicit suicidal ideation (various phrasings)
  • Implicit suicidal ideation (hopelessness, farewell)
  • Self-harm disclosure
  • Harm to others
  • Psychotic content
  • False positive edge cases (discussing movies, research, etc.)
  • Cross-language testing
  • Dialectal variation testing
  • Adversarial bypass attempts

Output Safety Tests

  • Requests for harmful information
  • Diagnostic probing
  • Treatment advice requests
  • Relationship boundary probing
  • Attempts to elicit "therapy" behavior
  • Psychotic content validation attempts

Equity Tests

  • Performance across demographic groups
  • Dialectal and linguistic variation
  • Cultural expression of distress
  • Differential false positive/negative rates
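
Test cases like those above are typically expressed as assertion-based unit tests. In this sketch, detect_crisis is an inline stub standing in for the full detection pipeline, so the tests illustrate shape rather than real coverage:

```python
import re

def detect_crisis(message: str) -> bool:
    """Stub standing in for the full multi-layer pipeline (illustrative only)."""
    return bool(re.search(r'\b(kill|hurt)\s+myself\b|\bsuicidal\b', message, re.I))

def test_explicit_ideation_detected():
    assert detect_crisis("I want to hurt myself")

def test_movie_discussion_not_flagged():
    # False-positive edge case: neutral fiction talk should not trigger alone.
    assert not detect_crisis("The movie's ending was dark")
```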

Ongoing Monitoring

  • Daily: Crisis detection metrics (sensitivity, specificity)
  • Weekly: Sampled interaction review
  • Monthly: Full safety audit, bias analysis
  • Quarterly: Third-party security review
  • On model update: Full regression testing
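
The daily sensitivity/specificity metrics can be computed directly from confusion counts. This sketch assumes labeled outcomes are available for the day's detections; sensitivity is the number to watch, since false negatives are the costly error here:

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Guards against empty denominators for days with no positives/negatives."""
    return {
        'sensitivity': tp / (tp + fn) if tp + fn else 0.0,
        'specificity': tn / (tn + fp) if tn + fp else 0.0,
    }
```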