The Documented Problems

These aren't opinions. They're findings from peer-reviewed research that should inform every design decision in mental health technology.

  • Catastrophic dropout: median 14-day retention of 3.9% (Baumel et al., 2019)
  • Text-heavy cognitive load: digital CBT effect sizes of d = 0.4-0.6 vs. d = 0.81 for physiological approaches (multiple meta-analyses)
  • Evidence vacuum: of 20,000+ apps, only a tiny fraction have peer-reviewed validation (app evaluation studies)
  • Engagement manipulation: streaks, variable rewards, and guilt-based notifications harm users (behavioral design literature)

The Core Question
Given these documented failures, how do you build something that actually works? The answer isn't to avoid technology—it's to build differently, with these specific problems in mind.

The Non-Negotiables

Before diving into details, here are the absolute requirements for any mental health AI application. These are not suggestions—they are prerequisites.

Red Lines - Never Cross
  1. Crisis detection must be robust. If your system cannot reliably detect suicidal ideation, self-harm, or psychotic content, it should not engage with users on mental health topics. Current LLMs have documented limitations in this area.
  2. Immediate escalation paths must exist. Every interaction must be ≤2 taps away from human crisis support. This is not optional.
  3. AI must identify as non-human. Deception about the nature of the interaction is never acceptable. Users must always know they're talking to a machine.
  4. No therapeutic claims without evidence. "May help" is honest; "will treat" is dangerous and often illegal.
  5. Human oversight for clinical decisions. AI can support; AI cannot diagnose, prescribe, or make treatment decisions autonomously.

The Evidence Gap

Most mental health apps have never been rigorously evaluated. Of the 20,000+ available, only a small fraction have published peer-reviewed evidence. Here's what we know from the research that does exist:

  • <4% of users still engaged at 14 days (Baumel et al., 2019)
  • d = 0.81 effect size for HRV biofeedback, an approach that works (Goessl et al., 2017)

The implication: most of what's being built isn't working. But some approaches do work. The difference matters enormously.

What the Evidence Shows Works

These aren't promising approaches—they're approaches with replicated evidence across multiple studies. Each addresses specific failure modes above.

Hybrid Models → Solves Dropout

Adding any human element (coach, peer, or therapist review) improves outcomes 2-3x and dramatically improves retention. The human provides what digital can't: accountability, crisis response, adaptive judgment.

Implementation: Even weekly 10-min check-ins change outcomes.
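
To show how little machinery the human layer needs, here is a minimal sketch that flags users who have gone a week or more without a human check-in so a coach, peer supporter, or therapist can reach out. The names (`CheckInRecord`, `dueForCheckIn`) are illustrative, not an existing API.

```typescript
// Hybrid-model sketch: flag users who are due for their weekly human check-in.
interface CheckInRecord {
  userId: string;
  lastHumanCheckIn: Date | null; // null if the user has never had one
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function dueForCheckIn(records: CheckInRecord[], now: Date = new Date()): string[] {
  return records
    .filter(r => r.lastHumanCheckIn === null ||
                 now.getTime() - r.lastHumanCheckIn.getTime() >= WEEK_MS)
    .map(r => r.userId);
}
```

The point is the routing to a human, not the scheduler; any existing coaching workflow or calendar tool can play this role.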

Physiological → Solves Cognitive Load

HRV biofeedback and breathing show d = 0.81 vs. d = 0.4-0.6 for digital CBT. Works through the autonomic nervous system, not through cognitive processing. No reading required during distress.

Implementation: Visual pacers at ~6 breaths/min. See the breathing tool.
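
A minimal sketch of the pacing logic behind such a visual pacer, assuming a 4-second inhale / 6-second exhale split (one common pattern that lands near 6 breaths/min); `startPacer` and its config are illustrative names, not an existing library API.

```typescript
// Minimal paced-breathing timer: ~6 breaths/min (10 s per cycle).
// Phase durations are illustrative; protocols vary.
type Phase = "inhale" | "exhale";

interface PacerConfig {
  inhaleMs: number; // e.g. 4000
  exhaleMs: number; // e.g. 6000 (a longer exhale is common in paced-breathing protocols)
  onPhase: (phase: Phase, durationMs: number) => void; // drives the visual pacer
}

function startPacer(config: PacerConfig): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const run = (phase: Phase) => {
    if (stopped) return;
    const durationMs = phase === "inhale" ? config.inhaleMs : config.exhaleMs;
    config.onPhase(phase, durationMs);
    timer = setTimeout(() => run(phase === "inhale" ? "exhale" : "inhale"), durationMs);
  };

  run("inhale");
  return () => {            // caller can stop the session at any time
    stopped = true;
    clearTimeout(timer);
  };
}

// Usage: animate a circle that expands on inhale and contracts on exhale.
const stop = startPacer({
  inhaleMs: 4000,
  exhaleMs: 6000,
  onPhase: (phase, ms) => console.log(`${phase} for ${ms / 1000}s`),
});
```

The UI layer maps onPhase to an expanding and contracting animation rather than text, keeping the interaction non-verbal during distress.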

Measurement-Based Care → Solves Evidence Vacuum

Regular PHQ-9/GAD-7 tracking with visualization actually improves outcomes. UK IAPT proves it works at scale. Digital tools excel here—consistent, timestamped, trend-visible.

Implementation: mindLAMP already does this well. Integrate, don't rebuild.
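
For teams that do need to handle scores directly, the core of measurement-based care is small: consistent, timestamped totals mapped to the standard PHQ-9 severity bands so trends can be plotted. A minimal sketch with illustrative types and function names:

```typescript
// Timestamped PHQ-9 totals with the standard severity bands,
// suitable for trend visualization on a clinician dashboard.
interface Phq9Entry {
  takenAt: Date;
  itemScores: number[]; // 9 items, each scored 0-3
}

function phq9Total(entry: Phq9Entry): number {
  if (entry.itemScores.length !== 9 || entry.itemScores.some(s => s < 0 || s > 3)) {
    throw new Error("PHQ-9 requires 9 item scores in the range 0-3");
  }
  return entry.itemScores.reduce((sum, s) => sum + s, 0);
}

function phq9Severity(total: number): string {
  if (total <= 4) return "minimal";
  if (total <= 9) return "mild";
  if (total <= 14) return "moderate";
  if (total <= 19) return "moderately severe";
  return "severe";
}

// Trend view, oldest to newest.
function phq9Trend(entries: Phq9Entry[]): { takenAt: Date; total: number; severity: string }[] {
  return [...entries]
    .sort((a, b) => a.takenAt.getTime() - b.takenAt.getTime())
    .map(e => {
      const total = phq9Total(e);
      return { takenAt: e.takenAt, total, severity: phq9Severity(total) };
    });
}
```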

Between-Session Support → Right Scope

Tools that support therapy (homework practice, skill building) rather than replace it. These augment human care without creating false equivalence.

Implementation: Bridge to care, not substitute for care.

What Consistently Fails (Don't Build These)

Standalone Self-Help Apps

No human element = 80%+ dropout. If you're building without any human support layer, you're building something that won't be used long enough to help.

Alternative: Add coach check-ins, peer support, or therapist integration.

AI That Handles Crisis Independently

AI systems have documented limitations in crisis detection. If your AI tries to "help" someone in crisis without immediate human escalation, you may cause harm.

Alternative: Crisis detection → immediate human handoff. No AI conversation.
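
A sketch of that routing rule, assuming a `CrisisDetector` interface whose implementation is deliberately out of scope (and, per the red lines above, would need to be far more robust than keyword matching):

```typescript
// Routing sketch: on any crisis signal, bypass the AI conversation entirely
// and surface human crisis support.
type Route =
  | { kind: "human_crisis_support"; resources: string[] }
  | { kind: "ai_tool"; reply: (message: string) => Promise<string> };

interface CrisisDetector {
  isCrisis(message: string): Promise<boolean>; // robust detection is a prerequisite, not shown here
}

async function routeMessage(
  message: string,
  detector: CrisisDetector,
  aiReply: (message: string) => Promise<string>
): Promise<Route> {
  if (await detector.isCrisis(message)) {
    // No AI-generated response on this path: one screen, one tap to a human.
    return {
      kind: "human_crisis_support",
      resources: [
        "988 Suicide & Crisis Lifeline: call or text 988 (US)",
        "Crisis Text Line: text HOME to 741741",
        "International: findahelpline.com",
      ],
    };
  }
  return { kind: "ai_tool", reply: aiReply };
}
```

A conservative variant also fails closed: if the detector itself errors, route to human support anyway.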

AI "Therapist" or "Companion"

Creates false intimacy, encourages dependency, and mimics the therapeutic relationship. The harms are documented, and it may delay real treatment.

Alternative: Be a tool, not a relationship. Support care, don't simulate it.

Algorithmic Over-Personalization

Risks reinforcing maladaptive patterns. Filter bubbles are dangerous in a mental health context, and recommendations end up based on noise.

Alternative: User-controlled preferences. Transparent, on-device, exportable.
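
One way to keep personalization user-controlled, assuming a browser context with localStorage; the shape of `UserPreferences` is illustrative:

```typescript
// User-controlled preferences: set explicitly by the user, stored on-device,
// and exportable as plain JSON. Nothing is inferred from behavior server-side.
interface UserPreferences {
  preferredTools: string[];     // e.g. ["breathing", "mood-check-in"]
  reminderTime: string | null;  // "HH:MM" chosen by the user, or null for none
  shareWithClinician: boolean;  // off by default; the user opts in
}

const STORAGE_KEY = "user_preferences";

function savePreferences(prefs: UserPreferences): void {
  localStorage.setItem(STORAGE_KEY, JSON.stringify(prefs)); // stays on the device
}

function loadPreferences(): UserPreferences | null {
  const raw = localStorage.getItem(STORAGE_KEY);
  return raw ? (JSON.parse(raw) as UserPreferences) : null;
}

function exportPreferences(): string {
  // The user can read exactly what is stored and take it with them.
  return localStorage.getItem(STORAGE_KEY) ?? "{}";
}
```

Because nothing is inferred server-side, the user can inspect, change, or export everything the app knows about their preferences.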


The Real Ask: Build Different

This isn't about being timid. It's about being smart. The evidence points toward specific approaches that work better than what most teams are building.

The Core Shift
Stop trying to replace human care with AI. Start building technology that makes human care more accessible, more consistent, and more scalable. Bridge to care. Don't simulate it.

If you're building in this space, you have a choice: follow the 80% dropout path that most apps take, or design around the documented problems from the start. The research tells us what works. Use it.

Crisis Resources

Every page of a mental health application should include crisis resources:

  • 988 Suicide & Crisis Lifeline: Call or text 988 (US)
  • Crisis Text Line: Text HOME to 741741
  • International: findahelpline.com