The Evidence Landscape

Digital mental health tools have proliferated rapidly. There are over 20,000 mental health apps available, but only a fraction have been evaluated rigorously. The evidence that exists tells a complicated story.

Key Finding
Digital interventions show modest benefits in controlled trials, but real-world engagement is so poor that population impact is limited.

Evidence by Intervention Type

| Intervention | Effect Size | Evidence Grade | Key Source |
| --- | --- | --- | --- |
| Computerized CBT (Depression) | d = 0.54 | Strong | Andersson & Cuijpers, 2009 |
| Internet CBT (Anxiety) | d = 0.49 | Strong | Olthuis et al., 2016 |
| HRV Biofeedback (Anxiety) | d = 0.81 | Strong | Goessl et al., 2017 |
| HRV Biofeedback (Stress) | d = 0.83 | Strong | Lehrer & Gevirtz, 2014 |
| Mindfulness Apps | d = 0.35-0.55 | Moderate | Spijkerman et al., 2016 |
| AI Chatbots (General) | Mixed | Emerging | Various, 2023-2025 |
| AI Chatbots (Crisis) | Concerning | Limited | Multiple studies, ongoing |

Note: Effect sizes from controlled trials. Real-world effectiveness is typically lower due to engagement issues. Comparisons across studies should be made cautiously due to methodological differences.
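To make these effect sizes concrete, Cohen's d can be converted to a probability of superiority, Φ(d/√2): the chance that a randomly chosen treated person improves more than a randomly chosen control. A minimal sketch using values from the table above:

```python
from statistics import NormalDist

def prob_superiority(d: float) -> float:
    """Common-language effect size: probability that a randomly chosen
    treated person improves more than a randomly chosen control."""
    return NormalDist().cdf(d / 2 ** 0.5)

for name, d in [("cCBT, depression", 0.54),
                ("iCBT, anxiety", 0.49),
                ("HRV biofeedback, anxiety", 0.81)]:
    print(f"{name}: d = {d:.2f} -> P(superiority) = {prob_superiority(d):.0%}")
```

Even a "strong" d = 0.54 corresponds to only about a 65% chance that a treated individual outperforms a control, which is one reason population-level claims should stay modest.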

The Engagement Crisis

Even effective interventions fail if no one uses them.

  • Day 1: 40% of downloaded apps are never opened
  • Day 7: 25% of users are still active at one week
  • Day 14: fewer than 4% are still active at two weeks
  • Day 30: 3.3% average retention at one month

Source: Baumel et al. (2019), Torous et al. (2020)
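The drop-off above is roughly exponential. As an illustrative exercise (assuming a constant daily churn rate, which real cohorts only approximate), one can fit a single decay rate to the reported retention points:

```python
import math

# Reported retention points (day, fraction still active); numbers from
# the figures above, treating "40% never opened" as 60% active on day 1.
points = [(1, 0.60), (7, 0.25), (14, 0.04), (30, 0.033)]

# Fit one daily decay rate k for retention(t) ~ r0 * exp(-k * t)
# by least squares on the log scale (an illustrative simplification).
n = len(points)
sx = sum(t for t, _ in points)
sy = sum(math.log(r) for _, r in points)
sxx = sum(t * t for t, _ in points)
sxy = sum(t * math.log(r) for t, r in points)
k = -(n * sxy - sx * sy) / (n * sxx - sx * sx)
print(f"Implied daily churn rate: {1 - math.exp(-k):.1%}")
```

The fit implies roughly 9-10% of remaining users lost per day, which is why week-one experience dominates everything that follows.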

Why People Stop Using Mental Health Apps

Qualitative research identifies consistent themes across populations:

Burden

Daily logging feels like homework. Time commitment exceeds perceived benefit.

Mismatch

Generic content doesn't fit individual needs. Cultural or linguistic barriers.

Isolation

No human connection. Feels impersonal. Miss the therapeutic relationship.

Stagnation

Content becomes repetitive. Nothing new to learn. Plateau effect.

Life Interference

Notifications feel intrusive. Hard to maintain habit. Competes with other demands.

Symptom Change

Users stop when they feel better (success) or when tools don't help (failure).

What Improves Engagement

| Strategy | Evidence Strength | Typical Effect | Notes |
| --- | --- | --- | --- |
| Human coaching/support | Strong | +40-60% retention | Most consistent predictor |
| Personalization | Moderate | +20-30% retention | Depends on implementation |
| Social features | Moderate | +15-25% retention | Community, peer support |
| Gamification | Mixed | Variable | Some populations respond, others don't |
| Push notifications | Weak-Moderate | Variable | Depends on frequency and relevance |
| Passive sensing | Emerging | Reduces burden | Trade-off with privacy concerns |

The Bottom Line

The single most consistent predictor of engagement is human support. Apps with coaches, therapists, or peer supporters retain users at 2-3x the rate of pure self-help tools. Technology augments human connection; it doesn't replace it.

AI in Mental Health: Promise and Peril

Large language models (LLMs) offer genuinely new capabilities for mental health: 24/7 availability, infinite patience, and scalability. But the evidence on safety is concerning.

What AI Does Well

  • Empathetic responding to non-crisis content
  • Psychoeducation delivery
  • Symptom tracking assistance
  • Multilingual support

Where AI Fails

Crisis Detection Limitations

Research has documented significant limitations in AI chatbots' responses to mental health crises:

  • Many fail to recognize clear crisis signals
  • Some engage in extended conversation before any escalation
  • Responses may sometimes be dismissive or inappropriate
  • Insufficient validation for crisis use cases
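One common mitigation, consistent with the conservative stance taken later in this document, is a deterministic pre-filter that escalates to a human on any possible crisis signal rather than letting the model respond. The keyword list and routing labels below are purely illustrative, not a clinical instrument:

```python
# Hypothetical sketch: a conservative pre-filter that routes any
# possible crisis signal to human review instead of continuing the
# AI conversation. Keywords here are illustrative, not clinical.
CRISIS_SIGNALS = ("suicide", "kill myself", "end my life",
                  "self-harm", "overdose", "no reason to live")

def route_message(text: str) -> str:
    lowered = text.lower()
    if any(signal in lowered for signal in CRISIS_SIGNALS):
        # Err toward caution: false positives are acceptable here;
        # false negatives are not.
        return "escalate_to_human"
    return "continue_ai_session"
```

A production system would need validated screening, multilingual coverage, and handling of indirect phrasing; the point of the sketch is only the routing logic: escalation happens before any generated reply.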

Other Documented AI Risks

Hallucination

Fabricated therapeutic techniques, nonexistent citations, incorrect medication information.

Boundary Violations

Encouraging unhealthy dependency, inappropriate responses, affirming delusional content.

False Reassurance

Minimizing serious symptoms, creating false sense of therapeutic relationship.

Diagnostic Creep

Offering "diagnoses" without qualification, overstepping clinical boundaries.

Our Position on AI

Position
AI should be adjunctive (supporting human care), bounded (clear limitations), transparent (always identified as non-human), supervised (human oversight), and conservative (erring toward caution and referral).

Personalization: When It Helps, When It Harms

Personalization is often proposed as a solution to the engagement problem. The evidence is more nuanced.

The Promise

A 2023 systematic review (N=24,300 across 94 interventions) found that 66% of digital mental health interventions include some personalization. However:

  • Most personalization is limited to content type or communication frequency
  • Only 3% use machine learning for dynamic adaptation
  • Evidence for personalization benefit is mixed and often weak
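For a sense of what "machine learning for dynamic adaptation" can mean in practice, here is a toy epsilon-greedy bandit that adapts which content type to offer based on observed engagement. The arm names and reward signal are hypothetical:

```python
import random

# Toy epsilon-greedy bandit: adapt which content type to offer based
# on observed engagement. Arm names and rewards are illustrative.
ARMS = ["breathing", "journaling", "psychoeducation"]

class ContentBandit:
    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in ARMS}
        self.values = {arm: 0.0 for arm in ARMS}

    def choose(self) -> str:
        if random.random() < self.epsilon:
            return random.choice(ARMS)           # explore
        return max(ARMS, key=self.values.get)    # exploit best so far

    def update(self, arm: str, engaged: bool) -> None:
        self.counts[arm] += 1
        # Incremental mean of engagement outcomes for this arm.
        self.values[arm] += (engaged - self.values[arm]) / self.counts[arm]
```

Even this simple scheme illustrates the risks listed below: it happily over-fits to noisy early engagement, and it only ever optimizes what it can measure.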

When Personalization Helps

  • Timing optimization (support at moments of need)
  • Cultural and linguistic matching
  • Burden reduction (asking only relevant questions)
  • Progress-adaptive content
  • Engagement maintenance through variety

When Personalization Harms

  • Echo chambers reinforcing maladaptive patterns
  • Over-collection of sensitive data
  • Privacy violations
  • Algorithmic bias against underrepresented groups
  • Over-fitting to noise
  • Unmet expectations from over-promising

Physiological Approaches: A Stronger Evidence Base

Interventions that work through physiological mechanisms—particularly HRV biofeedback and breathing techniques—show larger effect sizes than purely cognitive approaches delivered digitally.

Why Physiology May Work Better Digitally

  • Less dependent on therapeutic relationship
  • Mechanisms well-characterized (vagal tone, baroreflex)
  • Non-verbal, reducing literacy barriers
  • Immediate, measurable feedback possible
  • Lower risk of harm from errors
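HRV biofeedback protocols typically pace breathing near six breaths per minute (about 0.1 Hz, the usual resonance range), often with a slightly longer exhale. A minimal pacer-schedule sketch; the 6 bpm default and exhale ratio are assumptions, since resonance frequency varies by individual:

```python
# Sketch of a paced-breathing schedule near the ~0.1 Hz resonance rate
# commonly used in HRV biofeedback. The defaults (6 breaths/min, 55%
# exhale) are illustrative; individual resonance frequency varies.
def pacer_schedule(breaths_per_min: float = 6.0, exhale_ratio: float = 0.55):
    cycle = 60.0 / breaths_per_min               # seconds per full breath
    inhale = cycle * (1 - exhale_ratio)
    exhale = cycle * exhale_ratio
    return inhale, exhale

inhale, exhale = pacer_schedule()
print(f"Inhale {inhale:.1f}s, exhale {exhale:.1f}s per breath cycle")
```

This simplicity is the point: the whole intervention reduces to timing, which is why it digitizes with so little lost in translation.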

Key Evidence

  • d = 0.81: HRV biofeedback for anxiety (meta-analysis, Goessl et al., 2017)
  • d = 0.83: HRV biofeedback for stress (Lehrer & Gevirtz, 2014)

See our contributed tools based on this research →

Key References

Andersson, G., & Cuijpers, P. (2009). Internet-based and other computerized psychological treatments for adult depression: A meta-analysis. Cognitive Behaviour Therapy, 38(4), 196-205.

Baumel, A., et al. (2019). Objective user engagement with mental health apps. Journal of Medical Internet Research, 21(9), e14567.

Goessl, V. C., et al. (2017). The effect of heart rate variability biofeedback training on stress and anxiety: A meta-analysis. Psychological Medicine, 47(15), 2578-2586.

Lehrer, P. M., & Gevirtz, R. (2014). Heart rate variability biofeedback: How and why does it work? Frontiers in Psychology, 5, 756.

Olthuis, J. V., et al. (2016). Therapist-supported Internet cognitive behavioural therapy for anxiety disorders in adults. Cochrane Database of Systematic Reviews, 3.

Spijkerman, M. P. J., et al. (2016). Effectiveness of online mindfulness-based interventions in improving mental health. Clinical Psychology Review, 45, 102-114.

Torous, J., et al. (2020). Digital phenotyping and mobile sensing in mental health. Psychiatry Research, 285, 112826.