The Evidence Landscape
Digital mental health tools have proliferated rapidly. There are over 20,000 mental health apps available, but only a fraction have been evaluated rigorously. The evidence that exists tells a complicated story.
A comprehensive evidence review of digital mental health interventions—what the research shows about effectiveness, engagement, and safety.
| Intervention | Effect Size | Evidence Grade | Key Source |
|---|---|---|---|
| Computerized CBT (Depression) | d = 0.54 | Strong | Andersson & Cuijpers, 2009 |
| Internet CBT (Anxiety) | d = 0.49 | Strong | Olthuis et al., 2016 |
| HRV Biofeedback (Anxiety) | d = 0.81 | Strong | Goessl et al., 2017 |
| HRV Biofeedback (Stress) | d = 0.83 | Strong | Lehrer & Gevirtz, 2014 |
| Mindfulness Apps | d = 0.35-0.55 | Moderate | Spijkerman et al., 2016 |
| AI Chatbots (General) | Mixed | Emerging | Various, 2023-2025 |
| AI Chatbots (Crisis) | Concerning | Limited | Multiple studies, ongoing |
Note: Effect sizes are drawn from controlled trials. Real-world effectiveness is typically lower because of engagement problems. Compare across studies cautiously, since their methods differ.
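To make the d values in the table more concrete, the sketch below computes Cohen's d from two groups of scores using the pooled standard deviation, then converts a d value into a "probability of superiority": the chance that a randomly chosen treated participant scores better than a randomly chosen control. The function names and sample scores are illustrative and not taken from any cited study. The conversion assumes normally distributed outcomes with equal variances, and that higher scores mean better outcomes.

```python
import math
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


def probability_of_superiority(d):
    """Phi(d / sqrt(2)), valid under the normal, equal-variance assumption."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so Phi(d / sqrt(2)) simplifies to this.
    return 0.5 * (1 + math.erf(d / 2))


if __name__ == "__main__":
    # Illustrative scores only (hypothetical, not from any trial).
    print(cohens_d([2, 4, 6], [1, 3, 5]))

    # Effect sizes quoted in the table above.
    for label, d in [("Computerized CBT", 0.54), ("HRV biofeedback", 0.81)]:
        print(f"{label}: d = {d}, P(superiority) = {probability_of_superiority(d):.2f}")
```

Under these assumptions, d = 0.54 corresponds to roughly a 65% probability of superiority and d = 0.81 to roughly 72%, which is a useful reminder that even "strong" effects leave substantial overlap between groups.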
Even effective interventions fail if no one uses them.
Source: Baumel et al. (2019), Torous et al. (2020)
Qualitative research identifies consistent themes across populations:
- Daily logging feels like homework. Time commitment exceeds perceived benefit.
- Generic content doesn't fit individual needs. Cultural or linguistic barriers.
- No human connection. Feels impersonal. Miss the therapeutic relationship.
- Content becomes repetitive. Nothing new to learn. Plateau effect.
- Notifications feel intrusive. Hard to maintain habit. Competes with other demands.
- Users stop when they feel better (success) OR when tools don't help (failure).
| Strategy | Evidence Strength | Typical Effect | Notes |
|---|---|---|---|
| Human coaching/support | Strong | +40-60% retention | Most consistent predictor |
| Personalization | Moderate | +20-30% retention | Depends on implementation |
| Social features | Moderate | +15-25% retention | Community, peer support |
| Gamification | Mixed | Variable | Some populations respond, others don't |
| Push notifications | Weak-Moderate | Variable | Depends on frequency and relevance |
| Passive sensing | Emerging | Reduces burden | Trade-off with privacy concerns |
The single most consistent predictor of engagement is human support. Apps with coaches, therapists, or peer supporters retain users at 2-3x the rate of pure self-help tools. Technology augments human connection; it doesn't replace it.
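As a rough illustration of what a 2-3x retention multiplier means in absolute terms, the sketch below applies that range to a baseline retention rate. The 5% baseline is a hypothetical placeholder chosen for the arithmetic, not a figure from the studies cited here, and `retention_range` is an illustrative helper name.

```python
def retention_range(baseline, low_multiplier, high_multiplier):
    """Scale a baseline retention proportion by a multiplier range, capped at 100%."""
    if not 0.0 <= baseline <= 1.0:
        raise ValueError("baseline must be a proportion between 0 and 1")
    low = min(1.0, baseline * low_multiplier)
    high = min(1.0, baseline * high_multiplier)
    return low, high


if __name__ == "__main__":
    # Hypothetical self-help baseline; substitute a measured value where available.
    hypothetical_baseline = 0.05
    low, high = retention_range(hypothetical_baseline, 2, 3)
    print(f"Self-help baseline: {hypothetical_baseline:.0%}")
    print(f"With human support (2-3x): {low:.0%} to {high:.0%}")
```

The point of the arithmetic is that a large relative gain on a small baseline can still leave most users disengaged, so relative and absolute retention are worth reporting together.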
Large language models (LLMs) offer genuinely new capabilities for mental health: 24/7 availability, infinite patience, and scalability. But the evidence on safety is concerning.
Research has documented significant limitations in AI chatbots' responses to mental health crises:
- Fabricated therapeutic techniques, nonexistent citations, incorrect medication information.
- Encouraging unhealthy dependency, inappropriate responses, affirming delusional content.
- Minimizing serious symptoms, creating false sense of therapeutic relationship.
- Offering "diagnoses" without qualification, overstepping clinical boundaries.
Personalization is often proposed as a solution to the engagement problem. The evidence is more nuanced.
A 2023 systematic review (N=24,300 across 94 interventions) found that 66% of digital mental health interventions include some personalization. However:
Interventions that work through physiological mechanisms—particularly HRV biofeedback and breathing techniques—show larger effect sizes than purely cognitive approaches delivered digitally.
Andersson, G., & Cuijpers, P. (2009). Internet-based and other computerized psychological treatments for adult depression: A meta-analysis. Cognitive Behaviour Therapy, 38(4), 196-205.
Baumel, A., et al. (2019). Objective user engagement with mental health apps. Journal of Medical Internet Research, 21(9), e14567.
Goessl, V. C., et al. (2017). The effect of heart rate variability biofeedback training on stress and anxiety: A meta-analysis. Psychological Medicine, 47(15), 2578-2586.
Lehrer, P. M., & Gevirtz, R. (2014). Heart rate variability biofeedback: How and why does it work? Frontiers in Psychology, 5, 756.
Olthuis, J. V., et al. (2016). Therapist-supported Internet cognitive behavioural therapy for anxiety disorders in adults. Cochrane Database of Systematic Reviews, 3.
Spijkerman, M. P. J., et al. (2016). Effectiveness of online mindfulness-based interventions in improving mental health. Clinical Psychology Review, 45, 102-114.
Torous, J., et al. (2020). Digital phenotyping and mobile sensing in mental health. Psychiatry Research, 285, 112826.