The Efficacy of Rule-Based Versus Large Language Model-Based Chatbots in Alleviating Symptoms of Depression and Anxiety: Systematic Review and Meta-Analysis
Du Q; Ren Y; Meng ZL; He H; Meng S
J Med Internet Res 2025[Dec]; 27 (?): e78186 PMID41343858
BACKGROUND: The global mental health crisis is becoming increasingly severe. Because of the shortage of mental health professionals, high treatment costs, and limited accessibility of services, there is an urgent need for scalable, low-cost interventions. In recent years, chatbots have shown potential for psychological interventions. However, the difference in efficacy between large language model (LLM)-based and rule-based chatbots has not been systematically evaluated, and few studies compare the two directly. Existing meta-analyses also have notable limitations: intervention design (eg, dialogue structure, interaction frequency, and duration) is highly heterogeneous across studies, and the differential effects on depressive and anxiety symptoms have not been compared directly, which makes conclusions difficult to integrate.
OBJECTIVE: This review integrates studies from the past five years to evaluate the difference in effectiveness between LLM-based and rule-based chatbots in alleviating depressive and anxiety symptoms. It also analyzes the impact of control group type, intervention duration, and age on intervention outcomes. By analyzing chatbot functionality, the study aims to provide evidence-based technological pathway options and optimization recommendations for differentiated interventions for depression and anxiety.
METHODS: A systematic search of 7 databases identified 15 studies published between 2020 and 2025 for inclusion. Robust variance estimation (RVE) was used to account for non-independent effect sizes, and standardized mean differences (SMDs) were calculated using Hedges g. Because clinical and methodological heterogeneity among studies was expected, a random-effects model was preselected; the pooled effect size was estimated using restricted maximum likelihood (REML) estimation and interpreted according to Cohen's criteria. Publication bias was assessed using the RVE-adjusted Egger test, funnel plot asymmetry, and a fail-safe N.
RESULTS: For depression, rule-based interventions achieved a small but significant effect (g=0.266; 95% CI 0.020-0.512; P=.04), while LLM-based interventions showed a nonsignificant effect with a wide confidence interval (g=0.407; 95% CI -0.734 to 1.550; P=.17). For anxiety, rule-based interventions did not yield a significant effect (g=0.147; 95% CI -0.073 to 0.367; P=.15). LLM-based interventions showed a higher point estimate that was likewise nonsignificant, with a wide confidence interval (g=0.711; 95% CI -0.334 to 1.760; P=.13). Subgroup analysis showed that rule-based chatbots were more effective than the blank control for depression, with the greatest effect in the medium term (4-8 weeks).
CONCLUSIONS: Rule-based chatbots have a modest effect on improving depressive symptoms and are suitable for environments with limited psychological resources; 4-8 weeks may be a critical intervention window. Intervention duration and participant age did not significantly influence intervention effectiveness. Because of the limited sample size, robust evidence supporting the effectiveness of LLM-based chatbot interventions is lacking, and studies with larger samples are warranted.
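The effect size metric named in METHODS (Hedges g, read against Cohen's criteria) can be illustrated with a minimal Python sketch. It assumes two independent groups summarized by mean, SD, and sample size, the common approximation for the small-sample correction factor, and the conventional 0.2/0.5/0.8 cutoffs. The function names and the sign convention (control minus intervention, so that a positive g means lower symptom scores in the intervention group) are illustrative assumptions, not details taken from the review, and the sketch does not reproduce the RVE or REML pooling steps.

```python
import math


def hedges_g(mean_ctrl, sd_ctrl, n_ctrl, mean_int, sd_int, n_int):
    """Return (g, var_g) for two independent groups.

    Assumption: g is computed as control minus intervention, so a
    positive value means lower symptom scores in the intervention group.
    """
    df = n_ctrl + n_int - 2
    # Pooled standard deviation across both groups.
    pooled_sd = math.sqrt(
        ((n_ctrl - 1) * sd_ctrl**2 + (n_int - 1) * sd_int**2) / df
    )
    d = (mean_ctrl - mean_int) / pooled_sd
    # Small-sample correction factor (common approximation).
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample variance of d, scaled by the squared correction factor.
    var_d = (n_ctrl + n_int) / (n_ctrl * n_int) + d**2 / (2 * (n_ctrl + n_int))
    return g, j**2 * var_d


def cohen_label(g):
    """Label the magnitude of g using the conventional 0.2/0.5/0.8 cutoffs."""
    size = abs(g)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical summary statistics, not data from any included study.
    g, var_g = hedges_g(10.0, 4.0, 20, 8.0, 4.0, 20)
    print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}, {cohen_label(g)}")
    # Point estimates reported in RESULTS, labeled by magnitude only.
    for reported in (0.266, 0.407, 0.147, 0.711):
        print(reported, cohen_label(reported))
```

Under these cutoffs the reported depression estimate for rule-based chatbots (g=0.266) falls in the "small" band, which matches the abstract's wording. The magnitude label says nothing about statistical significance, which depends on the confidence interval.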