Detecting Sociodemographic Biases in the Content and Quality of Large Language Model-Generated Nursing Care: Cross-Sectional Simulation Study
Bai N; Yu Y; Luo C; Zhou SC; Wang Q; Zou H; Liu Q; Fu G; Zhai W; Zhao Q; Li J; Wei X; Yang BX
J Med Internet Res 2025 Dec;27:e78132. PMID: 41355748
BACKGROUND: Large language models (LLMs) are increasingly applied in health care. However, concerns remain that their nursing care recommendations may reflect patients' sociodemographic attributes rather than clinical needs. While this risk is acknowledged, there is a lack of empirical evidence evaluating sociodemographic bias in LLM-generated nursing care plans.

OBJECTIVE: To investigate potential biases in nursing care plans generated by LLMs, we focused on whether outputs differ systematically based on patients' sociodemographic characteristics and assessed the implications for equitable nursing care.

METHODS: We used a mixed methods simulation study. A standardized clinical vignette experiment was used to prompt GPT-4 to generate 9600 nursing care plans for 96 patient profiles with varying sociodemographic characteristics (eg, sex, age, income, education, and residence). We first conducted a quantitative analysis of all plans, assessing variations in thematic content. Subsequently, a panel of senior nursing experts evaluated the clinical quality (eg, safety, applicability, and completeness) of a stratified subsample of 500 plans.

RESULTS: We analyzed 9600 LLM-generated nursing care plans and identified 8 consistent themes. Communication and Education (99.98%) and Emotional Support (99.97%) were nearly universal, while Nurse Training and Event Analysis were least frequent (39.3%). Multivariable analyses revealed systematic sociodemographic disparities. Care plans generated for low-income patient profiles were less likely to include the theme Environmental Adjustment (adjusted relative risk [aRR] 0.90). Profiles with lower education were associated with an increased likelihood of including Family Support (aRR 1.10). Similarly, plans generated for older patient profiles were more likely to contain recommendations for Pain Management (aRR 1.33) and Family Support (aRR 1.62) but were less likely to mention Nurse Training (aRR 0.78). Sex and regional differences were also significant. Expert review of 500 plans showed high overall quality (mean 4.47), with strong interrater reliability (kappa=0.76-0.81). However, urban profiles had higher completeness (beta=.22) and applicability (beta=.14) but lower safety scores (beta=-0.09). These findings demonstrate that LLM-generated care plans exhibit systematic sociodemographic bias, raising important implications for fairness and safe deployment in nursing practice.

CONCLUSIONS: This study identified that LLMs systematically reproduce sociodemographic biases in the generation of nursing care plans. These biases appear in two forms: they shape the thematic content and influence expert-rated clinical quality. These findings reveal a substantial risk that such models may reinforce existing health inequities. To our knowledge, this is the first empirical evidence documenting these nuanced biases in nursing. The study also contributes a replicable framework for evaluating LLM-generated care plans. Finally, it underscores the critical need for robust human oversight to ensure that artificial intelligence serves as a tool for advancing equity rather than perpetuating disparities.
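The interrater reliability reported above (kappa=0.76-0.81) refers to chance-corrected agreement between expert reviewers, typically Cohen's kappa. As a minimal illustration of how the statistic works, the sketch below computes Cohen's kappa for two raters; the rating vectors are hypothetical examples, not data from this study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters,
    corrected for the agreement expected by chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Proportion of items on which the raters agree outright.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 1-5 quality ratings from two reviewers (illustrative only).
a = [5, 4, 4, 3, 5, 4, 2, 5, 4, 3]
b = [5, 4, 4, 3, 4, 4, 2, 5, 4, 4]
print(round(cohens_kappa(a, b), 2))  # prints 0.7
```

Values in the 0.76-0.81 range, as reported here, are conventionally interpreted as substantial to near-perfect agreement.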