ADEN, Yemen, Nov. 26 (Xinhua) — The United Arab Emirates (UAE) has pledged 1 billion U.S. dollars to support electricity and energy projects across war-ravaged Yemen, according to a report by the state-run Saba news agency on Wednesday.
The announcement was made following a meeting in Aden between Presidential Leadership Council chief Rashad Al-Alimi and UAE Ambassador Mohamed Hamad Al Zaabi, who reaffirmed Abu Dhabi’s intention to help restore Yemen’s battered power network.
Yemen has faced chronic electricity outages for more than two decades, with southern provinces like Aden experiencing blackouts that can stretch up to 12 hours a day. Damage to power plants, limited fuel supplies and fragmented authorities have left millions relying on private generators and small-scale solar systems.
The UAE’s new pledge coincided with the First National Energy Conference held in Aden, where government officials, investors, and renewable energy experts gathered to discuss long-term reforms aimed at stabilizing the grid and attracting financing to the country. ■
Adolescents and young adults experience high rates of mental distress, with substance use and mood-related and anxiety disorders being among the most prevalent issues. Significant mental distress triggered by the challenges encountered during this transitional stage in life, such as financial instability, interpersonal relationships, and career development, has been implicated in adolescents and young adults’ decreased quality of life and increased suicide risk. Adolescents and young adults also exhibit elevated rates of health-risky behaviors, such as poor dietary choices, inadequate sleep, and physical inactivity. These behaviors are intricately linked with biological and psychosocial factors, including neurological changes, adverse childhood experiences, and peer pressure, which in turn exacerbate the incidence of chronic disease and mental distress among adolescents and young adults. Despite these alarming trends, adolescents and young adults are less likely to seek health support, particularly for sensitive topics such as sexual and physical abuse, sexually transmitted infections and HIV, contraception methods, and substance use. Most adolescent and young adult clinical patients report unmet supportive care needs, with psychological needs being the most frequently cited, followed by physical and daily living needs. Moreover, traditional pediatric and adult interventions are predominantly disease-centric and often fail to address the nuanced, age-specific needs of adolescents and young adults. Unlike children, whose parents typically make health care decisions on their behalf, or mature adults, who are expected to independently manage their appointments and treatments, adolescents and young adults occupy a transitional phase that shares characteristics with both groups but fully aligns with neither. They have limited experience navigating health care systems or seeking external support, while simultaneously grappling with issues of identity, independence, and major life milestones. These challenges highlight significant gaps in current promotive efforts targeting adolescents and young adults, which often struggle to provide effective, age-appropriate care due to workforce shortages and time constraints, underscoring the urgent need for tailored, flexible interventions that can address the complex and diverse health needs of this population.
Chatbots are innovative digital tools that simulate conversations with users through a dialog interface, generating responses based on stored patterns. Emerging evidence suggests that chatbots can effectively mitigate symptoms of mental health problems and encourage positive health behaviors. For instance, studies have highlighted the efficacy of chatbot interventions in delivering cognitive-behavioral therapy, mindfulness-based practices, and motivational interviewing techniques for people with psychological distress and drug addiction. Moreover, chatbots have also been shown to improve user adherence and satisfaction with treatment, which could be essential factors in achieving sustained long-term health outcomes. Adolescents and young adults are particularly well-positioned to benefit from chatbots, given their favorable attitudes and openness to innovative health care solutions. This population often experiences increased vulnerability related to identity formation, academic pressures, and relationship dynamics, while simultaneously possessing strong self-directed learning abilities and a preference for autonomy, making them more receptive to digital health solutions compared to children and older adults. Autonomous chatbots hold a unique advantage by being perceived not only as easily accessible and nonjudgmental, but also as capable of fostering a sense of peer support, which is a critical source of empowerment that provides invaluable information and psychological solace to adolescents and young adults.
Existing reviews on the effectiveness of chatbots in health care have primarily focused on general populations, with limited focus on adolescents and young adults. A recent randomized controlled trial (RCT) found that adolescent and young adult users often perceived the chatbot content as irrelevant or too generic, largely due to insufficient tailoring to personal needs. Given the unique developmental, social, and technological contexts that characterize this demographic, it is necessary to systematically evaluate the evidence regarding chatbot interventions targeting adolescents and young adults. Moreover, the diversity in chatbot designs and targeted health outcomes requires a comprehensive synthesis to uncover limitations and highlight areas for future research within this population. Present studies often conflate chatbots with other types of conversational agents, such as voice-based virtual agents, embodied avatars, and social robots, overlooking the unique advantages of chatbots, particularly their ability to encourage adolescents and young adults to discuss sensitive topics anonymously without fear of judgment. This aspect is often less pronounced in interactions with avatars, robots, or conversations embedded in virtual reality, where social cues may inhibit open communication for those experiencing anxiety or discomfort in social situations. The text-based nature of chatbots not only facilitates rapid information exchange but also allows users to read and review content repeatedly with unlimited, round-the-clock access. This feature enables users to process and reflect on information at their own pace and take positive actions, as it removes the pressure of maintaining a continuous dialog or responding in real time. Furthermore, chatbots stand out for their accessibility and cost-effectiveness, as they can be deployed on commonly used platforms such as smartphones and tablets. This eliminates the need for expensive equipment or immersive environments, significantly enhancing their reach and usability and making them widely available to users across diverse socioeconomic backgrounds and settings.
Generative artificial intelligence (AI) has brought chatbots like ChatGPT (OpenAI Inc) and Llama (Meta Inc) to the forefront of digital health innovation. These advanced systems, powered by natural language processing (NLP) and large language models, offer enhanced capabilities for processing complex information, enabling more human-like and adaptive responses to self-care needs. Such flexibility better positions chatbots as promising tools, particularly beneficial for adolescents and young adults who may not proactively seek support from health care professionals or prefer to self-manage their health conditions. At present, there is no established gold standard for engineers to assess the development of chatbots and the quality of information they provide. There is also a lack of systematic evidence regarding their effectiveness for adolescents and young adults across various dialog systems (ie, rule-based, retrieval-based, or generative) and design features (eg, modalities, reminders, and frequency of sessions). These knowledge gaps must be addressed to effectively inform and guide future advancements in the field of chatbot development for health care applications for adolescents and young adults. This systematic review and meta-analysis aims to synthesize the evidence from RCTs to evaluate the effectiveness of AI chatbots in alleviating mental distress and promoting health-related behaviors among adolescents and young adults. Additionally, this study summarizes key design features of chatbots and examines how these characteristics may moderate intervention outcomes through subgroup analyses and meta-regression. User engagement and experiences with chatbot interactions are also explored and synthesized narratively. 
By addressing these objectives, the review seeks to provide valuable insights for the development and integration of innovative chatbot-based health care solutions, thereby supporting the enhancement of well-being among adolescents and young adults worldwide. The review questions are as follows:
What is the effectiveness of chatbots in alleviating mental distress and promoting health behaviors among adolescents and young adults?
What are the key design features of chatbots, and how do these features impact health outcomes in adolescents and young adults?
How do adolescents and young adults engage with chatbots, and what are their perceptions and experiences during these interactions?
Methods
Protocol Registration and Study Design
The review protocol was prospectively registered in PROSPERO (International Prospective Register of Systematic Reviews; CRD42024603472), and the review adhered to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines.
Data Sources and Search Strategy
We conducted a systematic search across 8 databases (PubMed, PsycINFO, Cochrane Library, CINAHL, Embase, Web of Science, Scopus, and IEEE Xplore) using a wide array of search terms (Table S1). Both subject headings (eg, MeSH and Emtree) and free-text keywords related to the core concepts, along with their synonyms and variants, were included. Additionally, the reference lists of previous reviews and the included original studies were manually examined to identify any further eligible studies. The search covered records published from January 1, 2014, to January 26, 2025. This timeframe was selected because chatbots powered by NLP and machine learning, as opposed to simple rule-based systems, began to see significant development and application in health care during this period. The period also coincides with the widespread adoption of internet-connected mobile devices among adolescents and young adults, a group uniquely shaped by and deeply embedded in this digital landscape, ensuring that the evidence included is both technologically relevant and contextually appropriate to their experiences and behaviors. We fine-tuned our search strategy based on previous systematic reviews to locate sources related to chatbots for alleviating mental distress or promoting health-related behaviors. The search was limited to English-language publications. After duplicates were removed, 2 reviewers independently screened all titles and abstracts for eligibility. The full-text review was also performed by 2 reviewers, with any disagreements resolved through consultation with a third reviewer.
Eligibility Criteria
We developed our eligibility criteria based on the population, intervention, comparison, outcome, study design (PICOS) framework (Table 1):
Population: adolescents and young adults, typically characterized as individuals aged between 15 and 39 years, in both clinical and nonclinical samples. Given varying definitions of adolescents and young adults by age and to ensure comprehensive inclusion of related studies, we included original research articles if over 50% of participants fell within the 15‐39 years age range, the average age of participants was within this range, or the study explicitly identified its population as “adolescents and young adults.”
Intervention: 2-way interactive chatbots designed primarily to alleviate mental distress or promote health behaviors. These chatbots should operate autonomously without human assistance and serve as the primary component of interventions irrespective of dialog initiatives, interaction modalities, platforms, and settings, but should not be embedded as secondary elements within other technologies, such as virtual reality, robots, or virtual avatars. They may have minor supplementary elements (eg, educational materials) or a simple graphical representation (eg, an icon or avatar), but their primary mode of interaction is through written dialog. Studies focused solely on the development or rationale of chatbot technology, without any empirical evaluation of user-chatbot interaction, were excluded.
Comparator: any control groups that did not involve chatbot technology, such as active controls (eg, treatment as usual), information controls (eg, e-book), and passive controls (eg, waitlist, assessment-only).
Outcome: eligible primary outcomes included mental health outcomes specified in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), as well as health behaviors, defined as actions taken by individuals that affect health or mortality, such as substance use, physical activity, and dietary habits. Metrics related to user engagement with chatbots (eg, retention rates and frequency of interactions) and user experience (eg, satisfaction, acceptability, and usability) were also included when reported alongside primary outcomes.
Study design: RCTs. Studies were excluded if they were conference abstracts, preprints without peer review, or if the full text was unavailable. Publications that did not present original research findings, including editorials, letters, comments, trial registrations, and study protocols, were also excluded.
Table 1. Eligibility criteria (PICOS framework).
Population
Inclusion criteria: Studies about adolescents and young adults, as shown by any of the following: (1) over 50% of participants were within 15‐39 years; (2) the average age was within 15‐39 years; or (3) the study explicitly identified its population as “adolescents and young adults.”
Exclusion criteria: Studies that did not report any information about age groups.
Intervention
Inclusion criteria: 2-way interactive chatbots (1) with the aim of alleviating mental distress or promoting health behaviors; (2) operating autonomously without human assistance; (3) serving as the primary component of the intervention; and (4) with primary interaction through written dialog.
Exclusion criteria: Chatbots embedded as secondary elements in other technologies (eg, VR, robots, and virtual avatars); studies focused solely on development or rationale without empirical evaluation of user interaction.
Comparator
Inclusion criteria: Active controls (eg, treatment as usual); information controls (eg, e-books); passive controls (eg, waitlist, assessment-only).
Exclusion criteria: Control groups that involved another chatbot technology.
Outcome
Inclusion criteria: Primary outcomes: mental health outcomes specified in the DSM-5; health behaviors (eg, substance use, physical activity, and dietary habits). Secondary outcomes: user engagement (eg, retention rates, frequency of interactions); user experience (eg, satisfaction, acceptability, and usability).
Exclusion criteria: Studies that reported only on secondary metrics without any primary outcomes.
Study design
Inclusion criteria: RCTs.
Exclusion criteria: Conference abstracts; preprints without peer review; unavailable full text; nonoriginal research (eg, editorials, letters, trial registrations, and study protocols).
PICOS: population, intervention, comparison, outcome, study design.
VR: virtual reality.
DSM-5: Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition.
RCT: randomized controlled trial.
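The population criterion above is a three-way "any of" rule, which can be sketched as a small screening helper. This is only an illustration under stated assumptions: the `StudyPopulation` record, its field names, and the function name are hypothetical and do not come from the review, and whether the 15 and 39 year boundaries are inclusive is assumed here.

```python
from dataclasses import dataclass
from typing import Optional

AYA_MIN_AGE = 15  # lower bound of the age range used in the review
AYA_MAX_AGE = 39  # upper bound of the age range used in the review


@dataclass
class StudyPopulation:
    """Hypothetical record of what a trial reports about participant age."""
    share_in_range: Optional[float] = None  # proportion aged 15-39, if reported
    mean_age: Optional[float] = None        # average age, if reported
    labelled_aya: bool = False              # study calls its sample "adolescents and young adults"


def meets_population_criterion(p: StudyPopulation) -> bool:
    """Return True if any one of the three population conditions holds."""
    if p.labelled_aya:
        return True
    if p.share_in_range is not None and p.share_in_range > 0.5:
        return True
    # Inclusive bounds are an assumption; the text only says "within this range".
    if p.mean_age is not None and AYA_MIN_AGE <= p.mean_age <= AYA_MAX_AGE:
        return True
    # No age information at all falls through to exclusion.
    return False
```

A study that reports no age information satisfies none of the three conditions and is therefore excluded, which matches the exclusion rule for the population category.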
Data Extraction
We developed a comprehensive data extraction form in Microsoft Excel. The following data were extracted from all included studies: publication details (title, author, and year), study details (study design, region, and recruitment setting), participant characteristics (sample type, sample size, and demographics), chatbot intervention characteristics (name, duration, therapeutic approach, session, and safety measures), and chatbot design features (deployment, delivery platform, dialog system methods, AI technique, and interaction mode). For quantitative analysis, we extracted outcomes and their measures related to the targeted conditions, including mental distress (eg, depressive, anxiety, and psychosomatic symptoms) and health-related behaviors (eg, physical activity, dietary habits, and substance use). We also extracted and narratively synthesized data related to user engagement (eg, frequency of interactions, number of engaged sessions, and active days) and experience (eg, open-ended feedback, satisfaction, and perceived usability) with chatbots. Data extraction was performed by one reviewer and cross-checked by a second reviewer. Any disagreements were resolved through consensus with the involvement of a third reviewer.
Statistical Analysis
A comprehensive narrative synthesis was conducted to systematically summarize study characteristics, chatbot design features, user engagement metrics, and qualitative findings regarding user experience. This approach involved extracting and thematically analyzing relevant data from included studies to identify patterns, barriers, and facilitators of effective chatbot implementation. To assess the effectiveness of chatbot interventions, we conducted a meta-analysis of RCTs in which participants were randomly assigned to an experimental group receiving a target chatbot intervention or to a control group. We conducted meta-analyses for overall mental distress and for specific symptoms reported by at least 3 trials, including depression, anxiety, positive affect, negative affect, stress, and well-being. Given that the included studies spanned a wide range of health-related behaviors, we estimated pooled effect sizes for an overall behavioral health outcome, including sleep-related safety behaviors, stress management, mindfulness, cigarette abstinence, and pain coping. Additionally, general outcomes related to psychological and physical health, such as life satisfaction and self-efficacy, were analyzed.
The analyses were conducted using Review Manager (RevMan; The Cochrane Collaboration) 5.4 and Stata MP 18 (StataCorp LLC). The standardized mean difference (SMD) with a 95% CI was used as the effect size for continuous outcomes because different measurement tools were used for the same outcomes across trials. To combine outcomes reported in continuous and categorical formats, odds ratios were transformed into SMDs. Heterogeneity among studies was assessed using the I² statistic and the Cochran Q statistic. A random-effects model was used to account for moderate to high heterogeneity across studies. We calculated SMDs using postintervention outcome data that provided means and SDs. When both intention-to-treat and completer analyses were reported, the former was prioritized for analysis. For studies with multiarm designs that included multiple experimental or control groups, we combined the means and SDs from the different arms to create a single pair-wise comparison, as suggested by the Cochrane guidelines for integrating multiple groups from a single study. If a study did not report sufficient data (mean, SD, SE, 95% CI, and sample size) to calculate the SMD, we contacted the corresponding authors for the missing data; studies still lacking the necessary data were excluded from the meta-analysis. For sensitivity analysis, we used a “leave-one-out” method to identify influential studies and assess the robustness of estimates.
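The computations named in this paragraph can be sketched in a few functions. This is a minimal illustration, not the RevMan or Stata implementation: it assumes the Hedges small-sample correction for the SMD, the logit-based conversion ln(OR) × √3/π for odds ratios, the Cochrane Handbook formula for merging two arms, and the DerSimonian-Laird estimator for the between-study variance. The review itself does not state which estimator or correction was used, and all function names are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected SMD and its variance for two independent groups."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    return j * d, j**2 * var_d


def odds_ratio_to_smd(odds_ratio, var_log_or):
    """Re-express an odds ratio (and the variance of its log) as an SMD."""
    factor = math.sqrt(3) / math.pi
    return math.log(odds_ratio) * factor, var_log_or * factor**2


def combine_arms(n1, m1, sd1, n2, m2, sd2):
    """Merge two arms of one trial into a single group (n, mean, SD)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    ss = (n1 - 1) * sd1**2 + (n2 - 1) * sd2**2 + n1 * n2 / n * (m1 - m2) ** 2
    return n, mean, math.sqrt(ss / (n - 1))


def random_effects(effects, variances):
    """DerSimonian-Laird pooled estimate with Cochran Q and I-squared."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    tau2 = 0.0
    if k > 1:
        c = sum(w) - sum(wi**2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {
        "smd": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),  # normal-approximation 95% CI
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


def leave_one_out(effects, variances):
    """Pooled SMD after dropping each study in turn."""
    return [
        random_effects(effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:])["smd"]
        for i in range(len(effects))
    ]


if __name__ == "__main__":
    # Made-up numbers for illustration only; these are not data from the review.
    trials = [
        hedges_g(10.2, 4.1, 60, 12.0, 4.3, 58),
        hedges_g(8.9, 3.8, 45, 9.8, 4.0, 47),
        odds_ratio_to_smd(0.70, 0.09),
    ]
    ys = [g for g, _ in trials]
    vs = [v for _, v in trials]
    print(random_effects(ys, vs))
    print(leave_one_out(ys, vs))
```

A narrow spread in the `leave_one_out` output indicates that no single trial drives the pooled estimate, which is how the sensitivity analysis is read in the Results.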
We conducted a series of subgroup analyses on the primary outcomes to explore potential moderators. Informed by prior research, we examined 3 study characteristics (ie, control group types, intervention duration, and target sample) and 4 chatbot features (ie, dialog system methods, reminders, interaction mode, and deployment formats) as potential moderators of intervention effects. Specifically, we explored 3 types of control group (ie, active, information, and passive controls), considering that differences in the nature of participant engagement could influence observed effect sizes; intervention duration was examined because it may affect the sustainability of chatbot effects; and the target sample (ie, clinical, subclinical, and nonclinical) was included to account for baseline differences in health status that could moderate intervention outcomes. In addition, 3 primary dialog system methods for input processing and response generation were examined: rule-based, retrieval-based, and generative models. Rule-based chatbots operate on a predefined set of rules, producing predictable responses that are inherently limited in scope. Retrieval-based chatbots select responses from a predefined database of possible answers, enabling some level of contextual understanding while remaining constrained by the availability of their resources. Generative chatbots learn patterns from large datasets and create new, dynamic content, offering greater flexibility to handle diverse and complex conversations. Further, we classified chatbots as those with reminders or those without. Chatbot reminders can serve various functions, including login prompts, system greetings, and mood tracking notifications. For interaction modes, we differentiated between chatbots delivering text-only interactions and those incorporating multimedia materials, such as videos or images. Finally, for deployment, we categorized chatbots as either standalone apps or web-based tools, with the latter being integrated into instant messengers or accessed via websites. Additionally, meta-regression analyses were conducted for continuous variables (ie, gender) when at least 10 observations were available. Funnel plots and the Egger test were used to explore publication bias for meta-analyses that involved more than 10 studies. P<.05 was considered statistically significant.
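A test for subgroup differences of the kind described here can be sketched as follows. The sketch assumes each subgroup has already been pooled to an estimate and SE, and it uses the standard between-subgroup Q statistic referred to a chi-square distribution; the review does not specify the exact procedure, and the function name is hypothetical. To stay within the standard library, the P value is computed only for 2 or 3 subgroups, where the chi-square tail has a closed form.

```python
import math


def subgroup_difference_test(estimates, std_errors):
    """Between-subgroup Q test on already-pooled subgroup estimates.

    Returns (q_between, df, p_value). The P value uses closed-form
    chi-square tails and is None when df is greater than 2.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need matching estimates and SEs for at least 2 subgroups")
    weights = [1 / se**2 for se in std_errors]
    mean = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q_between = sum(w * (y - mean) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if df == 1:
        p_value = math.erfc(math.sqrt(q_between / 2))  # chi-square tail, 1 df
    elif df == 2:
        p_value = math.exp(-q_between / 2)             # chi-square tail, 2 df
    else:
        p_value = None  # use a statistics library for higher df
    return q_between, df, p_value
```

For example, comparing rule-based, retrieval-based, and generative chatbots gives 3 subgroups and therefore 2 degrees of freedom; a P value below .05 would be read as evidence that the dialog system method moderates the effect.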
Quality and Risk of Bias
The Cochrane risk of bias tool (RoB 2) was used to assess the risk of bias in the included RCTs. This assessment tool evaluates 5 domains of potential bias: randomization process, deviations from the intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result. For each domain, a trial can be categorized as having a low risk, some concerns, or a high risk of bias. For the overall risk-of-bias judgment, a trial was deemed to have a low risk of bias only if all domains were rated as low risk. Conversely, a trial was judged to have a high risk of bias if it was rated as high risk in any domain. We used GRADEpro GDT software (Evidence Prime, Inc) to evaluate the quality of evidence from meta-analyses, which could be downgraded based on 5 key factors: risk of bias, inconsistency, indirectness, imprecision, and publication bias.
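The overall judgment rule stated above can be written as a short function. This is a sketch of the rule as described in the text only: the "some concerns" fallback for trials that are neither all-low nor high in any domain is an assumption consistent with how the tool is commonly applied, and the judgment-based upgrade to high risk for multiple "some concerns" ratings is deliberately not encoded. The names `Risk`, `ROB2_DOMAINS`, and `overall_risk` are hypothetical.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    SOME_CONCERNS = "some concerns"
    HIGH = "high"


# The 5 domains listed in the text.
ROB2_DOMAINS = (
    "randomization process",
    "deviations from the intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)


def overall_risk(domain_ratings):
    """Derive an overall rating from a dict mapping each domain to a Risk."""
    missing = [d for d in ROB2_DOMAINS if d not in domain_ratings]
    if missing:
        raise ValueError(f"unrated domains: {missing}")
    ratings = [domain_ratings[d] for d in ROB2_DOMAINS]
    if any(r is Risk.HIGH for r in ratings):
        return Risk.HIGH           # high in any domain gives high overall
    if all(r is Risk.LOW for r in ratings):
        return Risk.LOW            # low only when every domain is low
    return Risk.SOME_CONCERNS      # assumed fallback for the remaining cases
```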
Results
Search Results
Searches of 8 databases identified 2495 unique citations (Figure 1). After removing duplicates, we excluded 1113 records based on title and abstract screening, leaving 69 records for full-text review. We additionally included 3 eligible trials identified through reference lists of previous reviews and original studies. A total of 31 studies met the inclusion criteria and were included in the systematic review for narrative synthesis. Among the 31 studies, 5 randomized trials did not report sufficient data for calculating the pooled effect size; thus, 26 randomized trials were included in the meta-analysis.
Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart. RCT: randomized controlled trial.
Results of Systematic Review
A total of 29,637 participants from 18 countries and regions were involved in the 31 studies, recruited from clinical settings (n=4), community (n=10), online (n=10), and mixed settings (n=7). The majority (n=19) had sample sizes under 200 adolescents and young adults. Most were single-site studies, with 10 conducted in the United States, 5 in China, and only one multisite study conducted in Switzerland, Germany, and Austria. Among the 31 studies, 12 involved nonclinical populations, 11 included participants with health problems identified via self-report or screening (eg, anxiety, depression, or substance use), and 8 involved clinical samples with diagnosed mental or physical health issues. Eighteen studies explicitly stated a research focus on adolescents and young adults, one of which focused on young cancer survivors, and 4 studies exclusively supported women in specific circumstances, such as intimate partner violence, pregnancy, and childbirth. Intervention duration varied considerably, from several minutes to 4 months, with 15 studies conducting additional follow-up surveys from 2 weeks to 6 months. Table S2 presents the characteristics of the studies included in this review.
We extracted data on the characteristics of the chatbot interventions and their technical design features (Table S3). These chatbots were most commonly designed to improve depressive and anxiety symptoms, which were assessed in 20 and 19 studies, respectively, followed by 7 studies targeting stress management. Specifically, several studies delivered psychotherapy or behavior support for people who experienced substance use and addiction (n=4), self-ambivalence and appearance distress (n=3), attention-deficit/hyperactivity disorder (ADHD) (n=2), sleep disorder (n=2), relationship and social activity problems (n=2), and eating disorder (n=1). Cognitive behavioral therapy was the most common therapeutic approach (n=21), followed by mindfulness-based therapy (n=9), motivational interviewing (MI) (n=5), stress coping (n=4), acceptance and commitment therapy (n=3), interpersonal psychotherapy (n=3), dialectical behavior therapy (n=3), positive psychology (n=2), and emotion-focused therapy (n=2). In addition to the core treatment, other notable design features included empathic responses, customization, mood tracking, reflection, accountability, goal-setting, mascot or static avatars, gamified interaction, and problem-solving. Seven studies were tailored to address key challenges unique to adolescents and young adults, such as academic work management, life transitions, relationships, body image concerns, and self-esteem issues, which are particularly salient during this developmental stage.
Regarding the design characteristics of chatbots, instant messenger platforms (ie, Facebook [Meta Platforms] and WeChat [Tencent Holdings Limited]) and standalone smartphone apps emerged as the most popular platforms for delivering chatbot services, featured in 15 and 13 studies, respectively. The remaining 3 studies deployed the chatbots on websites. Most of the chatbots provided periodic pop-up notifications to remind users to interact with them (n=22). Twenty-one studies supplemented text-based dialog with auditory or visual content. Eighteen studies incorporated safety measures in chatbots, such as access to human professionals, a crisis hotline, suicidal ideation monitoring, and referral to local resources. The majority of chatbots (n=18) used a rule-based approach to interact with users, while 10 studies used a retrieval-based system. Only 3 studies explored generative approaches for chatbot development, using Bidirectional Encoder Representations from Transformers (BERT) and GPT to create real-time responses, and one study used GPT-3.5 to refine the chatbot following its pilot testing phase. In terms of AI techniques, NLP was the most commonly used (n=12), serving to analyze user intent and context and to facilitate the selection of appropriate responses. Additionally, some reports integrated other methodologies, including machine learning (n=7), natural language understanding (n=5), and deep learning (n=3), to enhance the chatbots’ learning capacity and contextual comprehension.
Usage data and user engagement with chatbots were tracked in 23 studies through various metrics, including the frequency of interactions or exchanged messages (n=11), the number of engaged sessions or completion rates (n=9), the length of conversations (n=7), the number of active days (n=6), the number of check-ins (n=3), and the time period for peak use (n=1). More than half of the studies (n=17) reported attrition above 20% in the intervention group. Two studies analyzed changes in user engagement over time. Additionally, 24 studies explored user experiences, using metrics such as satisfaction (n=8), helpfulness (n=5), working alliance (n=5), and acceptability (n=4). Open-ended user feedback was documented in 14 studies, providing valuable insights into both the strengths and limitations of chatbot interactions. On the positive side, chatbots were frequently praised as effective tools for promoting understanding and awareness of health topics through structured exercises and detailed explanations (n=6). Users valued chatbots for their empathy, emotional support, and ability to foster a sense of being heard (n=6). Personalization and ease of access were commonly highlighted (n=4), with chatbots regarded as a convenient alternative to traditional therapy. Features such as reminders, weekly summaries, and visually engaging elements like emojis, avatars, and interactive interfaces enhanced the user experience, contributing to adherence and helping users stay on track with their health goals (n=3). However, notable challenges were also identified, with repetitive and rigid interactions emerging as a major concern (n=10). Users expressed frustration over the inability of chatbots to handle open-ended or unexpected responses (n=6), and some conversations were criticized for being overly general or lacking depth and clarity (n=5). Technical issues, such as glitches, looping conversations, and slow operations, were frequently reported (n=7), disrupting the interaction flow and significantly diminishing overall usability.
Of the 31 studies, only one reported mediators between chatbot interventions and outcomes: visceral anxiety, catastrophic thinking, and fear of food were significant mediators between chatbot use and both gastrointestinal symptom severity (P<.001) and quality of life (P<.001). For moderators, one study revealed significant interaction effects of group by ethnicity and by writing behaviors for social activity, stress, and life satisfaction. Two studies noted that people with more severe baseline physical and mental health symptoms experienced more pronounced benefits from chatbots. Four studies probed the moderating role of user engagement. Specifically, the frequency of interaction with the chatbot was positively correlated with the reduction in ADHD symptoms (P=.03) and loneliness (P<.006). The dosage, measured as engaged sessions, was correlated with improvement in anxiety (P=.06), depression (P=.08), and quality of life (P=.07). Another study revealed that the reported commitment to change behavior significantly increased with time (P<.001), suggesting higher commitment toward the end of the intervention than in the middle or at the start.
Results of Meta-Analysis
Overall Mental Distress
A total of 21 studies, comprising 2813 participants in the experimental groups and 3116 in the control groups, were included in the meta-analysis for overall mental distress. Among these, indicators for anxiety (n=18) [,,,,-,-,,,-] and depression (n=17) [,,,,-,-,,-] were most commonly examined, and the remaining assessments included somatic symptoms (n=3) [,,], sleep disorders (n=2) [,], ADHD (n=2) [,], substance use disorders (n=2) [,], and eating disorders (n=1) []. Compared to control conditions, participants interacting with chatbots exhibited significantly greater reductions in overall mental distress, with an effect size of SMD −0.35 (95% CI −0.46 to −0.24; P<.001) (Figure 2). The “leave-one-out” sensitivity analysis demonstrated the robustness of the findings, with estimated effect sizes ranging from −0.30 to −0.36 (Figure S11 in ). The results of the funnel plot and Egger test revealed potential publication bias (P=.01), while no additional studies were imputed with the Trim-and-Fill approach and the adjusted effect size (SMD −0.372, 95% CI −0.529 to −0.216) was identical to the observed value, suggesting a negligible impact on the conclusions. The subgroup analyses revealed 4 significant moderators. Studies that targeted subclinical and clinical samples produced larger effect sizes than those for nonclinical populations (P=.003). Chatbots deployed as standalone apps were significantly more effective than those delivered via instant messenger or websites (P=.03). Among different chatbot architectures, generative chatbots demonstrated the largest effect size, followed by retrieval-based and rule-based systems (P=.007). Interestingly, studies comparing chatbots to active control did not show significant group differences, and their pooled effect was significantly lower than those comparing chatbots to information and passive controls (P=.02). The detailed results of subgroup analysis are presented in Table S4 in .
Figure 2. Forest plot for the effects of chatbots on overall mental distress. [,,,,,-,-,,-]
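The pooled estimate and the leave-one-out range reported above can be illustrated with a minimal sketch. It assumes a DerSimonian-Laird random-effects model, which this section does not name explicitly, and it uses made-up study-level SMDs and variances rather than the review's data; the function names are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Pool study-level SMDs with DerSimonian-Laird random-effects weights.

    Returns (pooled SMD, lower 95% CI bound, upper 95% CI bound).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


def leave_one_out(effects, variances):
    """Re-pool the data k times, omitting one study each time."""
    pooled_values = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        pooled_values.append(pool_random_effects(rest_effects, rest_variances)[0])
    return pooled_values


# Hypothetical inputs for illustration only (not the included trials)
smds = [-0.50, -0.20, -0.35, -0.10, -0.45]
variances = [0.04, 0.02, 0.03, 0.05, 0.025]

pooled, lower, upper = pool_random_effects(smds, variances)
loo = leave_one_out(smds, variances)
print(f"pooled SMD {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"leave-one-out range {min(loo):.2f} to {max(loo):.2f}")
```

A narrow leave-one-out range that stays on one side of zero is what the text describes as a robust finding.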
Depression
The pooled effect size for the 17 postintervention comparisons between chatbots and various control conditions on depression was SMD −0.43 (95% CI −0.62 to −0.23; P<.001), with high heterogeneity (P<.001; I²=81%) (Figure S1 in ). The sensitivity analysis demonstrated the robustness of the findings, with estimated effect sizes ranging from −0.34 to −0.47 (Figure S11 in ). The results of the funnel plot and Egger test revealed potential publication bias (P=.02), while no additional studies were imputed with the Trim-and-Fill approach and the adjusted effect size (SMD −0.44, 95% CI −0.66 to −0.21) was identical to the observed value, suggesting a negligible impact on the conclusions. Subgroup analyses revealed a significant difference between dialog system methods (P=.03). Specifically, retrieval-based chatbots demonstrated the strongest and most reliable effect, followed by rule-based chatbots with a smaller but significant effect (P<.001). Generative chatbots, while showing a potentially large effect, exhibited a wide CI and failed to reach statistical significance (Table S4 in ).
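The heterogeneity statistic quoted for each outcome can be sketched as follows. The example computes Cochran Q and the I² percentage from study-level effects and variances; the inputs are hypothetical and the function name is an assumption for illustration, not the authors' code.

```python
def heterogeneity(effects, variances):
    """Return (Cochran Q, I-squared as a percentage) for study-level effects."""
    w = [1.0 / v for v in variances]  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    if q <= 0:
        return q, 0.0
    # I-squared: share of total variation beyond what chance alone would give
    i2 = max(0.0, (q - df) / q) * 100.0
    return q, i2


# Hypothetical, widely dispersed effects with equal precision
q, i2 = heterogeneity([-0.9, -0.1, -0.5], [0.01, 0.01, 0.01])
print(f"Q={q:.1f}, I2={i2:.1f}%")
```

Because I² is truncated at zero, a value of 0% (as reported for stress later in this section) means only that Q did not exceed its degrees of freedom, not that the studies were identical.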
Anxiety
A total of 18 studies were included for the effects on anxiety [,,,,-,-,,,-]. Compared to the control groups, participants interacting with chatbots exhibited a significantly greater reduction in anxiety, with an effect size of SMD −0.37 (95% CI −0.58 to −0.17; P<.001) (Figure S2 in ). Heterogeneity was considerably high across the included trials (P<.001; I²=87%). The sensitivity analysis revealed a stable pooled effect size, ranging from −0.35 to −0.41 and remaining statistically significant when an influential study was excluded [] (Figure S11 in ). There was no significant publication bias, as supported by the funnel plot and Egger test (P=.18). The subgroup analyses highlighted significant differences in chatbot effectiveness between deployment formats (P=.05). Specifically, standalone chatbots produced larger between-group effects on anxiety than those delivered via instant messenger or website (Table S4 in ).
Positive Affect
No statistically significant effect of chatbot interventions on positive affect was observed compared to controls (SMD 0.03, 95% CI −0.15 to 0.21; P=.73), with substantial heterogeneity across 11 studies (P=.002; I²=63%) (Figure S3 in ). The pooled effect sizes remained relatively stable, with confidence intervals consistently crossing the null value after sequentially omitting each study (Figure S11 in ). The funnel plot showed a symmetrical pattern with data points scattered evenly around the pooled effect size, suggesting the absence of marked small-study effects, which was further confirmed by the Egger test (P=.55).
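The Egger test referred to throughout these results can be sketched as an ordinary least squares regression of each study's standardized effect (effect divided by its standard error) on its precision (1 divided by the standard error). An intercept far from zero suggests funnel plot asymmetry. The sketch below uses hypothetical data and a hypothetical function name, and it returns the t statistic rather than a P value, which would additionally need a t distribution with k−2 degrees of freedom.

```python
import math


def egger_regression(effects, std_errors):
    """Return (intercept, slope, t statistic of the intercept) for Egger's regression."""
    k = len(effects)
    if k < 3:
        raise ValueError("Egger regression needs at least 3 studies")
    x = [1.0 / s for s in std_errors]                  # precision
    y = [e / s for e, s in zip(effects, std_errors)]   # standardized effect
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with k-2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (k - 2)
    se_intercept = math.sqrt(s2 * (1.0 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, slope, t_stat


# Hypothetical data in which smaller studies (larger SE) show larger effects
ses = [0.10, 0.20, 0.30, 0.40]
effects = [-0.2 - 1.0 * s for s in ses]
intercept, slope, t_stat = egger_regression(effects, ses)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

In this constructed example the intercept recovers the built-in small-study bias and the slope recovers the underlying effect; with real data both are estimated with noise.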
Negative Affect
A small but statistically significant decrease in negative affect among participants who used chatbots compared to controls (SMD −0.27, 95% CI −0.53 to −0.01; P=.04) was observed among 11 studies (Figure S4 in ). All effect sizes estimated in the sensitivity analysis fell within the 95% CI, ranging from −0.26 to −0.31 (Figure S11 in ). The heterogeneity significantly decreased from an I² value of 83% (P<.001) to 0% (P=.84) when we excluded the study by Romanovskyi et al [], though the overall effect remained significant. The funnel plot was visually symmetrical, and the Egger test for small-study effects did not detect significant publication bias (P=.39).
Stress
Participants engaging with chatbots demonstrated a significantly greater reduction in stress compared to various control conditions, with a moderate effect size (SMD −0.41, 95% CI −0.50 to −0.31; P<.001) (Figure S5 in ). No heterogeneity (I²=0%; P=.54) was observed across 6 included studies, indicating that the effects of chatbots on stress were consistent and generalizable across studies with differing characteristics. The sensitivity analysis further confirmed the robustness of the findings, with estimated effect sizes ranging from −0.40 to −0.56 (Figure S11 in ). Specifically, when we excluded the study by Haug et al [], a slightly larger effect size estimate (SMD −0.56, 95% CI −0.76 to −0.36) was observed. This deviation may be attributed to the inappropriate use of a single-item measure for stress symptoms and a considerably larger sample size compared to other trials. Nevertheless, the overall effect remained statistically significant even when the influential study was excluded.
Psychosomatic Symptoms
Five studies assessed psychosomatic symptoms influenced by chatbot interventions, resulting in a significantly larger reduction in various symptoms compared to control groups (SMD −0.48, 95% CI −0.82 to −0.14; P=.006) (Figure S6 in ). The sensitivity analysis indicated the robustness of the findings, with estimated effect sizes ranging from −0.36 to −0.49 (Figure S11 in ). The heterogeneity among included studies was considerable (P=.002; I²=76%), but significantly decreased (P=.20; I²=35%) after we excluded the study by Sabour et al [], while the overall effect retained the same direction and significance. Subgroup analyses revealed three significant moderators. Specifically, studies that targeted clinical samples showed a greater decrease in psychosomatic symptoms than those focusing on subclinical and nonclinical samples (P=.008). Chatbots deployed as standalone apps yielded significantly greater effects than web-based platforms (P=.002). Additionally, retrieval-based systems showed the largest effects, outperforming both generative and rule-based chatbots (P=.001) (Table S4 in ). However, these results should be interpreted with caution due to the limited number of studies available for each subgroup.
Self-Ambivalence and Appearance Distress
Four distinct measures targeting negative self-relevant thoughts and body image were included to evaluate the influence of various interventions on self-ambivalence and appearance distress in this analysis. A significant positive effect favoring chatbots was observed compared to passive control groups (SMD −0.25, 95% CI −0.34 to −0.17; P<.001), with moderate heterogeneity across studies (P=.19; I²=38%) (Figure S7 in ). The pooled estimates remained statistically significant, with the overall effect size ranging from −0.20 to −0.31 and within comparable confidence intervals (Figure S11 in ).
Life Satisfaction and Well-Being
Ten relevant outcomes from 7 separate trials were meta-analyzed for overall life satisfaction and well-being. A significantly greater improvement was observed for participants in the chatbot groups than for those in controls (SMD 0.12, 95% CI 0.03-0.21; P=.01), with moderate heterogeneity detected across 7 trials (P=.06; I²=44%) (Figure S8 in ). The sensitivity analysis suggested the robustness of the findings, with the overall effect sizes ranging from 0.07 to 0.13 (Figure S11 in ). However, when we excluded two influential studies [,], the 95% CI crossed the null value, while the direction remained the same. The absence of publication bias was evidenced by the funnel plot and Egger test (P=.76). Subgroup analyses revealed a significant difference in effects between dialog systems (P=.04) (Table S4 in ). Moreover, meta-regression analysis revealed statistical effects of gender (P=.02) on the pooled effect size (Figure S12 in ).
Self-Efficacy
Six trials were included in the meta-analysis to evaluate the pooled effect of chatbot interventions on self-efficacy outcomes. The pooled estimate showed a positive trend favoring the experimental group, but the difference was not statistically significant (SMD 0.14, 95% CI −0.14 to 0.41; P=.33) (Figure S9 in ). Considerably high heterogeneity was observed across the included studies (P<.01; I²=86%), which may be attributed to differences in specific measurement targets, encompassing general self-efficacy, self-efficacy in addressing body image concerns, and confidence in self-management for health and well-being. The results of the sensitivity analysis showed that the overall effect remained stable, with SMD estimates ranging from 0.10 to 0.26, and the pooled effect remaining statistically nonsignificant when individual studies were excluded (Figure S11 in ).
Health Behavior Change
Nine health behavior outcomes from 6 separate trials were included in the meta-analysis, revealing a statistically significant effect in favor of chatbot interventions (SMD 0.11, 95% CI 0.03-0.19; P=.006) (Figure 3). Moderate heterogeneity was observed among studies (P=.06; I²=46%), potentially attributed to the wide spectrum of health behaviors we targeted. Sensitivity analyses demonstrated the robustness of this result, with estimates ranging from 0.09 to 0.14 (Figure S12 in ). Notably, the omission of 2 specific outcomes [,] resulted in a slight increase in the combined effect size and significantly decreased the heterogeneity. The symmetric funnel plot and Egger test (P=.43) indicated a low likelihood of publication bias. Studies designed with active controls produced smaller between-group effects than those compared to a passive control group (P=.02). Additionally, chatbots that sent check-in reminders produced more positive effects on changing behaviors than those that did not (P=.02) (Table S4 in ).
Figure 3. Forest plot for the effects of chatbots on health behavior change. [,,,,,]
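As a rough consistency check on a reported pooled estimate, the standard error, z statistic, and two-sided P value can be back-calculated from the SMD and its 95% CI. The sketch below applies this to the health behavior change estimate reported above (SMD 0.11, 95% CI 0.03 to 0.19). It assumes a normal approximation and symmetric Wald-type interval, and because the published figures are rounded, the result should only be expected to land near the reported P value, not match it exactly. The function name is hypothetical.

```python
import math


def z_and_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back-calculate SE, z, and two-sided normal P value from a 95% CI."""
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    # Two-sided P value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


se, z, p = z_and_p_from_ci(0.11, 0.03, 0.19)
print(f"SE={se:.3f}, z={z:.2f}, P={p:.3f}")
```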
Quality and Risk of Bias
The interrater reliability, as measured by Cohen kappa, ranged from 0.471 to 0.523 across 5 domains of the Cochrane ROB 2 tool, indicating moderate agreement between the raters. For any discrepancies identified between raters, discussions were held to achieve consensus; if consensus could not be reached, a third reviewer was consulted to make the final decision. The overall risk of bias was rated as high for 25 studies (Figure S13 in ). The majority of studies (26/31) demonstrated appropriate randomization procedures and were rated as low risk in the domain of randomization process. However, 5 studies raised concerns due to insufficient reporting on the random allocation approach or observed imbalances in baseline characteristics between groups. For the domain of deviation from the intended interventions, no studies exhibited significant deviations, though neither participants nor those delivering the interventions could be blinded due to the nature of the intervention. Nineteen studies adhered to the intention-to-treat (ITT) principle. However, 8 studies were judged to raise some concerns in this domain due to the absence of appropriate analyses to estimate the effect of assignment to the intervention. Additionally, 7 studies were rated as high risk because a substantial proportion of participants were excluded from the analyses, which could have significantly impacted the validity of the results. Twelve studies were judged to have a low risk in the domain of missing outcome data, while 14 were rated as high risk due to imbalanced dropout rates between groups and a lack of evidence that appropriate methods were used to address the potential bias introduced by high attrition. 
The primary reason for the notable source of bias arising from the measurement of the outcome was the reliance on self-reported outcomes as the preferred method in most trials, where 16 studies were rated as high risk because self-reported measures are inherently prone to biases, and the strong level of belief in the beneficial effects of the intervention could influence outcome assessments. In the selection of the reported result domain, 12 studies raised some concerns due to the unavailability of their protocols or trial registrations, or minor discrepancies between the planned and reported outcome measurements. Furthermore, 2 studies were judged to have a high risk as their reported results were likely selected from multiple eligible measures or analyses, raising concerns about selective reporting. The quality of evidence, evaluated using the GRADE approach, was rated as very low to low, possibly due to the overall high risk of bias or substantial heterogeneity across the majority of studies (Table S5 in ).
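The Cohen kappa values reported for the ROB 2 domains compare the observed agreement between two raters with the agreement expected by chance. A minimal sketch, using hypothetical ratings and a hypothetical function name, is:

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per label
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical risk-of-bias judgments for 8 studies in one domain
rater_1 = ["low", "low", "high", "some", "high", "low", "high", "some"]
rater_2 = ["low", "some", "high", "some", "low", "low", "high", "high"]
print(f"kappa={cohen_kappa(rater_1, rater_2):.3f}")
```

Under the commonly used Landis and Koch benchmarks, values between 0.41 and 0.60 are read as moderate agreement, which matches the interpretation given in the text.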
Discussion
Principal Findings
In this systematic review and meta-analysis, we synthesized evidence on the effectiveness of chatbots for adolescents and young adults and found overall significant positive effects in alleviating mental distress and promoting health behavior change. The most pronounced effects were observed in studies that compared chatbot interventions to information controls, used standalone mobile apps for deployment, used generative or retrieval-based chatbots, or targeted individuals in subclinical and clinical groups. Additionally, chatbots with reminders that encourage users to engage in interactions were more effective in promoting behavior change. Moreover, user engagement was a significant moderator influencing chatbot effectiveness, while repetitiveness and inflexibility of content emerged as the most common barriers to sustained chatbot adherence. Despite the proposed advantages of chatbots as accessible, cost-effective treatment alternatives, none of the studies included in this review conducted cost-effectiveness analyses or focused on low-resource settings.
Across the included studies, chatbots consistently demonstrated small-to-moderate effects in reducing symptoms of depression, anxiety, negative affect, stress, and psychosomatic problems among adolescents and young adults. These findings reinforce prior evidence, underscoring the promise of chatbots as scalable and accessible tools to address specific mental health challenges in this population []. Notably, retrieval-based chatbots demonstrated a consistent moderate effect in reducing depressive and psychosomatic symptoms, suggesting that the structured and evidence-based design may offer a more reliable and effective approach to delivering mental health support. In contrast, the comparatively modest effects observed with rule-based chatbots may stem from their inherent limitations in flexibility and reliance on predefined scripts. While rule-based systems can be effective in specific scenarios, their rigid architecture often restricts their ability to adapt to the diverse and dynamic needs of individuals with mental health problems. Generative chatbots, despite showing the strongest effects for overall mental distress, did not demonstrate consistent effects for specific mental health problems, which may be attributed to the limited available evidence. This uncertainty highlights the need for further research to better understand the potential and the limitations of generative chatbots applied in this context. Additionally, our analysis indicated that chatbots were more effective for psychosomatic symptoms in clinical populations compared to nonclinical groups, which aligns with the notable trend across studies that individuals with more severe baseline symptoms tended to derive greater benefits from interventions [,]. Moreover, the larger effect size observed for standalone chatbots in alleviating anxiety, compared to web-based ones, indicates that the deployment format may play a crucial role in influencing the effectiveness of chatbots. 
This may be attributed to the personalized and engaging design of the independent system, allowing for a more focused therapeutic engagement with less interruption, as opposed to chatbots integrated into instant messenger apps or websites that may cause more distractions. In addition, our review is among the first to provide evidence supporting the effectiveness of chatbots in reducing self-ambivalence and appearance distress. While the effect size was modest, this finding is particularly relevant for adolescents and young adults, who frequently grapple with issues related to identity, self-esteem, and body image. This highlights the potential of chatbots to address sensitive and deeply personal concerns that individuals may find difficult or shameful to discuss with human professionals. The ability of chatbots to offer a nonjudgmental and accessible platform for support is crucial in this context. However, this synthesized result was derived from four different measures, so further research, including subgroup analyses, is needed to clarify the specific contexts and conditions under which chatbots are most effective.
A significant but small effect was observed for life satisfaction and well-being, while no statistically significant improvement was noted for positive affect and self-efficacy. These findings align with the result of a previous review [], which reported limited impacts of conversational agents on fostering positive psychological well-being. This phenomenon may reflect a ceiling effect in certain populations or could be attributed to the primary focus of most therapeutic strategies, which tend to prioritize addressing mental health problems over promoting well-being, resilience, and recovery. This underscores the need for future chatbot designs that incorporate elements based on positive psychology skills, such as acknowledgment of positive events, personal strengths, and gratitude exercises. Moreover, such positive states may require longer-term or more intensive therapeutic sessions to yield measurable improvements. However, the follow-up data available for these outcomes were insufficient to test these assumptions. Furthermore, our findings revealed that studies with a higher proportion of women reported greater improvements in overall well-being. This draws new attention to the possibility that the effectiveness of chatbots may be influenced by gender-related factors, such as differences in communication styles or help-seeking behaviors, with women potentially being more inclined to seek support for mental health issues or to engage in emotional disclosure that may align more closely with the empathetic design of many chatbots []. However, it is notable that no study in our review explicitly examined gender differences in user engagement or interaction patterns with chatbots. Two studies [,] used Linguistic Inquiry and Word Count (LIWC) to analyze participants’ response transcripts. While indicating a potential relationship between word use frequency and mental well-being, these studies did not identify gender-based differences in expression characteristics. 
Further research is warranted to explore whether women exhibit stronger adherence to chatbots, or different interaction styles (ie, use of reflective language), and whether these factors serve as mechanisms for boosting therapeutic outcomes.
The effect of chatbots on health behavior change, though significant, remained relatively small, which aligns with a previous review []. Several factors may account for this observation. First, the limited statistical power resulting from the small number of trials (n=5) included may have constrained the ability to detect larger effects. The use of chatbots to encourage physical activity and healthy lifestyles among adolescents and young adults is markedly underreported, leaving considerable scope for further research to evaluate their impact on promoting sustained behavior change. Second, the reliance on self-reported measures introduces inherent biases and inaccuracies, which may compromise the validity of the observed findings. To address this issue, incorporating objective data collection methods, such as wearable devices or biological markers, could enhance the precision and reliability of outcome measurements and provide more robust evidence for behavior change. Third, differences in the theoretical underpinnings used across studies to drive behavioral change could have elicited diverse responses to chatbot interventions. However, due to the small number of original studies included, we are unable to further disentangle these nuanced effects on specific types of health behaviors. Moreover, our analysis revealed that studies using active controls reported smaller effects for chatbots compared to those using passive controls. This suggests that while chatbots may offer unique advantages, their incremental value may be less pronounced when benchmarked against well-established interventions. It is imperative for forthcoming studies to determine whether chatbot interventions yield greater benefits when integrated as complementary tools rather than used as standalone interventions. In addition, regular check-in reminders from chatbots may serve as effective cues to action, reinforcing user engagement and adherence to desired behaviors. 
Further research is warranted to explore the extent to which the frequency and timing of reminders impact their efficacy.
The diversity in chatbot evaluation methods suggests a critical gap and calls for exploratory research to develop professionally validated instruments for assessing chatbot accuracy, safety, and user experience. The notable attrition rates observed in both groups, coupled with unsatisfactory completion of chatbot sessions, underscore the pressing need to optimize future research design to enhance user engagement and facilitate a more positive experience. To this end, it is imperative to involve adolescents and young adult participants in the chatbot design process, such as surveys, interviews, and user testing, ensuring that the intervention features align with their preferences, expectations, and behavioral patterns []. Additionally, optimizing the chatbot’s performance and designing a clear, user-friendly conversational interface are crucial to ensuring a satisfying user experience that promotes sustained engagement. Moreover, generative AI systems present significant opportunities in this regard, with the potential to achieve more flexibility, deeper contextual understanding, and superior response quality, which have demonstrated remarkable user engagement globally []. Notably, generative AI chatbots can respond adaptively to unexpected user inputs, even those not previously encountered, and avoid repetitive responses to varied queries, fostering more human-like dialogs that enhance users’ sense of being understood and empathized with. Despite these advancements, the application of chatbots in the domains of psychological and physical health remains cautious. Most therapeutic chatbots currently rely on rule-based or retrieval-based designs. This limitation is primarily due to concerns about the insecurity, potential biases, and “hallucination” of AI-generated content when addressing sensitive issues, which could lead to unintended negative consequences []. 
The “black box” nature of deep learning algorithms makes it impossible to predict conversational trajectories in advance []. Retrieval-augmented generation (RAG) offers a promising solution by connecting generative models with real-time information retrieval from external knowledge bases. This approach facilitates secure incorporation of up-to-date information and sensitive data while reducing the likelihood of hallucination and improving the accuracy through context grounding []. Graph-based RAG (GraphRAG) demonstrates significant potential for extracting holistic insights from lengthy documents by structuring RAG data into graphs. This enhances the capabilities of large language models to produce evidence-based medical responses, thereby increasing safety and reliability when managing private medical data []. Given the unique risks faced by adolescents and young adults, such as disclosure of self-harm intent to chatbots, or the reinforcement of harmful thought patterns by algorithms, it is crucial that research efforts should prioritize the establishment of clear safety protocols and robust evaluation frameworks to ensure their ethical and responsible deployment [].
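The retrieval step that RAG places in front of a generative model can be sketched in a few lines. The example below is a toy illustration only: it ranks passages by simple word overlap instead of the embedding-based search a real system would typically use, it stops at building the grounded prompt without calling any language model, and the passages, function names, and prompt wording are all hypothetical placeholders.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, passages, top_k=2):
    """Rank passages by word overlap with the query (a stand-in for vector search)."""
    query_tokens = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(query_tokens & tokenize(p)), reverse=True)
    return ranked[:top_k]


def build_grounded_prompt(query, passages, top_k=2):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, top_k))
    return (
        "Answer using only the context below. If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Placeholder knowledge base entries (not clinical guidance)
knowledge_base = [
    "Placeholder note: a regular sleep schedule.",
    "Placeholder note: breathing exercises for stress.",
    "Placeholder note: physical activity and mood.",
]

print(build_grounded_prompt("sleep schedule advice", knowledge_base, top_k=1))
```

Constraining the model to retrieved, vetted passages is what the text means by context grounding; the safety of the output still depends on the quality of the knowledge base and on separate safeguards for high-risk disclosures.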
Limitations
While our findings break new ground in exploring the influence of chatbot dynamics on holistic psychosocial well-being, specifically within adolescents and young adult populations, the conclusions are somewhat constrained by several limitations. First, the inclusion of studies with populations that were not exclusively adolescents and young adults but had a mean age within an eligible age range, though necessary to ensure comprehensive coverage of relevant evidence, may have introduced potential variability in contextual factors that may compromise the findings. Second, although the incorporation of diverse participant demographics enhances the ecological validity of the results, the lack of strict clinical thresholds for mental distress at baseline in some studies may dilute the observed intervention effects for clinically significant cohorts. Third, while examining a broad array of outcomes provides valuable insights into the potential of chatbots in health care, the variation in measurement instruments across studies for the same outcomes, as well as the combination of different health behaviors into a single aggregated outcome, may introduce substantial heterogeneity and obscure important distinctions between specific behaviors. Furthermore, due to the limited number of studies with follow-up data on the same outcomes and the wide variability in follow-up durations, it was not feasible to conduct a meta-analysis assessing sustained impacts. Crucially, the majority of included studies were assessed as having a high risk of bias, which may result in misestimation of effect sizes. Consequently, the certainty of evidence for most outcomes was rated as very low to low, substantially restricting both the generalizability and reliability of the observed effects. 
Moreover, while the adjusted effect sizes for overall mental distress and depressive outcomes appear robust to publication bias, the potential for unpublished negative or inconclusive studies suggests that the true effect of AI chatbots may be smaller than reported. Therefore, the conclusions drawn from this review should be interpreted with considerable caution. Finally, despite the rapid proliferation of generative AI, this review underscores a critical gap in empirical research evaluating their specific impacts among adolescents and young adult populations, which also hindered our ability to provide evidence on the effects of the specific mechanisms of generative models on therapeutic outcomes. The clinical effectiveness of generative AI chatbots in mental and behavioral health remains unknown. Future studies are expected to implement large-scale, long-term interventions with rigorous designs to fully understand the benefits and advantages of chatbots integrated with generative systems.
Conclusions
This study provides evidence supporting the overall effectiveness of chatbots in alleviating mental distress and promoting positive health behaviors among adolescents and young adults. The effectiveness of chatbots varied across different target samples and control conditions, and three key design features were identified as significant moderators of chatbot efficacy: dialog system methods, deployment format, and the use of reminders. Among the dialog systems, retrieval-based chatbots demonstrated the most consistent and reliable effects, while generative AI chatbots showed potential but exhibited variability in their effectiveness. Given the growing use of generative AI, it is crucial to establish robust safety protocols and evaluation frameworks before their implementation in real-world settings. Future research should focus on validating the long-term effects and consistency of generative AI chatbots while exploring their broader applications in mental health and behavioral interventions for adolescents and young adults.
The authors would like to thank Shaowei Guan and John Law for their expert insights and guidance on the identification of key chatbot design features.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The research was conducted in the JC STEM Lab of Digital Oncology Care Enhancement (DOCE) funded by The Hong Kong Jockey Club Charities Trust.
The datasets analyzed during this study are available from the corresponding author on reasonable request.
Edited by Amy Schwartz; submitted 30.Jun.2025; peer-reviewed by Kimberly Kaphingst, KittisaK Jermsittiparsert; final revised version received 24.Sep.2025; accepted 16.Oct.2025; published 26.Nov.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
The global proliferation of digital health technologies (DHTs), ranging from telemedicine to artificial intelligence (AI)-driven diagnostics, has reshaped health care delivery []. These innovations offer significant potential to address global health system challenges by improving service coverage, health care efficiency, and the quality of health care practices and services [,]. Within this global context, China has actively promoted DHT adoption through its “Healthy China 2030” initiative, which specifically aims to develop interoperable health data platforms, facilitate cross-sector medical collaboration, and reduce urban-rural health care disparities []. However, despite these advancements, the adoption and usage of DHTs among physicians remain uneven, influenced by a complex interplay of factors []. At the organizational level, existing research has established that institutional support systems (eg, training and technical assistance) and conducive regulatory environments are critical contextual facilitators of DHT adoption []. Conversely, growing evidence underscores that individual cognitive factors may be even more pivotal in shaping physicians’ decisions—such as perceived usefulness and ease of use, self-efficacy in using DHTs, and deeply held mental models about clinical workflows. Nevertheless, the field lacks robust evidence to explain how these cognitive mechanisms account for the substantial variations observed in physicians’ DHT adoption patterns, particularly across different clinical contexts and implementation stages [,]. These variations appear to originate from both methodological differences in how studies measure technology acceptance and unaddressed heterogeneity among physician populations, particularly across different medical specialties and practice settings. This study addresses this gap by applying latent profile analysis (LPA) to identify distinct subgroups of physicians based on their personal evaluations of DHT adoption. 
Given the central role of physicians in the digital transformation of health care, understanding their perspectives is essential for ensuring the successful implementation and widespread adoption of these technologies.
DHT Adoption Landscape
The term “digital health” evolved from “eHealth,” which refers to the application of information and communication technologies to support health care and health-related fields. More recently, “digital health” has been introduced as a broader concept encompassing eHealth (including mobile health) and emerging fields such as the application of advanced computing sciences in data, genomics, and AI []. The adoption of DHT services to support patient care has grown significantly in health care institutions worldwide. Driven by the increasing prevalence of mobile phones and the widespread availability of preventive health and fitness applications, DHT and eHealth are playing an increasingly important role in enhancing medical workflows []. However, while digital health solutions are increasingly popular with the public, implementation faces hurdles in clinical settings. A central challenge is the lack of systematic frameworks to rigorously evaluate both benefits and risks. This evaluation gap contributes to professional hesitancy among health care providers and institutions, limiting user engagement and contributing to differences in technology uptake across care settings []. Recent literature confirms that DHT adoption rates exhibit significant variation across different service types, clinical specialties, and patient subgroups []. Moreover, the underuse of DHT poses considerable difficulties for modern health care systems. Hospitals experience decreased operational efficiency, reduced care quality, and financial strain due to factors such as patient attrition and restricted insurance reimbursements []. In turn, patients’ limited access to DHT may lead to suboptimal care, including extended waiting times, which further widens existing health disparities []. Therefore, effectively addressing these DHT adoption challenges is essential for promoting sustainable, equitable, and patient-centered health care delivery in the future.
Determinants of Uneven DHT Adoption
The heterogeneous adoption patterns of DHTs stem from a dynamic interaction between enabling factors and systemic barriers. When DHTs demonstrate measurable clinical effectiveness, health care providers are more likely to recognize their potential for enhancing work efficiency and patient outcomes, thereby developing favorable attitudes toward technology adoption. This positive perception creates a virtuous cycle that may ultimately improve clinical performance []. Conversely, inadequate integration of DHTs with existing clinical workflows often generates resistance among health care professionals, potentially undermining implementation efforts [].
Current evidence frames DHT adoption through a tripartite model integrating: (1) individual factors (eg, perceived utility vs digital literacy gaps); (2) organizational and environmental factors (eg, supportive policies vs financial constraints); and (3) technological factors (eg, interoperability vs security risks) []. Among physicians, adoption barriers are particularly multifaceted, spanning cognitive (eg, technophobia), attitudinal (eg, skepticism toward clinical efficacy), and experiential domains (eg, limited previous exposure). Resistance often stems from perceived workflow disruptions, eroded patient-provider dynamics, or mismatches between technology design and clinical needs. Conversely, demonstrable efficiency gains, user-friendly interfaces, and alignment with professional norms foster acceptance. Critically, adoption patterns reflect an interplay of these dimensions; for instance, even robust technology may fail if organizational support (eg, training) is lacking [,]. Tailored strategies addressing domain-specific barriers (eg, pilot programs for technophobic clinicians and interoperable tools for fragmented systems) are essential to bridge gaps between policy goals and real-world implementation [].
The Unified Theory of Acceptance and Use of Technology 2 (UTAUT 2) has been effectively applied across international contexts, including Germany and the United States, to examine DHT adoption. Studies based on this framework, which often incorporate constructs such as perceived security and relative advantage and use age-stratified sampling, consistently identify performance expectancy and hedonic motivation as key drivers of usage intention. These studies also highlight security concerns as a major barrier []. Further research on German mobile health apps revealed the predominant influence of hedonic motivation over utilitarian factors, with contextual variations observed between lifestyle and therapeutic apps []. Collectively, these findings underscore the adaptability of UTAUT 2 across diverse health care technologies and cultural settings, particularly when incorporating domain-specific variables. However, research based on UTAUT 2 remains largely confined to conventional methods such as subgroup analyses and clustering approaches, which rely on variable-centered techniques such as moderation analysis or predefined demographic comparisons. These methodological constraints may limit the ability to capture clinically meaningful, person-oriented adoption profiles []. Realizing the full generalizability of DHT adoption models requires not only careful consideration of user and provider heterogeneity, along with further validation across diverse populations, but also the adoption of more nuanced, person-centered analytical frameworks. A comprehensive understanding of physicians’ adoption behaviors demands a multidimensional perspective that simultaneously assesses perceptions of utility, risks, barriers, and usage intentions, ultimately moving beyond structural models toward person-centered approaches.
Despite physicians’ pivotal role as clinical decision-makers and primary end users of DHTs, current research predominantly centers on citizen [] and patient perspectives [,], or on technical feasibility [], leaving a significant gap regarding health care professionals’ perceptions and experiences. Few studies have specifically targeted the evaluation of the creation, implementation, long-term use, and self-reported barriers and facilitators to DHT use by health care professionals []. Moreover, the majority of existing studies, including those using established theoretical frameworks such as the technology acceptance model [] and the UTAUT model [], rely predominantly on variable-centered approaches. These approaches focus on the relationship between DHT or eHealth service implementation and various factors across the overall sample. From this perspective, most previous studies—including those using UTAUT 2—focus on aggregate relationships and isolated moderators, thereby overlooking systematic heterogeneity within physician populations. Such constraints ultimately diminish their capacity to explain actual usage patterns within complex health care environments. More critically, such variable-centered methods inherently assume population homogeneity and thus obscure meaningful heterogeneity across distinct user subgroups, leading to an inadequate characterization of clinically relevant adoption patterns and context-specific barriers. This gap is especially pronounced in the Chinese context, where rapid, policy-driven digital health transformation may have generated unique adoption profiles not captured by conventional approaches.
Study Rationale and Objectives
To address these limitations, this study introduces LPA as a novel, person-centered methodological framework for investigating physician adoption of DHTs. LPA is a probabilistic modeling technique that identifies naturally occurring subgroups within multidimensional data based on shared response patterns []. This method is particularly valuable for capturing heterogeneity and identifying nuanced profiles of technology acceptance that remain concealed in variable-level analyses [,]. In contrast to previous variable-centered studies, LPA enables (1) the identification of clinically meaningful subgroups characterized by distinct configurations of perceptions across benefits, barriers, and behavioral intentions; (2) the examination of multilevel predictors of subgroup membership; and (3) the development of tailored implementation strategies aligned with the specific needs of different physician populations. Given physicians’ pivotal role in health care’s digital transformation, these insights are critical for developing targeted interventions that move beyond one-size-fits-all adoption strategies to account for the nuanced needs and perceptions of different clinician subgroups [].
Therefore, this study is designed to achieve 2 key objectives. First, it aims to classify Chinese physicians’ DHT preferences using LPA to identify heterogeneous subgroups based on a 3D evaluation framework. Second, it seeks to investigate how demographic and occupational factors correlate with profile membership. By transcending aggregate-level insights, this approach offers a more nuanced and clinically relevant understanding of DHT adoption behaviors. As DHTs become increasingly prevalent, the findings are poised to inform tailored interventions that address implementation barriers, especially among hesitant health care professionals. Furthermore, this research provides actionable recommendations for policymakers, health authorities, medical institutions, and insurers to support the design of context-sensitive DHT adoption strategies that enhance physician engagement and ultimately improve health care delivery.
Methods
Study Design and Data Sources
With the approval of the Shaanxi Provincial Health Commission and authorization from the Xi’an Municipal Health Commission, we undertook a cross-sectional investigation across health care facilities in Xi’an, Shaanxi Province, China. This investigation, conducted from October 18 to December 23, 2023, was a crucial part of the “2023 Healthcare Worker Survey” and the broader 7th Xi’an Health Services Survey. The survey aimed to evaluate medical staff’s practice status, working conditions, and health to inform local health policy and management. It has also been used in previous studies on health care professionals’ well-being and occupational challenges []. This study used a cross-sectional survey design, conducted in accordance with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines ( []).
We used random cluster sampling to select 46 hospitals (26 Level-II and 20 Level-III) from municipal and county-level medical institutions in Xi’an. Eligible participants included licensed physicians (including therapists and clinical practitioners) with full-time or contractual employment status in either public or private hospitals. To ensure sample homogeneity and mitigate potential selection bias, we restricted our sample to physicians affiliated with institutions that had formally implemented DHT programs. This inclusion criterion accounted for self-selection bias, given that physicians who had adopted DHT voluntarily before institutional rollout might have exhibited systematically more favorable attitudes toward DHT than the broader physician population (detailed information on the data resources is provided in ).
To ensure data quality, we conducted a pilot test with 814 health care workers (achieving 93.5% compliance) and trained liaison officers from 33 city-level hospitals and 9 county-level government departments on survey protocols, quality control, and tool usage. We implemented a range of data quality control measures, including consistency checks (eg, control questions 12 and 55), logic verification (eg, years of service), outlier detection (eg, age range), and completion time analysis (requiring >3 minutes for >90% completion). From an initial 8617 responses, 3766 were excluded due to incomplete data (n=283), invalid entries (n=97), excessively short completion times (n=46), or employment at institutions where the relevant DHT was not implemented or its status was unknown (n=3431). The remaining 4851 responses were included in the final analysis (detailed Missing Completely At Random test results are provided in Section 2, ).
Demographic and Occupational Characteristics of Participants
Drawing on previous literature regarding barriers and facilitators of DHT adoption, which highlights the association between certain sociodemographic and occupational characteristics (eg, age, gender, professional title, and years of experience) [] and DHT adoption, we included similar indicators in our analysis to examine their association with profile membership. Specifically, the sociodemographic and occupational factors assessed in this study comprised: (1) sociodemographic factors such as gender, age, educational attainment (Bachelor’s, Master’s, or PhD), annual income level (stratified by tertiles), and self-rated health status (5-point Likert scale: 1=very poor to 5=excellent); and (2) occupational variables such as hospital grade (Level-II [secondary] vs Level-III [tertiary]), professional title (resident, attending, or chief physician), years of clinical experience, weekly working hours, and monthly night shift frequency, as well as psychosocial measures including work satisfaction (assessed using a 10-item scale), occupational stress (4-item scale), and doctor-patient relationship quality (3-item scale).
Doctor-Patient Relationship Quality Scale
Physicians’ perceptions of the doctor-patient relationship, which served as the primary independent variable, were measured using the DPRQ-3 (Doctor-Patient Relationship Questionnaire-3), a brief questionnaire designed for assessing the doctor-patient relationship in medical settings []. This 3-item scale includes questions such as: “How do you feel patients respect the doctor?”, “To what extent do you believe society respects the doctor profession?”, and “What do you think of the current doctor-patient relationship?”. Participants answered each item using a 5-point Likert scale (1=very disrespectful or very bad to 5=very respectful or very good). In this study, the Cronbach α coefficient of this scale was 0.82.
Occupational Stress Scale
In this study, occupational stress is defined as the stressful aspects of clinical work encountered by physicians in their professional environment. The occupational stress scale was adapted from existing instruments to measure the psychological distress perceived by medical staff while performing their duties [,]. Participants responded to 4 items on a 6-point Likert scale ranging from 1 (strongly disagree) to 6 (strongly agree). These items included: “Overall, I feel great pressure at work,” “I feel a high level of tension at work,” “I’m having trouble sleeping because of work,” and “I’m nervous about going to work.” Selected items capture core dimensions of nursing stress (global pressure, tension, sleep disturbance, and work avoidance), aligning with Lazarus’s transactional stress model []. This scale is a validated tool that has been extensively used as a measure of job pressure and psychological distress in both medical staff and general occupational research, thus demonstrating its applicability to this study []. The total scores ranged from 4 to 24 and demonstrated high internal consistency (Cronbach α=0.94; composite reliability=0.88).
Work Satisfaction Scale
Work satisfaction was measured using a 10-item scale assessing several dimensions: overall job satisfaction, satisfaction with colleagues, expected income, leadership, working facilities, promotion prospects, internal management, welfare benefits, training opportunities, and opportunity for skill use []. Participants rated each item on a 6-point Likert scale ranging from 1 (very dissatisfied) to 6 (very satisfied), resulting in a total score from 10 to 60. The scale exhibited excellent internal consistency (Cronbach α=0.95). The full details of the scale are provided in Part B of .
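The internal consistency coefficients reported for these scales follow the standard Cronbach α formula, α = k/(k−1) × (1 − sum of item variances / variance of the total score), where k is the number of items. A minimal sketch in Python is given below; the function name and the toy responses are illustrative assumptions and are not the survey data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses from 5 participants on a 3-item, 5-point scale
toy = [
    [4, 4, 5],
    [2, 3, 2],
    [5, 4, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, which is the sense in which the coefficients above are described as high or excellent.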
Digital Health Care Technology Adoption Scale
Current literature indicates that both the general public and health care professionals widely recognize the significant potential benefits and barriers associated with DHTs [,,] or eHealth services []. With the aim of thoroughly investigating practicing physicians’ perspectives and preferences related to the implementation of DHTs, we developed a 14-item DHT adoption scale comprising 3 dimensions, based on a comprehensive literature review [,]. The scale development process, including expert validation procedures and pilot testing protocols, is provided in detail in Section 2, . Specifically, the selection of the 3 core dimensions—Perceived Benefits, Adoption Barriers, and Behavioral Intention—was guided by established technology adoption theories, notably the technology acceptance model and the UTAUT theories, which posit that behavioral intention is determined by a trade-off between perceived benefits (eg, usefulness) and perceived costs or barriers (eg, ease of use and risks) []. Also, recognizing that personal preference does not always translate into actual use, we incorporated a third dimension, Behavioral Intention, to capture a more behavioral measure of overall adoption willingness. This tripartite structure allows for a more comprehensive assessment that spans attitudinal, perceptual, and behavioral aspects of adoption.
Within the Perceived Benefits domain, which consists of 8 items, 4 specific indicators were identified as the most frequently cited drivers of DHT adoption in systematic reviews and physician surveys. These indicators include (1) improved diagnostic and treatment quality, (2) enhanced patient trust and satisfaction, (3) error rate reduction, and (4) increased income (driven by improved diagnostic and treatment efficiency) [,]. From the physician’s perspective, these represent core utilitarian, relational, and practical incentives. Similarly, the Adoption Barriers domain contains 5 items, with 4 key indicators consistently highlighted in previous literature as the most prevalent and impactful obstacles. These indicators comprise (1) technical barriers, (2) cybersecurity risks, (3) workload increase, and (4) patient experience reduction [,], reflecting central concerns regarding feasibility, security, and clinical workflow. The third dimension, Behavioral Intention, was assessed using a single-item scale designed to measure overall willingness to adopt. This provides a pragmatic measure of behavioral outcomes, complementing the multidimensional perceptual factors. Taken together, this framework ensures the scale captures both the complexity of DHT adoption decisions and a concrete behavioral intention.
All items were rated on a 5-point Likert scale, with each indicator score standardized to a range of 1 to 5. Higher scores in the Perceived Benefits domain indicated that participants recognized greater potential benefits of DHTs, whereas lower scores in the Adoption Barriers domain suggested that participants perceived higher potential costs and risks associated with DHT implementation. Correspondingly, higher scores in the Behavioral Intention domain demonstrated increased likelihood of both initial adoption and sustained usage of DHTs. The scope of DHTs considered in this study and the specific items included in the DHT scale are provided in Part A of . This scale demonstrated high internal consistency, with a Cronbach α of 0.88. Detailed information regarding the validity of the scale is provided in Table S5 of .
Data Analysis
Descriptive statistics and bivariate correlations were analyzed using Stata 17 (StataCorp LLC). Mplus version 8.3 (Muthén & Muthén) software was used to conduct the LPA and identify the DHT subgroups based on 9 indicators (4 benefit indicators, 4 barrier indicators, and 1 behavioral intention indicator). We assessed model fit using a comprehensive set of indices [], including the Akaike information criterion (AIC), Bayesian information criterion (BIC), adjusted BIC (aBIC), entropy, the Lo-Mendell-Rubin likelihood ratio test, and the bootstrap likelihood ratio test (BLRT). Lower values of AIC, BIC, and aBIC indicated better model fit []. The Lo-Mendell-Rubin likelihood ratio test and BLRT were used to compare improvements in model fit between adjacent models, with a significant P value (P<.05) suggesting that the class-k model provided a better fit than the class-(k–1) model. Entropy values, ranging from 0 to 1, were used to evaluate classification quality, with values closer to 1 indicating clearer class separation. In addition, the average posterior probability of class membership was examined, with values ≥0.80 indicating good discriminability. To ensure the validity of the results, each class was required to comprise more than 5% of the total sample []. The uncertainty in the estimated latent profile proportions was quantified using 95% CIs, constructed via a nonparametric bootstrap approach with 1000 replications. This method is robust and does not rely on distributional assumptions, making it particularly suitable for latent variable models.
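For readers less familiar with these indices, the sketch below shows the usual definitions of AIC, BIC, and the sample-size-adjusted BIC, together with the relative entropy statistic. It is an illustration only, not the Mplus computation itself. The parameter count of 18 for the 1-profile model (9 means plus 9 variances) is our assumption, and the posterior probabilities are hypothetical.

```python
import math

def information_criteria(neg2ll, n_params, n_obs):
    """Return (AIC, BIC, sample-size-adjusted BIC) from -2 x log-likelihood."""
    aic = neg2ll + 2 * n_params
    bic = neg2ll + n_params * math.log(n_obs)
    abic = neg2ll + n_params * math.log((n_obs + 2) / 24)
    return aic, bic, abic

def relative_entropy(posteriors):
    """Relative entropy (0 to 1) from per-person posterior class probabilities."""
    n, k = len(posteriors), len(posteriors[0])
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - uncertainty / (n * math.log(k))

# Assumption: the 1-profile model estimates 9 means + 9 variances = 18 parameters.
n_params, n_obs = 18, 4851
neg2ll = 113430.03 - 2 * n_params  # back out -2LL from the reported AIC
aic, bic, abic = information_criteria(neg2ll, n_params, n_obs)
print(round(aic, 2), round(bic, 2), round(abic, 2))

# Hypothetical posterior probabilities for 3 people and 2 profiles
print(round(relative_entropy([[0.98, 0.02], [0.05, 0.95], [0.90, 0.10]]), 3))
```

Under the stated parameter-count assumption, the BIC and aBIC computed this way agree with the values reported for the 1-profile model in Table 1 to within rounding.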
Next, we performed ANOVA to compare DHT subscale scores across the 5 latent classes. Between-group differences in demographic, health, and occupational characteristics across DHT subtypes were assessed using χ² tests (for categorical variables) and ANOVA (for continuous variables). To examine the relationships between the identified DHT profiles and key variables, we performed multivariate multinomial logistic regression analyses. Multicollinearity was assessed using variance inflation factor analysis (Table S4 in ). These models assessed the associations between DHT profiles and various predictors, with statistical significance determined at P<.05 (2-tailed).
Ethical Considerations
This study collected solely demographic and professional information, excluding any sensitive or personally identifiable biological data. The study protocol was approved by the Biomedical Ethics Committee of Xi’an Jiaotong University (approval no XJTUAE-2647). Electronic informed consent was obtained from all participants, and institutional authorization was granted by the Xi’an Municipal Health Commission. For the secondary analysis of the research data, we confirmed that the original ethical approval and consent procedures for the “2023 Healthcare Worker Survey” permitted the reuse of data for public health and policy studies without additional participant consent.
In this study, we prioritized the privacy and confidentiality of participants. The survey was designed to collect only nonsensitive information without any personally identifiable data. All data were deidentified at the time of collection, and analyses were conducted on aggregated datasets to prevent reidentification. Participants were not offered any form of compensation, as the survey was part of routine institutional activities. No images or multimedia materials that could lead to the identification of any individual are included in the paper or supplementary files.
Results
Descriptive Statistics and Correlations
A total of 4851 Chinese registered doctors from 46 health care facilities (including 26 Level-II hospitals and 20 Level-III hospitals) in Xi’an were analyzed in this study. The mean age was 38.37 (SD 8.67) years, with a range of 20 to 80 years. Among the participants, 2944 (60.69%) were female, and 1907 (39.31%) were male. In terms of education, 56.17% (2725/4851) held graduate degrees (master’s or doctoral degrees), while 43.83% (2126/4851) had a bachelor’s degree or below.
Among the 9 items in the DHT perception scale, the diagnosis and treatment quality indicator had the highest mean score of 3.98 (SD 0.78) in the benefit domain, while the income increase indicator had the lowest mean score of 3.08 (SD 1.01). In the barrier domain, the patient experience reduction indicator had the highest mean score of 3.80 (SD 0.96), whereas the workload increase indicator had the lowest mean score of 3.59 (SD 0.98). The mean score for the overall willingness indicator was 3.69 (SD 0.89). In terms of job-related scales, the mean scores for work satisfaction, occupational stress, and doctor-patient relationship perception were 44.30 (SD 9.69), 16.22 (SD 4.85), and 7.85 (SD 2.08), respectively. The bivariate correlations among the study variables are provided in Table S1 of . All DHT indicators were moderately correlated. Compared with correlation analysis, LPA offers a more detailed characterization of Chinese doctors’ diverse perspectives on DHT.
Detecting Latent Profiles
The model fit statistics for the 1- to 6-profile models are provided in . With an increase in the number of latent profiles, the AIC, BIC, and aBIC gradually decreased, and the BLRT showed significant results in comparisons between all models with k and k–1 classes. Although the class-6 model demonstrated the best fit based on AIC, BIC, aBIC, and entropy, one profile in this model included only 77 participants (1.6% of the total sample, below the 5% threshold), leading to the rejection of the class-6 model. Compared to the class-4 model, the class-5 model identified a new category with a distinct DHT-related response probability pattern. Based on its optimal balance of model fit and interpretability, the class-5 model was selected as the final solution. This model showed the highest classification accuracy among the remaining candidate models, with an entropy value of 0.883, indicating well-separated and mutually exclusive profiles. This finding is further supported by the high average posterior class probabilities provided in Table S3 in .
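The mechanical part of this selection (discard any solution with a profile at or below 5% of the sample, then prefer the lowest BIC among the rest) can be sketched as follows. The profile sizes and BIC values are copied from Table 1 as listed; the variable and function names are illustrative, and the sketch does not capture the interpretability judgment that also informed the final choice.

```python
N = 4851
MIN_SHARE = 0.05  # each profile must hold more than 5% of the sample

# Profile sizes as listed in Table 1
profile_sizes = {
    2: [2292, 2559],
    3: [2326, 617, 1908],
    4: [1120, 584, 2485, 562],
    5: [516, 1003, 2276, 545, 511],
    6: [528, 77, 1149, 2082, 498, 517],
}
# BIC values as listed in Table 1
bic = {2: 107534.56, 3: 103206.02, 4: 99398.91, 5: 97146.10, 6: 95703.71}

def smallest_share(sizes, n=N):
    """Share of the total sample held by the smallest profile."""
    return min(sizes) / n

# Keep only solutions whose smallest profile exceeds the threshold
admissible = [k for k, sizes in profile_sizes.items()
              if smallest_share(sizes) > MIN_SHARE]
# Among admissible solutions, lower BIC indicates better fit
best = min(admissible, key=lambda k: bic[k])

for k, sizes in profile_sizes.items():
    status = "ok" if k in admissible else "rejected"
    print(k, f"{smallest_share(sizes):.1%}", status)
print("selected:", best)
```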
Table 1. Model fit indices for the compared latent profile analysis models evaluating digital health technology adoption among physicians in China (cross-sectional survey, 2023; N=4851).

| Model | AIC (a) | BIC (b) | aBIC (c) | pLMR (d) | pBLRT (e) | Entropy | Profile 1 (n) | Profile 2 (n) | Profile 3 (n) | Profile 4 (n) | Profile 5 (n) | Profile 6 (n) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Class-1 | 113430.03 | 113546.79 | 113489.59 | — | — | — | 4851 | — | — | — | — | — |
| Class-2 | 107352.93 | 107534.56 | 107445.59 | <.001 | <.001 | 0.760 | 2292 | 2559 | — | — | — | — |
| Class-3 | 102959.52 | 103206.02 | 103085.26 | <.001 | <.001 | 0.830 | 2326 | 617 | 1908 | — | — | — |
| Class-4 | 99087.54 | 99398.91 | 99246.38 | <.001 | <.001 | 0.882 | 1120 | 584 | 2485 | 562 | — | — |
| Class-5 | 96769.86 | 97146.10 | 96961.80 | <.001 | <.001 | 0.883 | 516 | 1003 | 2276 | 545 | 511 | — |
| Class-6 | 95262.60 | 95703.71 | 95487.65 | <.001 | <.001 | 0.889 | 528 | 77 | 1149 | 2082 | 498 | 517 |

a AIC: Akaike information criterion.
b BIC: Bayesian information criterion.
c aBIC: adjusted BIC.
d pLMR: P value for Lo-Mendell-Rubin adjusted likelihood ratio test for K versus K–1 profiles.
e pBLRT: P value for bootstrapped likelihood ratio test.
f —: Not applicable.
The latent profile memberships showed significant differences in the means of the 8 indicator variables (as provided in Table S2 in ), and their characteristics are summarized in . The LPA was conducted to identify physician subgroups based on their standardized responses (on a 1–5 scale) across 3 key domains: Perceived Benefits, Adoption Barriers, and Behavioral Intention. The Perceived Benefits domain encompassed four indicators: (1) improved diagnostic and treatment quality, (2) enhanced patient trust and satisfaction, (3) error rate reduction, and (4) increased income. The Adoption Barriers domain included: (1) technical barriers, (2) cybersecurity risks, (3) workload increase, and (4) patient experience reduction. The Behavioral Intention domain measured the overall willingness to adopt. In the resulting profiles (Figure 1), higher scores in Perceived Benefits and Behavioral Intention indicate more positive perceptions and a greater likelihood of adoption, respectively. Conversely, higher scores in Adoption Barriers signify that physicians perceived these obstacles as more severe. The ANOVA and Bonferroni post hoc tests indicated that DHT subscale scores differed in all 5 classes (P<.001), with the “Error Rate Reduction” variable exhibiting the largest effect size (η²=0.627).
In , Class 1 (n=516, 10.64% of the sample; 95% CI 9.76%-11.52%) demonstrated a distinctive pattern characterized by high perceived benefits, high perceived barriers, yet positive overall willingness toward DHTs. This profile represents physicians who recognize both notable advantages and substantial risks of digital health tools, but tend to maintain a generally positive willingness to adopt and use these technologies. Their pattern could suggest a risk-aware yet largely optimistic approach to digital transformation, potentially serving as engaged evaluators who might help optimize DHT implementation while acknowledging its challenges. This unique profile was therefore classified as the “Reform-Adaptable” group.
Class 2 (n=1003, 20.68% of the sample; 95% CI 19.50%-21.86%) exhibited consistently low scores across all dimensions, suggesting generally skeptical attitudes toward DHTs. This profile appears to reflect physicians who perceive relatively minimal benefits while emphasizing substantial barriers, resulting in largely negative adoption intentions. Their resistance seems rooted in both practical concerns about implementation challenges and some fundamental doubts about the value of DHTs. This group was designated the “Negative” group.
Class 3 (n=2276, 46.92% of the sample; 95% CI 45.50%-48.34%) was characterized by moderate scores near the average on all subscales. We interpret this pattern as representing physicians who acknowledge both the advantages and limitations of DHTs without a firm stance. This neutral position likely entails a “wait-and-see” approach, where adoption is contingent on contextual factors such as organizational support and peer behavior. Based on this rationale, we identified this group as the “Neutral” profile.
Class 4 (n=545, 11.23% of the sample; 95% CI 10.33%-12.13%) presented a profile of low perceived benefits, low perceived barriers, and cautious overall willingness. These physicians appear to perceive limited advantages from DHTs while also minimizing implementation risks, resulting in generally low adoption intentions that seem based more on skepticism about the fundamental value proposition of DHTs rather than specific implementation concerns. This group was therefore labeled the “Reform-Conservative” group.
Class 5 (n=511, 10.53% of the sample; 95% CI 9.66%-11.40%) displayed uniformly high scores across all subscales, implying favorable dispositions toward DHTs. This profile may represent physicians who recognize strong benefits, tend to minimize perceived barriers, and demonstrate relatively high adoption willingness. Their pattern suggests generally positive acceptance of digital transformation and potential leadership roles in promoting DHT implementation within their institutions. Consequently, this group was classified as the “Positive” group.
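The 95% CIs attached to the profile proportions above can be approximated with a simple percentile bootstrap. The sketch below is a simplification under stated assumptions: it resamples hard profile assignments for a single profile and does not re-estimate the latent profile model in each replicate, so it need not match the authors' procedure. The function name and seed are illustrative.

```python
import random

def bootstrap_share_ci(count, n, reps=1000, seed=2023):
    """Percentile 95% CI for a profile's share, resampling hard assignments."""
    rng = random.Random(seed)
    p_hat = count / n
    shares = []
    for _ in range(reps):
        # Resampling n members with replacement and counting those in the
        # profile is equivalent to n Bernoulli draws with probability p_hat
        hits = sum(rng.random() < p_hat for _ in range(n))
        shares.append(hits / n)
    shares.sort()
    lower = shares[int(0.025 * reps)]
    upper = shares[int(0.975 * reps) - 1]
    return p_hat, lower, upper

# Class 1 (Reform-Adaptable): 516 of 4851 physicians
p_hat, lower, upper = bootstrap_share_ci(516, 4851)
print(f"{p_hat:.2%} (95% CI {lower:.2%}-{upper:.2%})")
```

Because the replicates are random, the interval bounds vary slightly with the seed and the number of replications.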
Figure 1. Characteristics of the 5 digital health technology (DHT) adoption profiles identified by latent profile analysis among hospital-based physicians in China (cross-sectional survey, 2023; N=4851), based on patterns of Perceived Benefits, Adoption Barriers, and Behavioral Intention.
Comparison of Demographic and DHT Scales in Each Latent Profile
outlines the comparison of demographic and job-related variables across different latent profiles. Significant differences were observed among the 5 DHT classes for variables such as gender, education background, income level, professional and technical title, working hours per week, years of health care work experience, self-rated health, work satisfaction, doctor-patient relationship perception, and occupational stress (all P<.05). However, no significant differences were found for age and night shift status across the 5 DHT profiles.
Table 2. Association between identified digital health technology adoption profiles and demographic and occupational characteristics among physicians in China (cross-sectional survey, 2023; N=4851).
g ANOVA F tests are used for continuous variables; F (df1, df2).
h Chi-square tests (χ² tests) are used for categorical variables; Chi-square (df).
As shown in Table 2, the Positive group (Class 5) demonstrated significantly higher proportions of participants affiliated with Level-II hospitals (χ²(4)=38.32; P<.001), holding resident physician titles (χ²(8)=44.96; P<.001), and possessing bachelor’s degrees (χ²(4)=15.50; P<.001) compared with other groups. Notably, this group also reported the highest mean scores in both work satisfaction (mean 51.49, SD 9.92) and occupational stress (mean 18.82, SD 5.75).
Multivariate Multinomial Regression Results
Tables 3 and 4 show the associations between key predictors and latent profile membership, using the second class named in each column heading as the reference. Male physicians were less likely to belong to the Neutral (Class 3) and Reform-Conservative (Class 4) groups compared with both the Reform-Adaptable (Class 1) and Negative (Class 2) groups (all odds ratios [ORs] <1), but more likely to belong to the Positive group (Class 5) than to Class 4 (OR 1.39, 95% CI 1.05‐1.84; P=.02). Those with a master’s degree or higher were less likely to be in Class 4 than Class 3 (OR 0.75, 95% CI 0.59‐0.96; P=.02). When using Class 2 as the reference, better self-rated health was significantly associated with higher odds of belonging to Class 1 (OR 1.21, 95% CI 1.03‐1.42; P=.02), Class 3 (OR 1.20, 95% CI 1.07‐1.34; P=.001), and Class 5 (OR 1.32, 95% CI 1.12‐1.55; P=.001). These associations indicate that gender, education, and self-rated health are important differentiating factors across distinct DHT perception profiles. However, contrary to expectations derived from existing literature, our findings revealed that age, professional title, and years of work experience did not significantly predict DHT adoption profile membership among physicians in the Chinese sample (all P>.05), suggesting important contextual differences in the determinants of DHT adoption.
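For readers who want to relate a reported OR and its 95% CI to the accompanying P value, the following minimal sketch recovers an approximate two-sided Wald P value. It assumes the CI is symmetric on the log-odds scale and was built with a critical value of 1.96; the function name `p_from_or_ci` is hypothetical and is not part of the original analysis.

```python
import math


def p_from_or_ci(odds_ratio, ci_lower, ci_upper, z_crit=1.96):
    """Approximate two-sided Wald P value from an OR and its 95% CI.

    Assumption: the CI is symmetric on the log scale (standard for
    logistic regression output), so its width gives the standard error.
    """
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Male, Class 5 vs Class 4 (values from the text above).
print(round(p_from_or_ci(1.39, 1.05, 1.84), 3))
# Master's degree and above, Class 4 vs Class 3.
print(round(p_from_or_ci(0.75, 0.59, 0.96), 3))
```

Both calls give values close to the reported P=.02. Because the inputs are rounded to 2 decimals, the result is only an approximation of the model's own P value.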
Table 3. Multinomial logistic regression results (Part A) examining the demographic and occupational predictors of membership in the 5 digital health technology adoption profiles among Chinese physicians (cross-sectional survey, 2023; N=4851).
| Variable | Class 5 vs Class 1, OR (95% CI) | Class 5 vs Class 2, OR (95% CI) | Class 5 vs Class 3, OR (95% CI) | Class 5 vs Class 4, OR (95% CI) | Class 2 vs Class 1, OR (95% CI) |
| --- | --- | --- | --- | --- | --- |
| Age (years) | 0.99 (0.96‐1.02) | 0.98 (0.95‐1.00) | 1.00 (0.98‐1.03) | 0.99 (0.96‐1.01) | 1.01 (0.98‐1.04) |
| Gender (ref: female) | | | | | |
| Male | 0.89 (0.68‐1.18) | 0.93 (0.71‐1.19) | 1.23 (0.99‐1.54) | 1.39 (1.05‐1.84) | 0.96 (0.75‐1.22) |
| Educational background (ref: bachelor’s degree and below) | | | | | |
| Master’s degree and above | 0.90 (0.64‐1.27) | 1.01 (0.75‐1.36) | 0.94 (0.72‐1.22) | 1.25 (0.89‐1.75) | 0.90 (0.67‐1.20) |
| Hospital grade (ref: Level-II) | | | | | |
| Level-III | 0.57 (0.39‐0.82) | 0.66 (0.48‐0.90) | 0.80 (0.61‐1.05) | 0.56 (0.39‐0.81) | 0.86 (0.62‐1.20) |
| Professional title (ref: resident physician) | | | | | |
| Attending physician | 1.06 (0.72‐1.54) | 1.24 (0.88‐1.74) | 1.18 (0.88‐1.60) | 1.30 (0.87‐1.94) | 0.85 (0.61‐1.19) |
| Chief physician | 1.10 (0.63‐1.93) | 0.93 (0.56‐1.52) | 1.07 (0.69‐1.67) | 0.89 (0.51‐1.58) | 1.19 (0.73‐1.93) |
| Annual income level (ref: low) | | | | | |
| Middle | 0.90 (0.65‐1.25) | 0.99 (0.74‐1.32) | 0.82 (0.63‐1.06) | 0.72 (0.51‐1.01) | 0.91 (0.69‐1.22) |
| High | 0.92 (0.62‐1.36) | 1.01 (0.72‐1.44) | 0.78 (0.57‐1.07) | 0.43 (0.29‐0.63) | 0.90 (0.64‐1.27) |
| Working hours (ref: ≤48 h/wk) | | | | | |
| >48 h/wk | 0.78 (0.58‐1.03) | 0.74 (0.58‐0.96) | 0.89 (0.71‐1.11) | 0.60 (0.45‐0.80) | 1.04 (0.81‐1.34) |
| Night shifts (ref: ≤4 nights/time per month) | | | | | |
| >4 nights/time per month | 0.86 (0.65‐1.15) | 1.00 (0.78‐1.30) | 1.02 (0.81‐1.28) | 1.15 (0.86‐1.54) | 0.86 (0.67‐1.11) |
| Health care working experience (ref: ≤10 years) | | | | | |
| >10 years | 0.89 (0.57‐1.39) | 1.07 (0.73‐1.59) | 0.92 (0.65‐1.30) | 1.15 (0.74‐1.79) | 0.83 (0.57‐1.21) |
| Self-rated health status | 1.09 (0.91‐1.29) | 1.32 (1.12‐1.55) | 1.10 (0.95‐1.26) | 1.23 (1.02‐1.48) | 0.83 (0.70‐0.97) |
| Work Satisfaction Scale | 1.04 (1.02‐1.06) | 1.14 (1.12‐1.16) | 1.10 (1.09‐1.12) | 1.16 (1.14‐1.18) | 0.91 (0.90‐0.93) |
| Doctor-Patient Relationship Scale | 1.08 (1.01‐1.16) | 0.86 (0.81‐0.92) | 0.94 (0.89‐0.99) | 0.77 (0.72‐0.82) | 1.25 (1.17‐1.33) |
| Occupational Stress Scale | 1.26 (1.22‐1.30) | 1.18 (1.15‐1.22) | 1.13 (1.11‐1.16) | 1.12 (1.08‐1.15) | 1.07 (1.04‐1.09) |
a Class 1: Reform-Adaptable group.
b Class 2: Negative group.
c Class 3: Neutral group.
d Class 4: Reform-Conservative group.
e Class 5: Positive group.
f OR: odds ratio.
g Bolded ORs indicate significance.
h P<.05.
i P<.01.
Table 4. Multinomial logistic regression results (Part B) examining the demographic and occupational predictors of membership in the 5 digital health technology adoption profiles among Chinese physicians (cross-sectional survey, 2023; N=4851).
| Variable | Class 4 vs Class 1, OR (95% CI) | Class 4 vs Class 2, OR (95% CI) | Class 4 vs Class 3, OR (95% CI) | Class 3 vs Class 1, OR (95% CI) | Class 3 vs Class 2, OR (95% CI) |
| --- | --- | --- | --- | --- | --- |
| Age (years) | 1.00 (0.97‐1.03) | 0.99 (0.96‐1.01) | 1.02 (0.99‐1.04) | 0.99 (0.97‐1.01) | 0.98 (0.96‐1.00) |
| Gender (ref: female) | | | | | |
| Male | 0.64 (0.48‐0.85) | 0.67 (0.54‐0.84) | 0.89 (0.72‐1.10) | 0.72 (0.58‐0.90) | 0.76 (0.64‐0.89) |
| Educational background (ref: bachelor’s degree and below) | | | | | |
| Master’s degree and above | 0.73 (0.52‐1.01) | 0.80 (0.62‐1.05) | 0.75 (0.59‐0.96) | 0.97 (0.74‐1.25) | 1.07 (0.89‐1.30) |
| Hospital grade (ref: Level-II) | | | | | |
| Level-III | 1.01 (0.69‐1.48) | 1.17 (0.86‐1.59) | 1.43 (1.08‐1.89) | 0.71 (0.53‐0.95) | 0.82 (0.66‐1.01) |
| Professional title (ref: resident physician) | | | | | |
| Attending physician | 0.81 (0.55‐1.21) | 0.95 (0.68‐1.33) | 0.91 (0.67‐1.23) | 0.89 (0.67‐1.20) | 1.05 (0.84‐1.31) |
| Chief physician | 1.23 (0.70‐2.17) | 1.03 (0.65‐1.63) | 1.20 (0.79‐1.83) | 1.02 (0.67‐1.59) | 0.87 (0.63‐1.19) |
| Annual income level (ref: low) | | | | | |
| Middle | 1.25 (0.90‐1.76) | 1.38 (1.04‐1.82) | 1.13 (0.87‐1.47) | 1.10 (0.85‐1.43) | 1.21 (1.00‐1.46) |
| High | 2.15 (1.45‐3.18) | 2.38 (1.73‐3.26) | 1.83 (1.37‐2.45) | 1.17 (0.86‐1.59) | 1.29 (1.03‐1.62) |
| Working hours (ref: ≤48 h/wk) | | | | | |
| >48 h/wk | 1.30 (0.98‐1.73) | 1.25 (1.00‐1.59) | 1.48 (1.19‐1.83) | 0.88 (0.70‐1.10) | 0.84 (0.72‐1.00) |
| Night shifts (ref: ≤4 nights/time per month) | | | | | |
| >4 nights/time per month | 0.75 (0.56‐1.01) | 0.87 (0.69‐1.10) | 0.88 (0.71‐1.10) | 0.85 (0.68‐1.06) | 0.99 (0.83‐1.17) |
| Health care working experience (ref: ≤10 years) | | | | | |
| >10 years | 0.77 (0.50‐1.19) | 0.93 (0.65‐1.33) | 0.80 (0.58‐1.10) | 0.97 (0.69‐1.36) | 1.17 (0.91‐1.51) |
| Self-rated health status | 0.89 (0.73‐1.07) | 1.07 (0.92‐1.26) | 0.89 (0.77‐1.03) | 0.99 (0.86‐1.14) | 1.20 (1.07‐1.34) |
| Work Satisfaction Scale | 0.89 (0.88‐0.91) | 0.98 (0.96‐0.99) | 0.95 (0.94‐0.96) | 0.94 (0.93‐0.95) | 1.03 (1.02‐1.04) |
| Doctor-Patient Relationship Scale | 1.40 (1.30‐1.51) | 1.12 (1.06‐1.19) | 1.23 (1.16‐1.29) | 1.14 (1.08‐1.21) | 0.92 (0.88‐0.96) |
| Occupational Stress Scale | 1.13 (1.09‐1.16) | 1.06 (1.03‐1.09) | 1.02 (1.01‐1.04) | 1.11 (1.09‐1.14) | 1.04 (1.02‐1.06) |
a Class 1: Reform-Adaptable group.
b Class 2: Negative group.
c Class 3: Neutral group.
d Class 4: Reform-Conservative group.
e Class 5: Positive group.
f OR: odds ratio.
g Bolded ORs indicate significance.
h P<.05.
i P<.01.
Notably, several work-related patterns emerged from the analysis. Physicians from tertiary (Level-III) hospitals were significantly less likely to be in Class 5 than in Classes 1, 2, and 4 (OR 0.57, 95% CI 0.39‐0.82; OR 0.66, 95% CI 0.48‐0.90; and OR 0.56, 95% CI 0.39‐0.81, respectively; all P=.001), but more likely to be classified in Class 4 than in Class 3 (OR 1.43, 95% CI 1.08‐1.89; P=.008). Furthermore, higher income was strongly associated with membership in Class 4 compared with all other classes (vs Class 1: OR 2.15, 95% CI 1.45‐3.18; vs Class 2: OR 2.38, 95% CI 1.73‐3.26; vs Class 3: OR 1.83, 95% CI 1.37‐2.45; vs Class 5: OR 2.34, 95% CI 1.58‐3.48; all P=.001). Similarly, working more than 48 hours per week significantly increased the likelihood of belonging to Class 4 relative to Classes 2, 3, and 5 (OR 1.25, 95% CI 1.00‐1.59, P=.045; OR 1.48, 95% CI 1.19‐1.83, P=.001; OR 1.67, 95% CI 1.24‐2.22, P=.001, respectively). When compared with Class 2, members of Class 3 were more likely to have higher income levels (middle income: OR 1.21, 95% CI 1.00‐1.46, P=.047; high income: OR 1.29, 95% CI 1.03‐1.62, P=.03) yet less likely to work over 48 hours per week (OR 0.84, 95% CI 0.72‐1.00; P=.044).
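Some comparisons in the preceding paragraph are stated with Class 4 as the outcome and Class 5 as the reference, whereas Table 3 reports the same contrasts as Class 5 vs Class 4. The two forms are reciprocals of each other. The sketch below illustrates the conversion; the helper name `flip_reference` is hypothetical, and because the tabulated inputs are rounded to 2 decimals, the results can differ from the published figures in the second decimal place.

```python
def flip_reference(odds_ratio, ci_lower, ci_upper):
    """Swap the outcome and reference classes of an OR and its CI.

    Inverting an OR inverts its bounds as well, and the former upper
    bound becomes the new lower bound.
    """
    return 1 / odds_ratio, 1 / ci_upper, 1 / ci_lower


# High income, Class 5 vs Class 4 (Table 3): 0.43 (0.29-0.63).
or_, lower, upper = flip_reference(0.43, 0.29, 0.63)
print(f"High income, Class 4 vs Class 5: OR {or_:.2f} ({lower:.2f}-{upper:.2f})")

# Working >48 h/wk, Class 5 vs Class 4 (Table 3): 0.60 (0.45-0.80).
or_, lower, upper = flip_reference(0.60, 0.45, 0.80)
print(f">48 h/wk, Class 4 vs Class 5: OR {or_:.2f} ({lower:.2f}-{upper:.2f})")
```

The outputs land close to the values quoted in the text for Class 4 vs Class 5 (OR 2.34 and OR 1.67, respectively).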
Compared with Class 1, individuals with higher work satisfaction were more likely to belong to Class 5 (OR 1.04, 95% CI 1.02‐1.06), while those with lower work satisfaction showed greater probabilities of membership in Class 2 (OR 0.91, 95% CI 0.90‐0.93), Class 3 (OR 0.94, 95% CI 0.93‐0.95), and Class 4 (OR 0.89, 95% CI 0.88‐0.91). Higher occupational stress and more positive doctor-patient relationship perceptions were also significantly associated with membership in Classes 2, 3, 4, and 5 relative to Class 1 (all P=.001). When compared with Class 2, higher work satisfaction (OR 1.03, 95% CI 1.02‐1.04) and more negative doctor-patient relationship perceptions (OR 0.92, 95% CI 0.88‐0.96) predicted membership in Class 3, whereas lower work satisfaction (OR 0.98, 95% CI 0.96‐0.99) and more positive relationship perceptions (OR 1.12, 95% CI 1.06‐1.19) were associated with Class 4. Higher occupational stress elevated the probability of classification into both Class 3 (OR 1.04, 95% CI 1.02‐1.06) and Class 4 (OR 1.06, 95% CI 1.03‐1.09). Also, using Class 3 as the reference, higher work satisfaction reduced the likelihood of belonging to Class 4 (OR 0.95, 95% CI 0.94‐0.96), while more positive doctor-patient relationship perceptions increased it (OR 1.23, 95% CI 1.16‐1.29). All reported associations were statistically significant (P=.001).
Furthermore, compared with physicians in Classes 2, 3, and 4, those in Class 5 demonstrated distinct characteristics across 3 key domains. Specifically, Class 5 physicians showed significantly higher odds of severe occupational stress (OR range 1.12‐1.18; P=.001), reported greater work satisfaction (OR range 1.10‐1.16; P=.001), yet held less positive expectations regarding doctor-patient relationships (OR range 0.77‐0.94; P=.001; refer to Tables 3 and 4 for details).
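The scale-based ORs reported above look small because they describe the change in odds for a single scale point. As a minimal sketch, assuming the scales entered the model as continuous per-point predictors (which the tables suggest but the text does not state explicitly), the OR for a larger difference is the per-point OR raised to the number of points. The function name `scale_or` and the 5- and 10-point differences are illustrative choices, not values from the study.

```python
def scale_or(per_point_or, points):
    """OR implied by a `points`-unit difference on a continuous predictor.

    Assumption: the predictor is linear on the log-odds scale, so
    per-point effects multiply.
    """
    return per_point_or ** points


# Work satisfaction, Class 5 vs Class 1: OR 1.04 per point.
print(f"10-point higher work satisfaction: OR {scale_or(1.04, 10):.2f}")

# Occupational stress, Class 5 vs Class 4: OR 1.12 per point.
print(f"5-point higher occupational stress: OR {scale_or(1.12, 5):.2f}")
```

Read this way, a per-point OR of 1.04 corresponds to roughly 48% higher odds across a 10-point difference in work satisfaction, which may be easier to interpret alongside the reported scale means and SDs.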
Discussion
Principal Findings
This study accomplished its 2 primary objectives by applying LPA to examine physicians’ adoption of DHTs. First, using a tripartite framework (Perceived Benefits, Adoption Barriers, and Behavioral Intention), the analysis identified 5 clinically meaningful profiles that moved beyond conventional classifications [,]: Reform-Adaptable (n=516, 10.64%), Negative (n=1003, 20.68%), Neutral (n=2276, 46.92%), Reform-Conservative (n=545, 11.23%), and Positive (n=511, 10.53%). Second, the analysis demonstrated that profile membership was systematically correlated with a range of key demographic and occupational factors, including gender, education, income, hospital tier, working hours, self-rated health, occupational stress, job satisfaction, and perceptions of doctor-patient relationships. This association confirms the substantial heterogeneity in DHT adoption among physicians. Given their pivotal role in implementing DHTs to enhance patient care [], this divergence warrants attention and further investigation. By identifying the specific factors linked to each profile, our findings provide an empirical basis for developing tailored implementation strategies that account for these distinct physician subgroups.
In this study, we found that levels of occupational stress and work satisfaction differed significantly across the 5 latent profiles. Specifically, physicians reporting relatively high occupational stress alongside high work satisfaction were more likely to belong to Class 5 (Positive group), a profile characterized by greater perceived benefits and fewer adoption barriers regarding DHT implementation. To interpret this seemingly counterintuitive association, we used the Job Demands-Resources framework [], which posits that high job demands can motivate the adoption of functional resources, including digital tools, to mitigate work pressure. Our findings support this mechanism: physicians in the Positive group indicated that DHTs contributed to improved work efficiency and better management of daily workloads, notably by facilitating remote consultations and streamlining follow-up processes. Rather than perceiving digital tools as additional burdens, these physicians used DHTs as strategic resources to maintain autonomy and reduce time-related pressures. This observation aligns with previous studies indicating that health care professionals under high workload demands often adopt efficiency-enhancing technologies, including automated electronic health records, to alleviate operational strain and prevent burnout [].
Furthermore, we found that the combination of high stress and high job satisfaction likely reflects a subgroup of physicians who are highly engaged and adaptive. In our sample, those with greater work satisfaction (often stemming from institutional trust and personal adaptability) were generally more receptive to technological innovations promising improved efficiency, such as telemedicine systems []. Thus, our results suggest that, for certain physicians, occupational challenges may not inhibit but could even stimulate willingness to adopt practical digital solutions.
A notable divergence emerged between these findings and those of previous studies in the Western context [,], which identified physician age as a significant predictor of DHT adoption patterns. One plausible explanation may lie in the comprehensive integration of digital technologies within China’s health care system. The mandatory adoption of health codes during the COVID-19 pandemic and the widespread implementation of internet-based consultation systems may have reduced age-related digital disparities among physicians, diminishing the influence of age as a distinguishing factor in DHT adoption. In addition, gender differences in DHT adoption patterns may reflect broader sociocultural dynamics within Chinese health care service systems. Female physicians, who comprised most of our sample, often bear disproportionate responsibilities for both clinical work and family care, which may limit their capacity to engage with new technologies that require additional training time. Previous studies suggest that women in health care settings, both in China and globally, tend to adopt a more cautious approach to technology adoption, prioritizing established practicality and reliability over novelty [,]. We also found that income level emerged as a significant predictor, likely reflecting structural aspects of China’s compensation system. Physicians in higher income brackets, often concentrated in specialized fields and tertiary hospitals, may perceive less economic incentive to adopt DHTs that could disrupt established workflows without immediate financial benefits. Conversely, physicians in lower-income segments might view DHTs as potential tools for improving efficiency and patient volume, thereby increasing earnings [].
Furthermore, while no significant differences were observed across professional titles, physicians working in secondary hospitals demonstrated a more positive perception of DHTs, reporting higher perceived benefits and lower barriers to adoption compared with those in tertiary hospitals. This divergence may reflect systemic differences within China’s tiered health care system. Physicians in tertiary hospitals frequently face overwhelming clinical workloads and academic pressures, which may contribute to innovation fatigue despite their greater access to technological resources. In contrast, secondary hospital physicians may perceive DHTs as strategic tools for enhancing institutional competitiveness and addressing resource constraints through telemedicine collaborations with tertiary centers. These findings suggest that implementing targeted DHT strategies in secondary hospitals could be particularly effective for improving service quality and patient satisfaction. For example, the COVID-19 pandemic catalyzed the widespread deployment of teleconsultation platforms to ensure continuity of care [,]. Videoconferencing enables not only remote patient monitoring but also real-time supervision of clinical teams by specialists from tertiary hospitals []. Evidence shows that many DHTs provide affordable platforms for grassroots hospitals to collaborate with advanced medical centers. Through structured initiatives, including clinician exchanges, treatment protocol standardization, and technical assistance, DHTs have significantly improved the quality of care at primary health care institutions and are strongly aligned with China’s tiered health care policy objectives [,]. These technologies help bridge resource gaps and expand access to specialized care, particularly for patients in secondary hospitals. 
The distinct patterns identified in this study, such as the reduced role of physician age and heightened receptivity in secondary hospitals, are shaped by China’s specific health care policy landscape [].
In fact, the national “Healthy China 2030” strategy explicitly prioritizes the integration of the internet, AI, and big data technologies throughout health care delivery []. This top-down mandate has catalyzed widespread institutional adoption of DHTs, creating an environment where exposure to digital tools is becoming universal. The rapid implementation of the health code system and telemedicine platforms during the COVID-19 pandemic, for instance, served as a form of nationwide digital training, which likely enhanced digital literacy among physicians of all demographic backgrounds and may have diminished conventional disparities associated with age []. Furthermore, as secondary hospitals are often direct targets of policy support and funding for digital capacity building, physicians in these settings report more positive perceptions of DHTs, viewing them as tools for professional advancement and better patient care. These findings may be generalizable to other health systems that use strong top-down digital integration policies and tiered care models, though local infrastructure and policy intensity would influence applicability.
Moreover, physicians with higher income levels, those working more than 48 hours per week, and those reporting more favorable doctor-patient relationships were more likely to belong to the Reform-Conservative group (Class 4), which perceived relatively low levels of both benefits and barriers associated with DHTs and maintained a conservative stance toward adoption. The association between more favorable doctor-patient relationships and membership in the Reform-Conservative group presents a theoretically intriguing paradox that merits elaboration. We believe this association arises because physicians with established positive patient relationships may perceive less need for DHTs that could disrupt these carefully maintained interpersonal dynamics.
Within the Chinese health care context, where traditional relationship-centered models of care remain highly valued, physicians with strong patient relationships may view DHTs as potentially undermining the personal connection and trust they have cultivated. These physicians might perceive digital tools as introducing a layer of technological mediation into what they consider to be essentially human interactions, potentially diluting the emotional quality of care. Conversely, physicians experiencing challenges in patient communication might view DHTs as tools to enhance efficiency, standardize interactions, or overcome communication barriers, thus increasing their adoption motivation. This interpretation suggests that doctor-patient relationship quality operates not simply as a demographic variable but as a significant indicator of clinical satisfaction and practice style that consistently influences technology adoption decisions. Alternatively, this preference for traditional health care models may stem from the lack of observed improvements in service quality or efficiency post-DHT implementation in their settings, particularly among more clinically experienced physicians in demanding specialties such as neurosurgery, critical care, and emergency medicine. For these physicians, adapting complex workflows to incorporate DHTs may exacerbate feelings of burnout []. Similarly, in these demanding clinical environments, greater emphasis is placed on physicians’ technical competencies and their ability to deliver patient-centered health care services, which may consequently diminish their perceived need for DHTs [].
In contrast, the Reform-Adaptable group demonstrates a risk-aware yet optimistic approach, recognizing significant benefits despite acknowledging implementation barriers, resulting in consistently high adoption intentions. This group exhibits greater flexibility, often engaging in selective adoption of technologies with clear clinical advantages and actively participating in pilot programs. Policy measures should accordingly diverge: for Reform-Conservative physicians, efforts must demonstrate fundamental value through evidence-based outcomes and success stories, whereas Reform-Adaptable physicians may benefit from targeted support, technical assistance, and roles as digital champions to address specific workflow integration concerns.
In addition, many health care systems have failed to fully operationalize the targeted intervention capabilities of AI and digital solutions []. Across numerous institutions, the fundamental requirements for successful DHT implementation remain challenging, as issues of service accessibility, standardized protocols, safety guarantees, and system reliability are still not adequately addressed []. As technological advancements progress and clinical feedback from various departments informs iterative improvements to DHT systems, emerging technological breakthroughs—alongside evolving patient attitudes toward digital health care—may gradually shift the perspectives of more conservative practitioners and facilitate wider DHT adoption [].
Notably, approximately 31% of the physician cohort expressed significant concerns regarding DHT implementation barriers, particularly related to technological challenges, cybersecurity risks, increased workload, and potential negative impacts on patient experience. Consistent with previous comprehensive reviews [,,,], our study revealed that health care workers, regardless of the level of care or the specific technology involved, face recurring challenges related to infrastructure, technology, training, legal and ethical issues, time constraints, and workload increases. Furthermore, limitations on widespread DHT adoption are often rooted in health care workers’ anxiety about increased workload and disruptions to their established routines. This anxiety can contribute to professional burnout, which, in turn, threatens the long-term sustainability of these technologies [,]. These findings suggest that future development of DHTs should focus on thoughtfully integrating digital solutions with conventional clinical workflows to establish hybrid care delivery models that may help mitigate potential workload increases and burnout risks. To adequately address physicians’ concerns regarding DHT implementation, health care institutions should consider implementing tailored support systems. Specifically, customized training programs and continuing medical education initiatives designed to meet individual physicians’ competency needs and practice contexts could potentially reduce psychological barriers and facilitate more widespread, sustainable DHT adoption. Such personalized approaches may prove particularly valuable in addressing the varied adoption patterns identified in our study while maintaining clinical workflow integrity [].
While this study focuses on Chinese physicians, our findings reveal both parallels and distinctions with international contexts. Consistent with European findings, skepticism regarding the clinical value and workflow impact of DHTs was prevalent [,]. However, unlike US research emphasizing financial incentives, DHT adoption in China was more influenced by institutional support [,]. Comparisons with other Asian settings showed similar hospital-level effects, though these were more pronounced in China’s policy-driven system. This suggests that while core adoption mechanisms may be universal, specific drivers remain culturally and systemically distinct [].
Implications for Policy and Practice
The heterogeneity observed in DHT adoption profiles highlights the limitations of relying solely on efficiency-driven models and underscores the necessity of multidimensional assessment frameworks to guide successful DHT implementation within health care systems. The key distinction between these profiles lies in their 3D evaluation: Perceived Benefits, Adoption Barriers, and Behavioral Intention. The Reform-Adaptable group, despite perceiving high barriers, maintains a high willingness due to strong benefit perception and requires barrier-specific support. In contrast, the Reform-Conservative group shows low willingness driven by limited perceived benefits, necessitating value demonstration interventions. This perceptual divergence calls for tailored implementation strategies rather than uniform policies. Profile-specific recommendations are provided in Section 3 of .
Furthermore, this profiling framework enables the proactive management of systemic risks, such as workload intensification and burnout, particularly among overworked physicians (>48 h/wk) and conservative adopters. To ensure sustainable integration, especially in complex tertiary hospitals, health care systems must prioritize co-designed solutions that address critical implementation determinants such as interoperability, cybersecurity, and equitable workload redistribution. Consequently, policymakers can further support sustainable adoption by institutionalizing holistic adoption metrics that balance efficiency gains with medical workers’ well-being, ensuring that DHTs enhance rather than exacerbate pressures on the health care system. Consistent with the principles of the NASSS (Nonadoption, Abandonment, Scale-up, Spread, and Sustainability) framework, these strategies emphasize the need for context-adaptive implementation across technological, organizational, and professional dimensions, making them practical and scalable for long-term success [].
Strengths and Limitations
The current findings reveal heterogeneity among Chinese physicians, suggesting the potential value of tailored institutional measures and policies for DHT implementation. This study sought to introduce a person-centered analytical approach by using latent profile analysis, which moves beyond exclusive reliance on variable-centered methods to explore distinct typologies of physicians based on their multidimensional perceptions. This exploratory approach identified 5 potential subgroups, offering an alternative perspective for understanding adoption heterogeneity.
We developed and applied a preliminary 3D evaluation framework, encompassing perceived benefits, barriers, and overall willingness, to capture variations in adoption patterns. Furthermore, we examined how individual characteristics and occupational factors were associated with profile membership. The analyses indicated that the organizational context (eg, hospital tier) appeared to play a more prominent role than individual demographics in some profiles. These findings contribute to understanding physician acceptance within China’s policy environment and may offer a transferable methodological approach for examining technology adoption in other health care settings.
The typological framework itself represents a key innovation, offering a nuanced and actionable perspective for developing tailored interventions. For example, physicians in the Reform-Adaptable subgroup might benefit from barrier-reduction support, while those in the Reform-Conservative subgroup may require a clearer demonstration of technology value. The observed patterns around organizational determinants offer insights suggesting that national policy contexts might influence technology adoption pathways. By considering the characteristics of the different physician subgroups, health care administrators could explore ways to improve work environments, adjust workflows, and enhance DHT operational capabilities, potentially supporting physician engagement with DHT implementation.
Our study has several limitations that need to be acknowledged. First, the cross-sectional design of our study limits our ability to establish temporality and causality. While the selected evaluation indicators for DHT include both beneficial and adverse factors, future research must examine how health care professionals’ preferences evolve to support stronger causal inferences. Second, while this study benefits from a large sample size, its generalizability may be limited by the exclusive focus on physicians from Xi’an, China. Regions with different economic development levels, digital infrastructure, and policy implementation—both within China and globally—may demonstrate different adoption patterns. The digital health landscape varies significantly across health care systems in terms of funding, regulation, and technological readiness. However, the identified latent profiles and organizational influences reflect fundamental mechanisms that may transfer across similar contexts. Future research should validate these findings across diverse socioeconomic and cultural settings, particularly in rural areas and other countries with different health care models. Third, self-reported measures may involve social desirability bias, though anonymity was ensured. Future studies should include objective behavioral data.
Future Research Directions
As noted in previous research, health care professionals’ work environments significantly influence their adoption of DHTs. Consequently, we propose the following specific research directions. First, qualitative approaches such as in-depth interviews and focus groups could elucidate the reasons for resistance, particularly among physician subgroups skeptical of or negative toward DHTs. Second, longitudinal and mixed methods studies are warranted to explore how workplace factors—including job stress and doctor-patient relationships—shape DHT preferences over time, and how such preferences may, in turn, shape perceptions of the work environment. Finally, future research should expand the evaluation of DHT adoption willingness by integrating motivational factors such as incentive structures, professional fulfillment, and opportunities for personal development. This would support the creation of more nuanced typologies of physician engagement and help identify context-dependent barriers and facilitators across varied clinical settings.
Conclusion
This study used latent profile analysis to identify 5 distinct subgroups of Chinese physicians based on their perceptions of DHT adoption, providing a practical framework for designing precision interventions. While the profiles reveal considerable diversity in adoption attitudes, they also highlight unifying concerns about usability and professional autonomy that persist across all profiles. Our findings suggest divergent intervention pathways corresponding to these profiles. Reform-Adaptable physicians appear most likely to benefit from technical support and workflow integration, whereas Reform-Conservative physicians may respond better to compelling evidence of clinical value and peer success stories. These insights provide health care administrators and policymakers with empirically grounded guidance for developing tailored implementation strategies rather than relying on standardized approaches. Future research should validate the longitudinal stability of these profiles and assess tailored interventions through rigorous real-world trials. Ultimately, by embracing this nuanced understanding, health care systems can evolve from uniform implementation to precision enablement, thereby enhancing both the practical impact and responsible scalability of DHTs and addressing shared physician concerns.
The authors would also like to thank the editor and reviewers for their helpful suggestions and valuable comments. Most importantly, we thank all participating physicians for sharing their experiences amid demanding workloads. We confirm that no generative artificial intelligence tools were used in the preparation of this manuscript.
This study received financial support from multiple sources: the Leading Talents Project in Philosophy and Social Sciences, National Social Science Foundation of China (grant no 2022LJRC02), and the National Natural Science Foundation of China (grant nos 72374169 and 72474174).
The datasets generated or analyzed during this study are not publicly available, as they form part of an official health survey administered by the Shaanxi Provincial and Xi’an Municipal Health Commissions. However, the data are available from the corresponding author on reasonable request and with permission from the relevant health authorities.
Conflicts of Interest
None declared.
Edited by Amaryllis Mavragani, Stefano Brini; submitted 20.May.2025; peer-reviewed by Ahmed Tausif Saad, Judy Bowen, Kamel Mouloudj; final revised version received 10.Oct.2025; accepted 10.Oct.2025; published 26.Nov.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.