Introduction
Heart disease remains the leading cause of death for women in the United States []. Over 60 million women in the United States are living with heart disease []. Despite public campaigns, such as “Go Red for Women” by the American Heart Association [], awareness of heart disease as the leading cause of death among women has declined from 65% in 2009 to 44% in 2019 []. The most significant declines have been observed among Hispanic women, Black women, and younger women []. Given this troubling trend, there is an urgent need for alternative and scalable approaches to increase knowledge and awareness of heart disease in women.
An artificial intelligence (AI) chatbot is one promising approach to improving women’s awareness of heart disease. AI chatbots are built on natural language processing, natural language understanding, and machine learning. Several systematic reviews have investigated the usability and potential efficacy of AI chatbots in managing patients with various health conditions. Overall, AI chatbot–based interventions have shown the potential to improve mental health, such as depressive and anxiety symptoms; promote healthy diets; and enhance cancer screenings [-]. At the same time, the fast-growing capabilities of AI chatbots raise questions about how well they can match human cognitive and emotional intelligence. Yet, only a few randomized controlled trials (RCTs) have directly compared the efficacy of AI chatbots with that of human agents. For example, studies have reported that AI chatbots provide counseling to patients with breast cancer that is comparable in efficacy to counseling delivered by health professionals [-]. To the best of our knowledge, no clinical trial has examined whether an AI chatbot is effective in increasing women’s heart attack awareness and knowledge. Further empirical investigation is needed to more comprehensively evaluate the efficacy of AI chatbots compared to human agents.
Our research team initiated an AI chatbot development project aimed at increasing women’s knowledge and awareness of heart attack. As a first step, we collected a conversational dataset in which a research interventionist texted each participant with educational content on heart health (Human dataset) over 2 days. We subsequently developed and tested a fully automated SMS text messaging–based AI chatbot system named HeartBot, available 24/7, designed to achieve similar objectives and collected a conversational dataset between HeartBot and participants. The detailed study design, including HeartBot’s development mechanism and algorithmic structure, is published elsewhere []. This project presents a valuable opportunity for a comparative secondary analysis, and this paper focuses specifically on examining the outcomes of the 2 studies.
The aim of this secondary data analysis is to evaluate and explore the potential efficacy of the 2 heart attack education interventions (SMS text messaging intervention delivered by a human research interventionist vs an AI chatbot [hereafter HeartBot]) in community-dwelling women without a history of heart disease. The primary outcome is participants’ knowledge and awareness of symptoms and response to a heart attack. In addition, we examined differences in participants’ evaluations of user experience and conversational quality across the 2 formats by assessing message effectiveness, message humanness, naturalness, coherence, and conversational metrics. Our study is among the first to provide a detailed understanding and multidimensional comparison of human-delivered and automated AI chatbot interventions in the context of heart attack education. These findings contribute new insights into the relative strengths of human- and AI-driven health communication, offering practical guidance for designing more effective education and behavior change programs.
Methods
Study Design and Sample
This was a secondary analysis of 2 datasets collected from the AI Chatbot Development Project conducted from September 2022 to January 2024 []. The aim of the AI Chatbot Development Project is to conduct a series of studies to develop a fully automated AI chatbot that increases knowledge and awareness of heart attack in women in the United States. After convening a multidisciplinary team, we developed a knowledge bank drawing on clinical guidelines, published papers, and the American Heart Association’s “Go Red for Women” materials [] to create the content of the conversation. Then, we conducted a Wizard of Oz experiment with the Human dataset cohort, in which participants interacted with a system they believed to be autonomous but that was operated by a research interventionist [], to test the content and aid in the development of a text-based HeartBot with natural language capabilities. A master’s-prepared, experienced cardiovascular nurse served as the research interventionist and interacted with the participants through SMS text messaging (phase 1: Human dataset).
After the first study (phase 1), we developed a fully automated AI chatbot, the HeartBot, to deliver the intervention through SMS text messaging (phase 2: HeartBot dataset). The detailed design of the project, including the protocol, participant eligibility criteria, and description of the HeartBot platform, was published elsewhere [].
The eligibility criteria for both studies were women who (1) were aged 25 years or older, (2) lived in the United States, (3) had access to the internet to complete the online survey and a cell phone with SMS text messaging capabilities, (4) had no history of cognitive impairment, heart disease, or stroke, and (5) were not health care professionals or students. The eligibility criteria were consistent across the 2 studies. Participants in both studies were mainly recruited from Facebook (Meta) and Instagram (Meta), from September 2022 to January 2023 and from October 2023 to January 2024, respectively.
Procedure and Interventions
For the Human dataset (phase 1), participants who were interested in the study were recruited online and underwent screening to confirm eligibility. Eligible participants provided written informed consent prior to enrollment and completed a baseline survey online. Then, participants engaged in 2 online conversation sessions with a research interventionist, held on 2 separate days within a week, with each session covering educational content related to heart attack symptoms and response. presents the content of heart attack topics used in both studies. After the text conversations, participants completed a post online survey to measure knowledge and awareness of symptoms and response to a heart attack, message effectiveness, message humanness, conversation naturalness and coherence, and perception of chatbot identity. Participants were provided with a $40 Amazon e-gift card upon completion of all study procedures.
We conducted a follow-up phase (phase 2), developing and evaluating the text-based AI chatbot called HeartBot. A comprehensive description of the HeartBot was published previously []. In short, HeartBot was designed as a rule-guided, SMS-based conversational agent that delivers pre-authored educational messages in a structured format. We implemented it on the Google Dialogflow CX platform and linked it to Twilio [] for text messaging conversations based on the intents and entities paradigm []. HeartBot identified the general intent of each incoming message and responded with an appropriate, scripted reply. Although HeartBot can recognize a range of user inputs, its responses are intentionally constrained to maintain accuracy and consistency in delivering heart disease education. HeartBot engaged in 1 conversation session with the participants. We condensed the conversational messaging to a single session to reduce the chance of participant attrition and to ensure that participants received all educational information within 1 interaction. In contrast to the first phase of the project (Human dataset), 3 topics (how angina happens, medicines for heart attack, and operational procedures for treating heart attack) were dropped in the second phase (HeartBot dataset), and 2 quiz questions were included at the end of the conversation to assess participants’ retention of key knowledge outcomes. Participants then completed the post online survey and received a $20 Amazon e-gift card. Both studies used the same questionnaires for the baseline and post online surveys to measure knowledge and awareness of symptoms and response to a heart attack, hosted on a secure online tool called Research Electronic Data Capture [].
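To illustrate the rule-guided, intent-based design described above, the following is a minimal Python sketch of an SMS webhook that maps a recognized intent to a scripted reply. It is illustrative only: the intent labels, reply text, and helper names are hypothetical, and a simple keyword matcher stands in for the Dialogflow CX intent detection used in the actual HeartBot system.

```python
# Minimal sketch of a rule-guided SMS chatbot webhook (hypothetical; for illustration only).
# Assumes Flask and the Twilio Python helper library are installed: pip install flask twilio
# In the actual study, intent recognition was handled by Google Dialogflow CX;
# here a simple keyword matcher stands in for that step.

from flask import Flask, Response, request
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)

# Pre-authored, scripted replies keyed by intent (placeholder content).
SCRIPTED_REPLIES = {
    "greeting": "Hi! I'm here to share information about heart attack symptoms in women.",
    "ask_symptoms": "Common heart attack symptoms in women include chest pressure, "
                    "shortness of breath, nausea, and unusual fatigue.",
    "ask_response": "If you think you are having a heart attack, call 911 right away.",
    "fallback": "I'm not sure I understood. You can ask me about symptoms or what to do "
                "during a heart attack.",
}

def classify_intent(message: str) -> str:
    """Very simple keyword-based stand-in for Dialogflow CX intent detection."""
    text = message.lower()
    if any(word in text for word in ("hi", "hello", "hey")):
        return "greeting"
    if "symptom" in text or "sign" in text:
        return "ask_symptoms"
    if "911" in text or "what should i do" in text or "respond" in text:
        return "ask_response"
    return "fallback"

@app.route("/sms", methods=["POST"])
def sms_webhook():
    # Twilio posts the inbound SMS text in the "Body" form field.
    incoming = request.form.get("Body", "")
    intent = classify_intent(incoming)

    # Reply with TwiML so Twilio sends the scripted message back to the participant.
    twiml = MessagingResponse()
    twiml.message(SCRIPTED_REPLIES[intent])
    return Response(str(twiml), mimetype="application/xml")

if __name__ == "__main__":
    app.run(port=5000)
```

Constraining replies to a fixed script, as in this sketch, reflects the design choice described above: predictable, accurate educational content at the cost of conversational flexibility.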
Ethical Considerations
The first and second studies (phases 1 and 2) were conducted in accordance with the ethical standards outlined in the Declaration of Helsinki. Institutional Review Board approvals were obtained from the University of California, Los Angeles (approval number: 23-000878), for the first study and from the University of California, San Francisco (approval number: 23-29793), for the second study. For both studies, all participants provided written informed consent prior to study enrollment. Participation was voluntary, and participants were informed that they could withdraw at any time without penalty. All collected data were deidentified prior to analysis, and no personally identifiable information was retained. Data were stored on secure, password-protected servers accessible only to the research team. As compensation, participants in the first and second studies who completed all study requirements received a $40 e-gift card and a $20 e-gift card, respectively.
Measures
Primary Outcomes: Knowledge and Awareness of Symptoms and Response to Heart Attack
To assess the potential efficacy of a conversational intervention to increase the knowledge and awareness of symptoms and response to a heart attack, we adapted a previously validated scale [,]. These items have also been used in prior research involving women from diverse backgrounds to ensure broad applicability [-]. Participants were asked the following 4 questions on a scale of 1-4 where 1 indicated “not sure” and 4 indicated “sure”: (1) “How sure are you that you could recognize the signs and symptoms of a heart attack in yourself?,” (2) “How sure are you that you could tell the difference between the signs or symptoms of a heart attack and other medical problems?,” (3) “How sure are you that you could call an ambulance or dial 911 if you thought you were having a heart attack?,” and (4) “How sure are you that you could get to an emergency room within 60 minutes after onset of your symptoms of a heart attack?” The same questions were asked before and after the interaction with the research interventionist and HeartBot. A higher score indicates better knowledge and awareness of symptoms and response to a heart attack.
Other Measures
Overview
We used the AI Chatbot Behavior Change Model [] to assess user experience and conversational quality as key dimensions of effective chatbot communication. Message effectiveness and perceived message humanness were assessed to capture how participants interpreted and responded to the HeartBot’s messages. These key measures were selected to better understand how participants evaluated the interaction and how specific communication features may have influenced their experience.
User Experience: Message Effectiveness
Based on the AI Chatbot Behavior Change Model [], message effectiveness is conceptualized as an aspect of the broader category of “user experiences,” which measures the level of usefulness and convenience in chatbot conversations. Participants completed a post-survey measure known as the Effectiveness Scale, a semantic-differential scale originally developed based on prior research [,]. The scale consists of 5 items, including bipolar adjective pairs (effective vs ineffective, helpful vs unhelpful, beneficial vs not beneficial, adequate vs not adequate, and supportive vs not supportive). Each item was rated on a 7-point Likert scale, with 1 being the negative pole (eg, “ineffective”) and 7 being the positive pole (eg, “effective”). The scores for each item were summed and averaged to create a mean composite score, with higher scores indicating greater perceived effectiveness of the messages.
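As a concrete illustration of this scoring, the following is a minimal Python sketch (using pandas, with hypothetical column names and example responses) that averages the 5 effectiveness items into a mean composite score per participant; the humanness scale described below can be scored the same way, keeping in mind its reversed direction (higher = more mechanical).

```python
# Minimal sketch of composite scoring for the 5-item Effectiveness Scale
# (hypothetical column names; each item is rated 1-7, higher = more positive).
import pandas as pd

# Example post-survey responses for 3 participants (illustrative data only).
post_survey = pd.DataFrame({
    "effective":  [6, 7, 5],
    "helpful":    [7, 6, 5],
    "beneficial": [6, 7, 4],
    "adequate":   [5, 6, 5],
    "supportive": [7, 7, 6],
})

effectiveness_items = ["effective", "helpful", "beneficial", "adequate", "supportive"]

# Mean composite score per participant; higher scores indicate greater perceived effectiveness.
post_survey["effectiveness_score"] = post_survey[effectiveness_items].mean(axis=1)
print(post_survey["effectiveness_score"])
```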
Conversational Quality
Message Humanness
The humanness of chatbot messages in the AI Chatbot Behavior Change Model [] is conceptualized as a part of “conversational quality,” a construct that reflects the perceived human-likeness and naturalness of chatbot interactions. To evaluate participants’ impressions of the messages sent by the research interventionist and HeartBot, participants completed the Anthropomorphism Scale [] during the post-survey. The scale includes 5 pairs of bipolar adjectives (natural vs fake, humanlike vs machine-like, conscious vs unconscious, lifelike vs artificial, and adaptive vs rigid). Participants rated each pair on a 7-point Likert scale, where 1 indicated the first adjective in the pair (eg, “natural”) and 7 indicated the second adjective (eg, “fake”). The scores for each item were summed and averaged to create a mean composite score. Higher scores indicate a greater perception of chatbot messages as mechanical or artificial.
Conversational Naturalness and Coherence
Conversational quality can be assessed by participants’ subjective evaluation of the conversation’s naturalness and coherence []. To evaluate conversational quality, participants were asked to answer the following question in the post-survey: “Overall, how would you rate the conversations with your texting partner?” The response options are as follows: (1) Very unnatural, (2) Unnatural, (3) Neutral, (4) Natural, and (5) Very natural. Participants were also asked to answer the following question in the post-survey: “Overall, how would you rate the messages you received?” The response options are as follows: (1) Very incoherent, (2) Incoherent, (3) Neutral, (4) Coherent, and (5) Very coherent.
Conversational Metrics
Objective content and linguistic analyses of conversations can be used to evaluate specific dimensions of conversations, such as the length of conversations and the amount of information exchanged []. To measure these dimensions, the Linguistic Inquiry and Word Count (LIWC-22; Pennebaker Conglomerates) software [] was used to process and quantify the total word count of a conversation between the participant and the research interventionist or HeartBot. The number of words used by each agent (participant, research interventionist, and HeartBot) was separately measured to process individual contributions within each conversation.
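The study used LIWC-22 for these counts; as a rough illustration of the per-agent word-count idea only (not LIWC’s proprietary processing), a whitespace-based tally in Python might look like the following, with a hypothetical transcript format.

```python
# Rough sketch of per-agent word counting for a conversation transcript
# (illustrative only; the study used the LIWC-22 software for its counts).
from collections import defaultdict

# Hypothetical transcript: each turn is (speaker, message text).
transcript = [
    ("HeartBot", "Hi! Do you know the common heart attack symptoms in women?"),
    ("participant", "I think chest pain, but I'm not sure what else."),
    ("HeartBot", "Chest pressure, shortness of breath, nausea, and unusual fatigue are common."),
    ("participant", "I didn't know fatigue could be a symptom."),
]

def count_words_by_speaker(turns):
    """Tally a simple whitespace-based word count for each speaker."""
    counts = defaultdict(int)
    for speaker, text in turns:
        counts[speaker] += len(text.split())
    return dict(counts)

print(count_words_by_speaker(transcript))
# -> {'HeartBot': 22, 'participant': 18}
```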
Perception of Chatbot Identity (Human vs AI Chatbot)
At the end of the intervention, we asked the question: “Do you think you texted a human or an artificial intelligent chatbot during your conversation?” Participants selected 1 of 2 dichotomous response options: (1) human or (2) artificial agent.
Sociodemographic, Past Chatbot Use, and Cardiovascular Risks
Self-reported sociodemographic information (ie, age, race or ethnicity, education, household income, marital status, and employment status) and cardiovascular risk factors (ie, smoking history; use of prescribed blood pressure, cholesterol, or diabetes medications; and family history of heart disease) were collected in the baseline online survey. The cardiovascular risk factor variables were selected based on the latest clinical guidelines []. In addition, the question “Have you used any chatbot in the past 30 days?” was used to assess past AI chatbot use; participants were asked to select either Yes or No.
Statistical Analysis
We conducted a descriptive analysis to calculate counts and percentages, or means and SDs, for sociodemographic characteristics, past chatbot use, and cardiovascular risks. To compare the 2 datasets, we performed independent t tests to assess mean differences for continuous variables and used χ2 tests to examine group distributions. We first conducted Wilcoxon signed-rank tests to evaluate statistically significant changes in heart attack knowledge and awareness outcome responses (not sure, somewhat not sure, somewhat sure, and sure) between baseline and post-interaction, within the Human dataset (phase 1) and the HeartBot dataset (phase 2). Then, to adjust for potential confounders, we fit a series of ordinal mixed-effects models using the R (version 4.1.0; The R Foundation for Statistical Computing) [] package ordinal v2022.11.16 [], with each of the 4 knowledge questions as outcomes. We first fit these models stratified by Human dataset (phase 1) and HeartBot dataset (phase 2), adjusting for fixed effects of post (vs pre; the primary coefficient of interest for these models, indicating whether each of the 2 interventions was successful), White (vs non-White), age, interaction group type, education, number of words used by the participants, and mean message effectiveness and humanness scores, with a random effect for individual. We then fit a model on the entire dataset, additionally adjusting for HeartBot (vs Human) and the interaction between HeartBot and the post timepoint (ie, whether HeartBot is more effective than the human; the primary coefficient of interest for this model). As an attempted sensitivity analysis, we tried to fit a mixed-effects multinomial logistic regression model in Stata (version 16.1; StataCorp LLC) [] via the generalized structural equations command, but the models would not converge (likely owing to the small sample size and the increased number of parameters to estimate compared with an ordinal logistic regression model). All tests were 2-sided, with significance set at P<.05.
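To make the pre-post analysis concrete, the following is a minimal Python sketch of the Wilcoxon signed-rank step and a simplified ordinal (proportional odds) regression on simulated data. It is illustrative only: variable names and data are hypothetical, and unlike the study’s R models (fit with the ordinal package), the statsmodels OrderedModel shown here does not include a random intercept per individual or the full covariate set.

```python
# Illustrative sketch of the pre-post analysis (not the study's actual R code).
# Assumes: pip install numpy pandas scipy statsmodels
import numpy as np
import pandas as pd
from scipy.stats import wilcoxon
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 80  # hypothetical number of participants

# Simulated 1-4 responses ("not sure" ... "sure") before and after the intervention.
pre = rng.integers(1, 4, size=n)                         # mostly lower scores at baseline
post = np.clip(pre + rng.integers(0, 3, size=n), 1, 4)   # scores tend to rise afterward

# Wilcoxon signed-rank test on the paired pre/post responses.
stat, p_value = wilcoxon(pre, post)
print(f"Wilcoxon signed-rank: statistic={stat:.1f}, P={p_value:.4f}")

# Simplified ordinal logistic regression (proportional odds) predicting the response
# from timepoint and age; the study's models adjusted for more covariates and
# included a random intercept per participant, which OrderedModel does not support.
long = pd.DataFrame({
    "response": np.concatenate([pre, post]),
    "post": np.concatenate([np.zeros(n), np.ones(n)]),
    "age": np.tile(rng.integers(25, 75, size=n), 2),
})
model = OrderedModel(long["response"], long[["post", "age"]], distr="logit")
result = model.fit(method="bfgs", disp=False)
print("Odds ratio for post vs pre:", np.exp(result.params["post"]))
```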
Results
Sample Characteristics
shows screening, enrollment, and follow-up of the study participants. A total of 171 participants in the Human dataset (phase 1) and 92 participants in the HeartBot dataset (phase 2) completed the study. presents the baseline sample characteristics for the 2 datasets. The mean (SD) age of participants was 41.06 (12.08) years in phase 1 and 45.85 (11.94) years in phase 2. In the Human dataset (phase 1), participants were primarily Black/African American (n=70, 40.9%), college graduates (n=103, 60.3%), and earning moderate-to-high income (n=68, 39.8%). Participants in the HeartBot dataset (phase 2) were primarily White (n=37, 40.2%), college graduates (n=66, 71.7%), and earning moderate-to-high income (n=39, 42.4%). A majority of participants reported prior experience using a chatbot in both the Human dataset (phase 1; n=96, 56.1%) and the HeartBot dataset (phase 2; n=53, 57.6%).
Changes in Knowledge and Awareness of Heart Disease
presents the results of Wilcoxon signed-rank tests examining pre- to post-intervention changes in the 4 knowledge and awareness of heart disease outcomes. Supplementary Tables S1-S3 (in ) present the full ordinal logistic regression models: Table S1 for the human-delivered conversations, Table S2 for the HeartBot conversations, and Table S3 for the combined data. Overall, Wilcoxon signed-rank tests revealed a significant increase in knowledge and awareness of heart disease across all 4 outcome measures following interactions with both the research interventionist and HeartBot (human-delivered conversations: all P<.001; HeartBot conversations: P<.001 for Q1-Q3 and P=.002 for Q4).
shows the adjusted odds ratios (AORs) from a series of ordinal logistic regression analyses predicting each knowledge question for the Human dataset (phase 1). In the Human dataset (phase 1), after controlling for age, ethnicity, education, message effectiveness, message humanness, and chatbot use history, the human-delivered conversations improved participants’ knowledge and awareness in recognizing the signs and symptoms of a heart attack (AOR 15.19, 95% CI 8.46‐27.25, P<.001), telling the difference between the signs or symptoms of a heart attack and other medical problems (AOR 9.44, 95% CI 5.60‐15.91, P<.001), calling an ambulance or dialing 911 during a heart attack (AOR 6.87, 95% CI 4.09‐11.55, P<.001), and getting to an emergency room within 60 minutes after symptom onset (AOR 8.68, 95% CI 4.98‐15.15, P<.001). In the HeartBot dataset (phase 2), these effects were generally smaller but still represented substantial improvements (see ; full model in ), for example, for the question on recognizing signs and symptoms (AOR 7.18, 95% CI 3.59-14.36, P<.001). A formal interaction test showed a statistically significantly greater improvement in the Human dataset than in the HeartBot dataset for all but the third question (calling an ambulance; P=.09), as shown in (Table S3 in ). We could not adjust for word count because all human-delivered conversations in the Human dataset (phase 1) were longer than any of the HeartBot conversations in the HeartBot dataset (phase 2), so the model would not fit; thus, we could not differentiate the intervention effect from the word count.
Human-Delivered Conversation Versus HeartBot Conversation
presents the comparison of the evaluation of conversation quality between the 2 studies. In the Human dataset (phase 1), participants interacted with the research interventionist and completed conversation sessions over the course of 2 days. The mean (SD) and median numbers of words used overall by the participants and their conversing agent were 2322.00 (875.65) and 2097.00 words in the Human dataset (phase 1) and 888.04 (76.04) and 852 words in the HeartBot dataset (phase 2), respectively. Participants in the Human dataset (phase 1) rated all conversational qualities, including message effectiveness, message humanness, conversation naturalness, and coherence, significantly higher than those in the HeartBot dataset (phase 2). About 74.3% (127/171) of the participants in the Human dataset and 66.3% (61/92) in the HeartBot dataset correctly identified whether they were conversing with a human or HeartBot.
Discussion
Principal Results
We compared the potential efficacy of human-delivered conversations versus HeartBot conversations in increasing participants’ knowledge and awareness of symptoms and the appropriate response to a heart attack in the United States, while controlling for potential confounding factors. Since this study was not an RCT, the efficacy of the HeartBot intervention, compared to the SMS text messaging intervention delivered by a research interventionist, cannot be established, and caution needs to be exercised when interpreting the findings. The findings suggest that interacting with both the research interventionist and HeartBot was associated with increased knowledge and awareness of a heart attack among participants (ie, recognizing signs and symptoms of a heart attack, telling the difference between the signs or symptoms of a heart attack and other medical problems, calling an ambulance or dialing 911 when experiencing a heart attack, and getting to an emergency room within 60 minutes after onset of symptoms of a heart attack). However, human-delivered conversations appeared to have a stronger association than HeartBot conversations for all outcomes except the question regarding calling an ambulance (P=.09). This may be because calling emergency services is a well-known emergency response behavior, which may not require adaptive or relational communication to be effectively understood. Yet, this does not suggest that HeartBot was ineffective. Interacting with HeartBot still led to significant improvements in knowledge and awareness of a heart attack. Given its automated nature and lower cost, we view HeartBot as a promising and useful alternative, particularly in contexts where human resources are limited.
Several potential explanations relate to fundamental structural differences in the content and duration of the conversation sessions between the 2 studies. First, human-delivered conversations involved a more extended engagement process, comprising 2 separate sessions over a week, allowing participants to engage in a more prolonged and reflective learning process. In contrast, the HeartBot conversation was limited to a single session, which may have constrained the depth of discussion. Second, participants in the Human dataset (phase 1) produced significantly more words during the conversation, with a mean (SD) word count of 298.94 (227.90), compared to 80.57 (60.19) in the HeartBot dataset (phase 2). The greater verbosity in the Human dataset (phase 1) may have contributed to deeper discussions and enhanced knowledge reinforcement, potentially explaining the observed increase in efficacy. However, we were not able to statistically account for word count, as models adjusting for this covariate would not converge, likely owing to the very different distributions of word counts with little overlap between the 2 groups (human-delivered conversations: mean [SD] of 2322.00 [875.65] words; HeartBot conversations: mean [SD] of 888.04 [76.04] words). Finally, human-delivered conversations were facilitated by a research interventionist, a master’s-prepared cardiovascular nurse, allowing for greater flexibility in language use, response adaptation, and addressing participant queries in a more personalized manner. In contrast, HeartBot was inherently limited by its conversational algorithm: it followed a structured script that was less personalized and less flexible, limiting its ability to adjust dynamically to participants’ specific concerns.
HeartBot, a fully automated AI chatbot, was significantly associated with increases in participants’ knowledge and awareness of symptoms and response to a heart attack and demonstrates significant potential as an innovative AI intervention. AI chatbots offer a scalable, 24/7 accessible, and personalized approach to health education for broader populations. AI chatbots’ adaptive algorithms allow for dynamic personalization, tailoring responses to individual user queries and comprehension levels, which may enhance engagement and knowledge retention beyond one-size-fits-all campaigns. Additionally, chatbot interactions require active engagement as participants read, process, and respond to information, reinforcing learning through interaction rather than passive intake []. The anonymized nature of chatbot conversations can also reduce psychological barriers, encouraging users to seek information more openly, especially on sensitive health topics []. Finally, HeartBot integrates structured quiz components, encouraging reinforcement of learning through immediate self-assessment and cognitive recall.
While these advantages highlight AI chatbots’ potential, findings from this study suggest room for improvement to further enhance their efficacy. First, increasing the number of interaction sessions—rather than a single 1-time interaction—may allow for more sustained engagement and deeper knowledge retention, aligning more closely with the multi-session format of human-delivered conversations. Second, further iterations could leverage machine learning algorithms to continuously refine conversation models and improve HeartBot’s flexibility in answering participants’ queries, which could make interaction with HeartBot feel more responsive and personalized. Lastly, to fully evaluate HeartBot’s long-term efficacy and potential parity with human-delivered conversations, a rigorously designed RCT would be instrumental. While this study provides promising preliminary insights, causal relationships cannot be established. Future research should prioritize RCTs to confirm these findings and support evidence-based deployment of such interventions.
Interestingly, user experience and conversational quality were perceived to be high across both studies, as participants generally rated the message as effective, humanlike, coherent, and natural. However, these perceptions were significantly higher in the Human dataset (phase 1). This may be due to participants subconsciously detecting cues that felt more human. Although the identity of the conversing partner was not disclosed, a substantial portion of participants misperceived whether they were interacting with a human or an AI chatbot. While the perception of partner identity was not a primary focus of this study, these misattributions nonetheless provide insight into how users process conversational agency. They highlight the inherent ambiguity in conversational agency and may reflect the challenge of replicating human communication subtleties in algorithmic interactions. While HeartBot demonstrated considerable communicative competence, it encountered limitations in fully imitating the nuanced relational aspects of human dialog. Drawing from the Computers Are Social Actors paradigm [], participants apply social interaction schemas to technological interfaces yet experience these interactions with less emotional depth and relational intimacy. Key communication studies have consistently highlighted the critical role of relational cues in establishing trust and engagement and promoting human-chatbot relationships. For example, research has shown that conversational agents can build positive relationships in health and well-being settings through verbal behaviors like humor [], social dialog [], and empathy []. Although HeartBot successfully delivered equivalent factual content, it inherently struggled to reproduce the affective dimensions that characterize human-to-human communication. These findings suggest that while AI chatbots provide a promising technological intervention, they must continue to evolve in their ability to simulate the nuanced relational components of effective human health communication.
Limitations and Suggestions for Future Studies
Several limitations of this study need to be acknowledged. Without a true RCT, causal inferences regarding the 2 interventions cannot be drawn, and the findings provide only exploratory comparative insights for the following reasons. The 2 datasets were not collected under a single randomized protocol, and participants were not randomly assigned, making the study vulnerable to selection bias and unmeasured confounders. In particular, human-delivered conversations were much longer (~2322 words) than HeartBot conversations (~888 words), and statistical adjustment was not possible because of the nonoverlapping distributions. This is a major confounder that prevents clear attribution of effects to delivery mode versus conversation length; in other words, the differences in exposure length make it impossible to disentangle the “agent effect” (human vs HeartBot) from the “dose effect” (amount of content). The interventions also differed not only in delivery agent but also in structure: (1) the human-delivered arm included 2 sessions, while the chatbot arm was a single session; (2) some topics were omitted in the HeartBot group; and (3) incentives differed ($40 vs $20). An RCT addressing these limitations is warranted to validate this study’s findings.
Another limitation relates to the study measures and their timing. The outcome assessment relied on subjective Likert scale responses, which may be influenced by recall or social desirability bias. Furthermore, the outcomes were assessed between 4 and 6 weeks after the intervention; thus, the study only captures short-term awareness and knowledge gains rather than sustained retention or behavior change. Future studies need to include objective or performance-based measures (eg, quizzes, simulated scenarios) to complement self-reports, longitudinal follow-up (ie, 2-24 months) to assess retention, and assessment of whether increased awareness and knowledge translate into real-world emergency response behaviors. Additionally, the multinomial mixed-effects logistic regression model would not converge; this is a known problem with these models due to a combination of small cell counts in specific outcome categories and the high-dimensional nature of the random effects in the model. However, our more parsimonious ordinal mixed-effects logistic regression model converged and appeared to fit the data well.
The last limitation relates to the generalizability of the findings. The recruitment strategy relied on social media (Facebook and Instagram) and attracted self-selected women who were comfortable with technology. This may skew the sample toward digitally literate participants and limit generalizability to more diverse or higher-risk groups. Thus, future studies should include purposive recruitment strategies targeting underrepresented groups (ie, older women, nondigital populations, and those with lower health literacy).
Conclusions
The study’s findings provide new insights into the fully automated HeartBot compared with human-delivered text message conversations and suggest that HeartBot has potential for improving women’s knowledge and awareness of heart attack symptoms and appropriate response behaviors. Nevertheless, the current evidence remains preliminary. To rigorously establish the efficacy of the HeartBot intervention, future research should employ RCT designs with the capacity to reach broad and diverse populations.
The project was supported by the Noyce Foundation and the UCSF School of Nursing Emile Hansen Gaine Fund. The project sponsors had no role in the study design, collection, analysis, or interpretation of data, writing the report, or deciding to submit the report for publication.
The datasets generated or analyzed during this study are available from the corresponding author on reasonable request.
None declared.
Edited by Javad Sarvestan; submitted 26.Feb.2025; peer-reviewed by Chidinma Chikwe, Neeladri Misra, Reenu Singh; final revised version received 22.Sep.2025; accepted 22.Sep.2025; published 17.Oct.2025.
© Diane Dagyong Kim, Jingwen Zhang, Kenji Sagae, Holli A DeVon, Thomas J Hoffmann, Lauren Rountree, Yoshimi Fukuoka. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 17.Oct.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.