The initial search of four databases (PubMed, Scopus, Web of Science, and Google Scholar) yielded 150 records. After duplicates were removed, 120 unique records were screened by title and abstract, and 30 full-text articles were then assessed for eligibility. Following full-text review, 14 studies were included in the final synthesis. Studies were excluded for reasons including lack of educational context, absence of AI-related content, or unavailability of full text. The study selection process is summarized in the PRISMA flow diagram (Fig. 1).
PRISMA flow diagram of study selection
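The stage totals reported above imply how many records dropped out at each step, although the text does not state these exclusion counts directly. The short Python sketch below reconciles them by simple subtraction. The class and field names (`PrismaCounts`, `removed_per_stage`) are hypothetical and not part of any PRISMA tooling, and the derived figures follow only from the reported totals, not from the authors' screening records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrismaCounts:
    """Hypothetical container for the stage totals of a PRISMA flow."""
    identified: int  # records retrieved across all databases
    screened: int    # unique records after duplicate removal
    full_text: int   # full-text articles assessed for eligibility
    included: int    # studies in the final synthesis

    def removed_per_stage(self) -> dict:
        """Return how many records drop out between consecutive stages."""
        stages = [self.identified, self.screened, self.full_text, self.included]
        # Each stage can only keep or remove records, never add them.
        if any(later > earlier for earlier, later in zip(stages, stages[1:])):
            raise ValueError("stage counts must be non-increasing")
        return {
            "duplicates_removed": self.identified - self.screened,
            "excluded_at_title_abstract": self.screened - self.full_text,
            "excluded_at_full_text": self.full_text - self.included,
        }


counts = PrismaCounts(identified=150, screened=120, full_text=30, included=14)
print(counts.removed_per_stage())
# {'duplicates_removed': 30, 'excluded_at_title_abstract': 90, 'excluded_at_full_text': 16}
```

Under these totals, 30 duplicates, 90 title/abstract exclusions, and 16 full-text exclusions account for all 136 records that did not reach the final synthesis.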
Study characteristics
This narrative review includes 14 studies from diverse geographic settings, including the United States, India, Australia, Germany, Saudi Arabia, and multinational collaborations. The studies employed a mix of methodologies: scoping reviews, narrative reviews, cross-sectional surveys, educational interventions, integrative reviews, and qualitative case studies. Target populations ranged from undergraduate medical students and postgraduate trainees to faculty members and in-service professionals.
Artificial Intelligence (AI) was applied across various educational contexts, including admissions, diagnostics, teaching and assessment, clinical decision-making, and curriculum development. Several studies focused on stakeholder perceptions, ethical implications, and the need for standardized curricular frameworks. Notably, interventions such as the Four-Week Modular AI Elective [13] and the Four-Dimensional AI Literacy Framework [12] were evaluated for their impact on learner outcomes.
Table 1 provides a comprehensive summary of each study, outlining country/region, study type, education level targeted, AI application domain, frameworks or interventions used, major outcomes, barriers to implementation, and ethical concerns addressed.
Risk of bias assessment
Risk of bias was assessed using tools appropriate to each study design. For systematic and scoping reviews (e.g., Gordon et al. [2], Khalifa and Albadawy [8], Crotty et al. [11]), the AMSTAR 2 tool was applied, indicating a moderate risk of bias, primarily due to the lack of formal appraisal of included studies and incomplete reporting of funding sources. Observational studies such as that by Parsaiyan and Mansouri [9] were assessed using the Newcastle-Ottawa Scale (NOS) and showed a low risk of bias, with clear selection methods and outcome assessment. Cross-sectional surveys (e.g., Narayanan et al. [10], Ma et al. [12], Wood et al. [14], Salih [20]) were assessed with the AXIS tool and showed low to moderate risk, depending on sampling clarity, non-response bias, and data reporting. Qualitative and mixed-methods studies such as those by Krive et al. [13] and Weidener and Fischer [15] were appraised using a combination of the CASP checklist and the NOS; their overall risk was low to moderate, with methodological rigor and triangulation as notable strengths. One quasi-experimental study [19] was evaluated using ROBINS-I and judged to be at moderate risk of bias, primarily because of confounding and deviations from intended interventions. Finally, narrative reviews such as Mondal and Mondal [17] were rated as high risk because they lacked a systematic methodology and critical appraisal (Table 2).
Characteristics of included studies
A total of 14 studies, published between 2019 and 2024, were included in this review. These comprised a range of study designs: 5 systematic or scoping reviews, 4 cross-sectional survey studies, 2 mixed-methods or qualitative studies, 1 quasi-experimental study, 1 narrative review, and 1 conceptual framework development paper. The majority of the studies were conducted in high-income countries, particularly the United States, United Kingdom, and Canada, while others included contributions from Asia and Europe, reflecting growing global interest in the integration of artificial intelligence (AI) in medical education.
Key themes across these studies included the use of AI to enhance clinical reasoning and decision-making skills, curricular integration of AI tools, the attitudes and readiness of faculty and students, AI-based educational interventions and simulations, and ethical and regulatory considerations in AI-driven learning. Sample sizes in survey-based studies ranged from fewer than 100 to over 1,000 participants, representing diverse medical student populations and teaching faculty.
All included studies explored the potential of AI to transform undergraduate and postgraduate medical education through improved personalization, automation of feedback, and development of clinical competencies. However, variability in methodology, focus, and outcome reporting was observed, reinforcing the importance of structured synthesis and cautious interpretation.
A. Applications of AI in medical education
AI serves multiple educational functions. Gordon et al. identified its use in admissions, diagnostics, assessment, clinical simulation, and predictive analytics [2]. Khalifa and Albadawy reported improvements in diagnostic imaging accuracy and workflow efficiency [8]. Parsaiyan and Mansouri [9] and Narayanan et al. [10] highlighted AI's impact on virtual simulations, personalized learning, and competency-based education.
B. Curricular innovations and interventions
Several studies introduced innovative curricular designs. Crotty et al. advocated for a modular curriculum incorporating machine learning, ethics, and governance [11], while Ma et al. proposed a Four-Dimensional Framework to cultivate AI literacy [12]. Krive et al. [13] reported significant learning gains through a four-week elective, emphasizing the value of early, practical exposure.
Studies evaluating AI-focused educational interventions primarily reported improvements in knowledge acquisition, diagnostic reasoning, and ethical awareness. For instance, Krive et al. [13] documented substantial gains in students’ ability to apply AI in clinical settings, with average quiz and assignment scores of 97% and 89%, respectively. Ma et al. highlighted enhanced conceptual understanding through their framework, though outcomes were primarily self-reported [12]. However, few studies included objective or longitudinal assessments of educational impact. None evaluated whether improvements were sustained over time or translated into clinical behavior or patient care. This reveals a critical gap and underscores the need for robust, multi-phase evaluation of AI education interventions.
C. Stakeholder perceptions
Students and faculty expressed both interest in and concern about AI integration. Wood et al. [14] and Weidener and Fischer [15] noted a scarcity of formal training opportunities despite growing awareness of AI's importance. Ethical dilemmas, fears of job displacement, and insufficient preparation emerged as key concerns.
D. Ethical and regulatory challenges
Critical ethical issues were raised by Mennella et al. [16] and Mondal and Mondal [17], focusing on data privacy, transparency, and patient autonomy. Multiple studies called for international regulatory standards and the embedding of AI ethics within core curricula.
While several reviewed studies acknowledged the importance of ethical training in AI, their discussion of ethics often remained superficial. A more critical lens reveals deeper tensions that AI-integrated medical education must address. One such tension lies between technological innovation and equity: AI tools, if not designed and deployed with care, risk widening disparities by favoring data-rich, high-resource settings while neglecting underrepresented populations. Moreover, AI's potential to entrench existing biases, whether through skewed training datasets or uncritical deployment of algorithms, threatens fair and inclusive healthcare delivery.
Another pressing concern is algorithmic opacity. As future physicians are expected to work alongside AI systems in high-stakes clinical decisions, the inability to fully understand or challenge these systems’ inner workings raises accountability dilemmas and undermines trust. Educational interventions must therefore go beyond theoretical awareness and cultivate critical engagement with the socio-technical dimensions of AI, emphasizing ethical reasoning, bias recognition, and equity-oriented decision-making.
E. Barriers to implementation
Implementation hurdles included limited empirical evidence [18], infrastructural constraints [19], context-specific applicability challenges [20], and an over-reliance on conceptual frameworks [10]. The lack of unified teaching models and outcome-based assessments remains a significant obstacle.
These findings informed the development of a conceptual framework for integrating AI into medical education, depicted in Fig. 2. A cross-theme synthesis showed that although AI integration strategies were broadly similar across countries, their implementation success varied considerably by geographic and economic context. High-income countries (e.g., USA, Australia, Germany) demonstrated more comprehensive curricular pilots, infrastructure support, and faculty readiness, whereas studies from low- and middle-income countries (LMICs; e.g., India, Saudi Arabia) expressed conceptual interest but reported limited institutional capacity and access to AI technologies. Contextual barriers such as resource limitations, cultural sensitivity, and institutional inertia appeared more pronounced in LMIC settings, affecting the feasibility and depth of AI adoption in medical education.
Based on the five synthesized themes, we developed a Comprehensive Framework for the Strategic Integration of AI in Medical Education (Fig. 2). This model incorporates components such as foundational AI literacy, ethical preparedness, faculty development, curriculum redesign, and contextual adaptability. It builds on and extends existing models such as the FACETS framework, the Technology Acceptance Model (TAM), and the Diffusion of Innovation theory. Unlike FACETS, which primarily categorizes existing studies, our framework is action-oriented and aligned with Kern’s curriculum development process, making it suitable for practical implementation. Compared to TAM and Diffusion of Innovation, which focus on user behavior and adoption dynamics, our model integrates educational design elements with implementation feasibility across diverse economic and institutional settings.

A comprehensive framework for the strategic integration of artificial intelligence in medical education
Table 3 presents a comparative synthesis of the included studies evaluating AI integration in medical and health professions education, mapped against Kern's six-step curriculum development framework. Most studies identified the need for AI literacy (Step 1) and conducted some form of needs assessment (Step 2), often through surveys, literature reviews, or scoping exercises. However, only a subset explicitly defined measurable educational goals and objectives (Step 3), and even fewer described detailed instructional strategies (Step 4) or implemented their proposed curricula (Step 5). Evaluation and feedback mechanisms (Step 6) were rarely reported and, when included, were typically limited to short-term student feedback or pre-post knowledge assessments; longitudinal and outcome-based evaluations were largely absent. These findings underscore a critical implementation gap and the need for structured, theory-informed, and empirically evaluated AI education models tailored to medical and allied health curricula.
The conceptual model in Fig. 2 is informed by the thematic synthesis and integrates principles from existing frameworks (FACETS, TAM, Diffusion of Innovation) while aligning with Kern's six-step approach to curriculum design.