TOKYO, Nov 28 (Reuters) – Securing energy from overseas, including from the Sakhalin Project, is extremely important for Japan’s energy security, Japan’s industry ministry said late on Thursday when asked about U.S. sanctions on a key shareholder in the Sakhalin-1 project.
Last month, Washington sanctioned Russian oil majors Rosneft (ROSN.MM), opens new tab, a Sakhalin-1 shareholder, and Lukoil (LKOH.MM), opens new tab in the most recent step to force the Kremlin to end the war in Ukraine. The waiver to end, opens new tab operations expired on November 21.
Sign up here.
“Japan government continues to recognize that securing energy from overseas, including the Sakhalin Project, is extremely important for Japan’s energy security,” the Ministry of Economy, Trade and Industry said in a statement to Reuters.
“We will take necessary measures to ensure that Japan’s stable energy supply is not compromised,” the statement added but declined to comment specifically on the sanctions’ impact on the project where METI is a shareholder.
U.S. ExxonMobil (XOM.N), opens new tab, which used to own a 30% stake in Sakhalin-1, left Russia in 2022 after the Kremlin’s full-scale invasion of Ukraine in February of that year.
Before Exxon’s exit, Rosneft and India’s ONGC Videsh (ONVI.NS), opens new tab owned a 20% stake in the project each and another 30% was controlled by the SODECO consortium involving METI, Marubeni (8002.T), opens new tab, Itochu (8001.T), opens new tab, Japan Petroleum Exploration (1662.T), opens new tab and Inpex (1605.T), opens new tab.
Reporting by Katya Golubkova; editing by Diane Craft
Our Standards: The Thomson Reuters Trust Principles., opens new tab
V2G in Europe: Reducing Costs and Unlocking New Value
In Europe, the Group is expanding its customer-centric energy solutions by introducing a commercialized V2G service in the Netherlands. As the first OEM to launch a customer-focused V2G service, this initiative builds on the Smart Charging (V1G) service introduced earlier this year. Customer recruitment for the V2G service will begin at the end of 2025.
The V2G service leverages bidirectional charging technology and chargers compatible with Hyundai and Kia vehicles. Customers subscribing to a tariff plan from the Group’s utility partners can benefit from automated V2G scheduling, which optimizes charging during low-rate periods and enables the sale of surplus energy back to the grid during peak-price times. This not only reduces electricity expenses for customers but also unlocks new value by actively participating in energy trading.
This initiative also underscores the Group’s contributions at the national level. In the Netherlands, where electricity prices are high and the power system is increasingly variable, the V2G service enhances Hyundai Motor and Kia EV accessibility while aiding in the stabilization of the electricity grid. Moreover, it plays a pivotal role in facilitating the expansion of renewable energy across the country by supporting grid flexibility and reliability.
Initially the service will be available for Kia EV9 and Hyundai IONIQ 9, with plans to expand coverage to other EV models. The Group also aims to roll out the V2G service to other European countries, further advancing customer convenience and supporting the region’s transition toward smarter energy systems.
V2H Service in the U.S.: Enhancing Energy Security and Savings
In the U.S., the Group will launch its Vehicle-to-Home (V2H) services in the near term, enabling EVs to provide energy solutions during natural disasters such as large wildfires, routine power outages, or peak-demand periods. The V2H service utilizes EV power as an emergency power source for homes during these critical times, further enhancing household energy resilience.
Kia’s V2H service — launched earlier this year in February 2025 — allows EV9 owners to use their vehicles as reliable household backup power sources. Hyundai Motor will introduce V2H functionality starting with IONIQ 9, while Kia expands its offering to include EV61).
The service enables EV owners to store electricity in their vehicle’s battery during off-peak hours and discharge back into their homes during peak periods, potentially reducing household electricity costs and enhancing energy resilience.
The Group is accelerating its V2X strategy, connecting EVs, energy systems and society in a unified ecosystem. These initiatives are key to transforming the customer ownership experience while promoting efficient, renewable energy use across major markets.
If you’re wondering whether BWG is trading at a bargain or premium right now, you’re in the right place. Let’s put its recent performance and valuation under the microscope.
BWG’s share price has been on a ride, dipping 4.9% in the past week and 2.3% over the last month. It is still up 155.3% over the past three years.
Investors are watching closely as recent sector acquisitions and shifts in regulatory sentiment have added fresh fuel to market expectations, raising both hopes and questions. These news headlines have clearly influenced recent price moves, indicating that the BWG story is far from settled.
BWG currently holds a valuation score of 5 out of 6 on our six-point checklist, suggesting it is undervalued in most key areas. Before drawing conclusions, stay with us as we break down the major valuation approaches and explore a more informed way to identify real value.
BWG delivered 2.5% returns over the last year. See how this stacks up to the rest of the Oil and Gas industry.
The Discounted Cash Flow (DCF) model is used to estimate the fair value of a business by projecting its expected future cash flows and then discounting them back to today’s value. This approach helps investors understand whether a stock’s market price reflects its underlying financial potential.
For BWG, the latest reported Free Cash Flow stands at $211 million. Looking ahead, analysts expect Free Cash Flow to reach $536.5 million in 2026 and $363 million in 2027, with further annual projections tapering off to around $186.6 million by 2035 as estimated by Simply Wall St. These cash flows, expressed in dollars, are all projected values before they are discounted to their present value.
Applying a 2 Stage Free Cash Flow to Equity model, the DCF analysis calculates an intrinsic value of $315.93 per share. Based on current market pricing, this implies that BWG is trading at a 60.6% discount to its intrinsic value, indicating the stock is substantially undervalued using this method.
Result: UNDERVALUED
Our Discounted Cash Flow (DCF) analysis suggests BWG is undervalued by 60.6%. Track this in your watchlist or portfolio, or discover 933 more undervalued stocks based on cash flows.
BWLPG Discounted Cash Flow as at Nov 2025
Head to the Valuation section of our Company Report for more details on how we arrive at this Fair Value for BWG.
The Price-to-Earnings (PE) ratio is a widely used valuation tool for profitable companies because it directly relates a company’s share price to its per-share earnings. It allows investors to gauge how much they are paying for a company’s current ability to generate profit, making it a practical measure for established and consistently profitable firms like BWG.
A “normal” or “fair” PE ratio depends largely on growth expectations and the level of risk associated with future earnings. Companies that are expected to grow earnings rapidly or present lower risk typically command higher PE multiples. In contrast, slower growth or greater uncertainty can justify lower ratios.
BWG trades at a PE ratio of 8.56x. By comparison, the Oil and Gas industry average stands at 13.22x, and direct peers average 9.07x. At first glance, BWG’s valuation might seem conservative relative to these benchmarks.
This is where Simply Wall St’s proprietary “Fair Ratio” comes in. Unlike a basic industry or peer comparison, the Fair Ratio (6.06x for BWG) tailors the expected multiple by accounting for the company’s earnings growth, risk profile, profit margins, market capitalization, and industry dynamics. This more nuanced perspective provides a much clearer signal of the stock’s valuation than simply comparing with peers or sector norms.
With BWG’s actual PE (8.56x) sitting above the Fair Ratio (6.06x) by a meaningful margin, the data points toward BWG trading at a premium to its fair value as calculated on a fundamentals-adjusted basis.
Result: OVERVALUED
OB:BWLPG PE Ratio as at Nov 2025
PE ratios tell one story, but what if the real opportunity lies elsewhere? Discover 1442 companies where insiders are betting big on explosive growth.
Earlier we mentioned that there is an even better way to understand valuation, so let’s introduce you to Narratives. A Narrative is your personal investment story. It is how you interpret BWG’s journey by combining your expectations of financial performance, such as revenue, profit margins, and fair value, with your unique perspective on the company’s prospects.
Unlike static ratios or models, Narratives connect a company’s story to a financial forecast and then to a fair value, letting you see how your own outlook translates into a concrete investment signal. Using Simply Wall St’s Community page, millions of investors easily create and update their Narratives, making the process accessible for all experience levels.
Narratives make buy or sell decisions clearer by constantly comparing your Fair Value with the current share price. Because they update automatically with fresh news, regulatory shifts, or earnings releases, your view of BWG adapts in real time. For example, some investors see BWG’s fair value near NOK185.02 based on optimistic demand and operational strength, while others are more cautious, setting fair value at NOK158.03 due to market uncertainties and risk factors.
Do you think there’s more to the story for BWG? Head over to our Community to see what others are saying!
OB:BWLPG Community Fair Values as at Nov 2025
This article by Simply Wall St is general in nature. We provide commentary based on historical data and analyst forecasts only using an unbiased methodology and our articles are not intended to be financial advice. It does not constitute a recommendation to buy or sell any stock, and does not take account of your objectives, or your financial situation. We aim to bring you long-term focused analysis driven by fundamental data. Note that our analysis may not factor in the latest price-sensitive company announcements or qualitative material. Simply Wall St has no position in any stocks mentioned.
Companies discussed in this article include BWLPG.OL.
Have feedback on this article? Concerned about the content? Get in touch with us directly. Alternatively, email editorial-team@simplywallst.com
Traders work on the floor at the New York Stock Exchange (NYSE) in New York City, U.S., Nov. 26, 2025.
Brendan McDermid | Reuters
Stock futures were little changed Thursday night during a holiday shortened week, with the Nasdaq Composite on track to end a seven-month winning streak.
Dow Jones Industrial Average futures rose just 10 points. S&P 500 futures and Nasdaq-100 futures traded just above the flatline.
Stocks are on pace for a losing month when trading resumes on Friday. A pullback in tech stocks have weighed on the major averages in November, as doubt swirled around the future profitability of AI companies.
Yet some investors are hopeful that this month’s slide will mean a year-end rally is in store for the major averages, as they step into buy stocks that have been unduly punished at more attractive valuations.
As of Wednesday’s close, the Dow and the S&P 500 were slightly lower on the week, each set to snap six straight months of gains. The Nasdaq fell 2%, on track to end a seven month advance.
Stocks are on pace to wrap up a winning week, following a turnaround in tech names. As of Wednesday’s close, the Dow was up more than 2%. The S&P 500 and Nasdaq Composite were higher by 3% and 4%, respectively.
The stock market was closed Thursday for Thanksgiving Day. It will close early at 1p.m. ET on Friday.
Samsung Electronics today announced that it has been named a winner in the Upright Cordless Vacuum Cleaner category at the Euroconsumers Awards 2025.
Samsung has proven its expertise in the cordless vacuum cleaner segment since 2019, when it introduced its Jet cordless vacuum lineup to the global market, and has continued unveiling new innovations each year. The award recognizes Samsung’s product excellence across the Jet lineup tested by Euroconsumers, which includes the Jet 75, Jet 85 and Jet 95 models — all providing powerful cleaning performance on various floor types in a lightweight design.
This award reflects Samsung’s strong performance in Euroconsumers’ thorough evaluation process which combines lab tested results on over 3,000 products with large scale consumer surveys across Europe. With outstanding reliability scores, Samsung has been named a winner in the Upright Cordless Vacuum Cleaner category.
“Samsung has a record of innovation in the stick vacuum category, including the launch of the world’s most powerful vacuum cleaner,”1 said Jeong Seung Moon, Executive Vice President and Head of the R&D Team for the Digital Appliances Business at Samsung Electronics. “We will continue to enhance consumer satisfaction through outstanding performance and AI-based convenience.”
The Euroconsumers Awards is organized by Euroconsumers, the world’s leading consumer group that represents national consumer organizations across Belgium, Italy, Portugal, Spain, Poland and Brazil — collectively giving voice to more than six million consumers worldwide. Until last year, Euroconsumers ran the BeXt Awards, which recognized brands across criteria such as Value for Money and Eco-Friendly. The 2025 edition marks a significant shift, as the awards now focus on product-based excellence in three key sectors: Hitech, Large and Small Household Appliances.
This recognition follows Samsung’s previous awards from Euroconsumers, including wins in the Hi-Tech Value for Money and the Eco-Friendly Award in large household appliances at the BeXt Awards 2024.
Continuing its commitment to delivering smart, high-performance home solutions, Samsung will showcase its latest vacuum cleaner innovations at CES 2026 in January.
(Bloomberg) — Asian stocks were set for a muted open Friday as a sharp rebound in global equities over the past week began to stall.
Equity index futures for Japan, Hong Kong and Australia all registered small declines early Friday in Asia. A gauge of global stocks ended Thursday flat, but remained on course for its best week since June as investors cheered signs of Federal Reserve rate cuts. US markets will resume trading following the Thanksgiving holiday Thursday.
In Europe, Germany’s DAX index rose 0.2% as Puma SE jumped 19% on takeover interest from multiple bidders, including China’s Anta Sports Products Ltd, according to people familiar with the matter.
Gains for global stocks over the past week partly reflected firming expectations for Fed easing, with futures markets pricing in roughly an 80% chance of a quarter-point cut next month and leaning toward three more by the end of 2026. A little more than a week ago, traders were projecting three cuts in total.
The rally in equity markets is likely to broaden outside the US, said Goldman Sachs strategist Peter Oppenheimer in an interview on Bloomberg TV. He anticipated further Fed easing but added that there is limited upside for stocks overall “because valuations are reasonably high.”
Oil edged higher as investors awaited the next steps in US-led efforts to end the war in Ukraine, and ahead of an OPEC+ gathering this weekend. Russian President Vladimir Putin said that US proposals for ending the war in Ukraine could be the basis for future agreements, although no final draft exists yet. OPEC+ nations will probably stick with a decision to pause oil production increases in early 2026, delegates said.
In Asia, data set for release includes industrial production for South Korea, unemployment and Tokyo inflation in Japan, and private sector credit in Australia. Taiwan will release third quarter gross-domestic product.
Chinese property developers were again in the spotlight. China Vanke Co. was rejected by at least two big local banks as it tried to secure a short-term loan to quell the default fears that have fueled a plunge in its bonds this week, according to people familiar with the matter.
Elsewhere, prosecutors have searched the homes of a former Taiwan Semiconductor Manufacturing Co. executive suspected of leaking trade secrets to Intel Corp., signaling an escalation in the government’s criminal probe into the high-profile dispute.
UK gilts gave back some of Wednesday’s rally that followed the Autumn budget. Chancellor of the Exchequer Rachel Reeves carved out a larger fiscal buffer, which buoyed sentiment, even though the tax-raising steps required cast a shadow over the outlook for economic growth. The pound and FTSE 100 were little changed.
“All told, we think the UK government did what it needed to do to keep UK bond markets on side,” wrote Bill Diviney, head of macro research at ABN AMRO. “While there is naturally some risk to this more backloaded fiscal consolidation round, it comes on top of an already considerable effort.”
In commodities, platinum touched its highest level in more than a month, supported by optimism over fresh demand after a Chinese exchange launched a new futures contract.
Some of the main moves in markets:
Stocks
Hang Seng futures were little changed as of 7:06 a.m. Tokyo time S&P/ASX 200 futures fell 0.3% Currencies
The Bloomberg Dollar Spot Index was little changed The euro was unchanged at $1.1596 The Japanese yen was little changed at 156.30 per dollar The offshore yuan was little changed at 7.0758 per dollar The Australian dollar was little changed at $0.6532 Cryptocurrencies
Bitcoin was little changed at $91,386.2 Ether was little changed at $3,032.27 Bonds
Australia’s 10-year yield advanced one basis point to 4.51% Commodities
Spot gold fell 0.1% to $4,157.61 an ounce This story was produced with the assistance of Bloomberg Automation.
Item 1 of 2 People shop ahead of Black Friday during Thanksgiving Day, in New York City, U.S., November 27, 2025. REUTERS/Brendan McDermid
[1/2]People shop ahead of Black Friday during Thanksgiving Day, in New York City, U.S., November 27, 2025. REUTERS/Brendan McDermid Purchase Licensing Rights, opens new tab
Nov 27 (Reuters) – Online sales in the U.S. on the Thanksgiving holiday are expected to rise 6% compared with last year to reach $8.6 billion, data from Salesforce showed on Thursday, suggesting shoppers were lapping up steep discounts from retailers to splurge amid tariff-induced macroeconomic uncertainty.
As of 2 p.m. ET (1900 GMT), Thanksgiving spending in the U.S. was 5.8% higher than at the same time last year, reaching $2.6 billion, the data showed.
Sign up here.
Thanksgiving and the day after, Black Friday, usher in the holiday shopping season, a critical stretch that typically delivers about a third of U.S. retailers’ yearly sales and profits.
This year’s kickoff comes amid economic uncertainty and heightened volatility from President Donald Trump’s tariffs on imported goods, which have raised costs for both retailers and consumers.
Latest results from American retailers have shown that shoppers are willing to buy gifts, electronics, clothes and more on good discounts, despite weak consumer sentiment overall.
Global spending has reached $13.1 billion so far on Thanksgiving, and digital sales are expected to reach $36 billion globally by the end of Thursday, Salesforce data showed.
Black Friday, which is the biggest day of the year for online shopping, is expected to drive $78 billion in online global sales and $18 billion in the U.S., according to Salesforce, a cloud-based software company.
Earlier this week, electronics retailer Best Buy (BBY.N), opens new tab said its computers, laptops and smartphones on discount were in demand among holiday shoppers while clothing retailers Gap (GAP.N), opens new tab and Abercrombie & Fitch (ANF.N), opens new tab also seemed upbeat.
Several holiday sales forecasts projected this year’s holiday season would be muted, with Mastercard data suggesting sales would be driven by promotions as consumers seek value for their money.
Salesforce in September projected U.S. online sales growth during the 2025 holiday season would slow from last year, with online spending between November 1 and December 31 expected to rise 2.1% to $288 billion, versus a 4% increase to $282 billion in the same period last year.
Reporting by Prerna Bedi and Sriparna Roy in Bengaluru; Editing by Nia Williams
Our Standards: The Thomson Reuters Trust Principles., opens new tab
Virtual patients (VP) are digital educational modalities that enable learners within health professions to interact with patient cases for learning purposes []. These educational tools complement real-life patient interactions by allowing health care learners to practice gathering medical histories and making clinical and diagnostic decisions in controlled, safe environments [,]. VPs can be developed using various technologies and are often used for clinical reasoning (CR) training purposes [,]. It has recently been demonstrated that VP-based educational tools can effectively improve medical students’ CR skills across multiple domains, including problem-solving and data gathering []. However, successful implementation depends on instructional design and quality features that support active learning []. While VPs offer standardized training opportunities, medical students report that conventional computer-based implementations often lack authenticity, potentially limiting their educational impact [,].
CR represents a fundamental cognitive process that guides diagnostic and management decisions in clinical practice [-]. While CR is important for patient safety and clinical outcomes [] and has been recommended to be explicitly addressed in health professions education [], traditional training approaches may not adequately capture the dynamic and complex nature of clinical decision-making []. VPs offer unique opportunities to train and assess CR skills using standardized, repeatable scenarios that simulate real clinical encounters [,]. However, many current VP platforms focus primarily on easily assessable CR components, with limited research examining how VP design characteristics can support the development of more complex aspects such as clinical presentations, generation of hypotheses, and justification of diagnostic decisions [].
Recent advances in artificial intelligence (AI), particularly the introduction of large language models (LLMs), have enhanced VP platforms and their ability to provide sophisticated clinical scenarios and realistic patient responses that better simulate the complexity of real CR challenges [-]. LLMs demonstrate potential to transform medical education through interactive simulations, individualized tutoring, and personalized feedback [,]. When LLMs are integrated into physical embodied agents such as social robotic interfaces, they have the potential to deliver multimodal interactions that enhance the cognitive engagement required for effective CR skill development []. Emerging evidence suggests that physical embodied AI enables multimodal dynamic learning and real-time feedback through direct environmental interaction, potentially offering advantages over screen-based learning [].
Our previous qualitative research demonstrated that sixth-semester medical students perceived AI-enhanced social robotic interfaces as more clinically authentic than traditional computer-based VPs for CR training []. While this qualitative evidence suggests that AI-enhanced social robotic VPs provide more authentic learning experiences compared with traditional computer-based platforms, no quantitative studies have hitherto examined whether these perceived advantages translate into measurably better VP design characteristics that support CR skill development.
Given the critical importance of CR skills in clinical practice and the challenges of providing authenticity using traditional VPs, quantitative evaluation is essential to determine the effectiveness of emerging VP technologies [,]. This study aimed to compare medical students’ experience of an AI-enhanced social robotic versus a conventional computer-based VP platform, regarding the extent to which the design characteristics of the respective platform facilitate training of CR skills.
Methods
Overview
We conducted an observational crossover cohort study to compare VP design elements that support CR skill training in medical education in an LLM-empowered social robotic platform, that is, an in-house developed social artificial intelligence–enhanced robotic interface (SARI) [-], and a computer-based VP platform, that is, the virtual interactive case system (VIC) []. This study reports according to the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) [] guidelines for observational studies and the GREET (Guideline for Reporting Evidence-Based Practice Educational Interventions and Teaching) guidelines [] for educational references.
All students experienced both VP platforms as a part of their clinical rotations within rheumatology. Platform order was determined by clinical rotation scheduling and practical logistics rather than random assignment. Each student served as their own control to minimize between-participant variability. We collected quantitative data using a questionnaire previously developed for VP platform evaluation (measuring authenticity, professional approach, coaching quality, learning effects, and overall judgment) within the context of CR [], combined with additional evaluation items developed by our research team to assess the preference of each VP platform for CR training. The complete questionnaire is provided in Figure S1 in .
Study Design and Setting
The study was conducted at Karolinska Institutet (KI), a major academic medical university in Stockholm, Sweden. Data collection occurred during clinical rotations at the Division of Rheumatology, Karolinska University Hospital, between the spring term of 2024 and the spring term of 2025. The clinical rotations within rheumatology are mandatory for all medical students at KI and take place during the sixth semester of medical studies.
Study Participants
Study participants comprised sixth-semester medical students enrolled in clinical courses at KI in Stockholm, Sweden. All medical students at KI participate in clinical rotations within rheumatology at the Karolinska University Hospital as a part of their curriculum. During these clinical rotations, students participate in an educational activity called “the virtual outpatient clinic,” where students encounter VPs.
We used convenience sampling, recruiting all sixth-semester medical students who completed their clinical rotation within rheumatology during the study period and consented to participate in the study. All students (N=421) were invited to participate by completing a survey after their educational experience with both VP platforms. A total of 178 agreed to participate, yielding a response rate of 42.3%. Students who did not complete the virtual outpatient clinic activity during the study period (eg, due to illness or absence for other reasons) were excluded from the study. Language proficiency was not an additional inclusion or exclusion criterion. However, students with insufficient English proficiency would have been unable to participate; no such cases were encountered. Participation was voluntary, and students provided written informed consent prior to enrolment, with the opportunity to withdraw at any time. Students received no reimbursement for participation.
VP Case Development and Practice
The VP cases had been developed and implemented for the virtual outpatient clinic before the study commenced, following specific recommendations for VP case development. Those included (1) structured case design models with difficulty levels adapted to students’ curriculum, (2) incorporation of CR strategies such as differential diagnostic formulation, and (3) interactive elements in history-taking and physical examination options [,]. All cases were developed in English to also ensure accessibility for international exchange students. A total of 10 cases were developed: 5 cases were presented using SARI and 5 cases using VIC. One case was identical between the 2 platforms, which served the comparison between the 2 VP platforms during the early stages of their development [,]. Students interacted with 5 cases on one platform and 4 on the other, experiencing a total of 9 unique VPs. The unequal distribution of VP cases (5 cases on one platform and 4 cases on the other) was therefore to avoid repetition of the identical case rather than deliberate research design. This resulted in some student groups encountering 5 cases on SARI, and some encountering 5 cases on VIC, with this selection being solely based on scheduling. The cases included representative patient scenarios of rheumatological conditions supporting the students’ learning curriculum, including polymyalgia rheumatica, rheumatoid arthritis, psoriatic arthritis, ankylosing spondylitis, systemic lupus erythematosus, and Sjögren disease. A detailed description of the case development process and the case contents has been described elsewhere [].
Each student attended the virtual outpatient clinic over a period of one and a half days, during which they encountered and interacted with all VP cases. Before training with the cases, the students received written information on generic medical history questions typically used at the rheumatology outpatient clinic to gather a structured history from patients and support diagnostic procedures (Figure S2 in ). The students were instructed to perform the VP sessions in pairs or small groups of 3 students to allow for interaction and active collaboration, which has been shown to favor CR skill training compared with experiencing the VP sessions alone []. Cases were initiated with a brief case presentation, which included the patient’s primary concern, age, and name. Next, the students explored the case environment in their preferred order. Cases were concluded when students perceived that they had gathered sufficient information to perform preliminary diagnostics and propose a suitable management plan for the VP. Students were allocated approximately 30 minutes per VP case interaction.
Following completion of each case, the students participated in case-specific follow-up seminars to discuss the case with a seminar leader, that is, a consultant rheumatologist at the Karolinska University Hospital, and pose questions. During these seminars, students were asked to summarize the VP case briefly in a structured manner to practice clinical communication and to propose their recommendations for further management. These seminar series comprised 2 to 3 student groups who had just performed the same VP case (ie, 4 to 9 students), and lasted approximately 15-20 minutes per case. While all students performed all 9 VP cases, platform order was determined by clinical rotation scheduling logistics. Of the 178 participants, 101 (56%) students began with SARI and 77 (44%) began with VIC on the first day of the virtual outpatient clinic. This distribution was not the result of random assignment but rather reflected practical scheduling for the clinical rotations. The students completed the virtual outpatient clinic assignment after finishing all VP cases along with their corresponding follow-up seminars.
Social AI-Enhanced Robotic Interface
The embodiment of SARI consisted of a social robot from Furhat Robotics, which projects an animated face onto a semi-transparent face mask. The face mask is placed on a plastic head, which is connected to a mechanical neck that allows natural head movements and adjustments of gaze direction to facilitate interaction with multiple users simultaneously, using sensors []. Furthermore, the robot displays facial expressions and provides affective responses along with sophisticated gaze behavior to indicate emotions during interaction []. SARI combines the Furhat software development kit (FurhatSDK) with the OpenAI GPT-3.5 turbo LLM []. To limit the risk of unwanted AI hallucinations from SARI, we used prompts to generate dialogue responses using specific instructions, including a detailed patient description along with the 10 latest dialogue turns. SARI was constrained to provide clinically appropriate information consistent with the VP case description. The LLM was also prompted to generate facial expressions to match the emotional state of the VP during dialogue, selecting from available expressions in the FurhatSDK at specific anchor points during the dialogues []. A VP prompt example is provided in Figure S3 in .
Before starting a VP case in SARI, students received brief written contextual information along with results from relevant laboratory tests with their corresponding reference values. SARI imposed no numerical limits on the questions students could ask during case interactions. Students could engage in natural language dialogue for as long as they deemed necessary to gather sufficient clinical information. A schematic illustration of the VP interaction between students and SARI is illustrated in Figure S4 in .
Virtual Interactive Case System
VIC is a web-accessible, computer-based VP platform where users freely interact with a VP case, while the initial presentation of the case and the conclusion remain fixed []. We incorporated interactive case elements primarily regarding medical history, physical examination options, laboratory tests, and relevant diagnostic imaging. There was no limit to the number of times students could revisit the prespecified questions or examination options. However, the students were encouraged to be selective in investigating information they deemed relevant to the case as more information became available. The cases concluded with students providing preliminary diagnoses and proposing management plans by selecting from multiple-choice responses.
Ethical Considerations
The Swedish Ethical Review Authority approved this study prior to initiation (registration number: 2022-04437-01). The study was conducted in accordance with the Declaration of Helsinki and Swedish regulations. All participants provided written informed consent before enrollment. Students were explicitly informed that participation was voluntary and would not affect their academic standing or evaluation in any way. Participants had the right to withdraw their participation in the study at any time without providing reasons and without consequences for their education. All study data were collected pseudonymously through coded questionnaires. No personally identifiable information was recorded in the dataset. Questionnaire responses were only linked to pseudonymized participant codes. Data were stored securely on password-protected servers at KI, with access restricted to researchers directly involved in the study. All data handling procedures complied with the European General Data Protection Regulation. Students received no financial compensation or academic credits for their participation in the study. Participation was entirely voluntary beyond the mandatory educational virtual outpatient activity, which all students completed regardless of their participation in the study.
Data Collection
Immediately after completion of the virtual outpatient clinic, students who had agreed to participate in this study completed a questionnaire to evaluate the VP platform design, with particular emphasis on CR training. The questionnaire consisted of the complete instrument for VP design evaluation developed and validated by Huwendiek et al [], which uses Likert-scale data based on statements regarding the VP experience and has demonstrated good content validity and internal consistency. To this, we added 2 project-specific questions designed by our research group to capture preferences regarding VP platforms for CR training. These additional items had not undergone formal validation, as they were tailored to the objectives of this study. The adapted questionnaire underwent pilot testing within our team and with a small group of students to ensure clarity and comprehensibility before use in the study. Internal consistency of the adapted questionnaire was not assessed in our sample. The complete questionnaire is provided in Figure S1 in .
Likert-scale data were divided into the following quantitative themes from the questionnaire by Huwendiek et al []: (1) authenticity of the patient encounter and the consultation, (2) professional approach in the consultation, (3) coaching during consultation, (4) learning effect of consultation, and (5) overall judgment of the case work-up. Within each theme, students evaluated a statement relating to their VP experience, for example, “while working through this case, I felt I had to make the same decisions a clinician would make in real life,” which was scored from 1 (strongly disagree) to 5 (strongly agree).
The two questions relating to the preferred VP platform were (1) “Overall, which of the platforms is preferable to you in relation to acquirement of clinical reasoning skills?” and (2) “On a scale from 0 to 10, where 0 is total preference for the social robot and 10 is total preference for the computer-based platform, how would you grade your preference of the virtual patient platforms compared with each other for acquirement of clinical reasoning skills?” The first question was answered using one of 3 responses: SARI, VIC, or equally preferred, while the second question had the structure of a Visual Analogue Scale (VAS), ranging between 0 and 10, where a lower score denoted a stronger preference for SARI, a higher score denoted a stronger preference for VIC, and a score of 5 denoted equal preference.
In addition to questions relating to the VP design and evaluations of the preferred platform for CR training, students provided information regarding their age and sex, whether they had any previous experience with VPs, and which platform they were first introduced to during the virtual outpatient clinic.
Statistical Analysis
The Wilcoxon signed-rank test was used to compare responses to Likert-scale data from quantitative themes in the questionnaire described by Huwendiek et al [] and VAS responses. For each theme, individual item scores were averaged to create a composite theme score for each student. Theme scores were next compared between platforms. VAS responses from the 2 platforms were compared with a hypothetical score of 5, denoting equal preference. The Fisher exact test with Monte Carlo simulation (10.000 iterations) was used to compare frequencies of categorical responses for VP platform preference based on students’ CR training experience. Results from Wilcoxon signed-rank tests are presented as medians and the corresponding IQR, test statistics (W), effect size (r), and P value. Results from Fisher exact tests are presented as frequencies and the corresponding percentage, odds ratio (OR), 95% CI, and P value.
Missing data were minimal across all variables. Demographic variables had no missing values except platform order (1 missing value, 0.56%). For Likert scale items evaluating VP design, missing data were 1.68% overall (range: 0%-3.91% per item; maximum 7 missing responses out of 178). VAS preference scores had 1.12% missing values (2 of 178 responses), and categorical platform preference had 1.68% missing values (3 of 178 responses).
Little’s MCAR test on Likert data indicated that the data may not be completely at random (χ2108=157.12; P=.001); however, given the very low proportion of missing data (<2% overall) and the paired nature of our crossover design, complete case analysis with pairwise deletion was used. No systematic differences in missingness across demographic groups were observed for any variable (all P>.05). The number of participants with available data is reported for each analysis ( and ). “Not applicable” responses (coded as 6 in the questionnaire; up to 9.5% for some items) were treated as valid response categories in frequency distributions but were excluded from statistical tests, as they do not represent a position on the agree-disagree scale. Our sample size was determined by the total number of sixth-semester medical students completing their clinical rotation in rheumatology during the study period. Post hoc power analysis indicated that this sample size provided >80% power to detect moderate effect sizes (Cohen d≥0.5) at of 0.05 for paired comparisons. All statistical analyses were performed using R (version 4.3.3; R Foundation for Statistical Computing, Vienna, Austria). Differences yielding P values <.05 were considered statistically significant.
Table 1. Comparison of median scores using the Wilcoxon signed-rank test within each question of the clinical reasoning questionnaire based on Likert-scale data. Comparison of median Likert scale scores (1=strongly disagree to 5=strongly agree) between social artificial intelligence–enhanced robotic interface (SARI) and virtual interactive case (VIC) for individual questionnaire items evaluating virtual patient design in an observational crossover cohort study of 178 sixth-semester medical students at Karolinska Institutet, Stockholm, Sweden, between the Spring of 2024 and the Spring of 2025. Students completed questionnaires evaluating both platforms after experiencing them during their clinical rotation within rheumatology. Five domains were assessed as follows: (1) authenticity of patient encounter, (2) professional approach in consultation, (3) coaching during consultation, (4) learning effect of consultation, and (5) overall judgment of case work-up. Data were analyzed using the Wilcoxon signed-rank test for paired comparisons. Missing data rates were low for all items (overall: 1.68%; range: 0%-3.91% across items). “Not applicable” responses were excluded from statistical analyses but included in frequency distributions (Tables S1-S4 in Multimedia Appendix 2). Data are presented as the median score (IQR), test statistic (W), and effect size (r). The total number of study participants was 178. In case of missing values, the number of participants with available data is indicated.
Theme and variable
SARI, median (IQR)
VIC, median (IQR)
W
r
P value
Authenticity of patient encounter
Q1a (n=172)
4.0 (3.0-4.0)
3.0 (3.0-4.0)
5306
0.34
<.001b
Q2 (n=173)
4.0 (3.0-4.0)
3.0 (2.0-3.0)
6936
0.58
<.001b
Professional approach in the consultation
Q1 (n=172)
5.0 (4.0-5.0)
4.0 (3.0-5.0)
4011
0.43
<.001b
Q2 (n=173)
4.0 (4.0-5.0)
4.0 (4.0-5.0)
1378
0.28
<.001b
Q3 (n=169)
4.0 (3.0-5.0)
4.0 (3.0-4.0)
2146
0.43
<.001b
Q4 (n=171)
5.0 (4.0-5.0)
4.0 (4.0-5.0)
868
0.27
<.001b
Coaching during consultation
Q1 (n=174)
5.0 (4.0-5.0)
5.0 (4.0-5.0)
248
0.11
.119
Q2 (n=158)
4.0 (4.0-5.0)
4.0 (3.3-5.0)
1123
0.29
<.001b
Q3 (n=152)
4.0 (4.0-5.0)
4.0 (3.0-5.0)
1268
0.31
<.001b
Learning effect of consultation
Q1 (n=175)
4.0 (4.0-5.0)
4.0 (3.0-4.0)
1491
0.35
<.001b
Q2 (n=173)
4.0 (4.0-5.0)
4.0 (3.0-4.0)
1515
0.39
<.001b
Overall judgment of case work-up
Q1 (n=175)
5.0 (4.0-5.0)
4.0 (4.0-5.0)
2351
0.35
<.001b
aQ: question.
bStatistically significant values.
Table 2. Comparison of median composite theme scores between platforms using the Wilcoxon signed-rank test. Composite scores were calculated by averaging each student’s responses within themes. Comparison of median composite theme between social artificial intelligence–enhanced robotic interface (SARI) and virtual interactive case (VIC) using the Wilcoxon signed-rank test in an observational crossover cohort study of 178 sixth-semester medical students at Karolinska Institutet, Stockholm, Sweden, between the Spring of 2024 and the Spring of 2025. Students completed questionnaires evaluating both platforms after experiencing them during their clinical rotation within rheumatology. Composite scores were calculated by averaging each student’s responses within the five themes of a validated virtual patient design characteristics questionnaire: (1) authenticity of patient encounter, (2) professional approach in consultation, (3) coaching during consultation, (4) learning effect of consultation, and (5) overall judgment of case work-up. Data were analyzed using the Wilcoxon signed-rank test for paired comparisons. Missing data rates were low for all items (overall: 1.68%; range: 0%-3.91% across items). “Not applicable” responses were excluded from statistical analyses but included in frequency distributions (Tables S1-S4 in Multimedia Appendix 2). Data are presented as the median score (IQR), test statistic (W), and effect size (r). The total number of study participants was 178. In case of missing values, the number of participants with available data is indicated.
Theme
SARI, median (IQR)
VIC, median (IQR)
W
r
P value
Authenticity of patient encounter (n=173)
4.0 (3.5-4.5)
3.0 (2.5-3.5)
9253
0.54
<.001a
Professional approach in the consultation (n=173)
4.5 (4.0-4.8)
4.0 (3.5-4.5)
6717
0.54
<.001a
Coaching during consultation (n=174)
4.3 (4.0-4.7)
4.0 (3.7-4.7)
4092.5
0.32
<.001a
Learning effect of consultation (n=175)
4.4 (4.0-5.0)
4.0 (3.5-4.5)
2589
0.42
<.001a
Overall judgment of case work-up (n=175)
5.0 (4.0-5.0)
4.0 (4.0-5.0)
2351
0.35
<.001a
aStatistically significant values.
Results
Demographic and Study-Specific Characteristics
Of the 178 students who participated in the questionnaire, 93 (52%) were women and 86 (48%) were men. Most students had no previous experience with VP platforms (150 students, 84%), while 29 (16%) reported prior experience. The mean age was 25.3 (SD 5.4) years. Regarding platform order, 101 (56%) students started with SARI, and 77 (43%) started with VIC.
VP Platform Design Evaluation for Clinical Reasoning Training
Students consistently rated SARI as superior to VIC across multiple domains of VP design that support CR training. Results from Likert-scale data are illustrated in and , and , with detailed response frequencies provided in Tables S1-S5 in .
Figure 1. Violin plots illustrating results from the Wilcoxon signed-rank test on Likert-scale data (1=strongly disagree to 5=strongly agree) from student evaluations of virtual patient platform design with an emphasis on clinical reasoning training in an observational crossover cohort study of 178 sixth-semester medical students at Karolinska Institutet, Stockholm, Sweden, between the Spring of 2024 and the Spring of 2025. Students completed questionnaires evaluating both platforms after experiencing them during their clinical rotation within rheumatology. Comparisons between social artificial intelligence–enhanced robotic interface (blue color) and virtual interactive case (red color) are illustrated. Violin plots show score distributions; white boxes indicate the median and IQR, and numbers indicate mean scores. Panel A denotes theme A: authenticity of patient encounter (question 1: “I felt I had to make the same decisions a doctor would make in real life”; question 2: “I felt I was the doctor caring for this patient”). Panel B denotes theme B: professional approach in consultation (question 1: “I was actively engaged in gathering the information I needed”; question 2: “I was actively engaged in revising my reasoning as new information became available”; question 3: “I was actively engaged in creating a summary of the patient’s problems using medical terms”; question 4: “I was actively engaged in thinking about which findings supported or refuted each diagnosis I was considering”). Panel C denotes theme C: coaching during consultation (question 1: “I felt that the case was at the appropriate level of difficulty for my training”; question 2: “the questions I was asked while working through this case helped me to learn”; question 3: “the feedback I received was helpful in enhancing my diagnostic reasoning process”). Panel D denotes theme D: learning effect of consultation (question 1: “I feel better prepared to confirm a diagnosis and exclude differential diagnoses in a real patient with this complaint”; question 2: “I feel better prepared to care for a real patient with this condition”). Q: question. Figure 2. Violin plots illustrating results from the Wilcoxon signed-rank test on Likert-scale data (1=strongly disagree to 5=strongly agree) from student evaluations of virtual patient platform design with an emphasis on clinical reasoning training regarding the theme overall judgment in an observational crossover cohort study of 178 sixth-semester medical students at Karolinska Institutet, Stockholm, Sweden between the Spring of 2024 and the spring of 2025. Students completed questionnaires evaluating both platforms after experiencing them during their clinical rotation within rheumatology. A comparison between social artificial intelligence–enhanced robotic interface (blue color) and virtual interactive case (red color) is illustrated. The questionnaire item reads: Overall, working through this case was a worthwhile learning experience. Violin plots show score distributions; white boxes indicate the median and IQR, and numbers indicate mean scores. Q: question.
Authenticity of Patient Encounters
SARI demonstrated higher authenticity ratings than VIC. Students experienced that they had to make decisions as a real-life clinician to a greater degree with SARI compared with VIC (median 4.0, IQR 3.0-4.0 vs 3.0, IQR 3.0-4.0; W=5306; r=0.34; P<.001) and felt with SARI more like a clinician being responsible for the care of the VP (median 4.0, IQR 3.0-4.0 vs 3.0, IQR 2.0-3.0; W=6936; r=0.58; P<.001). The overall theme score remained significantly greater for SARI (median 4.0, IQR 3.5-4.5 vs 3.0, IQR 2.5-3.5; W=9253; r=0.54; P<.001).
Professional Approach During Consultation
Students reported significantly greater active engagement in CR processes when using SARI compared with VIC. This included more active engagement in gathering necessary clinical information (median 5.0, IQR 4.0-5.0 vs 4.0, IQR 3.0-5.0; W=4011; r=0.43; P<.001), revising their clinical impression as new information became available (median 4.0, 4.0-5.0 vs 4.0, IQR 4.0-5.0; W=1378; r=0.28; P<.001), creating structured patient summaries using medical terminology (median 4.0, IQR 3.0-5.0 vs 4.0, IQR 3.0-4.0; W=2146; r=0.43; P<.001), and actively considering findings that support or refute differential diagnoses (median 5.0, IQR 4.0-5.0 vs 4.0, IQR 4.0-5.0; W=868; r=0.27; P<.001). The overall theme score was significantly greater for SARI (median 4.5, IQR 4.0-4.8 vs 4, IQR 3.5-4.5; W=6717; r=0.54; P<.001).
Coaching During Consultation
Students perceived no significant difference between platforms regarding case difficulty appropriateness for their training level (median 5.0, IQR 4.0-5.0 vs 5.0, IQR 4.0-5.0; W=248; r=0.11; P=.12). However, SARI was rated significantly better in facilitating helpful interactions that enhanced diagnostic reasoning (median 4.0, IQR 4.0-5.0 vs 4.0, IQR 3.3-5.0; W=1123; r=0.29; P<.001) and for system feedback that supported diagnostic reasoning (median 4.0, IQR 4.0-5.0 vs 4.0, IQR 3.0-5.0; W=1268; r=0.31; P<.001). The overall theme score favored SARI (median 4.3, IQR 4.0-4.7 vs 4.0, IQR 3.7-4.7; W=4092.5; r=0.32; P<.001).
Learning Effect of Consultation
Students felt significantly better prepared for real clinical encounters after having used SARI compared with VIC. This included feeling better prepared to confirm the diagnosis and exclude differential diagnoses in real patients with similar complaints (median 4.0, IQR 4.0-5.0 vs 4.0, IQR 3.0-4.0; W=1491; r=0.35; P<.001) and feeling better prepared to provide care for real patients with similar conditions (median 4.0, IQR 4.0-5.0 vs 4.0, IQR 3.0-4.0; W=1515; r=0.39; P<.001). The overall theme score significantly favored SARI (median 4.0, IQR 4.0-5.0 vs 4.0, IQR 3.5-4.5; W=2589; r=0.42; P<.001).
Overall Judgment
Students rated the interaction with SARI as a significantly more worthwhile learning experience compared with VIC (median 5.0, IQR 4.0-5.0 vs 4.0, IQR 4.0-5.0; W=2351; r=0.35; P<.001). However, both platforms received generally positive evaluations from the students.
VP Platform Preference for CR Training
Visual Analogue Scale Ratings
Students reported an overall strong preference for SARI over VIC for CR training (median 3.0, IQR 2.0-5.0; W=1604.5; r=0.60; P<.001). This preference pattern remained statistically significant across all subgroups of interest, that is, if students were female or male, if they had or did not have prior experience with VPs, and if they had started with SARI or VIC ( and Tables S6-S9 in ).
Figure 3. Density plots illustrating distributions of Visual Analogue Scale responses (0-10 scale; 0=total preference for SARI, 10=total preference for VIC, 5=equal preference of platforms) regarding virtual patient platform preference for CR training in an observational crossover cohort study of 178 sixth-semester medical students at Karolinska Institutet, Stockholm, Sweden, between the Spring of 2024 and the Spring of 2025. Results from Wilcoxon signed-rank tests are shown for comparisons of scores with a hypothetical score of 5 (equal preference of platforms) for each student. The panels show, from top to bottom, the overall distribution of responses in the entire cohort of students, overlayed distributions in women and men, overlayed distributions in students with and without prior experience of virtual patients, and overlayed distributions in subgroups of students starting with SARI or VIC. All students experienced both SARI and VIC during their clinical rotation within rheumatology, with platform order being determined by rotation scheduling. SARI: social artificial intelligence–enhanced robotic interface; VIC: virtual interactive case.
Categorical Preference Responses
When asked to choose their preferred platform, students strongly favored SARI over VIC (72% vs 14%; OR 27.1, 95% CI 14.3-53.7; P<.001). The preference for SARI over equal preference was also statistically significant (72% vs 15%; OR 23.1, 95% CI 8.5-54.6; P<.001). When comparing SARI to VIC or equal preference combined, SARI remained strongly favored (72% vs 28%; OR 6.3, 95% CI 3.9-10.4; P<.001).
Similar patterns were observed across student subgroups in most comparisons. However, the preference difference did not reach statistical significance in the comparison between SARI and VIC or equal preference combined among students with prior VP experience (62% vs 38%; OR 2.6, 95% CI 0.8-8.9; P=.11) and those introduced to VIC first (55% vs 45%; OR 1.5, 95% CI 0.7-2.9; P=.33), although a numerical preference for SARI was still evident in both groups. Results are illustrated in .
Figure 4. Forest plots illustrating results from Fisher exact test with Monte Carlo simulation (10,000 iterations) on categorical preferences for virtual patient platforms for clinical reasoning training in an observational crossover cohort study of 178 sixth-semester medical students at Karolinska Institutet, Stockholm, Sweden, between the Spring of 2024 and the Spring of 2025. Proportion of students preferring SARI versus comparator. Panel A shows comparisons between SARI and VIC (blue color). Panel B illustrates comparisons between SARI and equal preference (green color). Panel C shows comparisons between SARI and Not SARI (VIC or equal preference combined; red color). Circles denote ORs and whiskers denote 95% CIs on a logarithmic scale. All students experienced both SARI and VIC during their clinical rotation within rheumatology, with platform order being determined by rotation scheduling. Subgroups analyzed: overall cohort, female and male students, students with and without prior virtual patient experience, and students starting with SARI and students starting with VIC. OR: odds ratio; SARI: social artificial intelligence–enhanced robotic interface; VIC: virtual interactive case.
Discussion
Principal Findings
This observational crossover cohort study aimed to compare medical students’ experience of an AI-enhanced social robotic (SARI) versus a conventional computer-based VP platform (VIC), regarding the extent to which the design characteristics of the respective platform facilitate training of CR skills. Consistent with our hypothesis based on prior results from qualitative work [,], students perceived SARI as superior across all 5 domains of VP design, indicating that AI-enhanced social robotic VPs offer significant advantages over conventional computer-based platforms for CR training in medical education. Students also expressed a strong overall preference for SARI over VIC for CR training, suggesting that embodied AI systems may better support the cognitive processes underpinning effective CR skill development.
The effect sizes observed (r ranging from 0.27 to 0.60) represent moderate to large effects according to Cohen conventions, with the strongest effect seen for overall platform preference (r=0.60). While the Likert-scale ratings showed moderate advantages for design characteristics in SARI, the categorical preference data revealed much stronger student preference (OR 27.1), suggesting that the overall educational experience of SARI extends beyond individual distinct design elements captured by the Likert-scales of the study questionnaire.
Students perceived SARI as significantly more authentic than VIC in terms of feeling like they had to make decisions as real-life clinicians and feeling like the clinician was responsible for the care of the patient. Importantly, according to educational theory, authentic learning environments are crucial for effective CR development []. Our findings validate previous qualitative research from our group, which also demonstrated that SARI yields greater perceived authenticity compared with VIC []. The physical embodiment, combined with natural gaze behavior, facial expressions, and conversational AI, appears to create a more immersive clinical simulation that better prepares students for real patient encounters.
The superiority of SARI in promoting a professional approach during consultations suggests that social robotic embodiment facilitated more active cognitive engagement in CR processes. Students consistently reported being more actively engaged in gathering clinical information, revising their clinical impressions, creating structured patient summaries, and considering differential diagnoses when using SARI compared with VIC. This finding is particularly important, taking into consideration that effective VP platforms should specifically support active cognitive engagement rather than passive information processing [,]. The multimodal nature of SARI in combining visual, auditory, and conversational elements may activate multiple cognitive pathways that enhance learning retention and skill transfer [-].
The advantages of SARI in providing helpful interactions and feedback that promoted diagnostic reasoning suggest that AI-enhanced conversational interfaces can provide more adaptive and contextually appropriate educational support than traditional platforms. Students felt better prepared for real clinical encounters after having used SARI compared with VIC, indicating potential for improved skill transfer from digital to actual clinical practice. This finding addresses a critical challenge in medical education, which involves ensuring that VP experiences translate into improved real-world clinical competencies [].
The superiority of SARI was generally robust across student subgroups. However, 2 notable exceptions warrant consideration. Students with prior VP experience showed only a numerical preference for SARI that did not reach statistical significance in the comparison with VIC or equal preference combined, suggesting that familiarity with VP technology may attenuate the perceived benefits of a new educational technology, such as the AI-enhanced social robotic interface introduced in this study. Additionally, students first introduced to VIC showed a smaller, nonsignificant preference margin for SARI, which may reflect first-impression or carryover effects, or both operating simultaneously. Such order effects are inherent limitations of crossover designs, where complete washout is neither feasible nor appropriate. Nevertheless, this finding is consistent with the earlier observation that familiarity may influence the perceived benefits of a new technology. Taken together, these findings indicate that while SARI generally offers advantages, perceived benefits may be influenced by students’ prior exposure to similar technologies, as well as the order in which technologies are encountered. While the crossover design mitigated learning effects, some degree of order effect likely remained, as complete elimination is inherently challenging.
Our results extend previous research on VPs in medical education by demonstrating quantifiable advantages of AI-enhanced social robotics. A recent systematic review identified the need for VP platforms that specifically target CR components such as problem presentation, hypothesis generation, and diagnostic justification []. Our findings suggest that AI-enhanced social robotic platforms may be well-suited for addressing these educational needs through their ability to provide dynamic, responsive interactions that mirror real clinical encounters more closely than traditional computer-based systems.
It is important to note that our results did not favor SARI over VIC consistently across all student subgroups. While this likely reflects limitations in statistical power, individual learner characteristics and contextual implementation factors may also impact students’ perceptions of the effectiveness of new educational modalities. Long-term studies examining sustained educational benefits would strengthen the evidence for AI-enhanced VP platforms. We would also like to emphasize that the 2 VP platforms incorporate different design features, and that the purpose of this study was to explore students’ perspectives on these design elements rather than evaluate the specific contributions of each platform to distinct aspects of CR, especially since no assessment of implemented CR skills was undertaken. We also acknowledge that our comparison between SARI and VIC concerns 2 specific platforms, and that results might differ if another AI-enhanced social robotic interface or a different conventional computer-based VP platform were used. Future research could further investigate the added value of platform-specific differences for CR training outcomes in implementation settings, for example, through examination-based evaluations.
Limitations and Strengths
Several limitations warrant consideration when interpreting our results. First, our findings are based on students’ perceptions of VP design elements rather than objective measures of CR skill acquisition. Moreover, the internal consistency of the adapted questionnaire was not assessed in our study population. Future research that incorporates validated CR assessments and longitudinal evaluation of skill transfer to real clinical encounters is warranted []. While students consistently favored SARI, it is important to acknowledge that positive reception of novel technology does not always correlate with improved educational outcomes []. Novelty effects may have influenced students, as this was the first exposure to SARI for most participants, whereas computer-based platforms might already have been familiar to some.
Second, the single-center and discipline-specific design within rheumatology may limit the generalizability of the findings to other educational contexts or medical disciplines. Third, the use of English in case interaction rather than the students’ native language (Swedish for a vast majority) may have affected perceived authenticity and interaction quality, though this limitation likely affected both platforms similarly. Thus, we did not collect data on the students’ native language, precluding subgroup analysis based on language background to investigate how this might have influenced platform preferences.
Fourth, the use of a single LLM version (GPT-3.5-turbo) limits the generalizability of our findings across different AI models and their reproducibility over time. However, this choice ensured homogeneity across student groups in the present study. Given that LLM performance and behavior evolve with updates, newer models with enhanced capabilities may yield more reliable or nuanced results. Last, group size (pairs vs groups of 3) was not investigated, and we cannot exclude the possibility that group dynamics influenced individual perceptions, though students experienced both platforms in the same group configuration.
Our study also has several strengths. The observational crossover design allowed students to serve as their own controls, minimizing between-participant variability. We used a validated questionnaire specifically designed for VP design evaluation with emphasis on CR training. The study included a substantial sample (n=178) representing 42% of eligible students, which supports the generalizability of the findings within the target population. Furthermore, this is the first quantitative comparison of self-perceived VP design characteristics between an AI-enhanced social robotic VP platform and a conventional computer-based VP software for CR training within medical education.
Implications
The findings of the present study have important implications for curricular development within medical education. AI-enhanced social robotic VP platforms may be considered superior alternatives to conventional computer-based systems, particularly for CR training. However, implementation decisions should also account for economic factors, technical infrastructure requirements, and faculty training needs. The integration of LLMs with social robotics represents a significant technological advancement that may justify investment despite potentially higher initial costs compared with traditional platforms. Over a longer term, the initial higher cost may mitigate costs from medical errors, through contribution to better preparation of students in advanced yet safe patient simulation environments.
Conclusions
This study provides quantitative evidence that medical students may perceive AI-enhanced social robotic VP platforms as offering advantages in design characteristics over conventional computer-based platforms for CR training. The consistent superiority of SARI across multiple domains of VP design, combined with strong student preferences regardless of individual characteristics, suggests that embodied AI platforms represent a meaningful advancement in medical pedagogy technology. These findings support the integration of LLMs with social robotics as a promising approach for developing more effective VP simulations that better prepare medical students for real clinical encounters and warrant future research to examine objective CR performance outcomes and long-term learning retention. To our knowledge, this is the first quantitative head-to-head comparison of VP design characteristics between 2 technological approaches, one of which is a social robotic VP platform, thereby extending beyond prior qualitative and single-platform evaluations.
The authors would like to thank the medical students who participated in the study, as well as the medical staff at the Division of Rheumatology at the Karolinska University Hospital, Stockholm, Sweden. The authors used ChatGPT (OpenAI) for language refinement and grammar checking in specific sections of the manuscript during the writing process. All scientific content, study design, data analysis, statistical interpretation, and conclusions were developed entirely by the authors. The authors take full responsibility for the accuracy, integrity, and scientific validity of all content in this manuscript.
The anonymized datasets generated and analyzed during this study are available from the corresponding author upon reasonable request. Access to data will be granted following appropriate ethical review and data sharing agreements and will require completion of a data transfer agreement and approval from the Swedish Ethical Review Authority, as per Swedish data protection regulations and the European General Data Protection Regulation.
This work was supported by grants from Region Stockholm ALF Pedagogy (FoUI-977096; FoUI-1024895), Karolinska Institutet Pedagogical Project Funding (FoUI-964139; FoUI-1026178), the Swedish Rheumatism Association (R-1013624), King Gustaf V’s 80-year Foundation (FAI-2023-1055), Swedish Society of Medicine (SLS-974449), Nyckelfonden (OLL-1023269), Professor Nanna Svartz Foundation (2021-00436), Ulla and Roland Gustafsson Foundation (2024-43), Region Stockholm (FoUI-1004114), and Karolinska Institutet.
AB, MR, SE, CaG, GS, and IP contributed to the conception and design of the work. Data were acquired by AB, JS, WI, CiG, VH, AH, BJ, FE, and IP. Statistical analysis and interpretation of data were conducted by AB, MR, SE, CaG, GS, and IP. AB and IP prepared the original draft of the manuscript, and all authors critically revised it for important intellectual content. All authors reviewed and approved the final version of the manuscript prior to submission and agree to be accountable for all aspects of the work.
IP has received research funding and honoraria from Amgen, AstraZeneca, Aurinia, Biogen, BMS, Elli Lilly, Gilead, GSK, Janssen, Novartis, Otsuka, Roche, UCB, and Viatris. GS, who is a co-founder and Chief Scientist at Furhat Robotics, provided technical expertise regarding the robotic platform but was not involved in study design, data collection, data analysis, or interpretation of results. All other authors declare that they have no conflicts of interest.
Edited by S Brini; submitted 17.Aug.2025; peer-reviewed by A Peralta Ramirez, K-H Lin; comments to author 16.Oct.2025; revised version received 29.Oct.2025; accepted 03.Nov.2025; published 27.Nov.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
BC Rugby is pleased to announce a new partnership with Weissach INEOS Grenadier, connecting the province’s rugby community with one of the most capable and purpose-built 4×4 vehicles on the market today.
Engineered for rugged terrain and designed to handle the world’s toughest conditions, the INEOS Grenadier mirrors the resilience, grit, and adventurous spirit that define rugby in British Columbia.
The partnership brings together two organisations that share a commitment to community, performance, and pushing boundaries – whether on the pitch or across BC’s wild backroads. Weissach INEOS Grenadier, the exclusive retailer for the region, welcomes the opportunity to support BC Rugby members and the wider rugby culture throughout the province.
As part of this launch, BC Rugby Members will receive exclusive benefits, including special Member-only perks with the purchase of a new Grenadier:
Special access to the West Coast Grenadier Community, and
$500 in-store credit toward West Coast Grenadier Expeditions – immersive 4×4 experiences exploring British Columbia’s diverse terrain.
Get a glimpse of the expedition here!
The INEOS Grenadier is designed to go where few vehicles can; across mountain passes, through unpredictable weather, and into the heart of BC’s wilderness. For BC Rugby members, families, and supporters who live active, outdoor lives, the Grenadier offers strength, reliability, and a spirit of exploration that matches the game we love.
Exclusive instructions for claiming BC Rugby Member benefits will be provided via email.
BC Rugby looks forward to working closely with Weissach INEOS Grenadier throughout the season and beyond, bringing new opportunities and experiences to the rugby community across the province.