Category: 3. Business

  • Ocado shares fall 17% after US partner announces warehouse closures | Ocado

    Ocado shares fall 17% after US partner announces warehouse closures | Ocado

    The value of online grocer Ocado has fallen sharply after Kroger, it’s major partner in the US, announced the closure of three warehouses using the UK company’s high-tech equipment.

    Ocado signed a deal to build 20 automated warehouses – known as customer fulfilment centres – for Kroger, the US’s fourth largest retailer, in 2018. Eight of those facilities are currently operating with two more planned for next year. The deal was seen as a major part of Ocado’s plan to sell its online grocery delivery technology internationally.

    However, on Tuesday, Kroger said sites in Frederick in Maryland, Pleasant Prairie in Wisconsin, and Groveland in Florida would close in January. Shares in Ocado were down more than 17% on Tuesday after the announcement, wiping about £350m off the value of the company.

    Kroger said that after reviewing its set-up it had “identified opportunities to optimise its fulfilment network”.

    It added that it would now move towards a “hybrid fulfilment network” testing out “capital-light, store-based automation in high-volume geographies” while continuing to use automated warehouse processing of online orders where it sees “higher density of demand”. It noted that it had recently expanded its relationship with quick delivery service providers DoorDash, Instacart and Uber Eats, which take goods directly from stores on bikes, mopeds and other small vehicles.

    Clive Black, a retail analyst at Shore Capital, described Kroger’s announcement as a “near knockout punch” for Ocado, prompting the share price to fall below the 180p price at which it debuted on the London stock market in 2018.

    He said the online grocery technology supplier “is being marginalised as most of its customer fulfilment centres do not work economically in the USA or the mass-market first world in truth.”.

    While centralised, automated warehouses may work effectively to manage home deliveries of groceries in densely populated and affluent urban locations, according to Black, he said Kroger’s actions suggested that the size of Ocado’s total potential market “has been blitzed”.

    “We had expected Kroger to trundle on, not close [warehouses], as part of its ongoing review, a dreadful acclamation of what Morrison, Waitrose and others already knew: capital intensive, centralised fulfilment of food to a dispersed mass-market customer does not financially work.”

    Ocado said it expected to receive more than $250m (£190m) in compensation for fees related to the early closure of the sites but its fee revenue would take a $50m hit in the financial year to December 2026.

    skip past newsletter promotion

    “Ocado continues to support Kroger to optimise logistics operations and drive profitable volume growth in these remaining sites, with constructive ongoing discussions around further use of Ocado’s technology to support Kroger,” the British company said in a statement.

    It added that it “expects significant growth in the US market, both with [warehousing] and store based automation.”

    Continue Reading

  • Decades later, a Yale chemist’s water simulations continue to make waves

    Decades later, a Yale chemist’s water simulations continue to make waves

    Like many successful researchers, Jorgensen has been guided by scientific pursuits his entire life.

    As a kid growing up in Port Washington, a hamlet on the western side of Long Island, and later in Sherman, Connecticut, he conducted scores of experiments with his trusty A.C. Gilbert chemistry set, tromping off to the local drug store regularly to replenish his supply of potassium nitrate. In high school, at Phillips Exeter Academy in New Hampshire, he took AP Chemistry and taught himself how to write computer code in BASIC.

    He went on to graduate from Princeton in three years, learning the computer language FORTRAN in the basement of the Frick Lab and conducting work for his first co-authored study in the Journal of the American Chemical Society. Then it was on to Harvard for graduate school (where he worked with eventual Nobel winner EJ Corey), and to Purdue to begin his teaching and independent research career.

    He soon came to focus his research on the need for better simulations of systems in solution.

    “I realized very quickly that I wanted to study reactions in liquids and investigate the way molecules recognize each other in solution, which can lead eventually to drug design,” Jorgensen said. “In drug design you typically have an inhibitor, a small molecule that is binding to a disease-causing protein, which then disrupts the function of that protein.”

    To do this work, he needed computing know-how and processing muscle. So, he taught himself statistical mechanics to go with his working knowledge of programming, and he got together funding for a research computer.

    In the late 1970s, Purdue had two CDC 6400s (an early mainframe computer built by the Control Data Corporation) in its computer center. That meant two processors for 40,000 students and faculty, compared to today when the average laptop computer has eight processors. But times were changing.

    “Fortunately, I was in the right place at the right time, because by the early 1980s computer resources became more available,” Jorgensen said. “You could have your own computer in your lab, if you could find the money to buy it.”

    He and his research group purchased a Harris 80 — a tall cabinet computer that occupied its own room with an air conditioner and a printer. Much of the funding for it came from a 1978 grant to Jorgensen from the National Science Foundation (NSF).

    NSF was essential to the early work I did, including the water models,” Jorgensen said. “NSF funded basic science research that led to many of the technologies and therapeutics we take for granted today.”

    Meanwhile, the older guard of scientists, who’d previously held computer modeling at arm’s length began to see the value of adapting to changing technology. “In the late 1970s you had people still trying to do paper-and-pencil theory work, but you also had the new people coming in with their computers,” Jorgensen said. “There was some difficulty in my being accepted by some of the theoretical chemists at the time, who were rather dismissive of what I was doing. I had to prove I could do something useful.”

    Continue Reading

  • Check Point Software Collaborates with Microsoft to Deliver Enterprise-Grade AI Security for Microsoft Copilot Studio

    Check Point® Software Technologies Ltd. (NASDAQ: CHKP), a pioneer and global leader of cyber security solutions, today announced it is collaborating with Microsoft to deliver enterprise-grade AI security for Microsoft Copilot Studio. The collaboration enables enterprises to safely build and deploy generative-AI agents with continuous protection, compliance, and governance integrated directly into their development workflows.

    The integration with Copilot Studio brings together Check Point’s AI Guardrails, Data Loss Prevention (DLP), and Threat Prevention technologies, extending its end-to-end AI security stack to safeguard Copilot Studio during agent runtime. The result is continuous protection for every AI agent, ensuring safe and compliant innovation.

    As enterprises rapidly adopt AI agents to drive productivity, new risks emerge, from prompt injection and data leakage, to model misuse and compliance drift. These agents connect to sensitive data and third-party tools, expanding the attack surface beyond traditional controls. By using Check Point’s runtime security and governance capabilities to extend Copilot Studio’s protections, organizations gain full visibility and control to innovate confidently and securely.

    “The rapid adoption of AI agents brings not only innovation and efficiency, but also new security challenges, particularly around maintaining data integrity and preventing misuse of sensitive information,” said Nataly Kremer, Chief Product Officer at Check Point. “Together with Microsoft, we’re providing advanced continuous protection and governance directly into Microsoft Copilot Studio, ensuring that every AI interaction, including autonomous actions within the enterprise, remains secure, compliant, and aligned with enterprise policies.”

    Key capabilities include:

    • Runtime AI Guardrails – Continuous runtime protection for every agent built with Copilot Studio, preventing prompt injection, data leakage, and model misuse
    • Data Loss and Threat Prevention – Integrated DLP and Threat Prevention engines that safeguard sensitive data across every tool call and workflow inside Copilot Studio
    • Enterprise-Grade Scale and Precision – A unified security bundle designed for large-scale deployments, delivering consistent protection and low latency without impacting performance
    • Seamless Protection for Productivity – Allows organizations to fully use the power of Copilot Studio while maintaining runtime visibility, compliance, and prevention-first protection

    “As organizations embrace Microsoft Copilot Studio to build AI agents tailored to their business, security and compliance are paramount,” said David Blyth, VP Engineering, Copilot Studio, Microsoft. “Our relationship with Check Point helps customers innovate confidently, combining Microsoft’s trusted Copilot platform with Check Point’s prevention-first AI security to keep sensitive data and AI workflows protected by design.”

    This collaboration reinforces Check Point’s leadership in securing the AI-powered enterprise and marks another milestone in its mission to protect the full AI lifecycle – from model development to runtime execution, and from organizational applications to employee usage across the workspace.

    For more information about Check Point’s AI security and its integration with Copilot Studio, visit our website.

    Follow Check Point on LinkedInX (formerly Twitter), Facebook, YouTube and our blog.

    About Check Point Software Technologies Ltd. 

    Check Point Software Technologies Ltd. (www.checkpoint.com) is a leading protector of digital trust, utilizing AI-powered cyber security solutions to safeguard over 100,000 organizations globally. Through its Infinity Platform and an open garden ecosystem, Check Point’s prevention-first approach delivers industry-leading security efficacy while reducing risk. Employing a hybrid mesh network architecture with SASE at its core, the Infinity Platform unifies the management of on-premises, cloud, and workspace environments to offer flexibility, simplicity and scale for enterprises and service providers.

    Legal Notice Regarding Forward-Looking Statements
    This press release contains forward-looking statements. Forward-looking statements generally relate to future events or our future financial or operating performance. Forward-looking statements in this press release include, but are not limited to, statements related to our expectations regarding our products and solutions, our expectations regarding future growth, the expansion of Check Point’s industry leadership, the enhancement of shareholder value and the delivery of an industry-leading cyber security platform to customers worldwide. Our expectations and beliefs regarding these matters may not materialize, and actual results or events in the future are subject to risks and uncertainties that could cause actual results or events to differ materially from those projected. The forward-looking statements contained in this press release are also subject to other risks and uncertainties, including those more fully described in our filings with the Securities and Exchange Commission, including our Annual Report on Form 20-F filed with the Securities and Exchange Commission on March 17, 2025. The forward-looking statements in this press release are based on information available to Check Point as of the date hereof, and Check Point disclaims any obligation to update any forward-looking statements, except as required by law.

     


    Continue Reading

  • Hexham’s Haydon Bridge High School to open as strike action ‘paused’

    Hexham’s Haydon Bridge High School to open as strike action ‘paused’

    James RobinsonLocal Democracy Reporting Service

    Iain Buist/NCJ Media A blue sign that reads in white lettering, 'Haydon Bridge High School' on a patch of grass covered in brown leaves. It stands in front of a tree. To the right is a path leading to a school. Iain Buist/NCJ Media

    The school in Northumberland has said it will be able to remain open

    A school will open on Wednesday after one of the two main teachers’ unions agreed to pause strike action.

    Union bosses said teachers and support staff at Haydon Bridge High School in Northumberland would walk out for two days – on 19 and 25 November – over what they say is a “failure” to tackle “disruptive behaviour”.

    In a letter to parents, the school said it would remain open as the NASUWT agreed to pause the strike. It has previously said officials had “deemed behaviour to be as good as what is seen in most high schools”.

    The National Education Union (NEU) said its members remained committed to the walkout.

    The unions said employees had repeatedly raised fears about pupil behaviour and the impact it was having on safety, teaching and learning.

    ‘Our wonderful students’

    A letter from the school, seen by the Local Democracy Reporting Service, described Haydon Bridge High School as a “brilliant” school that was “small” and “caring”, adding that it “truly aims to serve its local community”.

    It read: “Both the school and the unions are keen to bring this dispute to an end. The best way to judge a school’s behaviour is by looking at the data and seeing it in action.”

    It said Ofsted “rightly identified” suspensions were too high but since the introduction of new systems, suspensions were down by more than 30% compared to this time last year.

    The school says referrals to its restart room are also down, and invited parents to pay a visit to “view our wonderful students engaged in their learning”.

    NASUWT declined to issue any further comment but Sean Kelly, branch secretary of Northumberland NEU, said members remained committed to taking strike action.

    He said he had written to the school and Northumberland County Council to reiterate the NEU remained on strike on Wednesday.

    “We have a meeting with our members this evening to speak to them and see if they are still willing to take strike action, but the overwhelming message last night was that they were,” he said.

    “They were not impressed at all with more promises, we have had this for 13 months and nothing has changed. Employers don’t call off a strike, workers do.”

    Continue Reading

  • Mount Sinai Medical Center Achieves HIMSS EMRAM Stage 7 Validation

    Mount Sinai Medical Center Achieves HIMSS EMRAM Stage 7 Validation

    South Florida hospital recognized at the highest level of digital care excellence

    MIAMI BEACH, Fla., Nov. 18, 2025 /PRNewswire/ — Mount Sinai Medical Center has achieved HIMSS Electronic Medical Record Adoption Model (EMRAM) Stage 7 validation, the highest level of digital health maturity recognized by the Healthcare Information and Management Systems Society (HIMSS). This distinction places Mount Sinai among a limited number of hospitals that have fully optimized their electronic medical record systems to support safer, faster, and more coordinated patient care.

    What this means for patients:

    Your care team now has a complete, real-time picture of your health—whether you’re in the hospital, visiting a specialist, or recovering at home. This reduces delays, helps avoid repeated tests, supports more accurate treatment decisions, and ensures that your doctors and nurses are always working from the same information.

    “Achieving Stage 7 reflects our commitment to clinical excellence and continuous improvement,” said Gino R. Santorio, President and CEO of Mount Sinai Medical Center. “Investing in digital innovation is ultimately about improving patient safety and the experience of care. It ensures our clinicians have the right information at the right time to support the highest-quality decisions.”

    The EMRAM Stage 7 designation recognizes Mount Sinai’s system-wide success in:

    • Allowing patients to access their health information through digital tools such as MyChart, remote monitoring, and health reminders, making care more convenient and accessible
    • Improving efficiency for clinicians by providing tools for clinical decision-making, real-time alerts, evidence-based order sets, and AI-supported workflows
    • Providing a fully optimized, secure, and interoperable system that leverages data analytics to ensure the best outcomes for our patients 

    “This milestone represents years of strong collaboration between our clinical and technology teams,” said Tom Gillette, Chief Information Officer at Mount Sinai. “Our priority has been designing digital systems that truly support the clinical workflow, improving clarity, reducing administrative burden, and giving clinicians better insight into each patient’s needs.”

    The EMRAM model is used globally to evaluate how effectively hospitals use digital systems to enhance patient care, clinician support, data security, and organizational performance.

    About Mount Sinai Medical Center
    Founded in 1949, Mount Sinai Medical Center is the largest independent, private, not-for-profit teaching hospital in South Florida. Mount Sinai’s mission is to provide quality health care to a diverse community enhanced through teaching, research, charity care, and financial responsibility. Mount Sinai’s Centers of Excellence combine technology, research, and academics to provide innovative and comprehensive care in cardiology, neuroscience, oncology, urology, and orthopedics. One of the original statutory teaching hospitals in the state of Florida, Mount Sinai is the hospital of choice for those who seek the level of expertise and care that only a teaching hospital can offer. Mount Sinai currently offers ten convenient locations in Miami-Dade County, including three emergency centers, and four specialty care offices in Monroe County. 

    SOURCE Mount Sinai Medical Center

    Continue Reading

  • Nvidia is set to report earnings Wednesday. These stocks could be moved by the results

    Nvidia is set to report earnings Wednesday. These stocks could be moved by the results

    Continue Reading

  • Journal of Medical Internet Research

    Journal of Medical Internet Research

    Key Takeaways

    • Certain features of large language models (LLMs) may amplify delusional beliefs and contribute to harm.
    • A recent simulation study highlights the role of sycophancy, demonstrating that all LLMs, to varying extents, may fail to adequately challenge delusional content.
    • Further empirical research and validation, transparency, and policy are needed to understand and build safeguards around LLM use and its impact on mental health.

    We’re certainly not in Kansas anymore, but are we in a Lovecraft novel?

    An old artificial intelligence (AI)–insider joke with an anxious edge and new relevance, a shoggoth is a globular Lovecraftian monster described as a “formless protoplasm able to mock and reflect all forms and organs and processes” []. The idea is that a shoggoth’s true nature is inscrutable and evasive—not unlike large language models (LLMs), which can be trained to appear superficially anthropomorphic, safe, and familiar, yet can behave in unexpected ways or lead to unanticipated harms [,].

    Some such harms include reports of unhealthy romantic attachments, self-harm, suicide, and murder potentially associated with chatbot use [-]. These phenomena—dubbed “AI psychosis”— have been the focus of increasing interest and concern in the media [,], attracted academic commentary [,], and have most recently led to several lawsuits being filed [].

    The term AI psychosis is being used as a shorthand to describe a range of psychological disturbances that appear to emerge in the context of LLM use. While provocative, it’s somewhat imprecise in implying that AI is causing diagnosable psychotic disorders or that AI psychosis constitutes a distinct diagnostic entity—the science is still out.

    Early clinical commentary—including a prescient editorial on the topic before reports even emerged []—does, however, suggest that LLMs may be contributing to the maintenance, reinforcement, or amplification of paranoid, false, or delusional beliefs, especially in circumstances involving prolonged or intensive LLM use and underlying user vulnerabilities [,,-].

    “When using generative chatbots,” says Dr Kierla Ireland, a Clinical Psychologist at the Canadian Department of National Defense, “there’s a risk of confirmation bias wherein the user’s own perspective is reflected back to them. This may be experienced as validating or soothing, which may lead to more engagement, more confirmation bias, and so on.”

      Dr Kierla Ireland, Clinical Psychologist

    This is not unlike processes that can occur with other types of technology, like social media [,]—but while not a new threat, certain features of the technology may make AI psychosis a more pernicious one.

    Sycophancy, for example, is a well-known—and, some speculate, intentionally designed—feature of chatbots that can increase both user engagement and potential risk [-]. Dr Josh Au Yeung, Neurology Registrar at King’s College London, Clinical Lead at Nuraxi.ai, and host of the Dev & Doc podcast, notes that the anthropomorphic nature of LLMs adds potency: “You end up trusting them [LLMs], and attributing emotions to them. If a stranger came to you and they were so sycophantic on the streets, you’d run for your life, right? But because you have this connection with them—that’s what makes it extra dangerous.”

       Dr Josh Au Yeung, Neurology Registrar

    In their recent preprint [], Dr Au Yeung and his colleagues endeavored to provide one of the first empirical demonstrations of how LLMs may amplify delusions and contribute to what they more precisely term “LLM-induced psychological destabilization.” Their study aims to quantify the “psychogenicity” of different LLMs using simulated conversations and a safety benchmark they’re calling psychosis-bench.

    Across 16 scenarios constructed to reflect the development of different types of delusions and to map roughly onto AI psychosis media reports, the researchers have evaluated the extent to which each of the LLMs’ responses represent a delusion confirmation, harm enablement, or safety intervention.

    The team’s initial conclusions are revealing: all models appear to demonstrate some degree of “psychogenicity.” On average, and especially in more subtle scenarios, models frequently failed to actively challenge potential delusions and refuse harmful requests, and frequently missed opportunities to provide safety interventions.

    The performance of the different models varied widely, however, with Anthropic’s Claude 4 outperforming every other model on the three indices, and Google’s Gemini 2.5 Flash bringing up the rear on all three. Dr Au Yeung isn’t surprised by this.

    “It’s no surprise that the only company which is publishing on AI safety and sycophantic behavior performs the best,” he says. “Clearly the stuff they do—the constitutional AI, the safety side, the way they prompt-tune the model—is having some effects on its performance.” He says he hopes other companies will start thinking along these lines and has shared his code [] so that they can, noting in particular the need to address sycophancy. “Unlike most other shortcomings seen in LLMs,” he says, “sycophancy is not a property that is correlated to model parameter size; bigger models are not necessarily less sycophantic,” suggesting that more targeted safety research and model alignment strategies are needed [].

    As the team works on revising and strengthening the methods to support their findings, Dr Au Yeung reports that what they have learned from their study is already having a positive impact.

    His team’s research was featured in the widely read annual State of AI Report for 2025 []. And at his current company, Nuraxi.ai, they’re in the process of applying psychosis-bench to their user-facing chatbot.

    The responsibility for preventing and dealing with psychological destabilization associated with LLM use is not on consumers or patients, Dr Au Yeung says. “The onus for us [developers] is to actually focus on the LLM and put in safeguards to stop this phenomenon from happening.”

    Dr Ireland shares this sentiment, noting “the vital importance of incorporating safeguards to promote critical thinking; that is, for users to be shown multiple perspectives, including those that may counter deeply-held beliefs and cause discomfort.”

    Whether, how, and how effectively other developers will implement these kinds of safeguards remains to be seen. Dr Au Yeung acknowledges the risk that some safety benchmarks may ultimately be “gamed” or treated as public relations exercises by bad-faith actors incentivized by profit rather than genuine concern for the public good.

    Camille Carlton, Policy Director at the Center for Humane Technology, shares similar concerns. While she places responsibility for implementing safeguards—and for harms caused by failing to implement them—with those who develop LLMs and AI technology, she also advocates for meaningful regulation and oversight.

        Ms Camille Carlton, Policy Director

    “Developers…not only have asymmetric access to information about the products they create, they also have the most control over the way the product is built, how those choices impact users downstream, and how to make changes to the product that could make it safer,” she says. However, “recent product announcements—like OpenAI claiming to prioritize kids’ safety while simultaneously launching erotic content—demonstrate that unless compelled to, these companies will not act in the public’s best interest on their own. Policymakers should support common-sense approaches that apply to other consumer products, like product liability.”

    Continuing to comment on an October 14 social media post in which OpenAI founder Sam Altman stated that the company has developed news tools and been able to “mitigate the serious mental health issues” in the current ChatGPT model and intends to incorporate erotica for ”verified adults” in December [], Ms Carlton advises against leaving developers to “grade their own homework.”

    While steps are being taken in the right direction—for example, an October 27 article from OpenAI highlights collaboration with a network of external mental health experts to improve ChatGPT’s responses in sensitive conversations []—further independent verification is needed.

    “There’s a continuous pattern of AI companies making safety claims without allowing third-party researchers to independently test and verify them,” Ms Carlton says, adding that “we need transparency about what progress has actually been made and evidence beyond anecdotal reports.”

    When it comes to the phenomenon of AI psychosis (or psychological destabilization associated with LLM use), AI may be less shoggoth and more mirror—the kind you find at a carnival, one that may amplify and distort human tendencies in ways that can be harmful.

    But whether Lovecraftian monster or carnival mirror, to Ms Carlton’s points, further empirical research and validation, transparency, and policy are needed to understand and build safeguards around LLM use and its impact on mental health. Cross-talk—between researchers, developers, mental health professionals, policymakers, and the public—will be essential for finding effective solutions that maximize its potential benefits and mitigate its potential harms.

    In the meantime, critical thinking and reasonable caution are warranted in how we use, interpret, and integrate these tools in our lives and practices.

    None declared.

    © JMIR Publications. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.Nov.2025.

    Continue Reading

  • Microsoft, Nvidia invest in Anthropic in cloud services deal | Technology News

    Microsoft, Nvidia invest in Anthropic in cloud services deal | Technology News

    The announcement underscores AI industry’s insatiable appetite for computing power as companies race to build systems that can rival or surpass human intelligence.

    Microsoft and Nvidia plan to invest in Anthropic under a new tie-up that includes a $30bn commitment by the Claude maker to use Microsoft’s cloud services, the latest high-profile deal binding together major players in the AI industry.

    Nvidia will commit up to $10bn to Anthropic and Microsoft up to $5bn, the companies said on Tuesday, without sharing more details.

    Recommended Stories

    list of 4 itemsend of list

    A person familiar with the matter said both the companies have committed to investing in Anthropic’s next funding round.

    The announcement underscores the AI industry’s insatiable appetite for computing power as companies race to build systems that can rival or surpass human intelligence. It also ties major OpenAI-backer Microsoft, as well as key AI chip supplier Nvidia, closer to one of the ChatGPT maker’s biggest rivals.

    “We’re increasingly going to be customers of each other. We will use Anthropic models, they will use our infrastructure and we’ll go to market together,” Microsoft CEO Satya Nadella said in a video. He added that OpenAI “remains a critical partner”.

    The move comes weeks after OpenAI unveiled a sweeping restructuring that moved it further away from its non-profit roots, giving it greater operational and financial freedom.

    The startup has since then announced a $38bn deal to buy cloud services from Amazon.com as it reduces reliance on Microsoft. Its CEO, Sam Altman, has said OpenAI is committed to spending $1.4 trillion to develop 30 gigawatts of computing resources – enough to roughly power 25 million US homes.

    Still, three years after ChatGPT’s debut, investors are increasingly uneasy that the AI boom has outrun fundamentals. Some business leaders have noted that circular deals – in which one partner props up another’s revenue – add to the bubble risk.

    “The main feature of the partnership is to reduce the AI economy’s reliance on OpenAI,” D A Davidson analyst Gil Luria said of Tuesday’s announcement.

    “Microsoft has decided not to rely on one frontier model company. Nvidia was also somewhat dependent on OpenAI’s success and is now helping generating broader demand.

    AI industry consolidating

    Founded in 2021 by former OpenAI staff, Anthropic was recently valued at $183bn and has become a major rival to the ChatGPT maker, driven by the strong adoption of its services by enterprise customers.

    The Reuters news agency reported last month that Anthropic was projecting to more than double and potentially nearly triple its annualised revenue run rate to around $26bn next year. It has more than 300,000 business and enterprise customers.

    As part of Tuesday’s move, Anthropic will work with Nvidia on chips and models to improve performance and commit up to 1 gigawatt of compute using Nvidia’s Grace Blackwell and Vera Rubin hardware. Industry executives estimate that one gigawatt of AI computing can cost between $20bn and $25bn.

    Microsoft will also give Azure AI Foundry customers access to the latest Claude models, making Claude the only frontier model offered across all three major cloud providers.

    “These investments reflect how the AI industry is consolidating around a few key players,” eMarketer analyst Jacob Bourne said.

    Despite the looming deal, Microsoft shares are down 3.2 percent in midday trading. Nvidia is also trading 1.9 percent lower than at the market open, and Amazon has fallen 4 percent. Tech stocks remain under pressure after a cloud services outage earlier on Tuesday. Neither OpenAI nor Anthropic is publicly traded.

    Continue Reading

  • Klarna says AI drive has helped halve staff numbers and boost pay | Buy now, pay later

    Klarna says AI drive has helped halve staff numbers and boost pay | Buy now, pay later

    Klarna has claimed that AI-related savings have allowed the buy now, pay later company to increase staff salaries by nearly 60%, but hinted it could slash more jobs after nearly halving its workforce over the past three years.

    Chief executive Sebastian Siemiatkowski said headcount had dropped from 5,527 to 2,907 since 2022, mostly as a result of natural attrition, with departing staff replaced by technology rather than by new staff members.

    The figures add to the impact of an internal artificial intelligence programme, which had steadily reduced its use of outsourced workers including those in customer service, with technology now carrying out the work of 853 full-time staff, up from 700 earlier this year.

    It meant the company, which was founded in Sweden in 2005, had managed to increase revenues by 108% while keeping operating costs flat. Siemiatkowski told analysts on an earnings call on Tuesday that it was “pretty remarkable, and unheard of as a number, among businesses”.

    He explained that Klarna has not hired “for a few years”. However, some of the resulting cost-savings had been used to increase pay for remaining staff, with average compensation – including employee-related taxes and pension contributions – rising by 60% over the past three years.

    “We have made a commitment to our employees that all of these efficiency gains, and especially the applications of AI, should also, to some degree, come back in their pay cheques so that they are fully … incentivised [and] aligned with the investors, to drive these changes through the company.”

    Average compensation for each employee has jumped from $126,000 (£96,000) in 2022 to $203,000 today, Klarna said.

    Siemiatkowski, who is a shareholder in a number of AI firms including OpenAI and Perplexity through his family investment firm Flat Capital, said he hoped to continue increasing a metric measuring revenue per employee, suggesting a further reduction in staff numbers in the years ahead.

    “We’re now at $1.1m per employee, and we hope to continue to do that acceleration.”

    skip past newsletter promotion

    Siemiatkowski warned this week against costly investments in datacentres to power AI, telling the Financial Times that he expected the technology would become more efficient over time.

    The comments came as Klarna reported a 26% jump in revenues in the three months to the end of September to $903m, beating analysts’ expectations of $882m.

    But the Swedish business reported a $95m loss over the period, significantly higher than the $4m loss last year. Klarna said this was primarily driven by changes to accounting standards that it had to follow in the US, after its decision to list its shares on the New York stock exchange in September.

    Continue Reading

  • Journal of Medical Internet Research

    Journal of Medical Internet Research

    Scientific advances have significantly influenced the evolution of education and training in recent decades. Emerging technologies such as technology-enhanced learning and simulation-based training have played a crucial role in improving the learning experience of practitioners and have become essential in modern education systems [].

    Traditionally, surgical training has mainly focused on gaining experience through a significant number of surgeries and direct involvement, in which trainees receive less supervision from experienced surgeons as they gain competence and eventually become capable of doing surgeries independently []. This model embodies the “see one, do one, teach one” approach []. An experienced surgeon first executes a procedure, which the trainee observes. Then, under supervision, the trainee replicates the process. Finally, upon achieving competence, the trainee is expected to instruct others on how to perform it. This approach underscores the importance of direct observation, practical experience, and the ability to transmit information and expertise to future generations of medical practitioners. However, it also raises inquiries regarding the diversity of learning experiences, the consistency of the skills acquired, and the stress that it places on seasoned surgeons and trainees to quickly comprehend and transmit complex procedures involving inherent risks [,]. Acquiring and improving skills in the field of medicine are complex processes that last throughout a physician’s career. Since the 1990s, ongoing discussions have focused on enhancing teaching practices [].

    Researchers have developed various simulators and training platforms to address these challenges and the demands of an expanding spectrum of surgical operations []. These tools enable trainees to develop expertise in different surgical procedures and provide the benefit of unlimited practice opportunities, customizable difficulty levels, and cost-effective solutions that emulate the difficulties of actual surgery procedures [,]. Furthermore, these platforms offer a secure and interactive setting that promotes learning through experimentation, enabling risk-free practice. Nevertheless, there remains considerable potential to improve the effectiveness of these training setups [,].

    As technological advancements continue, interest in incorporating artificial intelligence (AI) into medical training has also increased []. AI, with its capacity to emulate certain aspects of human cognition, has the potential to enhance educational outcomes and transform traditional methods of training and teaching []. It enables the creation of the next generation of autonomous systems to execute tasks usually performed by individuals, representing a substantial advancement in computer science. Furthermore, AI algorithms could assist in enhancing conceptual understanding, facilitating virtual practice, and offering analytical feedback on performance. Through the use of data-driven insights and predictive analytics, AI has the potential to revolutionize surgical training, offering customized and efficient learning pathways.

    This scoping review aims to map and analyze current applications of AI in surgical training, assessment, and evaluation, identifying the most common surgical procedures, AI techniques, and training setups while highlighting gaps and opportunities for future research. The following research questions guided this study:

    1. What are the specific surgical procedures where AI algorithms are most frequently applied in surgical training?
    2. Which AI techniques have been used in surgical training and evaluation?
    3. How are AI techniques being used to assess and improve surgical training?
    4. How do AI applications in surgical training affect the learning curve of surgical residents and fellows?

    The paper is organized as follows: the “Methods” section outlines the methodology used to carry out this scoping review. The “Results” section provides a comprehensive overview of the findings, shows additional findings, and identifies potential areas for opportunity. The “Discussion” section presents an outline of the research questions, shows additional findings, identifies potential areas for opportunity, acknowledges the limits of the current review, and concludes with final thoughts and directions for future research in the realm of AI in surgical education.

    Although there are different definitions and approaches to what AI is, this study is particularly interested in Russell and Norvig’s [] approach to systems that act rationally, that is, systems that act intelligently and rationally, ideally in the best possible way given the available information. AI is a disruptive technology that is reshaping education, facilitating a shift toward more efficient teaching protocols []. It enables machines to imitate various complex human skills, and AI-based techniques are typically employed in the following areas:

    • Expert systems “emulate the behavior of a human expert within a well-defined, narrow domain of knowledge” [].
    • Intelligent tutoring systems emulate “model learners’ psychological states to provide individualized instruction. They… help learners acquire domain-specific, cognitive, and metacognitive knowledge” [].

    AI can be subdivided into machine learning (ML), which further includes deep learning (DL). ML aims to “perform intelligent predictions based on a data set” []. It uses statistical, data mining, and optimization methods to design models that can identify patterns and make predictions with higher precision than human experts. In this field, there are 3 fundamental ML paradigms:

    • Supervised learning uses input data and their matching labeled output to train models []. A labeled output is data that has been assigned labels to add context; consequently, the objective of supervised learning is to learn and predict outputs for unseen data based on the initial input-output pairs.
    • Unsupervised learning involves working with unlabeled data []. The algorithms autonomously attempt to discern patterns and relationships within the data.
    • Reinforcement learning uses an autonomous entity known as an agent, which learns to make decisions by performing activities inside an environment to reach a specific objective []. The feedback the agent receives in the form of rewards or penalties serves as a guide as it iteratively refines its strategy to achieve optimal performance.

    Finally, DL is a branch of machine learning that uses artificial neural networks to replicate the sophisticated processes of the human brain []. Algorithms in this category learn to identify patterns and comprehend large datasets. DL is highly efficient because it can automatically extract and learn high-level characteristics from data, reducing the need for manual feature selection. It excels at handling complex tasks such as image and audio recognition, natural language processing, image generation, and data-driven prediction.

    Numerous models have been developed within AI to address challenging problems and tasks across different sectors and research fields. Each approach provides certain advantages specific to the type of data to be processed and the analytical needs (see ).

    Textbox 1. Approaches and advantages specific to the type of data to be processed and analytical needs.
    • Regression analysis forecasts a continuous output by considering one or more predictor variables [].
    • Cluster analysis methods group similar items based on shared characteristics. These algorithms help identify patterns within the data [].
    • Support vector machine (SVM) categorizes data by identifying the optimal boundary that divides distinct groups [].
    • Decision trees analyze data by using a series of questions and rules, resulting in the generation of predictions or classifications [].
    • Random forest (RF) uses a set of decision trees to enhance predictive precision and mitigate overfitting, a phenomenon in which predictions are accurate for training data but not for new data [].
    • Bayesian networks model the relationships and dependencies among variables using probability theory []. They are represented through a directed acyclic graph. This approach facilitates the prediction of outcomes based on established conditions.
    • Markov models represent the transitions between states in a system using probabilities []. They are characterized by the Markov property, where the future state depends only on the current state and not on the sequence of events that preceded it.
    • Fuzzy systems are based on fuzzy logic, which extends classical Boolean logic to handle the concept of partial truth, where truth values can range between completely true and completely false [].
    • Neural networks (NNs) are inspired by the human brain. They rely on interconnected nodes to process data and detect connections []. This model can be subdivided based on its specific use.
      • Convolutional neural networks (CNNs) process data that displays a grid-like structure, such as images [].
      • Recurrent neural networks (RNNs) predict sequences []. They use their internal state (memory) to process sequences of inputs, such as language or time series data.
      • Long short-term memory (LSTM) networks are a type of RNN that can learn long-term dependencies []. They are ideal for activities that require comprehension of long sequences.
      • Deep neural networks (DNNs) consist of multiple interconnected layers of neurons []. These networks can learn from extensive amounts of data and detect complex patterns.
      • Transformers are a type of network that relies on self-attention mechanisms, allowing it to weigh the importance of different parts of the input data [].
      • Large language models (LLMs) are advanced types of networks that have been trained on vast datasets of words and sentences []. They produce coherent, human-like responses to written text by selecting the most probable next words.

    These AI models highlight the potential of this technology in educational contexts. The United Nations Educational, Scientific and Cultural Organization indicates that digital technologies have the potential to complement, enrich, and transform education, aligning with the United Nations’ Sustainable Development Goal 4 (SDG 4) for education and providing universal access to learning []. Consequently, the integration of AI in surgical training could boost independence, self-study, engagement, and motivation.

    Overview

    This review adheres to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews; see ) statement, designed for publications in the health and medical sciences []. The review process was organized following a structured protocol consisting of four stages: (1) planning, which involved establishing the criteria for the search and databases to be used; (2) conducting, which entailed performing the search and applying filters for the scoping review; (3) reporting, which included compiling the studies that met the criteria and were included in the review. During stages 1 and 2, the research papers were compiled, and the initial screening process was conducted, focusing solely on papers that fall within the scope of the review and were published in peer-reviewed scientific journals. Stage 3 consists of identifying the main characteristics that distinguish the contributions and unique features of each article that has passed the initial screening process. Subsequently, the necessary analysis was performed to present the summary of the research and compile tables and figures. The starting date for stages 1 and 2 of the scoping review was February 27, 2024, and it concluded on March 18, 2024.

    Information Sources

    A total of 3 databases were selected to search for relevant studies: PubMed, Scopus, and Web of Science. The inclusion of Web of Science and Scopus databases consolidates information from other sources, such as IEEE Xplore, ScienceDirect, and SpringerLink. Therefore, they expand the scope of accessible academic literature. These platforms also provide search and analytical tools, making it easier to find pertinent studies and analyze trends. By using the 3 databases, the review considered articles with different AI models beyond the limitation of just focusing on clinical trials. By implementing this procedure, the scope of the review is expanded, enabling the identification of significant manuscripts to identify areas of opportunity in the field.

    Search Strategy

    A total of 4 keywords related to AI concepts and 4 keywords related to surgical training were selected based on the research questions. The selected keywords were converted into search strings and processed to be compatible with the advanced search tool of each database. shows the search strings used in this scoping review.

    Table 1. Search strings used in the advanced search tools of PubMed, Web of Science, and Scopus.
    Database String of keywords
    PubMed (“Artificial Intelligence”[MeSH] OR “AI” OR “machine learning” OR “deep learning”) AND (“Surgical Training” OR “surgical education” OR “surgical assessment” OR “surgical evaluation”)
    Web of Science TS = ((“artificial intelligence” OR “AI” OR “machine learning” OR “deep learning”) AND (“surgical training” OR “surgical education” OR “surgical assessment” OR “surgical evaluation”))
    Scopus (TITLE-ABS-KEY(“artificial intelligence” OR “AI” OR “machine learning” OR “deep learning”) AND TITLE-ABS-KEY (“surgical training” OR “surgical education” OR “surgical assessment” OR “surgical evaluation”))

    Eligibility Criteria

    Records retrieved from the initial search were examined to verify their compliance with the eligibility criteria and their alignment with the research questions ().

    Textbox 2. Eligibility criteria.

    The inclusion criteria for this review were limited to:

    • Studies published from January 2020 to March 2024 were reviewed to ensure the review covers the most recent advancements in artificial intelligence (AI) applications in surgical training.
    • Full-text articles available in English to allow thorough review and analysis.
    • Studies that focus on the application of AI in surgical training and evaluation, aligning with the research questions.

    For the exclusion criteria, this review applied the following criteria:

    • Studies not centered on the application of AI to assess or evaluate surgical training.
    • Nonscientific journal publications, non–full-text articles available online, and preprints.

    Data Charting and Synthesis

    After the inclusion and exclusion criteria had been applied during screening, data were charted for each included study covering three dimensions: (1) the surgical procedure (eg, laparoscopy, minimally invasive surgery, neurosurgery, and arthroscopy), (2) the AI model (eg, support vector machine [SVM], convolutional neural network [CNN], deep neural network [DNN], long short-term memory [LSTM], and transformers), and (3) the training setup (eg, simulation platforms, box trainers, surgical video analysis, in-vivo settings, virtual reality, and da Vinci system). These variables structured the subsequent evidence synthesis and guided the organization of results by procedure, technique, and setup. In addition, bibliographic fields, including year of publication and type of publication, were also charted to support descriptive reporting in the Results section. This structured approach enabled a descriptive and narrative synthesis aimed at elucidating how AI contributes to educational outcomes and skill acquisition in surgical training.

    Search Results and Study Selection

    presents the PRISMA-ScR flow diagram illustrating the complete selection process. The initial search identified 1400 records: 545 from PubMed, 288 from Web of Science, and 567 from Scopus, obtained using the search strings described in . After applying the publication date range from January 2020 to March 2024, a total of 461 records were excluded, leaving 939 for further screening. Duplicate removal eliminated 363 records, yielding 576 unique studies.

    Figure 1. Flow diagram of the scoping review process, illustrating the inclusion and exclusion criteria. AI: artificial intelligence; LLM: large language model.

    Subsequent filtering was conducted in stages to ensure methodological rigor and relevance. Database parameters were adjusted to retain only peer-reviewed journal articles and conference proceedings, excluding 260 reviews and 36 editorials that did not meet the inclusion criteria. A total of 280 records proceeded to qualitative screening. During this stage, the relevance of each article to the review objectives was reassessed. This process excluded 76 studies that, despite meeting database filters, were secondary reviews, surveys, or editorials; 9 non-English papers, 7 papers focused on nonsurgical training, and 18 papers described simulator development or validation without AI integration. Additional exclusions comprised 1 duplicate, 3 studies addressing “Data Collection Systems,” 11 centered on “LLMs in Non-Surgical Education,” and 99 that did not provide sufficient information about AI-enhanced surgical training. This filtering process excluded 224 additional studies, leaving 56 studies for the final synthesis and analysis.

    The characteristics of the 56 included studies are summarized in , organized across five domains: (1) surgical procedure (eg, laparoscopy, minimally invasive surgery [MIS], neurosurgery, and arthroscopy), (2) year of publication, (3) type of publication, (4) AI technique or model used (eg, SVM, CNN, DNN, LSTM, and transformers), and (5) training setup (eg, simulation platforms, box trainers, da Vinci system, surgical video analysis, and in vivo or virtual-reality environments). This structure enables direct comparison across specialties and methodological approaches, while supporting a descriptive and narrative synthesis of cross-cutting trends.

    Across the included studies, MIS, neurosurgery, and laparoscopy represented the majority of AI applications. ML and DL techniques were the most frequently used computational approaches, while simulation environments and box trainers constituted the primary training configurations. Collectively, these trends indicate a primary emphasis on risk-managed training environments that leverage accessible kinematic and video data. However, heterogeneity in studies and limited standardization of outcome measures remain persistent challenges, underscoring the need for unified evaluation frameworks in the future.

    Table 2. Characteristics of included studies: surgical procedures, artificial intelligence (AI) techniques, and training setups.
    Classification and references Year Type AI model Setup
    MISa skills
    Rashidi et al [] 2023 Journal Fuzzy systems Box trainer
    Fathabadi et al [] 2022 Conference Fuzzy systems Box trainer
    Deng et al [] 2021 Conference CNNb Box trainer
    Kulkarni et al [] 2023 Journal Clustering Box trainer
    Wu et al [] 2021 Journal MLc (unspecified) da Vinci system
    Brown and Kuchenbecker [] 2023 Journal Regression analysis da Vinci system
    Keles et al [] 2021 Journal ML (unspecified) Box trainer
    Koskinen et al [] 2020 Journal SVMd Box trainer
    Kasa et al [] 2022 Journal DLe (unspecified) Box trainer
    Gao et al [] 2020 Journal Clustering Box trainer
    Baghdadi et al [] 2020 Journal Clustering Box trainer
    Benmansour et al [] 2023 Journal CNNf+LSTMg da Vinci system
    Yanik et al [] 2023 Journal CNN Box trainer
    Lee et al [] 2024 Journal Markov chains Simulation training
    Hung et al [] 2023 Journal CNN+LSTM Simulation training
    Neurosurgery
    Ledwos et al [] 2022 Journal Clustering Simulation training
    Mirchi et al [] 2020 Journal SVM Simulation training
    Yilmaz et al [] 2024 Journal AI (unspecified) Simulation training
    Siyar et al [] 2020 Journal SVM Simulation training
    Reich et al [] 2022 Journal NNh Simulation training
    Natheir et al [] 2023 Journal ML (unspecified) Simulation training
    Siyar et al [] 2020 Journal Clustering Simulation training
    Yilmaz et al [] 2022 Journal DNNi Simulation training
    Fazlollahi et al [] 2022 Journal Tutoring system (unspecified) Simulation training
    Du et al [] 2023 Journal SVM Simulation training
    Dhanakshirur et al [] 2023 Conference CNN Training station
    Laparoscopy
    Kuo et al [] 2022 Journal DL (unspecified) Box trainer
    Shafiei et al [] 2023 Journal ML (unspecified) da Vinci system
    Lavanchy et al [] 2021 Journal CNN In-vivo setting
    Ryder et al [] 2024 Journal ML (unspecified) In-vivo setting
    Halperin et al [] 2024 Journal DL (unspecified) Box trainer
    Ebina et al [] 2022 Journal SVM Box trainer
    Hamilton et al [] 2023 Journal AI (unspecified) Training station
    Adrales et al [] 2024 Journal ML (unspecified) Surgical video
    Wang et al [] 2023 Conference AI (unspecified) Surgical video
    Arthroscopy
    Mirchi et al [] 2020 Journal NN Simulation training
    Alkadri et al [] 2021 Journal NN Simulation training
    Shedage et al [] 2021 Conference Clustering Simulation training
    Ophthalmology
    Tabuchi et al [] 2022 Journal AI (unspecified) Surgical video
    Wang et al [] 2022 Journal DNN Surgical video
    Dong et al [] 2021 Journal ML (unspecified) Surgical video
    Robotic-assisted surgery
    Simmonds et al [] 2021 Journal Clustering Simulation training
    Kocielnik et al [] 2023 Conference DL (unspecified) da Vinci system
    Wang et al [] 2023 Journal Bayesian network da Vinci system
    Open surgery
    Bkheet et al [] 2023 Journal DL (unspecified) Surgical video
    Kadkhodamohammadi et al [] 2021 Journal CNN Surgical video
    Surgery
    Papagiannakis et al [] 2020 Conference ML (unspecified) Simulation training
    Thanawala et al [] 2022 Journal ML (unspecified) Case logs
    Surgery skills
    Sung et al [] 2020 Journal CNN Simulation training
    Khan et al [] 2021 Journal ML (unspecified) Motion data
    Otolaryngology
    Lamtara et al [] 2020 Conference ML (unspecified) Simulation training
    Orthopedics
    Sun et al [] 2021 Journal ML (unspecified) Surgical video
    Plastic surgery
    Kim et al [] 2020 Conference DL (unspecified) Medical images
    Radiology
    Saricilar et al [] 2023 Journal NN Simulation training
    Urology
    Kiyasseh et al [] 2023 Journal Transformer Surgical video
    Vascular surgery
    Guo et al [] 2020 Journal SVM+RFj Slave controller

    aMIS: minimally invasive surgery.

    bCNN: convolutional neural network.

    cML: machine learning.

    dSVM: support vector machine.

    eDL: deep learning.

    fCNN: convolutional neural network.

    gLSTM: long short-term memory.

    hNN: neural network.

    iDNN: deep neural network.

    jRF: random forest.

    Findings and Interpretation

    Specific Surgical Procedures

    The scoping review reveals the range of surgical procedures where AI algorithms are being used (see ). The analysis emphasizes the integration of AI in MIS skills (27%, 15/56) [-], neurosurgery (20%, 11/56) [-], and laparoscopy (16%, 9/56) [-] (see ). Moderate representation was observed in arthroscopy (5%, 3/56) [-], ophthalmology (5%, 3/56) [-], and robot-assisted surgery (5%,3/56) [-]. Several other domains appeared less frequently, including open surgery (4%, 2/56) [,], general surgery (4%, 2/56) [,], and surgery skills (4%, 2/56) [,]. Finally, isolated studies were identified in otolaryngology (2%, 1/56) [], orthopedics (2%, 1/56) [], plastic surgery (2%, 1/56) [], radiology (2%, 1/56) [], urology (2%, 1/56) [], and vascular surgery (2%, 1/56) [].

    Table 3. Frequency of medical fields in the included articles (N=56).
    Specialty Included articles, n (%)
    MISa skills 15 (27)
    Neurosurgery 11 (20)
    Laparoscopy 9 (16)
    Arthroscopy 3 (5)
    Ophthalmology 3 (5)
    Robot-assisted surgery 3 (5)
    Open surgery 2 (4)
    Surgery 2 (4)
    Surgery skills 2 (4)
    Otolaryngology 1 (2)
    Orthopedy 1 (2)
    Plastic surgery 1 (2)
    Radiology 1 (2)
    Urology 1 (2)
    Vascular surgery 1 (2)

    aMIS: minimally invasive surgery.

    Functionally, most studies focused on automated skill assessment and learning-curve analysis, while comparatively few examined procedure guidance, workflow recognition, or decision support. This trend was especially evident in MIS and laparoscopy, which relied heavily on video-centric datasets and computer-vision models [-,-], and in neurosurgery, where virtual reality simulators provided standardized training environments and feedback mechanisms [-]. The specialty distribution appears to be driven by the availability of high-quality labeled data. Overall, the distribution of specialties indicates that AI integration aligns strongly with domains that generate structured, labeled, and reproducible data, such as endoscopic or robotic procedures. By contrast, open and specialty surgeries remain underrepresented, constrained by the limited standardization of datasets and variability in operative workflows. Future progress will depend on developing shared, procedure-specific repositories, cross-institutional benchmarks, and multimodal data capture beyond video and kinematic streams to enhance generalizability and educational impact [-].

    AI Techniques Used

    The scoping review identified a diverse set of AI techniques in surgical training (see ). The most frequent were ML (unspecified; 21%, 12/56) [,,,,,,,,,-], clustering (13%, 7/56) [,,,,,,], and CNNs (11%, 6/56) [,,,,,]. We also observed DL (unspecified; 11%, 6/56) [,,,,,] and SVMs (9%, 5/56) [,,,,], followed by neural networks (NNs; 7%, 4/56) [,,,] and AI (unspecified; 7%; 4/56) [,,,]. Additional categories included CNN+LSTM (4%, 2/56) [,], DNNs (4%, 2/56) [,], and fuzzy systems (4%, 2/56) [,]. Single-study categories (2%, 1/56) included regression analysis [], Markov chains [], tutoring system (unspecified) [], Bayesian network [], transformer [], and SVM+RF [].

    Table 4. Application of artificial intelligence (AI) techniques in the included articles (N=56).
    AI technique Included articles, n (%)
    MLa (unspecified) 12 (21)
    Clustering 7 (13)
    CNNsb 6 (11)
    DLc (unspecified) 6 (11)
    SVMsd 5 (9)
    NNse 4 (7)
    AI (unspecified) 4 (7)
    CNN+LSTMf 2 (4)
    DNNsg 2 (4)
    Fuzzy systems 2 (4)
    Regression analysis 1 (2)
    Markov chains 1 (2)
    Tutoring system (unspecified) 1 (2)
    Bayesian network 1 (2)
    SVM+RFh 1 (2)
    Transformer 1 (2)

    aML: machine learning.

    bCNN: convolutional neural network.

    cDL: deep learning.

    dSVM: support vector machine.

    eNN: neural network.

    fLSTM: long short-term memory.

    gDNN: deep neural network.

    hRF: random forest.

    From 2020 to 2024 (see ), ML (unspecified) appears every year, CNNs strengthen in 2021 and 2023, and DL (unspecified) is present in 2020 and 2022-2024. Sequential and hybrid models (CNN+LSTM and DNNs) clusters in 2022-2023. AI (unspecified) emerges from 2022 onward. Probabilistic and rule-based approaches (Bayesian networks, fuzzy systems, and Markov chains) and transformer/SVM+RF appear as single-study categories. Overall, the technique mix tracks data modality and availability (video and kinematics), reinforcing the need for shared multimodal repositories and standardized evaluation metrics to compare methods fairly and improve external validity.

    Table 5. Temporal distribution of artificial intelligence (AI) models in the included articles (2020-2024).
    AI model 2020, n (%) 2021, n (%) 2022, n (%) 2023, n (%) 2024, n (%) Total, n (%)
    MLa (unspecified) 2 (17) 5 (42) 1 (8) 2 (17) 2 (17) 12 (100)
    CNNb 1 (17) 3 (50) 0 (0) 2 (33) 0 (0) 6 (100)
    Clustering 3 (43) 2 (28) 1 (14) 1 (14) 0 (0) 7 (100)
    SVMc 3 (60) 0 (0) 1 (20) 1 (20) 0 (0) 5 (100)
    DLd (unspecified) 1 (17) 0 (0) 2 (33) 2 (33) 1 (17) 6 (100)
    NNe 1 (25) 1 (25) 1 (25) 1 (25) 0 (0) 4 (100)
    AI (unspecified) 0 (0) 0 (0) 1 (25) 2 (50) 1 (25) 4 (100)
    DNNf 0 (0) 0 (0) 2 (100) 0 (0) 0 (0) 2 (100)
    CNN+LSTMg 0 (0) 0 (0) 0 (0) 2 (100) 0 (0) 2 (100)
    Fuzzy systems 0 (0) 0 (0) 1 (50) 1 (50) 0 (0) 2 (100)
    Bayesian network 0 (0) 0 (0) 0 (0) 1 (100) 0 (0) 1 (100)
    Markov chains 0 (0) 0 (0) 0 (0) 0 (0) 1 (100) 1 (100)
    Regression analysis 0 (0) 0 (0) 0 (0) 1 (100) 0 (0) 1 (100)
    SVM+RFh 1 (100) 0 (0) 0 (0) 0 (0) 0 (0) 1 (100)
    Transformer 0 (0) 0 (0) 0 (0) 1 (100) 0 (0) 1 (100)
    Tutoring system (unspecified) 0 (0) 0 (0) 1 (100) 0 (0) 0 (0) 1 (100)
    Total per year 12 (21) 11 (20) 11(20) 17 (30) 5 (9) 56 (100)

    aML: machine learning.

    bCNN: convolutional neural network.

    cSVM: support vector machine.

    dDL: deep learning.

    eNN: neural network.

    fDNN: deep neural network.

    gLSTM: long short-term memory.

    hRF: random forest.

    In the analyzed studies, the number of publications increased from 12 in 2020 to 17 in 2023, with 11 in both 2021 and 2022, and 5 in 2024. The literature search concluded on March 18, 2024, which likely accounts for the lower count in 2024. These totals are summarized in the “Total per year” row of .

    Application of AI Techniques

    AI techniques have been applied across diverse training setups, enhancing both learning experiences and performance assessment in surgical procedures (see ). The most frequent environments were simulation training (36%, 20/56) [-,-,,,,,] and box trainers (23%, 13/56) [40–43,46-50,52,66,70-71], followed by surgical video analysis (16%, 9/56) [,,-,,,,] and robotic systems using the da Vinci platform (11%, 6/56) [,,,,,]. Less frequent configurations included training stations (4%, 2/56) [,] and in-vivo settings (4%, 2/56) [,], with single-study setups for case logs [], motion data [], medical images [], and a slave controller [] (each 2%, 1/56). Across these settings, studies reported the use of automated skill assessment, formative feedback, and adaptive progression, supported by video, kinematic, and performance-metric streams.

    Over time, setup diversity increased, peaking in 2023 (see ). Simulation training and box trainers were consistently present, while surgical video and da Vinci deployments clustered in 2021-2023. These patterns mirror data availability and standardization in risk-managed environments, where AI can be trained and evaluated reliably.

    Table 6. Distribution of training setups in the included articles (N=56).
    Training setup Included articles, n (%)
    Simulation training 20 (36)
    Box trainer 13 (23)
    Surgical video 9 (16)
    da Vinci System 6 (11)
    Training station 2 (4)
    In-vivo setting 2 (4)
    Case logs 1 (2)
    Motion data 1 (2)
    Medical images 1 (2)
    Slave controller 1 (2)
    Figure 2. Appearance of setups over the years in the included articles.

    Principal Findings

    This section discusses the study’s implications and contributions to the field. The review maps and analyzes current applications of AI in surgical training, assessment, and evaluation, identifying the most common surgical procedures, AI techniques, training setups, and highlighting gaps and opportunities for future research. The results show that AI is most frequently reported in data-rich, risk-mitigated environments, notably simulation training and box-trainer setups, and that ML (unspecified) and DL (unspecified) approaches dominate model choices.

    Within these settings, many studies report models that leverage synchronized inputs, for example, kinematics, video, and other performance metrics, to classify technical skill using consistent criteria, to characterize learning trajectories across repeated attempts, and to localize performance-limiting behaviors at the level of gestures, steps, or procedural phases. When embedded in iterative practice, these capabilities may enable individualized training pathways that adjust task parameters and feedback density to a trainee’s evolving competence, with the potential to shorten time to proficiency and to reduce instructor workload. These implications are consistent with the results, in which simulation training accounted for 36% (20/56) and box trainer setups for 23% (13/56) of the included studies.

    Findings in Relation to the Research Questions

    Regarding the first research question aimed at identifying the specific surgical procedures where AI algorithms are most frequently applied in surgical training, AI use concentrates on MIS skills [-], neurosurgery [-], and laparoscopy [-]. Rather than simple frequency, the common thread across these areas is structured, high-signal data capture and well-specified tasks. Endoscopic and robotic workflows generate synchronized video, robotic kinematics, and simulator logs, which enable reproducible labels such as phase boundaries, gesture events, and Objective Structured Assessment of Technical Skills–aligned rubrics. This ecosystem lowers barriers to annotation and validation, thereby accelerating method development. Beyond these clusters, activity in ophthalmology [-], open surgery [,], robot-assisted surgery [-], and single-study specialties including radiology [], urology [], and vascular surgery [] signals a widening scope. However, these domains often face less standardized capture or a more variable field-of-view, which complicates model training and external validation. The overall distribution, therefore, appears to reflect data tractability and curricular formalization more than inherent differences in educational need.

    The second research question investigated which AI techniques have been used in surgical training and evaluation. Studies use ML (unspecified) [,,,,,,,,,-] and DL (unspecified) [,,,,,] as broad families, with task-appropriate specializations such as CNNs for video [,,,,,] and SVMs for lower-dimensional kinematics or hand-crafted features [,,,,]. NNs [,,,] support competency modeling when feature engineering is feasible, and CNN+LSTM hybrids [,] target temporal dynamics for suturing and task segmentation. DNNs are explicitly mentioned in [,]. Single-study categories (fuzzy systems [,], regression analysis [], Markov chains [], tutoring system (unspecified) [], Bayesian network [], transformers [], and SVM+RF []) illustrate exploratory breadth rather than established consensus. Consistent with coding, CNN+LSTM is treated as a distinct class and not double-counted under CNNs. No single approach emerges as universally optimal; instead, methods align with task structure (classification vs sequence prediction), signal characteristics (video and kinematics), and assessment granularity (summative scores versus frame- or gesture-level feedback).

    The third research question investigated how AI techniques are being used to assess and improve surgical training. Across setups, a common pattern is the move from retrospective, manual scoring to prospective, automated analytics that are both standardized and timely. In simulation training, synchronized streams enable immediate feedback and progression gating, which supports deliberate practice cycles grounded in objective metrics. This aligns with the preponderance of simulation studies in the dataset and the consistent application of ML and DL to transform kinematics and video into competency-linked outputs. In box trainers, models quantify motion economy, tool path quality, and task efficiency, enabling skill stratification and targeted coaching [-,-,,,,]. In robotic systems on the da Vinci platform, studies demonstrate automated assessment, uncertainty-aware feedback, and domain adaptation for cross-site or cross-task transfer [,,,,,]. In surgical video pipelines, investigators focus on procedural understanding, ergonomics, and fine-grained performance analytics [,,-,,,,]. The unifying mechanism across these contexts is measurement at scale that reduces feedback latency, increases consistency, and enables adaptive progression rules without displacing instructor oversight.

    Finally, the last research question investigated the way in which AI applications in surgical training affect the learning curve of surgical residents and fellows. Multiple studies report outcomes consistent with accelerated learning and improved technical performance under AI-enabled training. This includes predictive modeling of progression [], metric selection and learning-curve characterization in simulation [], a randomized comparison of feedback modalities [], competency-based training backed by neural models [], continuous monitoring of bimanual expertise with deep models [], and competency estimation in laparoscopic training []. Evidence from robotic contexts shows that automated assessment can structure practice with short feedback loops []. That said, effect sizes remain difficult to aggregate due to heterogeneous study designs, small sample sizes, nonstandard outcome measures, and limited external validation. The most defensible interpretation is that personalized, data-driven feedback and objective, repeated measurement are plausible mechanisms for the observed gains, with further multicenter validation needed to establish generalizability and durability.

    The findings suggest that current AI deployment in surgical training follows data availability and standardization, that ML/DL with video and kinematics are dominant because they best match that data, and that automated, timely feedback is the primary lever through which AI influences performance and learning. Where capture is less standardized or external validation is sparse, adoption tends to lag. This synthesis directly motivates the recommendations presented later in the Discussion section on common benchmarks, transparent reporting, and SDG 4–aligned scalability.

    Comparison With Previous Work

    Systematic literature reviews in surgical training found in the literature have focused on specific training methods (eg, simulation-based training) or on specific types of surgery (eg, plastic surgery and orthopedic surgery) rather than providing a cross-specialty map of AI methods for training, assessment, and evaluation. Reviews focused on simulation-based training within specific domains underscore this pattern. Lawaetz et al [] examined simulation-based training and assessment in open vascular surgery, cataloguing common methods and commenting on effectiveness within that context. Abelleyra Lastoria et al [] surveyed simulation-based tools in plastic surgery and concluded that the validity of many approaches requires further investigation. Woodward et al [] reached a similar conclusion in orthopedic surgery, noting concerns about the construct validity and methodological rigor of simulation studies. Reviews centered on robotic-assisted surgery also reflect divergent emphases: Rahimi et al [] provided a descriptive overview of training modalities and assessment practices, whereas Boal et al [] explicitly scrutinized AI methods for technical skills in robotic surgery and highlighted that both manual and automated assessment tools are often insufficiently validated.

    Closer to the scope of the present scoping review, several analyses have examined automation and AI across surgical training tasks. Levin et al [] identified families of automated technical skill assessment methods, including computer vision, motion tracking, ML and DL, and performance classification, but did not synthesize evidence on educational effectiveness. Lam et al [] focused specifically on ML methods and reported accuracy rates that generally exceeded 80 percent across included studies, offering a performance-oriented view rather than a training-context analysis. Pedrett et al [] emphasized the central role of video-derived motion and robotic kinematic data as inputs to AI models for technical skill assessment in minimally invasive surgery, reinforcing the importance of structured, high-signal data streams.

    Findings from the present review are consistent with these previous observations in several respects. First, the centrality of simulation and other risk-managed environments recurs across literature, reflecting where ground truth is tractable and measurement can be standardized. Second, many reviews identify validation gaps, noting that reported metrics, dataset partitions, and labeling practices vary widely, which complicates comparison across sites and inhibits external generalizability [-]. Third, there is broad agreement that AI-assisted assessment is advancing rapidly in robotic and minimally invasive settings; yet, many frameworks remain descriptive or single-center, and their educational impact is not consistently established with robust designs [-].

    At the same time, this review differs from earlier work in several ways. The scope extends across specialties and across training setups, linking procedures, techniques, and use cases in a single comparative framework. Rather than isolating a single algorithm family or specialty, the analysis connects the dominant AI techniques to the data modalities they exploit and to the assessment functions they serve. This mapping clarifies why ML and DL approaches, particularly CNN-based and hybrid temporal models, are prevalent where high-quality video and kinematics are available, and why adoption is slower where capture is less standardized. In addition, the review integrates signals relevant to learning curves, highlighting studies that associate AI-enabled feedback with improvements in proficiency trajectories, while also acknowledging heterogeneity and the need for external validation. By taking this comparative perspective, the review identifies shared deficiencies that cut across specialties, including nonstandard outcome measures, limited transparency in algorithmic reporting, and sparse multicenter testing, and points toward future work on benchmarks, interoperable data schemas, and scalable deployment aligned with SDG 4.

    Whereas previous reviews have been primarily domain-specific or method-specific, this scoping review offers a cross-specialty synthesis that links where AI is used, which techniques are used, and how they are used to support training and assessment. This perspective complements existing literature by emphasizing comparability across contexts, illuminating mechanisms by which AI influences learning, and articulating the methodological steps needed to translate promising prototypes into reproducible, generalizable, and educationally meaningful tools.

    Strengths and Limitations

    This scoping review offers a broad, cross-specialty perspective on the application of AI in surgical training, assessment, and evaluation. It maps procedures, techniques, and training setups within a single comparative framework, which supports interpretation across contexts rather than within a single specialty. The review adheres to PRISMA-ScR guidance, applies explicit inclusion and exclusion criteria, and uses transparent counting rules that assign each study a primary AI technique and a primary setup to avoid double-counting. Results are presented as both narrative synthesis and structured summaries. The Discussion integrates an SDG 4 perspective, offering concrete implementation considerations related to access, scalability, and equity. Together, these elements provide a panoramic view of where AI is currently deployed, why certain methods dominate in specific data environments, and how these choices influence assessment and feedback in practice.

    Several constraints should be considered. First, the search was limited to English-language publications and to the period ending March 18, 2024, which may omit relevant work outside this window. Second, many articles describe methods only at a general label level (AI, ML, and DL) without specifying architectures or training details, which limits interpretability and reproducibility. Third, the evidence base is concentrated in simulation, box-trainer, and video-centric settings, which may not fully capture transfer to live clinical performance, patient outcomes, or longer-term retention. Fourth, external validation is limited, as relatively few studies report multicenter testing, performance under domain shift, subgroup analyses, or calibration, which constrains confidence in portability.

    To address these limitations, educational outcomes should also be mapped to recognized competency frameworks and reported with standardized metrics that enable replication and meta-synthesis. When multisetup or multi-technique pipelines are used, authors should specify proportional attribution. Reporting on access, resource requirements, and cost per trainee hour will support the deployment and equity assessment of SDG 4. Multicenter collaborations that release shared benchmarks and interoperable datasets will be necessary to improve reproducibility and to allow fair comparisons across techniques and settings.

    Future Work Recommendations

    This scoping review identified current applications of AI in surgical education and highlighted priority areas for further work. As summarized in and visualized in , a large proportion of studies focus on simulation training [-,-,,,,,], representing 36% (20/56) of the included articles. This concentration reflects the suitability of simulation for controlled data capture and iterative practice. Building on this foundation, AI can enhance simulation-based training with realistic, adaptive, and personalized learning experiences [,], while also enabling standardized and rapid feedback that supports deliberate practice.

    Advances in computer vision are particularly significant where high-quality video and kinematic data are accessible, which aligns with the prevalence of simulation and box-trainer studies in the included literature. In these regulated, risk-mitigated environments, AI systems can produce timely and structured feedback linked to defined competency frameworks, including economy of motion, bimanual coordination, camera control, tissue handling, and ergonomics, thereby facilitating deliberate practice. Although natural language processing technologies are less represented in the current review, their growing maturity suggests near-term opportunities to integrate narrative guidance, rubric-based feedback, and reflective prompts alongside quantitative metrics, provided such outputs are aligned with curricular objectives and are appropriately validated.

    Future efforts should pursue 5 complementary directions.

    First, strengthen external validity. Studies should include multi-institution cohorts, predefined external test sets, and reporting of performance under domain shift, including different camera views, instruments, and case difficulty. Where feasible, researchers should evaluate the transfer from simulation or bench-top tasks to higher-fidelity or clinical settings with clearly specified outcome measures and follow-up intervals.

    Second, standardize educational outcomes. Investigators should map AI outputs to recognized competency frameworks and report validity, reliability, learning curve parameters, and time to competency with consistent definitions. Agreement on core outcome sets will enable comparison across techniques and facilitate meta-synthesis.

    Third, expand the breadth and transparency of data. New work should prioritize multimodal capture that combines video, kinematics, tool telemetry, where appropriate, eye tracking or physiological signals. Public or data-sharing consortia should release interoperable schemas, labeling protocols, and benchmark tasks that are specific to procedures and skill elements. Clear descriptions of models and training and validation splits will improve reproducibility.

    Fourth, improve usability, equity, and scalability in alignment with SDG 4. Models should operate on standard hardware, interoperate with existing simulators and video platforms, and function reliably in low-bandwidth or offline environments. Reporting of access, installation steps, resource needs, and cost per trainee hour will support adoption in diverse settings. Interfaces should disclose uncertainty, make feedback interpretable, and integrate into educator workflows without adding undue burden.

    Fifth, broaden methodological scope responsibly. There is an opportunity to study natural language technologies for rubric-based guidance, structured debriefs, and reflective prompts, provided outputs are aligned with curricular objectives and validated for educational use. Prospective trials that compare feedback modalities and density, and that measure downstream retention and transfer, will clarify how AI should be integrated pedagogically.

    Together, these directions could move the field from promising prototypes toward reproducible, generalizable, and educationally meaningful tools that improve surgeon training while supporting equitable access to high-quality education.

    Conclusions

    This scoping review maps current applications of AI in surgical training, assessment, and evaluation across procedures, techniques, and training setups. From 1400 records, 56 studies met the inclusion criteria, with activity concentrated in minimally invasive surgery, neurosurgery, and laparoscopy. AI is most frequently deployed in data-rich, risk-mitigated environments, particularly simulation training and box trainers, where synchronized video and kinematic streams support objective measurement and timely feedback. Technique choices reflect these data conditions, with ML (unspecified) and DL (unspecified) methods predominating and task-specific variants, such as CNNs and hybrid temporal models, applied to video-centric problems.

    Across settings, studies describe automated skill assessment, structured formative feedback, and adaptive progression, with several reporting improvements consistent with accelerated learning curves. At the same time, heterogeneity in study design, small samples, nonstandard outcome measures, and limited external validation constrain strong inferences about effect sizes and generalizability. The evidence, therefore, supports cautious optimism that AI-enabled feedback can enhance skill acquisition, while underscoring the need for more rigorous evaluation.

    Future work should prioritize precise reporting of models and datasets, multicenter validation, and standardized educational outcomes linked to recognized competency frameworks. Interoperable data schemes, shared benchmarks, and transparent methods will be essential to enable comparison across sites and techniques. Attention to scalability, access, and usability will support alignment with SDG 4, ensuring that benefits extend beyond well-resourced centers. With these elements in place, AI has the potential to deliver reproducible, equitable, and educationally meaningful gains in surgical training.

    We thank the Engineering Faculty, the Research Group NexEd Hub, and the Computing Department of Universidad Panamericana, Mexico City Campus. Finally, we would like to thank Rodrigo González Serna and Monserrat Villacampa Espinosa de los Monteros for their assistance during the design and creation of the flow diagram and the graphs, respectively. Generative AI was used to improve the grammar, style, and clarity of some sentences and paragraphs after initial human drafting. The authors verified all output for factual accuracy and scientific integrity. The model was not used to generate paragraphs, summaries, display charts or tables, or to analyze or interpret data. The model used was ChatGPT based on GPT-4-turbo (“omni”), the vendor is OpenAI, over the web app (chat.openai.com). There were no external funding sources for this study. Consequently, funders had no influence on the design of the study, the collection, analysis, or interpretation of data, the writing of the manuscript, or the decision to publish the results.

    We would also like to thank the Academy of Medical Sciences (AMS) for their support (NIF0041018), as this study originated from this award.

    The datasets generated or analyzed during this study are available in the AI Review – Selected Zotero group library [].

    None declared.

    Edited by T Leung, G Eysenbach; submitted 29.Mar.2024; peer-reviewed by R Yin, M Pojskic; comments to author 13.Jul.2024; revised version received 20.Oct.2025; accepted 23.Oct.2025; published 18.Nov.2025.

    ©David Escobar-Castillejos, Ari Y Barrera-Animas, Julieta Noguez, Alejandra J Magana, Bedrich Benes. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.Nov.2025.

    This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

    Continue Reading