Thank you, Director Knotek, for the kind introduction. It is a pleasure to be with you today and to be back in Cleveland and the beautiful Fourth district.1 It is always a pleasure to visit the Federal Reserve Bank of Cleveland—and all the Reserve Banks. The hard work and dedication of Reserve Bank staff, leadership, and directors help ensure the economic well-being of communities across the country, including in Ohio, Pennsylvania, West Virginia, and Kentucky. One of the many important roles the Reserve Banks have is to be the connection between policymakers and families, workers, and businesses in every corner of the country. That allows us to be well-informed and to make the best decisions for the benefit of all Americans. I thank everyone here at the Cleveland Fed for their service.
As we are gathered to discuss financial stability in an era of rapid economic and technological transformation, I would like to turn our attention to what many see as a significant catalyst of change: artificial intelligence (AI). Today, I will talk about how I approach thinking about AI. First, I will build on remarks I gave earlier this month and discuss how I see AI affecting the economy through the lens of the dual mandate given to the Federal Reserve by Congress to pursue maximum employment and price stability.2 Next, I will discuss AI in the context of financial stability because a stable and resilient financial system is, of course, critical to achieve that dual mandate. I will do so by highlighting some of the findings in our recently released Financial Stability Report (FSR). And finally, I will discuss how the current period compares with another recent period of notable technological change—the late 1990s. Afterwards, I would be happy to answer your questions.
AI technology has been advancing at an extraordinary pace in recent years.3 Hundreds of millions of people now regularly use AI to write and translate documents, obtain practical guidance on a variety of topics, and search for information.4 A recent report from the St. Louis Fed showed that in August almost 55 percent of working-age adults said they used generative AI for personal or work use, up notably from a year earlier.5 Businesses are increasingly adopting AI, with use cases clearly moving from the experimentation phase to operations. In the financial industry, AI adoption rates are even higher.6 And while it may take some time before we see its full effects, many believe that AI has the potential to generate substantial change across the economy and to be a valuable tool for people all around the world.7
AI and the Dual Mandate
I see, as many others do, the promise for AI technology to be transformative, perhaps in ways as dramatic as past technological achievements, such as the printing press, steam engine, and the internet. When I think about the macroeconomic consequences of AI, I consider how it will affect both sides of our dual mandate.
AI can enable a worker to complete in seconds or a few minutes tasks that previously took many minutes, if not hours. Already, it is boosting worker productivity in a wide range of industries and occupations, including customer service, logistics, computer programming, and medical research. Increased productivity leads to economic growth, which may also create new employment opportunities. There is already robust competition among high-tech firms for workers who possess the skills to develop and effectively deploy this technology.
At the same time, many people have legitimate concerns that AI will cause job loss. At least for certain firms and workers in certain occupations, this is likely to be true. Indeed, some large employers have recently indicated they are lowering overall hiring plans in light of advances in AI and other forms of automation.8 Some research has also suggested that AI is having a more detrimental effect on the job prospects of younger, less-experienced workers, including recent graduates, compared with those further along in their careers.9
The net effect on employment is highly uncertain and may show variation across industries and occupations over time. For the overall economy, one way to think about AI-related job losses—at least until it creates new jobs—is that AI can create economic slack. In a world where firms deployed AI only to replace existing workers, the productive capacity of the economy would increase (or stay the same) even at a lower level of overall employment. Of course, if people get redeployed to more productive tasks, the economy produces more without any additional slack. Whether AI substitutes for labor or complements it is an important question that requires continued study.
On the other side of our dual mandate—price stability—AI could help the economy achieve higher growth through increased productivity while reducing inflationary pressures. Specifically, increased productivity could lower production costs and put downward pressure on prices. Not only could AI allow a specific worker to be more efficient in their day-to-day tasks, but it also has the potential to allow for better thinking around complex problems. More efficient allocation of resources as well as potential improvements in supply chains could reduce associated costs, leading to lower prices for various goods and services. Conversely, AI could put upward pressure on certain price categories as many firms push to scale up the technology. I mentioned the rising wages for certain workers with skills that complement AI. AI technology also requires data centers, which compete with other production processes for land, energy, and other inputs. As with the labor market, the timing and degree of AI’s effect on prices are uncertain and should be monitored.
In terms of AI’s implications for monetary policy, it is likely still too soon to tell. Policymakers must sort out changes in the economy that are due to cyclical factors from those resulting from structural change, which AI may well represent. Productivity gains from AI may affect the relationship between employment and inflation and, hence, the conduct of monetary policy. Monetary policy decisions, of course, need to be made with a broad view of what is happening in the economy, not just in one sector or technology. There is still much to learn. I counsel exercising humility about the challenges in predicting AI’s effects on employment and inflation.
AI and Financial Stability
Now, I will turn to discussing AI through the lens of financial stability. But first I would like to step back and briefly offer my overall assessment. The financial system remains sound and resilient, with high levels of risk appetite and leverage at hedge funds balanced by strong household and business balance sheets and high levels of capital in the banking system. Because the continued resilience of the financial system is so important to achieving our dual-mandate goals, policymakers monitor a range of indicators associated with vulnerabilities, as is discussed in our semiannual FSR. The latest version of that report was published a few weeks ago.10
As part of our financial stability monitoring work at the Fed, we routinely survey market contacts regarding potential risks to the U.S. financial system. A finding in the most recent survey is likely of particular interest to a conference exploring financial stability in a time of technological change—30 percent of contacts surveyed cited a turn in the prevailing sentiment toward AI as a salient risk to the U.S. financial system and the global economy. That was up from 9 percent in the spring survey. Participants noted that if such an unwinding were to occur, it could tighten financial conditions and restrain economic activity more broadly. It is not my intention to make a judgment on the likelihood of those scenarios, or others offered in the FSR. Rather, I want to reiterate that I am monitoring them.
Some observers have asked about comparisons with the late 1990s, which was another period when promising new technological advances were closely tied to strong stock market performance.11 They have asked, are we experiencing “dot-com boom 2.0?” So, in the rest of my talk, I will address this question. I will argue that there are important differences between the recent period and the stock market experience during the dot-com boom of the late 1990s. Of course, much has changed over the past quarter-century, so history can only be a useful reference and not a predictor of future outcomes.
How Current Conditions Compare with the Dot-Com Era
I see clear differences across the two periods. Before I discuss three notable differences, let me first refresh our memory and mention a dimension along which the dot-com era and the AI era are similar—the particularly rapid stock price appreciation of firms that encapsulate the promise of transformational technologies.12 Indeed, the stock prices of dot-com firms increased more than 200 percent during their boom between 1996 and 1999, a little faster than the increase we have seen so far in the stock prices of AI-related firms since 2022.13
A closer inspection of the firms driving the stock price increases, however, reveals the first critical difference in the stock market dynamics across the two periods. In the late 1990s and early 2000s, many, albeit not all, dot-com firms had little to no realized earnings and only speculative revenue prospects even as they obtained external funding and were listed on public exchanges with relative ease amid pronounced investor enthusiasm. In contrast, the firms most closely identified with AI technologies in today's market generally have well-established and growing earnings streams.
The second difference is that because of their solid earnings base, the price-to-earnings ratios of AI-related firms have thus far remained well below the peak ratios of dot-com firms in the late 1990s and early 2000s. During the dot-com episode, investors were apparently far more willing to bet on the promise of the internet, with dot-com stocks commanding far larger premiums than AI stocks do today.
The third difference is that the proliferation of dot-com firms in the late 1990s and early 2000s suggests a less discriminating enthusiasm than what we are observing in today's market. At the peak in the late 1990s, more than 1,000 firms were publicly listed as dot-com companies, with many having minimal revenue and highly speculative business models. Indeed, researchers have shown that the mere addition of "dot-com" to a company name was sufficient to boost the stock price of the firm.14 By contrast, in the current environment, by one measure about 50 publicly traded firms are considered to be AI-focused enterprises, in the sense that they are expected to derive revenue from other firms using AI technologies, as opposed to the broad-based speculation that we saw in the late 1990s.15 However, it is important to acknowledge that the rapid growth of private capital markets—which are not publicly traded—in recent years may obscure the extent of investor enthusiasm for financing AI firms, as a growing number of private firms reportedly now identify as AI-focused or AI-native enterprises.
Finally, with respect to financial leverage, both dot-com firms then and AI firms now have, for the most part, exhibited limited reliance on debt financing. This more limited use of leverage may reduce the extent to which a shift in sentiment toward AI could transmit to the broader economy through credit markets. That said, recent market reports suggest that AI firms may be increasing their use of debt, from both public and private credit markets, to fund substantial investments in computing infrastructure and talent acquisition. Just a few weeks ago, large volumes of corporate bonds were issued to support AI infrastructure buildout by key players in the industry. Some analysts estimate that future investments in AI infrastructure will require substantially more debt.16 If that turns out to be the case, leverage in the AI sector could increase—and so could the losses if sentiment toward AI shifts. I will watch this developing trend closely.
In sum, looking at these dimensions, as of now, the differences between the current market and the dot-com era make it less likely that we are witnessing a replay of the late 1990s. Most notably, in contrast with the dot-com era, the current AI-related stock market activity appears more concentrated among established firms with actual earnings. Additionally, these developments are occurring against the backdrop of a financial system that is sound and resilient.
Conclusion
To conclude, it is important to acknowledge that assessing the implications of rapidly developing technology for financial stability is subject to a great deal of uncertainty. The ultimate implications of AI for both the macroeconomy and financial stability may take a long time to crystallize. Technology, the financial system, and the broader economy are always evolving. The potential transition to our new world could be bumpy, and it is too early to tell what trajectory will ultimately play out. Our research is continually enriched by the insights provided by market participants in regular outreach, keeping us abreast of the latest developments in AI and their economic implications, and by an ever-expanding analytical toolkit, including AI. Ensuring that the AI revolution unfolds within the context of a stable financial system is not just desirable—it is also imperative for achieving our dual-mandate objectives of maximum employment and price stability.
Thank you.
1. The views expressed here are my own and are not necessarily those of my colleagues on the Federal Reserve Board or the Federal Open Market Committee. Return to text
2. See Philip N. Jefferson (2025), “AI and the Economy,” speech delivered at Euro20+, Deutsche Bundesbank, Frankfurt, Germany, November 7. Return to text
3. See Microsoft (2025), “AI Diffusion Report: Where AI Is Most Used, Developed, and Built,” AI Economy Institute, November, https://www.microsoft.com/en-us/research/group/aiei/ai-diffusion/. Return to text
4. See Aaron Chatterji, Thomas Cunningham, David J. Deming, Zoe Hitzig, Christopher Ong, Carl Yan Shan, and Kevin Wadman (2025), “How People Use ChatGPT,” NBER Working Paper Series 34255 (Cambridge, Mass.: National Bureau of Economic Research, September). Return to text
5. See Alexander Bick, Adam Blandin, and David Deming (2025), “The State of Generative AI Adoption in 2025,” Federal Reserve Bank of St. Louis, On the Economy Blog, November 13. Return to text
6. This assessment is based on Federal Reserve staff analysis of data from the U.S. Census Bureau’s Business Trends and Outlook Survey, which is available on its website at https://www.census.gov/programs-surveys/btos.html. Return to text
7. See Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond (2023), “Generative AI at Work,” NBER Working Paper Series 31161 (Cambridge, Mass.: National Bureau of Economic Research, April; revised November 2023). Return to text
8. For example, see Tom Huddleston Jr. (2025), “Walmart CEO: ‘AI Is Literally Going to Change Every Job’—How the Best Employees Can Still Stand Out,” CNBC, September 29; also, see Karen Weise (2025), “Amazon Plans to Replace More than Half a Million Jobs with Robots,” New York Times, October 21. Return to text
9. See J.P. Morgan (2025), “AI’s Impact on Job Growth,” webpage, https://www.jpmorgan.com/insights/global-research/artificial-intelligence/ai-impact-job-growth. Return to text
10. The most recent FSR is available on the Federal Reserve Board’s website at https://www.federalreserve.gov/publications/files/financial-stability-report-20251107.pdf. Return to text
11. For example, AI technology and its effect on financial markets is the focus of the 10th Foundation for Advancement of Research in Financial Economics (FARFE) Conference. Information about the conference is available on FARFE’s website at https://farfe.org/Conference-2025.html. Return to text
12. See Alexander Ljungqvist and William J. Wilhelm Jr. (2003), "IPO Pricing in the Dot-Com Bubble," Journal of Finance, vol. 58 (April), pp. 723–52; also, see Tim Loughran and Jay R. Ritter (2002), "Why Don't Issuers Get Upset about Leaving Money on the Table in IPOs?" Review of Financial Studies, vol. 15 (January), pp. 413–44. Dot-com firms are identified following the approach in Ljungqvist and Wilhelm Jr. (2003) and Loughran and Ritter (2002) by using a specific set of SIC codes. Return to text
13. Similarly, the NASDAQ stock price index increased about 215 percent between 1996 and 1999. Return to text
14. See Michael J. Cooper, Orlin Dimitrov, and P. Raghavendra Rau (2001), “A Rose.com by Any Other Name,” Journal of Finance, vol. 56 (December), pp. 2371–88. Return to text
15. This assessment is based on the iShares Future AI & Tech ETF, which focuses on “companies that provide products and services that are expected to contribute to AI technologies in areas including generative AI, AI data and infrastructure, AI software, and AI services.” Obviously, the number of firms that use AI technology is much larger. Return to text
16. See Morgan Stanley (2025), “Bridging a $1.5tr Data Center Financing Gap,” July. Return to text

The application of large language models (LLMs) within the medical domain is undergoing rapid growth [,]. Key areas of investigation include assisted diagnosis [] and the structured representation of electronic health records []. These models exhibit considerable potential, with preliminary findings from research and practical implementations showing promising results. Through pretraining and fine-tuning on extensive datasets containing medical literature, clinical records, and biomedical knowledge bases, LLMs leverage deep learning methodologies to develop rich linguistic representations and demonstrate robust contextual understanding and knowledge integration []. This results in significantly enhanced performance compared with traditional language processing tools across natural language understanding, pattern recognition, and correlation analysis, with notable advantages in processing intricate medical data and facilitating cross-domain knowledge transfer []. While the pursuit of enhanced model performance is crucial, ensuring robust protection of patient privacy and data security remains a paramount concern and a fundamental requirement for the responsible and sustainable advancement of LLM applications in health care. This presents a complex challenge that necessitates both technical and regulatory solutions [-].
A substantial portion of current research on LLMs for health care applications uses publicly available resources [], such as the Medical Information Mart for Intensive Care-IV []. However, some studies are conducted using internal patient health information (PHI) repositories, which often contain highly sensitive personally identifiable information (PII) [], including patient names, medical record numbers, ages, zip codes, and admission dates. This practice necessitates robust data governance frameworks to ensure patient privacy and data security. In conventional approaches to developing computer models using patient data, researchers typically processed data and trained models within local or strictly regulated environments. This practice inherently reduced the risk of sensitive data compromise during transmission and storage [,]. However, deploying LLMs on the cloud often requires uploading vast amounts of raw medical data directly to remote servers. These servers may be distributed across different regions and are frequently not entirely under the control of health care institutions or researchers [,]. Beyond these conventional risks, LLMs introduce additional privacy concerns [] due to their generative nature: they may inadvertently reproduce sensitive information learned during training, and their large scale and complexity expose them to attacks such as model inversion or prompt injection. Together, these factors make the protection of medical data in LLM deployment especially challenging. Recent studies have highlighted concrete privacy threats associated with LLM use: fine-tuning can significantly increase PII memorization and vulnerability to leakage attacks, especially when tuning specific layers of the model []; novel mitigation strategies like "Whispered Tuning," which combine PII redaction, differential privacy, and output filtering, can markedly reduce leakage while preserving performance []; and beyond memorization, pretrained LLMs can infer personal attributes (eg, location, income, and gender) from seemingly innocuous text with high accuracy []. Consequently, data transmission and storage processes may encounter heightened risks of security vulnerabilities and unauthorized access. Whether through misuse by internal personnel or external cyberattacks, PHI is at risk of improper use or malicious disclosure [].
Therefore, reconciling the potential of LLMs to enhance health care quality and efficiency with the imperative to protect patient privacy represents a significant challenge requiring careful consideration. The regulatory oversight of LLMs processing PHI is governed by a complex patchwork of national and international privacy laws, including the General Data Protection Regulation in the European Union and the Health Insurance Portability and Accountability Act (HIPAA) in the United States []. In addition, the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD)-LLM statement [] is an extension of TRIPOD+AI, addressing the unique challenges of LLMs in biomedical and health care applications. There is a growing consensus within the academic community regarding the paramount importance of patient privacy protection in LLM research, with numerous concerns raised about the potential risks to sensitive data [,]. A range of privacy-preserving techniques is being widely considered and adopted within the health care domain to ensure data security and regulatory compliance. These include established methods such as deidentification [,], differential privacy [], federated learning [], and homomorphic encryption [,].
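To make the most common of these techniques concrete, the sketch below shows a minimal rule-based deidentification pass over a clinical note. The regular-expression patterns and placeholder labels are simplified assumptions for illustration only; a production pipeline would need validated rules, and typically named-entity recognition for names, covering all relevant identifier categories.

```python
import re

# Illustrative rule-based redaction of a few direct identifiers.
# These patterns are simplified assumptions, not a validated tool.
PATTERNS = {
    "[MRN]": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[PHONE]": re.compile(r"\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"),
    "[DATE]": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def redact(note: str) -> str:
    """Replace matched identifiers with category placeholders."""
    for placeholder, pattern in PATTERNS.items():
        note = pattern.sub(placeholder, note)
    return note

example = "Pt. John Doe, MRN: 00482913, seen 03/14/2024, cell (216) 555-0199."
print(redact(example))
# -> "Pt. John Doe, [MRN], seen [DATE], cell [PHONE]."
# Note: the patient name is NOT caught by these rules; names generally
# require NER-based methods plus manual review.
```

The uncaught name in the output illustrates why rule matching alone is rarely sufficient, a point that recurs in the findings below.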
This leads us to our core research question: What measures are being used to protect patient privacy in research that develops or applies LLMs using PHI (PHI-LLMs) in health care, and are these measures sufficient? Although there are some systematic reviews or scoping reviews of LLM research in health care, no scoping review has been published on this critical issue. The primary objective of this study is to conduct a scoping review of the existing literature on PHI-LLMs in health care, evaluate the adequacy of current approaches, and identify areas in need of improvement to ensure robust patient privacy protection.
This scoping review was guided by the framework for scoping studies by Arksey and O'Malley []. In addition, the study reporting followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR; checklist provided in ) []. We focused on the following three research questions: (1) What studies on the development and application of LLMs using PHI currently exist within the health care domain? (2) What patient privacy considerations are addressed in existing PHI-LLMs research, and are these measures sufficient? (3) How can future research on the development and application of LLMs using PHI better protect patient privacy?
The inclusion and exclusion criteria for studies are shown in .
We searched PubMed and Embase for studies published between January 1, 2022, and July 20, 2025. This timeframe was chosen to coincide with the release and rapid adoption of advanced LLMs (eg, GPT-3.5 and ChatGPT) and the subsequent surge of their applications in health care. Earlier studies (published before 2022), which primarily investigated transformer-based or nongenerative models, were excluded as they fell outside the scope of this review. The search strategies were drafted by ZXY and further refined through team discussion. The final search strategy can be found in . The final search results were exported into EndNote X9, and duplicates were removed by a library technician.
First, the titles and abstracts of identified studies were independently screened by 2 researchers (ZXY and LSY) based on the inclusion and exclusion criteria. Any disagreements between the reviewers were resolved through group discussions involving at least 2 researchers to ensure consensus and maintain the integrity of the selection process. The full-text review was also conducted by 2 researchers (ZXY and LSY) independently, with conflicts resolved through whole-group discussion.
The data extraction form was initially drafted by ZXY based on the study objectives and research questions. Following the draft, the form was refined through group discussions to develop a preliminary extraction template. To ensure consistency in the definitions and extraction criteria, 10 articles were randomly selected for a pilot extraction. Feedback from this pilot phase was used to finalize the extraction form. Subsequently, ZXY and LSY independently extracted data from the included studies. Any conflicts or discrepancies encountered during the extraction process were resolved through comprehensive group discussions involving all researchers to maintain the integrity and consistency of the data extraction.
The extracted data encompassed three main categories:
All extracted items are based on what each included study itself reported.
Descriptive analyses were performed to summarize the characteristics of the included studies. Categorical variables were summarized as frequencies and percentages. Sample size was further categorized into predefined ranges and presented as frequencies and percentages, while other continuous or count-based variables were directly summarized as reported. All results were presented in tables or figures according to the main domains of study characteristics, data characteristics, and privacy protection measures.
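As a minimal sketch of these descriptive analyses, assuming a pandas DataFrame with one row per included study; the column names and example values are hypothetical:

```python
import pandas as pd

# Hypothetical extraction table: one row per included study.
studies = pd.DataFrame({
    "year": [2025, 2024, 2025, 2023, 2025],
    "n_patients": [85, 432, 12000, 150, 990],
})

# Categorical variables: frequencies and percentages.
year_counts = studies["year"].value_counts()
year_pct = (year_counts / len(studies) * 100).round(1)
print(pd.concat([year_counts, year_pct], axis=1, keys=["n", "%"]))

# Sample size binned into the predefined ranges used in this review.
bins = [0, 100, 1000, 10000, float("inf")]
labels = ["<100", "100-1000", "1000-10,000", ">=10,000"]
studies["size_band"] = pd.cut(
    studies["n_patients"], bins=bins, labels=labels, right=False
)
print(studies["size_band"].value_counts(sort=False))
```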
As this was a scoping review of previously published research, no patients or the public were involved in the design of this study.
After duplicates were removed, a total of 6174 citations were identified through searches of electronic databases and the reference lists of review articles. Title and abstract screening excluded 2993 records for the following reasons: 4 were not written in English, 48 were preprint papers, 126 were unrelated to LLM research, 1387 were reviews, comments, or letters, 38 were protocols, and 1390 did not involve relevant patient data. The remaining 3181 full-text articles were retrieved and assessed for eligibility, and a further 2717 records were excluded for the following reasons: 647 used only public databases or previously published cases, 439 simulated clinical scenarios or patients rather than using real-world data, 1587 focused on medical knowledge quizzes or examinations, and 44 represented secondary analyses. Ultimately, 464 studies were deemed eligible for this review () []. The specific references included in the analysis are listed in the .
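The screening arithmetic can be verified directly; a small tally using only the counts reported in the paragraph above:

```python
# Consistency check of the reported screening flow.
identified = 6174
title_abstract_excluded = {
    "not written in English": 4, "preprints": 48,
    "unrelated to LLM research": 126,
    "reviews, comments, or letters": 1387, "protocols": 38,
    "no relevant patient data": 1390,
}
full_text_excluded = {
    "public databases or published cases only": 647,
    "simulated scenarios or patients": 439,
    "knowledge quizzes or examinations": 1587,
    "secondary analyses": 44,
}

full_texts = identified - sum(title_abstract_excluded.values())
included = full_texts - sum(full_text_excluded.values())
assert (full_texts, included) == (3181, 464)
print(full_texts, included)  # 3181 464
```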
To provide an overview of the studies included in this review, we summarize their general characteristics in . This review encompasses studies published between January 1, 2022, and July 20, 2025. Of the 464 studies included in this review, the majority were published in 2025 (256/464, 55.2%), with a substantial proportion appearing in 2024 (188/464, 40.5%). In contrast, only a small number of studies were published in 2023 (19/464, 4.1%) and 2022 (1/464, 0.2%). This temporal distribution underscores the rapid and recent surge of research on LLM applications in health care, with most evidence emerging within the past 2 years. Based on the institutions of the first authors, the largest proportion of studies originated from the United States (153/464, 33.0%), followed by China (94/464, 20.3%) and Germany (37/464, 8.0%). Moderate contributions were observed from Turkey (31/464, 6.7%), Italy (17/464, 3.7%), Israel (13/464, 2.8%), South Korea (12/464, 2.6%), and the United Kingdom (12/464, 2.6%). Studies from other countries collectively accounted for 20.5% (95/464).
| Characteristic | Values |
| --- | --- |
| Publication year, n (%) | |
| 2025 | 256 (55.2) |
| 2024 | 188 (40.5) |
| 2023 | 19 (4.1) |
| 2022 | 1 (0.2) |
| Country, n (%) | |
| United States | 153 (33.0) |
| China | 94 (20.3) |
| Germany | 37 (8.0) |
| Turkey | 31 (6.7) |
| Italy | 17 (3.7) |
| Israel | 13 (2.8) |
| South Korea | 12 (2.6) |
| United Kingdom | 12 (2.6) |
| Other countriesa | 95 (20.5) |
| Model type, n (%) | |
| Existing model | 356 (76.7) |
| Fine-tuned model + existing model | 49 (10.6) |
| Fine-tuned model | 40 (8.6) |
| Self-developed model | 17 (3.7) |
| Self-developed model + existing model | 2 (0.4) |
| Location of the large language model, n (%) | |
| Cloud deployment | 72 (15.5) |
| Local deployment | 70 (15.1) |
| External cloud service | 69 (14.9) |
| Othersb | 8 (1.7) |
| Cannot judge from report | 57 (12.3) |
| Not reported | 188 (40.5) |
| Task type, n (%) | |
| Making diagnoses | 134 (28.9) |
| Clinical note-taking | 62 (13.4) |
| Making treatment recommendations | 45 (9.7) |
| Generating medical reports | 43 (9.3) |
| Biomedical data mining | 38 (8.2) |
| Prognostic predictive model | 22 (4.7) |
| Communicating with patients | 20 (4.3) |
| Making diagnoses + making treatment recommendations | 19 (4.1) |
| Other tasksc | 81 (17.5) |
| Sample sized, n (%) | |
| Patients (n=342, 73.7%) | |
| <100 | 125 (36.5) |
| 100-1000 | 161 (47.1) |
| 1000-10,000 | 35 (10.2) |
| ≥10,000 | 19 (5.6) |
| Not reported | 2 (0.6) |
| Tokens (n=16, 3.4%) | |
| 100-1000 | 10 (62.5) |
| 1000-10,000 | 3 (18.8) |
| ≥10,000 | 2 (12.5) |
| Not reported | 1 (6.3) |
| Notes (n=94, 20.3%) | |
| <100 | 12 (12.8) |
| 100-1000 | 19 (20.2) |
| 1000-10,000 | 23 (24.5) |
| ≥10,000 | 38 (40.4) |
| Not reported | 2 (2.1) |
| Images (n=12, 2.6%) | |
| <100 | 2 (16.7) |
| 100-1000 | 7 (58.3) |
| 1000-10,000 | 1 (8.3) |
| ≥10,000 | 2 (16.7) |
| Number of centerse, n (%) | |
| 1 | 377 (81.3) |
| 2 | 41 (8.8) |
| 3 | 16 (3.4) |
| 4 | 7 (1.5) |
| ≥5 | 20 (4.3) |
| Not reported | 3 (0.6) |
| Large language models used in the research, n (%) | |
| ChatGPT | 341 (73.5) |
| Llama | 74 (15.9) |
| Mistral | 27 (5.8) |
| Flan | 13 (2.8) |
| Claude | 19 (4.1) |
| Gemini | 32 (6.9) |
| Gemma | 9 (1.9) |
| GLM | 9 (1.9) |
| Deepseek | 6 (1.3) |
| Qwen | 13 (2.8) |
| Fine-tuned models | 49 (10.6) |
| Others | 73 (15.7) |
| Type of data, n (%) | |
| Text | 366 (78.9) |
| Text + image | 35 (7.5) |
| Text + audio | 4 (0.9) |
| Image | 56 (12.1) |
| Audio | 2 (0.4) |
| Text + image + audio | 1 (0.2) |
| Data collection method, n (%) | |
| Retrospective | 409 (88.1) |
| Prospective | 53 (11.4) |
| Prospective + retrospective | 2 (0.4) |
| Reporting guideline followed, n (%) | |
| Not reported | 435 (93.8) |
| STROBEf | 17 (3.7) |
| STARDg | 3 (0.6) |
| TRIPODh | 7 (1.5) |
| CLAIMi | 2 (0.4) |
| Registered, n (%) | |
| No | 451 (97.2) |
| Yes | 13 (2.8) |
aOther countries: Japan (n=11), Australia (n=10), France (n=9), Spain (n=9), Canada (n=6), Switzerland (n=5), India (n=5), Singapore (n=4), Belgium (n=3), Brazil (n=3), Croatia (n=3), the Netherlands (n=3), Pakistan (n=3), Saudi Arabia (n=3), Ireland (n=2), Mexico (n=2), Portugal (n=2), Romania (n=2), Thailand (n=2), Burkina Faso (n=1), Finland (n=1), Iran (n=1), Jordan (n=1), Poland (n=1), Ukraine (n=1), United Arab Emirates (n=1), and Vietnam (n=1).
bOther location of LLMs: local deployment + cloud deployment (n=5); local deployment + external cloud service (n=2); cloud deployment + external cloud service (n=1).
cOther tasks: synthesizing data for research (n=15), translation (n=12), triaging patients (n=11), conducting medical research (n=7), making diagnoses + triaging patients + making treatment recommendations (n=5), making diagnoses + generating medical reports (n=4), generating billing codes (n=3), writing prescriptions (n=3), making diagnoses + triaging patients (n=2), making diagnoses + biomedical data mining (n=2), educating patients (n=2), making treatment recommendations + triaging patients (n=2), communicating with patients + making treatment recommendations (n=2), clinical note-taking + making treatment recommendations (n=2), enhancing medical knowledge (n=1), educating patients + making treatment recommendations + making diagnoses (n=1), communicating with patients + making diagnoses + making treatment recommendations (n=1), triaging patients + prognostic prediction (n=1), generating medical reports + making treatment recommendations (n=1), generating medical reports + prognostic prediction (n=1), clinical note-taking + generating medical reports (n=1), clinical note-taking + prognostic prediction (n=1), and clinical note-taking + translation (n=1).
dSample size was defined according to the primary data modality: number of patients (clinical studies), number of images (imaging studies), number of clinical notes or documents (text-based studies), or number of tokens (as reported in the original studies).
eReferring to the number of clinical sites contributing patient data.
fSTROBE: Strengthening the Reporting of Observational Studies in Epidemiology.
gSTARD: Standards for Reporting of Diagnostic Accuracy.
hTRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis.
iCLAIM: Checklist for Artificial Intelligence in Medical Imaging.
With respect to model type, 356 (76.7%) studies used existing models, 49 (10.6%) studies combined fine-tuning with existing models, 40 (8.6%) studies applied fine-tuning alone, 17 (3.7%) studies developed their own models, and 2 (0.4%) studies reported using both self-developed and existing models. Regarding the location of LLM deployment, 72 (15.5%) studies reported cloud deployment, 70 (15.1%) studies reported local deployment, and 69 (14.9%) studies reported the use of external cloud services. Eight (1.7%) studies used other deployment approaches, 57 (12.3%) studies could not be judged from the report, and 188 (40.5%) studies did not provide deployment information.
A total of 134 (28.9%) studies focused on making diagnoses, 62 (13.4%) studies focused on clinical note-taking, 45 (9.7%) studies focused on making treatment recommendations, 43 (9.3%) studies focused on generating medical reports, and 38 (8.2%) studies focused on biomedical data mining. Prognostic predictive model tasks were examined in 22 (4.7%) studies, and communication with patients in 20 (4.3%) studies. In addition, 19 (4.1%) studies combined diagnostic and treatment recommendation tasks, while other tasks were reported in 81 (17.5%) studies.
Regarding sample size, the distribution varied. Among patient-based datasets (n=342), 125 (36.5%) studies enrolled fewer than 100 patients, 161 (47.1%) studies included 100-1000 patients, 35 (10.2%) studies included 1000-10,000 patients, 19 (5.6%) studies included 10,000 or more patients, and 2 (0.6%) studies did not report sample size. For token-based datasets (n=16), 10 (62.5%) studies used 100-1000 tokens, 3 (18.8%) studies used 1000-10,000, 2 (12.5%) studies used 10,000 or more, and 1 (6.3%) study did not report. Regarding note-based datasets (n=94), 12 (12.8%) studies analyzed fewer than 100 notes, 19 (20.2%) studies used 100-1000, 23 (24.5%) studies used 1000-10,000, 38 (40.4%) studies used 10,000 or more, and 2 (2.1%) studies did not report. For image-based datasets (n=12), 2 (16.7%) studies included fewer than 100 images, 7 (58.3%) studies included 100-1000, 1 (8.3%) study included 1000-10,000, and 2 (16.7%) studies included 10,000 or more.
Regarding the number of centers, 377 (81.3%) studies were conducted in a single center, 41 (8.8%) studies were conducted in 2 centers, 16 (3.4%) studies were conducted in 3 centers, and 7 (1.5%) studies were conducted in 4 centers. Twenty (4.3%) studies involved 5 or more centers, while 3 (0.6%) studies did not report this information. As for LLMs used in the research, ChatGPT was the most frequently used model, appearing in 341 (73.5%) studies. It was followed by Llama (74, 15.9%), Gemini (32, 6.9%), Mistral (27, 5.8%), Claude (19, 4.1%), and Qwen (13, 2.8%). Additionally, 49 (10.6%) studies applied fine-tuning techniques, while 73 (15.7%) studies reported using other models not specifically listed. Regarding data type, 366 (78.9%) studies used text; 56 (12.1%) studies used image data; 35 (7.5%) studies combined text and image; 4 (0.9%) studies combined text and audio; 2 (0.4%) studies used audio only; and 1 (0.2%) study combined text, image, and audio. With respect to data collection method, 409 (88.1%) studies were retrospective, 53 (11.4%) studies were prospective, and 2 (0.4%) studies combined both prospective and retrospective approaches.
With respect to reporting standards, 435 (93.8%) studies did not specify adherence to any guideline or statement, while 17 (3.7%) studies followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement, 3 (0.6%) studies followed the Standards for Reporting of Diagnostic Accuracy (STARD) statement, 7 (1.5%) studies followed the TRIPOD statement, and 2 (0.4%) studies followed the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). Regarding study registration, 451 (97.2%) studies were not registered, and only 13 (2.8%) studies reported registration.
The Sankey diagram () shows that the most frequent disease-task pairs included tumors with making diagnoses (n=35), tumors with making treatment recommendations (n=23), and tumors with clinical note-taking (n=16). In addition, studies categorized as "not special" (ie, not focused on a specific disease) also frequently addressed making diagnoses (n=24) and clinical note-taking (n=15). Other notable pairs included musculoskeletal disorders with making diagnoses (n=17), neurological disorders with making diagnoses (n=13), and circulatory diseases with making diagnoses (n=11). Tasks such as generating medical reports and biomedical data mining were also commonly associated with tumors and the "not special" category. The summary table can be found in .

outlines the characteristics of privacy protection measures implemented in the included studies. Regarding ethical oversight, 419 (90.3%) studies reported approval from an ethics committee, whereas 45 (9.7%) studies did not. With respect to patient consent, 224 (48.3%) studies reported a waiver of informed consent, 92 (19.8%) studies indicated that informed consent had been obtained, and 148 (31.9%) studies did not report consent information. For data availability, 203 (43.8%) studies did not provide a statement, 160 (34.5%) studies declared that data were available from the corresponding author upon reasonable request, 66 (14.2%) studies stated that data were not available, and 35 (7.5%) studies made data publicly accessible.
| Characteristic | Values, n (%) |
| --- | --- |
| Ethical review | |
| Yes | 419 (90.3) |
| No | 45 (9.7) |
| Patient consent | |
| Waiver of informed consent | 224 (48.3) |
| Not reported | 148 (31.9) |
| Informed consent obtained | 92 (19.8) |
| Data availability declaration | |
| Not reported | 203 (43.8) |
| Available from corresponding author on reasonable request | 160 (34.5) |
| Not open | 66 (14.2) |
| Public | 35 (7.5) |
| Privacy protection technology | |
| Not reported | 178 (38.4) |
| Deidentification | 158 (34.1) |
| Cannot judge from report | 116 (73.4) |
| Based on manual methods | 17 (10.8) |
| Based on rule matching | 13 (8.2) |
| Othersa | 12 (7.6) |
| Anonymization | 91 (19.6) |
| Deidentification + anonymization | 23 (5.0) |
| Othersb | 14 (3.0) |
| Statement that personally identifiable information was removed | |
| No | 363 (78.2) |
| Yes | 101 (21.8) |
| Type of identifiers removed | |
| Direct identifiers | 166 (35.8) |
| Indirect identifiers | 9 (1.9) |
| Cannot judge from report | 107 (23.1) |
| Not reported | 182 (39.2) |
| Degree of deidentification assessed | |
| No | 458 (98.7) |
| Yes | 6 (1.3) |
| Reidentification protection technology used | |
| No | 455 (98.1) |
| Yes | 9 (1.9) |
| Declaration of compliance with safety standards | |
| Health Insurance Portability and Accountability Act | 44 (9.5) |
| General Data Protection Regulation | 6 (1.3) |
| Both | 2 (0.4) |
| Not reported | 412 (88.8) |
aBased on rule matching + machine learning + deep learning (n=3), based on LLMs (n=2), based on rule matching + manual (n=2), based on rule matching + machine learning (n=1), based on synthetic data (n=1), based on postprocessing (n=1), based on machine learning (n=1), and based on deep learning + postprocessing (n=1).
bData hosting (n=5), anonymization + data hosting (n=3), federated learning (n=1), anonymization + data hosting + homomorphic encryption (n=1), anonymization + homomorphic encryption (n=1), deidentification + data hosting (n=1), data augmentation (likely referring to synthetic data generation; n=1), and homomorphic encryption (n=1).
When examining privacy protection technologies, 178 (38.4%) studies did not report the methods used, while 158 (34.1%) studies applied deidentification, 91 (19.6%) studies used anonymization, 23 (5.0%) studies reported combining both, and 14 (3.0%) studies used other technologies. Among the 158 studies applying deidentification (with percentages relative to that subset), 17 (10.8%) used manual methods, 13 (8.2%) applied rule-based matching, 12 (7.6%) reported other approaches, and 116 (73.4%) could not be judged from the reports.
Concerning statements on the removal of personally identifiable information, 363 (78.2%) studies did not provide such a statement, while 101 (21.8%) studies explicitly reported it. Regarding the type of identifiers removed, 166 (35.8%) studies specified the removal of direct identifiers, 9 (1.9%) studies reported the removal of indirect identifiers, 107 (23.1%) studies could not be judged from the report, and 182 (39.2%) studies did not provide this information. Regarding assessment of the degree of deidentification, 458 (98.7%) studies did not report such an assessment, while 6 (1.3%) studies did. Concerning reidentification protection technologies, 455 (98.1%) studies did not use them, and 9 (1.9%) studies reported their use. With respect to compliance with safety standards, 44 (9.5%) studies declared adherence to HIPAA, 6 (1.3%) studies to the General Data Protection Regulation, and 2 (0.4%) studies to both, whereas 412 (88.8%) studies did not provide such information.
shows the characteristics of privacy protection technology for different data types. Among text-based studies (n=366), 131 (35.8%) studies applied deidentification, 74 (20.2%) studies used anonymization, 15 (4.1%) studies combined both, 4 (1.1%) studies reported data hosting, 2 (0.5%) studies reported anonymization with data hosting, 6 (1.6%) studies used other methods, and 134 (36.6%) studies did not report. For image-based studies (n=56), 12 (21.4%) studies reported deidentification, 10 (17.9%) studies reported anonymization, 5 (8.9%) studies reported both, 1 (1.8%) study reported anonymization with data hosting, 1 (1.8%) study reported data hosting, and 27 (48.2%) studies did not report. For audio-based studies (n=2), one (50.0%) study reported anonymization and one (50.0%) study did not. For combined text and image studies (n=35), 14 (40.0%) studies used deidentification, 6 (17.1%) studies used anonymization, 3 (8.6%) studies used both, and 12 (34.3%) studies did not report. For text and audio studies (n=4), one (25.0%) study reported deidentification, and 3 (75.0%) studies did not. The single study using text, image, and audio data did not report its privacy protection method.
| Type of data and privacy protection technologya | Values, n (%) |
| --- | --- |
| Text (n=366) | |
| Deidentification | 131 (35.8) |
| Anonymization | 74 (20.2) |
| Deidentification + anonymization | 15 (4.1) |
| Data hosting | 4 (1.1) |
| Anonymization + data hosting | 2 (0.5) |
| Othersb | 6 (1.6) |
| Not reported | 134 (36.6) |
| Images (n=56) | |
| Deidentification | 12 (21.4) |
| Anonymization | 10 (17.9) |
| Deidentification + anonymization | 5 (8.9) |
| Anonymization + data hosting | 1 (1.8) |
| Data hosting | 1 (1.8) |
| Not reported | 27 (48.2) |
| Audio (n=2) | |
| Anonymization | 1 (50.0) |
| Not reported | 1 (50.0) |
| Text + images (n=35) | |
| Deidentification | 14 (40.0) |
| Anonymization | 6 (17.1) |
| Deidentification + anonymization | 3 (8.6) |
| Not reported | 12 (34.3) |
| Text + audio (n=4) | |
| Deidentification | 1 (25.0) |
| Not reported | 3 (75.0) |
| Text + images + audio (n=1) | |
| Not reported | 1 (100.0) |
aClassified according to what was reported in the original articles of the included studies.
bAnonymization + data hosting + homomorphic encryption (n=1), anonymization + homomorphic encryption (n=1), federated learning (n=1), data augmentation (n=1), deidentification + data hosting (n=1), and homomorphic encryption (n=1).
In this scoping review, we identified 464 studies published between 2022 and 2025 that focus on the development and application of LLMs in health care using PHI. Strikingly, 256 (55.2%) of these studies were published in 2025 alone, compared with 188 (40.5%) studies in 2024, 19 (4.1%) studies in 2023, and only one (0.2%) study in 2022. This sharp increase highlights the extremely rapid pace of research in this emerging field and reflects the growing recognition of both the opportunities and challenges associated with LLM deployment in health care. These studies encompass a variety of countries, disease domains, and task types. Overall, the ethical reviews of these studies have been largely satisfactory. The vast majority of studies have reported on the approval of the ethics committee, ensuring that their procedures meet the relevant ethical standards. Nevertheless, there remains a shortfall in reporting on informed consent in certain prospective studies. It is concerning that a small number of LLM research projects using imaging data or retrospective data from electronic medical records fail to adequately report ethical review processes and the consideration of patient informed consent, including whether consent was obtained or formally waived. Even when research involves only patient imaging data or retrospective data, it must still undergo rigorous ethical review [,]. Strict adherence to ethical review processes is essential to ensure the fairness and scientific integrity of medical research and to safeguard patient rights.
According to our findings, more than half of the studies that reported an LLM deployment location relied on cloud-based deployment (including external cloud services), and the generative nature of LLMs can accidentally expose private data learned during training [], including PII []. Moreover, the vast parameter counts and extensive training corpora of LLMs enlarge their memorization footprint, leakage vectors, and prompt-manipulation surface, thereby exposing critical vulnerabilities that render these models prime targets for prompt-injection and data-poisoning attacks. Therefore, it is necessary to enhance the privacy protection of PHI. Cloud deployment offers the advantages of cost-effectiveness and scalability, as users are not required to invest in expensive hardware and can dynamically scale computing resources as needed. Additionally, cloud services provide established tools and global accessibility, facilitating rapid iteration and collaboration among distributed teams. However, cloud deployments also face significant data privacy risks, as data must be uploaded to third-party servers, which can lead to risks such as data breaches, unauthorized access, compliance issues, service disruptions, and model exploitation. These risks can be effectively mitigated through the integration of technical countermeasures like zero-trust architecture [], edge computing [], data encryption [], and access control.
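To illustrate the data encryption countermeasure named above, the sketch below encrypts a record client-side with the symmetric Fernet scheme from the widely used Python `cryptography` package before anything leaves the institution. The record fields are hypothetical, and key management (normally handled by an institutional KMS or HSM) is omitted.

```python
from cryptography.fernet import Fernet

# Key generation and storage would live in an institutional KMS/HSM;
# generating the key inline here is purely for demonstration.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b'{"visit_id": "A-1029", "note": "chest pain, resolved"}'

# Encrypt before the payload ever reaches a third-party server...
token = cipher.encrypt(record)
payload_for_upload = token  # only ciphertext is transmitted/stored

# ...and decrypt only inside the trusted environment.
assert cipher.decrypt(token) == record
```

Note that a remotely hosted LLM must still decrypt the payload to process it, which is why the discussion above also points to local deployment, and why homomorphic encryption, which computes on ciphertext directly, attracts interest despite its computational cost.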
Given the inherently high sensitivity of medical data and the significant risk of irreversible harm upon disclosure, the development of privacy-preserving safeguards for LLMs has become imperative. This requirement is not merely a technical prerequisite but also reflects an uncompromising mandate at the ethical and regulatory levels. In current research on the application of LLMs to health care, more than one-third of studies (38.4%) fail to mention any techniques for effectively protecting PHI. PHI includes PII such as patient names, addresses, Social Security numbers, and medical record numbers. Once compromised, such information can lead to serious privacy breaches and security risks. Therefore, removing PII is the first crucial step in safeguarding patient privacy [,]. By effectively eliminating or obfuscating these direct identifiers, it becomes possible to mitigate the risk of unauthorized access to PHI and thereby reduce the negative consequences of potential data breaches. The failure to prioritize PII protection in the deployment of LLMs within health care poses significant risks that extend beyond immediate privacy concerns. It threatens the integrity of patient-provider relationships, exposes individuals to financial and identity-related crimes, stifles technological and scientific progress, and raises critical ethical issues. Among the included studies, only one mentioned the use of federated learning techniques, and only 3 reported homomorphic encryption. Despite this limited uptake, federated learning and homomorphic encryption are emerging as pivotal privacy-preserving techniques for LLMs. Federated learning, owing to its distributed training and decentralized model architecture, has gained significant traction in health care []. Future research should prioritize the development of comprehensive data privacy protection to facilitate the broader adoption of federated learning and homomorphic encryption in health care. Synthetic data generation has emerged as a promising solution to address privacy concerns in health care research and LLMs []. This approach uses AI models to create realistic, anonymized patient data that preserves privacy while enabling data access for secondary purposes []. While synthetic data offers benefits in promoting privacy, equity, safety, and continual learning, challenges remain, including the potential introduction of flaws and biases. Further research is needed to develop unified quality assessment metrics and address the current deficiency in longitudinal synthetic data generation [].
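To make the federated learning idea concrete: each site trains on its own patient data, and only model parameters ever leave the institution. Below is a minimal FedAvg-style sketch with NumPy, under the simplifying assumptions of a linear model, a squared-error loss, and synthetic data; it is an illustration of the principle, not a production framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's gradient steps on its private data (never shared)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # squared-error gradient
        w -= lr * grad
    return w

# Three hospitals with private datasets of different sizes (synthetic).
sites = [(rng.normal(size=(n, 4)), rng.normal(size=n)) for n in (120, 300, 80)]

global_w = np.zeros(4)
for _ in range(10):
    # Each site trains locally; only the weights leave the site.
    local_ws = [local_update(global_w, X, y) for X, y in sites]
    # Server aggregates, weighting by each site's sample count.
    n_k = np.array([len(y) for _, y in sites])
    global_w = np.average(local_ws, axis=0, weights=n_k)
```

In practice, secure aggregation or differential privacy is usually layered on top, since shared parameter updates can themselves leak information about the underlying training data.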
In studies involving PHI deidentification techniques or the use of anonymized data, the related descriptions are often very vague. Although these studies mention the removal of PII or the use of anonymized data, they typically do not specify which PII elements were removed or merely use ambiguous phrases such as "removing PII" or "using anonymized data," lacking detailed technical explanations and transparency. First, the absence of clear technical descriptions undermines the credibility and reproducibility of the research, making it difficult for other researchers to replicate the results under the same conditions and thereby affecting the scientific validity and the effectiveness of subsequent applications. Second, privacy protection may be at risk because the lack of transparency in the deidentification process can result in PII not being fully removed, increasing the risk of data breaches. Finally, ethical issues arise as well. If the data processing for deidentification and anonymization is not sufficiently transparent, it can lead to ethical disputes regarding privacy protection and informed consent, especially when assessing whether there remains a risk of reidentifying individuals from the data. This lack of transparency may result in failing to meet the requirements of ethical reviews. According to the Food and Drug Administration, even deidentified data must retain traceability to meet regulatory requirements. Traceability after deidentification is not only a critical component of privacy protection but also essential for ensuring data availability, compliance, and credibility. It plays an indispensable role in data sharing, research transparency, emergency response, and other areas, providing a robust foundation for data-driven decision-making and innovation. When designing and implementing deidentification schemes, the need for traceability must be carefully considered to achieve a balance between privacy protection and data utility. In addition, the descriptions of "deidentification assessment" provided in current studies often lack transparency, leaving their scope and rigor unclear. For instance, it remains uncertain whether such assessments involved quantitative estimation of reidentification risk, reference to regulatory standards such as HIPAA Safe Harbor, or validation by external independent experts. The absence or ambiguity of these elements makes it difficult to determine the effectiveness and compliance of deidentification practices. Future studies should explicitly report these aspects in both methodology and results to ensure greater rigor and credibility in protecting patient privacy.
HIPAA establishes standards and practices for deidentifying PHI. According to this rule, there are 2 methods for deidentifying PHI: expert determination, which requires a formal assessment by a qualified expert; and Safe Harbor, which involves the removal of specified identifiers so that covered entities and business associates cannot identify individuals from the remaining information. Although HIPAA does not explicitly use the term "anonymization," anonymization is often considered an irreversible process that ensures data can no longer identify individuals. Anonymization requirements are more stringent than deidentification, as they guarantee that the data cannot be reidentified under any circumstances. While this study retains the original terminology used in the articles reviewed, most studies are vague in their descriptions of deidentification and anonymization, making it difficult to determine which specific methods were used. Researchers should clearly specify the approaches used to protect data privacy to ensure transparency and accuracy.
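One way to make a Safe Harbor claim auditable is to enumerate the 18 identifier categories and record, for each, whether and how it was removed. The category names below follow HIPAA's Safe Harbor provision; the reporting format itself is an illustrative assumption, not a prescribed standard.

```python
# HIPAA Safe Harbor enumerates 18 identifier categories; a study
# claiming Safe Harbor deidentification could report, per category,
# whether it was removed and by what method.
SAFE_HARBOR_CATEGORIES = [
    "names", "geographic subdivisions smaller than a state",
    "dates (except year)", "telephone numbers", "fax numbers",
    "email addresses", "social security numbers",
    "medical record numbers", "health plan beneficiary numbers",
    "account numbers", "certificate/license numbers",
    "vehicle identifiers and serial numbers",
    "device identifiers and serial numbers", "URLs", "IP addresses",
    "biometric identifiers", "full-face photographs",
    "any other unique identifying number, characteristic, or code",
]

removal_log = {cat: {"removed": None, "method": None}
               for cat in SAFE_HARBOR_CATEGORIES}
# Hypothetical entries for two categories:
removal_log["names"] = {"removed": True, "method": "NER model + manual review"}
removal_log["dates (except year)"] = {"removed": True, "method": "rule matching"}

undocumented = [c for c, v in removal_log.items() if v["removed"] is None]
print(f"{len(undocumented)} of 18 categories not yet documented")
```

Publishing such a per-category log alongside a study would directly address the vagueness of "removing PII" statements criticized above.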
Medical big data exhibits unique multimodal characteristics. The term "multimodal" refers to the diverse sources and forms of medical data, which include laboratory data (eg, test results), imaging data (eg, computed tomographic scans, x-rays, ultrasounds, and electrocardiograms), and video data containing audio (eg, fetal ultrasounds). Depending on the specific data type, its confidentiality, integrity, and availability are ensured through various methods. In our research, we found that current medical data privacy protection primarily relies on deidentification and anonymization techniques. However, in the context of multimodal medical data, a single privacy protection method is often insufficient to effectively prevent data leakage, tampering, and misuse. Therefore, designing multimodal data privacy protection technologies represents a critical direction for future research.
While existing literature reviews predominantly focus on the applications of LLMs in health care, there remains a notable gap in comprehensive scoping reviews that specifically evaluate privacy protection measures for PHI within LLM implementations (PHI-LLMs). Previous analyses addressing privacy concerns have primarily examined broader contexts rather than focusing specifically on patient information protection in language model applications, resulting in insufficient coverage of both technical safeguards and systemic compliance aspects within health care ecosystems.
A limitation of this study is that the evaluation of privacy protection measures relies solely on the information reported in the published papers. If a study implemented privacy protection methods but did not disclose them in detail, we were unable to identify them, which may affect the comprehensiveness and accuracy of our evaluation. Moreover, although we cataloged the adoption of different privacy protection methods, we did not evaluate their security levels, implementation quality, or practical trade-offs. Future research should systematically assess the effectiveness and applicability of these techniques in health care–specific settings.
Based on the 3 key findings outlined above, we offer the following additional recommendations for protecting patient privacy in health care–related LLM research, structured around 3 phases: study design, implementation, and reporting. When preparing research reports, the glossary of key terms provided in the supplementary material can be consulted for standardized terminology.
In the design phase of LLM research, patient privacy protection must be prioritized. First, the data minimization principle should be strictly observed: only the minimum PHI necessary to achieve the research objectives should be collected and used. Second, a clear definition of the research purpose and scope of use is essential, ensuring that all PII used has a well-defined purpose and is not repurposed for unauthorized studies or commercial applications. Third, ethical approval and informed consent are critical components of this phase. Researchers must submit detailed research plans to an institutional review board or ethics committee, outlining how PII will be obtained, used, and protected, and, where applicable, obtain informed consent from patients so that they are aware of how their data will be used and safeguarded.
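Of these recommendations, data minimization is the most directly enforceable in code. A minimal sketch, assuming hypothetical field names tied to an approved protocol, is an explicit allowlist applied before any record enters the research data set:

```python
# Hypothetical allowlist drawn from the IRB-approved protocol: only the
# minimum fields needed for the stated research question are retained.
APPROVED_FIELDS = {"age_band", "sex", "primary_dx", "lab_glucose"}

def minimize(record: dict) -> dict:
    """Drop every field not explicitly approved for this study."""
    return {k: v for k, v in record.items() if k in APPROVED_FIELDS}

raw = {"name": "Jane Doe", "mrn": "12345678", "age_band": "60-69",
       "sex": "F", "primary_dx": "E11.9", "lab_glucose": 7.2}
print(minimize(raw))  # direct identifiers never enter the research data set
```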
In the implementation phase, priority should be given to deploying the LLM locally, so that PHI is not transmitted to third-party services. During model training, multiple patient privacy protection strategies, such as deidentification, anonymization, federated learning, synthetic data, and differential privacy, should be used. Because the risk of reidentification persists, continuous security monitoring and auditing are indispensable: the research team should conduct regular security assessments and vulnerability scans to identify and remediate potential vulnerabilities promptly. Postbreach response is an equally essential part of a comprehensive privacy protection framework. Effective incident response includes rapid detection, containment, patient notification, and remediation, all of which are increasingly emphasized in health care data governance guidelines.
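Of the strategies listed, differential privacy lends itself to the most compact illustration. The sketch below releases a cohort count under epsilon-differential privacy via the Laplace mechanism; the count and epsilon values are hypothetical, and a real deployment would also track the cumulative privacy budget across all queries.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one patient
    changes the result by at most 1), so Laplace noise with scale
    1/epsilon is sufficient for epsilon-DP.
    """
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: a hypothetical cohort size released with epsilon = 1.
print(dp_count(true_count=412, epsilon=1.0))
```

Smaller epsilon values add more noise and give stronger privacy; the choice of epsilon is a policy decision as much as a technical one.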
Research reports should fully embody the principles of transparency and reproducibility. Researchers should disclose in detail the data sources, ethical approval processes, informed consent procedures, and privacy protection techniques used. Selecting appropriate reporting guidelines, such as STROBE [Strengthening the Reporting of Observational Studies in Epidemiology], STARD [Standards for Reporting of Diagnostic Accuracy Studies], CONSORT-AI [Consolidated Standards of Reporting Trials–Artificial Intelligence], and TRIPOD-LLM [Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis–Large Language Models], can improve report quality and provide a reference for other researchers.
Our scoping review sounds an alarm: patient privacy protection in medical research using LLMs remains inadequately addressed. In response, we formulate recommendations spanning the study design, implementation, and reporting phases to fortify PHI protection and foster transparency in PHI-LLM research. Our findings argue for the urgent development of stricter regulatory frameworks and the integration of advanced privacy-preserving technologies to safeguard PHI. Such measures should enable future health care applications of LLMs to balance innovation with rigorous patient privacy protection, elevating both ethical standards and scientific credibility.
The data generated during this scoping review are available from the corresponding author on reasonable request; the majority of these data are, however, already presented in the paper's tables, figures, and text.
This study was supported by the National Key Research and Development Program of China (grant numbers 2022YFC3502300 and 2022YFC3502305). The funders had no role in the study design; the collection, analysis, or interpretation of data; the writing of the report; or the decision to submit the article for publication.
ZXY and LSY conceptualized the study, defined the methodology, performed the database searches, and managed the screening process. ZXY and LSY also performed data extraction and authored the original draft. CZ, YDD, and WSJ reviewed the literature, charted the data, and conducted the analyses. GL, YLZ, and SHC provided supervision and contributed to the review and editing of the manuscript. All authors contributed to the discussion of literature screening, data extraction, and the writing of the paper. All authors reviewed and approved the final version of the paper. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. All authors affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
None declared.
Edited by A Coristine; submitted 26.Apr.2025; peer-reviewed by P Han, Z Hou, L Pilgram; comments to author 16.Jul.2025; revised version received 02.Nov.2025; accepted 03.Nov.2025; published 21.Nov.2025.
©Xiaoying Zhong, Siyi Li, Zhao Chen, Long Ge, Dongdong Yu, Shijia Wang, Liangzhen You, Hongcai Shang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 21.Nov.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.