Just like each person has unique fingerprints, every CMOS chip has a distinctive “fingerprint” caused by tiny, random manufacturing variations. Engineers can leverage this unforgeable ID for authentication, to safeguard a device from attackers trying to steal private data.
But these cryptographic schemes typically require secret information about a chip’s fingerprint to be stored on a third-party server. This creates security vulnerabilities and requires additional memory and computation.
To overcome this limitation, MIT engineers developed a manufacturing method that enables secure, fingerprint-based authentication, without the need to store secret information outside the chip.
They split a specially designed chip during fabrication in such a way that each half has an identical, shared fingerprint that is unique to these two chips. Each chip can be used to directly authenticate the other. This low-cost fingerprint fabrication method is compatible with standard CMOS foundry processes and requires no special materials.
The technique could be useful in power-constrained electronic systems with non-interchangeable device pairs, like an ingestible sensor pill and its paired wearable patch that monitor gastrointestinal health conditions. Using a shared fingerprint, the pill and patch can authenticate each other without a device in between to mediate.
“The biggest advantage of this security method is that we don’t need to store any information. All the secrets will always remain safe inside the silicon. This can give a higher level of security. As long as you have this digital key, you can always unlock the door,” says Eunseok Lee, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this security method.
Lee is joined on the paper by EECS graduate students Jaehong Jung and Maitreyi Ashok, as well as co-senior authors Anantha Chandrakasan, MIT provost and the Vannevar Bush Professor of Electrical Engineering and Computer Science, and Ruonan Han, a professor of EECS and a member of the MIT Research Laboratory of Electronics. The research was recently presented at the IEEE International Solid-State Circuits Conference.
“Creation of shared encryption keys in trusted semiconductor foundries could help break the tradeoffs between being more secure and more convenient to use for protection of data transmission,” Han says. “This work, which is digital-based, is still a preliminary trial in this direction; we are exploring how more complex, analog-based secrecy can be duplicated — and only duplicated once.”
Leveraging variations
Even though CMOS chips are intended to be identical, each one is slightly different due to unavoidable microscopic variations during fabrication. These random variations give each chip a unique identifier, known as a physical unclonable function (PUF), that is nearly impossible to replicate.
A chip’s PUF can be used to provide security just like the human fingerprint identification system on a laptop or door panel.
For authentication, a server sends a request to the device, which responds with a secret key based on its unique physical structure. If the key matches an expected value, the server authenticates the device.
But the PUF authentication data must be registered and stored in a server for access later, creating a potential security vulnerability.
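To make the conventional flow concrete, here is a minimal Python sketch of server-based PUF authentication. It assumes a hypothetical `SimulatedPUF` class whose hidden random seed stands in for physical variation (a real PUF stores no such seed, and its responses are noisy) and a hypothetical `EnrollmentServer` that records challenge-response pairs at enrollment. That stored table is the off-chip data the article identifies as a vulnerability.

```python
import hashlib
import secrets


class SimulatedPUF:
    """Hypothetical stand-in for a physical PUF.

    A hidden random seed models the chip's manufacturing variation. Real
    silicon stores no such seed, and real responses contain some noise.
    """

    def __init__(self) -> None:
        self._variation = secrets.token_bytes(32)

    def respond(self, challenge: bytes) -> bytes:
        # Deterministic, device-specific response to a given challenge.
        return hashlib.sha256(self._variation + challenge).digest()


class EnrollmentServer:
    """Hypothetical server that keeps challenge-response pairs (CRPs).

    This stored table is the off-chip secret that a twin PUF avoids.
    """

    def __init__(self) -> None:
        self._crps = {}  # challenge bytes -> expected response bytes

    def enroll(self, puf: SimulatedPUF, n: int = 4) -> None:
        # Done once, in a trusted setting, before the device is deployed.
        for _ in range(n):
            challenge = secrets.token_bytes(16)
            self._crps[challenge] = puf.respond(challenge)

    def authenticate(self, puf: SimulatedPUF) -> bool:
        if not self._crps:
            return False  # out of enrolled pairs
        # Each pair is used once, so a recorded response cannot be replayed.
        challenge, expected = self._crps.popitem()
        return secrets.compare_digest(puf.respond(challenge), expected)


device = SimulatedPUF()
server = EnrollmentServer()
server.enroll(device)
print(server.authenticate(device))          # True
print(server.authenticate(SimulatedPUF()))  # False: an impostor chip
```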
“If we don’t need to store information on these unique randomizations, then the PUF becomes even more secure,” Lee says.
The researchers wanted to accomplish this by developing a matched PUF pair on two chips. One could authenticate the other directly, without the need to store PUF data on third-party servers.
As an analogy, consider a sheet of paper torn in half. The torn edges are random and unique, but the pieces have a shared randomness because they fit back together perfectly along the torn edge.
While CMOS chips aren’t torn in half like paper, many are fabricated at once on a silicon wafer, which is then diced to separate the individual chips.
By incorporating shared randomness at the edge of two chips before they are diced to separate them, the researchers could create a twin PUF that is unique to these two chips.
“We needed to find a way to do this before the chip leaves the foundry, for added security. Once the fabricated chip enters the supply chain, we won’t know what might happen to it,” Lee explains.
Sharing randomness
To create the twin PUF, the researchers change the properties of a set of transistors fabricated along the edge of two chips, using a process called gate oxide breakdown.
Essentially, they pump high voltage into a pair of transistors by shining light with a low-cost LED until the first transistor breaks down. Because of tiny manufacturing variations, each transistor has a slightly different breakdown time. The researchers can use this unique breakdown state as the basis for a PUF.
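The breakdown race can be illustrated with a toy Monte Carlo model. This sketch assumes that breakdown times follow a Weibull distribution (a common statistical model for oxide breakdown, not a detail reported by the researchers) and that stress stops as soon as either transistor in a pair fails, so the identity of the first device to break is frozen as one stored bit. The per-chip seed is a hypothetical stand-in for that chip's physical variation.

```python
import random


def breakdown_bit(rng: random.Random, scale: float = 1.0, shape: float = 2.0) -> int:
    """Race two nominally identical transistors under the same stress.

    Breakdown times are drawn from a Weibull distribution (an assumed model;
    scale and shape are illustrative values, not measured data). Stress
    stops at the first breakdown, so the winner is frozen as one bit.
    """
    t_a = rng.weibullvariate(scale, shape)  # time to breakdown, transistor A
    t_b = rng.weibullvariate(scale, shape)  # time to breakdown, transistor B
    return 0 if t_a < t_b else 1


def generate_puf(n_bits: int, chip_seed: int) -> list[int]:
    # chip_seed stands in for one chip's random physical variation.
    rng = random.Random(chip_seed)
    return [breakdown_bit(rng) for _ in range(n_bits)]


chip_a = generate_puf(64, chip_seed=1)
chip_b = generate_puf(64, chip_seed=2)
print(sum(x != y for x, y in zip(chip_a, chip_b)), "of 64 bits differ")
```

Because both transistors in a pair share the same distribution, each bit is equally likely to be 0 or 1, and two different chips disagree on roughly half their bits.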
To enable a twin PUF, the MIT researchers fabricate two pairs of transistors along the edge of two chips before they are diced to separate them. By connecting the transistors with metal layers, they create paired structures that have correlated breakdown states. In this way, they enable a unique PUF to be shared by each pair of transistors.
After shining LED light to create the PUF, they dice the chips between the transistors so there is one pair on each device, giving each separate chip a shared PUF.
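Once both chips carry the same bits, they can authenticate each other with a standard challenge-response exchange and no server. The sketch below is hypothetical: it assumes the shared PUF bits are hashed into a key and that each chip proves knowledge of that key with an HMAC over a fresh random nonce. The article does not describe the protocol the chips actually use, and the `TwinChip` class and `mutually_authenticate` helper are illustrative names.

```python
import hashlib
import hmac
import secrets


class TwinChip:
    """Hypothetical chip holding one half of a twin PUF (shared bits)."""

    def __init__(self, shared_bits: list[int]) -> None:
        # Assumed key derivation: hash the shared bits into a 256-bit key.
        self._key = hashlib.sha256(bytes(shared_bits)).digest()
        self._nonce = None  # set by challenge(), consumed by verify()

    def challenge(self) -> bytes:
        self._nonce = secrets.token_bytes(16)
        return self._nonce

    def respond(self, nonce: bytes) -> bytes:
        # Prove knowledge of the shared key without revealing it.
        return hmac.new(self._key, nonce, hashlib.sha256).digest()

    def verify(self, tag: bytes) -> bool:
        if self._nonce is None:
            return False  # no outstanding challenge
        expected = hmac.new(self._key, self._nonce, hashlib.sha256).digest()
        self._nonce = None  # one-time nonce: a replayed tag will not verify
        return hmac.compare_digest(tag, expected)


def mutually_authenticate(a: TwinChip, b: TwinChip) -> bool:
    # A challenges B, then B challenges A; no server or stored table is used.
    a_accepts_b = a.verify(b.respond(a.challenge()))
    b_accepts_a = b.verify(a.respond(b.challenge()))
    return a_accepts_b and b_accepts_a


bits = [1, 0, 1, 1, 0, 0, 1, 0] * 16  # placeholder for 128 shared PUF bits
pill, patch = TwinChip(bits), TwinChip(bits)
print(mutually_authenticate(pill, patch))  # True
```

Because the raw bits are hashed, a single mismatched bit changes the key. With halves whose bits agree only about 98 percent of the time, a practical design would likely need extra steps, such as discarding unreliable bit positions or applying error correction; those steps are an assumption here and are not described in the article.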
“In our case, transistor breakdown has not been modeled well in many of the simulations we had, so there was a lot of uncertainty about how the process would work. Figuring out all the steps, and the order they needed to happen, to generate this shared randomness is the novelty of this work,” Lee says.
After fine-tuning their PUF generation process, the researchers developed a prototype pair of twin PUF chips in which the randomization was matched with more than 98 percent reliability. This helps ensure that the PUF keys generated on the two chips match consistently, enabling secure authentication.
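The article does not say exactly how the 98 percent figure is computed. One common way to quantify how well two PUF outputs agree is the fraction of matching bit positions (equivalently, one minus the fractional Hamming distance). A minimal sketch, assuming that metric:

```python
def match_rate(bits_a: list[int], bits_b: list[int]) -> float:
    """Fraction of positions where two PUF bit strings agree."""
    if len(bits_a) != len(bits_b) or not bits_a:
        raise ValueError("bit strings must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(bits_a, bits_b))
    return agreements / len(bits_a)


def fractional_hamming_distance(bits_a: list[int], bits_b: list[int]) -> float:
    # Fraction of positions that differ; the complement of match_rate.
    return 1.0 - match_rate(bits_a, bits_b)


half_a = [0] * 100
half_b = [0] * 98 + [1, 1]  # two of 100 bits disagree
print(match_rate(half_a, half_b))  # 0.98
```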
Because they generated this twin PUF using circuit techniques and low-cost LEDs, the process would be easier to implement at scale than other methods that are more complicated or not compatible with standard CMOS fabrication.
“In the current design, shared randomness generated by transistor breakdown is immediately converted into digital data. Future versions could preserve this shared randomness directly within the transistors, strengthening security at the most fundamental physical level of the chip,” Lee says.
“There is a rapidly increasing demand for physical-layer security for edge devices, such as between medical sensors and devices on a body, which often operate under strict energy constraints. A twin-paired PUF approach enables secure communication between nodes without the burden of heavy protocol overhead, thereby delivering both energy efficiency and strong security. This initial demonstration paves the way for innovative advancements in secure hardware design,” Chandrakasan adds.
This work is funded by Lockheed Martin, the MIT School of Engineering MathWorks Fellowship, and the Korea Foundation for Advanced Studies Fellowship.
Sidley’s Asia Funds and Financial Services Newsletter discusses important regulatory and enforcement developments that affect financial institutions, investment advisers, and investment funds operating in the Asia-Pacific region in a fast-changing regulatory landscape. In this issue, we cover (among other things) the impact of the U.S. Outbound Investment Rules on managers with Chinese investment exposure, a critical review of the Hong Kong Securities and Futures Commission (SFC) enforcement process, as well as Singapore’s proposals to tighten liquidity management guidelines for fund managers.
Featured articles
U.S. Outbound Investment Rules Reshape Fund Management Strategy
New U.S. regulations on outbound investments are fundamentally transforming fund management, particularly affecting managers with Chinese investment exposure. Three key developments are driving this change: (i) the Outbound Investment Regulations (OIR) that took effect January 2, 2025; (ii) the Comprehensive Outbound Investment National Security (COINS) Act enacted December 18, 2025; and (iii) new Treasury FAQs issued December 23, 2025.
Expanding Scope Under COINS Act
The existing OIR, as detailed in our May 2025 Update, established prohibitions and notification requirements for U.S. person investments into China-related entities in semiconductors, artificial intelligence, and quantum computing. The COINS Act significantly expands this framework in three ways.
First, “countries of concern” will expand beyond China, Hong Kong, and Macau to include Russia, Iran, North Korea, Cuba, and Venezuela under the Maduro regime. Second, covered foreign persons will include Chinese Communist Party Central Committee members and political leadership of countries of concern, plus entities “subject to their direction or control.” Third, covered technologies expand to five sectors, adding hypersonic systems to the existing categories, with Treasury authorized to designate additional technologies.
Key Clarifications on Public Securities
The new Treasury FAQs provide crucial guidance on the publicly traded securities exception. Settlement timing, not execution date, determines exception eligibility. If settlement occurs after listing, acquisitions fall within the exception, even for pre–initial public offering (IPO) subscriptions. Treasury reversed its position on minority shareholder protections, now considering director nomination rights as standard protections when generally available to similarly situated shareholders, though director appointment rights remain restricted.
Enhanced Due Diligence Requirements
Each investment requires individual assessment through “reasonable and diligent inquiry” involving public information searches, target questioning when possible, and contractual representations. The COINS Act authorizes Treasury to publish a nonexhaustive list of covered entities, but transaction-specific diligence will remain necessary.
Impact on Limited Partner Investments
The COINS Act may fundamentally alter limited partner (LP) investment frameworks. Currently, U.S. LPs can invest in non-U.S. funds with contractual assurances that capital won’t fund prohibited transactions. Under COINS, LPs would need assurances against any investment in entities from countries of concern, regardless of sector involvement, potentially severely restricting U.S. participation in non-U.S. funds with such exposure.
Administrative Enhancements
The COINS Act provides Treasury with new tools including nonbinding feedback mechanisms for transaction guidance, voluntary self-disclosure frameworks for violations, and expanded exception categories covering de minimis transactions, ancillary transactions, ordinary business activities, and regulated foreign investment companies.
Operational Implications
Fund managers face comprehensive operational changes. Investment strategies require reassessment, with potential pivots toward allied nations and sophisticated screening mechanisms. Due diligence processes become more complex and time-consuming, examining ownership structures, revenue sources, and potential connections to restricted entities. Compliance frameworks need substantial updating with new monitoring systems and revised fund documentation. Different fund types face unique challenges. Private equity managers must navigate complex exit strategies with potentially limited strategic buyers, while venture capital funds may redirect focus from startups in restricted countries toward opportunities in allied nations.
Implementation Timeline
Current OIR rules remain effective while Treasury has 450 days from December 18, 2025, to implement COINS Act regulations. Fund managers must immediately audit existing portfolio exposure, update compliance programs, and enhance investor communications while preparing for substantially more restrictive requirements. The changes represent more than additional regulation — they fundamentally reshape global fund management toward domestic and allied nation investments with enhanced compliance requirements and permanent structural market changes.
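As a worked date calculation, counting 450 calendar days from December 18, 2025, lands on March 13, 2027. This sketch assumes a plain calendar-day count starting from the enactment date; the statute's own counting rules govern the actual deadline, and `implementation_deadline` is an illustrative helper name.

```python
from datetime import date, timedelta


def implementation_deadline(enacted: date, days: int = 450) -> date:
    # Plain calendar-day count from the enactment date (an assumption; the
    # statute's own counting convention controls the real deadline).
    return enacted + timedelta(days=days)


print(implementation_deadline(date(2025, 12, 18)))  # 2027-03-13
```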
SFC Enforcement Activity Under Scrutiny: Review Identifies Critical Gaps in Case Management and Cross-Border Cooperation
The Process Review Panel (PRP) annual report (released December 2025) presents a comprehensive evaluation of SFC enforcement capabilities, revealing both operational strengths and persistent systemic challenges that affect regulatory effectiveness. Of the 60 cases reviewed, 24 were enforcement matters, providing significant insight into the SFC’s regulatory performance during a period of evolving market complexity.
Case Management and Procedural Efficiency
The report identifies worrisome delays in case processing, with completion times for enforcement cases ranging from three months to 16 years. Four specific cases highlighted critical bottlenecks in the SFC’s internal machinery. One case demonstrated a particularly problematic nine-month delay in referring matters to the Department of Justice, followed by an additional eight-month period to secure approval for Market Misconduct Tribunal proceedings. This 17-month administrative lag underscores fundamental inefficiencies in interdivisional coordination and external referral mechanisms.
Another case revealed excessive delays between investigation completion and disciplinary action, with over one year elapsing before issuing a Notice of Proposed Disciplinary Action despite the case’s straightforward nature. The PRP’s criticism here reflects broader concerns about the SFC’s ability to balance thoroughness with timeliness in regulatory responses.
Cross-Boundary Enforcement Challenges
The report emphasizes growing complexities in cross-border investigations, particularly involving Mainland suspects. While acknowledging the SFC’s strengthened cooperation framework with the China Securities Regulatory Commission through various memoranda of understanding (MOUs), including the significant 2023 bilateral agreement, practical enforcement outcomes remain limited. Several cases concluded with no further action despite substantial investigative investment, primarily due to suspects’ being based in or having absconded to the Mainland.
The PRP’s observations suggest that while cooperation mechanisms exist, their practical application requires further refinement. The panel encourages deeper collaboration with Mainland authorities, recognizing that successful cross-boundary enforcement is increasingly critical to Hong Kong’s regulatory effectiveness given integrated market structures.
Strategic Planning and Resource Optimization
Perhaps most concerning are cases where significant SFC resources were expended without meaningful enforcement outcomes. The report describes investigations involving fictitious transactions and false disclosures that ultimately yielded no disciplinary or legal action due to suspects being unlocatable or having limited realizable assets. This pattern suggests potential inadequacies in early case assessment and strategic planning.
The PRP recommends more proactive measures to protect investor interests, including earlier asset freezing orders and improved coordination with related regulatory bodies such as the Accounting and Financial Reporting Council. These suggestions reflect growing expectations that regulators should act preventively rather than merely reactively.
Technological Advancement and Operational Modernization
The report highlights technology as both a solution and an ongoing challenge. While acknowledging the SFC’s adoption of advanced technologies and the positive impact of the March 2023 investor identification regime, historical cases demonstrate significant inefficiencies in handling voluminous trading data. The PRP encourages continued exploration of artificial intelligence and other technologies to enhance investigative efficiency, recognizing that technological sophistication is essential for modern financial regulation.
Overall Assessment and Future Priorities
The SFC’s responses to PRP recommendations demonstrate it has implemented various measures to improve efficiency, including streamlined referral processes, expanded expert pools, and coordination mechanisms with law enforcement partners. However, the fundamental challenges revealed — lengthy processing times, cross-boundary enforcement difficulties, and resource allocation inefficiencies — suggest deeper structural issues requiring sustained attention.
The report coincides with increasingly complex regulatory challenges, including cryptocurrency activities, algorithmic trading, and sophisticated cross-border market manipulation schemes. While the SFC plainly demonstrates its commitment to improvement and has established robust frameworks for international cooperation, translating these frameworks into successful enforcement outcomes remains challenging. The PRP’s recommendations provide a roadmap for addressing these systemic issues, emphasizing the need for more proactive case management, enhanced technological adoption, and deeper cross-boundary collaboration to maintain Hong Kong’s position as a leading international financial center.
MAS Proposes Updates to Liquidity Risk Management Guidelines for Fund Managers
In December 2025, the Monetary Authority of Singapore (MAS) issued a consultation paper proposing amendments to the current Guidelines on Liquidity Risk Management Practices (Fund Management Companies) (LRM Guidelines), to provide greater clarity on MAS’ expectations on the management of liquidity risks by fund management companies (FMC).
The proposals were issued pursuant to the International Organization of Securities Commissions (IOSCO) Final Report on Revised Recommendations for Liquidity Risk Management for Collective Investment Schemes issued in May 2025 and the Financial Stability Board’s Final Report on Liquidity Preparedness for Margin and Collateral Calls issued in December 2024. As before, an FMC may apply the LRM Guidelines in a proportionate manner, taking into account the nature, size, and complexity of its activities.
The key amendments proposed in the consultation paper are as follows:
(a) Strengthening internal governance: An FMC should establish clear accountability and decision-making processes for the design and activation of liquidity management tools under both normal and stressed market conditions. The circumstances under which such tools may be activated should be set out, and the roles and responsibilities of decision makers should be defined. The board and senior management of an FMC should have adequate understanding of potential interactions between liquidity risk and other risk types.
(b) Alignment between fund redemption terms and liquidity of fund assets: An FMC managing open-ended funds is expected to maintain consistency between the fund’s investment strategy and redemption terms under both normal and stressed market conditions. This alignment should be maintained both at the initial product design stage as well as on an ongoing basis.
(c) Adoption of antidilution liquidity management tools (ADTs): An FMC is expected to adopt a diversified approach to liquidity management using both quantity-based tools (e.g., suspension of redemptions and redemption gates) and ADTs (e.g., swing pricing). There should be provisions in place for at least one appropriate liquidity management tool, preferably an ADT, and reliance should not be placed solely on suspension or redemption gating. In the case of open-ended funds, particularly those that invest mainly in less-liquid assets, at least one ADT should be implemented to mitigate material investor dilution.
(d) Imposition of liquidity costs to transacting investors: To safeguard the interests of all investors, an FMC should implement measures such that investors who subscribe to or redeem from a fund bear the liquidity costs associated with their transactions. In particular, where ADTs have been activated, an FMC managing open-ended funds should impose both the explicit costs (e.g., brokerage fees and commissions, trading levies, and settlement fees) as well as implicit costs (e.g., bid-ask spread and market impact costs) on the subscribing or redeeming investors.
(e) Enhancing investor disclosures: An FMC managing open-ended funds should disclose to investors (i) an overview of the fund’s investment strategy and potential liquidity risks, (ii) the features of the redemption terms (e.g., dealing frequency, lock-up periods, and notice and settlement periods), and (iii) the objective and circumstances under which liquidity management tools may be activated.
(f) Ongoing liquidity risk management: An FMC should regularly monitor market depth, liquidity, and concentration of its portfolio positions so that liquidity risks arising from margin and collateral calls are adequately managed and mitigated. Regular reviews should be performed to assess the effectiveness of the liquidity management tools applied and whether additional tools are needed to manage liquidity mismatches and to provide fair treatment of all investors, where relevant. There should be formalized processes to monitor early warning indicators of potential deterioration in a fund’s liquidity, as well as escalation and reporting procedures.
(g) Removal of exchange-traded funds (ETFs) from the scope of the LRM Guidelines: This is pursuant to further guidance issued by IOSCO on ETFs, as ETFs have different liquidity considerations and structural features from open-ended funds.
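Swing pricing, the ADT named in item (c), is one way to make transacting investors bear the liquidity costs described in item (d). The sketch below shows partial swing pricing: when net flows exceed a threshold share of fund NAV, the dealing price moves up for net subscriptions or down for net redemptions by a swing factor meant to approximate explicit and implicit trading costs. The threshold, the factor, and the `swung_price` helper are hypothetical illustrations chosen for the example, not values or methods prescribed by MAS.

```python
def swung_price(nav_per_unit: float, net_flow: float, fund_nav: float,
                swing_threshold: float, swing_factor: float) -> float:
    """Partial swing pricing sketch with hypothetical parameters.

    net_flow > 0 means net subscriptions; net_flow < 0 means net redemptions.
    swing_factor approximates explicit plus implicit trading costs as a
    fraction of NAV (e.g., 0.01 for 1 percent).
    """
    if fund_nav <= 0:
        raise ValueError("fund NAV must be positive")
    if abs(net_flow) / fund_nav <= swing_threshold:
        return nav_per_unit  # small flows: no adjustment
    # Swing up so subscribers pay the cost of buying assets,
    # or down so redeemers bear the cost of selling them.
    direction = 1 if net_flow > 0 else -1
    return nav_per_unit * (1 + direction * swing_factor)


# Net redemptions of 5% of a 100m fund, 2% threshold, 1% swing factor.
print(swung_price(10.0, -5_000_000, 100_000_000, 0.02, 0.01))  # about 9.9
```

Partial swinging (adjusting only above a threshold) is a design choice; a full swing variant would adjust the price on every dealing day with any net flow.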
The consultation period closes on 28 February 2026. The revised LRM Guidelines are expected to come into effect six months after being finalized and published by the MAS. In the meantime, FMCs are encouraged to begin preparations as early as possible.
REGULATORY STANDARDS/UPDATES
SFC Exempts Non-Centrally-Cleared Equity Options From Over-the-Counter Margin Requirements
December 2025: The SFC confirmed it will exempt non-centrally-cleared single-stock options, equity basket options, and equity index options from the over-the-counter (OTC) margin requirements with effect from January 4, 2026. This exemption, which will last until further notice, aims to align with global developments, specifically mirroring approaches in the EU and the UK, and is partly due to licensed corporations’ insignificant current exposure to these options.
MAS Sets Standards on Recruitment and Onboarding Training of Representatives
December 2025: MAS issued an information paper setting out standards that it expects financial institutions to apply when assessing whether their appointed representatives are fit and proper to carry out regulated activities. The paper sets out MAS’s supervisory expectations in the following areas: (i) onboarding of representatives, (ii) monitoring of representatives with adverse information, (iii) onboarding training, and (iv) hiring of assistants by representatives and outsourced activities.
MAS Revises Representative Misconduct Reporting Requirements
December 2025: The new Notice SFA 04-N24 on Reporting of Misconduct of Representatives by Holders of Capital Markets Services Licence and Exempt Persons was issued by MAS after two rounds of public consultation. The new notice, which takes effect on January 1, 2027, revises the scope of reportable “misconduct” to mean (i) any act relating to a contravention of the market conduct provisions under Part 12 of the Singapore Securities and Futures Act 2001, or (ii) any act involving fraud, dishonesty, illegal monetary gains, or any offense of a similar nature (such as cheating, forgery, dishonest misappropriation of monies, criminal breach of trust, bribery, money laundering, and tax evasion). The reporting templates to be used by financial institutions for the misconduct report and investigation report are expected to be shared by MAS by Q2 2026.
MAS Streamlines Incident Reporting Processes
December 2025: MAS has streamlined its incident reporting template and submission channel to facilitate more standardized incident data collection. The updates apply to incident reporting under various MAS-issued instruments, including the following notices and guidelines as applicable to Singapore fund management companies: Notice on Technology Risk Management, Guidelines on Business Continuity Management, and Guidelines on Outsourcing (Financial Institutions other than Banks). In the event of a reportable incident, financial institutions are to provide initial notification to MAS by contacting their MAS review officer (during office hours) or the MAS duty officer (outside office hours or if the MAS review officer is uncontactable). Following the initial notification, financial institutions are to submit all incident reports to MAS via MAS-Tx using the new incident reporting template from February 1, 2026, onward.
SFC Refines List of Persons Designated as Financial Services Providers Under OTC Clearing Regime
December 2025: The revised list of designated financial service providers (FSPs) under the OTC derivatives clearing regime became effective on January 1, 2026. Licensed persons (including significant nonfinancial counterparties) whose average total position in OTC derivatives meets the US$20 billion clearing threshold must ensure that relevant transactions with designated FSPs are centrally cleared. The list, which contains over 100 entities, primarily includes entities that are part of global systemically important bank groups or major dealer groups that are also clearing members of the largest central counterparties for interest rate swaps in major markets (U.S., Europe, Japan, and Hong Kong).
Hong Kong to Standardize Calculation Periods Under OTC Clearing Regime
January 2026: The SFC and HKMA have jointly proposed adopting two “Calculation Periods” annually for OTC derivatives clearing, effective March 1, 2027. These periods would run from March 1 to May 31 and September 1 to November 30 each year, aiming to enhance operational efficiency and certainty for derivative dealers by standardizing the process for identifying firms that meet the US$20 billion clearing threshold.
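The threshold test can be sketched as follows. The US$20 billion figure and the two proposed Calculation Periods come from the text; the averaging method (the mean of the three month-end total positions in a period) and the helper names are assumptions for illustration, and the final SFC and HKMA rules define the actual calculation.

```python
from datetime import date

CLEARING_THRESHOLD_USD = 20_000_000_000  # US$20 billion, from the text

# Proposed Calculation Periods from the text: March 1 to May 31 and
# September 1 to November 30 each year.
CALCULATION_PERIOD_MONTHS = ((3, 4, 5), (9, 10, 11))


def in_calculation_period(day: date) -> bool:
    return any(day.month in months for months in CALCULATION_PERIOD_MONTHS)


def meets_clearing_threshold(month_end_positions_usd: list[float]) -> bool:
    """Assumed test: mean of the three month-end total positions in one period."""
    if len(month_end_positions_usd) != 3:
        raise ValueError("expected one month-end position for each of 3 months")
    average = sum(month_end_positions_usd) / 3
    return average >= CLEARING_THRESHOLD_USD  # "meets" read as at or above


print(in_calculation_period(date(2027, 4, 30)))       # True
print(meets_clearing_threshold([18e9, 19e9, 21e9]))  # False: average is about 19.33bn
```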
INTERMEDIARIES/MARKET SUPERVISION
SFC Enhances Regulatory Cooperation on Cross-Border Digital Asset–Related Matters
January 2026: The SFC entered into an MOU with the Capital Markets Authority of the United Arab Emirates to enhance regulatory cooperation on the supervision of cross-border digital asset–related activities. The MOU establishes a framework for cooperation and the exchange of information, including on changes that may affect the financial or operational stability of regulated entities, such as enforcement actions or sanctions, ownership changes, major cyberattacks, security breaches, or system failures.
Reminder of Statutory Obligations During SFC Inspections
January 2026: The SFC issued a stern reminder that Section 180 inspections are mandatory statutory obligations, not negotiable requests. The SFC is moving to eliminate the “friction” caused by firms’ using administrative or legal excuses to stall supervisory efforts. Common shields such as client confidentiality, staff absences, or data privacy are now explicitly deemed insufficient reasons to delay or withhold information. Beyond criminal liability for obstruction, the SFC warned of immediate supervisory interventions for noncooperation, including suspending the onboarding of new clients and restricting business activities. Ultimately, the manager-in-charge (MIC) for overall management oversight (supported by the MIC for compliance) will be held personally accountable for any attempts to impede the process.
Hong Kong Regulators Tighten Grip on IPO Gatekeepers
January 2026: The SFC is cracking down on IPO sponsors following a surge in “process-driven” listing applications and declining document quality. Regulators are targeting resource strain, specifically identifying principals overseeing six or more active deals as lacking adequate supervision capacity. Sponsors must immediately report staff who haven’t passed HKSI Paper 16 and disclose principal workloads. “Concerned sponsors” face on-site thematic inspections and must submit signed rectification plans within three months. The SFC warns that persistent failures in due diligence or expert oversight will result in restricted business scopes or suspended applications to protect market integrity.
KEY PRODUCT DEVELOPMENTS
SFC Streamlines Measures for Authorized EU-Regulated Retail Funds
November 2025: The SFC announced streamlined post-authorization measures for Undertakings for Collective Investment in Transferable Securities (UCITS) funds to align with home jurisdiction regulations and enhance Hong Kong’s position as an asset management hub. Key changes include removing the need for prior SFC approval for changes to depositories, investment delegates supervised by home regulators, and material changes in investment objectives.
Hong Kong Regulators Outline Strategic Priorities for Green Finance
January 2026: The Green and Sustainable Finance Cross-Agency Steering Group (co-chaired by the SFC and HKMA) outlined its three-year (2026–28) strategic priorities to consolidate and strengthen sustainability disclosure, sustainable finance markets, external engagement, and talent development while elevating the focus on transition. Key initiatives include developing best practices for transition plan disclosure through a pilot program and enhancing Hong Kong’s position as a leading sustainable financing hub in Asia.
Boosting Liquidity: SFC Permits Affiliated Market Makers and VA Margin Trading
February 2026: The SFC issued new guidance to enhance the liquidity of the regional virtual asset (VA) market. These initiatives include permitting licensed VA brokers to offer margin financing for Bitcoin (BTC) and Ether (ETH) and providing a framework for licensed platforms to offer perpetual contracts to professional investors. Additionally, affiliates of licensed platforms can now act as market makers. On February 13, 2026, the SFC granted a new virtual asset trading platform license to Victory Fintech (VDX), bringing the total number of licensed platforms to 12.
SIGNIFICANT ENFORCEMENT ACTIONS
We highlight below several noteworthy disciplinary and enforcement actions that may be of interest to MICs/responsible officers (ROs), licensed representatives, intermediaries, and others operating in the Hong Kong financial markets.
Market Misconduct
November 2025: The SFC secured its first custodial sentence against a finfluencer for providing unlicensed investment advice through a paid Telegram chat. A request for bail pending appeal was denied.
December 2025: The SFC secured the conviction and immediate eight-month custodial sentence against the wife of a listco chairman for false trading. The defendant had placed a series of bid orders at inflated prices for the listco shares through her own personal account in the final minutes before market close to create a false appearance of demand and alleviate pressure from margin calls on her husband’s account.
December 2025: A former vice president of a share registrar company was given an immediate custodial sentence following his guilty plea for insider dealing. While handling proxy forms for a proposed privatization, the defendant learned that the necessary voting threshold would not be met and sold his shares, avoiding a loss of approximately US$37,000.
January 2026: A retail trader was convicted of false trading, ordered to disgorge all profits, and ordered to complete 220 hours of community service following his guilty plea to “scaffolding” (i.e., repeatedly placing and canceling trading orders at progressively higher prices) and conducting wash trades (i.e., acting as both buyer and seller).
Conflicts of Interest
February 2026: The SFC sanctioned an investment manager and two MICs for extensive fund management failures involving six closed-end Cayman subfunds (Segregated Portfolios). The firm was fined over US$1 million for (among other things) failing to disclose and avoid conflicts relating to six high-interest-bearing loans made by the investment manager and its director (as lenders) to the Segregated Portfolios (as borrowers). The SFC also banned the CEO/RO and MIC for overall management oversight for 14 months for approving the loans, and the firm’s MIC for anti-money laundering for 12 months for failing to adequately screen investors of the Segregated Portfolios (or their beneficial owners), including checking their politically exposed person status.
Virtual Assets
January 2026: The SFC reprimanded and fined an online brokerage firm over US$500,000 for allowing retail clients to trade VA products intended only for professional investors over a four-year period. In determining the penalty, the SFC noted that the firm self-reported the breaches, voluntarily compensated affected clients, and ceased all regulated activities in Hong Kong.
Internal Controls
November 2025: The SFC suspended an RO, a director, and a MIC for over three months for allowing the unauthorized sale and transfer of client assets totaling over US$3 million. The MIC had neglected her duties to protect client assets from theft or fraud by processing instructions from a bogus email address and ignoring red flags (including the rejected telegraphic transfers). The licensed corporation was separately reprimanded and fined over US$116,000 for its internal control failures.
December 2025: The SFC reprimanded and fined a Swiss private bank over US$1.4 million for systemic internal control failures that included (among others) inadequate product due diligence for 322 bonds that meant customers did not receive sufficient information and warnings for certain complex products.
Unauthorized Personal Trading
January 2026: A former account executive was suspended for seven months for facilitating unauthorized personal trading by an external broker in a client’s account. The external broker was also suspended for 27 months for concealing from his employer his use of the client’s account (held in the name of his relative) for personal trading.
Miscellaneous: Life Bans
January 2026: A former licensed representative of a well-known bank was banned for life following her criminal conviction for theft of more than US$198,000. She had retained the ATM card and PIN to make unauthorized withdrawals from a client’s bank account.
Contributions to proven reserves totaled 300 mmboe, the highest value in the last four years.
The reserves replacement ratio reached 121%, driven by the execution of recovery projects.
The average reserve life stands at 7.8 years for the Ecopetrol Group.
BOGOTÁ, Colombia, Feb. 19, 2026 /PRNewswire/ — Ecopetrol S.A. (BVC: ECOPETROL; NYSE: EC) (“Ecopetrol” and together with its subsidiaries, the “Ecopetrol Group”) reported today its proven reserves of oil, condensate, and natural gas (1P reserves), including its share in proven reserves from subsidiaries, estimated based on the standards of the U.S. Securities and Exchange Commission (SEC). 99% of 1P reserves have been certified by three recognized, specialized, and independent firms: Ryder Scott Company, DeGolyer & MacNaughton, and GaffneyCline & Associates.
As of the end of 2025, Ecopetrol Group’s proven reserves totaled 1,944 million barrels of oil equivalent (mmboe), representing a 2.7% increase compared to the reserves at the end of 2024.
Although the 2025 Brent reference price (USD 68.64/Bbl) decreased by 13.9% compared to the 2024 price (USD 79.69/Bbl)¹, contributions to proven reserves reached 300 mmboe and the reserves replacement ratio was 121%, demonstrating the Company’s effective management in line with its long-term sustainability and resilience strategy.
The reserves contributions were mainly the result of: (i) enhanced recovery projects with outstanding performance in the Castilla, Chichimene, and Akacias fields; (ii) better operational management in the Rubiales and La Cira–Infantas fields, focused on asset efficiency and value; and (iii) contracts with the ANH².
These results represent the highest reserves replacement achieved in the last four years and reflect the capability and commitment of the Ecopetrol Group to generate value across its exploration, development, and production assets, thereby strengthening the sustainability of the Ecopetrol Group.
The following table presents the consolidated balance of proven reserves (1P) for 2025, in million barrels of oil equivalent³:
Concept (SEC) | MMBOE
Proven reserves as of Dec 31, 2024 | 1,892.7
Revisions* | 140.8
Enhanced Recovery | 142.6
Extensions and Discoveries | 16.1
Purchases/Sales | 0.0
Production | -248.0
Proven reserves as of Dec 31, 2025 | 1,944.2
* “Revisions” includes additions from contracts with the ANH, contributing 100 mmboe.
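The headline figures can be checked against the roll-forward table. The sketch below assumes the usual industry definitions, which the release does not spell out: reserves replacement ratio = reserve additions / production, and reserve life = closing reserves / annual production. Under those assumptions the table reproduces the reported 300 mmboe, 121%, 7.8 years, and 2.7%.

```python
# Proven (1P) reserves roll-forward for 2025, in mmboe, taken from the table above.
opening_2024 = 1892.7
movements = {
    "Revisions": 140.8,  # includes about 100 mmboe from ANH contracts
    "Enhanced Recovery": 142.6,
    "Extensions and Discoveries": 16.1,
    "Purchases/Sales": 0.0,
}
production_2025 = 248.0
reported_closing_2025 = 1944.2

# Reserve additions ("contributions") are every movement except production.
additions = sum(movements.values())
closing = opening_2024 + additions - production_2025

# Assumed definitions (standard usage, not stated in the release):
replacement_ratio = additions / production_2025  # barrels added per barrel produced
reserve_life_years = reported_closing_2025 / production_2025  # reserves-to-production ratio
yoy_change = reported_closing_2025 / opening_2024 - 1

print(f"Additions: {additions:.1f} mmboe")
print(f"Closing balance: {closing:.1f} mmboe")
print(f"Reserves replacement ratio: {replacement_ratio:.0%}")
print(f"Reserve life: {reserve_life_years:.1f} years")
print(f"Change vs. 2024: {yoy_change:.1%}")
```

The recomputed closing balance matches the reported 1,944.2 mmboe, which confirms that the table's rows sum correctly (footnote 3's rounding caveat notwithstanding).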
Ecopetrol is the largest company in Colombia and one of the main integrated energy companies in the American continent, with more than 19,000 employees. In Colombia, it is responsible for more than 60% of hydrocarbon production and for most of the transportation, logistics, and hydrocarbon refining systems, and it holds leading positions in the petrochemicals and gas distribution segments. With the acquisition of 51.4% of ISA’s shares, the company participates in energy transmission, the management of real-time systems (XM), and the Barranquilla–Cartagena coastal highway concession. At the international level, Ecopetrol has stakes in strategic basins in the American continent, with drilling and exploration operations in the United States (Permian Basin and the Gulf of Mexico), Brazil, and Mexico, and, through ISA and its subsidiaries, it holds leading positions in the power transmission business in Brazil, Chile, Peru, and Bolivia, in road concessions in Chile, and in the telecommunications sector.
This release contains statements that may be considered forward-looking statements within the meaning of Section 27A of the U.S. Securities Act of 1933, as amended, and Section 21E of the U.S. Securities Exchange Act of 1934, as amended. All forward-looking statements, whether made in this release or in future filings or press releases, or orally, address matters that involve risks and uncertainties, including in respect of the Company’s prospects for growth and its ongoing access to capital to fund the Company’s business plan, among others. Consequently, changes in the following factors, among others, could cause actual results to differ materially from those included in the forward-looking statements: market prices of oil and gas, our exploration and production activities, market conditions, applicable regulations, the exchange rate, the Company’s competitiveness, and the performance of Colombia’s economy and industry, to mention a few. We do not intend and do not assume any obligation to update these forward-looking statements.
Head of Corporate Communications (Colombia): Marcela Ulloa
¹ Brent marker prices referenced in accordance with SEC standards for reserve purposes.
² Contracts with the ANH enabled the allocation of crude-oil royalties amounting to 95.8 mmboe from the Castilla, Akacias, Caño Sur Este, Chichimene, Rubiales, and Yariguí–Cantagallo fields, under ANH Resolution 0977 of 2025, and 4.5 mmboe from economic rights in Tello – La Jagua.
³ Totals may not exactly equal the sum of the figures due to rounding.
Korea’s AI jobs debate did not begin with Hyundai Motor. But Hyundai Motor’s push to deploy AI-enabled humanoid robots has turned an abstract concern into a concrete policy flashpoint. When Boston Dynamics unveiled the next-generation Atlas platform and Hyundai signaled plans for deployment in manufacturing operations, the reaction was immediate: union resistance, media warnings of robot-driven job loss, and renewed political focus on safeguards for workers.
The timing matters. The controversy emerged as the Korean government sharpened its focus on job disruption from AI diffusion, widening digital divides, and broader social uncertainty. President Lee has repeatedly framed AI adoption as unavoidable but has stressed the need for wider training access and faster worker adjustment so that technological change translates into broad-based productivity gains rather than polarization. In response, the presidential office and the National AI Strategy Committee have begun convening stakeholders around the AI Framework Act and the government’s AI Action Plan, with a clear emphasis on inclusion and workforce readiness.
The Hyundai Motor case now sits at the center of that discussion. It is not just a labor dispute over one company’s technology deployment. It is a test of how Korea manages technology-driven innovation in a high-cost, aging manufacturing economy under intensifying global competition.
This piece makes a narrower argument. The policy debate is increasingly framed around the risk of automation-driven job loss. But the available evidence suggests that Korea’s more immediate constraint is weak productivity growth and uneven labor-market adjustment, not large-scale technological displacement. The right response is not to slow technology deployment, but to pair technology-driven innovation with better measurement of disruption, faster worker transition, and policies that raise productivity across the economy.
Much of the current debate assumes that AI-driven job loss is already accelerating. The evidence does not support that conclusion. Korea does not maintain a high-frequency displacement dataset comparable to the U.S. Job Openings and Labor Turnover Survey (JOLTS), which makes real-time measurement of technology-driven layoffs difficult. That alone argues for caution.
The strongest available evidence points to limited aggregate effects so far. A recent OECD analysis finds no clear evidence that AI exposure has reduced overall employment in Korea to date. Where pressures appear, they are uneven—concentrated among certain routine-intensive roles and younger workers—suggesting adjustment frictions rather than economy-wide displacement.
The broader labor-market picture also does not resemble a technology shock. Recent weakness has been concentrated in construction and manufacturing, sectors facing cyclical headwinds. Official data show that construction employment fell by roughly 125,000–140,000 jobs in 2025 amid a sharp contraction in construction investment. Manufacturing employment also declined alongside softer export demand. These developments align more closely with macroeconomic slowdown than automation-driven disruption.
Taken together, the evidence does not show an economy being hollowed out by AI. It shows sectoral adjustment under cyclical pressure, alongside gradual task reallocation. That distinction matters. Policy should focus on accelerating worker transition and raising productivity, not slowing technology deployment in response to a displacement shock that the data do not yet show.
Even if Korea is not currently facing a large-scale automation shock, it faces a deeper structural challenge that AI and robotics directly relate to: weak productivity growth. Additionally, China’s cost pressure is real and growing. Korean manufacturers face intensifying competition from Chinese firms that now combine rising productivity with lower labor costs and aggressive automation investment. This dual pressure is narrowing Korea’s cost competitiveness across a wide range of manufacturing sectors.
OECD data show that Korea’s GDP per hour worked remains materially below the U.S. frontier. As of the latest comparable year, Korea’s productivity level stands at roughly 70 to 75 percent of the U.S. level, depending on the measure used. That gap has narrowed over decades but remains significant. More importantly, productivity growth has slowed. OECD data indicate that Korea’s labor productivity growth averaged above 3 percent annually in the early 2000s but has fallen to around 1 percent in recent years, reflecting a broader global productivity slowdown.
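To make the slowdown concrete, the sketch below compounds the two approximate growth rates cited above (about 3 percent versus about 1 percent a year). The ten-year horizon and the helper name are illustrative choices, not figures from the OECD data.

```python
def cumulative_growth(annual_rate: float, years: int) -> float:
    """Total proportional gain after compounding annual_rate for the given number of years."""
    return (1 + annual_rate) ** years - 1

# Illustrative only: growth rates as cited in the text, horizon chosen for illustration.
for rate in (0.03, 0.01):
    gain = cumulative_growth(rate, 10)
    print(f"{rate:.0%} a year for 10 years -> output per hour {gain:.1%} higher")
```

Over a decade, roughly 3 percent annual growth adds about 34 percent to output per hour, while roughly 1 percent adds only about 10 percent, which is why a persistent slowdown weighs so heavily on long-run wages and living standards.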
The Bank of Korea highlights a structural imbalance that helps explain the slowdown. Services account for roughly two-thirds of total employment and over 40 percent of GDP, yet productivity growth in the service sector significantly lags that of manufacturing. Because most Korean workers are employed in services, this gap directly constrains wage growth and long-term living standards.
Unit labor costs in Korea have risen faster than productivity in recent years, particularly in manufacturing, while Chinese producers continue to scale output and upgrade technology. China’s automation push is especially striking. According to the International Federation of Robotics (IFR), China installed more industrial robots in 2023 than the rest of the world combined and accounted for more than half of global installations.
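The link between labor costs and productivity follows from the conventional definition of unit labor cost as labor compensation per unit of real output. In the sketch below, the symbols are introduced here for illustration: W is compensation per hour worked, H is hours worked, Y is real output, and P = Y/H is output per hour. Growth rates are written g.

```latex
\mathrm{ULC} = \frac{W \cdot H}{Y} = \frac{W}{Y/H} = \frac{W}{P},
\qquad
g_{\mathrm{ULC}} \approx g_{W} - g_{P}
```

Unit labor costs therefore rise whenever hourly compensation grows faster than output per hour, which is the squeeze on cost competitiveness described above.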
As Chinese firms move up the value chain while maintaining cost advantages, competitive pressure on advanced manufacturing economies such as Korea is increasing. Under these conditions, productivity growth becomes central to maintaining global market share rather than simply controlling costs.
This is where robotics and AI matter. They are not simply labor-saving tools. They are among the few scalable technologies capable of raising output per worker, especially in sectors where productivity has stalled. In an aging economy facing intensifying competition from China and other manufacturing hubs, sustained productivity growth is essential for maintaining income growth and fiscal stability.
ITIF has consistently argued that advanced economies facing demographic pressure must rely on technology-driven productivity gains rather than labor expansion to sustain growth. Automation and robotics are among the few tools capable of delivering sustained productivity gains at scale.
Framing automation primarily as a threat to employment risks missing this competitive reality. Slower adoption does not preserve domestic jobs in the long run. It can shift future investment and production to locations where firms can raise productivity more quickly.
Korea already ranks among the most robot-intensive manufacturing economies in the world. IFR data show that Korea had roughly 1,012 industrial robots per 10,000 manufacturing workers in 2023—the highest density globally. This is often portrayed domestically as excessive automation. It is better understood as a competitive strength.
High robot density reflects Korean firms’ ability to deploy capital effectively, integrate advanced production technologies, and sustain high-value manufacturing. It also supports a broader domestic robotics ecosystem spanning components, software, and system integration.
Major Korean manufacturers are now investing in next-generation platforms, including humanoid and AI-enabled industrial systems. These investments aim to raise productivity and secure a position in a rapidly expanding global robotics market. Countries that lead in robotics typically do so by deploying automation widely and building complementary capabilities around that deployment.
Public narratives that frame robotics primarily as a social risk may unintentionally weaken domestic demand for productivity-enhancing technologies. Over time, that risks slowing ecosystem development and shifting investment toward markets with more supportive adoption environments.
A more grounded response to AI-related anxiety would center on productivity growth and worker transition rather than presuming large-scale job loss. Four priorities stand out.
Improve how disruption is measured: Korea should build a clear public series on layoffs, separations, displacement, and job-to-job flows that can be tracked over time, similar to the U.S. JOLTS framework. Without clear metrics, policy will continue to react to perception rather than evidence.
Align regulatory timelines with real-world technology diffusion: The current one-year implementation window under the AI Framework Act is too short to assess actual risks, industry impact, and compliance costs. Extending the adjustment period to roughly three years would give policymakers time to observe how AI is deployed across sectors, identify genuinely high-risk use cases, and calibrate regulation accordingly. Premature obligations risk slowing adoption before Korea has captured the productivity gains these technologies can deliver.
Treat robotics and automation as competitiveness tools, not social threats: Korea leads the world in robot density because its firms deploy automation at scale. If companies face sustained political pressure for adopting productivity-enhancing technologies, investment and future job creation will shift elsewhere. Domestic deployment strengthens domestic capability and supports the broader robotics ecosystem.
Support worker transition without encouraging long-term detachment from the labor market: Temporary income support tied to reskilling and rapid reemployment is appropriate during periods of adjustment. Long-term or permanent basic income structures are not. Extended income replacement risks weakening labor-market attachment and competitiveness. Policy should instead provide short-term support during reskilling periods while rewarding firms and workers that move quickly into higher-productivity activities.
The debate sparked by recent robotics deployments is ultimately about how Korea chooses to compete. The data do not show an economy being hollowed out by AI. The data show an economy under productivity pressure. That distinction matters. Policies built around anxiety will slow adoption and weaken competitiveness. Policies built around productivity, diffusion, and rapid worker transition will do the opposite.
A follow-up post will explore what practical AI safeguards should look like in Korea.
Samsung Electronics today announced it has reached a significant milestone in 6G development, successfully verifying eXtreme multiple-input multiple-output (X-MIMO) technology in the 7 GHz band — a key candidate frequency for future 6G networks — with KT Corporation (KT) and Keysight Technologies.
Through outdoor field testing, the companies demonstrated a peak downlink data rate of up to 3 gigabits per second (Gbps) in the 7 GHz band using X-MIMO. The breakthrough was enabled by ultra-high-density antenna technology that integrates significantly more antenna elements into equipment of comparable size. Considered a foundational component of 6G, ultra-high-density antenna technology achieves four times the antenna density of current 5G systems.
Researchers from Samsung Research of Samsung Electronics and KT Corporation verify X-MIMO technology in the 7 GHz band for 6G development.
Overcoming 7 GHz Band Challenges
As data demands surge due to advancements in AI, immersive services and fixed wireless access (FWA), 6G is becoming increasingly important in meeting evolving global connectivity needs. The 7 GHz band stands out as a promising candidate for future communications, offering an optimal balance of coverage and capacity between the 5G 3.5 GHz band and millimeter-wave frequencies.
X-MIMO technology in the 7 GHz band is regarded as a core 6G technology because the band’s shorter wavelengths allow higher antenna density, which raises data throughput, while the denser array compensates for the shorter propagation distance of 7 GHz signals, achieving coverage comparable to 5G.
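Both claims, four times the antenna density and coverage comparable to 5G, are consistent with textbook antenna arithmetic. The sketch below assumes planar arrays with half-wavelength element spacing in the same physical footprint, free-space (Friis) propagation, and ideal coherent beamforming. These are simplifying assumptions for illustration; the release does not describe Samsung’s actual array design or link budget, and real 7 GHz links also face penetration and other losses not modeled here.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def wavelength_m(freq_hz: float) -> float:
    return C / freq_hz

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss between isotropic antennas (Friis), in dB."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / C)

f_5g, f_6g = 3.5e9, 7.0e9

# Assumption: half-wavelength spacing, so elements per unit area scale as
# 1 / (lambda / 2)^2, i.e. with the square of the frequency.
density_ratio = (wavelength_m(f_5g) / wavelength_m(f_6g)) ** 2

# Extra free-space loss from doubling the frequency; the distance term cancels.
extra_loss_db = fspl_db(100.0, f_6g) - fspl_db(100.0, f_5g)

# Ideal coherent array gain from packing density_ratio times more elements
# into the same footprint.
array_gain_db = 10 * math.log10(density_ratio)

print(f"wavelength: {wavelength_m(f_5g) * 100:.2f} cm at 3.5 GHz, "
      f"{wavelength_m(f_6g) * 100:.2f} cm at 7 GHz")
print(f"element density ratio: {density_ratio:.1f}x")
print(f"extra free-space loss at 7 GHz: {extra_loss_db:.2f} dB")
print(f"ideal array gain from extra elements: {array_gain_db:.2f} dB")
```

Halving the wavelength quadruples the number of elements that fit in a given area. The roughly 6 dB of extra free-space loss at 7 GHz is matched, under these idealized assumptions, by roughly 6 dB of additional array gain.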
Strengthening the 6G Ecosystem Through Collaboration
The verification took place at Samsung Electronics’ Seoul R&D Campus, where researchers measured data rates during the simultaneous transmission of eight data streams from the base station to a single user. To replicate real-world network conditions, Samsung collaborated with KT to establish an outdoor wireless test environment. The testing also utilized Samsung’s 6G base station prototype featuring 256 digital ports and Keysight’s 6G terminal testbed.
“Through our collaboration with KT and Keysight, we have demonstrated the potential for significant improvements in data rates for next-generation communications,” said JinGuk Jeong, Executive Vice President and Head of Advanced Communications Research Center, Samsung Research at Samsung Electronics. “We remain committed to pioneering future network technologies that will enable diverse services and enhanced user experiences in the 6G era.”
“The validation of ultra-high-density antenna technology performance in the 7 GHz band marks a critical step toward 6G commercialization. By securing stable, high-capacity operation in high-frequency bands, we have established a foundational technology for enabling ultra-fast, immersive services,” said Jong-Sik Lee, Executive Vice President and Head of Future Network Laboratory at KT Corporation. “Moving forward, we will continue to drive network innovation in collaboration with Samsung Electronics.”
“This work with Samsung and KT highlights how Keysight’s industry-leading 6G capabilities are accelerating real-world innovation, unlocking new spectrum for early 6G deployments and bridging the gap between research and commercial readiness to enable next-generation AI-driven wireless communications that deliver greater value to customers,” said Kailash Narayanan, Senior Vice President and President of Communications Solutions Group at Keysight.
In addition to this achievement, Samsung and KT successfully validated user-level AI-based radio access network (AI-RAN) optimization technology in a commercial network in December 2025. Through further collaboration with KT, including work on uplink coverage technologies, Samsung will continue working closely with global partners to make data transmission faster and more reliable for users.
With the continuous advancement of medical informatization, the volume of medical big data is growing exponentially, offering significant opportunities for the application of artificial intelligence (AI) in health care, particularly in areas such as assisted diagnosis, personalized treatment, and disease prediction. Especially in recent years, advances in computing power and algorithmic innovation have made machine learning (ML) models a cornerstone of medical intelligence, with their efficient training and optimization relying heavily on large-scale, high-quality datasets from multiple sources [-]. However, the acquisition and sharing of medical data face significant obstacles, including privacy concerns, data security risks, and regulatory constraints [].
In the digital medicine era, the secure exchange and control of sensitive health information have become central concerns in modern health care systems [,]. Meanwhile, strict legal and regulatory requirements—such as HIPAA (Health Insurance Portability and Accountability Act) and GDPR (General Data Protection Regulation)—must be met when handling these personal data []. Additionally, the competition between different medical centers and hospitals has led to data often remaining siloed []. These challenges not only impede the effective integration of multisource health care data but also hinder the translation of ML models from theoretical research to clinical practice.
While conventional centralized data storage architectures can partially facilitate model development, their dependence on centralized infrastructure has become increasingly problematic, leaving them vulnerable to single points of failure and malicious cyberattacks [,]. Consequently, establishing secure frameworks for cross-institutional data sharing and intelligent processing, while rigorously protecting data security and patient confidentiality, has emerged as a pivotal challenge impeding progress in medical AI. The fundamental tension between aggregating health care data for scientific progress and preserving individual privacy and data security has spurred the development of novel computational approaches. Federated learning (FL), an emerging ML paradigm, addresses part of this problem by allowing institutions to collaboratively train models without exchanging raw data [,]. In practice, however, deploying FL in health care faces several obstacles. First, the integrity and authenticity of model updates may be exposed to security threats, including malicious attacks, client-side data tampering, and model forgery, all of which can reduce the accuracy of the global model. Second, current deployments lack reliable incentives for sustained participation and offer limited auditability. Third, heterogeneity across institutions and edge devices undermines the integration and generalization of the global model. Blockchain technology, with its decentralized architecture, immutable ledger, and transparent traceability, has emerged as a potential solution to these limitations [,,]. When the two are integrated, blockchain can provide verifiable provenance for model contributions, automated and transparent incentive mechanisms [,], tamper-proof logs for auditing, and a governance layer that supports cross-institutional workflows [].
Therefore, the combined paradigm of blockchain-based federated learning (BCFL) is expected to become a practical approach to coordinating data privacy, trust, and collaborative intelligence in medicine.
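The integration described above can be illustrated with a toy round of training. The sketch below is an assumption-laden illustration, not a framework from the reviewed literature: it uses FedAvg-style sample-weighted averaging (the text does not specify an aggregation rule), three hypothetical hospitals, and a single-process hash-chained log in place of a real blockchain with consensus and smart contracts. It shows only two properties: raw data never leaves the clients, and tampering with a logged contribution is detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field

def fed_avg(updates):
    """FedAvg-style aggregation: average client parameters weighted by sample count.
    updates: list of (num_samples, params) tuples, where params is a list of floats."""
    total = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    return [sum(n * p[i] for n, p in updates) / total for i in range(dim)]

def digest(obj) -> str:
    """SHA-256 fingerprint of a JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

@dataclass
class Ledger:
    """Toy append-only log in which every block commits to the previous block's hash."""
    blocks: list = field(default_factory=list)

    def append(self, record: dict) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        block_hash = digest({"prev": prev, "record": record})
        self.blocks.append({"prev": prev, "record": record, "hash": block_hash})
        return block_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited record or broken link fails the check."""
        prev = "0" * 64
        for block in self.blocks:
            if block["prev"] != prev or digest({"prev": prev, "record": block["record"]}) != block["hash"]:
                return False
            prev = block["hash"]
        return True

# One training round with hypothetical hospitals: (local sample count, local model parameters).
client_updates = {
    "hospital_a": (100, [0.2, 0.4]),
    "hospital_b": (300, [0.1, 0.5]),
    "hospital_c": (600, [0.3, 0.2]),
}
ledger = Ledger()
for client, (n, params) in client_updates.items():
    # Only a fingerprint of each update goes on the ledger, never the underlying patient data.
    ledger.append({"round": 1, "client": client, "n": n, "update_sha256": digest(params)})

global_model = fed_avg(list(client_updates.values()))
ledger.append({"round": 1, "global_sha256": digest(global_model)})
print(global_model, ledger.verify())

# Inflating a client's claimed sample count after the fact breaks the hash chain.
ledger.blocks[1]["record"]["n"] = 10_000
print(ledger.verify())
```

Logging hashes rather than parameters keeps the ledger small and avoids leaking model details, while still letting any participant later prove which update was contributed in which round; production BCFL systems layer consensus, access control, and incentive logic on top of this basic tamper evidence.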
Although research on FL and blockchain has accelerated in recent years, existing reviews exhibit clear gaps in focus and depth, limiting their value for clinical researchers and multidisciplinary audiences. The main shortcomings can be summarized as follows. First, most reviews examine either FL or blockchain technology in isolation, without providing a systematic analysis of how these two approaches can be integrated to address concrete challenges in health care. Second, prior reviews tend to emphasize algorithmic and technical details, while offering limited discussion of how BCFL can be adapted to real-world medical scenarios—such as cross-hospital electronic health record (EHR) integration, collaborative training of medical imaging models, Internet of Medical Things (IoMT) device coordination, and epidemic surveillance. Finally, few reviews adequately address the challenges, regulatory considerations, and future development trends of BCFL within the constraints of modern health care governance frameworks.
To address these gaps, this review makes three key contributions. First, it provides a medical demand–oriented, systematic classification of BCFL frameworks and outlines their typical workflows. Second, it maps different BCFL architectures to representative health care application scenarios, clarifying their practical relevance. Third, it analyzes the technical, regulatory, and implementation challenges that currently hinder BCFL adoption and identifies promising directions for future research, providing evidence-based insights for clinical translation and decision-making in the health care and biomedical research communities.
Objective
Although previous studies have analyzed BCFL frameworks from technological and commercial perspectives, reviews of their practical application scenarios in health care remain limited. In the era of AI in particular, the explosive growth of medical data has exposed obstacles to applying large models in medical practice, and these obstacles open new prospects for BCFL in health care. This study investigates the synergistic integration of FL and blockchain technology, elucidating their combined architectural framework and operational mechanisms. We demonstrate how this technological convergence enables secure, privacy-preserving health care data sharing and collaborative model development across distributed health care institutions. Although blockchain-enhanced FL has recently emerged as a promising approach in medical research, the field remains nascent, constrained by technical limitations, unresolved privacy issues, and implementation barriers. Furthermore, we present a systematic review of current advancements in blockchain-assisted FL for medical applications and propose actionable research directions to overcome existing challenges.
Key contributions of this work include:
A systematic review of the theoretical foundations of FL and blockchain technology, elaborating on their potential and advantages in the medical field.
A comprehensive taxonomy of existing integration frameworks, accompanied by a mechanistic analysis of bidirectional benefits: how blockchain enhances FL security and trustworthiness, and how FL expands blockchain’s utility in distributed computing scenarios.
A summary of recent advancements and practical applications of BCFL across key health care domains, including cross-institutional medical data sharing, IoMT, public health surveillance, and telemedicine.
An examination of current technological limitations and application challenges, together with key future research directions to address these gaps and advance real-world implementation.
By synthesizing current research and offering a structured analytical framework, we aim to provide a comprehensive reference for medical personnel, researchers, and health care policymakers and, ultimately, to foster the development of trustworthy, privacy-preserving, and collaborative AI systems that can support precision medicine and smart health care in a decentralized digital era.
Methods
Overview
This review systematically summarizes recent advances in BCFL within the medical field, following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist (). To ensure methodological rigor and reproducibility, we adopted a transparent multistep process including comprehensive literature search, independent screening, quality assessment, and evidence synthesis.
Search Strategy
The literature data were primarily retrieved from several prominent academic databases, including PubMed, IEEE Xplore, Web of Science, and Google Scholar. To ensure the timeliness of the research, the review focuses on literature published between January 2018 and February 2025, while also incorporating some early seminal studies to trace the theoretical development and technological evolution of BCFL in the medical field.
The search strategy uses Boolean logic operators to formulate a comprehensive query, for example, (“blockchain”) AND (“federated learning”) AND (“medical” OR “healthcare”). To improve retrieval efficiency and capture interdisciplinary work, the search terms are further expanded with terms such as “blockchain-enabled federated learning,” “distributed machine learning,” “decentralization,” “Internet of Medical Things (IoMT),” “telemedicine,” “EMR,” and “epidemics” to ensure comprehensive coverage of the literature. The exact search string is as follows: (“blockchain” OR “distributed ledger technology”) AND (“federated learning” OR “collaborative learning” OR “distributed machine learning”) AND (“healthcare” OR “medical” OR “clinical” OR “EMR” OR “(epidemics)” OR “IoMT” OR “telemedicine”). Gray literature (conference proceedings and preprints from arXiv) was screened manually and included only if it contained original data or detailed technical methodology relevant to BCFL.
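The structure of the search string (terms within a concept group joined by OR, groups joined by AND) can be generated programmatically, which makes it easier to rerun the same query across databases. The helper below is a hypothetical sketch: it emits generic Boolean syntax without database-specific field tags (PubMed and IEEE Xplore each use their own), and it writes the epidemics term without the inner parentheses that appear in the reported string.

```python
def boolean_query(groups):
    """Build a Boolean query: terms inside a group are OR-ed, groups are AND-ed."""
    def quote(term: str) -> str:
        return f'"{term}"'
    return " AND ".join("(" + " OR ".join(quote(t) for t in group) + ")" for group in groups)

# Concept groups taken from the search string reported in the text.
groups = [
    ["blockchain", "distributed ledger technology"],
    ["federated learning", "collaborative learning", "distributed machine learning"],
    ["healthcare", "medical", "clinical", "EMR", "epidemics", "IoMT", "telemedicine"],
]
print(boolean_query(groups))
```

Keeping the concept groups as data means adding a synonym touches one list, and the resulting string stays consistent across every database searched.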
Inclusion and Exclusion Criteria
To ensure relevance and academic rigor, this review establishes strict inclusion and exclusion criteria. Studies were included if they (1) involved both blockchain and FL technologies and explored their medical applications; (2) were journal articles or reviews published in peer-reviewed journals indexed in the Science Citation Index or Social Sciences Citation Index, or papers from top-tier international conferences (eg, IEEE and Association for Computing Machinery), with seminal papers cited more than 30 times included regardless of publication venue; and (3) reported theoretical frameworks, system architectures, empirical evaluations, or case studies. Studies were excluded if they (1) focused solely on blockchain or FL without medical applications, (2) were editorials or opinion articles lacking technical details or empirical validation, or (3) were duplicate reports or low-quality publications from non–peer-reviewed sources.
Literature Screening and Processing
The above search strategy initially retrieved 2547 documents. After automatic deduplication in EndNote, 1327 were retained. Two independent reviewers (XW and XC) then manually screened all records in two stages against the inclusion and exclusion criteria. In the first stage, based on titles and abstracts, they selected high-quality reviews, papers from top journals and conferences, and literature that clearly explained the application of BCFL in health care, totaling 319 articles. In the second stage, full-text reading eliminated studies lacking empirical verification, technical details, or experimental data, ultimately retaining 111 high-quality documents. In addition, to ensure comprehensiveness, this review also consulted the latest review papers and included the core research results cited therein to avoid missing key progress. Detailed literature retrieval strategies can be found in . After screening, the reviewers evaluated the quality of the literature. Because the corpus of this review comes mainly from cross-disciplinary research spanning medicine, engineering, and computer science, the GRADE (Grading of Recommendations Assessment, Development, and Evaluation) framework commonly used in clinical evidence-based studies was not applicable. Instead, we used the Systematic Literature Review quality checklist based on the Kitchenham principle [] to score each included study item by item (1/0.5/0) on 12 indicators, including reproducibility, method transparency, evaluation design, data description, experimental validity, and attack/privacy discussion. Two independent reviewers (XW and XC) assessed each document. When their scores differed significantly (a difference of more than 2 in a document’s total score, or disagreement on key items), a third senior reviewer (YX) arbitrated.
Ultimately, the studies were classified into three quality grades based on the total score: high (≥9), medium (5-8), and low (<5). When writing summaries and conclusions, priority was given to citing high-quality research. [-] provides complete assessment criteria, scoring rules, and statistics on inter-reviewer agreement.
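The scoring, grading, and arbitration rules above can be expressed compactly. The following is an illustrative sketch only: the indicator order and which items count as "key" are hypothetical, since the text does not enumerate all 12 indicators, and because a half-point total of 8.5 falls between the published bands, the sketch assumes such totals are graded medium.

```python
from typing import List

# Each of the 12 checklist indicators is scored 1, 0.5, or 0.
ALLOWED_SCORES = {0.0, 0.5, 1.0}
N_INDICATORS = 12


def total_score(item_scores: List[float]) -> float:
    """Validate one reviewer's item scores and return their sum."""
    if len(item_scores) != N_INDICATORS:
        raise ValueError(f"expected {N_INDICATORS} indicator scores")
    if any(s not in ALLOWED_SCORES for s in item_scores):
        raise ValueError("each indicator must be scored 1, 0.5, or 0")
    return float(sum(item_scores))


def quality_grade(score: float) -> str:
    """High (>=9), medium (5-8), low (<5).

    Assumption: a half-point total of 8.5 is graded medium.
    """
    if score >= 9:
        return "high"
    if score >= 5:
        return "medium"
    return "low"


def needs_arbitration(a: List[float], b: List[float], key_items: List[int]) -> bool:
    """True if totals differ by more than 2 points or any key item differs.

    `key_items` holds hypothetical indicator positions treated as critical.
    """
    if abs(total_score(a) - total_score(b)) > 2:
        return True
    return any(a[i] != b[i] for i in key_items)


# Illustrative scores for one document (not data from this review).
reviewer_1 = [1, 1, 1, 0.5, 1, 1, 1, 0.5, 1, 1, 0.5, 1]    # total 10.5
reviewer_2 = [1, 1, 0.5, 0.5, 1, 1, 1, 0.5, 1, 1, 0.5, 1]  # total 10.0
print(quality_grade(total_score(reviewer_1)))              # high
print(needs_arbitration(reviewer_1, reviewer_2, [0, 1]))   # False
print(needs_arbitration(reviewer_1, reviewer_2, [2]))      # True
```

Treating both the total-score gap and key-item disagreement as independent triggers mirrors the "or" in the arbitration rule: a small total difference can still hide a disagreement on a critical item.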
Results
Study Findings
Ultimately, this review draws on 111 rigorously selected papers covering theoretical research, technical architecture, application scenarios, challenges, and future trends. These studies provide a robust academic foundation for an in-depth exploration of BCFL applications in medicine. A PRISMA flow diagram illustrates the systematic selection process (Figure 1).
Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram of the systematic review phases. FL: federated learning.
Current Status of Artificial Intelligence in Medicine
In the context of the rapid advancements in AI, the performance of ML models heavily depends on access to large volumes of high-quality data. However, the health care sector has long struggled with data sharing due to concerns over privacy, security, and regulatory compliance. The prevalence of “data silos” in health care impedes the advancement of AI and hinders the translation of research findings into practical clinical applications.
Traditional centralized ML approaches require aggregating raw data from multiple sources on a central server for training. Although this approach allows extensive use of data, it presents significant challenges in practice. First, data stored and transmitted in a centralized system is susceptible to network attacks, which can lead to the theft or tampering of sensitive information and pose a serious privacy risk. Second, centralized storage faces significant compliance challenges, especially under stringent privacy regulations such as GDPR [] and HIPAA, which further restrict the cross-organizational exchange of health care data. Moreover, health care organizations, research institutions, and enterprises are often reluctant to share critical data due to competitive concerns and resource protection, further exacerbating data silos and hindering cross-organizational collaboration in building high-quality ML models.
Federated Learning in Health Care
In this context, the limitations of traditional centralized approaches are becoming increasingly apparent, driving researchers to explore innovative solutions. In 2016, Google introduced the concept of federated learning [], a distributed and collaborative ML paradigm. At its core, FL enables multiple data holders to collaboratively develop a global model by training locally and exchanging model parameters without sharing raw data. This paradigm is well suited to scenarios with strict privacy requirements and decentralized data that cannot be centrally stored, such as health care, finance, and smart cities [-]. This approach not only enhances data privacy protection but also overcomes the limitation of data silos, ushering in a new era of privacy-preserving collaborative learning.
FL operates on the principles of distributed model training and global parameter aggregation, and one of its core algorithms is Federated Averaging (FedAvg). As illustrated in Figure 2, the basic process can be summarized as follows [,]. First, a central coordination server initializes the global model and distributes it to all participating clients (eg, hospitals or mobile devices). Each client then trains the model locally on its private dataset and sends the updated model parameters (or gradients) back to the central server, often in encrypted form. The central server collects the parameters uploaded by all clients and updates the global model using a predefined aggregation rule (eg, FedAvg, which averages client parameters weighted by the size of each client's local dataset). The updated model is then redistributed to the clients for the next training round. This process repeats over multiple communication rounds until the global model converges or predefined performance metrics are met.
Figure 2. Federated learning architecture and workflow in the medical field.
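To make the FedAvg loop concrete, here is a minimal, self-contained sketch. The toy 1-D linear model, the synthetic "hospital" datasets, and all function names are assumptions for illustration, not taken from any cited study; encryption and networking are omitted so that only the train-locally, aggregate-by-sample-count cycle remains.

```python
import random
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client
Params = Tuple[float, float]         # (w, b) of the model y = w * x + b


def local_train(params: Params, data: Dataset, epochs: int = 5, lr: float = 0.05) -> Params:
    """Client step: plain SGD on squared error, using only the client's own data."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def fedavg_aggregate(updates: List[Params], sizes: List[int]) -> Params:
    """Server step: average client parameters weighted by local sample counts."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return w, b


def run_fedavg(clients: List[Dataset], rounds: int = 20) -> Params:
    params: Params = (0.0, 0.0)  # server initializes the global model
    for _ in range(rounds):
        # Each client starts from the current global model; only parameters,
        # never raw (x, y) records, are sent back to the server.
        updates = [local_train(params, data) for data in clients]
        params = fedavg_aggregate(updates, [len(d) for d in clients])
    return params


# Synthetic stand-in for three hospitals' data drawn from y = 2x + 1 plus noise.
rng = random.Random(0)


def make_client(n: int) -> Dataset:
    xs = [rng.random() for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]


clients = [make_client(n) for n in (30, 50, 20)]
w, b = run_fedavg(clients)
print(f"global model: w={w:.3f}, b={b:.3f}")  # should be close to w=2, b=1
```

Weighting by sample count means a client with 50 records influences the global model more than one with 20, which is the standard FedAvg choice; a plain unweighted mean is a common alternative when dataset sizes are themselves considered sensitive.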
Building on this operating principle, FL's technical characteristics give it distinct advantages in health care. Because it can draw on large-scale, diverse, and geographically distributed datasets without compromising patient privacy, this decentralized approach not only mitigates the data silos common in hospitals and research institutions but also captures a broader range of features from heterogeneous real-world datasets, ultimately yielding AI models with higher efficiency, robustness, and accuracy. Consequently, since its proposal, FL has shown great application potential in health care []. For instance, FL has been applied in drug research, allowing pharmaceutical companies to leverage shared algorithmic models to accelerate drug discovery while avoiding direct data sharing []. FL-based collaborative protocols involving multiple hospitals and cloud servers have been developed for EHR analytics [], and FL has been investigated for predicting hospitalizations in cardiac patients []. FL has also been widely studied in medical imaging, including prostate cancer detection, brain tumor segmentation [], and MRI analysis for Alzheimer and Parkinson diseases. In addition, FL-based approaches have been proposed for detecting coronavirus infections [,].
Despite its promise in medical applications, FL still faces several practical challenges. A central concern is its dependence on a centralized server for model coordination and parameter aggregation: the central server can become a single point of failure and remains susceptible to man-in-the-middle attacks [,]. Moreover, as an increasing number of local devices transmit model parameters to the central server simultaneously, server bandwidth and scalability come under significant pressure, raising the risk of network congestion [,,]. In addition, although FL avoids direct sharing of raw data, its training process may still inadvertently expose sensitive information through model parameters. For instance, via gradient inversion attacks [], adversaries can reconstruct sensitive training data from shared model parameters. The risk is further heightened by malicious participants, who may compromise the integrity of FL by uploading falsified or adversarial local updates, as in poisoning attacks []. These attacks degrade the reliability of the global model, reducing its accuracy and overall utility. Finally, the lack of a robust incentive mechanism poses a practical barrier to adoption. FL implicitly assumes that all participants willingly contribute data and computational resources without direct compensation, an assumption that is difficult to uphold in real-world scenarios. The absence of a fair and transparent incentive mechanism may diminish participants’ motivation to contribute high-quality updates, ultimately degrading system performance [-]. Conversely, some participants may receive rewards without actively contributing data (free-riding), resulting in unfair compensation.
When FL is applied to the medical domain, heterogeneity across systems, data, and distributions presents a fundamental challenge to model reliability and practical deployment. System heterogeneity arises from disparities in computing power, memory, network stability, and energy availability among participating devices []. This is particularly critical in resource-constrained environments, where limited hardware capabilities and intermittent connectivity can hinder local training and delay model updates. Data heterogeneity further complicates collaboration, as medical institutions often store data in different formats and standards, leading to integration difficulties [,]. Variations in data quality, such as incomplete records or inconsistent annotations, and large differences in dataset sizes across institutions can distort the learning process and reduce model performance. Most critically, distribution heterogeneity, known as the non-independent and identically distributed (non-IID) data problem, undermines generalizability. Institutions serve distinct patient populations, resulting in divergent feature and label distributions; for example, a model trained predominantly on “healthy” samples may struggle to make accurate predictions on datasets with a higher prevalence of “diseased” samples.
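The label-distribution skew described above is often simulated in FL experiments by splitting each class across clients in Dirichlet-distributed proportions. The sketch below uses that common technique as a stand-in illustration; it is not a method from this review, and the cohort, the concentration values, and the function names are hypothetical.

```python
import random
from typing import List


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Symmetric Dirichlet(alpha) sample built from normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def label_skewed_split(labels: List[int], n_clients: int, alpha: float,
                       seed: int = 0) -> List[List[int]]:
    """Assign sample indices to clients, spreading each class in Dirichlet proportions.

    Small alpha -> each client sees a very different label mix (strongly non-IID);
    large alpha -> label mixes become similar across clients (close to IID).
    """
    rng = random.Random(seed)
    clients: List[List[int]] = [[] for _ in range(n_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        props = dirichlet(alpha, n_clients, rng)
        # Turn cumulative proportions into cut points over this class's indices.
        cuts, acc = [0], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(acc * len(idx)))
        cuts.append(len(idx))
        for c in range(n_clients):
            clients[c].extend(idx[cuts[c]:cuts[c + 1]])
    return clients


def prevalence(labels: List[int], indices: List[int], positive: int = 1) -> float:
    """Share of a client's samples that carry the positive ("diseased") label."""
    return sum(labels[i] == positive for i in indices) / max(len(indices), 1)


# Hypothetical cohort: 0 = "healthy", 1 = "diseased", 30% overall prevalence.
labels = [0] * 700 + [1] * 300
for alpha in (100.0, 0.3):
    parts = label_skewed_split(labels, n_clients=3, alpha=alpha)
    print(alpha, [round(prevalence(labels, p), 2) for p in parts])
```

With a large concentration every simulated institution sees roughly the 30% overall prevalence, whereas a small concentration typically yields sites dominated by one label, which is exactly the setting in which a model trained mostly on "healthy" samples generalizes poorly.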
These advantages and limitations together underscore the urgent need for complementary technologies to enhance trust, security, and coordination in FL workflows. Blockchain, with its decentralized, tamper-resistant, and auditable infrastructure, offers a promising solution to many of these pain points. In the following section, we explore how blockchain can be integrated with FL to build more robust, transparent, and privacy-preserving frameworks for medical applications.
Blockchain in Health Care
Blockchain was introduced in 2008 as the foundational technology behind Bitcoin, where it was originally designed to solve trust-related issues in digital currency transactions []. Essentially, it functions as a distributed digital ledger in which transaction records are maintained and shared across all participants through a peer-to-peer network. Unlike traditional centralized systems, blockchain eliminates reliance on a trusted third party and stores data in a decentralized manner [].
The implementation of blockchain technology relies on multiple technical layers and core components, so understanding its principles requires analyzing these components. The fundamental unit of a blockchain is the “block,” which comprises two primary components, the block header and the block body, as shown in Figure 3. The block header contains metadata, including the hash value of the preceding block (which links the current block to its predecessor), a timestamp, a nonce, and the block version number. The block body holds smart contracts and the actual data, including transaction records. Blocks are cryptographically linked in chronological order, forming an immutable chain. Because each block stores the hash of its predecessor, tampering with any block changes its hash and breaks the link recorded in the next block; concealing the change would require recomputing every subsequent block. Such a change is detectable and will be rejected by the network, thereby preserving data authenticity and integrity.
Figure 3. A schematic diagram of the blockchain structure.
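The hash linking between block headers can be demonstrated in a few lines. This is a minimal sketch under simplifying assumptions: JSON serialization, SHA-256, and a single local list standing in for the ledger; consensus, Merkle trees, and networking are deliberately left out, and the record identifiers are hypothetical.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Block:
    index: int
    prev_hash: str    # header: hash of the preceding block (the chain link)
    timestamp: float  # header: creation time
    nonce: int        # header: value varied by miners in PoW-style schemes
    version: int      # header: block format version
    body: List[Dict[str, str]] = field(default_factory=list)  # records/transactions

    def hash(self) -> str:
        """SHA-256 over a canonical JSON encoding of header and body."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Block], records: List[Dict[str, str]]) -> Block:
    prev = chain[-1].hash() if chain else "0" * 64  # genesis block has no parent
    block = Block(len(chain), prev, time.time(), 0, 1, records)
    chain.append(block)
    return block


def verify_chain(chain: List[Block]) -> bool:
    """Every block must store the current hash of the block before it."""
    return all(chain[i].prev_hash == chain[i - 1].hash() for i in range(1, len(chain)))


ledger: List[Block] = []
for rec in ("consent-001", "scan-042", "consent-002"):  # hypothetical record IDs
    append_block(ledger, [{"hospital": "A", "record": rec}])

intact = verify_chain(ledger)                # True
ledger[1].body[0]["record"] = "scan-999"     # tamper with a middle block
after_tamper = verify_chain(ledger)          # False: block 2's link no longer matches
print(intact, after_tamper)
```

A local link check like this cannot expose an edit to the newest block, because no later block references it yet; in a real network such edits are caught when nodes compare their copies of the ledger.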
Furthermore, to fully understand the potential of blockchain in such applications, it is essential to examine its underlying architecture and systematically analyze its functional layers (shown in Figure 4).
Figure 4. The system architecture of blockchain. P2P: Peer to Peer; PBFT: Practical Byzantine Fault Tolerance; PoQ: Proof of Quality; PoS: Proof of Stake; PoW: Proof of Work; SCM: Supply Chain Management.
Fundamentally, blockchain systems are structured into 6 tightly interwoven layers, each addressing specific functional needs. At the core lies the data layer, which stores, organizes, and secures transaction records through cryptographic techniques. This layer not only guarantees data integrity and immutability but also reduces storage overhead, which is critical for handling the growing volume of medical data []. Above this, the network layer enables efficient and reliable information exchange across decentralized nodes via peer-to-peer communication protocols []. In high-stakes environments such as health care, the consensus layer is responsible for securing agreement among distributed and often mutually untrusting participants. By establishing agreement on the validity and chronological order of transactions, consensus mechanisms prevent malicious interference and ensure the reliability of shared records [,]. Widely adopted algorithms include Proof of Work (PoW), Proof of Stake (PoS), and Practical Byzantine Fault Tolerance (PBFT) [,,]. To sustain long-term network participation, the incentive layer introduces economic mechanisms that reward nodes for validating transactions and generating new blocks. This is especially important in medical applications, where incentives encourage participants to upload more high-quality medical data and where the authenticity and availability of data directly affect patient outcomes. Building on these foundations, the contract layer operationalizes complex protocols through smart contracts, which autonomously execute predefined rules and transactions without third-party intermediaries [].
In the health care sector, this translates into streamlined workflows for processes such as insurance claims, consent management, and secure data access, offering enhanced transparency, reduced administrative overhead, and fewer human errors. Finally, the application layer serves as the interface between users and the blockchain, bridging technical functionality with real-world applications. Across industries, this layer has fueled innovations in financial services, supply chain management, and beyond [,]. In health care, it enables secure sharing of EHRs, protects patient privacy, and supports collaborative research across institutions []. By facilitating trusted interactions in environments where data sensitivity and security are paramount, blockchain’s multilayered architecture offers an indispensable foundation for integrating advanced paradigms such as FL, thereby unlocking new possibilities for privacy-preserving, distributed medical intelligence.
This analysis of the structure and underlying architecture of blockchain shows that it has characteristics distinguishing it from traditional systems, including decentralization, immutability, transparency, traceability, security, anonymity, and high availability [,].
These characteristics are particularly valuable in health care. For instance, decentralization eliminates the single point of failure inherent in centralized medical record systems, thereby enhancing system resilience and reducing vulnerability to cyberattacks. This feature is particularly suitable for scenarios with high trust requirements, such as medical record management and supply chain supervision. Immutability and traceability ensure that clinical data, diagnostic results, and patient consent records cannot be altered or deleted, which makes medical data auditable; they are also crucial for tracking the source of drugs and ensuring the integrity of the supply chain, thereby reducing the risk of counterfeit drugs. Furthermore, blockchain offers a transparent and privacy-protecting environment that enables authorized clinicians, researchers, and insurance companies to verify data sources without exposing sensitive patient information, thereby enhancing accountability in medical research and drug development. The security of blockchain stems from mechanisms such as smart contracts and consensus protocols, which maintain transaction integrity and prevent malicious nodes from arbitrarily altering the block addition process [,].
A comparison of the technical characteristics of blockchain and FL is presented in . Given these characteristics, blockchain facilitates trust establishment among untrusted participants in wireless networks [,]. Consequently, blockchain has demonstrated great potential across various domains, including cryptocurrencies, health care, and the Internet of Things (IoT) [,,]. In health care, blockchain technology enables the storage and verification of patients’ electronic medical records (EMRs), clinical trial data, and IoT sensor data, granting patients control over their own medical data [-]. During AI model training, medical data from various institutions, including x-rays, CT scans, MRI reports, and pathological examinations, are securely stored on the blockchain. Predefined entity and event annotation platforms facilitate data labeling within each institution, followed by model training on in-hospital servers [,]. Kordestani et al [] proposed HapiChain, a telemedicine platform built on a patient-centered blockchain infrastructure, ensuring the security of remote consultations between patients and doctors. In addition, decentralized blockchain solutions, such as Drug-ledger and Med-ledger, have been proposed to enhance traceability and security in the pharmaceutical supply chain [,].
Given blockchain’s significant advantages in decentralization, privacy protection, data immutability, incentives, and automation, its attributes fit well with FL’s requirements for secure data sharing and distributed modeling. Blockchain can serve as a secure and reliable collaborative infrastructure for FL, addressing challenges such as trust deficiency, data integrity, and transparency [,,]. Conversely, the decentralized data processing mechanism of FL can compensate for blockchain’s limitations in scalability and computational efficiency. Therefore, integrating blockchain with FL has the potential not only to address their respective challenges but also to unlock new application scenarios and possibilities. The following section examines the necessity and feasibility of integrating blockchain with FL.
Integration of Blockchain and Federated Learning in Health Care
Overview
BCFL has emerged as a promising solution in health care, fostering a balance between privacy protection and data collaboration while unlocking new opportunities for data-driven health care innovation [,]. The following section analyzes the complementary strengths and performance of integrating blockchain with FL, illustrating how blockchain mitigates the limitations of FL (as illustrated in ) and how FL benefits blockchain.
Blockchain Empowers Federated Learning in Health Care
Decentralization Mitigates Single Points of Failure and Scalability Bottlenecks
The traditional architecture of FL primarily depends on a central server to manage and coordinate participants. This architecture is susceptible to single points of failure and can also result in bandwidth and computational resource bottlenecks as the number of clients increases [,,]. In contrast, blockchain’s decentralized architecture eliminates reliance on a central server by leveraging a distributed network, allowing automated data collaboration across multiple nodes and effectively mitigating the risk of single points of failure []. In this context, temporary aggregators are selected based on blockchain’s consensus mechanisms (eg, PoW or PoS) [,]. Moreover, blockchain’s Byzantine fault tolerance, which enables the dynamic management of unreliable nodes through consensus mechanisms, further enhances the stability of FL in large-scale distributed environments [].
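For reference, the classical PBFT bound behind this kind of Byzantine fault tolerance can be stated as a short worked aside (a standard result, not a property claimed for any specific BCFL system): with n validating nodes of which at most f behave arbitrarily, safety and liveness require

```latex
n \ge 3f + 1
\quad\Longrightarrow\quad
f_{\max} = \left\lfloor \frac{n-1}{3} \right\rfloor,
\qquad
q = 2f + 1 \ \text{(quorum when } n = 3f + 1\text{)}
```

For example, a committee of n = 10 validators tolerates at most f = 3 faulty nodes, and each decision needs a quorum of 2(3) + 1 = 7 matching votes. This is why unreliable FL participants can be absorbed only while they remain below one third of the validating set.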
Incentives to Enhance Participant Motivation
Traditional FL systems often lack effective incentives, especially in environments where resources are unevenly distributed. High-performing participants may struggle to sustain long-term contributions due to the absence of direct benefits [,]. Blockchain’s built-in incentive mechanisms address this challenge by rewarding data contributors, validators, and maintainers through tokens or other financial instruments, introducing an economic driver for FL []. This incentive model encourages the contribution of high-quality data while discouraging unreliable participation, ultimately improving the overall performance and stability of FL models. For instance, Weng et al [] proposed an incentive mechanism designed to promote collaboration in training deep learning models. The mechanism introduces two key concepts: compatibility and activity. Compatibility ensures that each participant receives optimized rewards based on their contribution, while activity incentivizes participants to update the local model and aggregate the global model actively. Upon each global model update, rewards are distributed to local devices and miners based on individual contributions. Similarly, Kang et al [] introduced a reputation-based incentive model to measure client trustworthiness. By leveraging blockchain’s immutability, the system ensures distributed reputation management and evaluates participants based on model quality and computational contributions.
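A generic reputation-weighted reward scheme illustrates the idea of combining contribution and trustworthiness. This sketch is not the mechanism of Weng et al or Kang et al; the moving-average update, the decay and prior values, the quality scores, and all names are hypothetical choices made for illustration.

```python
from typing import Dict


def update_reputation(rep: Dict[str, float], quality: Dict[str, float],
                      decay: float = 0.8, prior: float = 0.5) -> Dict[str, float]:
    """Exponential moving average of per-round quality scores in [0, 1].

    Newcomers start at `prior`; clients absent from `quality` score 0 this round.
    """
    clients = set(rep) | set(quality)
    return {c: decay * rep.get(c, prior) + (1 - decay) * quality.get(c, 0.0)
            for c in clients}


def split_rewards(pool: float, rep: Dict[str, float],
                  samples: Dict[str, int]) -> Dict[str, float]:
    """Divide a token pool in proportion to reputation x contributed samples."""
    weights = {c: rep[c] * samples.get(c, 0) for c in rep}
    total = sum(weights.values())
    if total == 0:
        return {c: 0.0 for c in rep}
    return {c: pool * w / total for c, w in weights.items()}


rep = {"hospital_a": 0.5, "hospital_b": 0.5, "hospital_c": 0.5}
# Hypothetical per-round quality scores, eg, validation gain of each update.
rep = update_reputation(rep, {"hospital_a": 0.9, "hospital_b": 0.6, "hospital_c": 0.1})
rewards = split_rewards(100.0, rep, {"hospital_a": 100, "hospital_b": 100, "hospital_c": 100})
print({c: round(r, 2) for c, r in sorted(rewards.items())})
```

Because reputation decays toward recent behavior, a client that starts submitting low-quality updates loses reward share over several rounds rather than instantly, which damps the effect of a single noisy evaluation.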
Privacy Protection and Attack Resistance Enhancement
Although FL protects raw data privacy, it remains vulnerable to adversarial threats such as poisoning attacks and Byzantine attacks, which can mislead model training and hinder convergence [,]. Blockchain strengthens FL security through its immutability and tamper-proof nature. Its authentication mechanisms detect and exclude malicious nodes, ensuring that only authorized participants can access FL data, thereby enhancing privacy protection. Moreover, blockchain’s cryptographic techniques and anonymity mechanisms reduce the risks of background knowledge attacks and collusion attacks []. Additionally, its consensus mechanism ensures data consistency across all nodes while using sophisticated algorithms to prevent malicious nodes from compromising the network [,]. For example, in medical data collaboration, blockchain records the training processes and contributions of each participating hospital, ensuring both data integrity and privacy while mitigating risks of data contamination and adversarial attacks. Shayan et al [] proposed a multi-Krum consensus mechanism to counter poisoning attacks by electing a validation peer committee that filters out malicious model updates. Similarly, Chen et al [] used a blockchain-based validation voting mechanism, where nodes vote on model update validity and remove malicious devices based on consensus results.
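The Krum-style filtering behind a multi-Krum defense can be sketched in isolation. The code below implements only the multi-Krum selection rule on plain vectors; the committee election and blockchain verification steps of the cited system are not reproduced, and the update values are illustrative.

```python
from typing import List

Vector = List[float]


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def multi_krum(updates: List[Vector], f: int, m: int) -> List[int]:
    """Indices of the m updates with the lowest Krum scores.

    An update's Krum score is the sum of squared distances to its n - f - 2
    nearest peers, so outliers far from the honest cluster score high.
    Requires n > 2f + 2.
    """
    n = len(updates)
    if n <= 2 * f + 2:
        raise ValueError("Krum needs more than 2f + 2 updates")
    k = n - f - 2
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(sq_dist(u, v) for j, v in enumerate(updates) if j != i)
        scores.append(sum(dists[:k]))
    return sorted(range(n), key=lambda i: scores[i])[:m]


def mean_of(updates: List[Vector], chosen: List[int]) -> Vector:
    """Coordinate-wise average of the selected updates (the aggregated model delta)."""
    dim = len(updates[0])
    return [sum(updates[i][d] for i in chosen) / len(chosen) for d in range(dim)]


# Five honest updates near (1, 1) and one poisoned update (illustrative values).
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0], [10.0, -10.0]]
selected = multi_krum(updates, f=1, m=4)
print(selected, mean_of(updates, selected))  # the poisoned index 5 is excluded
```

The design trade-off is that Krum assumes honest updates cluster together; under strongly non-IID hospital data, legitimate but unusual updates can also be filtered out.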
Transparency and Auditability
Blockchain’s transparency and auditability effectively address the challenges of trust deficits and compliance difficulties in FL. By leveraging blockchain’s transparent data-sharing mechanisms, FL participants can verify the source and integrity of health care data and model updates in real time, ensuring fairness and reliability in contributions []. Additionally, blockchain’s immutable records establish an accountability framework for FL [,]. These records facilitate anomaly detection and responsibility attribution throughout the model training process, strengthening compliance and governance mechanisms. This transparent and auditable nature not only fosters trust in collaborative learning but also provides a technological foundation for regulatory compliance.
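One simple way to let participants verify the integrity of model updates is to log a cryptographic digest of each update on the ledger and compare received updates against it. The sketch below assumes a JSON encoding, SHA-256, and an in-memory list standing in for on-chain storage; the names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

Update = Dict[str, List[float]]  # layer name -> flattened parameters


def digest(update: Update) -> str:
    """SHA-256 over a canonical JSON encoding, so equal updates hash equally."""
    blob = json.dumps(update, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


# Stand-in for an append-only on-chain log: only digests and metadata are recorded,
# never the update itself or any patient data.
audit_log: List[Dict[str, object]] = []


def record_update(client_id: str, round_no: int, update: Update) -> None:
    audit_log.append({"client": client_id, "round": round_no, "sha256": digest(update)})


def verify_update(client_id: str, round_no: int, update: Update) -> bool:
    """Does a received update match what the client committed to in this round?"""
    for entry in audit_log:
        if entry["client"] == client_id and entry["round"] == round_no:
            return entry["sha256"] == digest(update)
    return False  # no commitment on record: treat as unverifiable


upd = {"dense.weight": [0.12, -0.5], "dense.bias": [0.01]}
record_update("hospital_a", 1, upd)
tampered = {"dense.weight": [0.12, -0.4], "dense.bias": [0.01]}
print(verify_update("hospital_a", 1, upd), verify_update("hospital_a", 1, tampered))
```

A digest only detects changes and attributes them to a round and client; it does not hide the update's contents, and both sides must serialize parameters identically for the comparison to be meaningful.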
Automated Management With Smart Contracts
Smart contracts enable automated execution of key processes in FL, including model update sharing, model update validation, and global model aggregation. By enforcing predefined rules, smart contracts eliminate human intervention, ensuring an unbiased and tamper-proof process. Moreover, they can dynamically allocate resources and rewards through conditional triggering mechanisms—such as when a model reaches an expected accuracy or when a node successfully completes a specific task [,]. This automation enhances the autonomy and reliability of FL while ensuring transparency and fairness through open code logic. By reducing administrative overhead and mitigating trust concerns, smart contracts introduce a novel and efficient approach for managing decentralized collaborative learning [].
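The conditional-trigger behavior can be modeled as a small state machine. This Python class is only a toy model of contract logic, assuming a fixed reward pool, an accuracy target, and payouts proportional to logged contributions; on a real chain the equivalent logic would be deployed contract code (eg, Solidity on Ethereum or chaincode on Hyperledger Fabric), and all names here are hypothetical.

```python
from typing import Dict


class TrainingRewardContract:
    """Toy model of a reward contract that settles once a target accuracy is met."""

    def __init__(self, pool: float, target_accuracy: float):
        self.pool = pool
        self.target = target_accuracy
        self.contributions: Dict[str, int] = {}
        self.paid: Dict[str, float] = {}
        self.closed = False

    def log_contribution(self, client: str, n_updates: int = 1) -> None:
        """Record accepted updates; rejected once the contract has settled."""
        if self.closed:
            raise RuntimeError("contract already settled")
        self.contributions[client] = self.contributions.get(client, 0) + n_updates

    def report_accuracy(self, accuracy: float) -> bool:
        """Condition trigger: pay out only when the global model reaches the target."""
        if self.closed or accuracy < self.target:
            return False
        total = sum(self.contributions.values())
        for client, n in self.contributions.items():
            self.paid[client] = self.pool * n / total if total else 0.0
        self.closed = True
        return True


contract = TrainingRewardContract(pool=90.0, target_accuracy=0.85)
contract.log_contribution("hospital_a", 2)
contract.log_contribution("hospital_b", 1)
first = contract.report_accuracy(0.80)   # below target: nothing happens
second = contract.report_accuracy(0.87)  # target reached: rewards released
print(first, second, contract.paid)
```

Closing the contract after settlement prevents double payouts; a production contract would also need to decide who is authorized to report accuracy, which this toy model leaves open.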
How Federated Learning Can Benefit Blockchain
Enhancing Blockchain Consensus Efficiency
The blockchain consensus mechanism, while ensuring network security and data consistency, is often associated with substantial computational costs and energy consumption, a challenge that is particularly serious in the PoW mechanism [,]. PoW relies on miners solving complex hashing operations to compete for block generation, necessitating the continuous operation of high-performance computing hardware, which in turn results in substantial global energy consumption. Studies indicate that the annual energy consumption of the Bitcoin network is comparable to that of a small- to medium-sized country. This highly inefficient competition leads to an enormous waste of computational resources—only the first miner to discover a valid hash can package the transaction and claim the reward, rendering all other computational efforts futile. Furthermore, the excessive energy consumption of PoW constrains blockchain’s sustainable adoption in critical domains such as health care data management and edge computing, necessitating the development of more energy-efficient consensus optimization strategies. In this context, FL offers a novel approach to optimizing blockchain consensus mechanisms. Integrating the blockchain consensus process with FL allows miners to contribute to model training while competing for block validation, effectively repurposing computational resources that would otherwise be wasted. This approach not only reduces energy consumption but also enhances computational resource efficiency, rendering the consensus process more practically valuable [].
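A toy PoW search shows where the wasted computation comes from. The header string and difficulty are hypothetical and far below real network difficulty; the point is the asymmetry between finding a nonce (about 16^d hashes on average for d leading hex zeros) and verifying it (one hash).

```python
import hashlib
from typing import Tuple


def mine(header: str, difficulty: int, max_nonce: int = 10_000_000) -> Tuple[int, str]:
    """Search nonces until SHA-256(header + nonce) begins with `difficulty` hex zeros."""
    target = "0" * difficulty
    for nonce in range(max_nonce):
        h = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
        if h.startswith(target):
            return nonce, h
    raise RuntimeError("no valid nonce within max_nonce")


def verify(header: str, nonce: int, h: str, difficulty: int) -> bool:
    """Checking a claimed solution costs a single hash."""
    expected = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
    return expected == h and h.startswith("0" * difficulty)


header = "prev=00ab|records=round-7-updates"  # hypothetical header contents
nonce, h = mine(header, difficulty=4)
print(f"{nonce + 1} hashes tried; verification needs 1: {verify(header, nonce, h, 4)}")
```

Every hash computed by miners who do not find the winning nonce first is simply discarded; FL-integrated consensus proposals aim to replace this search with model training whose result remains useful.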
Facilitating Cross-Chain Data Collaboration
As blockchain applications continue to expand, the demand for data exchange across different blockchains has grown, making cross-chain technology crucial for addressing “data silos” across domains. FL and its variants, such as federated transfer learning, can establish a unified model collaboration framework across different blockchain networks, enabling privacy-preserving data sharing and joint modeling. By maintaining a shared ML model, disparate blockchains can collaborate while preserving autonomy and privacy, thereby facilitating cross-chain applications in finance, health care, and other sectors [].
Enhancing Blockchain Scalability
Blockchain faces storage and computational bottlenecks when handling large-scale data. FL, by adopting a local training model that eliminates the need to upload raw data to the blockchain, significantly reduces on-chain storage demands. Additionally, FL alleviates blockchain’s computational burden by distributing processing tasks among participating nodes, thereby providing a scalable foundation for large-scale collaboration.
Architectural Frameworks for Integrating Blockchain and Federated Learning
A BCFL system typically adopts one of three architectural paradigms: fully coupled, flexibly coupled, or loosely coupled [,,]. These architectures differ in the degree of coupling between blockchain nodes and FL clients, and each has distinct characteristics in function allocation, resource usage, and system structure.
Fully Coupled BCFL
The fully coupled architecture represents a highly integrated design, wherein FL clients simultaneously function as blockchain nodes, assuming dual roles. Consequently, each node is responsible for local model training, update validation, global model aggregation, and new block generation. These tasks are executed on a single node, fostering a fully decentralized collaborative model [,,,].
Global model aggregation can be carried out either by selected nodes or collaboratively by all nodes, depending on the network’s design strategy. Moreover, the blockchain’s distributed ledger not only records local model updates but also stores global models and other relevant information generated during training, ensuring data integrity and traceability.
Flexibly Coupled BCFL
The flexibly coupled architecture achieves higher design flexibility by separating FL clients from blockchain nodes. In this architecture, FL clients primarily handle local data collection and model training, whereas blockchain nodes are responsible for validating model updates, storing the global model, and maintaining the ledger [,,]. The blockchain can aggregate global models via selected nodes, which typically possess superior computing resources and reliability, thereby reducing resource consumption and enhancing system efficiency. Alternatively, aggregation can be performed collaboratively by all nodes, achieving full decentralization and mitigating the risk of a single point of failure.
This architecture significantly lowers the resource requirements for FL clients, allowing them to function in different network environments while preserving blockchain’s inherent advantages in data security and transparency. Due to its high adaptability, this architecture has become a preferred choice for large-scale distributed systems, such as health care data sharing and cross-organizational collaboration.
Loosely Coupled BCFL
The loosely coupled architecture further weakens the coupling between blockchain nodes and FL clients by optimizing functional allocation. FL clients primarily perform local model training and upload updates to the blockchain for validation, whereas the blockchain handles authentication, model update validation, and participant reputation management.
In this architecture, the blockchain does not store model updates but instead records only reputation-related data. A reputation mechanism is implemented as a key criterion for assessing participant reliability, thereby incentivizing them to contribute high-quality data and updates [,,]. This design enhances system scalability by alleviating storage pressure on the ledger while ensuring the trustworthiness of participant behavior.
Workflow in BCFL
Overview
In BCFL systems, the flexibly coupled architecture has emerged as the predominant choice for real-world applications because it balances efficiency and adaptability. By separating FL clients from blockchain nodes, it allows them to operate on different networks and devices, thereby reducing system communication overhead and latency. It also alleviates the computational and storage burden on client devices while preserving key advantages such as data privacy protection and blockchain-based verification. Leveraging these advantages, the flexibly coupled architecture has demonstrated significant potential in practical applications, including medical data sharing and cross-organizational collaboration.
As illustrated in Figure 5, the following sections describe the specific workflow of mainstream BCFL frameworks and analyze their advantages in practical applications.
Figure 5. Flexibly coupled blockchain-based federated learning architecture and workflow. gRPC API: Google Remote Procedure Calls – Application Programming Interface; REST-API: Representational State Transfer – Application Programming Interface.
Task Release
Task initiators release FL tasks and requirements on the blockchain, specifying details such as data volume and type, hardware specifications, and the number of training rounds. Leveraging blockchain’s transparency and decentralization, this process ensures fair and open task distribution while fostering participant trust.
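A task announcement can be represented as a small, canonically serialized record that clients check before volunteering. The schema below is hypothetical: its fields mirror the requirements listed above (data type and volume, hardware, number of rounds), but the field names, the GPU-memory proxy for hardware, and the eligibility rule are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FLTask:
    """Hypothetical schema for an FL task announcement published on-chain."""
    task_id: str
    data_type: str          # eg, "chest x-ray"
    min_samples: int        # required local data volume per client
    min_gpu_memory_gb: int  # hardware requirement
    rounds: int             # number of training rounds

    def to_json(self) -> str:
        """Canonical encoding, eg, for storing as a transaction payload."""
        return json.dumps(asdict(self), sort_keys=True)


def eligible(task: FLTask, data_type: str, local_samples: int, gpu_memory_gb: int) -> bool:
    """Client-side check before volunteering for a published task."""
    return (data_type == task.data_type
            and local_samples >= task.min_samples
            and gpu_memory_gb >= task.min_gpu_memory_gb)


task = FLTask("task-001", "chest x-ray", min_samples=500, min_gpu_memory_gb=8, rounds=20)
print(task.to_json())
print(eligible(task, "chest x-ray", 1200, 16), eligible(task, "chest x-ray", 300, 16))
```

Freezing the dataclass and sorting keys keeps the published announcement immutable and byte-for-byte reproducible, which matters if its hash is later referenced on-chain.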
Local Model Training and Update Transmission
Each FL client downloads the initial global model from the blockchain, after which it preprocesses local data, extracts features, and uses this data for model training, subsequently generating local model updates. These updates are then transmitted to the blockchain network in encrypted form.
Note that in the flexibly coupled architecture, FL clients and blockchain nodes operate within different networks and systems, each with clearly defined responsibilities. This architecture therefore relies heavily on integration middleware that serves as a communication bridge and coordinator between the two components. For example, Lamken et al [] used a REST-API (Representational State Transfer – Application Programming Interface) to communicate with the Hyperledger Fabric blockchain, facilitating the recording and incentivization of gradient uploads. Similarly, gRPC, the remote procedure call (RPC) framework originally developed by Google, has been used for data exchange between FL clients and the Ethereum blockchain network [,].
Blockchain Node Verification Update
Blockchain nodes (ie, miners) verify the uploaded model updates using a predefined validation mechanism. Concurrently, miners exchange their validated local model updates with each other. A consensus algorithm guarantees that only validated updates contribute to the global model aggregation.
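The source does not specify the validation rule, so the following sketch assumes a simple one: reject updates containing non-finite values or with an L2 norm above a preset bound. Acceptance is then decided by a PBFT-style quorum, one example of a consensus algorithm, in which n ≥ 3f + 1 nodes tolerate f faulty nodes and 2f + 1 approvals are required. Both the norm bound and the voting scheme are assumptions for illustration.

```python
import math
from typing import Dict, List


def validate_update(delta: List[float], max_norm: float = 10.0) -> bool:
    """Hypothetical validation rule: reject updates that contain NaN or
    infinity, or whose L2 norm exceeds a preset bound (a basic defense
    against corrupted or poisoned updates)."""
    if not all(math.isfinite(v) for v in delta):
        return False
    return math.sqrt(sum(v * v for v in delta)) <= max_norm


def accepted_by_quorum(votes: Dict[str, bool], n_nodes: int) -> bool:
    """PBFT-style rule: n_nodes >= 3f + 1 tolerates f faulty nodes, and an
    update is accepted only with at least 2f + 1 approving votes."""
    f = (n_nodes - 1) // 3
    return sum(votes.values()) >= 2 * f + 1


update = [0.3, -1.2, 0.8]
votes = {f"miner{i}": validate_update(update) for i in range(3)}
votes["miner3"] = False  # one faulty or dissenting miner
print(accepted_by_quorum(votes, n_nodes=4))  # True: 3 approvals >= 2*1 + 1
```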
Global Model Aggregation
Subsequently, the blockchain nodes select an interim leader through a consensus mechanism. The selected node(s) then collect the verified model updates and aggregate them to construct the global model []. Because the flexibly coupled architecture allows this step to be executed either by selected nodes or collectively by all nodes, efficiency can be optimized for different scenarios.
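A minimal sketch of this step, assuming a round-robin leader and the federated averaging (FedAvg) rule, in which each client's update is weighted by its local sample count. The round-robin `select_leader` function is a hypothetical placeholder for leader election by the consensus protocol.

```python
from typing import List, Sequence, Tuple


def select_leader(node_ids: Sequence[str], round_no: int) -> str:
    """Hypothetical round-robin leader selection; real systems derive the
    leader from their consensus protocol (e.g., PBFT views or proof of stake)."""
    ordered = sorted(node_ids)
    return ordered[round_no % len(ordered)]


def fedavg(global_w: List[float], updates: List[Tuple[List[float], int]]) -> List[float]:
    """Federated averaging: add the sample-weighted mean of client deltas."""
    total = sum(n for _, n in updates)
    avg_delta = [0.0] * len(global_w)
    for delta, n in updates:
        for j, d in enumerate(delta):
            avg_delta[j] += d * n / total
    return [w + d for w, d in zip(global_w, avg_delta)]


leader = select_leader(["node-b", "node-a", "node-c"], round_no=4)
new_w = fedavg([0.0, 0.0], [([1.0, 2.0], 100), ([3.0, 0.0], 300)])
print(leader, new_w)  # node-b [2.5, 0.5]
```

Weighting by sample count keeps a client with a small local dataset from pulling the global model as strongly as one with many records.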
New Block Generation and Model Storage
Selected blockchain nodes package the validated model updates and global models into new blocks. After block header information is added, the nodes verify the legitimacy of each block through a consensus mechanism.
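The packaging step can be sketched as a hash-linked block, assuming JSON serialization and SHA-256. This is a simplified header without a Merkle tree, signatures, or consensus metadata, and the payload stores a digest of the global model rather than its full weights; all field names are illustrative.

```python
import hashlib
import json
import time
from typing import Any, Dict, Optional


def sha256_json(obj: Any) -> str:
    """SHA-256 of a deterministic JSON encoding."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(index: int, prev_hash: str, payload: Dict[str, Any],
               timestamp: Optional[float] = None) -> Dict[str, Any]:
    """Package validated updates and a global-model digest into a block.

    The header links to the previous block through prev_hash and commits to
    the payload through payload_digest; the block hash covers the header, so
    editing either the payload or the header changes the hash."""
    header = {
        "index": index,
        "prev_hash": prev_hash,
        "timestamp": time.time() if timestamp is None else timestamp,
        "payload_digest": sha256_json(payload),
    }
    return {"header": header, "payload": payload, "hash": sha256_json(header)}


genesis = make_block(0, "0" * 64, {"note": "genesis"}, timestamp=0.0)
block1 = make_block(
    1,
    genesis["hash"],
    {
        "round": 1,
        "validated_update_ids": ["hospital-A/r1", "hospital-B/r1"],
        "global_model_sha256": sha256_json([2.5, 0.5]),  # digest of placeholder weights
    },
    timestamp=1.0,
)
print(block1["header"]["prev_hash"] == genesis["hash"])  # True
```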
Distributed Ledger Update
The newly generated blocks are broadcast across the entire network, and all blockchain nodes update their local ledgers after verification. This process ensures the transparency and traceability of the global model and its associated information throughout the network.
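The local verification that nodes perform before updating their ledgers can be sketched as re-checking three properties: each payload matches its recorded digest, each block hash matches its header, and each block points to its predecessor's hash. The block layout and the `build` helper are simplified, hypothetical constructs; production ledgers also verify signatures and consensus certificates.

```python
import hashlib
import json
from typing import Any, Dict, List


def sha256_json(obj: Any) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def build(index: int, prev_hash: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Minimal hash-linked block used only for this example."""
    header = {"index": index, "prev_hash": prev_hash, "payload_digest": sha256_json(payload)}
    return {"header": header, "payload": payload, "hash": sha256_json(header)}


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Re-check payload digests, block hashes, and prev-hash links."""
    for i, block in enumerate(chain):
        header = block["header"]
        if header["payload_digest"] != sha256_json(block["payload"]):
            return False  # payload was altered after the block was sealed
        if block["hash"] != sha256_json(header):
            return False  # header was altered
        if i > 0 and header["prev_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True


genesis = build(0, "0" * 64, {"note": "genesis"})
b1 = build(1, genesis["hash"], {"round": 1, "global_model_sha256": "placeholder"})
chain = [genesis, b1]
print(verify_chain(chain))  # True
b1["payload"]["round"] = 2  # tamper with a recorded entry
print(verify_chain(chain))  # False
```

Because any edit to a recorded update changes its digest, every node can independently detect tampering, which is what underpins the traceability described above.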
Reward Distribution and Incentives
The system allocates rewards, such as cryptocurrency or reputation scores, based on client performance. This incentive mechanism not only motivates participants to contribute high-quality updates but also deters malicious behavior, thereby enhancing the accuracy and reliability of the model.
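As a minimal sketch, assume each client has already been assigned a contribution score (for example, its measured improvement in validation accuracy) and that a reward pool is split proportionally among clients with positive scores. The scoring method and the `distribute_rewards` function are assumptions; the cited frameworks may use reputation systems or other schemes.

```python
from typing import Dict


def distribute_rewards(scores: Dict[str, float], pool: float,
                       min_score: float = 0.0) -> Dict[str, float]:
    """Split a reward pool in proportion to contribution scores.

    Clients scoring at or below min_score (e.g., updates judged harmful)
    receive nothing, so submitting such updates earns no reward."""
    eligible = {c: s for c, s in scores.items() if s > min_score}
    total = sum(eligible.values())
    if total == 0:
        return {c: 0.0 for c in scores}
    return {c: pool * eligible[c] / total if c in eligible else 0.0 for c in scores}


rewards = distribute_rewards({"A": 3.0, "B": 1.0, "C": -1.0}, pool=100.0)
print(rewards)  # {'A': 75.0, 'B': 25.0, 'C': 0.0}
```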
Global Model Download
After each training round is completed, participating clients can download the newly generated block containing the updated global model parameters from the blockchain. Clients can then independently decide whether to participate in the next training round based on their specific needs. This mechanism enhances both system flexibility and participant autonomy.
Table 1 compares the BCFL integration architectures. In practice, the choice of architecture should be evaluated against scenario requirements, resource constraints, and design objectives to achieve effective collaboration and technical performance.
Table 1. Comparison of blockchain-based federated learning integration architectures.
Fully coupled BCFL
Characteristics: High integration (FL clients and blockchain nodes are fully merged); full decentralization (all nodes work together through a consensus mechanism)
Advantages: High transparency (all transactions and model updates are recorded on the blockchain); strong security (resistant to single points of failure and man-in-the-middle attacks); strict control (a high degree of control over data and models)
Disadvantages: High resource demand (requires significant computational and storage resources); high network complexity (all nodes participate in the consensus mechanism); intensive coordination (frequent internode communication is required)
Applicable scenarios: Large-scale distributed environments, such as large medical institutions and research centers; scenarios with strict control and security requirements, such as the sharing and analysis of sensitive medical data [,]
Flexibly coupled BCFL
Characteristics: Functional separation (FL clients operate independently from blockchain nodes); computational offloading (model aggregation occurs at selected nodes)
Advantages: Enhanced efficiency (optimized allocation of computing and storage resources); greater flexibility (adaptable to different application scenarios); improved scalability (supports large-scale data sharing and collaborative learning)
Disadvantages: Complex coordination (separating client and node responsibilities requires complex coordination and management mechanisms); centralization risks (a centralized aggregator may introduce a single point of failure); node selection challenges (node selection criteria and fairness must be addressed)
Applicable scenarios: Dynamic collaboration settings, such as cross-institutional medical data sharing and IoMT device management [,]
Loosely coupled BCFL
Characteristics: Minimal integration (FL clients and blockchain nodes operate independently); lightweight blockchain (used primarily for identity authentication and reputation management)
Advantages: Reduced overhead (lower blockchain operating costs and better system performance); enhanced privacy (reduced on-chain storage pressure and improved scalability)
Disadvantages: Lower decentralization (may still rely on trusted central nodes for model aggregation); data integrity risks (the blockchain does not store model updates)
Applicable scenarios: Resource-constrained environments, such as wearable medical devices and real-time health monitoring; small-scale institutions, such as personal mobile health applications and smaller clinics []
BCFL: blockchain-based federated learning.
FL: federated learning.
IoMT: Internet of Medical Things.
BCFL in Medicine
Overview
As the demand for data-driven technologies in health care continues to grow, the BCFL framework shows significant potential owing to its advantages in privacy preservation, data security, and collaborative efficiency. BCFL facilitates cross-organizational data sharing and collaborative analytics, optimizing personalized health care solutions while driving advancements in areas such as telemedicine, IoMT, and public health monitoring, as shown in Figure 6. The following sections discuss the applications of BCFL in medicine and analyze its role and potential value in addressing real-world challenges.
Figure 6. Blockchain-based federated learning framework for different domains in health care. EMR: electronic medical record.
Cross-Institutional Medical Data Sharing and Collaborative Analysis
In modern health care, data serves as a crucial resource for driving innovation and enhancing treatment efficacy. However, data sharing among health care institutions is hindered by concerns over privacy, data security, and regulatory compliance. The integration of blockchain and FL offers an innovative solution for cross-institutional health care data sharing and collaborative analysis. While numerous studies have demonstrated the feasibility of BCFL in various medical domains, the strength of evidence supporting these applications varies considerably, and critical challenges remain.
Several studies have investigated BCFL in the context of chronic disease management, particularly diabetes prediction. Hasan et al [] developed a blockchain-FL framework that reported a 15% improvement in predictive performance across multiple metrics. Although these results are encouraging, the framework relied primarily on public diabetes datasets with limited diversity, raising questions about its generalizability to heterogeneous real-world populations. Similarly, Moulahi et al [] evaluated a BCFL model on the Pima Indians Diabetes dataset, reporting 97.11% accuracy with a multilayer perceptron and an average FL accuracy of 93.95% while preserving privacy. Yet the reliance on small, well-characterized datasets constrains the robustness of the findings. Taken together, these studies suggest that BCFL holds promise for chronic disease prediction, but the supporting evidence remains preliminary, and large-scale multi-institutional validation is still lacking.
In the realm of the IoMT, Ramani et al [] introduced the ODMSM-FL (Optimized Data Management and Secured Federated Learning) approach to secure data storage and exchange, using EHR datasets from HealthData.gov. The study reported a transaction throughput of 102.75 Kbps, a data retrieval delay of 64.02 ms, security of 88.97%, and accuracy of 86.32%. These results suggest that ODMSM-FL could address pressing data management and security issues in the IoMT. However, the system was evaluated under controlled experimental conditions rather than real-world clinical settings, limiting its immediate applicability. By contrast, research on medical imaging tasks, such as brain tumor segmentation, has placed greater emphasis on model accuracy and privacy preservation. For example, Kumar et al [] proposed a permissioned blockchain-based federated framework with quality-aware model aggregation that improved segmentation metrics on the BraTS 2020 dataset. Compared with the baseline method, their approach increased the Dice similarity coefficient for the enhancing tumor by 1.99% and reduced the Hausdorff distance for the whole tumor by 19.08%. Although the study demonstrated methodological innovation, it relied on benchmark imaging datasets rather than prospective clinical data, which weakens the evidence for its clinical translatability.
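The two segmentation metrics in the result above can be illustrated with a short Python sketch: the Dice similarity coefficient measures overlap between predicted and reference masks, and the Hausdorff distance measures the largest boundary disagreement between two point sets. The toy masks and points are invented, and BraTS evaluations typically report a 95th-percentile Hausdorff variant rather than the plain maximum computed here.

```python
import math
from typing import Sequence, Tuple


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient for binary masks: 2|P ∩ T| / (|P| + |T|)."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if size == 0 else 2.0 * inter / size  # two empty masks agree


def hausdorff(a: Sequence[Tuple[float, float]], b: Sequence[Tuple[float, float]]) -> float:
    """Symmetric Hausdorff distance between two non-empty 2D point sets."""
    def directed(x, y):
        # For each point in x, distance to its nearest point in y; take the worst case.
        return max(min(math.dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))


pred = [0, 1, 1, 1, 0, 0]   # flattened toy masks (1 = tumor voxel)
truth = [0, 1, 1, 0, 1, 0]
print(dice(pred, truth))     # 2*2 / (3 + 3), about 0.667
print(hausdorff([(0, 0), (1, 0)], [(0, 0), (3, 0)]))  # 2.0
```

A higher Dice value and a lower Hausdorff distance both indicate better segmentation, which is why the reported improvement moves the two metrics in opposite directions.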
Other investigations have targeted specific diagnostic applications. Heidari et al [] designed the FBCLC-Rad (Federated Learning–Enabled Blockchain CapsNets Lung Cancer Radiologist) framework for lung cancer detection, reporting 99.69% accuracy on nodule identification with the lowest classification error. While technically impressive, results derived from controlled datasets may not fully reflect the complexity of real-world diagnostic workflows. Liang et al [] extended BCFL to clinical trials, using blockchain to ensure data authenticity and traceability and FL to support participant screening across organizations. This study is an important step toward integrating BCFL into the clinical research pipeline, but it remains largely conceptual, with limited empirical validation in actual trial environments.
A growing body of work has also integrated BCFL with EMRs to support precision medicine [,-]. These studies report benefits in diagnostic accuracy, treatment planning, identification of patient subgroups for clinical trials, and acceleration of novel therapeutic development. Within the paradigm of precision medicine, such a framework facilitates a transition from the traditional “one-size-fits-all” treatment model to a more personalized and adaptive intervention strategy. However, despite their conceptual appeal, most studies are limited to prototype frameworks or simulations and have not yet undergone prospective evaluation in clinical practice. As such, the current evidence supporting BCFL in EMR-based precision medicine remains promising but immature.
Overall, existing literature demonstrates the conceptual feasibility and technical potential of BCFL for cross-institutional health care data sharing and analysis. Nevertheless, the evidence base is uneven: studies using small public datasets provide only preliminary support, while those addressing more complex tasks such as imaging or clinical trials often lack real-world validation.
Internet of Medical Things
The IoMT is an advanced technological ecosystem that integrates internet technology with medical devices, enabling real-time data collection, exchange, and analysis to enhance clinical decision-making, disease prevention, and patient care. With the rapid advancement of the IoMT, the traditional hospital-centric model has evolved into a patient-centered health care system driven by comprehensive clinical analysis. The widespread adoption of IoMT enables individuals to conveniently monitor their health at home, thereby streamlining diagnosis and treatment and giving patients more efficient, personalized health care. Despite its growing adoption, however, IoMT in health care faces significant challenges. One of the primary concerns is data privacy and security. IoMT devices collect and transmit vast amounts of sensitive data, including patient identities, insurance details, and payment information, and unauthorized access to these data by malicious actors could have serious consequences. Moreover, the absence of standardized security protocols among IoMT devices exacerbates the risks of data leakage and device manipulation. Consequently, device manufacturers and health care institutions face immense pressure to ensure data privacy and regulatory compliance. Additionally, because of their limited resources, IoMT devices frequently struggle with high computational complexity, elevated costs, and communication delays.
To address these limitations, several studies have explored the integration of blockchain and FL in IoMT. For instance, Rahman et al [] proposed a lightweight hybrid FL framework that leverages blockchain to secure health data provenance and uses smart contracts to coordinate model training and trust management. While this framework demonstrates theoretical scalability and robust traceability, its evaluation was primarily conducted in simulated settings, raising concerns about its applicability in heterogeneous and large-scale real-world health care environments.
Muazu et al [] combined BCFL with edge computing to improve resource allocation, reduce computational costs, and enhance IoMT data security. By offloading intensive computations to edge nodes, the study reported reductions in latency and energy consumption, and the proposed model achieved a precision of 83% and an accuracy of 78%. However, the framework relies largely on linear regression as the global learning model, which, although interpretable and useful for basic clinical predictions, may not adequately capture the complexity of real-world multimodal medical data. Compared with Rahman et al [], this work provides stronger performance evidence in terms of latency and efficiency but weaker generalizability for complex clinical prediction tasks.
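When computation is offloaded to edge nodes, those nodes can aggregate updates without seeing individual values if clients encrypt them with an additively homomorphic scheme. Table 2 lists Paillier encryption for one IoMT edge-computing framework; the sketch below demonstrates the Paillier addition property with deliberately tiny, insecure primes and integer-quantized updates. It illustrates the technique only and is not the protocol of the cited study; the key sizes and the `SCALE` factor are assumptions.

```python
import math
import random

# Toy Paillier key pair with tiny primes: for illustration only, NOT secure.
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)


def _L(u: int) -> int:
    return (u - 1) // N


MU = pow(_L(pow(G, LAM, N2)), -1, N)  # modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness r coprime to N."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return (_L(pow(c, LAM, N2)) * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo N."""
    return (c1 * c2) % N2


SCALE = 1000  # quantize real-valued updates to integers
c_total = add_encrypted(encrypt(round(0.125 * SCALE)), encrypt(round(0.250 * SCALE)))
print(decrypt(c_total) / SCALE)  # 0.375, recovered from the encrypted sum
```

Negative or fractional updates must first be mapped to integers modulo N, for example by scaling and adding an offset, and real deployments use keys of 2048 bits or more.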
Dhasaratha et al [] extended BCFL by incorporating reinforcement learning and distributed computing to improve risk factor monitoring and COVID-19 patient prediction. The dynamic optimization enabled by reinforcement learning is a notable strength, allowing adaptive performance improvements in evolving environments. Compared with the two preceding studies, this approach offers methodological novelty but lacks equally rigorous performance benchmarking across standard IoMT metrics.
To tackle fraud detection and scheduling issues, Lakhan et al [] introduced the FL-BETS (Federated Learning–Based Blockchain-Enabled Task Scheduling) framework, which integrates BCFL with dynamic heuristics for task scheduling across fog and cloud nodes. In their experiments, the framework moved from an initial fraud-delay ratio of 60:80 to 10:10 and showed better energy-delay trade-offs and antifraud behavior. Yet the framework emphasizes technical efficiency rather than clinical utility, and its reliance on hard and soft scheduling constraints may limit adaptability in unpredictable medical environments. Compared with Dhasaratha et al [], which focuses on patient-specific outcomes, this work provides stronger technical validation but weaker clinical alignment.
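The scheduling problem can be illustrated with a simple heuristic, assuming each task has a fixed amount of work and a hard deadline and each node has a processing speed and a network delay. The sketch orders tasks by earliest deadline and places each on the node that finishes it soonest; it is a generic heuristic, not the FL-BETS algorithm, and all names and parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    speed: float           # work units processed per ms
    link_delay_ms: float   # time for task data to reach the node
    busy_until: float = 0.0


@dataclass
class Task:
    name: str
    work: float            # work units
    deadline_ms: float     # hard deadline measured from time 0


def schedule(tasks: List[Task], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Earliest-deadline-first ordering, then place each task on the node that
    finishes it soonest; tasks that cannot meet their deadline map to None."""
    plan: Dict[str, Optional[str]] = {}
    for t in sorted(tasks, key=lambda t: t.deadline_ms):
        best, best_finish = None, float("inf")
        for n in nodes:
            # A task can start once its data has arrived and the node is free.
            start = max(n.busy_until, n.link_delay_ms)
            finish = start + t.work / n.speed
            if finish <= t.deadline_ms and finish < best_finish:
                best, best_finish = n, finish
        if best is None:
            plan[t.name] = None
        else:
            best.busy_until = best_finish
            plan[t.name] = best.name
    return plan


fog = Node("fog", speed=1.0, link_delay_ms=2.0)
cloud = Node("cloud", speed=10.0, link_delay_ms=40.0)
plan = schedule([Task("ecg-alert", 5, 10), Task("model-update", 400, 100)], [fog, cloud])
print(plan)  # {'ecg-alert': 'fog', 'model-update': 'cloud'}
```

In this example, the latency-sensitive alert goes to the nearby fog node, while the heavier model update goes to the faster but more distant cloud node.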
In the context of wearable IoMT devices, Baucas et al [] designed a system that integrates FL with a private blockchain in a fog computing architecture to enhance privacy and adaptability. Their framework demonstrated efficiency in resource-constrained environments and produced accurate predictive models while safeguarding patient privacy. Unlike the studies above, this study directly validated its framework on wearable health care devices, thereby providing more immediate clinical relevance. However, the scalability of the approach for larger IoMT networks remains uncertain.
Overall, these studies collectively highlight the potential of BCFL in overcoming IoMT’s inherent privacy, security, and performance limitations. Yet, their evidence strength varies significantly: some emphasize theoretical frameworks validated in simulations [], while others demonstrate more robust experimental performance [,] or closer alignment with clinical practice [].
Public Health Surveillance and Epidemiological Forecasting
The global outbreak of COVID-19 highlighted the limitations of existing surveillance infrastructures, particularly the inability to provide accurate, real-time epidemic monitoring. Traditional methods often struggle with the rapid spread and variability of epidemics, while stringent privacy requirements hinder effective collaboration across institutions. For instance, during global outbreaks such as COVID-19, the inability of national and regional health care organizations to efficiently integrate data has impeded comprehensive analyses of epidemic progression []. This phenomenon of data silos delays the formulation of precise response strategies and undermines the efficiency of vaccine distribution and health care resource allocation. Therefore, achieving efficient and secure data integration while preserving privacy has emerged as a critical challenge in public health. The BCFL framework not only integrates anonymized health data from diverse regions but also facilitates the efficient construction of predictive models for epidemic spread. This framework enables multiple health care organizations and research institutions to collaborate securely without compromising patient privacy, thereby providing robust data support for early epidemic detection, transmission trend analysis, and the formulation of intervention strategies. However, the strength of evidence supporting BCFL frameworks varies considerably across different studies, depending on data scale, validation methods, and implementation feasibility.
FedMedChain [] represents an early attempt to address these challenges. By combining blockchain with FL and leveraging the IoMT, it enhances the trustworthiness of public health communication and mitigates risks associated with centralized data transmission. Its contribution lies in demonstrating that blockchain can ensure data transparency and tamper resistance while maintaining privacy. Nevertheless, FedMedChain was mainly verified through small-scale simulation experiments rather than real-world deployments, which limited the strength of the evidence and its direct clinical applicability.
In contrast, Kumar et al [] explored a BCFL framework that uses capsule networks to process heterogeneous CT images. On the CC-19 dataset, the method achieved high detection performance, with 98.68% specificity and 98% sensitivity. While the study demonstrates the feasibility of applying BCFL to medical imaging and highlights the benefits of privacy-preserving collaboration, the restricted dataset size and limited institutional diversity weaken its external validity. Compared with FedMedChain, this work places greater emphasis on model performance but provides weaker evidence regarding scalability and generalizability.
Durga and Poovammal [] extended this direction with the FLED-Block (Federated Learning–Ensembled Deep Learning Blockchain Model) framework, which integrates blockchain with FL for COVID-19 prediction using multisource heterogeneous CT datasets. The framework improves classification accuracy by using capsule networks for feature extraction and extreme learning machines for efficient classification, achieving an accuracy of 98.2%, a precision of 97.3%, and a recall of 96.5%. Importantly, it uses blockchain to share model weights without exchanging raw data, thereby addressing privacy concerns. Compared with the study by Kumar et al [], FLED-Block was supported by evidence from more hospitals, providing stronger validation. However, blockchain latency remains a limitation, raising questions about its applicability to real-time clinical diagnosis. Therefore, although the framework demonstrates outstanding technical performance, its translation to emergency health care settings remains uncertain.
Abdel-Basset et al [] proposed the blockchain-based federated learning for pandemic diagnosis (BFLPD) framework, which stands out for its focus on system security and robustness in the context of smart cities. Unlike FedMedChain and FLED-Block, BFLPD combines several security techniques, including secure aggregation, homomorphic encryption (the Cheon-Kim-Kim-Song scheme), and a practical Byzantine fault tolerance (PBFT) consensus mechanism, to mitigate malicious attacks and improve reliability. BFLPD achieved a classification accuracy of 95.14%, exceeding state-of-the-art distributed models, along with high precision (95.26%), recall (95.77%), and F1-score (95.52%). The authors also incorporated heat map visualization, further enhancing its clinical application value. This framework provides stronger evidence than earlier works because it addresses adversarial threats that are often overlooked in BCFL studies. Nevertheless, its reliance on complex cryptographic and consensus algorithms introduces implementation challenges, such as high computational overhead, which could hinder real-world deployment.
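Secure aggregation lets an aggregator learn only the sum of client updates. The sketch below assumes the pairwise-masking idea behind common secure aggregation protocols: each pair of clients shares a random mask that one adds and the other subtracts, so all masks cancel in the sum. A seeded shared random generator stands in for key agreement, client dropout is not handled, and this is not the exact BFLPD construction, which additionally uses homomorphic encryption.

```python
import random
from typing import Dict, List

MOD = 2**32  # arithmetic modulo 2^32 on integer-quantized updates


def pairwise_masks(clients: List[str], dim: int, seed: int = 0) -> Dict[str, List[int]]:
    """For each client pair (a, b) with a < b, draw a shared random mask s;
    client a adds s and client b subtracts it, so all masks cancel in the sum.
    (A real protocol derives s via key agreement; a seeded RNG stands in here.)"""
    rng = random.Random(seed)
    masks = {c: [0] * dim for c in clients}
    ordered = sorted(clients)
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            s = [rng.randrange(MOD) for _ in range(dim)]
            for k in range(dim):
                masks[ordered[i]][k] = (masks[ordered[i]][k] + s[k]) % MOD
                masks[ordered[j]][k] = (masks[ordered[j]][k] - s[k]) % MOD
    return masks


def mask_update(update: List[int], mask: List[int]) -> List[int]:
    return [(u + m) % MOD for u, m in zip(update, mask)]


def aggregate(masked: List[List[int]]) -> List[int]:
    """The aggregator sums masked vectors; the masks cancel modulo MOD."""
    return [sum(col) % MOD for col in zip(*masked)]


updates = {"A": [5, 1], "B": [2, 7], "C": [3, 0]}
masks = pairwise_masks(list(updates), dim=2)
masked = [mask_update(updates[c], masks[c]) for c in updates]
print(aggregate(masked))  # [10, 8]: the true sum, though each masked vector looks random
```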
Telemedicine and Telesurgery
In recent years, telemedicine has experienced rapid advancements, especially in response to the global COVID-19 pandemic, which has significantly increased its role in modern health care systems. Telemedicine leverages modern information and communication technologies to facilitate medical information exchange across geographic boundaries, encompassing various applications such as remote diagnosis, remote consultation, remote treatment, and continuous health monitoring []. By providing on-demand, personalized health care services, telemedicine optimizes medical resource allocation, effectively addressing the challenge of unequal distribution of traditional health care resources and ensuring medical support for patients in remote or underserved areas. Despite its potential, telemedicine faces several critical challenges in practical implementation:
Data Security and Privacy Risks: Most telemedicine systems rely on centralized cloud servers to store patient health data, making them vulnerable to single points of failure.
Lack of Data Access Control Mechanisms: Many existing telemedicine platforms do not offer a robust data access control framework, meaning that once patient data are uploaded to the cloud, patients often lose ownership and control over their own health records.
High Infrastructure and Computational Costs: Telemedicine demands substantial computational resources, high-speed communication networks, and specialized medical equipment, particularly for real-time diagnosis and treatment.
To address these challenges, recent studies have proposed generalized frameworks that integrate blockchain and FL to support secure and scalable telemedicine systems. For example, Hiwale et al [] highlighted the importance of incorporating privacy-preserving technologies into BCFL, laying the groundwork for reliable, privacy-compliant telemedicine applications. Within such frameworks, blockchain’s distributed ledger enables decentralized data storage, reducing the risk of single points of failure and data breaches while ensuring transparency and traceability in data access. Simultaneously, FL enhances data privacy by enabling local model training, thereby minimizing the exposure of sensitive health information. Although valuable as a theoretical framework, this study provides limited experimental validation and thus represents weak evidence for clinical applicability. Gupta et al [] enhanced trust between patients and providers by designing a smart contract system based on public blockchains. The framework allows patients to retain ownership and fine-grained control of their health data, addressing a central shortcoming of conventional telemedicine platforms. Compared with the conceptual work of Hiwale et al [], the system of Gupta et al [] offers a more concrete mechanism for authorization and data sharing. Nonetheless, its validation remains restricted to simulation environments, with no real-world deployment or clinical evaluation. As such, while it provides moderate evidence of feasibility, its generalizability remains uncertain.
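The patient-controlled authorization described for Gupta et al can be sketched as a consent registry. The class below is a hypothetical Python stand-in for a smart contract: only the patient may grant or revoke a provider’s time-limited access to a given record type. Method names, the expiry model, and the caller check are assumptions, not the cited system’s design.

```python
import time
from typing import Dict, Optional, Tuple


class ConsentRegistry:
    """Hypothetical stand-in for a smart contract that records patient-granted,
    time-limited, per-record-type permissions."""

    def __init__(self) -> None:
        # (patient, provider, record_type) -> expiry timestamp
        self._grants: Dict[Tuple[str, str, str], float] = {}

    def grant(self, caller: str, patient: str, provider: str,
              record_type: str, expires_at: float) -> None:
        if caller != patient:
            raise PermissionError("only the patient can grant access")
        self._grants[(patient, provider, record_type)] = expires_at

    def revoke(self, caller: str, patient: str, provider: str, record_type: str) -> None:
        if caller != patient:
            raise PermissionError("only the patient can revoke access")
        self._grants.pop((patient, provider, record_type), None)

    def can_access(self, patient: str, provider: str, record_type: str,
                   now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expiry = self._grants.get((patient, provider, record_type))
        return expiry is not None and now < expiry


reg = ConsentRegistry()
reg.grant("alice", "alice", "clinic_7", "imaging", expires_at=1000.0)
print(reg.can_access("alice", "clinic_7", "imaging", now=500.0))  # True
print(reg.can_access("alice", "clinic_7", "lab", now=500.0))      # False: not granted
reg.revoke("alice", "alice", "clinic_7", "imaging")
print(reg.can_access("alice", "clinic_7", "imaging", now=500.0))  # False: revoked
```

On a public blockchain the same logic would live in contract functions, with the caller authenticated by its account signature rather than passed as a string argument.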
With rapid technological progress and growing medical demands, telemedicine is evolving beyond routine diagnosis and treatment. Among these advancements, telesurgery, a critical extension of telemedicine, is emerging as a transformative innovation. However, telesurgery imposes stringent requirements on real-time data synchronization, precise coordination of surgical equipment, and robust data security, challenging the reliability of the underlying technological infrastructure. For instance, Chaudhary et al [] proposed a secure telesurgery system that integrates blockchain and FL with 6G communication networks and the InterPlanetary File System (IPFS) protocol. This study demonstrated notable improvements in latency, storage efficiency, and transmission reliability compared with traditional telesurgery systems. Unlike the earlier works by Hiwale et al [] and Gupta et al [], Chaudhary et al [] provided more systematic experimental results, suggesting stronger evidence of technical feasibility. However, these results were derived from controlled simulations rather than real-world surgical environments, and issues such as blockchain latency and computational overhead remain unresolved. Therefore, although the study provides the strongest evidence among the telemedicine works reviewed here, its translation into clinical practice requires further validation.
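Systems that pair a blockchain with IPFS commonly keep bulky data off-chain and record only a content address on the ledger, which is one way to improve storage efficiency. The sketch below assumes an in-memory store keyed by SHA-256 digests; the `ContentStore` class is hypothetical, real IPFS content identifiers use multihash encoding rather than bare hex digests, and this is not the design of the cited telesurgery system.

```python
import hashlib
from typing import Dict, List


class ContentStore:
    """In-memory stand-in for an IPFS-like store: each object is addressed by
    the SHA-256 digest of its bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self._objects[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._objects[cid]
        # Integrity check on retrieval: the content must still match its address.
        if hashlib.sha256(data).hexdigest() != cid:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
ledger: List[str] = []  # on-chain record: only the short content address
cid = store.put(b"<surgical video segment or model checkpoint>")
ledger.append(cid)
print(store.get(ledger[-1])[:9])  # b'<surgical'
```

Because the address is derived from the content itself, a node holding only the on-chain digest can still detect whether the retrieved off-chain data was altered.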
In the past few years, research on BCFL in health care has grown significantly. Multiple studies have reported improved model performance on public datasets or in experimental environments, such as more accurate disease prediction, stronger image-based diagnosis, and better edge device management.
However, from an evidence perspective, most of these achievements remain at the stage of simulation experiments, prototype systems, or preclinical validation. Studies are usually based on controlled datasets or static data scenarios, and there is a significant gap between reported model performance and actual clinical diagnosis and treatment. Importantly, current literature focuses on technical indicators (such as accuracy, Dice coefficient, and latency) rather than medical end points, such as changes in misdiagnosis rates, shortened treatment duration, or improved patient prognosis. Therefore, no direct chain of evidence yet links the technical performance of BCFL to its actual medical value.
Furthermore, BCFL architectures are not inherently compatible with real-world deployment environments. Health care systems feature complex governance structures, compliance requirements, and heterogeneous infrastructure. However, existing research often assumes node autonomy, network stability, or institutional equivalence, while neglecting key issues such as data authorization, division of responsibility, and system compatibility. Although blockchain consensus, smart contracts, and strong encryption enhance security, they also introduce latency, energy consumption, and maintenance costs, which conflict with the real-time and reliability requirements of clinical practice. Because of these mismatches, many frameworks that perform well in experimental settings are difficult to migrate to real medical scenarios.
Given these limitations, technical performance indicators alone cannot accurately reflect the maturity of BCFL in medical scenarios. To assess existing research more systematically, we adopted an evidence stratification strategy that categorizes the literature along multiple dimensions, including architectural innovation, deployment environment, validation depth, clinical relevance, and potential risks. This stratification aims to reveal the gap between “technical performance” and “medical practice value,” identify the most critical bottlenecks on the path from conceptual framework to clinical validation, and provide guidance for future research design. Table 2 summarizes the evidence stratification of typical BCFL studies in health care.
Table 2. Evidence stratification of blockchain-based federated learning studies in health care.
Reference
Application domain
BCFL architecture/contribution
Deployment environment
Validation depth
Clinical relevance/risk
Evidence level/maturity
[]
Cross-Institutional Medical Data Sharing (Chronic Disease)
Proposed a decentralized and privacy-preserving collaboration framework that integrates blockchain and FL, enhancing the predictive performance of diabetes models while ensuring data security and reducing communication overhead
Evaluation on public dataset (unspecified diabetes data)
Retrospective data validation
Does not account for population heterogeneity; findings are difficult to extrapolate to real clinical patients
Level 3: Preclinical
[]
Cross-Institutional Medical Data Sharing (Chronic Disease)
Developed a blockchain-integrated FL mechanism to enhance IoMT data privacy and improve diabetes prediction accuracy, achieving 97.11% accuracy with a multilayer perceptron model
Evaluation on public dataset (Pima Indians Diabetes)
Retrospective data validation
Reliance on a small, well-characterized dataset limits the robustness of the findings
Level 3: Preclinical
[]
Cross-Institutional Medical Data Sharing (Data Management)
Introduced the ODMSM-FL framework, which optimizes storage, management, and privacy protection for IoMT data, enhancing data security and system efficiency
Evaluation on public EHR dataset (HealthData.gov)
Controlled experimental conditions
Data latency and heterogeneity of human-machine devices; real-world IoMT networks cannot be controlled as in the experiments
Level 2: Prototype validation
[]
Cross-Institutional Medical Data Sharing (Medical Imaging)
Designed a blockchain-powered FL framework for brain tumor segmentation using 3D U-Net, achieving significant improvements in Dice similarity coefficient and Hausdorff distance
Evaluation on public benchmark dataset (BraTS 2020)
Retrospective data validation
Reliance on benchmark datasets rather than prospective clinical data limits clinical translation
Level 2: Prototype validation
[]
Cross-Institutional Medical Data Sharing (Medical Imaging)
Proposed the FBCLC-Rad framework, integrating CapsNets, blockchain, and FL to enhance lung cancer nodule detection accuracy in CT scans, reaching 99.69% accuracy
Evaluation on public and local dataset (Cancer Imaging Archive [CIA], Kaggle Data Science Bowl [KDSB], LUNA 16, and local datasets)
Retrospective data validation
Does not cover the real-world imaging process; lacks validation of physician decision-making and clinical workflow
Level 2: Prototype validation
[]
Cross-Institutional Medical Data Sharing (Drug Discovery)
Designed Rahasak-ML, a decentralized blockchain-FL platform enabling multi-institutional collaboration with enhanced transparency and security in drug discovery
Empirical verification in actual test environments is limited
Level 1: Conceptual
[]
Cross-Institutional Medical Data Sharing (EMR)
Integrated FL and blockchain for cloud-based medical record recommendation systems, leveraging Hyperledger Fabric, IPFS, LightGBM, and N-Gram models for collaborative learning
Evaluation on public EHR dataset (not specified)
Simulated
Limited to prototype frameworks or simulations, not prospectively evaluated in clinical practice
Level 2: Prototype validation
[]
Cross-Institutional Medical Data Sharing (EMR)
Proposed a blockchain-FL framework for EHR privacy protection, achieving 92.5% global model accuracy and 88.33% local model accuracy using a deep neural network
Evaluation on public EHR dataset (Chronic Kidney Disease [CKD] dataset [UCI Machine Learning Repository])
Retrospective data validation
Limited to prototype frameworks or simulations, not prospectively evaluated in clinical practice
Level 3: Preclinical
[]
Cross-Institutional Medical Data Sharing (EMR)
Used lightweight encryption and FL to secure EHR data in an Ethereum test environment, reducing reliance on trusted third parties
Evaluation on public EHR dataset (Simulation in Ethereum test environment)
Simulated
Limited to prototype frameworks or simulations, not prospectively evaluated in clinical practice
Level 2: Prototype validation
[]
Cross-Institutional Medical Data Sharing (EMR)
Combined CNN and blockchain-FL to enhance EHR data security and detect abnormal user behaviors automatically
Evaluation on public EHR dataset (Python-based simulation)
Simulated
Limited to prototype frameworks or simulations, not prospectively evaluated in clinical practice
Level 2: Prototype validation
[]
Cross-Institutional Medical Data Sharing (EMR)
Explored blockchain-FL applications in precision medicine, emphasizing diagnostic accuracy, treatment optimization, clinical trial subpopulation identification, and drug development acceleration
Evaluation on public EHR dataset (not specified)
Simulated
Limited to prototype frameworks or simulations, not prospectively evaluated in clinical practice
Level 2: Prototype validation
[]
IoMT (Data Security)
Proposed a lightweight hybrid FL framework with blockchain smart contracts for edge training plan management, trust evaluation, and authentication in IoMT networks
Evaluation on public COVID-19 dataset (not specified)
Simulated
Heavy computational burden on devices, high system complexity, and difficult clinical translation
Level 1: Proof-of-concept
[]
IoMT (Data Management)
Developed a blockchain-FL system leveraging edge computing and Paillier encryption to securely manage medical resource transactions in IoMT environments
Evaluation on public dataset (unspecified diabetes data)
Retrospective data validation
Heavy computational burden on devices, high system complexity, and difficult clinical translation
Level 2: Prototype validation
[]
IoMT (Data Security)
Introduced a distributed reinforcement learning method integrating blockchain and FL for improved data privacy and security in IoMT applications
Evaluation on public COVID-19 dataset (not specified)
Simulated
Heavy computational burden on devices, high system complexity, and difficult clinical translation
Level 2: Prototype validation
[]
IoMT (Data Security)
Proposed the FL-BETS framework, leveraging fog computing and blockchain to minimize energy consumption and latency while enhancing fraud detection in health care
Evaluation on private dataset focusing on medical insurance fraud (Kaggle)
Simulated
Heavy computational burden on devices, high system complexity, and difficult clinical translation
Level 1: Proof-of-concept
[]
IoMT (Data Security)
Developed a fog computing IoT platform that integrates FL and private blockchain technology to enhance privacy protection in wearable IoMT devices
Evaluation on human activity recognition dataset (UCI Machine Learning Repository)
Simulated
Heavy computational burden on devices, high system complexity, and difficult clinical translation
Level 2: Prototype validation
[]
Public Health (COVID-19 Imaging)
Proposed a blockchain-FL-based IoMT architecture for COVID-19 detection and epidemic management; the architecture enhances data privacy through FL and ensures data transparency and immutability via blockchain
Evaluation on public COVID-19 dataset (Centers for Disease Control and Prevention [CDC] data)
Simulated
High real-time requirements in epidemic settings; blockchain latency remains unresolved
Level 1: Proof-of-concept
[]
Public Health (COVID-19 Imaging)
Developed a blockchain-based FL framework for COVID-19 detection, using Capsule Networks for image segmentation and classification to enhance data privacy and model accuracy
Evaluation on private COVID-19 dataset (CC-19)
Retrospective data validation
The limited scale and types of data restrict the generalization ability of the model
Level 3: Preclinical
[]
Public Health (COVID-19 Imaging)
Introduced FLED-Block, a blockchain-based FL model integrating Capsule Networks for image feature extraction and extreme learning machines (ELM) for classification, achieving high accuracy with strong privacy protection
Evaluation on public COVID-19 dataset (CT data from multiple hospitals)
Retrospective data validation (multisource datasets)
The computational complexity and the feasibility of actual deployment require further research
Level 3: Preclinical
[]
Public Health (Pandemic Diagnosis)
Designed BFLPD, a blockchain-FL framework for epidemic diagnosis in smart cities, particularly for COVID-19; the framework ensures secure model aggregation and enhances global model integrity and efficiency
Evaluation on public ultrasound COVID-19 dataset (POCUS, ICLUS-DB, and COVIDx-US)
Retrospective data validation (multisource datasets)
Large computational overhead and substantial barriers to real-world deployment
Level 3: Preclinical
[]
Telemedicine (Telemedicine System)
Proposed a blockchain-FL application framework for telemedicine, analyzing how these technologies improve data accessibility, security, and privacy in remote health care
Based on theoretical simulations or examples
Conceptual framework
Focuses mainly on the theoretical framework; lacks real-world application cases
Level 1: Proof-of-concept
[]
Telemedicine (Remote Surgery System)
Proposed BITS, an intelligent remote surgery framework based on blockchain and AI; the architecture integrates blockchain technologies (such as Ethereum and the IPFS protocol), 6G communication networks, and FL (or other AI algorithms) to enhance the security, privacy, and real-time performance of remote surgery systems
Based on theoretical simulations or examples
Simulated
Limited to a simulated environment; no actual deployment or clinical evaluation has been carried out yet
Level 1: Proof-of-concept
[]
Telemedicine (Remote Surgery System)
Developed a remote surgery system framework leveraging blockchain and FL to enhance data security, reliability, and real-time processing; the framework integrates 6G networks and IPFS for low-latency and high-reliability data transmission
Based on theoretical simulations or examples
Conceptual framework
Insufficient discussion of the specific implementation details of BCFL in remote surgery; no clinical deployment
Level 1: Proof-of-concept
a BCFL: blockchain-based federated learning.
b FL: federated learning.
c IoMT: Internet of Medical Things.
d ODMSM-FL: Optimized Data Management and Secured Federated Learning.
e EHR: electronic health record.
f FBCLC-Rad: Federated Learning–Enabled Blockchain CapsNets Lung Cancer Radiologist.
l FLED-Block: Federated Learning–Ensembled Deep Learning Blockchain Model.
m BFLPD: blockchain-based federated learning for pandemic diagnosis.
n BITS: Blockchain-Driven Intelligent Scheme for Telesurgery System.
o AI: artificial intelligence.
As shown in the table, most BCFL studies remain at the conceptual or prototype stage. They lack both multicenter validation on real-world data and evaluation against clinical endpoints. Three improvements are needed to promote the clinical application of BCFL. (1) Cross-institutional, prospective validations should evaluate model performance in real patient populations and clinical workflows. (2) Security, latency, and maintainability must be balanced, so that systems do not become impractical through an exclusive focus on complex encryption or on-chain computation. (3) Systems should be integrated with medical information systems, data governance, and regulatory frameworks, and deployed only after data responsibilities and authorities have been clearly defined. BCFL can evolve from a conceptual technology into a usable medical solution only after it has been validated under the joint constraints of clinical workflow, patient heterogeneity, and compliance requirements.
Discussion
Challenge
Although BCFL shows great potential to transform health care data sharing, its deployment in real medical environments remains highly limited. The transformative vision for BCFL must be weighed against its technical limitations and application-specific complexities. These include, in particular, the interoperability gap, high implementation costs, and unresolved scalability bottlenecks, which currently prevent BCFL from moving from a conceptual framework into routine clinical practice.
There are many application challenges in the field of medical data sharing, including the lack of system interoperability and the absence of standardized benchmark datasets specifically designed for medical applications. Currently, various health care information systems (eg, EMR systems) use diverse system architectures, data formats, and operational standards, lacking a unified interoperability framework. Moreover, unlike conventional FL research that often leverages open-access datasets such as CIFAR or MNIST, health care data are inherently sensitive, fragmented, and institution-specific, which makes reproducibility and cross-study comparability particularly difficult.
Scalability and communication efficiency also present critical obstacles. As FL tasks expand, the number of health care data sources and the complexity of training increase significantly. In practical deployments, the scalability and throughput limitations of blockchain become particularly pronounced. Even permissioned frameworks such as Hyperledger Fabric, which offer improved throughput, still show end-to-end latencies of several seconds per block under moderate workloads in empirical benchmarks. Insufficient mining resources slow block generation and verification, hindering the efficient execution of large-scale tasks []. Moreover, the influx of numerous participants in distributed health care environments amplifies the load on the blockchain network, and existing consensus mechanisms struggle to meet the efficiency demands of health care applications []. In BCFL, these constraints mean that each round of local training, parameter aggregation, and block creation can add cumulative delays that significantly slow model convergence. Furthermore, communication costs rise steeply as the number of blockchain nodes grows. For example, costs grow quadratically in PBFT-style consensus, where replicas exchange messages all-to-all. The resulting network delays and degraded communication efficiency directly reduce training speed and overall model performance.
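The cumulative-delay argument can be made concrete with a back-of-envelope model. The sketch below treats a synchronous BCFL round as the sum of local training, aggregation, and block commit time. It also counts messages for a simplified PBFT-style consensus instance. All timings are hypothetical placeholders. The message count assumes that every replica broadcasts in both the prepare and commit phases. Real platforms such as Hyperledger Fabric use different ordering services, so this is an order-of-magnitude illustration, not a model of any specific system.

```python
from dataclasses import dataclass


@dataclass
class RoundCosts:
    """Hypothetical per-round delays (seconds) in a synchronous BCFL round."""
    local_training_s: float  # slowest client's local training time
    aggregation_s: float     # time to aggregate parameter updates
    block_latency_s: float   # time to order and commit the round's block

    def total(self) -> float:
        # The stages run one after another, so their delays add up.
        return self.local_training_s + self.aggregation_s + self.block_latency_s


def time_to_converge(costs: RoundCosts, rounds: int) -> float:
    """Wall-clock time for `rounds` synchronous training rounds."""
    return rounds * costs.total()


def pbft_messages(n: int) -> int:
    """Simplified message count for one PBFT-style consensus instance.

    Pre-prepare: the leader sends to n - 1 replicas.
    Prepare and commit: each of n replicas broadcasts to n - 1 others.
    The result grows as O(n^2).
    """
    return (n - 1) + 2 * n * (n - 1)


if __name__ == "__main__":
    costs = RoundCosts(local_training_s=30.0, aggregation_s=2.0, block_latency_s=3.0)
    print(time_to_converge(costs, rounds=100))  # 3500.0 seconds
    for n in (4, 16, 64):
        print(n, pbft_messages(n))  # 27, 495, 8127 messages
```

Block latency is paid once per round. A few seconds per block therefore becomes substantial over hundreds of rounds. Quadrupling the node count raises the message count roughly sixteenfold.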
High implementation and maintenance costs represent another practical barrier. Establishing a BCFL infrastructure requires significant upfront investment in blockchain nodes, secure servers, storage, and high-speed networking. Additionally, energy consumption associated with blockchain consensus protocols, as well as the operational costs of managing frequent model updates across institutions, may exceed the financial capacity of many health care providers, especially in resource-limited settings. Without clear evidence of cost-benefit balance, hospitals and regulators may be reluctant to adopt BCFL at scale.
Additional difficulties arise from the integration of BCFL within IoMT environments. The heterogeneity of IoMT devices results in substantial disparities in storage capacity, computational power, energy consumption, and communication capabilities. For instance, advanced hospital equipment often features powerful processors, stable power supplies, and ample storage space, whereas wearable medical devices typically operate on low-power batteries, constrained network bandwidth, and limited computational resources. This disparity in device capabilities poses a significant challenge to the deployment of FL models. Moreover, energy constraints and unstable network connections make edge medical devices prone to data transmission failures or system disconnections, ultimately resulting in end-device desynchronization. This issue not only hampers the timeliness of data uploads and model updates but may also prevent the global model from converging efficiently.
Compounding these challenges is the heterogeneity of health care data within IoMT systems. Data generated by different devices are highly diverse and often unevenly distributed. This violates the IID (independent and identically distributed) assumption on which many FL algorithms rely. For instance, hospital A may collect dynamic ECG signals, whereas hospital B primarily acquires static medical images. Such disparities in data distribution increase the complexity of model training. Moreover, medical coding standards vary across countries and regions, which leads to inconsistent data standards. Examples are ICD-10 (International Statistical Classification of Diseases, Tenth Revision) in the United Kingdom and ICD-10-CM (International Classification of Diseases, Tenth Revision, Clinical Modification) in the United States. Consequently, these heterogeneities complicate global model training, analysis, and evaluation, ultimately impairing the model’s generalization across diverse clients [].
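Experiments often simulate label-distribution skew like this by splitting each class across clients in Dirichlet-distributed proportions. This is a common benchmarking convention, not a method from the reviewed studies. It captures label skew only, not the modality differences in the ECG-versus-imaging example. The sketch below uses only the Python standard library. A smaller `alpha` concentrates each class on fewer clients, which makes the split more non-IID.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients with per-class Dirichlet(alpha) proportions.

    Every index is assigned to exactly one client. A smaller alpha gives a
    stronger label skew (a more non-IID split).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Sample Dirichlet proportions by normalizing independent Gamma(alpha, 1) draws.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights)
        # Convert the cumulative proportions into cut points over this class's indices.
        cuts, acc = [], 0.0
        for w in weights[:-1]:
            acc += w / total
            cuts.append(int(acc * len(idxs)))
        start = 0
        for client, end in zip(clients, cuts + [len(idxs)]):
            client.extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    labels = [i % 5 for i in range(1000)]  # hypothetical 5-class label vector
    for alpha in (100.0, 0.1):
        parts = dirichlet_partition(labels, num_clients=4, alpha=alpha, seed=1)
        print(alpha, [len({labels[i] for i in p}) for p in parts])  # classes seen per client
```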
Beyond application-level challenges, BCFL also faces significant technical limitations that must be addressed before it can realize its full potential in medical settings. BCFL combines the decentralized nature of blockchain with the “data availability without visibility” principle of FL. This provides a baseline of privacy protection and improves overall system security, but it does not fully resolve privacy concerns. Given the sensitivity of patient data and the stringent privacy requirements of the medical field, a series of complex security challenges must be addressed. The system remains vulnerable to various malicious attacks that pose significant risks to patient confidentiality and institutional trust. For instance, in background knowledge attacks, adversaries infer sensitive information by combining previously known data with analysis of shared model parameters []. In conspiracy (collusion) attacks, multiple nodes collude to extract the data features of other participants by exchanging local training information []. In inference attacks, adversaries analyze model parameter updates to infer private details about patient data [,]. These threats highlight the need for advanced privacy-preserving mechanisms within BCFL to ensure the safety and integrity of medical data.
While existing privacy-preserving technologies provide preliminary protections, they are often inadequate when faced with the dual demands of strong privacy and high model utility. Homomorphic encryption allows computation directly on encrypted data, preventing plaintext exposure. However, its computational inefficiency makes it unsuitable for large-scale, complex operations. Differential privacy introduces noise to model parameters to obscure individual data contributions; however, this can significantly compromise model accuracy and system performance if not carefully balanced. Secure Multiparty Computation (SMPC) offers robust data confidentiality through distributed computation, yet its reliance on frequent interaction between parties contradicts the low-interaction protocols typically favored in BCFL for efficient aggregation. These limitations underscore the urgent need for lightweight, efficient, and scalable privacy-preserving solutions specifically tailored to the BCFL context.
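The accuracy cost of differential privacy comes from the noise itself. A common recipe clips each client update to a bounded L2 norm and then adds Gaussian noise scaled to that bound. The sketch below shows only this clip-and-noise step. The parameter names `max_norm` and `noise_multiplier` are illustrative. The sketch does not compute the resulting (ε, δ) privacy budget, which requires a separate privacy accountant.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update so its L2 norm is at most max_norm (this bounds sensitivity)."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize_update(update, max_norm, noise_multiplier, rng):
    """Clip, then add Gaussian noise with standard deviation noise_multiplier * max_norm."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    update = [3.0, 4.0]  # hypothetical client update with L2 norm 5
    for z in (0.0, 0.5, 2.0):
        print(z, privatize_update(update, max_norm=1.0, noise_multiplier=z, rng=rng))
```

A larger `noise_multiplier` gives stronger privacy but distorts each update more. This is the privacy-utility tension described above.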
Another critical technical issue is the optimization of incentive mechanisms. In traditional blockchain systems, fixed token-based reward structures fail to reflect the true value of each participant’s contribution []. This misalignment can result in low-quality nodes receiving undeserved rewards, while high-contribution nodes may become demotivated due to insufficient compensation. In BCFL, this issue is further complicated by the heterogeneity of participants, who differ in computational capabilities, data quality, and participation frequency. Resource-constrained nodes, in particular, may lack sufficient incentives to participate, ultimately affecting the quality and diversity of the global model. To address these disparities, incentive mechanisms in BCFL must go beyond simple token rewards and instead adopt dynamic, contribution-aware frameworks that account for the multidimensional nature of participant involvement. A well-designed incentive system can encourage broader and more sustained engagement, improve fairness, and enhance the overall efficiency and robustness of BCFL. Therefore, developing more sophisticated and adaptable incentive mechanisms is a pressing direction for future research.
The immutability of blockchain and the tamper resistance of smart contracts ensure data integrity and trustworthiness. However, these characteristics also introduce a rigidity that poses challenges in dynamic health care environments []. Several scenarios highlight the need for greater flexibility. First, when patient data are entered incorrectly or need to be modified, the immutable blockchain structure cannot accommodate the change efficiently. Second, patients may request the deletion or modification of their data to protect their privacy, particularly under data protection regulations. Third, in public health emergencies, timely access to accurate health care data is crucial for effective crisis management, which calls for mechanisms that allow controlled updates within blockchain systems. These scenarios show that BCFL systems need mechanisms that support controlled edits under predefined conditions. Such mechanisms must balance data integrity against the operational demands of evolving health care environments.
Another important consideration in the medical application of BCFL is model interpretability, which directly affects the reliability of clinical decisions, patient trust, and regulatory compliance. In medical decision-making, the ability to interpret model outcomes is essential for safety, transparency, and credibility. However, the “black-box” nature of many complex deep learning models limits their interpretability, posing significant challenges for medical applications []. The need for interpretability can be seen from multiple perspectives. For health care professionals, AI model predictions must be interpretable so that physicians can understand the underlying reasoning and integrate the predictions into diagnosis, treatment planning, and patient monitoring. Interpretability also lets researchers and clinicians identify and trace the sources of bias or errors in model predictions. This supports performance optimization and improves diagnostic accuracy and reliability. For patients, the widespread adoption of AI in medicine inevitably raises privacy and ethical concerns. Greater interpretability can build their trust in AI-assisted diagnosis and treatment by clarifying the model’s reliability and limitations, thereby easing concerns over “black-box” decision-making. At the regulatory level, numerous countries and regions have mandated transparency and auditability in medical AI systems to ensure that decision-making processes align with ethical and legal standards. Finally, medical AI operates in a highly interdisciplinary environment that includes physicians, technology developers, data scientists, and other professionals.
Interpretability serves as a crucial bridge for communication among experts from diverse disciplines, facilitating the effective implementation of medical AI technologies and ultimately enhancing the quality and accessibility of health care services.
Finally, we summarize the limitations of the literature included in this review. First, the risk of bias is high. Many available studies are conceptual frameworks, simulations, or small-scale case studies rather than large-scale clinical implementations, and they lack independent external validation. Common methodological flaws include:

- selective reporting of favorable performance metrics;
- limited or absent testing against adversarial and privacy attacks;
- a lack of long-term or operational measurements (such as maintenance burden, interoperability failures, or ongoing participation rates).

Reproducibility is often compromised by unavailable code, undisclosed model and configuration details, and dependence on nonshareable datasets. Together, these factors create systemic uncertainty, with a risk of overestimating feasibility and underestimating real deployment challenges. Second, the included studies show substantial inconsistencies in key dimensions of system design and evaluation. They vary greatly in:

- BCFL architecture (fully coupled, flexibly coupled, and loosely coupled);
- blockchain configuration (private and public);
- privacy countermeasures (secure aggregation, differential privacy, homomorphic encryption, and SMPC);
- data sources (public benchmarks, single-center clinical records, and various IoMT streams).

Outcome measures have not been standardized. Some papers prioritize predictive performance, others emphasize communication or computational overhead, and still others focus on source or incentive metrics. This heterogeneity has produced divergent findings that are difficult to compare. Furthermore, many studies are conducted in controlled environments with carefully curated datasets, which may not reflect the heterogeneity and noise of real-world medical data. This introduces potential bias.
Third, performance reporting is limited and longitudinal validation is insufficient. Statistical uncertainties (confidence intervals and run-to-run variability) are rarely reported, and large-scale, multi-institutional deployments remain uncommon. Few studies have evaluated long-term stability, scalability with large volumes of clinical data, or regulatory compliance under real-world conditions. These gaps underscore the need for more robust, well-designed research, including prospective clinical trials, to verify the effectiveness, safety, and interoperability of BCFL frameworks in actual health care settings.
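As an illustration of the missing uncertainty reporting, results from repeated runs (for example, different random seeds) can be summarized as a mean with a 95% CI. The sketch below uses a Student t interval with hard-coded two-sided critical values. It supports only 2 to 10 runs to stay dependency-free. The accuracy values in the usage example are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student t critical values, keyed by degrees of freedom (n - 1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_ci95(scores):
    """Return (mean, half_width) of a t-based 95% CI over independent runs."""
    n = len(scores)
    if n < 2 or n - 1 not in T_95:
        raise ValueError("this sketch supports 2 to 10 runs")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean, T_95[n - 1] * sem


if __name__ == "__main__":
    runs = [0.90, 0.92, 0.91, 0.89, 0.93]  # hypothetical accuracies from 5 seeds
    m, h = mean_ci95(runs)
    print(f"accuracy {m:.3f} ± {h:.3f} (95% CI, n={len(runs)})")
```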
Future Prospects
The main obstacles to data sharing and collaboration in health care systems are the lack of interoperability and standardization and the absence of standardized benchmark datasets for medical FL and blockchain applications. Future research must therefore bring together governments, regulatory authorities, and industry leaders to establish unified technical standards and policy frameworks. It should also prioritize the development of standardized BCFL benchmark datasets. These standards should encompass data formats, transmission protocols, privacy safeguards, and technology implementation guidelines to facilitate seamless data integration and collaboration across diverse health care entities. Standardized datasets should reflect the realistic heterogeneity of medical imaging, EHRs, and multimodal data streams. By providing a common reference point, such benchmarks would enable fair and transparent algorithm comparisons, promote reproducibility, and accelerate the translation of BCFL innovations into clinically validated tools. In addition, benchmark datasets could be stratified by disease type, data pattern, and clinical task (eg, diagnosis, prognosis, and treatment response prediction). This would allow more fine-grained evaluation of system performance across health care settings. Moreover, regulatory frameworks should prioritize patient privacy and data security, delineate the rights and responsibilities of stakeholders, and foster compliant adoption of BCFL technology. Comprehensive policy support and technical guidance will be instrumental in breaking down data silos and improving the efficiency of collaborative model training.
Scalability and communication efficiency remain two of the most critical technical challenges for BCFL, particularly in large-scale, high-frequency health care environments. To address scalability limitations, future research should focus on several technical optimizations. The first is off-chain computing and side-chain technology []. Off-chain computing allows complex training tasks to run outside the main blockchain, reducing its computational burden, while side chains can independently handle task-specific transactions, relieving congestion on the main chain. Layer 2 protocols, such as state channels and Plasma, offer a further option by enabling faster transaction processing while preserving the security of the underlying blockchain []. These technologies improve scalability by minimizing the data that must be stored on the main chain. Another important research direction is cross-chain technology []. This approach not only distributes workloads and mitigates single-chain bottlenecks but also improves system-wide parallel processing. Moreover, cross-chain FL is well suited to cross-regional and cross-organizational health care collaborations, further strengthening resource management and system security.
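The storage side of this idea can be sketched simply. Bulky model updates stay off-chain, and each block records only a SHA-256 digest linked to the previous block. The toy ledger below omits consensus, side chains, and layer 2 protocols entirely. A plain dictionary stands in for an off-chain store such as IPFS. The class and method names are hypothetical.

```python
import hashlib
import json


class HashChainLedger:
    """Toy append-only ledger storing only digests of off-chain model updates."""

    GENESIS = "0" * 64

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _block_hash(body):
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, round_id, client_id, update_bytes):
        prev = self.blocks[-1]["block_hash"] if self.blocks else self.GENESIS
        body = {
            "round": round_id,
            "client": client_id,
            "update_digest": hashlib.sha256(update_bytes).hexdigest(),
            "prev_hash": prev,
        }
        body["block_hash"] = self._block_hash(body)
        self.blocks.append(body)
        return body["block_hash"]

    def verify_update(self, index, update_bytes):
        """Check that an off-chain payload matches the digest committed on-chain."""
        return self.blocks[index]["update_digest"] == hashlib.sha256(update_bytes).hexdigest()

    def verify_chain(self):
        """Recompute every block hash and check the prev_hash links."""
        prev = self.GENESIS
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "block_hash"}
            if block["prev_hash"] != prev or block["block_hash"] != self._block_hash(body):
                return False
            prev = block["block_hash"]
        return True


if __name__ == "__main__":
    off_chain = {}  # stand-in for an off-chain store
    ledger = HashChainLedger()
    for client in ("hospital_A", "hospital_B"):
        payload = f"serialized-weights-{client}".encode()  # hypothetical payload
        key = ledger.append(1, client, payload)
        off_chain[key] = payload
    print(ledger.verify_chain())
```

Each block is a few hundred bytes regardless of model size, and any party can still detect a tampered off-chain payload.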
Several directions for reducing communication cost and improving efficiency merit in-depth exploration. One promising direction is gradient compression, which reduces communication overhead. For instance, Konečný et al [] proposed structured updates and sketched updates, which can significantly lower communication costs. However, compression can discard relevant information and thereby degrade the performance of the global model. Future research should therefore seek an optimal balance between compression and global model accuracy []. Lightweight consensus protocols are another important research direction. A further key technology is the digital twin [], which generates virtual models directly on miner nodes and thus reduces the need for long-distance data transmission. This approach can substantially decrease communication latency and costs, making it particularly well suited to resource-constrained health care environments.
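The sketch below illustrates the compression-accuracy trade-off with top-k sparsification plus error feedback. This is a different, widely used compression technique, not the structured or sketched updates of Konečný et al. Each client sends only the k largest-magnitude entries of its update. Entries that are not sent are kept locally and added to the next round's update, which limits the information loss discussed above.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries; return (sorted indices, values)."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    order.sort()
    return order, [grad[i] for i in order]


class ErrorFeedbackCompressor:
    """Top-k sparsification that carries untransmitted mass into the next round."""

    def __init__(self, dim, k):
        self.residual = [0.0] * dim
        self.k = k

    def compress(self, grad):
        # Add back what was dropped in earlier rounds before selecting entries.
        corrected = [g + r for g, r in zip(grad, self.residual)]
        idx, vals = top_k_sparsify(corrected, self.k)
        sent = [0.0] * len(corrected)
        for i, v in zip(idx, vals):
            sent[i] = v
        # Remember locally whatever was not transmitted.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return idx, vals


if __name__ == "__main__":
    comp = ErrorFeedbackCompressor(dim=4, k=1)
    print(comp.compress([0.5, 0.4, 0.0, 0.0]))  # ([0], [0.5]); 0.4 is held back
    print(comp.compress([0.0, 0.0, 0.0, 0.0]))  # ([1], [0.4]); delayed, not lost
```

With k entries out of d, each client transmits k indices and k values instead of d values. Choosing k is where the compression-accuracy balance is set.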
In the field of IoMT, to address the challenges posed by heterogeneous storage, computing, and communication capabilities of medical devices and sensors, future research should prioritize optimizing resource usage. First, lightweight ML models and algorithms, such as model compression and pruning techniques [], can be developed to alleviate the computational burden on local devices. This not only enhances device operational efficiency but also significantly reduces communication overhead. At the task allocation level, device grouping and hierarchical architectures can be leveraged to allocate computation and data aggregation tasks to devices or intermediate nodes with higher computational capacity, thereby forming a resource-optimized collaborative network.
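The hierarchical allocation idea can be sketched as two-tier, FedAvg-style aggregation. Intermediate nodes with more capacity first average the updates of their attached low-power devices, weighted by sample count. The global step then averages the intermediate results, weighted by the number of samples each represents. The group names and parameter vectors below are hypothetical. In this idealized synchronous setting, the two-tier result equals a flat weighted average over all devices. Offloading aggregation to capable nodes therefore does not by itself change the global model.

```python
def weighted_average(models, weights):
    """Elementwise weighted mean of equal-length parameter vectors."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[j] for m, w in zip(models, weights)) / total for j in range(dim)]


def hierarchical_aggregate(groups):
    """Two-tier aggregation.

    `groups` maps an intermediate node (e.g. a hypothetical ward gateway) to a
    list of (parameters, num_samples) pairs from its attached devices.
    """
    edge_models, edge_counts = [], []
    for members in groups.values():
        params = [p for p, _ in members]
        counts = [n for _, n in members]
        edge_models.append(weighted_average(params, counts))  # tier 1: on the edge node
        edge_counts.append(sum(counts))
    # Tier 2: weight each edge model by the number of samples it represents.
    return weighted_average(edge_models, edge_counts)


if __name__ == "__main__":
    groups = {
        "ward_gateway": [([1.0, 2.0], 10), ([3.0, 0.0], 30)],  # two wearables
        "icu_server": [([0.0, 4.0], 60)],                       # one bedside system
    }
    print(hierarchical_aggregate(groups))  # approximately [1.0, 2.6]
```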
Another critical aspect is enhancing system robustness in the face of equipment instability. To mitigate this issue, future research should focus on optimizing both failure recovery mechanisms and participant selection strategies. On the one hand, an adaptive connection protocol can be designed to enable devices to automatically rejoin the training process after a connection interruption, ensuring that the global model’s convergence remains unaffected. On the other hand, an optimization strategy based on device availability and performance can be implemented to prioritize the selection of more stable devices for training. Moreover, incorporating a flexible time window would allow devices to complete tasks within a predefined period, thereby enhancing the overall system’s fault tolerance and training efficiency.
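Two of the ideas above can be sketched together: availability-aware participant selection and a fixed time window per round. The availability scores, the deadline, and the plain averaging are illustrative assumptions, not a published protocol. A device that misses the window is simply left out of the current round and can be selected again in the next one.

```python
import random


def select_clients(availability, num_selected, rng):
    """Sample clients without replacement, favoring historically reliable ones.

    `availability` maps client id -> fraction of past rounds completed (0..1).
    """
    pool = dict(availability)
    chosen = []
    for _ in range(min(num_selected, len(pool))):
        ids = list(pool)
        # A small floor keeps a nonzero chance for currently unreliable devices.
        weights = [max(pool[c], 1e-3) for c in ids]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen


def aggregate_within_deadline(reports, deadline_s):
    """Average the updates that arrived before the round's time window closed.

    `reports` holds (client_id, arrival_time_s, update_vector) tuples.
    Returns None when nothing arrived, so the previous global model is kept.
    """
    on_time = [u for _, t, u in reports if t <= deadline_s]
    if not on_time:
        return None
    dim = len(on_time[0])
    return [sum(u[j] for u in on_time) / len(on_time) for j in range(dim)]


if __name__ == "__main__":
    rng = random.Random(42)
    print(select_clients({"pump": 0.9, "patch": 0.1, "monitor": 0.5}, 2, rng))
    reports = [("pump", 5.0, [1.0, 1.0]), ("patch", 50.0, [9.0, 9.0]),
               ("monitor", 8.0, [3.0, 1.0])]
    print(aggregate_within_deadline(reports, deadline_s=10.0))  # [2.0, 1.0]
```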
Medical data in IoMT environments are highly diverse and statistically heterogeneous, so achieving robust model generalization is another priority. Non-IID data across institutions often cause significant discrepancies between global and local models, undermining convergence and performance. To address this, Wu and Wang [] proposed an optimal aggregation algorithm that dynamically adjusts each trainer’s selection probability based on the algorithm’s output. However, selecting trainers according to model preferences can severely compromise the generalization capability of the global model. Future research should therefore focus on developing highly accurate BCFL systems with stronger model generalization. One promising direction is automated data normalization tools. Such tools would recognize different dataset formats and characteristics and transform the data automatically so that it can be used directly for model training [].
Future work must address the technical limitations of BCFL systems, and privacy protection is among the most fundamental of these. Future research should weigh privacy against utility more carefully and select privacy-preserving techniques according to system priorities. For instance, scenarios that prioritize privacy may justify computationally intensive but more secure techniques, whereas high-performance requirements may favor lighter-weight solutions. More advanced privacy-preserving techniques can also be explored. For example, zero-knowledge proof (ZKP) is a promising cryptographic method that enables a prover to convince a verifier that a statement is true without disclosing any additional information. In BCFL, this would allow a data owner to prove that an update was computed correctly while keeping the original data confidential. This approach reduces the trust required among participants and also lightens the verification burden on clients. Future research should investigate how ZKP can be integrated into BCFL architectures to balance performance and privacy, paving the way for more efficient and secure systems.
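The prover-verifier pattern behind ZKP can be shown with the classic Schnorr proof of knowledge of a discrete logarithm. The sketch makes the proof non-interactive with the Fiat-Shamir heuristic. It only proves knowledge of a secret exponent `x` behind a public value `y`. Proving that an FL update was computed correctly would require general-purpose proof systems, which this sketch does not attempt. The group parameters are deliberately tiny (p = 23, q = 11) and completely insecure. They are chosen only so the arithmetic is easy to follow.

```python
import hashlib
import secrets

# Toy group: p = 2q + 1 with q prime; g = 4 generates the subgroup of order q.
# INSECURE illustration parameters; real deployments use large standardized groups.
P, Q, G = 23, 11, 4


def challenge(*values):
    """Fiat-Shamir challenge: hash the public transcript into Z_q."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret_x):
    """Prove knowledge of x such that y = g^x mod p, without revealing x."""
    y = pow(G, secret_x, P)
    r = secrets.randbelow(Q - 1) + 1  # fresh nonzero nonce
    t = pow(G, r, P)                  # commitment
    c = challenge(G, y, t)
    s = (r + c * secret_x) % Q        # response; on its own, reveals nothing about x
    return y, t, s


def verify(y, t, s):
    """Accept iff g^s == t * y^c (mod p), which holds when s = r + c*x (mod q)."""
    c = challenge(G, y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P


if __name__ == "__main__":
    y, t, s = prove(secret_x=7)
    print(verify(y, t, s))            # True
    print(verify(y, t, (s + 1) % Q))  # False: a forged response fails
```

The verifier checks a single equation and never learns `x`. This mirrors, in miniature, the reduced verification burden described above.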
Another pressing issue in the development of BCFL systems is the design of fair, dynamic incentive mechanisms that ensure equitable resource allocation among contributors. One option is Shapley-value-based contribution quantification, which can assess each participant’s impact on global model performance and enable fairer incentive distribution. Another is data quality-driven incentives: because variations in data quality directly affect training effectiveness, participants contributing high-quality or more representative data can receive greater rewards. Integrating penalty mechanisms is another important research direction []. For instance, Cui et al [] proposed withdrawing tokens when a trainer’s behavior is identified as malicious. Similarly, Weng et al [] suggested requiring trainers to predeposit tokens that are forfeited if malicious activity is detected, although the fairness of this deposit mechanism remains uncertain. How to set penalty rules while ensuring fairness therefore requires further research. These optimization strategies will help build a more effective and equitable BCFL ecosystem and support its sustainable development in medical data sharing.
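Shapley-value-based contribution quantification can be computed exactly for a handful of institutions. A participant's value is its average marginal contribution to the utility of every coalition of the others. In the sketch, the utility function and its accuracy table are hypothetical placeholders. In practice, each entry would be the validation performance of a model trained on that coalition's data. Exact computation needs 2^n utility evaluations, so larger federations usually fall back on sampling-based approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values; `utility` maps a frozenset of players to a score."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes p in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


if __name__ == "__main__":
    # Hypothetical validation accuracy of a model trained on each coalition's data.
    acc = {
        frozenset(): 0.50,
        frozenset({"A"}): 0.70, frozenset({"B"}): 0.60, frozenset({"C"}): 0.55,
        frozenset({"A", "B"}): 0.80, frozenset({"A", "C"}): 0.75,
        frozenset({"B", "C"}): 0.65, frozenset({"A", "B", "C"}): 0.85,
    }
    phi = shapley_values(["A", "B", "C"], acc.__getitem__)
    print({k: round(v, 3) for k, v in phi.items()})  # {'A': 0.2, 'B': 0.1, 'C': 0.05}
```

The values sum to the total gain of the full federation over the empty baseline (0.85 - 0.50). This efficiency property makes them a natural basis for splitting a reward pool.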
Token-based incentive mechanisms have been widely discussed as a promising way to encourage participation in BCFL networks. Although such mechanisms can increase participation and promote fair resource allocation, they also raise complex ethical issues. Future research should therefore embed ethical oversight in the design of token-based systems and continue to explore new solutions.
Editable blockchain mechanisms are expected to be implemented in the near future. Future research could explore how to introduce a moderate degree of editability while preserving data integrity and security. For instance, a modified chameleon hash function (also known as a trapdoor hash function) enables controlled modification of blockchain data: whoever holds the trapdoor information can efficiently find hash collisions, so an input can be modified without changing the hash output. This mechanism facilitates the correction of inaccurate or incomplete data while preserving the structural integrity of the blockchain [].
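The trapdoor property can be illustrated with the basic discrete-logarithm chameleon hash in the style of Krawczyk and Rabin, CH(m, r) = G^m · h^r mod P with h = G^x. The sketch below is an assumption-laden toy: it uses tiny parameters, hypothetical function names, and the unmodified textbook scheme rather than the modified variant cited above, and it omits everything a real redactable blockchain needs (key management, access control, key-exposure protection).

```python
import hashlib
import secrets

# Toy safe-prime group: P = 2Q + 1 with P, Q prime. Far too small for real use.
P = 2039
Q = 1019
G = 4  # generator of the order-Q subgroup


def to_zq(message: bytes) -> int:
    """Map an arbitrary message to an exponent in Z_Q via SHA-256."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % Q


def keygen() -> tuple[int, int]:
    """Return (trapdoor x, public key h = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1  # x in [1, Q-1], hence invertible mod Q
    return x, pow(G, x, P)


def chameleon_hash(h: int, message: bytes, r: int) -> int:
    """CH(m, r) = G^m * h^r mod P, anyone can compute this from h."""
    return (pow(G, to_zq(message), P) * pow(h, r, P)) % P


def find_collision(x: int, old: bytes, r: int, new: bytes) -> int:
    """With trapdoor x, find r' such that CH(new, r') == CH(old, r).

    Since CH(m, r) = G^(m + x*r), solve m + x*r = m' + x*r' (mod Q).
    """
    m_old, m_new = to_zq(old), to_zq(new)
    return (r + (m_old - m_new) * pow(x, -1, Q)) % Q


x, h = keygen()
r = secrets.randbelow(Q)
original = b"visit 2024-03-01: dose 10 mg"
corrected = b"visit 2024-03-01: dose 1.0 mg"
digest = chameleon_hash(h, original, r)
r_new = find_collision(x, original, r, corrected)
print(chameleon_hash(h, corrected, r_new) == digest)  # True
```

Without x, finding such an r' requires solving a discrete logarithm, which is why only the trapdoor holder can rewrite a block while the stored hash, and hence the chain linkage, stays unchanged.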
In parallel, the interpretability of AI models remains a central concern for the deployment of BCFL in clinical practice. Trust, transparency, and accountability are all closely tied to how well clinicians and patients can understand the rationale behind AI-generated predictions. Existing mainstream interpretability tools (eg, SHAP and LIME) have limitations in the medical field, including difficulties in handling complex distributed data environments and the inability to provide clinically relevant interpretations. Future research should focus on enhancing existing tools and developing interpretability methods adapted to BCFL, such as global-local model comparison techniques, to provide more intuitive and trustworthy interpretations.
Beyond technological and application-level innovation, the future development of BCFL must also confront the fact that most research remains at the simulation, prototype, and preclinical stages and has not yet been linked to real clinical end points. Future work should start from real scenarios: (1) conduct prospective validations with multi-institutional, heterogeneous data and complete clinical workflows to evaluate the actual impact on diagnostic efficiency, therapeutic outcomes, and resource use; (2) balance security, latency, and maintainability in system design to avoid unacceptable computing and communication costs from encryption and on-chain operations; and (3) at the governance level, strengthen links with the health care system, regulations, and ethical frameworks, and clarify data ownership, responsibility attribution, and auditing mechanisms. Only when technical performance is validated in real clinical settings, system design remains practically deployable, and governance mechanisms are clearly defined can BCFL progress from laboratory prototypes to clinically reliable infrastructure.
While the guidance above outlines strategic directions for policymakers, clinicians, and implementers, these recommendations remain high-level. Moving from broad vision to actionable progress requires specific evaluative structures that translate theory into measurable outcomes. A sound evaluation framework is therefore indispensable: it provides a foundation for future research that allows comparison, verification, and, ultimately, integration into clinical workflows.
A unified, multidimensional evaluation framework is central to the advancement of BCFL in medicine. From a technical perspective, standardized metrics such as model accuracy, convergence speed, robustness against adversarial attacks, communication latency, and scalability across heterogeneous institutional datasets are indispensable; these indicators establish the baseline scientific validity of BCFL. Equally important are clinical assessment criteria, which should capture diagnostic sensitivity and specificity, generalizability across patient populations and multicenter settings, reduction of algorithmic bias, and tangible improvements in patient outcomes. Embedding clinical end points in the assessment helps ensure that technological progress aligns with real-world health care needs. Finally, operational indicators should cover interoperability with existing medical data systems, cost-effectiveness, and the long-term sustainability of deployment. By integrating these three dimensions into a unified evaluation system, researchers and clinicians can establish benchmarks that allow both horizontal comparison across studies and longitudinal tracking of research progress. Such a framework not only enhances the scientific rigor of BCFL research but also provides an evidence base for decision-making, thereby accelerating the transition from experimental prototypes to clinical applications.
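As a reminder of how two of the clinical criteria are defined, sensitivity is TP/(TP + FN) and specificity is TN/(TN + FP). The helper below computes both from binary labels; the function name and the example labels are hypothetical and serve only as an illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = condition present).

    Returns NaN for a metric whose denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Illustrative labels from one hypothetical validation site.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.6666666666666666)
```

Reporting these values per participating institution, rather than only pooled, is one way to examine the multicenter generalizability that such a framework calls for.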
Conclusions
This review systematically compiles the research progress of blockchain and FL in the medical field. First, we introduce the theoretical foundations and core technical features of both technologies, analyzing how blockchain enhances the security, privacy protection, and decentralization characteristics of FL, while FL improves the computational efficiency and scalability of blockchain. In addition, we describe the three frameworks and workflows of BCFL. Next, we summarize the research progress in BCFL applications across cross-institutional health care data sharing, the IoMT, public health monitoring, and telemedicine, highlighting its practical value in privacy protection and data collaboration. Moreover, we discuss key challenges in BCFL, including computational efficiency, scalability, data privacy, and incentive mechanism design, while proposing potential solutions and future research directions. The significance of this review lies in providing a comprehensive overview of how BCFL could reshape medical data collaboration and security paradigms, thereby offering valuable insights for researchers and practitioners exploring this interdisciplinary field. However, it is important to acknowledge certain limitations of this review. This review primarily focuses on the theoretical principles and current applications of BCFL, with relatively limited exploration of specific implementation details and performance evaluations. Additionally, most existing BCFL studies rely on simulation experiments or public datasets, lacking validation with large-scale real-world medical data, which affects assessments of its practical feasibility. As such, although BCFL shows promise in supporting future intelligent diagnostics, precision medicine, and collaborative health care systems, claims about clinical readiness remain premature. 
Future work should focus on prospective, large-scale validation studies, interdisciplinary collaboration with health care providers, and the development of standardized evaluation protocols to ensure that BCFL solutions are clinically safe, ethically sound, and operationally feasible.
We sincerely thank the journal editor Javad Sarvestan, PhD, and the peer reviewers for their valuable feedback during this study, which has greatly enhanced our research. We used ChatGPT (OpenAI) to check and revise the vocabulary and grammar of this manuscript and polished the language in the introduction.
This research was funded by the National Natural Science Foundation of China (project no. 81974355 and 82172524), the Key Research and Development Program of Hubei Province (project no. 2021BEA161), the National Innovation Platform Cultivation Program (project no. 2020021105012440), and the Education Reform Project of Ningxia Medical University (project no. NYJY2025057).
All data generated or analyzed during this study are included in and .
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Sexual health is a fundamental aspect of overall well-being and quality of life [-]. Sexual dysfunctions, such as problems with sexual desire, arousal, orgasm, or pain, accompanied by clinically relevant distress, as defined by the ICD-11 (International Classification of Diseases, 11th Revision) [], were reported by 17.5% of women in a representative German study []. The biopsychosocial risk factors include relationship-related difficulties, poor mental health, chronic conditions, cultural factors, and lack of knowledge and experience [-].
Among individuals with chronic diseases, including gynecological conditions such as endometriosis, sexual health issues are particularly pronounced []. In a recent representative German study, 75.2% of those with chronic conditions reported problems in sexual function, and 19.3% met ICD-11 criteria for sexual dysfunction, with 2.56-fold higher odds compared to individuals without chronic conditions []. Among all chronic condition groups, women with gynecological conditions showed the highest prevalence of problems in sexual function (84.3%), and 13.5% of these women met criteria for sexual dysfunction with distress []. Chronic diseases frequently contribute to sexual dysfunction through physical, hormonal, psychological, and treatment-related factors [-], with downstream impacts on mental health, relationship satisfaction, and health care costs [-].
Endometriosis affects 6.8% of women worldwide [] and substantially impairs sexual health and quality of life [,]. The association between endometriosis and sexual dysfunction is well-documented, with many reporting sexual pain, decreased sexual satisfaction, and overall reduced sexual functioning [,]. Surgical interventions may further worsen outcomes []. Care access is hindered by shame and stigma, insufficient awareness, high costs, and gaps in provider training [,-].
Effective treatment for sexual dysfunction with distress requires a personalized, multimodal, interdisciplinary approach addressing the individual’s set of biopsychosocial etiological factors [,-]. Recommended strategies combine somatic interventions (eg, pelvic floor therapy and hormonal and medical treatments) [-] with sex and couples therapy (eg, sensate focus and communication training), educational components (eg, psychoeducation on anatomy and physiology) [,], and lifestyle-based strategies such as adapted physical activity [,]. Furthermore, evidence supports strengthening the mind-body integration through exercises on body perception, mindfulness, and reflective techniques [,-]. Underlying mental health and somatic conditions should always be addressed in interdisciplinary approaches, ideally by a multidisciplinary team trained in sexual medicine [,].
Despite high prevalence, sexual dysfunctions remain underreported and undertreated in Germany []. Persistent access barriers—including limited specialist availability, long waiting lists, and regional disparities—contrast with strong interest in digital interventions such as app- or web-based programs with exercises and educational content on sexual health [,,].
Digital self-help interventions can help overcome these barriers [], offering accessibility, anonymity, and cost-effectiveness [-]. Online treatments, especially cognitive-behavioral and mindfulness-based programs, have shown moderate to large effects on sexual function and satisfaction [,-]. In 2019, the German Digital Healthcare Act introduced a regulatory framework for the approval and reimbursement of software as a medical device, referred to as digital health applications (DiGAs) []. Currently, 2 sexual health-related DiGAs are approved and permanently listed in the registry of the German competent authority (Bundesinstitut für Arzneimittel und Medizinprodukte [BfArM]): HelloBetter Vaginismus for vaginismus [,] and Kranus Edera for erectile dysfunction []. Additional DiGAs targeting urogenital health include Endo App for endometriosis symptom management [,] and Kranus Lutera for lower urinary tract symptoms []. Negotiated 90-day prices averaged €221.09 (US $258.70) [], with full statutory reimbursement.
Patient-centered digital intervention development [] has demonstrated high user satisfaction [,] and positive outcomes across diverse populations, including cancer [-], mental health conditions such as depression and anxiety [-], and chronic pain [-]. Yet adherence and engagement remain challenging—especially in self-guided formats [-]—and effects on engagement are mixed [,-]. A major gap persists in digital interventions addressing sexual distress in women with gynecological conditions such as endometriosis, underscoring the need for tailored, patient-centered solutions []. Existing digital tools rarely address sexuality as a relational resource; insights from syndyastic sex therapy may offer a useful framework for future digital models [].
Objective
This study evaluated the pilot implementation of a self-guided smartphone app intervention (Odeya) in women with sexual distress and diagnosed or suspected endometriosis. The Odeya app was developed using evidence-based content for patients with sexual dysfunctions in conjunction with clinically relevant distress and was tailored, within a patient-centered framework, to the specific needs of women with endometriosis. The primary objectives were to assess (1) adherence, (2) acceptance, and (3) safety. The secondary objective was to exploratorily examine effects on sexual and overall health-related outcomes using a mixed methods design.
Methods
Trial Design
A preceding patient-centered, iterative development phase informed the intervention and functioned as a preuse acceptability assessment of burden, usability, relevance, and coherence (E Kosman, MSc, et al, unpublished data, 2026). Building on this foundation, this open-label, 2-arm pilot randomized controlled trial (July 2024 to July 2025) used an expansion-type mixed methods design, collecting quantitative and qualitative data to enrich and explain emerging findings (see [-,,,,,,,,-] for design rationale and prestudy procedures) []. Longitudinal self-reported data were collected in the intervention group (IG) and control group (CG) at baseline (T0) after randomization, peritreatment (T1; week 5 in the CG/after completing module 5 in the IG), posttreatment (T2; week 8 in the CG/after completing module 8 in the IG), and at 6-month follow-up (T3). The IG received Odeya app access; the CG was allowed to use existing treatment options within the health care system (treatment as usual) and was offered later access. The trial was preregistered (German Clinical Trials Register: DRKS00034351) [] and adhered to the Consolidated Standards of Reporting Trials of Electronic and Mobile Health Applications and Online TeleHealth checklist () [] and to the National Institutes of Health best practice guidelines for mixed methods research in the health sciences [].
Participants
Participants had to meet the following criteria to be eligible: (1) sufficient understanding of the German language, (2) being at least 18 years of age, (3) a physician-suspected or confirmed diagnosis of endometriosis or adenomyosis, (4) clinically relevant sexual distress (Female Sexual Distress Scale-Desire/Arousal/Orgasm [FSDS-DAO] >18), and (5) owning a smartphone (iOS or Android). The exclusion criteria were (1) current severe depression (Beck Depression Inventory-II ≥29) [], (2) current severe anxiety (Generalized Anxiety Disorder 7-item ≥15) [], (3) suicidal tendencies in the last 5 years, (4) current symptoms of posttraumatic stress disorder (PTSD), (5) substance dependence in the last 2 years, (6) current psychosis or dissociative symptoms, and (7) current pregnancy. If the online screening responses indicated the presence of potential PTSD or substance dependence, the participants were invited via email to take part in an additional telephone screening interview. The International Trauma Questionnaire [] was used to assess PTSD, and the relevant section of the Structured Clinical Interview for the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition) [] was used to assess substance use.
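The questionnaire cutoffs above can be expressed as a simple screening rule. The sketch below is hypothetical: the `Screening` field names are invented for illustration, and criteria that the study assessed through telephone interviews (PTSD, substance dependence) are reduced to plain booleans rather than scored instruments.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    german_sufficient: bool
    age: int
    endometriosis_suspected_or_confirmed: bool  # includes adenomyosis
    fsds_dao: int            # Female Sexual Distress Scale-Desire/Arousal/Orgasm
    owns_smartphone: bool    # iOS or Android
    bdi_ii: int              # Beck Depression Inventory-II
    gad_7: int               # Generalized Anxiety Disorder 7-item
    suicidality_last_5y: bool
    current_ptsd: bool
    substance_dependence_last_2y: bool
    psychosis_or_dissociation: bool
    pregnant: bool


def is_eligible(s: Screening) -> bool:
    """All inclusion criteria met and no exclusion criterion present."""
    inclusion = (
        s.german_sufficient
        and s.age >= 18
        and s.endometriosis_suspected_or_confirmed
        and s.fsds_dao > 18          # clinically relevant sexual distress
        and s.owns_smartphone
    )
    exclusion = (
        s.bdi_ii >= 29               # current severe depression
        or s.gad_7 >= 15             # current severe anxiety
        or s.suicidality_last_5y
        or s.current_ptsd
        or s.substance_dependence_last_2y
        or s.psychosis_or_dissociation
        or s.pregnant
    )
    return inclusion and not exclusion


candidate = Screening(True, 29, True, 24, True, 12, 6,
                      False, False, False, False, False)
print(is_eligible(candidate))  # True
```

Note the asymmetric operators: the distress criterion requires a score strictly above 18, whereas the depression and anxiety exclusions apply at or above their cutoffs.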
Intervention
The Odeya intervention was a self-guided smartphone app intervention developed to address sexual distress in women with endometriosis, for both single and partnered users.
It was developed at the Institute of Sexology and Sexual Medicine, Charité–Universitätsmedizin Berlin, within the Berlin Institute of Health Digital Health Accelerator program, in collaboration with Hybrid Heroes GmbH, using a patient-centered, iterative process (E Kosman, MSc, et al, unpublished data, 2026). The intervention comprises 8 self-guided modules and a symptom-tracking tool, intended for completion over 12 weeks with a 4-week buffer. Modules unlock weekly, each taking ~60 minutes plus 15-60 minutes of exercises. Users can pause and resume at any time. See and for a list of module topics and examples of user interface screens. Delivery was multimodal (text, video, audio, graphics, or interactive tasks) with exercises such as pelvic floor training, guided body-based activities, mindfulness, and sensate focus for solo or partnered practice. Participants in relationships were encouraged to involve their partners, while equivalent solo options were provided. The symptom-tracking tool monitors 4 domains (body, mind, social, and sexuality) using 1-10 ratings for pain, stress, self-care, and sexual satisfaction, plus standardized and personalized symptoms. See the Methods section in for further details on development, personas, technical notes, and symptom tracking.
Table 1. Overview of the Odeya intervention modules.
| Module | Title | Purpose and key topics | Relational focus |
|---|---|---|---|
| 1 | You are not alone | Introduces sexual dysfunction and its interplay with endometriosis; fosters health literacy and support. | Normalizes relational strain within the biopsychosocial framework of sexual distress. |
| 2 | My body | Builds body awareness and sexual anatomy knowledge; covers stimulation techniques for solo and partnered activity. | Encourages exploration of bodily responses during solo and partnered touch. |
| 3 | My pain is real | Explains pain mechanisms and central sensitization; provides strategies for sexual pain management. | Promotes communication about pain, joint coping, and coregulation during intimacy. |
| 4 | My sexual self | Explores personal needs, preferences, boundaries, and the role of fantasies in sexual agency. | Encourages sharing preferences and boundaries in communication with partners. |
| 5 | My emotional network | Addresses dysfunctional beliefs using the Fear-Avoidance Model and cognitive restructuring. | Reflects on how emotional and cognitive patterns affect relationship dynamics and intimacy. |
| 6 | My sexual communication | Enhances communication skills and intimacy through couples’ exercises (eg, sensate focus). | The full module has a relationship focus. Emphasizes sexuality as relational communication and includes guided communication strategies and sensate focus exercises for couples. |
| 7 | My sexual response | Explains the sexual response cycle in a partnered context, contextual influences, and effects of stress on arousal. | The full module has a relationship focus. Addresses dyadic factors influencing arousal, responsiveness, and shared satisfaction. |
| 8 | My resources | Summarizes learnings; develops personalized routines and booster strategies for sustained sexual health. | Encourages maintaining intimacy and open communication with partners. |
Figure 1. Interface of the Odeya intervention. Five smartphone screens are shown from left to right: (A) symptom-tracking overview displaying symptom levels (upper section) and present symptoms for a single entry (lower section); (B) overview of intervention modules including modules 1 and 2; (C) example of module content with graphics and an audio exercise; (D) example of psychoeducational content presented as text; and (E) example of psychoeducational content presented as video.
Procedure
Recruitment took place from May to October 2024. Participants were recruited online through study announcements posted by endometriosis associations on their websites and Instagram accounts and in endometriosis-related Facebook groups in Germany, Austria, and Switzerland. In-person recruitment took place via the Department of Gynecology at Charité–Universitätsmedizin Berlin and through a flyer campaign in outpatient gynecological practices across Germany (see Methods section in for selection procedures). Interested individuals emailed the study team, provided medical documentation of suspected or confirmed endometriosis or adenomyosis, and received study information. To protect participant identity, a pseudonym and a masked email address were created for each participant [] using AnonAddy. For screening and follow-up, participants received via email a link to the data management platform REDCap (Research Electronic Data Capture; Vanderbilt University) [,], hosted at Charité–Universitätsmedizin Berlin. Participants were invited to a telephone interview with a clinical psychologist if further clarification was required. After randomization, the IG received access to the Odeya app. Qualitative interviews were offered to three groups: (1) IG dropouts, (2) IG completers, and (3) CG completers.
Randomization
Balanced block randomization with 4 blocks was generated in R software (version 4.5.1; R Foundation for Statistical Computing) and implemented in REDCap (version 15.5.29) by the research team to ensure a 1:1 allocation ratio between the IG and CG. Randomization was stratified by relationship status (single vs in a relationship) and age (<30 vs ≥30 years). Participants were informed of their allocation.
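Stratified permuted-block randomization keeps the IG:CG ratio at 1:1 within each stratum. The Python sketch below is not the R script used in the trial: the class and method names are hypothetical, and the block size of 4 is an assumption made for illustration, since the text states only that 4 blocks were used. Each combination of relationship status and age group receives its own allocation sequence.

```python
import random
from collections import defaultdict
from typing import Optional

ARMS = ("IG", "CG")


class StratifiedBlockRandomizer:
    """Permuted-block randomization with a separate sequence per stratum."""

    def __init__(self, block_size: int = 4, seed: Optional[int] = None):
        if block_size % len(ARMS) != 0:
            raise ValueError("block size must be a multiple of the number of arms")
        self.block_size = block_size
        self.rng = random.Random(seed)
        # stratum -> allocations still unused in that stratum's current block
        self.pending = defaultdict(list)

    def _new_block(self) -> list:
        block = list(ARMS) * (self.block_size // len(ARMS))  # equal IG/CG slots
        self.rng.shuffle(block)
        return block

    def assign(self, in_relationship: bool, age: int) -> str:
        stratum = (in_relationship, age >= 30)  # relationship status x age (<30 vs >=30)
        if not self.pending[stratum]:
            self.pending[stratum] = self._new_block()
        return self.pending[stratum].pop()


randomizer = StratifiedBlockRandomizer(seed=2024)
arms = [randomizer.assign(in_relationship=True, age=32) for _ in range(8)]
print(arms.count("IG"), arms.count("CG"))  # 4 4
```

Because every completed block contains equal numbers of IG and CG slots, the imbalance within any stratum can never exceed half a block, which matters when a pilot has only about 60 participants spread over 4 strata.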
Quantitative Measures
This study assessed (1) adherence through module completion, app use duration, dropout status (IG: >4 weeks inactivity with module-locked assessments; CG: nonresponse to scheduled questionnaires), and symptom tracking activity; (2) acceptability, measured with the Client Satisfaction Questionnaire-Internet (CSQ-I), the German mHealth App Usability Questionnaire (G-MAUQ), and a single Visual Analog Scale for Client Satisfaction item on overall satisfaction; (3) safety, evaluated with the Inventory for the balanced assessment of Negative Effects of Psychotherapy-Online Intervention (INEP-ON) and self-reported changes in health status; and (4) sexual and health-related outcomes, including the FSDS-DAO, Female Sexual Function Index-German version (FSFI-d), Fear of Sexuality Questionnaire (FSQ), Vaginal Penetration Cognition Questionnaire (VPCQ), Central Sensitization Inventory-German version (CSI-GE), Partnership Questionnaire (PFB), and the Patient-Reported Outcome Measurement Information System-29-Item Profile (PROMIS-29). Detailed descriptions of assessment time points, selected measures, and additional questionnaires implemented in the broader study framework are provided in (Methods section and Table S1 in ).
Statistical Analysis
Descriptive statistics (means, SDs, and medians with IQRs) were computed for all sexual and health-related continuous variables at baseline (T0), T1, T2, and 6-month follow-up (T3). Change scores (Δ = value_t − value_0) were analyzed with linear models for continuous outcomes. Models included group, time, and their interaction and were adjusted for the respective baseline value. Adjusted mean changes and 95% CIs were obtained via the emmeans package in R software (version 4.5.1) and visualized. Adjusted between-group differences (IG−CG) are presented as mean differences with 95% CIs; given the exploratory design, P values are not reported. Standardized effect sizes (adjusted Cohen d; small: ≥0.2; medium: ≥0.5; large: ≥0.8) [] were computed for the comparison of change to baseline means. For descriptive reporting, pooled SDs of the individual change scores per time point were added. Because all questionnaires used mandatory fields, no item-level missing data occurred. Missingness was limited to uncompleted measurement time points and was handled by available-case analysis without imputation. A predefined target sample size of n=60 was based on feasibility considerations within the mixed methods pilot design.
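At a single follow-up time point, the baseline-adjusted comparison reduces to the ordinary least squares model Δ = β0 + β1·group + β2·baseline, where β1 is the adjusted IG−CG difference in change. The numpy sketch below illustrates only this reduced model: it omits the time term and the group × time interaction of the full analysis, does not reproduce emmeans, and uses hypothetical function names and made-up data. Dividing by the pooled SD of the change scores to obtain a standardized d is an assumption; the exact standardizer used for the adjusted Cohen d is not detailed in the text.

```python
import numpy as np


def adjusted_group_difference(delta, group, baseline):
    """OLS of change score on group (1 = IG, 0 = CG) and baseline value.

    Returns (estimate, standard_error) for the group coefficient, ie, the
    baseline-adjusted IG - CG mean difference in change at one time point.
    """
    delta = np.asarray(delta, dtype=float)
    X = np.column_stack([
        np.ones_like(delta),              # intercept
        np.asarray(group, dtype=float),   # treatment indicator
        np.asarray(baseline, dtype=float),  # baseline covariate
    ])
    beta, *_ = np.linalg.lstsq(X, delta, rcond=None)
    resid = delta - X @ beta
    dof = len(delta) - X.shape[1]
    sigma2 = resid @ resid / dof          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # classical OLS covariance matrix
    return float(beta[1]), float(np.sqrt(cov[1, 1]))


def pooled_sd(a, b):
    """Pooled SD of two samples (eg, individual change scores per group)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    return float(np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                         / (na + nb - 2)))


# Made-up example data for one time point.
group = [1, 1, 1, 0, 0, 0]
baseline = [20.0, 25.0, 30.0, 22.0, 27.0, 33.0]
delta = [-4.0, -1.0, 3.0, -5.0, 1.0, 2.0]
est, se = adjusted_group_difference(delta, group, baseline)
ig_change = [d for d, g in zip(delta, group) if g == 1]
cg_change = [d for d, g in zip(delta, group) if g == 0]
d_adj = est / pooled_sd(ig_change, cg_change)  # assumed standardizer
print(round(est, 3), round(se, 3), round(d_adj, 3))
```

A 95% CI would follow as est ± t(0.975, n − 3) · se; that quantile is left out here to keep the sketch free of dependencies beyond numpy.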
Qualitative Interviews
Overview
The qualitative substudy included 16 participants sampled from IG dropouts (n=11; mean age 29.36, SD 3.91 years), IG completers (n=3; mean age 37.5, SD 13.44 years), and CG completers (n=2; mean age 33.67, SD 5.69 years; see Table S2 in for participant characteristics). While all IG participants were invited, CG completers were recruited stepwise using criterion sampling to ensure sociodemographic diversity (age and relationship status) due to feasibility constraints. Given the small, uneven subgroup sizes, the qualitative data were used to capture a breadth of perspectives and to identify barriers and facilitators relevant to feasibility and implementation. Interviews and analysis proceeded iteratively; recruitment ended when no additional overarching feasibility-relevant issues were identified in the final interviews (further details in Methods section in ). Three distinct semistructured interview guides (), one for each participant group, were developed to explore usability, engagement barriers, perceived helpfulness, and contextual factors.
Qualitative Data Analysis
IG dropout data were analyzed using a free-listing approach, a qualitative elicitation method that generates structured and quantifiable data by asking participants to spontaneously enumerate all responses relevant to a prompt. This method captures how individuals naturally conceptualize and prioritize health-related experiences, making it particularly useful for identifying perceived barriers, needs, and salient usability issues in participants’ own language [,]. Lists were subsequently cleaned, consolidated, and organized into categories through iterative coding and consensus discussions []. Thematic salience was derived from the frequency with which items were mentioned, following established guidelines []. Interviews with IG and CG completers were transcribed using f4transkript [] and analyzed with qualitative content analysis following Schreier’s toolbox model []. Coding was conducted inductively: an initial coding frame was developed from the data through pilot coding, iteratively refined as further transcripts were coded, and subsequently applied to the full dataset in MAXQDA (version 24.3; VERBI Software GmbH). Analytical rigor was supported through memo-writing and regular peer debriefing within the research team. Further details are provided in (Methods).
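Free-list analyses commonly summarize each item by the share of participants who mention it; Smith's salience index (S) additionally weights items by how early they appear in each list. The sketch below computes both from cleaned lists. The text above states only that frequency was examined, so Smith's S is included as a common complement, and the function name and example lists are hypothetical.

```python
from collections import defaultdict


def free_list_salience(lists):
    """Return {item: (frequency, smiths_s)} for participants' free lists.

    frequency = share of participants mentioning the item;
    Smith's S = mean over participants of (L - rank + 1) / L, counting 0 when
    the item is absent (L = length of that participant's list).
    """
    n = len(lists)
    mentions = defaultdict(int)
    s_sum = defaultdict(float)
    for raw in lists:
        items = list(dict.fromkeys(raw))  # drop repeats, keep first mention
        length = len(items)
        for rank, item in enumerate(items, start=1):
            mentions[item] += 1
            s_sum[item] += (length - rank + 1) / length
    return {item: (mentions[item] / n, s_sum[item] / n) for item in mentions}


# Hypothetical, already-cleaned barrier lists from three participants.
barrier_lists = [
    ["time constraints", "technical problems"],
    ["time constraints"],
    ["unit length", "time constraints", "technical problems"],
]
result = free_list_salience(barrier_lists)
for item, (freq, s) in sorted(result.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{item}: frequency={freq:.2f}, Smith's S={s:.2f}")
```

Items that are both frequent and mentioned early rank highest on S, which helps separate a widely shared, top-of-mind barrier from one that most participants add only as an afterthought.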
Triangulation of Qualitative and Quantitative Data
In line with the expansion-type mixed methods design, quantitative and qualitative data were analyzed separately by researchers with complementary methodological expertise. Integration occurred at the interpretation stage through triangulation, whereby findings from both data strands were systematically compared and synthesized in relation to study end points through iterative discussion within the research team. Integrated findings are presented in a joint display table.
Ethical Considerations
The human participant study was approved by the Ethics Committee of Charité–Universitätsmedizin Berlin (EA4/217/23). Participants received no compensation and could contact the research team for technical support during the study period. Data were collected and managed in REDCap hosted at Charité. Written informed consent was obtained via postal mail.
Results
Characteristics of Study Population
Between July and November 2024, 187 individuals were screened. Of those, 132 completed the screening process, and 72 were excluded due to eligibility screening (). After randomization, 29 women were assigned to the IG, and 31 were assigned to the CG. The mean age of the 60 included participants was 31.12 (SD 6.67) years, with ages ranging from 21 to 59 years (). Most participants (51/60, 85%) were in relationships and reported high relationship satisfaction (median 8.0, IQR 7.0-9.0). The overall level of education was high, with 81.7% (49/60) having 12 or more years of education. Only 1.7% (1/60) had previously accessed sex and couples therapy. However, half of the participants reported previous experience with psychotherapy. For information on motivation, participation expectations, and FSDS-DAO scores before and after randomization, see the Methods section in .
Figure 2. Flowchart of participants. CG: control group; FSDS-DAO: Female Sexual Distress Scale-Desire/Arousal/Orgasm; IG: intervention group.
Table 2. Characteristics of the study population at baseline (T0) stratified by intervention group (IG) and control group (CG). Percentages refer to column totals.
| Characteristic | Total (N=60) | IG (n=29) | CG (n=31) |
|---|---|---|---|
| Age (years), mean (SD) | 31.12 (6.67) | 31.1 (5.29) | 31.13 (7.83) |
| In a relationship, n (%) | 51 (85) | 26 (89.7) | 25 (80.6) |
| Relationship durationᵃ, mean (SD) | 75.80 (67.64) | 69.52 (56.86) | 82.33 (78.01) |
| Relationship satisfaction (0-10)ᵇ, median (IQR) | 8.0 (7.0-9.0) | 8.0 (7.0-9.0) | 8.0 (7.0-9.0) |
| Education ≥12 years, n (%) | 49 (81.7) | 23 (79.3) | 26 (83.9) |
| Urban residence, n (%) | 33 (55) | 16 (55.2) | 17 (54.8) |
| Heterosexual, n (%) | 51 (85) | 23 (79.3) | 28 (90.3) |
| Partnered intimacyᶜ,ᵈ, n (%) | 47 (78.3) | 24 (82.8) | 23 (74.2) |
| Masturbationᶜ, n (%) | 20 (33.3) | 7 (24.1) | 13 (41.9) |
| Religious, n (%) | 23 (38.3) | 10 (34.5) | 13 (41.9) |
| Histologically confirmed endometriosis, n (%) | 52 (86.7) | 26 (89.7) | 26 (83.9) |
| Previous operation, n (%) | 53 (88.3) | 26 (89.7) | 27 (87.1) |
| Hormonal medication, n (%) | | | |
| Progestin-only contraceptive pill | 17 (28.3) | 10 (34.5) | 7 (22.6) |
| Combined oral contraceptive pill | 6 (10) | 2 (6.9) | 4 (12.9) |
| Sex therapy or couples therapy, n (%) | 1 (1.7) | 0 (0) | 1 (3.2) |
| Psychotherapy, n (%) | 31 (51.7) | 17 (58.6) | 14 (45.2) |
| Other diagnoses, n (%) | | | |
| PCOSᵉ | 0 (0) | 0 (0) | 0 (0) |
| PMSᶠ | 13 (21.7) | 8 (27.6) | 5 (16.1) |
| Uterus myomatosus | 3 (5) | 3 (10.3) | 0 (0) |
| Uterus prolapse | 1 (1.7) | 0 (0) | 1 (3.2) |
| Incontinence | 3 (5) | 1 (3.4) | 2 (6.5) |
| Infertility | 1 (1.7) | 1 (3.4) | 0 (0) |
| Vulvodynia | 1 (1.7) | 0 (0) | 1 (3.2) |
| Lichen sclerosus | 0 (0) | 0 (0) | 0 (0) |
| Cancer | 2 (3.3) | 2 (6.9) | 0 (0) |
| Lifestyle, n (%) | | | |
| Physical activity | 41 (68.3) | 22 (75.9) | 19 (61.3) |
| Healthy diet | 53 (88.3) | 25 (86.2) | 28 (90.3) |
| Smoking | 5 (8.3) | 2 (6.9) | 3 (9.7) |
| Alcohol consumption | 8 (13.3) | 3 (10.3) | 5 (16.1) |
| BDI-IIᵍ (0-63), mean (SD) | 13.3 (7.07) | 13.45 (6.99) | 13.16 (7.26) |
| GAD-7ʰ (0-21), mean (SD) | 5.63 (3.4) | 6.55 (3.72) | 4.77 (2.88) |
| SSP-Fⁱ, n (%) | 54 (90) | 25 (86.2) | 29 (93.5) |
| BSI GSIʲ, mean (SD) | 54.07 (11.88) | 54.86 (12.74) | 53.32 (11.18) |
| BSI PSDIᵏ, mean (SD) | 53.53 (10.14) | 54.52 (11.33) | 52.61 (8.98) |
| BSI PSTˡ, mean (SD) | 55.05 (12.73) | 55.45 (12.96) | 54.68 (12.72) |
| CTQᵐ: Sexual Abuseⁿ, n (%) | 7 (11.7) | 3 (10.3) | 4 (12.9) |
| CTQ: Any Traumaᵒ, n (%) | 22 (36.7) | 9 (31) | 13 (41.9) |
ᵃIn months.
ᵇHigher values indicate better outcomes.
ᶜMore than once per week.
ᵈPartnered intimacy referred to activities such as cuddling and kissing.
ᵉPCOS: polycystic ovary syndrome.
ᶠPMS: premenstrual syndrome.
ᵍBDI-II: Beck Depression Inventory-II.
ʰGAD-7: Generalized Anxiety Disorder Scale-7.
ⁱSSP-F: Screening for Sexual Problems.
ʲBSI GSI: Brief Symptom Inventory Global Severity Index.
ᵏBSI PSDI: Brief Symptom Inventory Positive Symptom Distress Index.
ˡBSI PST: Brief Symptom Inventory Positive Symptom Total.
ᵐCTQ: Childhood Trauma Questionnaire.
ⁿSexual trauma was defined using the Childhood Trauma Questionnaire Sexual Abuse Subscale, with a cutoff score of 8 [].
ᵒAny trauma was defined as meeting the Childhood Trauma Questionnaire cutoff for at least moderate severity [].
Main Findings
Overview
The following section reports adherence and user behavior, including module completion, dropout rates, time spent in the app, reasons for discontinuation, and use of the symptom tracker, as well as acceptance and user satisfaction assessed with validated questionnaires. Qualitative facilitators of and barriers to adherence and acceptance follow in subsequent sections. Safety outcomes comprised the balanced assessment of negative effects (INEP-ON), self-reported health changes, and stressful life events.
Adherence and User Behavior
Dropout Rates, Time, and Reasons
Participants in the IG completed a median of 6 (IQR 2-8) modules (mean 4.9, SD 2.97; 4.9/8, 61.2% of the total content). All 8 modules were completed by 34.5% (10/29) of the IG, with a dropout rate of 65.5% (19/29), which is higher than in the CG (7/31, 22.6%). Mean app usage duration, defined as the time from first log-in until either completion of module 8 or the last recorded in-app activity (therapeutic content or symptom tracking), was 15 weeks (range 0-30) in the IG. Among completers, the mean duration was 18 (range 10-30) weeks, whereas noncompleters used the app for an average of 13 (range 0-30) weeks. IG dropouts were distributed throughout the course of the intervention: before starting the app (n=3), after module 1 (n=3), module 2 (n=2), module 3 (n=1), module 4 (n=4), module 5 (n=1), module 6 (n=4), and module 7 (n=1). Baseline characteristics (Table S3) and baseline values of outcomes (Table S4) of IG completers and IG dropouts are presented in . Health care behaviors showed little change; only isolated therapy initiations occurred (Results section in ). Reasons for dropout included time constraints, technical difficulties, life changes, and perceived length of app units (Tables S5-S7 in ).
Adherence Facilitators and Barriers
High initial motivation and the persona-based progress stories supported adherence:
You also got to witness the progress of the three women, and that was motivating and encouraging. [IG1]
Several participants felt acknowledged by the intervention, contrasting it with previous clinical encounters where their concerns had not been taken seriously.
Participants identified unexpectedly high time demands and emotional strain of prolonged self-reflection as key barriers:
It kept getting longer and more extensive…. I think it was supposed to be 50 minutes per module…, but I sometimes needed almost three times as long…. It was really exhausting to constantly engage with myself. [IG1]
Some wished for more interaction with health care professionals or peers and reported difficulties applying insights to daily life and relationships.
When it comes to communication in the relationship that was also a very big topic in the app and I’ve always struggled with that. There were helpful impulses, but I couldn’t really implement them at that point. [IG2]
Technical issues—videos stopping without replay options, restrictive response formats, and static text fields limiting review of longer entries—further affected usability and adherence.
Use of Symptom Tracker: Frequency and Evaluation
Symptom tracking usage varied widely across participants. The average number of tracking entries during intervention was 18.34 (SD 23.77; range 0-99), with completers averaging 23.20 (SD 34.03; range 4-99) and dropouts 15.79 (SD 16.73; range 0-46). The most frequently self-selected tracked symptoms included, for emotional states: feeling relaxed, sad, or tired; for physical symptoms: bowel problems, intake of pain medication or hormones, and bladder problems; for social aspects: participation in social activities, sports, or work-related stress; and for sexuality: no sexual activity, masturbation, or sexual intercourse. shows the mean tracked levels of sexual satisfaction, self-care, pain, and stress.
Qualitative feedback on tracking reminders was mixed; some participants perceived them as redundant or burdensome, while one reported them as helpful.
Figure 3. Rating of intensity levels of sexual satisfaction, self-care, pain, and stress in the IG, stratified for completers (n=10) and dropouts (n=15), showing median rating averaged across the usage weeks.
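The overall mean number of tracking entries reported above is an n-weighted average of the completer and dropout means. A minimal Python sketch, assuming the dropout subgroup comprises all 19 IG dropouts from the adherence results (Figure 3, by contrast, shows rating data for 15 dropouts), reproduces the reported overall mean of 18.34 within rounding of the subgroup means:

```python
# Overall mean tracking entries as an n-weighted average of subgroup means.
# Assumption: the dropout subgroup has n = 19 (all IG dropouts), not the
# n = 15 dropouts with rating data shown in Figure 3.
SUBGROUPS: dict[str, tuple[int, float]] = {
    "completers": (10, 23.20),  # (n, mean number of tracking entries)
    "dropouts": (19, 15.79),
}


def pooled_mean(groups: dict[str, tuple[int, float]]) -> float:
    """Return sum(n_i * mean_i) / sum(n_i) over all subgroups."""
    total_n = sum(n for n, _ in groups.values())
    weighted_sum = sum(n * mean for n, mean in groups.values())
    return weighted_sum / total_n


overall = pooled_mean(SUBGROUPS)

if __name__ == "__main__":
    print(f"Overall mean entries: {overall:.2f}")
```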
Acceptance
User Satisfaction
At midintervention, the median satisfaction score was 8.00 (IQR 7.0-8.0; mean 7.53, SD 1.36; n=15), and at postintervention (T2), 7.50 (IQR 6.0-8.0; mean 7.00, SD 1.83; n=10) on a 10-point scale (Visual Analog Scale for Client Satisfaction). At T2, CSQ-I ratings had a mean of 26.60 (SD 4.12) of 32. The CSQ-I sum of items assessing general satisfaction (items 1, 2, 3, 5, and 7) averaged 16.30 (SD 2.45), item 4 assessing recommendation 3.50 (SD 0.71), item 6 assessing helpfulness 3.10 (SD 0.88), and item 8 assessing likelihood of reuse 3.70 (SD 0.67). Based on these ratings, 80% (8/10) of participants agreed or strongly agreed with CSQ-I total items, 90% (9/10) reported that they would recommend or reuse the intervention, and 70% (7/10) reported that the intervention was helpful and fulfilled their general satisfaction. Usability, assessed with the G-MAUQ, showed a total score of 5.38 (SD 0.74), with subscales of ease of use (mean 6.46, SD 0.64), interface satisfaction (mean 5.28, SD 0.78), and usefulness (mean 4.46, SD 1.69). For detailed ratings, including median, minimum, and maximum values, see Table S8 in .
These findings were closely mirrored in participant interviews, where usability, app design, and comprehensive content were highlighted as facilitating factors.
First of all, in terms of the technical aspects and structure, I found the app very clear and easy to use. [IG1]
Qualitatively, some users expressed discomfort or ambivalence about specific exercises, such as guided masturbation or vulva self-exploration.
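Because the mean of a sum equals the sum of the means, the CSQ-I component means reported above should add up to the total mean of 26.60. The sketch below checks this, assuming standard CSQ scoring in which each of the 8 items is rated from 1 to 4 (consistent with the maximum of 32 stated in the text); the item grouping follows the text, and the names are illustrative.

```python
# CSQ-I item grouping as described in the text; each item is assumed to be
# scored 1-4, so the 8-item total ranges from 8 to 32.
ITEM_GROUPS: dict[str, tuple[int, ...]] = {
    "general_satisfaction": (1, 2, 3, 5, 7),
    "recommendation": (4,),
    "helpfulness": (6,),
    "reuse": (8,),
}

# Mean scores at T2 (n=10) as reported in the text.
REPORTED_MEANS: dict[str, float] = {
    "general_satisfaction": 16.30,
    "recommendation": 3.50,
    "helpfulness": 3.10,
    "reuse": 3.70,
}
REPORTED_TOTAL_MEAN = 26.60
ITEM_MIN, ITEM_MAX = 1, 4


def score_range(items: tuple[int, ...]) -> tuple[int, int]:
    """Possible (min, max) sum score for a group of items scored 1-4."""
    return len(items) * ITEM_MIN, len(items) * ITEM_MAX


# Each of the 8 items belongs to exactly one group.
all_items = sorted(i for items in ITEM_GROUPS.values() for i in items)
assert all_items == list(range(1, 9))
TOTAL_MAX = len(all_items) * ITEM_MAX

# Linearity of the mean: component means must sum to the total mean.
total_from_parts = sum(REPORTED_MEANS.values())

if __name__ == "__main__":
    for name, items in ITEM_GROUPS.items():
        print(name, score_range(items), REPORTED_MEANS[name])
    print(f"Sum of component means: {total_from_parts:.2f} of {TOTAL_MAX}")
```

The general satisfaction subscale thus ranges from 5 to 20, which places its mean of 16.30 in context alongside the single-item means on a 1-4 scale.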
Facilitators and Barriers to Acceptance
Multimedia elements and the option to proceed at one’s own pace were highly valued. Participants appreciated the intervention’s diverse components, including normalization through personas (“not feeling alone”), psychoeducation, practical and communication exercises, structured reflection, partners’ involvement, and gradual exposure to sensitive topics, especially the sensate focus exercises.
However, several barriers were identified. Technical issues and content overload reduced satisfaction for some participants, which might explain lower satisfaction scores or disengagement in some cases. Some participants found the weekly exercises challenging, leading to self-doubt and frustration, although these feelings were partly mitigated by greater awareness of social influences, as one participant described:
Sometimes I felt very frustrated, doubting if it was my fault…. I even felt guilty after reading or doing certain exercises…. However, I was able to balance this by reflecting on what I’d learned from my parents’ attitudes toward sexuality…phrases from friends or relationships. These insights gave me “aha” moments, helping me feel less desperate and understand that other factors might be involved. [IG3]
Safety
Balanced Effects
The most frequently endorsed positive effects in the INEP-ON outcome were the helpfulness of new ways of thinking (T2: 10/10, 100%; T3: 6/7, 85.7%) and moderator support (T2: 9/10, 90%; T3: 4/6, 57.1%; Table S9 in ). Increased motivation for psychotherapy was reported by 60% (6/10) at T2 and 42.9% (3/7) at T3. Improved overall well-being was noted by 90% (9/10) postintervention, but only 28.6% (2/7) at follow-up, while 57.1% (4/7) reported deterioration. Negative responses on balanced items were otherwise rare (≤10% at T2 and <30% at T3; see Results section in ). Items addressing exclusively negative effects (Table S10 in ) showed longer periods of not feeling well in 50% (5/10) at T2 and 71.4% (5/7) at T3 (see Results section in for details).
Self-Reported Health Changes and Stressful Life Events
A total of 6 women in the IG and 8 in the CG reported health changes or stressful life events, including new medical diagnoses (eg, cardiac arrhythmia, asthma, adenomyosis, suspected lipedema, and migraine), hospitalizations, bereavement, or relationship breakups. In the IG, only 1 of these 6 participants dropped out; the others completed the study.
Secondary Outcomes
Overview
Secondary outcomes included exploratory analyses of changes in sexual health (FSDS-DAO, FSFI-d, FSQ, VPCQ, CSI-GE), relationship (PFB), and overall health (PROMIS-29). Table S11 in reports mean scores and Table S12 baseline-adjusted changes with effect sizes for all outcomes. illustrates sexual health outcomes based on total scores.
Figure 4. Sexual health outcomes over time as changes from baseline (Δ) for intervention group (IG) and control group (CG) across T0-T3. Points and lines depict estimated marginal means of the change scores; error bars indicate 95% CIs. Above the x-axis, the between-group effect size (d) from the linear model (Δ ~ Group × Time + Baseline) is shown for each time point. Shaded backgrounds denote study phases (on treatment: T0-T2; follow-up: T3). Panels A-D correspond to outcomes. CSI-GE: Central Sensitization Inventory-German version; FSDS-DAO: Female Sexual Distress Scale-Desire/Arousal/Orgasm; FSFI-d: Female Sexual Function Index-German version; PFB: Partnership Questionnaire; T0: baseline; T1: after module 5/5 weeks after baseline; T2: after module 8/8 weeks after baseline; T3: 6-month follow-up after baseline.
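The caption states the model in formula notation (Δ ~ Group × Time + Baseline). One way to write it out is shown below, assuming that Time is treated as a categorical factor with T1 as the reference level, that G_i = 1 for the IG and 0 for the CG, and that D_{k,t} = 1 when t = k; Y_{it} is participant i's outcome score at time t and Y_{i0} the baseline (T0) score. The coding, the handling of repeated measures, and the standardizer used for d are not stated here, so this is an illustrative expansion rather than the exact specification.

```latex
\begin{aligned}
\Delta_{it} &= Y_{it} - Y_{i0}, \qquad t \in \{T1, T2, T3\} \\
\Delta_{it} &= \beta_0 + \beta_1 G_i + \gamma_{T2} D_{T2,t} + \gamma_{T3} D_{T3,t} \\
  &\quad + \delta_{T2}\, G_i D_{T2,t} + \delta_{T3}\, G_i D_{T3,t} + \beta_2 Y_{i0} + \varepsilon_{it} \\
\widehat{\Delta}^{\mathrm{IG}}_{t} - \widehat{\Delta}^{\mathrm{CG}}_{t} &= \beta_1 + \delta_t, \qquad \delta_{T1} \equiv 0
\end{aligned}
```

Under this assumed parameterization, the between-group difference in estimated marginal means at a common baseline value is β1 at T1 and β1 + δ_t at T2 and T3; for example, the T1 contrast for sexual distress reported below (IG Δ=–10.39 minus CG Δ=–3.68, giving –6.71) would correspond to β1.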
Sexual Health and Relationship-Related Outcomes
The IG showed a stronger early reduction in sexual distress (FSDS-DAO) (T1: IG Δ=–10.39 vs CG Δ=–3.68; between groups Δ=–6.71, 95% CI –13.13 to –0.29; d=–0.66) while both groups improved by T2 ( and Table S12 in ). At T3 the IG again showed a stronger reduction (T3: Δ=–8.05, 95% CI –15.89 to –0.22; d=–0.79).
Sexual function (FSFI-d) improved continuously in the IG (T1-T3 Δ=1.29-3.63), with the largest between-group difference at T3 (Δ=6.51, 95% CI 1.48-11.55; d=1.00).
Fear of coitus (FSQ) showed small between-group effects, with the IG showing greater improvement at T1 (d=−0.21) and T3 (d=−0.14), but the CG at T2 (d=0.17). Fear of noncoital activity decreased continuously in the IG by about 0.6 points at T2 (scale ranging from 5-25), with small to large effects between groups (d=−0.14 to −0.84; Table S10 in ).
Regarding vaginal penetration cognition (VPCQ), results were mixed, with small to moderate improvements in the IG, primarily at T1 and T3 (eg, between-group effects: at T3, control cognitions d=0.48; catastrophic and pain cognitions d=−0.17). The only scale showing continuous improvement favoring the IG was incompatibility cognitions, with between-group effect sizes ranging from d=−0.34 to −0.89.
Central sensitization (CSI-GE) improved in both groups. Small effects favored the IG at T1 (d=−0.15) and T3 (d=−0.30), while at T2 a small effect favored the CG (d=0.32).
Partnership quality (PFB) demonstrated only small changes over time (T1: Δ=1.58; d=0.21; T2: Δ=−0.42; d=−0.05; T3: Δ=−1.34; d=−0.18). Subscale effects were likewise small: disruptive behavior worsened slightly in the IG at T3 (d=0.35), whereas communication improved in the IG at T1 (d=0.26) and T3 (d=0.21), with no change at T2 (d=−0.01).
Qualitative interviews indicated that IG participants frequently reported new positive sexual experiences:
Using the app helped me to rediscover how to have positive experiences with sexuality—across the entire spectrum of what sexuality can be. [IG1]
Several reported that their perception and communication of sexual pain improved, although complete pain relief was rarely achieved:
From the communication perspective, I felt that I was often able to say, “Okay, I have pain now,” and I could localize it, to give him that feedback. [IG3]
Further, IG participants described gaining more courage and reduced fear and avoidance of sexual activities, reporting increased confidence in approaching both coital and noncoital intimacy:
To find the courage to start again and actually do something. And also, those small steps—that was exactly what I needed. I can imagine many others feel the same, because there’s always that fear of penetrative sex. [IG1]
Qualitative analyses revealed that participants in the IG experienced increased openness and more frequent communication with their partners regarding sexuality and recognized the issue as a shared concern affecting both partners, rather than an individual problem. One participant noted:
In the beginning, when I started using the app, I really tried to communicate more with my partner…. It’s usually such an uncomfortable topic for me—I’ve never talked about it openly. But I tried to tell him about the app, showed him parts of it…. I realized, it’s not just my issue, it’s something that affects the relationship too, and we’re both suffering in some way. [IG3]
Additionally, several participants described a shift toward a more constructive and collaborative dynamic, moving away from blame:
We talked about that very openly—that it’s not about anyone being at fault or him hurting me or anything like that, but rather that we go through the process together and figure out, together, what feels good and what doesn’t. [IG3]
Although CG participants had no access to the app, qualitative interviews suggested that study participation itself fostered self-reflection, prompted by the study questionnaires.
I actually liked that it forced me to keep reflecting…on what’s going on in that part of my life, my sexuality or lack of libido. [It was really] helpful. [CG1]
CG participants also reported positively impacting life events over the course of the study (eg, hormonal changes and changes in sexual life circumstances) and expressed continued interest in the intervention after the study.
Health-Related Quality of Life
Results for the PROMIS-29 domains were mixed overall. Anxiety, depression, fatigue, and physical function improved in both groups at T2, although between-group comparisons favored the CG at several time points. In contrast, social participation, pain interference, and pain intensity improved across all time points in the IG, with between-group effects favoring the IG at T2 and T3 and showing moderate to large effect sizes.
Qualitatively, participants in the IG attributed enhanced emotional regulation and reduced distress to the intervention, linking these psychological changes to better physical health management:
That was one of my goals—to somehow get out of this negative emotional state. And it actually worked. Of course, there are still phases or moments when you think, “Hmm…” But then things come to mind—things you can do, right? And you realize, okay, even if it’s not working right now, that’s okay too—it’s not the end of the world. [IG1]
Self-reflection and body awareness were recurring qualitative themes in both groups, which may help explain why both groups improved in these domains.
Integration of Quantitative and Qualitative Findings
The integration of quantitative outcomes and qualitative interviews () provides a nuanced picture of the intervention’s effects on sexual health, relationships, well-being, and user engagement. Qualitative findings contextualized quantitative improvements and highlighted adherence challenges and the complexity of symptom management.
Table 3. Synthesis: integration of quantitative and qualitative results. Intervention group (IG) > control group (CG) indicates between-group differences in favor of the intervention group, whereas IG < CG indicates differences in favor of the control group.
Outside app: start of psychotherapy, no change in costs
Facilitator: persona stories
Barriers: emotional strain, time demands
Reminders: helpful or confusing
Dropout: time, technical issues, life changes
Need: more professional/patient interaction
High dropout aligns with time/emotional strain; progress stories may support adherence. Mixed feedback on tracking suggests the need for customizable reminders. Behavior outside app indicates broader health-seeking changes.
Acceptance
CSQ-I mean 26.6 (SD 4.1) of 32
G-MAUQ mean 5.4 (SD 0.7) of 7; “Ease of use” mean 6.5 (SD 0.6) of 7
Mean price €83 (range €0-€300; €1=US $1.17); 30% (3/10) no self-pay
High satisfaction matches positive reports; barriers may explain disengagement. Payment variability underscores the need for reimbursement.
Safety
INEP-ON: learned strategies, moderator support
“Negative events” rare (≤10% at T2, <30% at T3); mainly periods of not feeling well
Reports of health changes, stressful life events in both groups
Findings suggest overall safety; negative responses linked more to external stressors than to the intervention.
Sexual health
FSDS-DAO: improved in both groups, IG > CG at T1, T3
FSFI-d: IG > CG
FSQ “coital”: improved in both, IG vs CG mixed; “noncoital”: IG > CG
VPCQ “control,” “catastrophic”: improved in both groups, IG vs CG mixed; “Incompatibility”: IG > CG
PFB: no change at T2; IG vs CG mixed; “Communication”: IG > CG at T1, T3
CSI-GE: improved in both groups, IG > CG at T1, T3; IG < CG at T2
Facilitators: rediscovery of positive experiences, improved pain communication, openness, collaboration
Barriers: ambivalence toward some exercises
CG: self-reflection via questionnaires and self-applied tools/books
Quantitative gains align with reports of reduced distress and better sex-related communication. Although some changes were not strongly reflected in quantitative scores, participants described meaningful qualitative improvements. CG improvements may be attributed to study participation and self-reflection.
Overall Health
PROMIS-29: “anxiety,” “depression,” “fatigue,” “physical function” improved at T2 in both groups, mostly IG < CG; “Social participation”: IG > CG at T3; “Pain interference,” “pain intensity”: IG > CG at T2, T3.
Both groups, especially the CG, showed broad improvements in psychological well-being and physical functioning, while the IG exhibited reductions in pain interference and intensity. Qualitative reports of greater emotional regulation, open partner communication, and body awareness in the IG support these findings, indicating that self-regulatory and relational processes may reduce pain-related avoidance and attentional capture by pain.
CSI-GE: Central Sensitization Inventory-German Version.
PROMIS-29: Patient-Reported Outcome Measurement Information System-29-Item Profile.
Discussion
Principal Findings
This pilot implementation study examined the adherence, acceptability, safety, and exploratory effects of the self-guided Odeya app for women with sexual distress and diagnosed or suspected endometriosis. Adherence was moderate, with participants completing a median of 6 of 8 modules and completers using the app for an average of 18 weeks; dropout, however, was high. Overall satisfaction with the app was strong, and no negative life and health changes were attributed to the intervention. Quantitative outcomes showed reductions in sexual distress in both groups, with some advantages for the IG in sexual function, penetration-related fears, and partner communication. Broader health and mental health outcomes showed mixed but generally positive changes across groups. Qualitative feedback supported these trends, describing more positive sexual experiences, improved partner communication, greater pain awareness, and reduced fear of penetration.
Adherence
Overview
The intervention dropout rates in our study were notably high, a pattern commonly reported in digital interventions for chronic conditions. A meta-analysis found a pooled dropout rate of 43% across app-based interventions for chronic diseases []. In the HelloBetter Vaginismus DiGA effectiveness trial, Zarski et al [] reported dropout of 22% postintervention among women with dyspareunia, with participants completing on average 6 of 8 modules. In an early study, the Endo App retained 64.4% (29/45) of endometriosis patients at week 12 []. The German DiGA registry (BfArM) reports low attrition in IGs for several DiGAs, including the Endo App (3.75% dropout; NCT04883073) [], Kranus Edera for erectile dysfunction (4.1%) [], and Kranus Lutera for incontinence (5.4%) []. Compared with these trials, dropout in our IG was considerably higher, and it also exceeded dropout in our CG [,,,]. Several factors may explain the comparatively high dropout in our sample.
Health Status
First, our study population comprised individuals with endometriosis and sexual distress, a particularly complex clinical profile that may require more intensive support than fully self-guided interventions can provide []. Unlike trials with stricter exclusion criteria (eg, excluding chronic pain or moderate depression [,,], recent medication or surgery changes [], chronic infections [], or postsurgical/cardiovascular risk) [], our study was deliberately inclusive, excluding only severe depressive or anxiety symptoms. This approach enhanced external validity but may have increased attrition among participants facing multiple health burdens. At baseline, dropouts reported lower quality of life and greater symptom burden (anxiety, depression, stress, fatigue, and pain) yet less sexual distress and better sexual function than completers. Visual inspection of the symptom-tracking data suggested slightly higher median values among dropouts for sexual satisfaction and pain compared with IG completers. This health status pattern may also explain the continuous dropout observed in our study, as disengagement likely reflects participants’ health burden rather than low initial motivation or lack of interest in the intervention. This contrasts with the early-stage dropout typically reported in digital mental health apps [,]. Some studies found no link between depression or pain and attrition [,]. However, a systematic review and a meta-analysis identified baseline depression and comorbid anxiety as predictors of dropout [,]. Further, most participants were in relationships. Being partnered has been identified as a risk factor for sexual dysfunction with distress [], whereas being single has been associated with higher dropout [].
App Engagement
Second, app engagement and adherence may operate in both directions. While active tracking could facilitate greater adherence, it is also plausible that individuals with better health or mental well-being were more likely to engage with the app. Dropouts tracked symptoms less frequently, which may indicate lower initial engagement with self-monitoring features. Early engagement predicts adherence in digital health programs [,,] and may mark a critical window for retention strategies. Evidence on the link between tracking and adherence in digital health interventions is mixed: some studies report benefits of consistent symptom tracking, while others show only transient effects [,,]. In our study, reduced tracking and higher dropout among participants with elevated anxiety and depression suggest that mental health symptoms may hinder sustained engagement. This aligns with meta-analytic evidence showing that engagement modestly predicts improved outcomes, with specific indicators such as module completion being most predictive [].
Program Design
Third, compensation structures and intervention duration may have influenced adherence. Studies with better adherence rates often provided participant compensation or had shorter intervention periods [], factors that can significantly impact engagement and completion rates. Although a 12-week duration is clinically appropriate and common [,,,,], it may be demanding for participants with chronic pain and its psychological burden, as noted in prior studies []; furthermore, modules were reported to exceed the displayed module duration of 60 minutes.
Acceptance and User Experience
Overview
Quantitative results showed high user satisfaction, aligning with findings in women with genito-pelvic pain/penetration disorder (CSQ-I mean 28.03, SD 3.96; 67/72, 93% satisfaction) [] and supported by qualitative reports of positive experiences. Similar satisfaction levels have been reported in other digital interventions across different patient populations using the CSQ-I [,]. Usability (G-MAUQ) ratings were particularly high for ease of use, likely reflecting the patient-centered development process (E Kosman, MSc, et al, unpublished data, 2026) and the app’s limited set of functions, with scores comparable to other digital interventions [,]. Perceived usefulness was slightly lower in our study, which may have contributed to attrition.
Content Overload and Usability as Barriers
Content overload, reflected in the high time demands of the modules and the emotional strain of sustained self-reflection [], highlighted the challenge of balancing comprehensive content with practical feasibility for participants. Similar findings show that time-intensive digital interventions hinder engagement in this population []. Moreover, even seemingly minor usability issues can significantly affect user experience and sustained engagement [,,].
Guidance
Interview participants frequently requested more clinician and peer contact—consistent with evidence that human support drives engagement []—and reported difficulties applying content and feeling insufficiently guided. This underscores the potential value of integrating human support into digital tools for complex chronic conditions. Interventions including personal guidance achieved better adherence [,] compared with our fully self-directed approach. Blended care models may enhance both engagement and acceptance by providing structured support alongside digital self-management [].
Willingness to Pay
Despite high user satisfaction, willingness to pay for a comparable DiGA was limited: participants’ stated amounts were substantially lower than the prices of comparable DiGAs in Germany [,], and many felt that such tools should be financed by the health care system rather than by users themselves. Similar patterns have been reported, linking reluctance to pay to expectations of public coverage, skepticism toward digital tools, and limited awareness of benefits [].
Safety
None of the health and life changes was attributable to the intervention. Comparisons with other digital interventions are limited, as adverse events are rarely assessed or reported in German DiGAs []. Our findings accord with a recent meta-analysis indicating that mental health apps rarely cause harm and do not increase adverse events versus controls [].
Changes in Sexual and Overall Health
Although not powered for clinical end points, the trial provided exploratory indications of improvements, with small to large effects based on between-group mean differences. Our findings are consistent with evidence that both face-to-face and online interventions can reduce sexual distress and improve sexual function and satisfaction [,,], that psychoeducation increases knowledge and reduces performance anxiety [], and that mindfulness improves desire, arousal, and satisfaction while mitigating sexual distress [,]. An online self-help trial for women with dyspareunia reported medium to large improvements in genital pain and penetration-related cognitions and small to medium improvements in sexual function, anxiety, and well-being []. A psychoeducational web-based program for cancer survivors improved sexual communication but showed limited impact on relationship outcomes []. Consistent with this pattern [,,,], our trial also showed no clear effect on relationship satisfaction. Qualitative data aligned with established mechanisms [,]: participants described sensate focus and self-stimulation exercises as helpful for improving desire, and cognitive restructuring as helpful for reducing maladaptive sexual beliefs. These processes were reflected quantitatively by reduced incompatibility cognitions in the IG. However, some between-group differences may reflect random error rather than genuine effects.
Consistent with prior research [], control participants also improved in health (eg, depression, anxiety, and fatigue) and sexual distress. However, at follow-up, reductions in sexual distress persisted strongly only in the IG. Qualitative data suggest that repeated self-assessment may have acted as a catalyst for self-reflection and proactive coping, a Hawthorne-like effect []. Additional mechanisms include mere-measurement effects [], hope and expectancy [], natural recovery [,], and enhanced self-monitoring []. Despite these effects, CG participants continued to desire intervention access, indicating ongoing unmet needs.
Implications for Future Digital Interventions
Despite the high dropout rate, satisfaction metrics in the IG indicated that engaged participants found the program helpful. Thus, content appears valuable, whereas delivery and support structures may require optimization. Following qualitative feedback indicating a desire for more guidance and professional contact, we propose key design considerations for future digital interventions ().
Textbox 1. Key design considerations to enhance engagement, adherence, and user experience in future digital sexual health interventions targeting individuals with endometriosis
Hybrid approaches: incorporating human support elements—and, where appropriate, AI-driven assistance—such as periodic check-ins with health care providers, trained coaches, or moderated peer communities, may enhance engagement, adherence, and outcomes [-].
Tailored content delivery: reducing the time burden and emotional intensity of modules while maintaining therapeutic effectiveness could improve completion rates. Flexible options—such as individually selectable modules, exercises, or notifications and supplementary materials (eg, workbooks)—may better address diverse user needs [,].
Technical optimization: addressing usability issues and ensuring a seamless user experience—potentially through the integration of gamification features—is fundamental to sustaining user engagement [].
Screening for intervention readiness: given the relationship between baseline psychological distress and dropout, screening for intervention readiness and providing additional support for participants at high risk of dropout may improve outcomes [].
Patient involvement: beyond developing the intervention through a patient-centered approach, we recommend involving patients or patient influencers in the go-to-market phase to ensure effective reach and trust [,-].
Integrating the Relational Dimension
Beyond the structural and delivery-related design considerations outlined above, the relational dimension constitutes a further critical aspect of future digital sexual health interventions. Our findings address a recently identified structural limitation in current DiGAs for sexual dysfunctions: the insufficient consideration of the relationship dimension. While existing DiGAs primarily address individual functional parameters and measure success through standardized scores such as the International Index of Erectile Function‐5, sexual dysfunctions are frequently embedded in relational components that require dyadic intervention approaches [].
Future Research
Future research should test efficacy for core sexual health outcomes, particularly sexual distress, in adequately powered randomized controlled trials and, in larger samples, identify subgroups most likely to benefit. Incorporating participants’ recommendations—such as integrating the program into blended-care models with chat support or video consultations []—may enhance value and warrant evaluation. Longer, more flexible trials should also examine motivational drivers and dropout trajectories, as a 4-week inactivity threshold may be overly restrictive (eg, for women recovering from endometriosis surgery). Moreover, future iterations could include both optional and mandatory partner components addressing partners directly (eg, psychoeducation modules on endometriosis and its relational impact).
Limitations
This study has several limitations. First, several methodological aspects limit the interpretability of findings. The trial was not powered to detect efficacy or between-group differences, constraining comparative analyses. Additionally, although aspects of preuse acceptability were addressed during development, they were not systematically assessed in this study. Without a dedicated prepilot acceptability study, some usability issues (eg, module length and navigation) may not have been identified beforehand. Finally, combining assessments of acceptability and preliminary effectiveness within a single early-phase trial may have constrained sampling strategies and methodological specificity. Second, dropout rates were high—particularly in the IG—resulting in substantially reduced sample sizes for both quantitative and qualitative components. This further restricts within-group analyses and impedes meaningful comparisons between the IG and CG. Participants who discontinued showed higher baseline symptom severity (eg, anxiety and depression), potentially biasing findings toward more favorable outcomes. Third, satisfaction outcomes may be influenced by recruitment and measurement constraints. Because the recruitment strategy relied primarily on online channels, the sample may overrepresent women with higher digital literacy or a stronger preference for online support formats. Satisfaction was also only assessed among women who completed the intervention due to predefined measurement timing, likely inflating acceptability ratings. Fourth, sample characteristics and condition-specific focus limit generalizability. The sample covered a broad age range (up to 59 years), representing heterogeneous life stages (eg, perimenopause and menopause) with potentially distinct biopsychosocial profiles. Participants were also highly educated and predominantly partnered, which is not representative of the wider population. 
Moreover, although the intervention was developed through a patient-centered approach with women affected by different gynecological conditions (E Kosman, MSc, et al, unpublished data, 2026), it was only tested in women with endometriosis or adenomyosis, further limiting generalizability. Fifth, while the intervention included optional partner-focused components, systematic involvement of partners was limited, and dyadic processes were not comprehensively addressed. This represents a conceptual limitation, given that sexual health in the context of chronic conditions such as endometriosis is inherently relational. Finally, a coding error prevented proper attribution of INEP-ON items, obscuring whether reported positive or negative changes were related to life circumstances or the intervention itself.
Conclusions
The digital intervention Odeya appeared acceptable and safe for women who engaged with the app, with initial indications of improvements in sexual health–related outcomes among completers. Sustained engagement in fully self-guided formats, however, appears to vary by baseline psychosocial status, with individuals reporting better psychosocial health showing higher adherence. Adequately powered trials should establish efficacy, identify moderators of benefit, and determine the support intensity needed to maximize impact. Self-guided digital interventions may represent one accessible and scalable component within stepped-care models for sexual health, although some users are likely to require additional guidance and personalization.
We sincerely thank all patients who participated in the study and the qualitative development process. We also acknowledge the support of the Digital Health Accelerator Program of the Berlin Institute of Health at Charité–Universitätsmedizin Berlin, which enabled the collaboration with Hybrid Heroes for the development of the app-based intervention.
The authors declare the use of generative artificial intelligence (GenAI) in the research and writing process. According to the GAIDeT taxonomy (2025), the following tasks were delegated to GenAI tools under full human supervision: Literature search and systematization; Code generation; Code optimization; Proofreading and editing; Summarizing text; Adapting and adjusting emotional tone; Translation; Reformatting; Quality assessment; Identification of limitations; Recommendations; and Publication support.
The GenAI tools used were ChatGPT (GPT-5.1), Claude (Claude Sonnet 4.5), and Perplexity AI. Responsibility for the final manuscript lies entirely with the authors. GenAI tools are not listed as authors and do not bear responsibility for the final outcomes.
This work was supported by the Digital Health Accelerator Program of the Berlin Institute of Health, which also enabled the development of the app-based intervention in collaboration with Hybrid Heroes. The funder had no role in the study design, data collection, analysis, interpretation, or the decision to submit the manuscript for publication.
The datasets analyzed during this study are not publicly available due to restrictions imposed by the ethics approval.
None declared.
Edited by A Stone; submitted 26.Oct.2025; peer-reviewed by G Cabagno; comments to author 20.Nov.2025; revised version received 27.Dec.2025; accepted 31.Dec.2025; published 19.Feb.2026.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
The COVID-19 pandemic presented an urgent need to transition mental health services from in-person to virtual delivery, leading to research on feasibility and effectiveness. It also raised questions about the digital divide, referring to disparities between communities in access to technology, and the implications for structurally marginalized groups already facing barriers to accessing care []. Like many services, early psychosis intervention (EPI) programs rapidly shifted to virtual care during the COVID-19 pandemic, with the hope of maintaining access to and engagement in services. This was particularly crucial, given that approximately half of youth with psychosis do not access treatment and one-third disengage from EPI services early [-], despite the emphasis on identification and treatment early in the course of illness.
Research suggests that barriers preventing youth from accessing EPI treatment generally include stigma and a lack of knowledge about psychosis and where to seek treatment [-], with youth often viewing pathways to EPI care as complex []. Families can represent key sources of support throughout the help-seeking process and can help maintain engagement with services, particularly given the typical age of onset for psychosis [,]. Unfortunately, pathways to EPI treatment often begin with referrals from acute services, including emergency departments (EDs) or inpatient units [,], which also face barriers to ensuring timely outpatient follow-up. These include patient-level barriers, such as transportation or financial issues, and health systems–level issues, such as coordination between the ED and outpatient services, insufficient funding, and political disinterest [,]. Building partnerships between EPI and external services, including schools and shelters, can facilitate early detection, increase referrals from nonacute pathways, and improve access to vulnerable patients [,].
Emerging evidence suggests that virtual EPI care is well received by youth with psychosis. In recent cross-sectional studies, youth expressed satisfaction with the virtual delivery of EPI services and found it comparable to in-person treatment [], highlighting the convenience, ease of use, accessibility, and comfort of virtual care []. Virtual care can also be conducive to a client-centered approach, bolstering youth autonomy and decision-making power over care []. While many youth have acknowledged the value of virtual care, some have reported feeling more isolated and disconnected from clinicians, finding it more difficult to express themselves during virtual appointments []. Other, more practical challenges include technological difficulties and privacy/confidentiality concerns. The extent to which the benefits of virtual care extend to improvements in initial EPI attendance is unclear.
Using service use data from EPI programs in the United States and Australia, 3 studies examined attendance rates before and after the implementation of virtual care, yielding inconsistent results. One study found more missed appointments after virtual care implementation than before (13.3% vs 7%) [], whereas another found a 5% increase in attendance at EPI appointments following virtual care implementation []. A more recent study examined the ranges of missed appointments before and after virtual care implementation and found greater variability in the percentage of missed appointments after implementation (2.7%‐9%) than before (2.8%‐6.4%) []. Another study examining EPI services before and after the pandemic noted a significant increase in video appointments offered postpandemic but no significant difference in video appointments attended []. Given these conflicting findings, more research is needed to evaluate the impact of virtual care implementation on EPI appointment attendance and to identify factors that may lead to nonattendance.
We recently completed a study that examined factors associated with attendance at the EPI consultation appointment from 2018 to 2019, when this appointment was offered exclusively in person []. We found that older patients and those referred from the ED were less likely to attend the consultation appointment. Additionally, older patients and those who identified as Black or belonging to other racial and ethnic groups were more likely to be referred to EPI services from the ED compared with White patients, indicating that structurally marginalized groups are facing barriers accessing care early in the course of illness (before it becomes an emergency) []. While the digital divide and disparities in access to technology may be expected to increase barriers to care [], it remains unclear if the aforementioned health equity and service use factors continue to affect access to EPI care when services are delivered virtually.
Using retrospective data from a large EPI program, we examined factors associated with attendance at the first EPI consultation appointment after the transition to scheduling most of these appointments virtually, with a focus on self-reported health equity and service use factors. Initial engagement was defined as attendance at the consultation appointment. We hypothesized that patients who were older, were referred from the ED, or identified as Black would be less likely to attend the consultation appointment when it was delivered virtually [].
Methods
Setting
The Centre for Addiction and Mental Health (CAMH) EPI program is the largest in Canada, serving downtown Toronto, a large urban center. The program provides consultation and 3 years of coordinated specialty care for people up to 29 years of age with affective, nonaffective, and substance-induced psychosis. The program receives referrals from the ED, CAMH inpatient units, and CAMH outpatient psychiatrists, or externally through primary care providers or external inpatient and outpatient psychiatrists. The ED also houses a “bridging” clinic, to which patients assessed as lower acuity may be triaged during business hours; the clinic also provides short-term follow-up after inpatient discharge. External referrals are triaged by nurses to appropriate services. The program aims to offer consultation appointments within 2 weeks [].
Study Design and Population
This was a retrospective cohort study that followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines for cohort studies []. We replicated methods from our previous paper examining initial engagement with in-person services []. Electronic medical record (EMR) data were obtained for all patients aged 16 to 29 years who were referred to the CAMH EPI program from April 2020 to December 2020, the period immediately following virtual care implementation. We classified consultation appointments between 2018 and 2019 as “pre–virtual care,” as they were offered exclusively in person during this period. Appointments between April and December 2020 were classified as “post–virtual care,” as the vast majority were conducted virtually, consistent with institutional practices and jurisdictional restrictions at the time. We focused on this specific period because in-person consultations resumed during subsequent phases of the pandemic. Referrals were excluded from the analysis if they were canceled for any reason or if the patient had previously been enrolled in the CAMH EPI program. Missing data were manually extracted from EMR charts when possible, and data from second and third referrals were excluded from the analysis. Statistical comparisons were made with 2018 to 2019 data from before virtual care implementation [].
Variables and Missing Data
The primary outcome was the rate of attendance at the first consultation appointment. Patients were coded as not having attended if they declined services, could not be reached to schedule their appointment, or did not attend their scheduled appointment. Demographic variables were self-reported by patients through CAMH’s standardized health equity form, which is routinely completed around the time of the first appointment and during ED visits and inpatient admissions. These variables included age, gender, racial and ethnic group (“other racial and ethnic groups” included Indigenous, Latin American, Middle Eastern, and other not specified), country of birth, and sexual orientation. Low-prevalence categories with data cells of fewer than 5 people were combined to protect the privacy of participants and facilitate statistical comparisons. Recoded variables are outlined in . Service use factors were derived from clinical documentation: referral source (inpatient; ED or bridging clinic; and outpatient psychiatrists, primary care providers, or other external providers) and days to consult (the number of days from referral to consultation appointment). We retained “do not know” and “prefer not to answer” responses in the analysis because we considered them valid responses to self-reported health equity questions. Nonresponses were categorized as missing; data were missing for three variables (25.9%-27.6%). Little’s test indicated that data were missing completely at random (MCAR; χ²(13)=11.09, P=.60). However, we did not impute missing data because these were self-reported health equity data, routinely completed by patients around the time of their first appointment and during ED visits and inpatient admissions. To reduce potential bias, we applied consistent inclusion and exclusion criteria and used standardized EMR data fields to minimize misclassification.
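A minimal sketch of the small-cell rule described above, assuming a simple policy in which every category observed fewer than 5 times is pooled into one combined label. The helper `collapse_small_cells` and the label "Combined" are hypothetical; the study itself merged sparse categories into substantively meaningful groups (for example, "Female, trans, nonbinary, or other" in Table 3), which this generic pooling does not attempt to reproduce.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def collapse_small_cells(values: Sequence[Hashable], min_count: int = 5,
                         merged_label: str = "Combined") -> list[Hashable]:
    """Pool every category observed fewer than `min_count` times into one label.

    Hypothetical helper: the pooled bucket can itself remain below `min_count`,
    in which case a further manual merge would still be needed.
    """
    counts = Counter(values)
    small = {category for category, n in counts.items() if n < min_count}
    return [merged_label if v in small else v for v in values]


# Illustrative responses with two sparse categories ("C" and "D").
responses = ["A"] * 12 + ["B"] * 8 + ["C"] * 3 + ["D"] * 2
collapsed = collapse_small_cells(responses)
```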
Statistical Analysis
Descriptive statistics were calculated, including means, SDs, and percentages, to describe the demographic and service use characteristics of the full sample. We used chi-square tests for categorical variables and independent t tests for continuous variables to compare service use pre– versus post–virtual care. Effect sizes were calculated using Cohen d for continuous variables and phi (2 × 2 tables) and Cramér V (2 × 3 tables) for categorical variables. We used binary logistic regression to model the odds of attendance at the consultation appointment, controlling for demographic (ie, age, gender, racial and ethnic group, sexual orientation, and country of birth) and service use (ie, referral source and days to consult) factors. Simulation work by Vittinghoff and McCulloch (2007) demonstrated that logistic regression can yield reliable estimates with 5 to 9 events per variable (EPV) under commonly encountered modeling conditions (ie, when covariates are not extremely sparse, effect sizes are moderate, and covariates are not highly collinear). Our data met these conditions, and the candidate covariates were prespecified based on theoretical and clinical considerations []. Univariable tests were performed, and variables statistically significant at an a priori level of P<.20 were included in the multivariable model. Backward stepwise selection was then used to determine the final adjusted model by removing variables with P≥.20. This liberal criterion was used as a heuristic to safeguard against type II error during model development, following recommendations suggesting that stricter thresholds (eg, .05 or .10) may prematurely exclude important variables []. However, we retained age and gender in the adjusted model, regardless of statistical significance, due to their theoretical and clinical importance. A 2-sided P<.05 was considered statistically significant. All statistical analyses were performed using Stata BE (version 17.0; StataCorp) [].
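The two-stage selection procedure described above (a univariable screen at P<.20, then backward elimination of variables with P≥.20, with age and gender forced into the model) can be sketched as a small model-agnostic routine. This is an illustrative sketch, not the authors' Stata code: the function names are hypothetical, the model-fitting step is abstracted into a caller-supplied function that returns one P value per variable, and the example P values are made up.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical interface: given the variables currently in the model, refit it
# and return one P value per variable (eg, from a Wald or likelihood-ratio test).
FitPValues = Callable[[list[str]], Mapping[str, float]]


def screen_then_backward_select(
    candidates: Iterable[str],
    univariable_p: Mapping[str, float],
    fit_pvalues: FitPValues,
    threshold: float = 0.20,
    forced: Iterable[str] = (),
) -> list[str]:
    """Univariable screen at P < threshold, then backward elimination at P >= threshold.

    Variables listed in `forced` stay in the model regardless of their P values.
    """
    forced_set = set(forced)
    # Stage 1: keep forced variables and those passing the univariable screen.
    selected = [v for v in candidates if v in forced_set or univariable_p[v] < threshold]
    # Stage 2: refit, drop the least significant non-forced variable, and repeat
    # until every non-forced variable has P < threshold.
    while True:
        p_values = fit_pvalues(selected)
        removable = {v: p for v, p in p_values.items()
                     if v not in forced_set and p >= threshold}
        if not removable:
            return selected
        selected.remove(max(removable, key=removable.get))


# Made-up P values for illustration; a real fitter would refit the model on each call.
univariable = {"age": 0.18, "gender": 0.96, "race": 0.05,
               "country_of_birth": 0.15, "referral_source": 0.001, "days_to_consult": 0.22}
adjusted = {"age": 0.11, "gender": 0.64, "race": 0.04,
            "country_of_birth": 0.45, "referral_source": 0.02}


def fake_fit(variables: list[str]) -> dict[str, float]:
    return {v: adjusted[v] for v in variables}


final = screen_then_backward_select(list(univariable), univariable, fake_fit,
                                    forced=("age", "gender"))
```

In this toy run, days to consult fails the screen, country of birth is removed during backward elimination, and gender is kept despite P=.64 because it is forced. Multilevel categorical variables such as referral source would normally be tested jointly; the sketch simply treats each as a single entry.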
Ethical Considerations
This study was approved by the Research Ethics Board at CAMH (#060/2020‐01). Informed consent was not required by CAMH’s Research Ethics Board, as retrospective deidentified EMR data were used for the analysis and posed minimal risk. Data were deidentified, and additional efforts were made to protect the privacy of participants, including combining data cells of fewer than 5 people for low-prevalence categories. There was no compensation for participants in this study, as it was an analysis of retrospective EMR data. We have made efforts to ensure that no identification of individual participants would be possible via the data in the results or supplementary material. No images that could lead to identification of participants were included in the manuscript.
Results
Participant Characteristics
CAMH’s EPI program received 383 patient referrals between April 2020 and December 2020. Referrals that did not meet study eligibility were excluded (70/383, 18.3%), and second and third referrals were filtered out of the analysis (12/313, 3.8%), leaving 301 unique patient referrals. Table 1 displays the demographic and service use characteristics of patients. Patients had a mean age of 23.2 (SD 3.3) years; 71.1% (214/301) identified as male, 29.2% (88/301) identified as White, 40.2% (121/301) identified as heterosexual, and 46.2% (139/301) were born in Canada.
Table 1. Characteristics of patients referred to the Centre for Addiction and Mental Health early psychosis intervention program in 2018‐2019 and 2020.
| Characteristics (all referrals) | Pre–virtual care, 2018‐2019 (n=999)ᵃ | Post–virtual care, 2020 (n=301) |
|---|---|---|
| Age (y), mean (SD) | 22.5 (3.5) | 23.2 (3.3) |
| Age (y), median (IQR) | 22 (20‐25) | 23 (21‐26) |
| Gender, n (%) | | |
| Male | 654 (65.5) | 214 (71.1) |
| Female | 323 (32.3) | 81 (26.9) |
| Trans, nonbinary, two-spirit, other, or prefer not to answerᵇ | 22 (2.2) | 6 (2.0) |
| Racial and ethnic group, n (%) | | |
| Asian | 199 (19.9) | 36 (12.0) |
| Black | 176 (17.6) | 55 (18.3) |
| White | 384 (38.4) | 88 (29.2) |
| Other racial and ethnic groupsᶜ | 143 (14.3) | 28 (9.3) |
| Do not know or prefer not to answer | 24 (2.4) | 16 (5.3) |
| Missing | 73 (7.3) | 78 (25.9) |
| Sexual orientation, n (%) | | |
| Heterosexual | 667 (66.8) | 121 (40.2) |
| LGBTQ2S+ᵈ ᵉ | 159 (15.9) | 41 (13.6) |
| Do not know or prefer not to answer | 54 (5.4) | 56 (18.6) |
| Missing | 119 (11.9) | 83 (27.6) |
| Born in Canada, n (%) | | |
| No | 294 (29.4) | 69 (22.9) |
| Yes | 606 (60.7) | 139 (46.2) |
| Do not know or prefer not to answer | 26 (2.6) | 15 (5.0) |
| Missing | 73 (7.3) | 78 (25.9) |
| Referral source, n (%) | | |
| Outpatient psychiatrists, PCPsᶠ, or other external providers | 525 (52.6) | 122 (40.5) |
| ED/Bridgingᵍ | 217 (21.7) | 65 (21.6) |
| Inpatient | 257 (25.7) | 114 (37.9) |
| Days to consultʰ, mean (SD) | 18.3 (13.9) | 12.6 (8.9) |
| Days to consultʰ, median (IQR) | 15 (10-22) | 11 (7-15) |
| Monthly referrals, mean (SD) | 44.2 (9.9) | 34 (4.4) |

ᵃPre–virtual care data from Polillo et al [].
ᵇ2018‐2019 data includes two-spirit and prefer not to answer.
ᶜOther racial and ethnic groups include Indigenous, Latin American, Middle Eastern, and other not specified.
ᵈLGBTQ2S+: lesbian, gay, bisexual, trans, queer (or sometimes questioning), and two-spirited.
ᵉ2018‐2019 data includes two-spirit and 2020 data includes prefer not to answer.
ᶠPCP: primary care provider.
ᵍED: emergency department.
ʰIncludes data only for participants who booked a consultation appointment (n=280).
Service Use
Service use characteristics are displayed in Table 1 and Table 2. Approximately one-fifth of patients (65/301, 21.6%) were referred from the ED or bridging clinic, 37.9% (114/301) from inpatient units, and 40.5% (122/301) from other referral sources. Compared with pre–virtual care, there were significantly higher rates of inpatient referral (114/301, 37.9%) and lower rates of referral from outpatient and other providers (122/301, 40.5%) post–virtual care (χ²(2)=18.7, P<.001), with a small effect size and a moderately narrow CI (Cramér V=0.120, 95% CI 0.06 to 0.17). The mean number of days from referral to consultation appointment was 12.6 (SD 8.9), significantly lower than pre–virtual care wait times (mean 18.3, SD 13.9; t(1149)=6.44, 95% CI 3.95 to 7.41; P<.001), with a moderate effect size and a narrow CI (Cohen d=0.44, 95% CI 0.31 to 0.58). The mean number of monthly referrals post–virtual care (mean 34, SD 4.4) was significantly lower than pre–virtual care (mean 44.2, SD 9.9; t(1298)=−17.3, 95% CI −10.9 to −8.75; P<.001), with a large effect size and a narrow CI (Cohen d=−1.14, 95% CI −1.27 to −1.00). Overall, 84.1% (253/301) of patients attended their consultation appointment post–virtual care, significantly more than pre–virtual care (770/999, 77.1%), although the effect was small, with a moderately wide CI (χ²(1)=6.71, φ=0.072, 95% CI 0.02 to 0.12; P=.01). Post–virtual care, patients referred from the ED or bridging clinic had the highest rate of nonattendance at the consultation appointment (18/65, 27.7%), compared with those referred from inpatient units (19/114, 16.7%) or other providers (11/122, 9%).
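As an arithmetic check, the t statistics and Cohen d values above can be approximately recovered from the reported summary statistics with the standard equal-variance (pooled SD) formulas. This is a sketch under stated assumptions: the helper name is hypothetical; for days to consult we infer a pre–virtual care n of 871 from the reported 1149 degrees of freedom and the 280 post–virtual care patients who booked an appointment (871 + 280 − 2 = 1149), which the text does not state directly; and the rounded means and SDs are used, so small discrepancies are expected.

```python
import math


def pooled_t_and_d(m1: float, sd1: float, n1: int,
                   m2: float, sd2: float, n2: int) -> tuple[float, int, float]:
    """Student (equal-variance) t statistic and Cohen d from group summaries.

    Returns (t, df, d); the difference is taken as group 1 minus group 2.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    diff = m1 - m2
    t = diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    d = diff / pooled_sd
    return t, df, d


# Days to consult: pre (n=871, inferred from df) minus post (n=280).
t_days, df_days, d_days = pooled_t_and_d(18.3, 13.9, 871, 12.6, 8.9, 280)
# Monthly referrals: post (n=301) minus pre (n=999), matching the negative signs reported.
t_monthly, df_monthly, d_monthly = pooled_t_and_d(34.0, 4.4, 301, 44.2, 9.9, 999)
```

With these inputs the sketch gives t≈6.45 and d≈0.44 for days to consult, and t≈−17.4 and d≈−1.14 for monthly referrals, in line with the reported values.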
Table 2. Rates of attendance at early psychosis intervention consultation appointments at the Centre for Addiction and Mental Health by referral source in 2018‐2019 and 2020.
| Referral source | Pre–virtual care, 2018-2019ᵃ: patients, n | Pre–virtual care: attended consult, n (%) | Pre–virtual care: did not attend consultᵇ, n (%) | Post–virtual care, 2020: patients, n | Post–virtual care: attended consult, n (%) | Post–virtual care: did not attend consultᵇ, n (%) |
|---|---|---|---|---|---|---|
| All referral sources | 999 | 770 (77.1) | 229 (22.9) | 301 | 253 (84.1) | 48 (15.9) |
| Inpatient | 257 | 215 (83.7) | 42 (16.3) | 114 | 95 (83.3) | 19 (16.7) |
| ED/Bridgingᶜ | 217 | 145 (66.8) | 72 (33.2) | 65 | 47 (72.3) | 18 (27.7) |
| Outpatient psychiatrists, PCPsᵈ, or other external providers | 525 | 410 (78.1) | 115 (21.9) | 122 | 111 (91.0) | 11 (9.0) |

ᵃPre–virtual care data from Polillo et al [].
ᵇNonattendance at consult includes those who declined services, could not be reached for booking, or booked and did not attend.
ᶜED: emergency department.
ᵈPCP: primary care provider.
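The chi-square statistics and effect sizes reported under Service Use can be reproduced from the patient counts in Table 2 with a Pearson chi-square test (no continuity correction) and Cramér V, which reduces to |φ| for a 2 × 2 table. The helper names below are illustrative, and the CIs for V and φ are not computed here.

```python
import math
from collections.abc import Sequence


def pearson_chi2(table: Sequence[Sequence[float]]) -> tuple[float, int, float]:
    """Pearson chi-square (no continuity correction) for an r x c table of counts.

    Returns (chi2, degrees of freedom, total N).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, n


def cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Cramér V; for a 2 x 2 table this equals the absolute value of phi."""
    chi2, _, n = pearson_chi2(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Rows: pre-virtual care (2018-2019), post-virtual care (2020); counts from Table 2.
attendance = [[770, 229], [253, 48]]          # attended, did not attend
referral = [[257, 217, 525], [114, 65, 122]]  # inpatient, ED/bridging, outpatient/other

chi2_att, df_att, _ = pearson_chi2(attendance)
chi2_ref, df_ref, _ = pearson_chi2(referral)
```

This yields χ²(1)≈6.71 with φ≈0.072 for attendance and χ²(2)≈18.7 with V≈0.120 for referral source, matching the reported values.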
Factors Associated With Attendance at Consultation Appointment
Following univariable tests and backward stepwise selection, identifying as Black (odds ratio 0.45, 95% CI 0.21 to 0.97) and being referred from the ED or bridging clinic (odds ratio 0.24, 95% CI 0.08 to 0.72) were associated with decreased odds of attendance at the consultation appointment in the final adjusted model (Table 3).
Table 3. Logistic regression analysis of factors associated with attendance at early psychosis intervention consultation appointment at the Centre for Addiction and Mental Health during virtual care delivery in 2020 (n=301).
| Variable (outcome: attendance at EPIᵃ consultation appointment) | Univariable ORᵇ (95% CI) | Univariable P value | Multivariable OR (95% CI) | Multivariable P value |
|---|---|---|---|---|
| Age (y) | 0.94 (0.85‐1.03) | .18 | 0.91 (0.80‐1.02) | .11 |
| Gender | | | | |
| Male | Reference | —ᶜ | Reference | —ᶜ |
| Female, trans, nonbinary, or other | 0.98 (0.50‐1.94) | .96 | 1.24 (0.51‐3.00) | .64 |
| Racial and ethnic group | | | | |
| Asian | 0.65 (0.24‐1.82) | .42 | —ᵈ | —ᵈ |
| Black | 0.42 (0.18‐0.99) | .046 | 0.45 (0.21‐0.97) | .04 |
| White | Reference | —ᶜ | Reference | —ᶜ |
| Other racial and ethnic groupsᵉ, do not know, or prefer not to answer | 1.58 (0.48‐5.21) | .45 | —ᵈ | —ᵈ |
| Sexual orientation | | | | |
| Heterosexual | Reference | —ᶜ | Reference | —ᶜ |
| LGBTQ2S+ᶠ, do not know, or prefer not to answer | 1.39 (0.67‐2.88) | .37 | —ᵈ | —ᵈ |
| Born in Canada | | | | |
| Yes | Reference | —ᶜ | Reference | —ᶜ |
| No, do not know, or prefer not to answer | 0.91 (0.45‐1.87) | .80 | —ᵈ | —ᵈ |
| Referral source | | | | |
| Outpatient psychiatrists, PCPsᵍ, or other external providers | Reference | —ᶜ | Reference | —ᶜ |
| ED/Bridgingʰ | 0.26 (0.11‐0.59) | .001 | 0.24 (0.08‐0.72) | .01 |
| Inpatient | 0.50 (0.22‐1.09) | .08 | 0.48 (0.17‐1.41) | .18 |
| Days to consult | 1.04 (0.98‐1.10) | .22 | —ᵈ | —ᵈ |

ᵃEPI: early psychosis intervention.
ᵇOR: odds ratio.
ᶜNot applicable.
ᵈVariables removed from the adjusted model through backward stepwise selection.
ᵉOther racial and ethnic groups include Indigenous, Latin American, Middle Eastern, and other not specified.
ᶠLGBTQ2S+: lesbian, gay, bisexual, trans, queer (or sometimes questioning), and two-spirited.
ᵍPCP: primary care provider.
ʰED: emergency department.
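A univariable logistic regression on a single categorical predictor is saturated, so its odds ratios equal the crude odds ratios from the corresponding 2 × 2 comparisons, and its Wald CIs match Woolf's log-OR formula. Assuming the univariable referral-source model used all 301 patients (referral source had no missing data), the sketch below recovers the univariable referral-source estimates in Table 3 from the post–virtual care counts in Table 2. The helper name is hypothetical, and the adjusted estimates cannot be recovered this way because they depend on the other covariates and on the complete-case sample.

```python
import math


def crude_or_wald(a: int, b: int, c: int, d: int,
                  z: float = 1.959963984540054) -> tuple[float, float, float, float]:
    """Crude odds ratio with a Woolf (log-scale Wald) 95% CI and 2-sided P value.

    a, b: attended / did not attend in the comparison group
    c, d: attended / did not attend in the reference group
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Two-sided P value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return math.exp(log_or), lower, upper, p_value


# Post-virtual care counts from Table 2; reference = outpatient and other providers (111 / 11).
ed = crude_or_wald(47, 18, 111, 11)
inpatient = crude_or_wald(95, 19, 111, 11)
```

Rounded, this gives OR 0.26 (95% CI 0.11 to 0.59; P=.001) for ED or bridging clinic referral and OR 0.50 (95% CI 0.22 to 1.09; P=.08) for inpatient referral, matching the univariable column.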
Discussion
Principal Findings
Using retrospective cohort data from 301 patients, we found that patients referred to EPI services after the transition to virtual care had higher rates of attendance at the consultation appointment and shorter wait times than those referred pre–virtual care. We also found higher rates of inpatient referral and lower rates of referral from outpatient and other providers post–virtual care compared with pre–virtual care. Approximately one in six patients (48/301, 15.9%) did not attend their consultation appointment; patients referred from the ED or bridging clinic had the highest rate of nonattendance compared with other referral sources. Equity-related and service use factors, notably identifying as Black and being referred from the ED or bridging clinic, were associated with decreased odds of attending the consultation appointment when it was mostly delivered virtually. These findings suggest that virtual care can improve initial engagement in EPI services; however, patients from structurally marginalized groups and those referred from acute sources still face barriers to care, even when appointments are delivered virtually.
Comparison With Prior Work
We observed a 7-percentage-point increase in attendance at the EPI consultation appointment following the implementation of virtual care (from 77.1% to 84.1%), which is larger than the 5% increase found across all appointments in a prior study []. This difference is likely attributable to our focus on the consultation appointment rather than all appointments. Taken together, findings from both studies lend support to virtual care as a tool for improving engagement in EPI services, both initially and throughout treatment. However, other factors may also have influenced appointment attendance, such as the COVID-19 lockdown, which restricted nonessential outings and may have made youth more available to attend their appointments. We also observed shorter wait times from EPI referral to the consultation appointment post–virtual care compared with pre–virtual care, consistent with emerging evidence that virtual care may allow for increased efficiency and streamlined processes []. Although the association of shorter wait times with virtual care is an important finding, monthly referrals to EPI were also significantly lower post–virtual care than pre–virtual care, which could in turn have contributed to the shorter wait times. Nevertheless, longer wait times for EPI services have been associated with worse patient outcomes, not only during the initial period of untreated psychosis but also later in treatment [].
Our results also align with existing research on disparities in virtual care access. Patients referred from the ED or bridging clinic had decreased odds of attending the consultation appointment post–virtual care, consistent with our prior work showing lower attendance rates at in-person initial appointments for those referred from the ED or bridging clinic []. Individuals seeking care from acute services may experience a greater symptom burden, such as disorganization, paranoia, and a fear of surveillance, which may affect their use of virtual care. For example, patients may be too disorganized to connect online at the scheduled time []. Additionally, these patients may be more likely to experience socioeconomic disadvantage, which is linked to increased ED use, decreased engagement with outpatient mental health care [], and digital equity barriers. In contrast to findings from our previous study examining in-person attendance [], we also found that Black patients had decreased odds of attending the consultation appointment post–virtual care, suggesting that virtual care may amplify health inequities through the digital divide. This is consistent with other research reporting that Black patients were less likely than White patients to use telehealth services [,], likely due to systemic barriers that can limit access to devices, the internet, or digital literacy skills [,]. These barriers are layered upon a long history of racism, oppression, and stigmatization of Black people by the medical system, and ongoing inequities, including a greater risk of police involvement and involuntary hospitalization for Black youth experiencing psychosis []. 
Overcoming systemic barriers to psychosis care for Black patients will require systemic solutions, such as increasing the racial and ethnic diversity of clinicians, ensuring a trauma-informed and anti-racist approach to services, and including lived experience perspectives of Black patients and their families in EPI evaluation [].
Implications
These findings have several implications for the design and delivery of EPI services in a postpandemic context. A hybrid EPI model, integrating virtual care with in-person appointments, may engage patients in treatment in ways that meet their needs while preserving the benefits of an in-person therapeutic relationship. Global surveys of clinicians and clinical leaders show strong support for hybrid models, though challenges exist around infrastructure, training, and digital literacy []. Mental health care organizations, including EPI programs, should consider building the technological infrastructure and capacity clinicians need to offer hybrid models of care, although we acknowledge that this may be challenging in lower-resourced settings. Additionally, our results showed a shift in referral patterns post–virtual care, with increased inpatient and decreased outpatient referrals. This may reflect delays in the early detection of psychosis due to school closures or reduced access to primary care and other outpatient providers. Outpatient providers remain essential sources of early detection and EPI access. Leveraging hybrid care models can improve connection to outpatient providers, though systemic solutions to reduce inequities in digital access are also required.
Limitations
This study has several limitations. Because of the wide CIs in the final adjusted model, the findings should be interpreted with caution. Variables including race/ethnicity, sexual orientation, and country of birth had more than 10% missing data, which were not imputed because they were self-reported health equity data. The percentage of missing data for certain demographic categories, including racial and ethnic group and sexual orientation, increased from pre–virtual care to post–virtual care. These changes highlight potential challenges with the reliability of self-reported data, particularly in the virtual care context. Beyond the overall level of missingness (more than 10%), the variability in self-reported health equity data between the two time points may affect the accuracy and generalizability of the findings. Future research should explore methods to improve the completeness and reliability of self-reported data in virtual settings. The study period was also a limitation: it fell during the COVID-19 pandemic and may not be representative of virtual care in real-world practice. Thus, findings may partly reflect pandemic effects rather than the transition to virtual care. The use of data from 2020 may also limit the relevance of our findings, given changes in health care delivery and policy over time []. We were not able to include other relevant patient- and system-level factors, such as socioeconomic status and illness severity, as covariates in the model. These factors may better explain the associations observed in our final adjusted model. For example, lower referral volumes and treatment delays may have led patients to present to care when they were more ill and had a higher symptom burden []. As such, the lack of specific data on illness severity limits our ability to understand its potential impact on EPI attendance and engagement, and this should be an area for future investigation. 
It is also important to note that patients referred from the ED may differ from those referred through outpatient and inpatient settings [], limiting the generalizability of our findings across referral sources. Another limitation is that the study setting was a well-resourced hospital with infrastructure and technological support dedicated to virtual care implementation; therefore, the findings may not be generalizable to lower-resourced mental health settings.
Conclusions
This retrospective cohort study of patients referred to EPI services is innovative in that it examines the self-reported health equity and service use factors that may contribute to nonattendance when most EPI appointments are delivered virtually, whereas previous studies focused solely on differences in attendance rates. Although the study period is a limitation, as it fell during the COVID-19 pandemic and may not be representative of virtual care in real-world practice, we found improved attendance rates at the first EPI consultation appointment and shorter wait times, suggesting that virtual care may improve initial engagement in EPI services. Despite this, barriers to care still exist for Black patients and those referred from the ED. A hybrid model may improve connection to EPI, though targeted approaches are needed to bridge the digital divide and ensure that structurally marginalized and high-acuity patients have equitable access to EPI care. Future research should examine virtual engagement in EPI services in a postpandemic context.
There was no use of generative artificial intelligence (AI) technology in the generation of text, figures, or other informational content of this manuscript.
This work was supported by the University of Toronto Department of Psychiatry’s Reasons for Hope Fund and the Canadian Institutes of Health Research.
The data that support the findings of this study are available from the corresponding author, AP, upon reasonable request, to protect the privacy and confidentiality of participants.
Methodology: AP (lead), NK (equal), AHCW (support), GF (support), WW (support), AV (support)
None declared.
Edited by Stefano Brini; submitted 29.Jul.2025; peer-reviewed by Donald Hilty, Lavlin Agrawal; final revised version received 01.Dec.2025; accepted 04.Dec.2025; published 19.Feb.2026.