We reported this study according to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline [14]. This study was registered on the Open Science Framework (https://osf.io/k6etb).
Search strategy and eligibility criteria
We first identified the top 10 general medical journals that frequently published clinical trials, ranked by journal impact factor in the “Medicine, General & Internal” category of Journal Citation Reports (as of June 2023). After excluding journals that primarily focused on basic science or published fewer than 10 clinical trials annually, we selected 6 journals: The Lancet, The New England Journal of Medicine (New Engl J Med), Journal of the American Medical Association (JAMA), British Medical Journal (BMJ), JAMA Internal Medicine, and Annals of Internal Medicine (Ann Intern Med). Subsequently, we searched MEDLINE (via PubMed) to systematically retrieve clinical trials published in these journals (Additional file 1: Table S1 shows the search strategy used). Given that ICMJE required a data sharing plan in trial registration from Jan 2019 onwards, we only included trials that started participant enrollment on or after Jan 1, 2019, and trial publications published between Jan 1, 2021, and Dec 31, 2023, to allow sufficient time and a sufficient sample of trials for evaluation. We included clinical trial publications with primary results; methods papers, publications of secondary results, relevant reviews, commentaries, perspectives, and editorials were excluded. The detailed selection process is presented in Additional file 1: Fig. S1.
If the same trial was registered on different platforms, we only extracted and analyzed information from ClinicalTrials.gov. Some publications pooled different trials for analysis and therefore had two or more registration identifiers (IDs). We treated such publications as different trials according to their registration IDs for data extraction and analysis; i.e., each registration ID represented an individual trial. Some publications with updated data shared the same registration ID with prior publications; in this case, only the most recent publication was kept for analysis to avoid double counting.
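The record-handling rules above can be sketched in a few lines. This is an illustration only, written in Python rather than the R used for our analyses; the function name, field names, and example records are hypothetical.

```python
from datetime import date


def one_record_per_trial(publications):
    """Return one record per registration ID.

    A publication that pools several trials contributes one record per ID.
    When several publications share an ID, the most recent one is kept.
    """
    latest = {}
    for pub in publications:
        for reg_id in pub["registration_ids"]:
            kept = latest.get(reg_id)
            if kept is None or pub["published"] > kept["published"]:
                latest[reg_id] = {"pmid": pub["pmid"], "published": pub["published"]}
    return latest


# Hypothetical records (IDs and dates are made up for illustration).
publications = [
    {"pmid": "P1", "registration_ids": ["NCT-A"], "published": date(2021, 3, 1)},
    {"pmid": "P2", "registration_ids": ["NCT-A"], "published": date(2022, 6, 1)},
    {"pmid": "P3", "registration_ids": ["NCT-B", "NCT-C"], "published": date(2023, 1, 15)},
]
trials = one_record_per_trial(publications)
```

In this example, NCT-A is represented by the later publication P2, and the pooled publication P3 yields two separate trials.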
Study outcomes
The outcomes were the inclusion of a plan to share data in the trial registration and the concordance between registered and published plans to share data.
Trials that clearly responded “Yes” to “Plan to share” in registration were considered as planning to share data, while those that responded “No” were considered as not planning to share data. In this study, the data planned for sharing comprised study protocols, statistical analysis plans, analytic codes, and IPD; trials reporting a plan to share any of these were considered to “plan to share data” in registration. We searched trial registration platforms by registration ID to determine the stated plan to share data. If a trial had multiple registration records, we used the latest record dated before the trial publication was published. From the registration platform, we extracted all information describing the data sharing plan, including plans to share IPD and supporting information (study protocols, statistical analysis plans, and analytic codes). If the question “Plan to share IPD” was left blank or answered “Undecided,” responses were pooled as “undecided/missing”.
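The coding of the registry response can be illustrated as follows. This is a minimal sketch in Python, not the code used in the study; the function name is hypothetical, and mapping any response other than “Yes” or “No” to “undecided/missing” is an assumption of the sketch.

```python
def registered_plan(response):
    """Code the registry's plan-to-share response into three categories."""
    answer = (response or "").strip().lower()
    if answer == "yes":
        return "yes"
    if answer == "no":
        return "no"
    # Blank or "Undecided" responses are pooled. Treating any other
    # unexpected value the same way is an assumption of this sketch.
    return "undecided/missing"


# Hypothetical registry responses.
coded = [registered_plan(r) for r in ["Yes", "No", "Undecided", "", None]]
```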
We further explored the data sharing concordance between registered and published plans to share. Based on the data sharing statements in trial publications, trials that clearly stated a willingness to share data were defined as having a published plan to share data. We also treated trials as having a published plan to share data if a link to a data repository was provided, even if the shared data were accessible only after the user registered and signed a data use agreement. Trials that were unwilling to share data or did not report/obtain data sharing statements were considered not to have a published plan to share. Data sharing concordance was defined as (1) a plan to share data in both registration and publication (i.e., “Yes/Yes”) or (2) no plan to share data in either registration or publication (i.e., “No/No”). Discordance between registered and published plans to share data was defined as (1) a plan to share data in registration but not in the publication (i.e., “Yes/No”) or (2) no plan to share in registration but a plan to share in the publication (i.e., “No/Yes”). Seven trial publications pooled two trials/registration IDs; in these cases, the registration ID with the later study start date was used to assess data sharing concordance.
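The four registration/publication categories can be expressed compactly. The sketch below is illustrative Python, not the study code; the function name and the example pairs are hypothetical.

```python
from collections import Counter


def concordance(registered_yes, published_yes):
    """Return the registration/publication label and its concordance status."""
    label = f"{'Yes' if registered_yes else 'No'}/{'Yes' if published_yes else 'No'}"
    status = "concordant" if registered_yes == published_yes else "discordant"
    return label, status


# Hypothetical (registered, published) pairs for five trials.
pairs = [(True, True), (True, False), (False, False), (False, True), (True, True)]
table = Counter(concordance(r, p)[0] for r, p in pairs)
```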
We also assessed the details of the data sharing plans as described on registration platforms and in the data sharing statements of trial publications. The specific information extracted included the following: (1) data sharing content (analytic code, statistical analysis plan or study protocol, IPD), (2) data access time after publication or trial completion (< 12 months, ≥ 12 months, unclear), and (3) data access method (public, private, unclear). If trial authors clearly stated that the shared data would be publicly available, we considered the trials to have a public data access method. If the shared data were only available from trial authors, funders, or trial review committees after review, trials were grouped as having a private data access method. Trials were categorized as having an unclear data access method if no relevant details were provided on how the trial authors would share data.
Data extraction
Data extraction and coding were completed independently by four study authors working in pairs (J. Z. and X. B.; Y. L. and G. L.). Any disagreement was resolved by discussion between the study authors or, if no consensus could be reached, through consultation with the senior author (D. M.).
Data on trial characteristics were extracted from registration platforms, including whether the trial was multicenter, country of origin, design information (with or without control group, parallel or crossover, with or without randomization), trial phase (1–2, or 3–4), planned sample size, intervention type (drug or other, where “drug” included drug alone or drug in combination with non-drug interventions), whether the trial was COVID-19-related, and funding source (industry or other, where “industry” included industry alone or a combination of industry and non-industry funders) [15]. Trials that did not report a trial phase were classified as phase 3–4 if they planned to enroll ≥ 400 participants and as phase 1–2 if the planned sample size was < 400 [10, 16].
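The rule for grouping trial phase, including the sample-size fallback for trials without a reported phase, can be sketched as below. This is illustrative Python rather than the study code; the function name is hypothetical, and representing the reported phase as an integer from 1 to 4 (or None when missing) is an assumption.

```python
def phase_group(reported_phase, planned_sample_size):
    """Group a trial as phase '1-2' or '3-4'.

    reported_phase: integer 1-4, or None if the registry reports no phase
    (this representation is an assumption of the sketch).
    """
    if reported_phase is not None:
        return "1-2" if reported_phase <= 2 else "3-4"
    # No reported phase: fall back on the planned sample size.
    return "3-4" if planned_sample_size >= 400 else "1-2"


# Hypothetical trials: (reported phase, planned sample size).
groups = [phase_group(p, n) for p, n in [(2, 900), (3, 120), (None, 400), (None, 399)]]
```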
We predefined the data to be extracted from trial publications: the year of publication, whether the trial publication mentioned authors’ conflicts of interest (yes or no), and the risk of bias (ROB). If the trial publication mentioned authors’ conflicts of interest, we further categorized them as financial, non-financial, or both [17]. We did not aim to assess the ROB for each outcome of the included trials; therefore, the ROB 1.0 tool was used to evaluate the overall ROB for individual trials [18]. A trial was grouped as having high ROB if at least one domain (random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, and others) was rated as high ROB. Trials were defined as having low ROB if all domains were rated as low ROB and as having unclear ROB if at least one domain was rated as unclear ROB [19].
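The overall ROB rule can be written as a short function. The sketch below is illustrative Python, not the study code; the function and domain key names are hypothetical. It assumes that a high rating takes precedence when a trial has both high and unclear domains, which the rule above implies but does not state explicitly.

```python
def overall_rob(domain_ratings):
    """Derive a trial-level ROB from per-domain ratings ('low', 'unclear', 'high')."""
    ratings = list(domain_ratings.values())
    if "high" in ratings:
        # Assumption: 'high' takes precedence over 'unclear'.
        return "high"
    if all(r == "low" for r in ratings):
        return "low"
    return "unclear"


# Hypothetical ratings for one trial (domain keys are illustrative).
example = {
    "random_sequence_generation": "low",
    "allocation_concealment": "unclear",
    "blinding_participants_personnel": "low",
    "blinding_outcome_assessment": "low",
    "incomplete_outcome_data": "low",
    "selective_reporting": "low",
    "other": "low",
}
rating = overall_rob(example)
```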
Statistical analysis
We described continuous trial characteristics using medians with lower and upper quartiles (Q1, Q3) and categorical variables using counts and percentages. We used McNemar’s test to evaluate whether the concordance between registered and published plans to share data was significant [20]. We plotted the proportions of trials with plans to share data in registration and the proportions of trials with data sharing concordance from 2021 to 2023 by country of trial origin and journal.
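For readers unfamiliar with McNemar’s test, the sketch below shows how it uses only the two discordant cells of the paired 2 × 2 table (Yes/No and No/Yes). It is illustrative Python, not the study code. The continuity-corrected chi-square form is an assumption (it mirrors the default of R’s mcnemar.test, but the variant is not specified here), and the counts are hypothetical.

```python
import math


def mcnemar_test(yes_no, no_yes):
    """Continuity-corrected McNemar's test on the two discordant counts.

    Returns (chi-square statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = yes_no + no_yes
    if n == 0:
        raise ValueError("McNemar's test is undefined without discordant pairs")
    diff = abs(yes_no - no_yes)
    chi2 = max(diff - 1, 0) ** 2 / n  # no correction needed when counts are equal
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 15 trials coded Yes/No and 5 coded No/Yes.
chi2, p_value = mcnemar_test(15, 5)
```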
We assessed the associations between trial characteristics and registered plans to share data and between trial characteristics and data sharing concordance. Trials with undecided/missing plans to share data were treated as having no plan to share data in our association analysis. Univariable logistic regression analysis was used to explore trial characteristics in relation to registered plans to share data, taking trials without registered plans to share data as the reference. For the association between trial characteristics and data sharing concordance (Yes/Yes and No/No), trials with the two types of discordance (Yes/No and No/Yes) were combined as the reference group, given the concern over small sample sizes for each type of discordance. This approach of combining the two types of discordance has also been used in previous methodological studies [7, 21].
We performed univariable logistic regression analysis for each trial characteristic in relation to registered plans to share data, including the year of publication, whether the trial was multicenter, funding source, planned sample size, whether the trial was COVID-19-related, intervention type, country of origin, trial phase, and whether the trial had a parallel design. Similarly, we conducted univariable analysis to investigate whether trial characteristics (including the year of publication, whether the trial was multicenter, planned sample size, whether the trial was COVID-19-related, intervention type, country of origin, trial phase, whether the trial had a parallel design, funding source, authors’ conflict of interest, and ROB) were associated with data sharing concordance between registered and published plans to share data. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to quantify the relationship between trial characteristics and registered plans to share data and between trial characteristics and data sharing concordance. An OR > 1.0 indicated that the trial characteristic was associated with increased odds of a registered plan to share data or of data sharing concordance.
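To make the interpretation of the OR concrete: for a single binary characteristic, the OR from a univariable logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, and the Wald 95% CI follows from the standard error of the log OR. The sketch below is illustrative Python, not the study code; the function name and counts are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Odds ratio with a Wald 95% CI from a 2x2 table.

    a: characteristic present, outcome yes   b: characteristic present, outcome no
    c: characteristic absent, outcome yes    d: characteristic absent, outcome no
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 60 of 100 trials with the characteristic registered a
# plan to share data, versus 30 of 100 trials without it.
odds_ratio, lower, upper = odds_ratio_wald(60, 40, 30, 70)
```

Here the OR is 3.5 and the interval excludes 1.0, so the characteristic would be read as associated with increased odds of a registered plan to share data.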
Regarding the associations between trial characteristics and registered plans to share data, we performed a prespecified sensitivity analysis by removing trials with undecided/missing plans to share data from the association analysis. We performed another post hoc sensitivity analysis by excluding non-randomized trials from the association analysis.
We recalculated the counts and percentages of data sharing concordance between registered and published plans after removing trials with undecided/missing plans to share data and after treating trials with undecided/missing plans as having registered plans to share data. Moreover, for trial characteristics in relation to data sharing concordance, we conducted two post hoc sensitivity analyses by replacing the seven registration IDs that had a later study start date with those having an earlier study start date and by excluding non-randomized trials. We performed a third post hoc sensitivity analysis by using each of the two discordant groups as a separate reference group for the association analysis (i.e., Yes/Yes and No/No vs Yes/No, and Yes/Yes and No/No vs No/Yes).
Furthermore, we evaluated the differences in data sharing content, data access time after publication or trial completion, and data access method among trials with Yes/Yes in registration and publication. McNemar’s test was used to assess whether a significant discordance existed.
All statistical tests were two-sided with a significance level of 0.05. Analyses were conducted in R software version 4.4.1.