miRNA biomarkers for prognosis and therapy monitoring in a multi-ethnic cohort with SARS-CoV-2 infection

Patient demographics

This study encompassed a total of 31 control individuals (23 males (74.2%) and 8 females (25.8%), and 154 infected patients (116 males (75.3%) and 38 females (24.7%). The United Arab Emirates (UAE) is a diverse and multinational country, home to people from various parts of the world. While the participants included in this research hailed from 40 different countries (Fig. 1a), a significant portion of them originated from India (n = 44; 23.8%), Philippines (n = 22; 11.9%), Pakistan (n = 18; 9.7%), UAE (n = 10; 5.4%), Egypt (n = 9; 4.9%), and Iran (n = 7; 3.8%). Additionally, there were 6 participants from Jordan and Iraq (3.2%), 5 from Bangladesh (2.7%), as well as 4 from Lebanon and Palestine (2.2%). Furthermore, Belgium, China, Nepal, Poland, Senegal, Sudan, and Syria were represented by 3 participants each (1.62%), while Australia, Ethiopia, Russia, South Africa, Sri Lanka, UK, and USA had 2 participants each (1.08%). Finally, Afghanistan, Armenia, Burundi, the Czech Republic, Eritrea, Germany, Ghana, Indonesia, Malaysia, Morocco, Switzerland, Thailand, Vietnam, and Yemen each had 1 participant, with one participant remaining unclassified (0.54%).

Clinical data shows distorted levels of ferritin, CRP, WBC, ALT, urea, & GFR in infected patients

Figure 1b shows the comparison of other demographic and clinical parameters between the control and infected groups. The mean age of the control and infected groups was 44.23 ± 11.3 and 46.67 ± 13.1 years, respectively. The age variation was not statistically different within either control or infected groups (Fig. 1; Table 2). Both the control and infected groups showed similar body mass index (BMI) of ~ 27–28, revealing an overall overweight cohort. Although the controls showed presence of common morbidities like hypertension and diabetes, they were never reported to be previously infected with SARS-CoV-2 at the time of data collection. Some clinical laboratory data for the control group could not be included in this study due to the unavailability of test reports or because these tests were not conducted since the individuals were not experiencing any illness. Nonetheless, the available data for the control group fell within the already defined normal value range. Among the 154 infected patients at the time of infection, 55 (35.7%) had received vaccinations. The vaccines administered included Sinopharm BIBP (n = 38), Pfizer-BioNTech (n = 10), and other vaccines (n = 7), with the majority having received two doses of any vaccine (n = 44). A majority of the infected participants (n = 86; 55.8%) had cough, fatigue, fever, and shortness of breath at the time of infection. While X-ray reports were not available for controls, they were taken for 104/154 (67.5%) of the infected patients. Among the enrolled patients, 35 (22.7%) required respiratory support with oxygen delivery rates of up to 6 L/min and 51 (33.1%) with > 6 L/min.

Fig. 1

Visual representation of demographic and clinical outcomes of the participants. (a) A map displays the diverse origin of the participants of this study, representing 40 different countries across the globe. The intensity of green colors on the map indicates varying participant numbers from each country, with darker shades representing more participants and lighter shades indicating fewer. The world map was created using Microsoft Excel (Microsoft Excel LTSC Professional Plus 2024, 64-bit). (b) Comparative analysis of age, BMI, plasma levels of IL-6, ferritin, D-dimer, CRP, lymphocyte, WBC, and platelet count, hemoglobin levels, ALT, AST, urea, creatinine, and glomerular filtration rate (GFR) between uninfected and SARS-CoV-2 infected patients. The red or green scatter plots represent significantly up- or downregulated clinical markers, while orange indicates non-significant change.

Table 2 Clinical characteristics of the patients (n = 154) and control (n = 31) subjects.

Routine or pre-defined laboratory tests were conducted for each infected patient. These tests involved the measurement of various clinical markers in patient blood, including ferritin, D-dimer, IL-6, C-reactive protein (CRP), white blood cell (WBC) & lymphocyte count, platelet count, hemoglobin levels, alanine transaminase (ALT), aspartate aminotransferase (AST), urea, creatinine, and glomerular filtration rate (GFR). WBCs represent a broader category of cells involved in the body’s immune response than lymphocytes which are a specific subtype of white blood cells that play key roles in adaptive immunity, including T cells, B cells, and natural killer cells. For example, lymphopenia, which is a decrease in the number of lymphocytes in the blood, is a common finding in patients with severe COVID-19, while the total WBC count can vary in the same patients, depending on factors such as the presence of secondary bacterial infections or other inflammatory conditions. The levels of these clinical markers were then compared with either control or established normal ranges (Fig. 1; Table 2). A significant elevation was identified in the plasma concentrations of ferritin, CRP, and ALT among infected patients, surpassing the defined normal range. Induced levels of WBC, hemoglobin, and urea were also noted, but within the normal range, compared to the control. A marked reduction in GFR was observed compared to controls despite remaining within normal ranges, while a slight reduction in lymphocytes was observed. Although elevated levels of D-dimer and IL-6 were observed in infected patients, insufficient data precluded their inclusion in subsequent analysis.

Finally, in terms of vaccines, at the time of sample collection, SARS-CoV-2 vaccination was not yet widespread. Among the 13 healthy controls, 2 individuals (15.4%) had received Pfizer vaccinations, while the remaining 11 were unvaccinated. In the infected group, 17 of our infected patients (44.7%) were vaccinated: 6 patients had received the Pfizer vaccine, 9 had received Sinopharm, and 2 had received Covishield. The remaining 21 infected patients were unvaccinated.

SARS-CoV-2 infection significantly alters expression of a small subset of host MiRNAs

RNA was extracted from the nasopharyngeal swabs. Depending upon the amount and quality of the RNA, a select group of samples were analyzed by miRNAseq. Nevertheless, statistically significant number of samples were selected, including 13 control and 38 individuals infected with SARS-CoV-2. The average age of the control and infected patient was 47.62 ± 12.76 and 47.66 ± 12.92 years, respectively, while the male/female ratio was 3:1 for both groups (Table 3).

Table 3 Gender and age of the patients selected for MiRNAseq data analysis.

Figure 2 shows the quality of data and the overall results obtained from the miRNAseq analysis. The heatmap displaying the raw expression data (Fig. 2a), the bar graphs representing transcript expression profiles from individual samples (Fig. 2b), as well as the two groups with data represented as box plots (Fig. 2c), demonstrated consistent distribution within each sample from both groups, highlighting reliability of the data. The initial data analysis yielded a total of 1788 transcripts, with 1456 of them being previously known and 332 identified as novel miRNAs (Supplementary Data S1). Among the miRNAs detected, 1218 (67%) were expressed in both the control and infected groups, 78 (4.3%) miRNAs were uniquely expressed in the control group, while 492 (27.1%) were uniquely expressed in the infected group (Fig. 2d). Pearson correlation of miRNA expression changes between control and infected groups revealed that SARS-CoV-2 infection altered ~ 2.6% of the host miRNAs (Fig. 2e). Of the significantly altered miRNAs, 44 were known and 4 were novel miRNAs (29 up- and 15 downregulated; with p and Q values < 0.05) that may be able to distinguish between control from infected groups (Fig. 2f). Despite miRNAseq being a well-established and reliable technique that may not necessitate additional validation66, we took the extra step of confirming our overall miRNAseq findings. This was achieved by randomly selecting 15 RNA samples (7 control and 8 infected) that had been sequenced and subjecting them to TaqMan RT-qPCR assays for 16 miRNAs (two as endogenous controls). The results from our RT-qPCR analysis indicated that, with the exception of two miRNAs (miR-5010-3p and miR-2110), the remaining samples exhibited a similar trend in miRNA expression (non-significant variance) to what was observed in the miRNAseq analysis (Fig. 3). Our findings demonstrate a superiority over the previously established normal range, with a 15–20% non-concordance in gene expression between RT-qPCR and miRNAseq.

Fig. 2
figure 2

Summary of the miRNA expression analysis of the raw data. In this study, 13 control and 38 infected samples were subject to miRNAseq. MiRNA sequencing resulted in 1788 transcripts that were identified as potential miRNAs. The data was analyzed using the fully automated program Dr. Tom from BGI. (a) Heatmap of the hierarchal clustered raw data representing control and infected groups. (b) Tukey box plots showing the expression of miRNAs in individual control and infected samples after normalization. (c) Tukey box plot comparing the whole group level distributions of miRNAs expression data after normalization. (d) Pearson’s correlation plot representing the correlation (r) values between control and infected groups. (e) Venn diagram of intersection of miRNAs expressed in control and infected groups representing 1296 control and 1710 infected miRNAs in sub groups with fold change (FC) > 0, 1 and 2. (f) Volcano plot of the differentially regulated miRNAs. Red and green dots represent up- and downregulated miRNAs with FC ≥ 1.

Characterization of the differentially regulated miRNAs observed in infected patients

Next, we closely analyzed the 48 miRNAs with p/Q values < 0.05 observed in the infected patients. As shown, among the 44 known miRNAs, 29 were upregulated and 15 were downregulated (Fig. 4a and b). However, only 36 of the 44 showed a fold change of > 1, with 24 being upregulated and 12 downregulated (Fig. 4b). Additionally, 14 miRNAs (12 upregulated and 2 downregulated) showed a fold change greater than ± 2 (Fig. 4b). Notably, miR-146b-3p and miR-365b-3p were significantly upregulated, while miR-202-5p was markedly downregulated (as shown in Fig. 4c; Table 4). Among the four novel miRNAs examined, novel-miR-285-5p displayed significant upregulation, whereas the other three novel miRNAs, miR-115-5p, miR-189-5p, and miR-264-3p, were downregulated in patients infected with SARS-CoV-2 (Table 4).

Table 4 Differentially regulated known MiRNAs (n = 44) in infected patients (FC ≥ 0.5, Q < 0.05)*, **.
Fig. 3
figure 3

Quantitative RT-PCR validation of the miRNAseq data. The relative normalized expression values (log 2 of infected/control) of the individual samples obtained from the miRNAseq analysis (n = 13 control and 38 infected) were compared with those obtained from the RT-qPCR data (n = 7 control and 8 infected). The red and green box and violin graphs show up- and down-regulated miRNAs, while the blue depicts the results that contradict the findings of the miRNAseq analysis. The floating line for every bar represents the mean expression value. The p value shows difference in mean (student’s t-test) for each miRNA in the miRNAseq and RT-qPCR cohort.

Fig. 4
figure 4

Summary of the differentially regulated miRNAs in the infected patients. DEG analysis identified 48 miRNAs with p/Q values < 0.05 with 44 known and 4 novel miRNAs. (a) Heatmap of the hierarchal clustered DEGs in individual infected samples. The blue boxes represent downregulated miRNAs, while yellow boxes show upregulated miRNAs in infected samples when compared to the control. These boxes represent log2 (Infected/Control) values for individual infected samples. (b) Bar graph representing up- and downregulated miRNAs in infected vs. control groups. Red bars represent up- while green bar represent downregulated miRNAs. (c) Volcano plot of differentially expressed up- (red dots) and down- (green dots) regulated 14 miRNAs in the infected group when compared to the control.

miRNAs may serve as possible prognostic markers in defining SARS-CoV-2 infected patients

The primary objective of this study was to evaluate the potential of differentially regulated miRNAs as prognostic markers when identifying SARS-CoV-2-infected patients. This was accomplished by conducting ROC analysis of each miRNA that helps discriminate the true negatives from true positives. Thus, ROC curves were created for each known miRNA that showed differential regulation in our study (with a fold change of ± 2 or more), as well as for the novel miRNAs identified during our research. Furthermore, we constructed ROC curves for four previously-reported miRNAs with an FC < 2 in our study (miR125-5p, miR-151b, miR590-3p, and miR-625-5p) (Table 5), but that had been shown to have potential as biomarkers to test their overall diagnostic performance in our SARS-CoV-2-infected patients67. These ROC curves were drawn using normalized miRNA read counts levels observed in both the control and infected groups. The analysis of the ROC area under the curve (AUC) was used as a feature used to measure the accuracy of our biomarkers. Several miRNAs, including miR-146b-3p (AUC = 0.999, p < 0.0001), miR-154-5p (AUC = 0.891, p = 0.002), miR-335-3p (AUC = 0.874, p < 0.0001), and miR-30c-5p (AUC = 0.761, p = 0.004), demonstrated excellent discriminative ability between infected and control samples. (Fig. 5a). Among the novel miRNAs, N-miR-264-5p showed a high diagnostic value with an AUC of 0.902 (p < 0.0001), while N-miR-115-5p also displayed strong potential (AUC = 0.792, p = 0.001). (Fig. 5b).

Fig. 5
figure 5

Receiver Operating Characteristic (ROC) curve analysis of differentially regulated miRNAs in COVID-19 patients compared to healthy controls. ROC analysis was conducted using RNA sequencing data from 51 nasal swab samples (control: n = 13, infected: n = 38). (a) ROC curves for 12 upregulated and 2 down regulated known miRNAs showing FC ≥ ± 2 identified through RNA sequencing. (b) ROC curves of 4 novel miRNAs identified in this study. (c) ROC curves for previously reported circulating miRNAs validated using expression data from our cohort. (d) Combined ROC curves showing the improved discriminative power when multiple miRNAs were combined as biomarker panels. ROC curves were drawn for these reported miRNAs using read count expression data from our study. In all ROC plots, red and green lines represent predictive curves for upregulated and downregulated miRNAs, respectively; the dashed blue line indicates the random classifier (reference line), whereas orange line represents predictive curves for combined miRNAs. The Area Under the Curve (AUC) values and corresponding p-values are indicated for each miRNA.

The ROC curves for the previously reported circulating miRNAs associated with viral infections were evaluated using expression data from our study. MiRNAs such as miR-125-5p (AUC = 0.746, p = 0.007), miR-151b (AUC = 0.749, p = 0.006), and miR-590-3p (AUC = 0.823, p < 0.0001) demonstrated robust predictive performance, validating previous reports (Fig. 5c). The optimal sensitivity and specificity values for identified markers in our study ranged from 65.91 to 100% and 61.54–100%, respectively, revealing high specificity with low to zero false rates since an AUC value of 1 indicates 100% accuracy (Supplementary Data S2)68. The cut-off values for differentially regulated miRNA biomarkers were also determined based on normalized read counts to achieve an optimal balance between sensitivity and specificity using ROC curve analysis (Supplementary Data S2). This approach provides a robust estimate of the classification potential of the identified miRNAs within our cohort, aligning with methodologies previously described in biomarker research68,69.

To account for multiple comparisons, the FDR correction using the Benjamini-Hochberg method was applied to ROC p-values. Only FDR-adjusted p-values (p-adjusted) below 0.05 were considered statistically significant. This correction ensured that the significance of miRNAs as biomarkers is not due to random chance when evaluating multiple candidates. After correction, miR-146b-3p, miR-154-5p, miR-5010-3p, miR-127-3p, miR-335-3p, miR-30c-5p and miR-202-5p emerged as the most promising biomarker candidates (Fig. 5a). These miRNAs demonstrated strong discriminative power, with AUC values ranging from 0.75 to 0.99 and statistically significant p-values after multiple testing correction. Finally, combined ROC curve analyses, integrating multiple top-performing miRNAs (such as miR-146b-3p, miR-154-5p, miR-335-3p, miR-127-3p, miR-30c-5p, and miR-202-5p) significantly enhanced diagnostic performance, achieving AUC values between 0.939 and 0.972 (p < 0.0001) (Fig. 5d). These findings highlight the potential advantage of using multi-miRNA panels over individual biomarkers for improving COVID-19 diagnosis from nasal swab samples.

Clinical markers and miRNAs showed significant correlation among each other

In our study, we noticed notable changes in both clinical indicators and miRNA expression among individuals infected with SARS-CoV-2 when compared to those who were healthy. To delve deeper into their connections, we conducted a correlation analysis by calculating the Pearson’s coefficient that evaluates the linear relationship between variables (Fig. 6).

Fig. 6
figure 6

Correlation analysis and diagnostic performance of clinical markers and miRNA biomarkers in SARS-CoV-2 infection. Pearson’s correlation coefficient (r), p-values, and R² values were calculated to assess associations between clinical markers and differentially expressed miRNAs. Analyses were conducted using RNA sequencing data and clinical parameters from up to 51 samples (controls: n = 13; infected: n = 38). (a) Heatmap representing Pearson’s correlation coefficients among clinical markers, including ferritin, C-reactive protein (CRP), white blood cell count (WBC), alanine aminotransferase (ALT), urea, and glomerular filtration rate (GFR). Significant correlations are indicated by asterisks (*p < 0.05, **p < 0.01). (b) Heatmap showing Pearson correlation values among selected potential biomarker miRNAs. (c) Scatter plots illustrating significant individual correlations between specific clinical markers (ferritin, CRP, GFR, urea) and candidate miRNAs (N-miR-115-5p, N-miR-264-5p, N-miR-30c-5p, and miR-5010-3p). (d) ROC curve analysis assessing the diagnostic performance of individual clinical markers (ferritin, CRP, urea, and GFR), their combined performance, and performance in combination with selected miRNAs. Integration of miRNA expression profiles with clinical markers markedly improved diagnostic accuracy (AUC = 0.982, p < 0.0001).

This analysis uncovered significant associations not only within the clinical markers or miRNA expression levels, but also between these two categories. Within the clinical markers, we observed both positive and negative correlations. For instance, ferritin displayed significant positive correlations with CRP, WBC, and ALT, while CRP exhibited a positive correlation with urea but a negative correlation with GFR. Additionally, GFR showed a strong negative correlation with urea levels in COVID-19 infected patients (Fig. 6a).

Notably, we also detected significant correlation among differentially regulated miRNAs. For this analysis, we selected miRNAs with an AUC greater than 0.75 and a p-value less than 0.05. These miRNAs included miR-154-5p, miR-5010-3p, miR-335-3p, miR-30c-5p, miR-202-5p, miR-590-3p, miR-625-3p, novel-miR-115-5p, and novel-miR-264-5p (Supplementary Data S2 and Fig. 5). Our results revealed positive correlation between several miRNAs pairs, such as miR-154-5p/miR-5010-5p, miR-5010-3p/miR-30c-5p, miR-335-3p/miR-30c-5p, miR-30c-5p/miR-5010-3p, miR-202-5p/miR-625-5p, miR-590-3p/miR-625-5p, novel-miR-115-5p/novel-miR-264-5p, as well as miR-625-5p/miR-202-5p (Fig. 6b). Additionally, significant negative correlations were observed between miR-154-5p/novel-miR-264-5p, miR-5010-3p/novel-miR-115-5p, miR-335-3p/novel-miR-264-5p, miR-30c-5p/novel-miR-115-5p and novel-miR-264-5p (Fig. 6b).

Remarkably, we also detected meaningful associations between the expression values of clinical markers and miRNAs expression levels (Fig. 6c). Individual scatter plot analyses further confirmed significant correlations between selected miRNAs and key clinical markers. N-miR-115-5p levels negatively correlated with ferritin concentrations, while N-miR-30c-5p levels positively correlated with urea and CRP. Furthermore, miR-5010-3p demonstrated a positive association with urea and a negative association with GFR. Similarly, N-miR-264-5p levels were significantly associated with GFR, highlighting their potential relevance to kidney function during SARS-CoV-2 infection (Fig. 6c).

Next, ROC curve analysis was conducted to evaluate the diagnostic performance of individual clinical markers and their combination with miRNAs (Fig. 6d). Among the clinical markers, ferritin (AUC = 0.931), CRP (AUC = 0.844), GFR (AUC = 0.777), and urea (AUC = 0.753) demonstrated reasonable discriminative ability between infected and control groups. When clinical markers were combined, diagnostic performance improved (AUC = 0.951, p = 0.002). Importantly, integrating selected miRNA biomarkers with clinical markers further enhanced diagnostic accuracy, achieving an AUC of 0.982 (p < 0.0001), suggesting that a combined miRNA-clinical marker panel could serve as a highly sensitive and specific approach for detecting SARS-CoV-2 infection. Overall, our findings demonstrate a notable connection between the expression levels of specific miRNAs and clinical markers, revealing a biological association between the two. This association has the potential to serve as valuable prognostic markers for distinguishing SARS-CoV-2 infected patients from healthy individuals.

Differentially-expressed miRNAs share crucial biological pathways associated with disease severity

Next, we aimed at identifying the targets of the differentially-regulated miRNAs observed in our study. While numerous computer-based tools are available for predicting potential miRNA targets, we chose to focus solely on experimentally verified target genes for our specific miRNAs. To identify these targets, we conducted a thorough search of online databases, including miRTarBase, miRpathDB, and miRwayDB, which revealed that out of 14 miRNA that exhibited FC > ± 2, ten miRNAs, including miR-30c-5p, miR-132-3p, miR-127-3p, miR-18a-3p, miR-154-5p, miR-335-3p, miR-146b-3p, miR-202-5p, and miR-103a-2-5p targeted a total of 40, 34, 18, 7, 6, 5, 5, 3, and 2 human genes, respectively, while there were no experimentally-validated targets identified for the remaining four miRNAs (Table 5).

To gain further insights into whether the genes we predicted as experimental targets share various biological pathways, we initially constructed a protein network encompassing all proteins transcribed by these genes using the STRING database. Among the 122 genes, 118 were found to be interconnected (resulting in 118 nodes and 344 edges) at higher confidence STRING settings (Fig. 7a). Subsequently, we conducted searches for associated KEGG pathways, biological processes, molecular functions, and diseases using the DAVID database. DAVID analysis yielded 112 biological pathways (Supplementary Data S3) with a p-value < 0.05, including prominent pathways such as FoxO, MAPK, neurotrophin, PI3-AKT, prolactin, relaxin, TGF-β, AGE-RAGE, ErbB, and apelin signaling (Fig. 7b). The number of genes associated with each pathway are indicated by the size of the circle shown (Fig. 7b).

Table 5 Experimentally verified target genes associated with differentially regulated miRNAs.

Network analysis further revealed that many genes associated with multiple pathways could be regulated by similar miRNAs (Fig. 7c & Supplementary Data S4). For example, among the miRNAs identified in this study, the MAPK signaling pathway could be influenced by miR-132-3p, miR-18a-3p, miR-28-3p, miR-30c-5p, miR-202-5p, and miR-146-3p. Similarly, the crucial PI3-AKT pathway is linked to miR-132-3p, miR-18a-3p, miR-28-3p, miR-30c-5p, miR-335-3p, and miR-154-5p. The same was true of many other genes, including AGE-RAGE, Apelin, FoxO, Relaxin, Neurotrophin, Prolactin, ErbB, and chemokine.

Gene ontology analysis demonstrated that the predicted experimentally verified genes in this study regulate numerous biological processes, including transcription, apoptosis, morphogenesis, protein phosphorylation, and cell proliferation, among others, as shown in Fig. 7d & Supplementary Data S5) and molecular functions, such as protein binding, DNA binding, enzyme binding, miRNA binding, and transcription factor regulation, as illustrated in Fig. 7e & Supplementary Data S6). Furthermore, expression of these genes was associated with various diseases, especially many different types of cancers (breast, lung, ovarian, colorectal, prostate, oral, stomach, etc.), bone mineral density disease (osteoporosis), hypercholesterolemia, kidney failure, and human papillomavirus infection (Fig. 7f & Supplementary Data S7). These findings underscore the potential implications of dysregulated miRNAs in SARS-CoV-2 infected patients as they may have significant consequences.

Fig. 7
figure 7

Gene Ontology (GO) and pathway analysis of experimentally verified targeted genes. Ten of the differentially regulated miRNAs (FC > ± 2) identified in our study were used to predict their target genes and associated biological pathways. (a) Protein-protein interactions in miRNAs-associated targets during SARS-CoV-2 infection using STRING. Proteins showing two or more interactions were included in this figure. (b) Biological pathways associated with target genes in our study. (c) Interaction analysis demonstrating the connections between miRNAs (shown in blue), associated genes (shown in black), and the targeted pathways (shown in red). The blue lines show the connection between the miRNAs and their target genes, while the green lines represent the connections between miRNAs and the targeted pathways. This network of identified pathways, genes, and miRNAs was constructed and visualized using Cytoscape and resulted in 54 nodes and 154 edges. The genes associated with only one miRNA or pathway were excluded. (d) Biological processes associated with the targeted genes. (e) Molecular functions associated with the targeted genes. (f) Association between the targeted genes and various diseases.

Continue Reading