A prospective cohort study of households with a SARS-CoV-2-positive index patient (IP) and SARS-CoV-2-negative household contacts (HHCs) was conducted, primarily to evaluate viral transmission dynamics and biomarker profiles (Groh et al., 2024). IPs were recruited at the time of their initial SARS-CoV-2 diagnosis, which was confirmed by reverse transcription-polymerase chain reaction (RT-PCR) at the acute care clinic at University Hospital Frankfurt. Under the planned longitudinal follow-up protocol, both the IPs and their HHCs were tested regularly with RT-PCR over a 30-day period: IPs were recruited ≤ 48 h after the diagnostic RT-PCR and tested daily on days 0–7, then every 3–4 days until day 30 ± 6. The SARS-CoV-2 infection status of each IP and HHC was evaluated using two RT-PCR tests on nasal swab specimens. A computer-assisted self-interview (CASI) questionnaire was developed for this study (Additional Material 1). An Excel document was created from the questionnaire and, at each prescribed visit, the questionnaire was given to participants to gather information about the type and severity of up to 12 symptoms (Table 1). Full study details have been described previously [4]. Ethical approval was obtained from the Ethikkommission des Fachbereichs Medizin der Goethe-Universität c/o Universitätsklinikum (reference number: 2021-119-MPG). All procedures were performed in accordance with the relevant guidelines and regulations, and each participant provided informed consent.
Data points used for analyses
This analysis includes the nine HHCs who became positive for SARS-CoV-2 during the study period after suspected exposure to the infected IP in their household. Each HHC had between 7 and 13 visits and was tested for SARS-CoV-2 infection at each visit. Once an HHC tested positive, viral load (VL) was determined at that visit and at all subsequent visits. Multiple test types and repeat testing were conducted at each visit (see section “Quantitative analyses of SARS-CoV-2”). Negative nucleic acid amplification test (NAAT) results were imputed as zero for analysis purposes. This approach simplifies interpretation and assumes an undetectable VL for negative results. Any visits without testing or symptoms data were excluded from further analysis.
The resulting numbers of data points for each analysis type were as follows: qualitative analyses, all non-antibody HHC tests and visits (n = 1410); quantitative analyses, E gene SARS-CoV-2 tests and visits (n = 89).
Number of symptoms derivation
Each of the 12 possible patient symptoms was coded as an indicator variable: “1” if the symptom was reported and “0” otherwise. The total number of symptoms (NoS) was derived as the sum of the 12 indicator variables at each visit, and was further grouped into three categories (NoSC): 0 symptoms, 1–2 symptoms and 3+ symptoms. Note that a simpler (categorized) representation of the number of symptoms may still capture the essential statistical information contained in the continuous version. This aligns with the statistical principle of sufficiency (Casella & Berger, 2002), whereby a reduced statistic can retain the same, or nearly the same, inferential content about the parameter of interest as the full dataset.
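The derivation above can be sketched in a few lines of Python. This is a minimal illustration only: the symptom keys and the function names are hypothetical, and treating a missing symptom as "not reported" is an assumption rather than a rule stated by the study.

```python
# Hypothetical keys for the 12 symptom items (the study's wording is in Table 1).
SYMPTOMS = [
    "cough", "breathing_difficulty", "fatigue", "body_aches", "headache",
    "sore_throat", "congestion", "nausea", "vomiting", "diarrhoea",
    "loss_of_taste", "loss_of_smell",
]


def number_of_symptoms(indicators):
    """Sum the 0/1 indicators of the 12 symptoms for one visit (NoS).

    Assumption: a symptom missing from the record counts as not reported.
    """
    return sum(1 if indicators.get(name, 0) else 0 for name in SYMPTOMS)


def nos_category(nos):
    """Group NoS into the three NoSC categories: 0, 1-2, and 3+ symptoms."""
    if nos == 0:
        return "0"
    if nos <= 2:
        return "1-2"
    return "3+"


# One made-up visit record with three reported symptoms.
visit = {"cough": 1, "fatigue": 1, "headache": 1}
nos = number_of_symptoms(visit)
print(nos, nos_category(nos))
```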
Symptoms severity score derivation
A symptoms severity score (S3) was derived from the patient self-reported symptoms as follows. First, we assigned values to the reported symptoms based on their severity (Table 1). The severity values for eight symptoms (cough, breathing difficulty, fatigue, body aches (myalgia), headache, sore throat, congestion/runny nose, and nausea) were assigned as follows: “None”=0, “Mild”=1, “Moderate”=2, and “Severe”=3, based on patient responses. For vomiting and diarrhoea, the assigned values were “0”=0, “1–2”=1.5, and “3–4”=3.5, based on symptom frequency. For loss of taste and loss of smell, the assigned values were “Same as usual”=0, “Less than usual”=1.5, and “No taste or smell”=3.5.
The S3 was then calculated by summing the recoded values across all visits for each subject to obtain a comprehensive measure of symptoms severity (see Supplementary Material S.1 and Additional Material 1).
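The recoding and summation can be illustrated with the short Python sketch below. The recoded values are those given in the text. The field names, the ASCII hyphen in the frequency labels, the zero default for unanswered items, and the choice to score a single questionnaire response are all assumptions made for illustration.

```python
# Recoded values as described in the text. Frequency labels use an ASCII hyphen here.
SEVERITY = {"None": 0, "Mild": 1, "Moderate": 2, "Severe": 3}
FREQUENCY = {"0": 0, "1-2": 1.5, "3-4": 3.5}
TASTE_SMELL = {"Same as usual": 0, "Less than usual": 1.5, "No taste or smell": 3.5}

# Hypothetical field names, grouped by the response scale that applies to each item.
SCALES = {
    "cough": SEVERITY, "breathing_difficulty": SEVERITY, "fatigue": SEVERITY,
    "body_aches": SEVERITY, "headache": SEVERITY, "sore_throat": SEVERITY,
    "congestion": SEVERITY, "nausea": SEVERITY,
    "vomiting": FREQUENCY, "diarrhoea": FREQUENCY,
    "loss_of_taste": TASTE_SMELL, "loss_of_smell": TASTE_SMELL,
}


def s3_score(response):
    """Sum the recoded values of the 12 items in one questionnaire response.

    Assumption: an unanswered item contributes 0 to the score.
    """
    total = 0.0
    for item, scale in SCALES.items():
        answer = response.get(item)
        if answer is not None:
            total += scale[answer]
    return total


# Made-up response: moderate cough, vomiting 1-2 times, reduced sense of smell.
example = {"cough": "Moderate", "vomiting": "1-2", "loss_of_smell": "Less than usual"}
print(s3_score(example))
```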
For a simpler representation of symptoms severity, the S3 values were grouped into three categories (S3C): an S3 value of 0 indicated asymptomatic individuals, a score of 1–2 denoted mildly to moderately symptomatic individuals, and a score of 3 or higher represented severely symptomatic individuals. Note that these categories refer to the actual survey response choices in this study (see Table 1) and should not be confused with the CDC [5] or WHO [6] definitions.
Quantitative analyses of SARS-CoV-2
Quantitative analyses were carried out using the Cobas® SARS-CoV-2 assay for use on the Cobas 6800/8800 System (Roche Diagnostics). NAATs were conducted for the SARS-CoV-2 E gene and ORF1 gene targets. Of the two cycle threshold (Ct) values reported, only the E gene Ct was used, as the results were very similar between the two targets for all SARS-CoV-2-positive patient visits. The quantitation scheme used to transform this semi-quantitative result can be found in Supplementary Material S.2.
Statistical analysis
S3 construct assessment
The internal consistency of the S3 construct was assessed using the standardized Cronbach’s alpha (Ca), a measure of reliability [7]. To confirm the underlying dimensionality of the reduced 12-item set, a factor analysis with varimax rotation was conducted [8,9,10,11].
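For readers unfamiliar with the standardized form of Cronbach's alpha, it depends only on the number of items k and the mean inter-item correlation r, as k·r / (1 + (k − 1)·r). The Python sketch below illustrates that formula on made-up item scores. It is not the SAS procedure used in the study, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_alpha(items):
    """Standardized Cronbach's alpha from a list of k item columns.

    Uses alpha = k * r / (1 + (k - 1) * r), where r is the mean of the
    pairwise correlations between items.
    """
    k = len(items)
    pairs = list(combinations(range(k), 2))
    mean_r = sum(pearson(items[i], items[j]) for i, j in pairs) / len(pairs)
    return k * mean_r / (1 + (k - 1) * mean_r)


# Made-up scores for three items over five questionnaire responses.
items = [
    [0, 1, 2, 3, 1],
    [0, 1, 3, 2, 1],
    [1, 0, 2, 3, 2],
]
print(round(standardized_alpha(items), 3))
```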
Descriptive statistics
The Spearman rank correlation coefficient was used to assess the correlation between S3 and the number of symptoms (NoS). To assess the association between S3C and NoSC, the Rao-Scott modified chi-squared test was used. Pearson product-moment correlations were also calculated for normally distributed variables and are reported where appropriate. Given that S3 is mathematically derived from the same 12 symptom items counted in NoS, this head-to-head comparison illustrates near-collinearity and supports use of the simpler NoS when an ordinal scale is unavailable.
In addition, cross-tabulations of S3C and NoS against the qualitative VL result (i.e., positive/negative) were constructed. A Taylor linearization approach was used to account for multiple visits per participant [12], and the analysis was stratified by study testing assay.
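As an illustration of the rank correlation, the Python sketch below computes the Spearman coefficient as the Pearson correlation of average ranks, which handles tied values. The data are made up, the function names are hypothetical, and the sketch treats each visit as an independent observation. It therefore does not reproduce any adjustment for repeated visits.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Made-up visit-level values of S3 and NoS.
s3_values = [0.0, 1.5, 5.0, 3.0, 7.5, 1.5]
nos_values = [0, 1, 3, 2, 4, 2]
print(round(spearman(s3_values, nos_values), 3))
```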
Adjusted descriptive statistics, including the mean, median, standard deviation, minimum, and maximum values of log10 copies/mL E gene SARS-CoV-2 VL (see Supplementary Material), were calculated for results in each of the S3C and NoSC categories.
Additional normalizing transformations using the logarithm base 10 (log10) were performed as needed on S3 and NoS continuous-level variables (i.e., S3Log and NoSLog, respectively).
Modelling
A generalized estimating equation (GEE) marginal model [13,14,15,16], accounting for the correlation among multiple visits within the same participant, was fitted to model the relationship between the following:
A. S3C and the binary test outcome (i.e., RNA positive/negative);
B. NoSC and the binary test outcome;
C. S3Log and the Log10 VL outcome;
D. NoSLog and the Log10 VL outcome;
E. S3 and the Log10 VL outcome;
F. NoS and the Log10 VL outcome (see Supplementary Material 3 for C–F).
The models for binary test outcomes employed a binomial distribution and logit link, adjusted for within-subject correlated visit data with a specified working correlation structure. The continuous outcome models with Log10 VL used a normal distribution and identity link and were similarly adjusted for correlated data with a specified working correlation structure. To model the within-subject dependencies, we evaluated three working correlation structures: (a) compound symmetric (exchangeable), (b) first-order autoregressive, AR(1), and (c) independent (SAS Institute, 2016). To find the optimal working correlation structure for each model, the quasi-likelihood under the independence model criterion (QIC) goodness-of-fit (GOF) statistic [15, 17] was computed, with a lower value indicating a better fit of the model with the specific correlation structure. Table 2 shows the GEE models considered.
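To make the three candidate structures concrete, the Python sketch below builds the corresponding working correlation matrices for a participant with a given number of visits. It is illustrative only: the function name is hypothetical, the correlation parameter alpha is a made-up value (in practice it is estimated during GEE fitting), and the AR(1) lag is taken as the difference in visit index, which assumes equally spaced visits.

```python
def working_correlation(structure, n_visits, alpha=0.0):
    """Return an n_visits x n_visits working correlation matrix as nested lists.

    structure: "independent", "exchangeable", or "ar1".
    alpha: correlation parameter (hypothetical value, normally estimated).
    """
    def entry(j, k):
        if j == k:
            return 1.0
        if structure == "independent":
            return 0.0                   # no correlation between visits
        if structure == "exchangeable":
            return alpha                 # same correlation for every pair of visits
        if structure == "ar1":
            return alpha ** abs(j - k)   # correlation decays with the visit lag
        raise ValueError(f"unknown structure: {structure}")

    return [[entry(j, k) for k in range(n_visits)] for j in range(n_visits)]


for name in ("independent", "exchangeable", "ar1"):
    print(name)
    for row in working_correlation(name, 3, alpha=0.5):
        print("  ", row)
```

Each candidate matrix would be supplied to the GEE fit in turn, and the structure with the lowest QIC retained.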
Models 1 and 2 are equivalent to logistic regression but, within the GEE framework, they account for the data correlation induced by multiple visits per patient. These two GEE models are specified using a binomial distribution, a logit link function, and an assumed (working) correlation structure for the data. Analogously, Models 3A, 3B, 4A, and 4B correspond to linear regression and, under the GEE framework, also address data correlation. These models employ a normal distribution and identity link function while assuming a working correlation structure to account for within-subject dependencies.
Estimates from the categorical analysis models are reported as odds ratios (ORs) along with their 95% confidence intervals. P-values less than 0.05 were considered statistically significant. All analyses were carried out using SAS v9.4 [12].
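For reference, the model forms described above can be written in standard GEE notation. The symbols are not taken from the source and are introduced here as assumptions: i indexes participants, j indexes visits, x_ij is the symptom predictor (S3C, NoSC, S3, NoS, or a log-transformed version), D_i is the derivative of the mean vector with respect to the coefficients, A_i is the diagonal matrix of variance functions, R(alpha) is the working correlation matrix, and phi is the scale parameter. The last line gives the usual definition of QIC, where Q is the quasi-likelihood evaluated under independence, Omega_I is the inverse of the model-based covariance under independence, and V_R is the robust covariance estimate under working structure R.

```latex
\begin{align}
\text{Models 1--2:}\quad
  & \operatorname{logit}\{\Pr(Y_{ij}=1)\} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} \\
\text{Models 3A--4B:}\quad
  & \operatorname{E}(Y_{ij}) = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} \\
\text{Estimating equations:}\quad
  & \sum_{i} \mathbf{D}_i^{\top}\mathbf{V}_i^{-1}\,(\mathbf{y}_i-\boldsymbol{\mu}_i)=\mathbf{0},
  \qquad
  \mathbf{V}_i = \phi\,\mathbf{A}_i^{1/2}\,\mathbf{R}(\alpha)\,\mathbf{A}_i^{1/2} \\
\text{Model selection:}\quad
  & \mathrm{QIC}(R) = -2\,Q\bigl(\hat{\boldsymbol{\beta}}(R);\,I\bigr)
    + 2\operatorname{tr}\bigl(\hat{\boldsymbol{\Omega}}_I\,\hat{\mathbf{V}}_R\bigr)
\end{align}
```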
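As a brief illustration of how an odds ratio and its confidence interval follow from a fitted logit-scale coefficient, the Python sketch below exponentiates the estimate and the limits of a Wald interval. The use of a Wald interval is an assumption, since the text does not state how the intervals were constructed. The coefficient and standard error shown are made-up values, and the function name is hypothetical.

```python
from math import exp


def odds_ratio_ci(beta, se, z=1.959964):
    """Return (OR, lower, upper) for a logit-scale coefficient.

    beta: estimated log odds ratio; se: its standard error.
    z: standard normal quantile for a two-sided 95% Wald interval (assumed).
    """
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Made-up coefficient and standard error, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(beta=0.9, se=0.4)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```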