The primary goal of this study was to predict muscle response to FES therapy from baseline sEMG signals for individuals with cervical SCI. sEMG has the potential to be integrated into point-of-care tools to provide biomarkers for clinical decision support and enable precision rehabilitation approaches. By knowing which muscles are likely to benefit, therapists can make informed decisions about how to use the limited therapy time available. We evaluated classification models trained on clinical variables alone, sEMG features alone, and combinations of both. Leave-one-participant-out (LOPO) cross-validation was used to assess robustness across participants.
Our findings indicate that sEMG feature sets, particularly the FWD feature set (slope sign changes, mean and median frequencies, and M2), consistently outperform models using clinical variables alone. The FWD feature set combined with a random forest classifier achieved the highest performance across multiple metrics, including MCC (0.41), accuracy (0.76), and macro F1 (0.68). Combining clinical variables with sEMG features did not improve model performance, underscoring the unique predictive value of sEMG features. Additionally, training models on specific AIS subgroups (motor complete and motor incomplete) improved performance, particularly in AIS A-B, compared to models trained on the entire dataset.
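As an illustration of the LOPO scheme, the following is a minimal sketch of how the folds could be generated, assuming each sample (muscle) is tagged with a participant identifier. The function name `lopo_splits` and the participant labels are hypothetical and are not taken from the study's code.

```python
from collections import defaultdict


def lopo_splits(participant_ids):
    """Yield (held_out_id, train_idx, test_idx), holding out one participant per fold."""
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)
    for held_out in sorted(by_participant):
        # All samples from the held-out participant form the test set.
        test_idx = by_participant[held_out]
        # Every other participant's samples form the training set.
        train_idx = sorted(
            i
            for pid, rows in by_participant.items()
            if pid != held_out
            for i in rows
        )
        yield held_out, train_idx, test_idx


# Hypothetical example: five muscles recorded from three participants.
participants = ["P1", "P1", "P2", "P3", "P3"]
folds = list(lopo_splits(participants))
```

Because all muscles from a participant are held out together, no participant contributes to both the training and test sets of the same fold.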
Limitations of clinical variables alone
Although the progression of AIS grade from A to D correlates with an increasing percentage of responders (Fig. 1), AIS was not identified as the single most important predictor in logistic regression with forward feature selection. The selected clinical variables also included NLI, Proximity, and Distance. The logistic regression model trained on these variables demonstrated moderate performance (Table 2), with MCC, macro F1 score, accuracy, precision, and TNR all above chance level. Recall (0.21), however, was below the expected chance level (0.33), indicating the difficulty of correctly identifying true responders. This low recall suggests that some muscles that could benefit from FES therapy might be overlooked.
When evaluating the logistic regression model’s performance per AIS grade (Fig. 3), the limitations of relying solely on clinical variables become more evident, particularly for AIS A, B, and C, where MCC, precision, and recall are all zero. The poor performance on AIS B is especially notable, as it includes a larger number of participants than AIS A and C. Training the model on AIS subgroups slightly improved performance for the motor incomplete group (AIS C-D) but did not enhance results for the motor complete group (AIS A-B). In fact, the model predicted all AIS A-B muscles to be non-responders, resulting in 14 (25%) false negatives. With no positive predictions, MCC, precision, and recall remained at zero. If this model with only clinical variables were relied upon, no muscles from patients with AIS A or B would be selected for FES therapy, or potentially for other treatments.
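To make the all-negative case concrete, the sketch below computes the reported metrics from confusion-matrix counts. The 14 false negatives come from the text; the 42 true negatives are an assumption, obtained by reading "14 (25%)" as a fraction of all AIS A-B muscles. Setting MCC, precision, and recall to zero when their denominators vanish is a common convention assumed here, not necessarily the one used in the study.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Undefined ratios (zero denominators) are reported as 0.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "tnr": tnr,
        "mcc": mcc,
    }


# Assumed counts: 14 responders all missed, 42 non-responders all correct.
all_negative = binary_metrics(tp=0, fp=0, fn=14, tn=42)
```

Under these assumed counts, accuracy and TNR look favorable even though no responder is identified, which is why MCC and recall are the more informative metrics for this subgroup.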
Prediction with sEMG features
In contrast, models using sEMG features alone consistently outperformed those based solely on clinical variables, particularly the FWD set with RF. The FWD set achieved the highest values in most metrics, including MCC (0.41), accuracy (0.76), and macro F1 (0.68), though it did not achieve the highest TNR due to the class imbalance. When trained on all participants, the FWD set was the only one to obtain above-chance performance across all AIS grades (Fig. 2), suggesting that it captures essential sEMG characteristics relevant to predicting the response to FES therapy.
The FWD feature set (SSC, M2, mean and median frequencies) captures diverse aspects of motor unit firing and recruitment patterns by integrating both time- and frequency-domain information. This breadth is likely key to its strong predictive performance, as no single feature alone can fully characterize the neuromuscular output, especially after SCI. For example, M2, a time-domain feature characterizing frequency-domain behavior, quantifies the temporal variability of the signal by computing the squared difference between consecutive time samples. Higher M2 values may reflect more abrupt signal changes and complex activation patterns, which contributed towards predicting the positive (responder) class in the SHAP analysis (Appendix C). SSC is related to the frequency of slope changes and is indicative of motor unit firing irregularity. Mean and median frequencies, which summarize the distribution of spectral power, also contributed but with greater variability across samples. These findings highlight the physiological relevance of the selected features and support the use of diverse and broad feature sets to improve prediction robustness in the heterogeneous SCI population.
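For readers less familiar with these features, the following is a minimal sketch of how the four FWD features could be computed from a single sEMG window. It uses common textbook definitions and is not the study's implementation. The zero SSC threshold, the unnormalized sum in `m2`, the naive DFT, and the synthetic 8 Hz test signal are all assumptions made for illustration.

```python
import cmath
import math


def slope_sign_changes(x, threshold=0.0):
    """Count samples where the slope changes sign (local peaks and troughs)."""
    return sum(
        1
        for i in range(1, len(x) - 1)
        if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > threshold
    )


def m2(x):
    """Sum of squared differences between consecutive samples (assumed unnormalized)."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


def power_spectrum(x, fs):
    """One-sided power spectrum via a naive DFT (fine for short windows)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def mean_frequency(freqs, power):
    """Power-weighted average frequency."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


def median_frequency(freqs, power):
    """Lowest frequency at which cumulative power reaches half of the total."""
    half = sum(power) / 2
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]


# Synthetic stand-in for an sEMG window: an 8 Hz sine sampled at 64 Hz for 1 s.
fs = 64.0
signal = [math.sin(2 * math.pi * 8 * t / fs) for t in range(64)]
freqs, power = power_spectrum(signal, fs)
features = {
    "ssc": slope_sign_changes(signal),
    "m2": m2(signal),
    "mnf": mean_frequency(freqs, power),
    "mdf": median_frequency(freqs, power),
}
```

For a pure sinusoid the mean and median frequencies coincide at the sine frequency; real sEMG has a broad spectrum, so the two values generally differ and carry complementary information.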
Moreover, Fig. 3 shows that training models specifically on motor completeness subgroups (AIS A-B vs. C-D) leads to further performance improvement for RF with the FWD set. This subgroup-specific training enhances MCC, accuracy, macro F1, precision, and TNR, particularly for AIS A-B, highlighting the advantages of tailoring models to motor completeness levels. This approach appears to capture more distinct sEMG patterns within each subgroup, allowing for improved classification performance. Given the small dataset, we did not separate subgroups further by individual AIS grade, though this may provide further improvements with a larger sample size.
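The subgroup-specific setup can be sketched as a simple partition of the samples before model training. This is an illustrative sketch only: the record layout (a dict with an `"ais"` key) and the function names are hypothetical, and the source does not specify how the partition was implemented.

```python
def motor_completeness_group(ais_grade):
    """Map an AIS grade to its motor completeness subgroup."""
    grade = ais_grade.strip().upper()
    if grade in ("A", "B"):
        return "motor_complete"
    if grade in ("C", "D"):
        return "motor_incomplete"
    raise ValueError(f"Unexpected AIS grade: {ais_grade!r}")


def split_by_motor_completeness(samples):
    """Partition sample records so a separate model can be trained per subgroup."""
    groups = {"motor_complete": [], "motor_incomplete": []}
    for sample in samples:
        groups[motor_completeness_group(sample["ais"])].append(sample)
    return groups


# Hypothetical records: one entry per muscle, tagged with the participant's AIS grade.
samples = [
    {"participant": "P1", "ais": "A"},
    {"participant": "P2", "ais": "D"},
    {"participant": "P3", "ais": "b"},
    {"participant": "P4", "ais": "C"},
]
subgroups = split_by_motor_completeness(samples)
```

Each subgroup would then be trained and evaluated on its own, which reduces the number of samples available per model and is one reason finer stratification by individual AIS grade needs a larger dataset.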
Impact of imbalanced dataset on recall (TPR) and TNR
The consistently high TNR across feature sets and models can be attributed to the dataset’s class imbalance, where non-responders are more frequent than responders. This imbalance leads to models that are effective in identifying non-responders but struggle with recall, particularly in AIS A and C, where MCC scores were close to or below zero (Fig. 2) for most models. This low recall indicates that while models perform well in identifying non-responders, they may overlook true responders, limiting their practical utility. Results from Fig. 4 suggest that subgroup-specific training partially alleviates this issue by improving recall within more homogeneous groups, especially in the motor incomplete subgroup, where the model appears better suited to capturing true responder characteristics.
Variability across muscles and participants
The variability in precision across participants for RF on the FWD set (Fig. 5) underscores the challenge of achieving consistent responder classification. Precision had the highest variability, followed by MCC, TNR, and recall. This variability suggests that while the FWD feature set with RF generally performs well, individual differences in muscle response create inconsistencies in predicting true responders. Notably, the high variability in precision and MCC indicates that certain participants’ sEMG signals are easier to classify than others, possibly due to differences in baseline after SCI or other individual-specific factors. The results in Fig. 5 should, however, be interpreted with caution, as the low number of muscles per participant may affect the robustness of the metrics in this portion of the analysis.
We recognize that differences in muscle type and size may influence sEMG signal characteristics and classification outcomes. Although this variability was not stratified in the current analysis, future studies with more data may explore muscle-specific stratification approaches.
In the context of existing literature
There is extensive literature on predicting functional recovery after SCI [23, 24, 49]. However, to the best of our knowledge, no prior study has predicted muscle response to FES therapy, a promising intervention for restoring motor function. Our study is the first attempt to address this gap in the literature and focuses on muscle-level prediction of FES therapy outcomes. While clinical variables such as AIS grade and NLI provide general prognostic information [22,23,24], our findings indicate that baseline sEMG features, particularly the FWD feature set, are more effective for predicting responses to FES therapy. Unlike available clinical information, sEMG captures neuromuscular activation patterns that reflect residual motor connectivity. The FWD feature set combined with a random forest classifier consistently outperformed models using clinical variables alone, suggesting that sEMG features capture unique, functionally relevant information at the muscle level. Notably, combining clinical variables with sEMG features did not enhance model performance, reinforcing the unique predictive value of sEMG alone.
Prior studies have shown that stratifying SCI patients into specific subgroups based on motor completeness or baseline neurological impairment can improve prognostic accuracy, as in the Unbiased Recursive Partitioning regression with Conditional Inference Trees (URP-CTREE) model [50]. The URP-CTREE model has been used to stratify patients with acute traumatic SCI into homogeneous subgroups to optimize recovery predictions and enhance the design of clinical trials. In a similar spirit, we explored training separate models for motor complete (AIS A-B) and motor incomplete (AIS C-D) groups. Our findings similarly suggest that subgroup-specific training improves classification performance, particularly in identifying responders, by allowing models to capture subgroup-specific sEMG patterns related to motor completeness.
Choice of MMT as an outcome measure
MMT is a practical and commonly used clinical assessment for muscle strength and was used as the primary outcome measure. Because of its simplicity and accessibility in a clinical setting, MMT was administered before each FES therapy session to track the target muscle strength, without the need for additional scheduling. To ensure consistency across sessions and raters (therapists), we implemented standardized protocols from ISNCSCI and GRASSP.
While not feasible for frequent longitudinal data collection, other modalities such as imaging or motor evoked potentials could provide quantitative insights into muscle structure and corticospinal connectivity in response to FES therapy. Combining these tools with baseline sEMG could offer a comprehensive evaluation framework, capturing both functional and structural aspects of recovery. A multi-modal approach with measurements before and after FES therapy, or at multiple timepoints throughout the therapy cycle, could help refine responder identification and provide more accurate evaluation, enabling more robust predictive model development.
Limitations and future directions
In this section, we discuss several limitations that should be considered when interpreting the findings and future research directions.
Expanding dataset diversity and demographics
First, there was only one female participant (less than 6%), which does not reflect the proportion seen in the SCI population [51] and restricts the study’s generalizability across sex. A more balanced sample would provide a clearer understanding of potential differences in response to FES therapy between male and female participants.
The dataset also has a higher number of non-responders (67%) than responders. This imbalance likely contributed to the high true negative rate (TNR) observed across models, as well as the relatively low recall, indicating that the models may be better at identifying non-responders than true responders. While we attempted to compensate for this effect by evaluating models with robust metrics such as MCC and F1 score, future studies with more balanced responder and non-responder groups would help validate these findings and improve the model’s sensitivity to true responders.
AIS D injuries (45%) and C3–C4 levels of injury (74%) are also overrepresented. AIS D is often associated with more treatment options and a better prognosis, whereas individuals with motor complete injuries (AIS A or B) often face lower expected recovery potential. Our results show that clinical information alone typically predicts all muscles in AIS A cases to be non-responders, effectively closing the door to FES therapy for this subgroup. This exclusion is problematic, as AIS A patients represent a group in dire need of interventions. These imbalances hinder the generalizability of findings to the broader SCI population, which exhibits great demographic and clinical diversity.
While we obtained promising results by training models on specific AIS subgroups, the relatively small sample size prevented further stratification by individual AIS grades. Future studies with larger datasets may benefit from more granular subgroup analysis to capture subtle differences in muscle response within each AIS grade, potentially enhancing predictive accuracy.
Although our primary goal in this study was to explore generalizable predictive patterns in baseline sEMG across target muscles for FES therapy, a larger dataset would allow for investigation of the impact of muscle anatomical variability, including muscle type (e.g., biarticular vs. uniarticular) and size. Along with sex and other person- and muscle-level variables, muscle type and size could be explored as predictive variables.
Beyond binary classification
In this study, binary classification results were used to evaluate model performance. With a larger dataset, future work could move beyond binary prediction to estimate changes in MMT scores directly. Predicting both the magnitude and the timing of MMT improvement could provide clinicians with more detailed guidance for treatment planning and help set realistic expectations for patients. Also, the confidence level of the prediction could be investigated to indicate the likelihood of responding, providing additional decision support to treating therapists beyond the current binary classification.
Integration of multi-channel perspectives
The experiment was designed with a specific clinical point-of-care implementation in mind: take sEMG measurements from a potential target muscle during voluntary contractions based on MMT protocols, and predict its responsiveness to FES therapy in real time or within a short amount of time. As such, simplicity and clinical feasibility were the top priorities: a simple setup with bipolar electrodes, no posture restrictions, and only one recording session. Analysis was also done on individual muscles rather than on multiple muscles together.
While this approach aligns with the implementation goals, it limits the depth of electrophysiological insights. Incorporating signals from multiple channels to analyze agonist and antagonist interactions or co-activation patterns could be beneficial. In our experiments, activation of non-target muscles was often observed even though only the target muscle was voluntarily contracted. Compared to using information solely from the target muscle, these patterns could provide more information about the systemic effect of SCI, leading to a more robust muscle-specific prediction.
