Techniques for measuring sternum length vary across studies. Some researchers have used bone collections [1, 6, 21], while others have used measurements made at autopsy [7, 8]. Several studies have also been conducted using chest radiographs [22, 23] and computed tomography of the sternum [5, 21]. In the present study, we measured sternum length on thoracic CT.
Consistent with previous studies, the difference in sternal measurements between the sexes was statistically significant in our study [7, 21, 22]. The female sternum was generally smaller than the male sternum. This finding agrees with previous studies conducted in Indian, Croatian, and Japanese populations, in which male sternal dimensions were reported to be significantly larger than those of females [22, 23, 24]. These differences are probably due to differences in body size between the sexes or to endocrine factors that affect the growth of the sternum differently in males and females.
In our study, SI was the only variable that was greater in women than in men. This inverse relationship was also observed by Ramadan et al. [14] and Das et al. [1], suggesting that SI may be a population-sensitive indicator of sex.
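The direction of this result is easier to follow with the index written out. The sketch below assumes the common definition of the sternal index, manubrium length divided by sternal body length, multiplied by 100. That definition, the function name `sternal_index`, and the example lengths are assumptions made for illustration and are not taken from the data of this study.

```python
def sternal_index(manubrium_mm: float, body_mm: float) -> float:
    """Sternal index (SI) under the assumed definition: 100 * manubrium / body."""
    if manubrium_mm <= 0 or body_mm <= 0:
        raise ValueError("lengths must be positive")
    return 100.0 * manubrium_mm / body_mm


# Hypothetical lengths in millimetres, chosen only to illustrate the calculation.
example_short_body = sternal_index(manubrium_mm=48.0, body_mm=90.0)
example_long_body = sternal_index(manubrium_mm=52.0, body_mm=110.0)
print(round(example_short_body, 1), round(example_long_body, 1))
```

Because SI is a ratio, a sternum with smaller absolute lengths can still have the larger index when its body is proportionally shorter, which is how SI can be greater in women while every linear measurement is greater in men.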
The study results showed that deep neural networks (DNN) applied to CT images yielded the lowest performance among the evaluated models, while K-nearest neighbor (KNN), random forest, XGBoost, naïve Bayes, and logistic regression performed better. The best performance was observed with linear discriminant analysis (LDA), consistent with previous studies [18, 21, 22, 25]. However, when comparing the area under the curve (AUC) values of the lowest (DNN) and highest (LDA) performing models, the difference was not statistically significant.
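One way to examine such a difference in AUC, shown here only as a sketch and not necessarily the procedure used in this study, is a paired bootstrap over cases. The scores below are synthetic, and the names `auc` and `bootstrap_auc_difference` are hypothetical helpers written for this illustration.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=500, seed=0):
    """Percentile 95% interval for AUC(a) - AUC(b), resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # skip resamples that lost one class
            continue
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(y, a) - auc(y, b))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Synthetic scores for two hypothetical models evaluated on the same 100 cases.
rng = random.Random(42)
labels = [i % 2 for i in range(100)]
scores_a = [y + rng.gauss(0, 0.6) for y in labels]
scores_b = [y + rng.gauss(0, 0.7) for y in labels]
low, high = bootstrap_auc_difference(labels, scores_a, scores_b)
print(f"AUC a={auc(labels, scores_a):.3f}, b={auc(labels, scores_b):.3f}, "
      f"95% CI of difference=({low:.3f}, {high:.3f})")
```

If the interval contains zero, the two AUC values cannot be distinguished at that sample size, which is the situation described above for the lowest and highest performing models.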
Although the finding that LDA outperforms DNN is not unexpected, it highlights the critical importance of aligning model selection with dataset characteristics. Deep learning models generally require large, diverse, and high-dimensional datasets to achieve optimal performance. The relatively small sample size and structured nature of sternal morphometric data likely limited the learning capacity of the DNN in this study.
Despite their lower relative performance in this context, DNNs offer practical advantages that are valuable in forensic applications. They can operate directly on raw CT images without manual measurement or landmarking, enabling fully automated workflows. This reduces operator bias and saves time, especially when expert availability is limited or rapid analysis is required. In addition, DNNs are scalable and adaptable to other bones or populations. As more annotated datasets and higher-resolution imaging become available, their predictive performance is expected to improve.
Similar to other recent studies employing deep learning for sex classification using skeletal imaging [13, 26], our findings confirm that the performance of DNNs is strongly influenced by dataset size, image quality, and anatomical complexity. Despite its lower relative performance in this context, the DNN approach remains a promising complementary tool in forensic anthropology, particularly for future applications involving large imaging archives and minimal human intervention.
DNNs can analyze raw imaging data directly, without requiring manual landmarking or measurements, which enables fully automated workflows. This can be particularly advantageous in time-sensitive forensic scenarios or in settings where expert availability is limited. As in other recent studies using deep learning for sex classification from skeletal imaging [13, 26], our findings confirm that the performance of DNNs is highly dependent on factors such as image resolution, anatomical region, and dataset scale. Therefore, despite its lower comparative accuracy in this context, the DNN approach remains promising as a supplementary tool, especially as larger annotated datasets and higher-resolution imaging become more widely available.
Importantly, the novelty of our study lies in its broader context. To our knowledge, this is the first investigation applying a DNN to sternum-based sex estimation using CT images in a Turkish population. Moreover, we conducted a comprehensive comparison of multiple predictive models using the same dataset, providing practical insights into their relative strengths and limitations.
Compared to previous studies that utilized traditional statistical methods such as discriminant analysis and logistic regression on sternal measurements [5, 17, 22], our study is among the first to systematically evaluate and compare the performance of both classical machine learning models and deep learning approaches (DNN) within a single dataset. While prior research often focused on single-method evaluation or non-imaging data, we incorporated both morphometric and image-based data to assess prediction accuracy. Furthermore, the application of DNN to CT-derived sternal images in a Turkish population has not been previously documented, adding novelty to our approach. This comprehensive comparison across multiple models provides a more holistic understanding of sex estimation performance in forensic settings.
Although, to our knowledge, no prior study has applied DNN to sternal images, several studies have used deep learning approaches for sex estimation based on other skeletal parts. For instance, Lee et al. applied convolutional neural networks to pelvic CT scans and achieved an AUC of 0.96 [27]. In our study, the DNN model achieved an AUC of 0.937, which is competitive given that it was applied to a relatively small and anatomically complex bone such as the sternum. The lower performance of the DNN model relative to the other models in our study may be attributed to limited training data and variability in image quality. Adams et al. similarly emphasized the need for large and diverse datasets to fully exploit the predictive potential of deep learning in skeletal analysis [10].
Despite its relatively lower performance, DNN-based analysis offers practical advantages such as full automation, elimination of manual measurements, and potential usability in time-sensitive or resource-constrained forensic contexts. These findings support the relevance of DNN as a supplementary tool and underscore the importance of contextualizing model performance within the specific constraints of forensic applications.
Although we believe that the present study will contribute to the literature, it has limitations. In this study, sex was treated as binary (male–female), whereas sex actually exists along a spectrum and includes additional sex categorizations and gender identities, such as people who are intersex or have differences of sex development. The dataset in our study was relatively small, which may have affected the performance of our prediction models. Furthermore, our study lacked external validation, which may have limited the generalizability and reliability of our results. Our study contains up-to-date data on men and women belonging to a certain part of the modern Turkish population. However, the data were collected only from patients admitted to our hospital. Since physical characteristics may differ between populations, our results cannot be generalized to the entire Turkish population. Finally, the use of default hyperparameters may have limited the predictive performance of our models. Hyperparameter tuning on larger datasets may provide improved performance in future studies.
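The tuning step mentioned among the limitations can be sketched in its simplest form: choosing k for a KNN classifier by leave-one-out cross-validation. Everything in this sketch is illustrative. The two-feature data are synthetic, the helper names `knn_predict`, `loo_accuracy`, and `tune_k` are hypothetical, and none of it reproduces the pipeline of this study.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to query (squared Euclidean distance)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(xs, ys, k):
    """Leave-one-out accuracy: each case is predicted from all the others."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)


def tune_k(xs, ys, candidates):
    """Return the candidate k with the best leave-one-out accuracy, plus all scores."""
    scores = {k: loo_accuracy(xs, ys, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic two-class data: 30 cases per class, two hypothetical measurements each.
rng = random.Random(7)
xs = [[rng.gauss(c * 1.5, 1.0), rng.gauss(c * 1.0, 1.0)] for c in (0, 1) for _ in range(30)]
ys = [c for c in (0, 1) for _ in range(30)]

# Odd candidate values avoid tied votes between the two classes.
best_k, scores = tune_k(xs, ys, [1, 3, 5, 7, 9])
print("best k:", best_k, "leave-one-out accuracy:", round(scores[best_k], 3))
```

Accuracy measured on the same folds used to pick k is optimistically biased, so a held-out or externally collected sample would still be needed for the final estimate.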
In conclusion, this study indicated that the sternum of Turkish subjects showed high sexual dimorphism. KNN, random forest, XGBoost, naïve Bayes, logistic regression, and LDA using the measurements of the sternum and DNN using the sternum images provided sex classification accuracy rates of approximately 85–95%. The equations obtained in this study may be useful in forensic contexts, particularly in cases where the pelvis or skull is not available for analysis or as an adjunct to other information.
Future studies may benefit from the development of larger and more diverse datasets to improve the performance and generalizability of deep learning models for sex estimation. Moreover, combining sternal imaging with data from other skeletal parts may yield higher predictive accuracy. Cross-population validation studies using standardized imaging and AI protocols are also needed to assess the robustness of existing models.