This study investigated the prevalence of stroke and its predictive factors among adults of different genders in Shanxi Province, China. The stroke prevalence in Shanxi Province was 3.3%, higher than the national average [27]. The stroke prevalence was 4.46% in males and 2.46% in females. Traditional logistic regression analysis revealed that males had a higher risk of stroke. The risk factors identified included abnormal lipid levels, hypertension, diabetes, family history of stroke, coronary heart disease, secondhand smoke exposure, and snoring. Urban residents also had a higher risk of stroke, and the risk increased with age.
Respiratory pauses and smoking were additional risk factors in males but not in females. Furthermore, the impact of shared risk factors differed between genders. For example, a family history of stroke was associated with a 5.56-fold increased risk in females but a 4.22-fold increased risk in males. Bayesian Networks (BNs) built with the MMHC algorithm indicated that, in males, abnormal lipid levels, hypertension, and age were direct risk factors for stroke, while snoring, education level, and respiratory pauses were indirect risk factors. In females, age, hypertension, and secondhand smoke exposure were identified as direct risk factors, while snoring was an indirect risk factor for stroke.
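The distinction between direct and indirect risk factors can be illustrated with a minimal Python sketch. In a Bayesian network, a direct risk factor is a parent of the stroke node, and an indirect risk factor is an ancestor that reaches stroke only through other nodes. The edges below are hypothetical and chosen only for illustration; they are not the network learned in this study, and the names `PARENTS` and `classify_risk_factors` are assumptions.

```python
from collections import deque

# Hypothetical toy DAG stored as child -> list of parents.
# These edges are illustrative only; they are NOT the study's learned network.
PARENTS = {
    "stroke": ["hypertension", "dyslipidemia", "age"],
    "hypertension": ["snoring"],
    "snoring": ["respiratory_pauses"],
    "dyslipidemia": ["education"],
}


def classify_risk_factors(parents, outcome):
    """Split the ancestors of `outcome` into direct (parents) and indirect (all others)."""
    direct = set(parents.get(outcome, []))
    seen = set(direct)
    queue = deque(direct)
    # Breadth-first walk up the graph to collect every ancestor of the outcome.
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    indirect = seen - direct
    return direct, indirect


direct, indirect = classify_risk_factors(PARENTS, "stroke")
print("direct:", sorted(direct))
print("indirect:", sorted(indirect))
```

In this toy graph, snoring has no edge into stroke but still influences it through hypertension, which is the sense in which a factor is called indirect.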
In studies of stroke risk factors, traditional logistic regression models are typically constructed under the assumption that variables are independent, so they fail to make full use of the information in the data and may not accurately reflect the impact of feature variables on stroke [28]. Logistic regression uses probabilities to reflect the strength of associations, but it cannot comprehensively explain the complex relationships between risk factors [21] and cannot distinguish direct from indirect risk factors. Logistic regression models are therefore not flexible enough to capture patterns and relationships in the data.
In contrast, Bayesian Networks (BNs) offer several advantages over logistic regression in building risk factor models [29]. Bayesian Networks do not require any prior assumptions and can integrate different variables and analyze their relative importance [28]. For this reason, many clinical researchers in recent years have preferred Bayesian Networks for quantifying risk factors in the diagnosis and prognosis of specific diseases and for supporting medical decision-making [26, 30].
We applied Bayesian Networks to the study of stroke risk factors by gender. This approach not only reveals the risk factors for stroke but also determines their direct and indirect impacts on stroke, providing insight into the complex network of relationships among them. Notably, a Bayesian Network becomes more complex as the number of feature variables increases, so the network should be constructed from a selected set of feature variables. In this study, single-factor chi-square tests and multi-factor logistic regression analysis were used to screen variables.
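As an illustration of the single-factor screening step, the following Python sketch computes a Pearson chi-square test for one binary factor against stroke status. The counts, the 0.05 threshold, and the function name `chi_square_2x2` are assumptions for illustration, not data or settings from this study. The p-value relies on the fact that a chi-square variable with one degree of freedom has survival function erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = exposed / unexposed, columns = stroke / no stroke.
    Assumes every row and column total is non-zero."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed participants had a stroke.
chi2, p_value = chi_square_2x2(30, 70, 10, 90)
keep_variable = p_value < 0.05  # assumed screening threshold
print(chi2, p_value, keep_variable)
```

A variable that passes this kind of screen would then be carried forward to the multi-factor model and, from there, into the network.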
To our knowledge, this study is the first to apply a Bayesian network to analyze risk factors for stroke by gender. Compared to traditional logistic regression models, Bayesian networks using the MMHC algorithm have significant advantages in analyzing stroke risk factors. First, the Bayesian network with the MMHC algorithm is a data-driven model constructed on the knowledge base associated with the disease [31], without strict requirements on data distribution. This enables it to uncover latent, less obvious but important information in the data. This data-driven approach provides a more scientific and comprehensive foundation for the assessment, prediction, and prevention of stroke. The application of Bayesian networks therefore allows for a deeper understanding of stroke risk factors and can provide more accurate guidance for personalized prevention and control strategies.
The second advantage involves the interactions between variables. Logistic regression [31] can only indicate the risk associated with each stroke risk factor. To analyze interactions, logistic regression must introduce them into the model as additive or multiplicative terms, which adds complexity and may introduce bias. Moreover, logistic regression struggles to clearly illustrate the interactions between variables and is limited in exploring the complex relationships among multiple variables. In contrast, Bayesian networks with the MMHC algorithm describe the interconnections between these risk factors intuitively in graphical form and can comprehensively explore their direct and indirect interactions [19].
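To make the point about multiplicative interaction terms concrete, here is a minimal Python sketch of a logistic model with one product term. The coefficients in `BETA` are hypothetical values on the log-odds scale, not estimates from this study, and the function names are assumptions. The sketch shows that once a product term is included, the odds ratio for one factor depends on the level of the other.

```python
import math

# Hypothetical coefficients on the log-odds scale; NOT estimates from the study.
BETA = {
    "intercept": -4.0,
    "hypertension": 0.9,
    "diabetes": 0.6,
    "hypertension_x_diabetes": 0.4,  # multiplicative interaction term
}


def stroke_probability(hypertension, diabetes, beta=BETA):
    """Predicted probability from a logistic model with one product term."""
    logit = (
        beta["intercept"]
        + beta["hypertension"] * hypertension
        + beta["diabetes"] * diabetes
        + beta["hypertension_x_diabetes"] * hypertension * diabetes
    )
    return 1.0 / (1.0 + math.exp(-logit))


def odds(p):
    return p / (1.0 - p)


def odds_ratio_hypertension(diabetes):
    """Odds ratio for hypertension at a fixed diabetes status (0 or 1)."""
    exposed = odds(stroke_probability(1, diabetes))
    unexposed = odds(stroke_probability(0, diabetes))
    return exposed / unexposed


print(odds_ratio_hypertension(0))  # equals exp(0.9)
print(odds_ratio_hypertension(1))  # equals exp(0.9 + 0.4)
```

Each additional pair of interacting factors needs its own product term, which is why the model grows quickly when many interactions are of interest.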
Stroke is relatively common in the elderly population, and previous studies indicate that age is a non-modifiable risk factor for stroke in both men and women [32]. Consistent with these findings, our study shows a higher prevalence of stroke in both men and women aged 60–75 years. One potential mechanism through which age influences stroke is the natural narrowing and hardening of arteries as individuals age, which is attributed to endothelial dysfunction and impaired cerebral autoregulation [33].
Additionally, older adults often have multiple coexisting chronic diseases, such as diabetes, hypertension, atrial fibrillation, and coronary and peripheral artery disease. The prevalence of these conditions also increases gradually with age [34].
In this study, hypertension, diabetes, dyslipidemia, coronary heart disease, family history of stroke, secondhand smoke exposure, and snoring were identified as common risk factors for stroke in both men and women. These findings are consistent with previous research [28, 35]. Hypertension damages arteries, leading to atherosclerosis and narrowing, which can result in thrombosis or embolism and trigger a stroke [36]. Elevated blood sugar levels can damage endothelial cells, leading to atherosclerosis, arterial narrowing, and vascular damage. Impaired kidney function can increase blood volume and reduce the elasticity of blood vessels [37]. Higher cholesterol levels may increase inflammation and apoptosis within plaques, making them more prone to rupture. After a plaque ruptures, platelets and coagulation proteins in the blood aggregate at the site of rupture and form a clot, ultimately leading to a stroke [36]. The effect of family history may reflect shared lifestyles and habits or genetic factors, such as the association of white matter lesions with vertebrobasilar artery atherosclerotic brain disease, which is an autosomal dominant cerebrovascular disease [38]. Harmful substances in secondhand smoke may have a direct toxic effect on the nervous system, increasing the risk of stroke [39]. Snoring is often accompanied by sleep apnea, in which breathing briefly stops during sleep. This can lower blood oxygen levels and increase the risk of stroke. Snoring may also increase inflammation in the body, and inflammation is associated with cardiovascular disease and stroke [40, 41].
This study has several limitations. First, directed edges in Bayesian Networks (BNs) express probabilistic dependencies between nodes and cannot accurately represent causal relationships. Second, because the survey was conducted face to face, participants may have relied on memory to answer questions, introducing potential reporting or recall bias in the estimated prevalence of various diseases. Third, the survey did not collect some important information, including: (a) variables specific to women, such as menstrual history and reproductive history, making the analysis of risk factors in women potentially incomplete; (b) data on some inflammatory factors, electrocardiogram data, carotid ultrasound, and coronary artery ultrasound; (c) effectiveness data related to dietary factors. We were therefore unable to assess the impact of these factors on the risk of stroke. Furthermore, because the study focused on Bayesian Networks using the MMHC algorithm, it did not compare MMHC with other hybrid algorithms. Finally, stroke was not differentiated into ischemic and hemorrhagic subtypes, and the study did not analyze the differential effects of biochemical indicators such as blood glucose, blood pressure, and cholesterol on stroke in the absence of medication factors. These issues will be a focus of our future work. Despite these limitations, the findings of this study provide valuable information for the development of health planning and programs aimed at reducing the burden of stroke in Shanxi Province, China.