Apple leaf disease severity grading based on deep learning and the DRL-Watershed algorithm

The experimental process

To enhance the model’s generalization ability and convergence speed, pretrained backbone network parameters were used. During training, the input image size was set to 480 × 480 and the number of epochs to 100. Hyperparameters and the optimization algorithm were selected to ensure the model could learn a sufficient number of features effectively. The Adam optimizer was used for all models, with an initial learning rate of 0.001; whenever the loss plateaued, the learning rate was reduced by 50%.
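The plateau-based halving rule described above can be sketched as follows (an illustrative sketch, not the authors’ code; the patience and tolerance values are assumptions):

```python
# Illustrative sketch (not the authors' code): halve the learning rate
# when the loss has stopped improving, as described in the text.
def adjust_lr(loss_history, lr, patience=3, min_delta=1e-3):
    """Return the (possibly halved) learning rate.

    If the best loss over the last `patience` epochs improved on the
    earlier best by less than `min_delta`, the loss is considered flat
    and the learning rate is reduced by 50%.
    """
    if len(loss_history) <= patience:
        return lr
    recent_best = min(loss_history[-patience:])
    earlier_best = min(loss_history[:-patience])
    if earlier_best - recent_best < min_delta:
        lr *= 0.5  # reduce by 50%, as in the training setup above
    return lr
```

In practice the same behaviour is provided by scheduler utilities in common deep-learning frameworks; the sketch only makes the rule explicit.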

Table 5 summarizes how the choice of hyperparameters affects the segmentation performance of the HRNet model on the apple leaf disease segmentation task.

Table 5 Comparison of hyperparameter performance of the HRNet network

The performance of the HRNet model is remarkably stable across different hyperparameter settings, with only minimal variation in the evaluation metrics, indicating that HRNet is robust on the apple leaf disease segmentation task. The best segmentation performance is achieved with the Adam optimizer at a learning rate of 0.0001 and a batch size of 8.

To evaluate the segmentation performance of the Improved HRNet model for four types of apple leaf diseases, a comparison with the original HRNet model was made. Figure 7 compares the performance of HRNet and Improved HRNet in terms of IoU and PA for four diseases: Alternaria Blotch, Brown Spot, Grey Spot, and Rust.

Fig. 7

Comparison of Segmentation Performance between HRNet and Improved HRNet

The results show that, compared to HRNet, the Improved HRNet improved markedly on the Alternaria Blotch segmentation task, with an 11.54-percentage-point increase in IoU and a 7.98-point increase in PA. On Brown Spot, IoU increased by 1.5 points and PA by 0.38 points; on Grey Spot, IoU increased by 9.14 points and PA by 2.6 points; and on Rust, IoU increased by 2.71 points and PA by 0.65 points. These results indicate that the NAM attention mechanism introduced in the Improved HRNet model strengthens the focus on local lesion features, improving feature extraction and, consequently, segmentation accuracy. Brown Spot and Rust, by contrast, have more distinctive visual features, so HRNet already performed well on these categories, leaving less room for improvement.
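The per-class metrics compared here can be computed directly from predicted and ground-truth label maps (a minimal sketch, not the authors’ evaluation code; per-class PA is taken as class-wise recall, a common convention):

```python
# Hedged sketch of the per-class metrics reported above: IoU and pixel
# accuracy (PA) for one class, computed from flattened label sequences.
def class_iou_pa(pred, gt, cls):
    """IoU = TP / (TP + FP + FN); PA = TP / (TP + FN) for class `cls`."""
    tp = sum(1 for p, g in zip(pred, gt) if p == cls and g == cls)
    fp = sum(1 for p, g in zip(pred, gt) if p == cls and g != cls)
    fn = sum(1 for p, g in zip(pred, gt) if p != cls and g == cls)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    pa = tp / (tp + fn) if tp + fn else 0.0
    return iou, pa
```

Averaging these values over all disease classes yields the mIoU and mPA figures reported in the tables below.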

Comparison of different backbone networks

To select a backbone for the improved HRNet model, this study tested three backbone network widths, HRNet_w18, HRNet_w32, and HRNet_w48, and compared their performance on apple leaf disease segmentation. Figures 8(a) and 8(b) present the training results for apple leaf disease image segmentation using models with the different backbone widths. The experimental results are shown in Table 6.

Fig. 8

(a) Loss value change curve; (b) mIoU value change curve

Table 6 Comparison results of different backbones

The results show that HRNet_w32 provides the best overall performance. Its mean IoU (mIoU) is 82.21%, an improvement of 2.07 percentage points over HRNet_w18 and 0.07 points over HRNet_w48. Its mean pixel accuracy (mPA) is 89.59%, 2.71 points above HRNet_w18 and 0.97 points above HRNet_w48, and its mean precision (mPrecision) reaches 89.53%. Although HRNet_w48 achieves the highest precision at 91.92%, its gains in mIoU and mPA are marginal. This is likely because the excessive width of HRNet_w48 leads to overfitting during training, creating a performance bottleneck on the test set; the larger network also increases computational complexity and cost. Overall, HRNet_w32 strikes the best balance between performance and computational complexity, avoiding the overfitting observed with HRNet_w48 while delivering strong segmentation performance. HRNet_w32 is therefore selected as the backbone network for the apple leaf disease segmentation task.

Comparison of different attention mechanisms

To analyze the performance of different attention mechanisms in the apple leaf disease segmentation task, we conducted comparative experiments using CBAM, SENet, and NAM attention mechanisms. In each experiment, the attention module was added at the same position in the encoder. Figures 9(a) and 9(b) display the training outcomes for models with various attention mechanisms applied to apple leaf disease image segmentation. The experimental results are presented in Table 7.

Fig. 9

(a) Loss value change curve; (b) mIoU value change curve

Table 7 Comparison results of different attention mechanisms

As shown in the table, adding the SENet module increased mIoU by 0.4 percentage points, to 85.22%; adding CBAM increased it by 2.54 points, to 87.36%; and adding NAM produced the largest gain of 3.51 points, for an mIoU of 88.33%. This demonstrates that NAM outperforms the other attention modules in segmentation accuracy. NAM enhances the model’s ability to learn apple leaf disease features by effectively weighting multi-source information and reducing interference from background noise and redundancy. Furthermore, during feature fusion across different resolutions, NAM improves the quality of multi-scale feature integration, thereby enhancing overall segmentation performance. Compared to CBAM and SENet, NAM exhibits superior capability in capturing local details, suppressing background noise, and adapting to multi-scale features, which significantly boosts segmentation performance. For apple leaf disease segmentation tasks, the NAM attention mechanism proves to be the most suitable choice.
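The core idea of NAM’s channel attention, re-weighting channels by their normalized batch-norm scale factors so uninformative channels are suppressed, can be illustrated as follows (a simplified NumPy sketch, not the authors’ implementation; the batch-normalization step itself is omitted for brevity):

```python
import numpy as np

# Simplified sketch of NAM-style channel attention: each channel is
# weighted by its normalized BN scale factor (gamma), then gated with
# a sigmoid and multiplied back onto the input features.
def nam_channel_attention(x, gamma):
    """x: feature map of shape (C, H, W); gamma: BN scale factors (C,)."""
    w = np.abs(gamma) / np.sum(np.abs(gamma))   # normalized channel weights
    scaled = x * w[:, None, None]               # weight each channel
    return x * (1.0 / (1.0 + np.exp(-scaled)))  # sigmoid gate * input
```

Channels with larger scale factors pass through with a stronger gate, which matches the suppression of background noise and redundancy described above.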

Performance analysis of ablation experiment

To systematically evaluate the impact of each module on the overall model performance, we designed four ablation experiments. These experiments assess the effects of replacing the backbone network, adding the attention mechanism, and using Focal Loss and Dice Loss functions. The experimental results are summarized in Table 8.

Table 8 Ablation experiment

The table shows that using HRNet_w32 as the backbone network significantly improved segmentation performance, with mIoU and mPA increasing by 4.68 and 5.22 percentage points, respectively. Introducing the NAM attention mechanism further raised mIoU and mPA by 3.51 and 1 point, respectively, owing to NAM’s enhancement of the multi-scale feature fusion process, which better refines features across different resolutions. Focal Loss effectively addressed class imbalance, improving mIoU and mPA by 0.47 and 0.17 points, respectively, while Dice Loss enhanced segmentation accuracy for small targets and imbalanced classes, raising mIoU and mPA by 0.55 and 0.21 points, respectively.
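The two loss terms from the ablation study can be illustrated for a binary lesion mask (a hedged sketch; the α and γ values shown are conventional defaults, not values taken from the paper):

```python
import numpy as np

# Hedged sketch of the two loss terms above, for a binary lesion mask
# y (values in {0, 1}) and predicted probabilities p.
def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss down-weights easy pixels to counter class imbalance."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balancing factor
    return float(np.mean(-at * (1 - pt) ** gamma * np.log(pt)))

def dice_loss(p, y, eps=1e-7):
    """Dice loss rewards overlap, which helps small lesion targets."""
    inter = np.sum(p * y)
    return float(1 - (2 * inter + eps) / (np.sum(p) + np.sum(y) + eps))
```

Combining the two, as the ablation study does, lets the focal term handle pixel-level imbalance while the Dice term protects small lesion regions.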

When HRNet_w32, the NAM attention mechanism, Focal Loss, and Dice Loss were combined, the model achieved its highest performance, with mIoU and mPA improving by 8.77 and 7.25 percentage points, respectively, significantly enhancing segmentation performance for apple leaf disease.

Comparison with other segmentation methods

To further validate the segmentation performance of the improved HRNet model, we compared it with several classic semantic segmentation models commonly used for plant disease tasks, including DeeplabV3+ [26], U-Net [27], and PSPNet [28]. The results are presented in Fig. 10.

Fig. 10

Comparison Results of Different Models

As shown in Fig. 10, the proposed model outperforms the others in disease segmentation, achieving the best accuracy with an mIoU of 88.91% and an mPA of 94.13%. The DeeplabV3+ model performed worst in mIoU, at 79.20%, with an mPA of 87.35%. U-Net achieved the best mIoU among the baselines at 80.85%, though its mPA of 86.38% was the lowest, while PSPNet reached an mIoU of 79.71% and an mPA of 87.86%. The experimental results indicate that the NAM attention mechanism incorporated into HRNet enhances the model’s feature extraction and representation abilities. Additionally, the optimized loss functions improve segmentation accuracy for diseased areas and mitigate the accuracy loss caused by sample imbalance during training. Overall, the HRNet model, with its high-resolution feature representation, is better suited to the requirements of apple leaf disease segmentation tasks.

This study visualized the segmentation results of five algorithms: Improved HRNet, HRNet, DeeplabV3+, U-Net, and PSPNet, as shown in Fig. 11.

Fig. 11

Comparison of segmentation effects

Figure 11 reveals distinct performance differences among the models in disease segmentation tasks. The morphological and chromatic similarity between Alternaria Blotch and Grey Spot lesions induced misclassification errors in Models C, D, and E, which erroneously identified Alternaria Blotch as Grey Spot; these models also segmented overlapping healthy leaf regions imprecisely. In Brown Spot segmentation, Models D and E showed minor false positives, while Models B, C, and E suffered significant under-segmentation. Grey Spot detection revealed several failures: Models D and E produced misclassifications, Models B, D, and E generated oversimplified delineation of healthy tissue, and Model C even segmented non-existent targets. For Rust identification, Model E exhibited false positives, while Models B-D displayed insufficient resolution in overlapping healthy leaf areas.

Notably, the Improved HRNet achieved accurate four-disease differentiation with exceptional edge delineation and complete lesion morphology while achieving pixel-level precision at disease-leaf boundaries. This architecture demonstrated superior robustness and segmentation accuracy through its hierarchical feature integration mechanism, effectively addressing the critical challenges of inter-class similarity and complex edge topology that compromised conventional models.

Assessment of disease severity levels

To accurately assess the severity of apple leaf diseases, this study refers to the local standard of Shanxi Province, “DB14/T 143–2019 Apple Brown Spot Disease Monitoring and Survey Guidelines,” to establish grading parameters for apple brown spot disease. Based on pixel statistics, Python was used to calculate the pixel count of the diseased and healthy leaf areas. The leaf disease severity was classified into six levels: Level 0 (healthy leaf), Level 1, Level 3, Level 5, Level 7, and Level 9. The detailed leaf disease grading standards are shown in Table 9.

Table 9 Classification table for apple leaf diseases

where k represents the ratio of the diseased area to the area of a single leaf and is calculated using the following formula:

$$k=\frac{A_{scab}}{A_{leaf}}=\frac{\sum\nolimits_{(x,y) \in scab} {pixel(x,y)}}{\sum\nolimits_{(x,y) \in leaf} {pixel(x,y)}}$$

(8)

In the formula, A_scab denotes the area of the diseased region, A_leaf the area of a single leaf, and pixel(x, y) counts the pixels belonging to the diseased and leaf regions, respectively.
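Equation (8) amounts to a simple pixel count over the segmentation mask. A minimal sketch, assuming label values of 0 (background), 1 (healthy leaf), and 2 (lesion), which are illustrative and not the paper’s actual encoding:

```python
# Minimal sketch of Eq. (8): the severity ratio k is the number of
# lesion ("scab") pixels divided by the number of pixels of the whole
# leaf (lesion + healthy). Assumed labels: 0 = background,
# 1 = healthy leaf, 2 = lesion.
def severity_ratio(mask):
    scab = sum(row.count(2) for row in mask)
    leaf = sum(row.count(1) + row.count(2) for row in mask)
    return scab / leaf if leaf else 0.0
```

The resulting k is then mapped to one of the six severity levels of Table 9.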

In the process of grading apple leaf diseases in complex backgrounds, the diversity of leaf shapes and the complexity of the background affect pixel statistics, which in turn influences the grading results. To address this, pixel statistical analyses were performed under three scenarios: a single leaf, separated multiple leaves, and overlapping multiple leaves. The DRL-Watershed algorithm was used to accurately count the pixels of the disease and the leaf area in each scenario, ensuring the accuracy of the grading results. The visualized segmentation results of the DRL-Watershed algorithm for the three cases are shown in Fig. 12:

Fig. 12

Visualization of DRL-Watershed Algorithm Results

Pixel statistical analysis for a single leaf

To verify the effectiveness of the DRL-Watershed algorithm in pixel counting for a single leaf, a comparative experiment was conducted using the pixel statistics from the improved HRNet model. The grading results for disease severity on a single leaf are shown in Table 10.

Table 10 Example of single-leaf disease grading results

As shown in Table 10, for the improved HRNet model the total number of pixels in the leaf area (leaf plus disease pixels) is 127,917, with the disease occupying 46% of the area, giving a disease severity of Level 9. For the DRL-Watershed algorithm, the number of leaf pixels is 126,026, with the disease occupying 47%, again giving Level 9. This demonstrates that both the Improved HRNet model and the DRL-Watershed algorithm accurately counted the leaf pixels, calculated the disease proportion, and produced consistent disease severity levels for a single leaf.

Pixel statistical analysis for separated multiple leaves

The principle of using the watershed algorithm for handling multi-leaf separation in disease severity assessment is illustrated in Fig. 13. To assess the performance of the DRL-Watershed algorithm for scenarios with multiple separated leaves, pixel statistics were compared with the results from the improved HRNet model. The disease severity grading results for the separated leaves are shown in Table 11.

Fig. 13

Example of the principle of the watershed algorithm

Table 11 Example of grading results for separated multiple leaves

As shown in Table 11, the improved HRNet model computes the disease-to-leaf pixel ratio over the entire image, yielding a disease proportion of 18.22% and a severity of Level 5; pooling all leaves in this way dilutes the disease proportion and understates the severity. In contrast, the DRL-Watershed algorithm counts the pixels of each leaf separately and computes a per-leaf disease ratio, reflecting the severity more accurately. For example, it finds a disease proportion of 26.26% for Leaf Area 2, corresponding to Level 7, which accurately represents the severity on that leaf and provides a more precise grading assessment.
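The per-leaf statistic described above can be sketched as follows (an illustrative sketch, not the authors’ code; the instance-label layout is assumed, with `instances` assigning each pixel a leaf id, 0 for background, and `disease` flagging lesion pixels):

```python
# Sketch of the per-leaf disease ratios computed after leaf separation:
# pixels are grouped by leaf id, then the lesion fraction is computed
# per leaf rather than over the whole image.
def per_leaf_ratios(instances, disease):
    leaf_px, scab_px = {}, {}
    for inst_row, dis_row in zip(instances, disease):
        for leaf_id, is_scab in zip(inst_row, dis_row):
            if leaf_id == 0:          # skip background pixels
                continue
            leaf_px[leaf_id] = leaf_px.get(leaf_id, 0) + 1
            scab_px[leaf_id] = scab_px.get(leaf_id, 0) + int(is_scab)
    return {k: scab_px[k] / leaf_px[k] for k in leaf_px}
```

Grading each leaf from its own ratio is what prevents the dilution effect seen when the ratio is pooled over all leaves in the image.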

Pixel statistical analysis for overlapping multiple leaves

Similarly, the principle of applying the watershed algorithm for assessing disease severity in overlapping multi-leaf scenarios is demonstrated in Fig. 14. To evaluate the DRL-Watershed algorithm’s performance for scenarios with overlapping leaves, a comparative experiment was performed using pixel statistics from the improved HRNet model. The disease grading results for the overlapping multiple leaves are shown in Table 12.

Fig. 14

Example of the principle of the watershed algorithm

Table 12 Example of grading results for overlapping multiple leaves

As shown in Table 12, the improved HRNet model calculates the disease-to-total-leaf pixel ratio, yielding a disease proportion of 24% and a corresponding disease level of Level 5. This method results in an underestimation of the disease severity because the disease pixels are compared to the total leaf area across all leaves. However, the DRL-Watershed algorithm effectively segments the overlapping leaf regions, allowing it to calculate the pixel count for each individual leaf. In the two overlapping areas, the DRL-Watershed algorithm calculates the disease proportion for Area 1 as 32%, with a corresponding disease level of Level 7, and for Area 2 as 22%, corresponding to Level 5. This approach provides a more accurate reflection of the disease severity in each overlapping leaf region, yielding a grading assessment closer to the actual situation.

Severity grading statistical analysis

In this study, confusion matrices were used to evaluate the severity grading performance of the improved HRNet model and the DRL-Watershed algorithm on the test set. The true severity levels were determined from the ratio of diseased area to leaf area established during data annotation. Figure 15 compares the predictions of the improved HRNet model and of the DRL-Watershed algorithm against the true labels on the same test set. The vertical axis gives the true labels and the horizontal axis the predicted labels; each cell contains the number of samples with a given true level that were predicted as a given level. Diagonal elements therefore count correctly classified samples, off-diagonal elements count misclassifications, and the intensity of the diagonal colour reflects the grading accuracy for each level.
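The construction of such a matrix can be sketched as follows (illustrative; the six severity levels follow Table 9):

```python
# Sketch of how a grading confusion matrix is built: rows index the
# true severity levels, columns the predicted levels.
def confusion_matrix(true_levels, pred_levels, levels=(0, 1, 3, 5, 7, 9)):
    idx = {lv: i for i, lv in enumerate(levels)}
    m = [[0] * len(levels) for _ in levels]
    for t, p in zip(true_levels, pred_levels):
        m[idx[t]][idx[p]] += 1   # row = true level, column = predicted level
    return m
```

Summing the diagonal and dividing by the total sample count gives the overall grading accuracy discussed below.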

Fig. 15

Disease severity confusion matrix. (a) Confusion matrix for grading evaluation of the Improved HRNet model; (b) Confusion matrix for grading evaluation of the DRL-Watershed algorithm

In the confusion matrix of the improved HRNet model, 89 samples were correctly classified as Level 1. For Level 3, 21 samples were misclassified as Level 1, while 36 were correctly classified. For Level 5, 2 samples were misclassified as Level 3, but most were classified correctly. Levels 7 and 9 exhibited some confusion, with Level 7 samples in particular misclassified as either Level 5 or Level 9. The model is thus highly accurate at lower severity levels but errs at higher ones, which can be attributed to its tendency to underestimate severity in multi-leaf scenarios: because it does not distinguish between individual leaves when processing multiple leaves, the predicted severity falls below the actual severity.

The confusion matrix for the DRL-Watershed algorithm shows significant improvements, particularly for Level 3: of its 55 samples, only 2 were misclassified as Level 1, with the rest classified correctly. Accuracy for Level 5 also improved, with 31 samples correct and only 1 misclassified as Level 3, and all 13 Level 9 samples were classified correctly. Compared with Fig. 15(a), the results for Level 7 were notably better, with 8 samples correct and no significant misclassifications. These results suggest that by analysing the disease proportion of each leaf region separately, the DRL-Watershed algorithm assesses disease severity more accurately, especially in complex and overlapping leaf scenarios.

In the confusion matrix for the DRL-Watershed algorithm, two Level 3 samples were misclassified as Level 1, one Level 5 sample as Level 3, and one Level 9 sample as Level 3. These errors may be caused by noise from lighting, shadows, or other environmental factors, which blur the boundaries of overlapping leaf regions. Where the gradients are less pronounced, the leaf area is segmented inaccurately, which in turn affects the final disease severity predictions.
