Deep nested U-structure network with frequency attention for building semantic segmentation

  • Zhang, X. et al. An improved encoder–decoder network based on strip pool method applied to segmentation of farmland vacancy field. Entropy https://doi.org/10.3390/e23040435 (2021).

  • Li, D. et al. Building extraction from airborne multi-spectral lidar point clouds based on graph geometric moments convolutional neural networks. Remote Sensing 12, 3186 (2020).

  • Peng, B., Al-Huda, Z., Xie, Z. & Wu, X. Multi-scale region composition of hierarchical image segmentation. Multimed. Tools Appl. 8, 1–23 (2020).

  • Al-Huda, Z., Peng, B., Yang, Y. & Ahmed, M. Object scale selection of hierarchical image segmentation using reliable regions. In 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), 1081–1088 (IEEE, 2019).

  • Algabri, R. & Choi, M.-T. Deep-learning-based indoor human following of mobile robot using color feature. Sensors 20, 2699 (2020).

  • Algabri, R. & Choi, M.-T. Target recovery for robust deep learning-based person following in mobile robots: Online trajectory prediction. Appl. Sci. 11, 4165 (2021).

  • Yu, B., Yang, L. & Chen, F. Semantic segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. 11, 3252–3261 (2018).

  • Ok, A. O. Automated detection of buildings from single vhr multispectral images using shadow information and graph cuts. ISPRS J. Photogramm. Remote. Sens. 86, 21–40 (2013).

  • Ghanea, M., Moallem, P. & Momeni, M. Building extraction from high-resolution satellite images in urban areas: Recent methods and strategies against significant challenges. Int. J. Remote Sens. 37, 5234–5248 (2016).

  • Gao, H., Tang, Y., Jing, L., Li, H. & Ding, H. A novel unsupervised segmentation quality evaluation method for remote sensing images. Sensors 17, 2427 (2017).

  • Ahmadi, S., Zoej, M. V., Ebadi, H., Moghaddam, H. A. & Mohammadzadeh, A. Automatic urban building boundary extraction from high resolution aerial images using an innovative model of active contours. Int. J. Appl. Earth Obs. Geoinf. 12, 150–157 (2010).

  • Sun, Y., Zhang, X., Zhao, X. & Xin, Q. Extracting building boundaries from high resolution optical images and lidar data by integrating the convolutional neural network and the active contour model. Remote Sens. 10, 1459 (2018).

  • Vakalopoulou, M., Karantzalos, K., Komodakis, N. & Paragios, N. Building detection in very high resolution multispectral data with deep learning features. In 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 1873–1876 (IEEE, 2015).

  • Chen, L.-C., Papandreou, G., Schroff, F. & Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017).

  • Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

  • Wang, F. et al. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3156–3164 (2017).

  • Jégou, S., Drozdzal, M., Vazquez, D., Romero, A. & Bengio, Y. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In Computer Vision and Pattern Recognition, 1175–1183 (IEEE, 2017).

  • Cai, J. & Chen, Y. Mha-net: Multipath hybrid attention network for building footprint extraction from high-resolution remote sensing imagery. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. 14, 5807–5817 (2021).

  • Wei, S., Ji, S. & Lu, M. Toward automatic building footprint delineation from aerial images using cnn and regularization. IEEE Trans. Geosci. Remote Sens. 58, 2178–2189 (2019).

  • Sun, K., Xiao, B., Liu, D. & Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5693–5703 (2019).

  • Zhu, Q., Liao, C., Hu, H., Mei, X. & Li, H. Map-net: Multiple attending path neural network for building footprint extraction from remote sensed imagery. IEEE Trans. Geosci. Remote Sens. 59, 6169–6181 (2020).

  • Ji, S., Wei, S. & Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 57, 574–586 (2018).

  • Zhu, Y., Liang, Z., Yan, J., Chen, G. & Wang, X. Ed-net: Automatic building extraction from high-resolution aerial images with boundary information. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. 14, 4595–4606 (2021).

  • Yang, G., Zhang, Q. & Zhang, G. Eanet: Edge-aware network for the extraction of buildings from aerial images. Remote Sens. 12, 2161 (2020).

  • Takikawa, T., Acuna, D., Jampani, V. & Fidler, S. Gated-scnn: Gated shape cnns for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 5229–5238 (2019).

  • Xu, Y., Wu, L., Xie, Z. & Chen, Z. Building extraction in very high resolution remote sensing imagery using deep learning and guided filters. Remote Sens. 10, 144 (2018).

  • Alshehhi, R., Marpu, P. R., Woon, W. L. & Dalla Mura, M. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 130, 139–149 (2017).

  • Guo, Z. et al. Village building identification based on ensemble convolutional neural networks. Sensors 17, 2487 (2017).

  • Long, J., Shelhamer, E. & Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431–3440 (2015).

  • Badrinarayanan, V., Kendall, A. & Cipolla, R. Segnet: A deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495 (2017).

  • Ronneberger, O. Invited talk: U-net convolutional networks for biomedical image segmentation. In Bildverarbeitung für die Medizin 2017, 3–3 (Springer, 2017).

  • Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K. & Yuille, A. L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2017).

  • Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder–decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), 801–818 (2018).

  • Kang, J. et al. Picoco: Pixelwise contrast and consistency learning for semisupervised building footprint segmentation. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. 14, 10548–10559. https://doi.org/10.1109/JSTARS.2021.3119286 (2021).

  • Fenglei, W., Xin, G., Zongze, Z., Lida, X. & Chao, M. A boundary-enhanced semantic segmentation model for buildings. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. (2025).

  • Li, J., Hu, Y. & Huang, X. Casaformer: A cross-and self-attention based lightweight network for large-scale building semantic segmentation. Int. J. Appl. Earth Obs. Geoinf. 130, 103942 (2024).

  • Pu, X., Jia, H., Zheng, L., Wang, F. & Xu, F. Classwise-sam-adapter: Parameter efficient fine-tuning adapts segment anything to sar domain for semantic segmentation. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. (2025).

  • Jin, Q. et al. Iterative pseudo-labeling based adaptive copy-paste supervision for semi-supervised tumor segmentation. Knowl. Based Syst. 8, 113785 (2025).

  • Jin, Q. et al. Inter-and intra-uncertainty based feature aggregation model for semi-supervised histopathology image segmentation. Expert Syst. Appl. 238, 122093 (2024).

  • Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).

  • Wang, X., Girshick, R., Gupta, A. & He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7794–7803 (2018).

  • Cho, K., Courville, A. & Bengio, Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans. Multimed. 17, 1875–1886 (2015).

  • Jaderberg, M. et al. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 28, 32 (2015).

  • Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141 (2018).

  • Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), 3–19 (2018).

  • Yang, Y. & Soatto, S. Fda: Fourier domain adaptation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4085–4095 (2020).

  • Qin, Z., Zhang, P., Wu, F. & Li, X. Fcanet: Frequency channel attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 783–792 (2021).

  • Patro, B. N., Namboodiri, V. P. & Agneeswaran, V. S. Spectformer: Frequency and attention is what you need in a vision transformer. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 9543–9554 (IEEE, 2025).

  • Rao, Y., Zhao, W., Zhu, Z., Lu, J. & Zhou, J. Global filter networks for image classification. Adv. Neural. Inf. Process. Syst. 34, 980–993 (2021).

  • Guibas, J. et al. Adaptive Fourier neural operators: Efficient token mixers for transformers. arXiv preprint arXiv:2111.13587 (2021).

  • Xu, Z., Gong, H., Wan, X. & Li, H. Asc: Appearance and structure consistency for unsupervised domain adaptation in fetal brain mri segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 325–335 (Springer, 2023).

  • Qin, X. et al. U2-net: Going deeper with nested u-structure for salient object detection. Pattern Recogn. 106, 107404 (2020).

  • Zheng, Z., Zhang, S., Shen, J., Shao, Y. & Zhang, Y. A two-stage cnn for automated tire defect inspection in radiographic image. Meas. Sci. Technol. 32, 115403 (2021).

  • Shi, W., Jiang, F. & Zhao, D. Single image super-resolution with dilated convolution based multi-scale information learning inception module. In 2017 IEEE International Conference on Image Processing (ICIP), 977–981 (IEEE, 2017).

  • Iandola, F. et al. Densenet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869 (2014).

  • Wang, Z., Simoncelli, E. & Bovik, A. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, 1398–1402. https://doi.org/10.1109/ACSSC.2003.1292216 (2003).

  • Guo, J.-M., Markoni, H. & Lee, J.-D. Barnet: Boundary aware refinement network for crack detection. IEEE Trans. Intell. Transport. Syst. https://doi.org/10.1109/TITS.2021.3069135 (2021).

  • Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder–decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV) (2018).

  • Badrinarayanan, V., Kendall, A. & Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495 (2017).

  • Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015 (eds Navab, N. et al.) 234–241 (Springer, 2015).

  • Zuo, X., Shao, Z., Wang, J., Huang, X. & Wang, Y. A cross-stage features fusion network for building extraction from remote sensing images. Geo-Spatial Inf. Sci. 6, 1–15 (2024).

  • Zhu, W. et al. A method for building extraction in remote sensing images based on swintransformer. Int. J. Digit. Earth 17, 2353113 (2024).

  • Cao, S. et al. Bemrf-net: Boundary enhancement and multiscale refinement fusion for building extraction from remote sensing imagery. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. (2024).

  • Wang, W. et al. Tdfnet: Twice decoding v-mamba-cnn fusion features for building extraction. Geo-spatial Inf. Sci. 6, 1–20 (2025).

  • Ma, X., Zhang, X. & Pun, M.-O. RS3Mamba: Visual state space model for remote sensing image semantic segmentation. IEEE Geosci. Remote Sens. Lett. 21, 1–5 (2024).
