Zhang, X. et al. An improved encoder–decoder network based on strip pool method applied to segmentation of farmland vacancy field. Entropy https://doi.org/10.3390/e23040435 (2021).
Li, D. et al. Building extraction from airborne multi-spectral lidar point clouds based on graph geometric moments convolutional neural networks. Remote Sens. 12, 3186 (2020).
Peng, B., Al-Huda, Z., Xie, Z. & Wu, X. Multi-scale region composition of hierarchical image segmentation. Multimed. Tools Appl. 8, 1–23 (2020).
Al-Huda, Z., Peng, B., Yang, Y. & Ahmed, M. Object scale selection of hierarchical image segmentation using reliable regions. In 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), 1081–1088 (IEEE, 2019).
Algabri, R. & Choi, M.-T. Deep-learning-based indoor human following of mobile robot using color feature. Sensors 20, 2699 (2020).
Algabri, R. & Choi, M.-T. Target recovery for robust deep learning-based person following in mobile robots: Online trajectory prediction. Appl. Sci. 11, 4165 (2021).
Yu, B., Yang, L. & Chen, F. Semantic segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. 11, 3252–3261 (2018).
Ok, A. O. Automated detection of buildings from single vhr multispectral images using shadow information and graph cuts. ISPRS J. Photogramm. Remote Sens. 86, 21–40 (2013).
Ghanea, M., Moallem, P. & Momeni, M. Building extraction from high-resolution satellite images in urban areas: Recent methods and strategies against significant challenges. Int. J. Remote Sens. 37, 5234–5248 (2016).
Gao, H., Tang, Y., Jing, L., Li, H. & Ding, H. A novel unsupervised segmentation quality evaluation method for remote sensing images. Sensors 17, 2427 (2017).
Ahmadi, S., Zoej, M. V., Ebadi, H., Moghaddam, H. A. & Mohammadzadeh, A. Automatic urban building boundary extraction from high resolution aerial images using an innovative model of active contours. Int. J. Appl. Earth Obs. Geoinf. 12, 150–157 (2010).
Sun, Y., Zhang, X., Zhao, X. & Xin, Q. Extracting building boundaries from high resolution optical images and lidar data by integrating the convolutional neural network and the active contour model. Remote Sens. 10, 1459 (2018).
Vakalopoulou, M., Karantzalos, K., Komodakis, N. & Paragios, N. Building detection in very high resolution multispectral data with deep learning features. In 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 1873–1876 (IEEE, 2015).
Chen, L.-C., Papandreou, G., Schroff, F. & Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
Wang, F. et al. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3156–3164 (2017).
Jégou, S., Drozdzal, M., Vazquez, D., Romero, A. & Bengio, Y. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In Computer Vision and Pattern Recognition, 1175–1183 (IEEE, 2017).
Cai, J. & Chen, Y. Mha-net: Multipath hybrid attention network for building footprint extraction from high-resolution remote sensing imagery. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. 14, 5807–5817 (2021).
Wei, S., Ji, S. & Lu, M. Toward automatic building footprint delineation from aerial images using cnn and regularization. IEEE Trans. Geosci. Remote Sens. 58, 2178–2189 (2019).
Sun, K., Xiao, B., Liu, D. & Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5693–5703 (2019).
Zhu, Q., Liao, C., Hu, H., Mei, X. & Li, H. Map-net: Multiple attending path neural network for building footprint extraction from remote sensed imagery. IEEE Trans. Geosci. Remote Sens. 59, 6169–6181 (2020).
Ji, S., Wei, S. & Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 57, 574–586 (2018).
Zhu, Y., Liang, Z., Yan, J., Chen, G. & Wang, X. Ed-net: Automatic building extraction from high-resolution aerial images with boundary information. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. 14, 4595–4606 (2021).
Yang, G., Zhang, Q. & Zhang, G. Eanet: Edge-aware network for the extraction of buildings from aerial images. Remote Sens. 12, 2161 (2020).
Takikawa, T., Acuna, D., Jampani, V. & Fidler, S. Gated-scnn: Gated shape cnns for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 5229–5238 (2019).
Xu, Y., Wu, L., Xie, Z. & Chen, Z. Building extraction in very high resolution remote sensing imagery using deep learning and guided filters. Remote Sens. 10, 144 (2018).
Alshehhi, R., Marpu, P. R., Woon, W. L. & Dalla Mura, M. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 130, 139–149 (2017).
Guo, Z. et al. Village building identification based on ensemble convolutional neural networks. Sensors 17, 2487 (2017).
Long, J., Shelhamer, E. & Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431–3440 (2015).
Badrinarayanan, V., Kendall, A. & Cipolla, R. Segnet: A deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495 (2017).
Ronneberger, O. Invited talk: U-net convolutional networks for biomedical image segmentation. In Bildverarbeitung für die Medizin 2017, 3–3 (Springer, 2017).
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K. & Yuille, A. L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2017).
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder–decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), 801–818 (2018).
Kang, J. et al. Picoco: Pixelwise contrast and consistency learning for semisupervised building footprint segmentation. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. 14, 10548–10559. https://doi.org/10.1109/JSTARS.2021.3119286 (2021).
Fenglei, W., Xin, G., Zongze, Z., Lida, X. & Chao, M. A boundary-enhanced semantic segmentation model for buildings. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. (2025).
Li, J., Hu, Y. & Huang, X. Casaformer: A cross-and self-attention based lightweight network for large-scale building semantic segmentation. Int. J. Appl. Earth Obs. Geoinf. 130, 103942 (2024).
Pu, X., Jia, H., Zheng, L., Wang, F. & Xu, F. Classwise-sam-adapter: Parameter efficient fine-tuning adapts segment anything to sar domain for semantic segmentation. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. (2025).
Jin, Q. et al. Iterative pseudo-labeling based adaptive copy-paste supervision for semi-supervised tumor segmentation. Knowl. Based Syst. 8, 113785 (2025).
Jin, Q. et al. Inter-and intra-uncertainty based feature aggregation model for semi-supervised histopathology image segmentation. Expert Syst. Appl. 238, 122093 (2024).
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
Wang, X., Girshick, R., Gupta, A. & He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7794–7803 (2018).
Cho, K., Courville, A. & Bengio, Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans. Multimed. 17, 1875–1886 (2015).
Jaderberg, M. et al. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 28, 32 (2015).
Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141 (2018).
Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), 3–19 (2018).
Yang, Y. & Soatto, S. Fda: Fourier domain adaptation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4085–4095 (2020).
Qin, Z., Zhang, P., Wu, F. & Li, X. Fcanet: Frequency channel attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 783–792 (2021).
Patro, B. N., Namboodiri, V. P. & Agneeswaran, V. S. Spectformer: Frequency and attention is what you need in a vision transformer. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 9543–9554 (IEEE, 2025).
Rao, Y., Zhao, W., Zhu, Z., Lu, J. & Zhou, J. Global filter networks for image classification. Adv. Neural Inf. Process. Syst. 34, 980–993 (2021).
Guibas, J. et al. Adaptive Fourier neural operators: Efficient token mixers for transformers. arXiv preprint arXiv:2111.13587 (2021).
Xu, Z., Gong, H., Wan, X. & Li, H. Asc: Appearance and structure consistency for unsupervised domain adaptation in fetal brain mri segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 325–335 (Springer, 2023).
Qin, X. et al. U2-net: Going deeper with nested u-structure for salient object detection. Pattern Recogn. 106, 107404 (2020).
Zheng, Z., Zhang, S., Shen, J., Shao, Y. & Zhang, Y. A two-stage cnn for automated tire defect inspection in radiographic image. Meas. Sci. Technol. 32, 115403 (2021).
Shi, W., Jiang, F. & Zhao, D. Single image super-resolution with dilated convolution based multi-scale information learning inception module. In 2017 IEEE International Conference on Image Processing (ICIP), 977–981 (IEEE, 2017).
Iandola, F. et al. Densenet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869 (2014).
Wang, Z., Simoncelli, E. & Bovik, A. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, 1398–1402. https://doi.org/10.1109/ACSSC.2003.1292216 (2003).
Guo, J.-M., Markoni, H. & Lee, J.-D. Barnet: Boundary aware refinement network for crack detection. IEEE Trans. Intell. Transport. Syst. https://doi.org/10.1109/TITS.2021.3069135 (2021).
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder–decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV) (2018).
Badrinarayanan, V., Kendall, A. & Cipolla, R. Segnet: A deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495 (2017).
Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015 (eds Navab, N. et al.) 234–241 (Springer, 2015).
Zuo, X., Shao, Z., Wang, J., Huang, X. & Wang, Y. A cross-stage features fusion network for building extraction from remote sensing images. Geo-spatial Inf. Sci. 6, 1–15 (2024).
Zhu, W. et al. A method for building extraction in remote sensing images based on swintransformer. Int. J. Digit. Earth 17, 2353113 (2024).
Cao, S. et al. Bemrf-net: Boundary enhancement and multiscale refinement fusion for building extraction from remote sensing imagery. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. (2024).
Wang, W. et al. Tdfnet: Twice decoding v-mamba-cnn fusion features for building extraction. Geo-spatial Inf. Sci. 6, 1–20 (2025).
Ma, X., Zhang, X. & Pun, M.-O. Rs3mamba: Visual state space model for remote sensing image semantic segmentation. IEEE Geosci. Remote Sens. Lett. 21, 1–5 (2024).