Advancing Earth observation with a multi-modal remote sensing foundation model

Using optical, infrared and radar signals from diverse satellite platforms, remote sensing provides comprehensive observation of the Earth at different temporal, spatial and spectral resolutions. However, the complexity and heterogeneity of these data make their processing and integration challenging. Foundation models offer generalizability across Earth observation applications, and remote sensing foundation models (RSFMs) have been developed to extract generic features from extensive collections of remote sensing images (RSIs), enabling adaptation to various downstream tasks through fine-tuning. Despite this potential, existing RSFMs face limitations that restrict their generalizability across Earth observation tasks, including inadequate support for multi-modal and temporal inputs, limited few-shot capabilities, and insufficient use of semantic information.

Now, writing in Nature Machine Intelligence, Yansheng Li and colleagues present SkySense++, an RSFM that leverages pretraining on vast multi-modal RSIs to enhance generalizability across Earth observation tasks.
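For readers unfamiliar with the pretrain-then-adapt workflow described above, the following minimal PyTorch sketch illustrates the general idea: a frozen pretrained backbone supplies generic image features to a small task-specific head that is fine-tuned on labelled data. All names here (the toy `PretrainedEncoder`, the band counts, the class counts) are illustrative assumptions, not SkySense++'s actual architecture or API.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained RSFM encoder; the real SkySense++
# model is far larger and multi-modal. This toy just maps image patches
# to generic feature vectors.
class PretrainedEncoder(nn.Module):
    def __init__(self, in_channels: int = 4, embed_dim: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)  # (B, 64) pooled features
        return self.proj(h)              # (B, embed_dim) generic features

# Downstream adaptation: freeze the generic encoder, fine-tune a small head.
encoder = PretrainedEncoder(in_channels=4)  # e.g. RGB + near-infrared bands
for p in encoder.parameters():
    p.requires_grad = False                 # keep pretrained features fixed

head = nn.Linear(256, 10)                   # e.g. 10 land-cover classes
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One fine-tuning step on a toy batch of 4-band image patches.
images = torch.randn(8, 4, 64, 64)
labels = torch.randint(0, 10, (8,))
optimizer.zero_grad()
logits = head(encoder(images))
loss = criterion(logits, labels)
loss.backward()                             # gradients flow to the head only
optimizer.step()
```

In practice the same frozen features can serve many heads (classification, segmentation, detection), which is what makes a single pretrained backbone reusable across downstream tasks.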

The model's ability to handle unseen tasks with minimal labelled data, and without fine-tuning, is particularly beneficial for time-sensitive Earth observation applications. Future work could focus on further scaling the model and on integrating large language models to enhance performance across a broader range of tasks. "Concurrently, we plan to couple the model with multi-type geoscience knowledge, such as physical models and geographical principles, to enhance the accuracy and depth of its analytical and interpretative power", concludes Li.
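One generic way such few-shot inference without fine-tuning can work is nearest-class-mean (prototype) classification over frozen foundation-model embeddings: a handful of labelled examples define per-class prototypes, and queries are assigned to the nearest prototype. The sketch below shows this scheme under stated assumptions (256-dimensional embeddings, cosine similarity); it is a common baseline, not SkySense++'s actual few-shot procedure.

```python
import torch
import torch.nn.functional as F

def class_prototypes(support_emb: torch.Tensor,
                     support_labels: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    """Mean embedding per class, L2-normalized for cosine similarity."""
    protos = torch.stack([
        support_emb[support_labels == c].mean(dim=0)
        for c in range(num_classes)
    ])
    return F.normalize(protos, dim=-1)

def predict(query_emb: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Assign each query to the class whose prototype is most similar."""
    query_emb = F.normalize(query_emb, dim=-1)
    return (query_emb @ protos.T).argmax(dim=-1)

# Toy example: 3 classes, 5 labelled support examples each, with random
# vectors standing in for frozen foundation-model embeddings.
support_emb = torch.randn(15, 256)
support_labels = torch.arange(3).repeat_interleave(5)
protos = class_prototypes(support_emb, support_labels, num_classes=3)
preds = predict(torch.randn(4, 256), protos)  # class labels for 4 queries
```

Because no model weights are updated, such schemes can be deployed as soon as a few labelled examples are available, which is what makes them attractive for time-sensitive applications such as disaster response.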
