When deep learning meets multi-scale data

Blog

Authors

Jiangtao Liu

Ph.D. Student, Civil and Environmental Engineering

July 6, 2022

Detailed and accurate soil moisture data are essential for many applications, such as monitoring drought and crop irrigation and mapping landslides and flooding. Of all the ways to obtain high-resolution soil moisture data, deep learning (DL) is one of the top-performing methods. DL has set a new standard on performance metrics for soil moisture, streamflow, and groundwater modeling, as well as for model uncertainty estimates.

Fig. 1. Multi-scale data acquisition and processing

Deep learning models in scientific research use large amounts of data from different scales and sources. However, each data source has its limitations. For example, in situ (locally sampled) observations are accurate but limited in spatial coverage, while global satellite data products are too coarse in resolution and are challenging to apply to agricultural research at scales below 10 km.

Furthermore, there are many variables that we cannot observe, such as groundwater, deeper soil moisture, and the state of the vegetation. These are serious flaws in a supervised learning framework for soil moisture. How do we overcome such limitations? We believe the answer is to learn from both in situ and remotely sensed data. Metaphorically speaking, each data source is a teacher, and our model is the student. By learning from multiple teachers, the student can become better than any one of them. In this project, we proposed a novel multi-scale scheme that learns from various data sources and scales simultaneously. Learning from multiple data sources allows data-driven models to break free from the confines of any single source, creating a student that can surpass its teachers.

Fig. 2. Core Validation Sites (CVS), Soil Climate Analysis Network (SCAN), and United States Climate Reference Network (USCRN) soil moisture observation sites throughout the conterminous United States.

The multi-scale scheme can be roughly described as a four-step process:

1. Run the Long Short-Term Memory (LSTM) model at the fine scale, with uniform weights for each fine grid cell.
2. Aggregate the fine-scale predictions to the coarse scale by averaging them, then calculate the satellite loss.
3. Calculate the in situ loss using the fine-scale predictions.
4. Combine the two loss terms using a weight w. In the extreme cases, w = 0 means the model learns only from the satellite grid data, and w = 1 means it learns only from in situ data.
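To make the loss combination concrete, here is a minimal sketch in plain Python. It is not the code used in the paper: the function and argument names are hypothetical, mean squared error is assumed for both loss terms, and the combination (1 - w) * satellite loss + w * in situ loss is an assumption chosen only because it matches the two extreme cases described above. In a real model the fine-scale predictions would come from the LSTM and the loss would be minimized by gradient descent.

```python
from statistics import fmean


def mse(pred, obs):
    """Mean squared error over paired values, skipping missing observations (None)."""
    pairs = [(p, o) for p, o in zip(pred, obs) if o is not None]
    return fmean((p - o) ** 2 for p, o in pairs)


def multiscale_loss(fine_pred, coarse_index, satellite_obs, site_cells, site_obs, w):
    """Combine a coarse-scale satellite loss with a fine-scale in situ loss.

    fine_pred     : predicted soil moisture, one value per fine grid cell
    coarse_index  : for each fine cell, the index of the coarse cell containing it
    satellite_obs : satellite soil moisture, one value per coarse cell
    site_cells    : fine-cell index of each in situ site
    site_obs      : in situ soil moisture at each site
    w             : assumed convention: 0 -> satellite only, 1 -> in situ only
    """
    # Step 2: average the fine-cell predictions inside each coarse cell,
    # then compare the averages with the satellite values.
    groups = {}
    for value, c in zip(fine_pred, coarse_index):
        groups.setdefault(c, []).append(value)
    coarse_pred = [fmean(groups[c]) for c in range(len(satellite_obs))]
    satellite_loss = mse(coarse_pred, satellite_obs)

    # Step 3: compare fine-cell predictions directly with the site observations.
    insitu_loss = mse([fine_pred[i] for i in site_cells], site_obs)

    # Step 4: weighted combination of the two loss terms (assumed form).
    return (1.0 - w) * satellite_loss + w * insitu_loss


# Toy example: four fine cells inside two coarse cells, one in situ site.
fine_pred = [0.10, 0.20, 0.30, 0.40]
coarse_index = [0, 0, 1, 1]
satellite_obs = [0.15, 0.45]
site_cells = [1]
site_obs = [0.25]

for w in (0.0, 0.5, 1.0):
    loss = multiscale_loss(fine_pred, coarse_index, satellite_obs, site_cells, site_obs, w)
    print(f"w={w}: loss={loss:.5f}")
```

With the toy numbers, the satellite term alone is about 0.005 and the in situ term alone is about 0.0025, so intermediate values of w trade one source off against the other.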

Fig. 3. Illustration of the multi-scale scheme

Based on spatial cross-validation over sites in the conterminous United States, the multi-scale scheme obtained a median correlation of 0.901 and a root-mean-square error of 0.034 m³/m³. It outperformed the 9 km product of the Soil Moisture Active Passive (SMAP) satellite mission, DL models trained on in situ data alone, and land surface models.
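For readers unfamiliar with these metrics, the sketch below shows one way per-site scores could be computed and summarized. It assumes the Pearson correlation and that both metrics are summarized by their median across sites. The post does not spell out either choice, so treat both as assumptions. The function names and toy time series are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import fmean, median


def pearson_r(pred, obs):
    """Pearson correlation between a predicted and an observed time series."""
    mp, mo = fmean(pred), fmean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / sqrt(var_p * var_o)


def rmse(pred, obs):
    """Root-mean-square error, in the same units as the inputs (e.g. m3/m3)."""
    return sqrt(fmean((p - o) ** 2 for p, o in zip(pred, obs)))


def summarize_sites(series_by_site):
    """Median correlation and median RMSE over sites.

    series_by_site maps a site id to a (predicted, observed) pair of series.
    """
    correlations = [pearson_r(p, o) for p, o in series_by_site.values()]
    errors = [rmse(p, o) for p, o in series_by_site.values()]
    return median(correlations), median(errors)


# Toy values for three hypothetical held-out sites.
sites = {
    "site_a": ([0.10, 0.20, 0.30], [0.10, 0.20, 0.30]),
    "site_b": ([0.10, 0.20, 0.30], [0.30, 0.20, 0.10]),
    "site_c": ([0.10, 0.20, 0.30], [0.20, 0.30, 0.40]),
}
median_r, median_rmse = summarize_sites(sites)
print(f"median r = {median_r:.3f}, median RMSE = {median_rmse:.3f}")
```

In spatial cross-validation, the sites being scored are held out from training, so the summary reflects how well the model generalizes to locations it has not seen.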

Thus, our experiments show that combining multiple data sources in our framework produces significantly more accurate soil moisture estimates. The model can learn from teacher one and teacher two, assimilating the best of each and becoming considerably better than either.

Reference: Jiangtao Liu, Farshid Rahmani, Kathryn Lawson, and Chaopeng Shen. “A Multi-scale Deep Learning Model for Soil Moisture Integrating Satellite and In Situ Data.” Geophysical Research Letters 49, no. 7 (2022): e2021GL096847. https://doi.org/10.1029/2021GL096847

This article was originally published on Medium (https://medium.com/@psuwaterstudentgroup/when-deep-learning-meets-multi-scale-data-aa759571d542) and was republished with permission.
