Data assimilation (DA) is the science of fusing information from numerical model forecasts with information from observations, and it is the backbone of Earth system prediction. DA faces significant challenges from high-resolution, multiscale, coupled Earth system models and from the large volume of diverse, complex observations that sample a variety of scales. Next-generation DA must analyze the state and quantify its uncertainty effectively across multiple scales and Earth system components. The recent promise that machine learning (ML) has shown in Earth system science suggests new opportunities for DA. This seminar first briefly presents the connection between ML and DA and the challenges specific to multiscale data assimilation (MDA), then discusses a few recent efforts toward novel uses of ML to improve MDA.
First, MAPCast, a newly developed convection-allowing emulator based on Graph Neural Networks (GNNs), is introduced to produce surrogate background ensembles for MDA. The MAPCast surrogate ensemble not only emulates the storms simulated by the physics-based numerical model (e.g., MPAS) but also shows promise in emulating the background ensemble covariances at a variety of scales.
Second, a new approach is proposed that hybridizes physics-based background ensembles with cost-effective ML surrogate background ensembles in DA. This approach is implemented and evaluated in a cycled Local Gain Form Ensemble Transform Kalman Filter (LGETKF) with the surface quasi-geostrophic (SQG) model. Experiments show that the new hybrid approach outperforms both the purely physics-based and the purely data-driven approaches for the analysis and the free forecast.
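As a rough illustration of what hybridizing two background ensembles can mean, the sketch below concatenates weighted, normalized perturbations from a small "physics" ensemble and a larger "surrogate" ensemble, so that the implied covariance is a weighted sum of the two sample covariances. This weighting scheme, the function names, the `weight_phys` parameter, and the random stand-in data are all assumptions made for illustration. The abstract does not specify how the hybridization is formulated inside the LGETKF.

```python
import numpy as np

def normalized_perturbations(ens):
    """ens has shape (n_members, n_state).
    Returns X of shape (n_state, n_members) such that X @ X.T is the sample covariance."""
    ens = np.asarray(ens, dtype=float)
    n_members = ens.shape[0]
    return (ens - ens.mean(axis=0)).T / np.sqrt(n_members - 1)

def hybrid_perturbations(ens_phys, ens_ml, weight_phys=0.5):
    """Hypothetical hybridization: weight and concatenate the two perturbation sets.
    The implied covariance is weight_phys * P_phys + (1 - weight_phys) * P_ml."""
    x_phys = normalized_perturbations(ens_phys)
    x_ml = normalized_perturbations(ens_ml)
    return np.hstack([np.sqrt(weight_phys) * x_phys,
                      np.sqrt(1.0 - weight_phys) * x_ml])

# Random stand-in data (not model output): 10 physics members, 40 surrogate members.
rng = np.random.default_rng(0)
ens_phys = rng.normal(size=(10, 6))
ens_ml = rng.normal(size=(40, 6))

x_hyb = hybrid_perturbations(ens_phys, ens_ml, weight_phys=0.7)
b_hyb = x_hyb @ x_hyb.T  # hybrid background error covariance, shape (6, 6)
```

Under these assumptions, the larger surrogate ensemble adds rank to the covariance estimate while the weight controls how much the physics-based members are trusted.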
Next, we performed a comprehensive investigation of the surrogate background ensemble from Google’s global GraphCast for hurricane data assimilation, comparing it against the physics-based global model. GraphCast captures the key background error structures of the hurricane vortex but under-represents spread and exhibits systematic biases in error correlations, especially for processes associated with the secondary circulation. In the hurricane environment, binned correlations are in general linearly related to those of the physics-based model, but GraphCast shows reduced spread and a flatter empirical orthogonal function (EOF) spectrum, with more perturbation growth distributed to smaller-scale features.
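The two diagnostics named above, ensemble spread and the EOF spectrum, can be sketched as follows. This is a minimal example on synthetic data, assuming an ensemble stored as an array of shape (members, state). The function names and the synthetic ensembles are illustrative and are not taken from the study.

```python
import numpy as np

def ensemble_spread(ens):
    """Per-gridpoint ensemble standard deviation; ens has shape (n_members, n_state)."""
    return np.asarray(ens, dtype=float).std(axis=0, ddof=1)

def eof_variance_fractions(ens):
    """Fraction of total ensemble variance explained by each EOF, in descending order.
    EOFs are obtained from an SVD of the ensemble perturbation matrix."""
    ens = np.asarray(ens, dtype=float)
    pert = ens - ens.mean(axis=0)
    singular_values = np.linalg.svd(pert, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()

# Synthetic stand-ins: one ensemble dominated by a single large-scale pattern,
# one with variance spread evenly over many directions (a "flatter" spectrum).
rng = np.random.default_rng(1)
n_members, n_state = 20, 50
pattern = np.sin(np.linspace(0.0, np.pi, n_state))
ens_steep = 5.0 * rng.normal(size=(n_members, 1)) * pattern \
    + 0.1 * rng.normal(size=(n_members, n_state))
ens_flat = rng.normal(size=(n_members, n_state))

frac_steep = eof_variance_fractions(ens_steep)
frac_flat = eof_variance_fractions(ens_flat)
spread_flat = ensemble_spread(ens_flat)
```

In this sketch, a flatter spectrum shows up as a smaller leading variance fraction, with the remaining variance shared among higher-order EOFs.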
Beyond the development and investigation of ML emulators of full, physics-based numerical models for DA, a novel approach is proposed that uses Convolutional Neural Networks (CNNs) to determine critical parameters in DA. The background error covariance enhanced by the CNN is shown to resemble the shape of the true small- and large-scale error covariances more accurately than the traditional Gaspari-Cohn function does, resulting in a more accurate analysis at reduced cost.
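For reference, the traditional baseline mentioned above, the Gaspari-Cohn function, is a compactly supported fifth-order piecewise polynomial that tapers sample covariances to zero at twice a chosen half-width. The sketch below implements that standard function and applies it as an elementwise (Schur) product to a sample covariance on a hypothetical one-dimensional grid. The grid, the half-width `c`, and the random ensemble are assumptions for illustration, and the CNN-based approach itself is not shown.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn fifth-order taper. dist: distances (any shape); c: half-width.
    Returns 1 at zero distance and exactly 0 for distances >= 2c."""
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)
    ro = r[outer]
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper

# Hypothetical 1-D grid with unit spacing and a small random ensemble.
x = np.arange(8.0)
dist = np.abs(x[:, None] - x[None, :])
loc = gaspari_cohn(dist, c=2.0)                 # localization matrix, shape (8, 8)

rng = np.random.default_rng(2)
ens = rng.normal(size=(5, 8))                   # 5 members, 8 grid points
p_loc = loc * np.cov(ens, rowvar=False)         # Schur-product localized covariance
```

The taper has a single fixed shape controlled only by `c`, which is one reason a learned, scale-aware alternative may fit the true error covariances more closely.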
