Link to OceaniX calendar with all scheduled events and meetings: here

OceaniX Sandbox

You can find here all the information and presentations of interest from the Friday morning OceaniX team meetings.

Zoom conference for session on April 29th, 11am: link

Upcoming sessions

Koopman eigenfunctions estimation from reproducing kernel Hilbert space manifold, and ensemble data assimilation


This study proposes a new framework for ensemble-based estimation of the dynamical trajectories of a geophysical fluid flow system. To perform efficient estimations, the ensemble members are embedded in a set of evolving reproducing kernel Hilbert spaces (RKHS) defining a manifold of spaces, which we nicknamed Wonderland because of its analytical properties. The method is designed for very large-scale systems such as oceanic or meteorological flows, where it is out of the question to explore the whole attractor or to run very long simulations. Instead, we propose to learn the system locally in phase space from an ensemble of trajectories.

Gilles Tissot


Previous meetings

Learning multimodal inversion models with 4DVarNet: insights from SST-SSH synergies


This talk addresses the inversion of space-time dynamics from multi-source data. We discuss how 4DVarNet schemes provide new means to learn inversion models from multi-modal observation data when the relationship between the different observation data and the state dynamics of interest may not be explicit. Illustrations on SST-SSH synergies will support the methodological discussion.

Ronan Fablet

In situ wind speed monitoring from underwater acoustics using deep learning


This talk addresses wind speed retrieval at the sea surface from underwater acoustics data. We investigate trainable variational schemes, namely 4DVarNet schemes, to deal with this inverse problem. We demonstrate relevant reconstruction performance w.r.t. state-of-the-art approaches, including the ability to exploit multi-source data, here ECMWF winds and underwater acoustics data.

Matteo Zambra

Joint calibration and mapping of satellite altimetry data


Satellite radar altimeters are a key source of observation of ocean surface dynamics. However, current sensor technology and mapping techniques do not yet allow scales smaller than 100 km to be resolved systematically. With their new sensors, upcoming wide-swath altimeter missions such as SWOT should help resolve finer scales. Current mapping techniques rely on the quality of the input data, which is why the raw data go through multiple preprocessing stages before being used. Those calibration stages are improved and refined over many years and represent a challenge when a new type of sensor starts acquiring data. Here we show how a data-driven variational data assimilation framework could be used to jointly learn a calibration operator and an interpolator from non-calibrated data.

Quentin Febvre

arXiv paper

Backpropagation and Reinforcement Learning for Distributed Resource Allocation: an introduction to two-timescale stochastic approximations


Two-timescale stochastic approximations are widely used in machine learning, especially in Reinforcement Learning, min-max optimization, and bilevel programming. In this talk, I will start by introducing stochastic approximations and the ODE method used to analyze such schemes. I will also describe the theoretical results already available in the literature. In the second part of my talk, I will focus on recipes for designing efficient algorithms with such approaches, with a special focus on applications related to resource allocation and possibly to machine learning.

Alexandre Reiffers-Masson
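
To make the idea in the abstract above concrete, here is a minimal sketch of a two-timescale stochastic approximation on a toy problem. It is not taken from the talk: the drift functions, the step sizes `n ** -0.6` and `1 / n`, the noise level and the function name `two_timescale` are all illustrative assumptions. The fast iterate `x` tracks the slow iterate `y`, while `y` moves slowly towards the point where `x` equals 1.

```python
import random


def two_timescale(n_steps=20000, noise_std=0.1, seed=0):
    """Toy two-timescale stochastic approximation (illustrative assumption).

    Fast iterate x follows the noisy drift (y - x).
    Slow iterate y follows the noisy drift (1 - x).
    Both iterates are expected to approach 1.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for n in range(1, n_steps + 1):
        a_n = n ** -0.6  # fast step size
        b_n = 1.0 / n    # slow step size, so that b_n / a_n -> 0
        # Both updates use the values from the previous iteration.
        x_new = x + a_n * ((y - x) + rng.gauss(0.0, noise_std))
        y_new = y + b_n * ((1.0 - x) + rng.gauss(0.0, noise_std))
        x, y = x_new, y_new
    return x, y


x_final, y_final = two_timescale()
print(x_final, y_final)
```

The only structural requirement illustrated here is that the slow step size becomes negligible compared with the fast one, so the fast iterate sees the slow one as almost frozen.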


Segmentation of SAR observations, especially wind speed and rainfall


Synthetic Aperture Radar is a prolific provider of information on the ocean surface. This kind of imaging system is sensitive to the ocean surface roughness and can therefore detect several oceanic and atmospheric processes such as atmospheric fronts, biological slicks, sea ice or convective cells. In particular, SAR images can be used to obtain information about the rainfall and the wind speed. As both have a similar impact on the surface roughness, it is difficult to estimate the wind speed under heavy rainfall and, conversely, the rainfall under extreme winds. By building a dataset skewed to specific rain-wind combinations, we show that it is possible to train a CNN to estimate either parameter while reducing the impact of the other.

Aurélien Colin


On the interest of high order statistics when characterizing non-linear systems.


Too often the statistical characterization of non-linear systems is reduced to two-point statistics such as the power spectrum or correlation. However, these statistics introduce a very strong linear (Gaussian) assumption which is not suitable. Characterizing a non-Gaussian distribution requires more than its mean and variance. In this talk we first differentiate between high-order, or multi-point, statistics, which are able to capture nonlinearities, and second-order, or two-point, statistics. Second, we illustrate the importance of high-order statistics through the study of two non-linear systems: three-dimensional isotropic turbulence and convective rolls in the marine atmospheric boundary layer.

Carlos Granero Belinchon
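
The point made in the abstract above, that mean and variance alone cannot characterize a non-Gaussian distribution, can be illustrated with a small sketch. The two samples below are hand-picked toy values (an assumption for illustration, not data from the talk): they share the same mean and variance, yet their flatness (normalized fourth-order moment) differs by a factor of four.

```python
def central_moment(values, order):
    """Central moment of the given order for a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def flatness(values):
    """Fourth central moment divided by the squared variance."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Two toy samples with identical mean (0) and variance (1).
sample_a = [-1.0, 1.0] * 4
sample_b = [-2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0]

for name, sample in (("a", sample_a), ("b", sample_b)):
    print(name, central_moment(sample, 2), flatness(sample))
```

A second-order description would treat both samples as equivalent, whereas the fourth-order moment separates them.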

Temperature, salinity and mixed layer depth in the Gulf Stream, reconstructed from remote sensing and in situ observations with neural networks.


We introduce OSnet (Ocean Stratification network), a new ocean reconstruction system aimed at providing a physically consistent analysis of the upper ocean stratification. The proposed scheme is a bootstrapped multilayer perceptron trained to simultaneously predict temperature and salinity (T-S) profiles down to 1000 m and the Mixed Layer Depth (MLD) from satellite data covering 1993 to 2019. The prediction is generalized on a 1/4 degree daily grid, producing four-dimensional fields of temperature and salinity, with their associated confidence intervals derived from the bootstrap. While OSnet delivers an accurate interpolation of the ocean’s stratification, it is also a tool to study how the behaviour of the ocean interior is reflected in the surface data. We can compute the relative importance of each input for each T-S prediction and analyse how the network learns which surface feature most influences which property and at which depth. Our results are promising and demonstrate the power of machine learning methods to improve the predictions of ocean interior properties from observations of the ocean surface.

Etienne Parthenet

Manipulation of GAN images using optimal transport


Anthony Frion

Predicting Significant Wave Height From SAR Image Spectra using Attention-based Deep Learning architectures and Data Augmentation


Zhengyang Lang

Oceanix-Melody-IA-OAC Sandbox on Deep Learning, Stochastic Dynamics and Geophysical Extremes


This session (10am-12.30pm) will review and discuss ongoing works within Melody/OceaniX on learning-based approaches for stochastic dynamics and geophysical extremes. Binary classifiers for extremes (P. Naveau), Latent representations for geophysical extremes (N. Lafon), 4DVarNets with trainable norms and stochastic components (R. Fablet/M. Beauchamp).

Melody team

From interpolation to short-term forecasting for sea surface sediment dynamics using 4DVarNN


J.M. Vient

related paper

Oceanix-Melody-IA-OAC Sandbox on Subgrid-scale modeling


This session (10am-12.30pm) will review and discuss ongoing works within Melody/OceaniX on subgrid-scale modeling issues from a learning-based perspective.

Melody team


Parameterizing impacts of submesoscale processes on the ocean circulation


J. Gula


Interannual Climate Prediction of Surface Atmospheric Temperature


F. Sevellec


Learning integration schemes for ODEs: stability constraints and data-driven identification


We address the design of integration schemes as a machine learning problem. Based on automatic differentiation tools embedded in deep learning frameworks, we introduce trainable explicit Runge-Kutta integration schemes. We may consider different learning criteria, especially stability-based and data-driven ones, as well as a joint identification of ODE-based governing laws and associated integration schemes. We demonstrate the relevance of the proposed learning-based approach for non-linear equations and include a quantitative analysis w.r.t. classical state-of-the-art integration techniques, especially where the latter may not apply.

S. Ouala
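
As background for the abstract above, an explicit Runge-Kutta scheme is fully specified by the coefficients of its Butcher tableau, which is what makes it natural to treat those coefficients as parameters. The sketch below is not the speaker's implementation: it only shows a generic explicit Runge-Kutta step for a scalar ODE with fixed coefficients (forward Euler and classical RK4), and the names `rk_step` and `integrate` are hypothetical. In a learning setting, the entries of `A`, `b` and `c` would presumably be the trainable quantities.

```python
import math


def rk_step(f, t, y, h, A, b, c):
    """One explicit Runge-Kutta step for the scalar ODE y' = f(t, y)."""
    k = []
    for i in range(len(b)):
        # Explicit scheme: stage i only uses the stages computed before it.
        y_stage = y + h * sum(A[i][j] * k[j] for j in range(i))
        k.append(f(t + c[i] * h, y_stage))
    return y + h * sum(b_i * k_i for b_i, k_i in zip(b, k))


def integrate(f, y0, t0, t1, n_steps, A, b, c):
    """Integrate from t0 to t1 with n_steps equal steps."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = rk_step(f, t, y, h, A, b, c)
        t += h
    return y


# Butcher tableaux: forward Euler (1 stage) and classical RK4 (4 stages).
EULER = ([[0.0]], [1.0], [0.0])
RK4 = (
    [[0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]],
    [1 / 6, 1 / 3, 1 / 3, 1 / 6],
    [0.0, 0.5, 0.5, 1.0],
)


def decay(t, y):
    """Test problem y' = -y, whose exact solution is exp(-t)."""
    return -y


exact = math.exp(-1.0)
euler_error = abs(integrate(decay, 1.0, 0.0, 1.0, 20, *EULER) - exact)
rk4_error = abs(integrate(decay, 1.0, 0.0, 1.0, 20, *RK4) - exact)
print(euler_error, rk4_error)
```

On this linear test problem the four-stage tableau is far more accurate than forward Euler for the same number of steps, which shows how strongly the choice of coefficients affects the scheme.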

Applying Conditional Generative Adversarial Models on Ocean Observations


Deep learning models usually rely on the representation of the input domain in a so-called latent space that contains a meaningful encoding of the input’s semantics, from which it is easier to infer a categorization or a segmentation than from the original image space. Generative Adversarial Models, or GANs, on the other hand, aim to learn a latent space from which realistic synthetic data can be produced. As such, generative models can be used to obtain a general representation that could be reused for a variety of tasks. They can also be used to generate new ground truths for data augmentation, in which case conditional generation is required to select some aspect of the generated data. In the context of SAR imagery, constraints on the generator have to be integrated to take into account some particularities of the ocean observations, in particular the preponderance of high-frequency patterns.

A. Colin


Oceanix-Melody-IA-OAC Sandbox on Deep Learning and Data Assimilation


Discussion on the ongoing work on DL and Data Assimilation within OceaniX-Melody (10.00am-12.30pm)

OceaniX WP3



Narrowing uncertainties of climate projections using data science tools?


Next AI4Climate seminar (supported by SAMA and SCAI). Climate indices show large variability in CMIP climate predictions. In this presentation, we propose to weight multi-model climate simulations to reduce the uncertainty in climate predictions and better estimate the future evolution of climate indices. The proposed methodology is based on advanced data science tools (i.e., data assimilation, analog forecasting, model evidence metrics) to accurately compute distances between current observations and simulated climate indices. This low-cost procedure is tested on a simplified climate model. The results show that the method can be applied locally and is able to identify relevant parameterizations.

P. Tandeo

History Matching and Machine Learning for the tuning of climate models


A major cause of Earth system model discrepancies is processes that are missing from, or incorrectly represented in, the model’s equations. Despite the increasing number of collected observations, reducing parametric uncertainties is still an enormous challenge. The process of relying on experience and intuition to find good sets of parameters, commonly referred to as parameter tuning, keeps a central role in the roadmaps followed by dozens of modeling groups involved in community efforts such as the Coupled Model Intercomparison Project (CMIP). In the talk I’ll present a tool from the Uncertainty Quantification community that has recently started to draw attention in climate modeling: History Matching, also referred to as “Iterative Refocussing”. The core idea of History Matching is to run several simulations with different sets of parameters and then use observed data to rule out any parameter settings which are “implausible”. Since climate simulation models are computationally heavy and do not allow testing every possible parameter setting, we employ an emulator as a cheap and accurate replacement. Here a machine learning algorithm, namely Gaussian Process Regression, is used for the emulation step. History Matching is thus a good example of how recent advances in machine learning can be of high interest to climate modeling. I will show some results using History Matching on a toy model, the two-layer Lorenz96, and share some findings about the challenges and opportunities of using this technique.

R. Lguensat
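
The ruling-out step described in the abstract above is commonly expressed through an implausibility measure: the distance between the observation and the emulator mean, scaled by the combined uncertainties. The sketch below assumes that common formulation and a cutoff of 3, neither of which is confirmed by the abstract. The function `toy_emulator` is a hypothetical stand-in for a trained Gaussian Process emulator, and all numerical values are made up for illustration.

```python
import math


def implausibility(obs, em_mean, em_var, obs_var, disc_var=0.0):
    """Distance between observation and emulator mean, in standard deviations.

    disc_var is the variance attributed to model discrepancy (assumed 0 here).
    """
    return abs(obs - em_mean) / math.sqrt(em_var + obs_var + disc_var)


def toy_emulator(theta):
    """Hypothetical emulator: returns (predicted mean, predictive variance)."""
    return theta ** 2, 0.01


obs, obs_var = 4.0, 0.04   # made-up observation and its error variance
threshold = 3.0            # assumed cutoff for "implausible" settings

candidates = [i / 10 for i in range(41)]  # parameter values 0.0, 0.1, ..., 4.0
not_ruled_out = []
for theta in candidates:
    mean, var = toy_emulator(theta)
    if implausibility(obs, mean, var, obs_var) <= threshold:
        not_ruled_out.append(theta)

print(not_ruled_out)
```

Parameter values that survive are not declared correct; they are only not yet ruled out, and a further wave would refit the emulator on this reduced region.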


Student projects on Data Assimilation


Several 30-minute presentations from 1.30pm to 5.30pm on data assimilation, covering different topics (e.g., estimation of physical and statistical parameters, tracking particles using a 2D advection model, data assimilation in a simplified AMOC model, estimation of friction parameters in a 1D estuary model). These presentations will be given by students in the framework of a graduate course on data assimilation.

P. Tandeo


Joint RIKEN-IMT Atlantique Seminar on Application of deep-learning methods to environmental data


We investigate the problem of learning disentangled representations. Given a pair of images sharing some attributes, we aim to create a low-dimensional representation which is split into two parts, a shared representation that captures the common information between the images and an exclusive representation that contains the specific information of each image. To address this issue, we propose a model based on mutual information estimation without relying on image reconstruction or image generation. We show that disentangled representations are useful to perform downstream tasks such as image classification and image retrieval based on the shared or exclusive component. Moreover, our model outperforms the state-of-the-art models based on VAE/GAN approaches in representation disentanglement.

E. Sanchez

ECCV paper

Joint RIKEN-IMT Atlantique Seminar on Application of deep-learning methods to environmental data


The second seminar organized in the framework of the 2021 IMT Atlantique-RIKEN workshop on Statistical Modeling and Machine Learning in Meteorology and Oceanography will address the Semantic Segmentation of Metocean Processes and the Spatio-temporal integration of forecast guidance outputs using U-Net.

A. Colin and Dr. H. Hachiya


Deep learning and Trajectory Representation for the Prediction of Seabird Diving Behaviour


Seabirds’ behavior provides invaluable information for the study of marine ecosystems, since their foraging strategies give us a real-time response to the complex dynamics of the ecosystem. In particular, by deploying sufficiently light GPS sensors on seabirds, it is possible to obtain their trajectories and to identify behaviors at sea and foraging areas. This work addresses the inference of seabird diving behaviour from GPS data using Deep Learning methods. We use a database of about 250 foraging trajectories derived from GPS sensors deployed simultaneously with pressure sensors for the identification of dives. Within a supervised setting, we show that the representation of trajectory data (time series vs. distance matrix) greatly affects the ability of deep learning architectures to infer diving behaviour for two tropical seabird species. We also point out the potential impact on the estimation of the dive distribution.

A. Roy
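
The two trajectory representations contrasted in the abstract above can be sketched as follows. This is only an illustration under simplifying assumptions: positions are treated as planar coordinates (real GPS fixes would call for great-circle distances), the three-point track is made up, and `distance_matrix` is a hypothetical helper. The time-series representation is the ordered list of positions itself; the distance-matrix representation stores all pairwise distances between fixes.

```python
import math


def distance_matrix(track):
    """Pairwise Euclidean distances between all positions of a track."""
    return [
        [math.hypot(px - qx, py - qy) for (qx, qy) in track]
        for (px, py) in track
    ]


# Time-series representation: ordered (x, y) positions, made-up values.
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# The same track translated by a constant offset.
shifted = [(x + 10.0, y - 3.0) for (x, y) in track]

D = distance_matrix(track)
D_shifted = distance_matrix(shifted)

for row in D:
    print(row)
```

Unlike the raw time series, the distance matrix does not change when the whole track is translated or rotated, which is one plausible reason why the two representations may behave differently as network inputs.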


SPDE-based deep neural networks for conditional simulations


Tutorial on the use of FEniCS models (finite-element PDE solver) as PyTorch modules

S. Ouala

Git repo

Student projects on Big Data, Cloud Computing and Environmental data


This Friday, several 20-minute presentations from 9.30am to 12.00pm and 1.30pm to 5.00pm on big data processing and analytics using Pangeo, BigQuery and Google Cloud Platform, applied to ocean-atmosphere-climate data and questions (e.g., sea level rise, hurricane analytics, CMIP6 simulation, satellite observation data). Feel free to join at any time. These presentations will be given by students in the framework of a graduate course on Big Data, Cloud Computing and Environmental Science.

P. Tandeo

Join on Webex


Matteo's talk on Emergence of Network Motifs in Deep Neural Networks


The presence of patterns of interconnections between nodes (called network motifs) in real complex networks has been widely studied and documented. Our study aims to search for meaningful patterns in an MLP network before and after training. Simulations show that the final network topology is shaped by learning dynamics, but can be strongly biased by choosing appropriate weight initialization schemes. Overall, our results suggest that non-trivial initialization strategies can make learning more effective by promoting the development of useful network motifs, which are often surprisingly consistent with those observed in general transduction networks.

M. Zambra


Simon's talk on the variational learning of sea surface current reconstructions from AIS data streams.


Space oceanography missions, especially altimeter missions, have considerably improved the observation of sea surface dynamics over the last decades. However, they can hardly resolve spatial scales below ∼100 km. Meanwhile, the AIS (Automatic Identification System) monitoring of maritime traffic implicitly conveys information on the underlying sea surface currents, as the trajectory of ships is affected by the current. Here, we show that an unsupervised variational learning scheme provides new means to elucidate how AIS data streams can be converted into sea surface currents. The proposed scheme relies on a learnable variational framework and relates to variational auto-encoder approaches coupled with a neural ODE (Ordinary Differential Equation) to solve the targeted ill-posed inverse problem. Through numerical experiments on a real AIS dataset, we demonstrate how the proposed scheme could contribute to the reconstruction of sea surface currents from AIS data.

S. Bennaïchouche



Introduction to TorchSDE


Tutorial on PyTorch package for SDEs

N. Dridi

Github repo

Carlos's talk on Multiscale Complexity


Nowadays, the multi-scale character of complex natural systems is an established fact. Turbulence, which is considered the last frontier of classical physics, is used as a benchmark for this type of system. Because of its tremendous importance, several frameworks have historically been developed to describe it. However, the existing approaches are limited in their ability to fully describe multi-scale couplings, and further progress in the field is needed. I therefore propose to develop a statistical framework based on Information Theory to describe multi-scale couplings in physical systems and processes.

C. Granero


Introduction to Data Version Control


Short tutorial on a framework for dataset control and versioning

Q. Febvre


Linking Energy-based GAN to 4DVarNN


R. Fablet


Introduction to pytorch lightning


Proposition of a framework to standardize deep learning research code

Q. Febvre