Data-driven discovery of PDEs from experimental video

Code for this project is available here.

Introduction

In this project, we aim to discover dynamics (partial differential equations) from in situ videos of scanning transmission electron microscopy (STEM). Mathematically, we want to find the best symbolic equation

$$\frac{\partial u}{\partial t} = F\left(u,\, u_x,\, u_y,\, u_{xx},\, u_{xy},\, u_{yy}\right)$$

that can describe the video. Here $u$ is the intensity, a function of the x-coordinate, the y-coordinate, and the t-coordinate (time). The LHS is the temporal derivative, and the RHS is an unknown symbolic expression as a function of $u$; the first-order derivatives $u_x$ and $u_y$ of $u$ with respect to $x$ and $y$; and the three second-order derivatives $u_{xx}$, $u_{xy}$, and $u_{yy}$.

I divide this project into two major steps: the first is to evaluate numerical derivatives, and the second is to find the best symbolic equation. The challenge in the first step comes from the noise and sparsity of experimental data, and the challenge in the second step is finding the global minimum in the symbolic-equation space. To resolve the first challenge, I propose a scheme called deep learning total variation regularization (DLTVR). To resolve the second challenge, I propose spin sequential Monte Carlo, which samples the symbolic-equation space according to the Bayesian posterior probability distribution.

Deep learning total variation regularization

We first need to evaluate numerical derivatives, which is a challenging task for experimental data, as they are noisy and sparse. Conventional approaches, such as the finite difference method, are of little help in this scenario. Interestingly, deep learning offers an elegant way to resolve this challenge. We may parametrize a neural network for the in situ video, which should be a smooth function of $x$, $y$, and $t$. To guarantee smoothness, I apply total variation regularization to the neural network, which means adding a regularization term to the loss function,

$$\mathcal{L} = \lambda \int \lvert \nabla g \rvert \, dx\, dy\, dt \;+\; \frac{1}{N} \sum_{i=1}^{N} \left( g(x_i, y_i, t_i) - u_i \right)^2,$$

where the first term is the total-variation regularization term (with weight $\lambda$), the second term is the mean squared loss, and $g$ is the neural network.
Once we finish training the neural network, we can use automatic differentiation to obtain numerical derivatives of any order.
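To make the scheme concrete, here is a minimal PyTorch sketch of the DLTVR idea, assuming a small fully connected network; the architecture, variable names, and hyperparameters are illustrative assumptions, not the exact code used in this project.

```python
import torch
import torch.nn as nn

# Minimal MLP surrogate g(x, y, t) -> intensity; the architecture is illustrative.
class Surrogate(nn.Module):
    def __init__(self, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )

    def forward(self, xyt):
        return self.net(xyt)

def dltvr_loss(model, xyt, u, lam=1e-3):
    """Total-variation penalty on the network output plus the mean squared loss."""
    xyt = xyt.clone().requires_grad_(True)
    pred = model(xyt)
    mse = ((pred.squeeze(-1) - u) ** 2).mean()
    # Gradient of g w.r.t. (x, y, t) via autograd; the TV term is its mean L2 norm.
    grads = torch.autograd.grad(pred.sum(), xyt, create_graph=True)[0]
    tv = grads.norm(dim=-1).mean()
    return lam * tv + mse

# Training-loop sketch: xyt holds (x, y, t) pixel coordinates, u the intensities.
model = Surrogate()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
xyt = torch.rand(4096, 3)  # placeholder coordinates
u = torch.rand(4096)       # placeholder intensities
for step in range(1000):
    opt.zero_grad()
    loss = dltvr_loss(model, xyt, u)
    loss.backward()
    opt.step()
```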

Next, I use a simple example to demonstrate this scheme. The video below is the soft-segmentation result of a real in situ STEM video:

The signals at the moving interface are very noisy, which prevents us from using conventional methods to evaluate the numerical derivatives. Let's use DLTVR instead.

DLTVR first smooths the video shown above and returns the smoothed video below:

We can then employ the automatic differentiation implemented in TensorFlow or PyTorch to calculate the derivatives, which is a piece of cake.
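For instance, given the trained surrogate from the sketch above, all the derivatives entering the PDE can be read off with a few autograd calls. Again a sketch; the function and variable names are assumptions.

```python
import torch

def derivatives(model, xyt):
    """Evaluate u and its first/second derivatives at the points xyt of shape (N, 3)."""
    xyt = xyt.clone().requires_grad_(True)
    u = model(xyt).squeeze(-1)
    # First derivatives (u_x, u_y, u_t) from a single autograd call.
    g = torch.autograd.grad(u.sum(), xyt, create_graph=True)[0]
    u_x, u_y, u_t = g[:, 0], g[:, 1], g[:, 2]
    # Second derivatives: differentiate u_x and u_y once more.
    gx = torch.autograd.grad(u_x.sum(), xyt, create_graph=True)[0]
    gy = torch.autograd.grad(u_y.sum(), xyt, create_graph=True)[0]
    u_xx, u_xy, u_yy = gx[:, 0], gx[:, 1], gy[:, 1]
    return u, u_t, u_x, u_y, u_xx, u_xy, u_yy
```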

Spin sequential Monte Carlo

In the second step, I propose spin sequential Monte Carlo to find the best partial differential equation (PDE) to describe the video. I call the algorithm spin sequential Monte Carlo because it combines sequential Monte Carlo with spin-flip Markov chain Monte Carlo. The inspiration comes from the paper by Rudy et al. and from my PhD work. Rudy et al. proposed that the RHS of the partial differential equation can be expressed as a linear combination of non-linear terms,

$$\frac{\partial u}{\partial t} = \sum_j c_j\, \theta_j,$$

where the $\theta_j$ are the non-linear terms in the non-linear library and the $c_j$ are their coefficients.

Now the key problem becomes which terms to select to construct the PDE. Rudy et al. proposed thresholded ridge regression, which may work well when the non-linear library is small but becomes unsuitable for a sizeable one. My PhD work on spin systems inspired me to use spin Monte Carlo sampling instead. To be more specific, we may map the linear combination of non-linear terms onto an Ising spin chain: spin up means that the corresponding term is selected, whereas spin down means that it is not. We can then use spin-flip Markov chain Monte Carlo to sample the PDE space, as sketched below. To improve sampling efficiency, I combine the spin-flip Monte Carlo with sequential Monte Carlo.
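To make the mapping concrete, here is a minimal NumPy sketch of the spin encoding: a candidate PDE is a binary vector over the library, and for a given spin configuration the coefficients follow from ordinary least squares. The example library contents and the helper name `fit_pde` are illustrative assumptions.

```python
import numpy as np

# Columns of Theta are the non-linear library terms evaluated at all space-time
# points; u_t holds the temporal derivative at the same points.
# Example library: [u, u_x, u_y, u_xx, u_xy, u_yy, u*u_x, u*u_y]
def fit_pde(Theta, u_t, spins):
    """Least-squares fit of u_t using only the library terms with spin up (=1)."""
    active = np.flatnonzero(spins)
    if active.size == 0:
        return np.zeros(0), u_t  # empty PDE: the residual is u_t itself
    coeffs, *_ = np.linalg.lstsq(Theta[:, active], u_t, rcond=None)
    residual = u_t - Theta[:, active] @ coeffs
    return coeffs, residual

# A spin configuration selecting only u_x and u_y:
spins = np.array([0, 1, 1, 0, 0, 0, 0, 0])
```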

The probability distribution for a PDE is given by the Bayesian posterior,

$$P(\mathrm{PDE} \mid \mathcal{D}) \propto P(\mathcal{D} \mid \mathrm{PDE})\, P(\mathrm{PDE}),$$

which is the product of the likelihood probability and the prior probability. I use the Gaussian error function to define the likelihood probability,

$$P(\mathcal{D} \mid \mathrm{PDE}) \propto \exp\left( -\frac{1}{2\sigma^2} \sum_i \Big( u_t^{(i)} - \sum_j c_j\, \theta_j^{(i)} \Big)^2 \right),$$

and define the prior probability as a function of the PDE complexity $k$ (the number of selected terms),

$$P(\mathrm{PDE}) \propto e^{-\eta k},$$

as we don't want the PDE to be overly complex.
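Putting the pieces together, one spin-flip Metropolis step under this posterior could look like the following sketch, which reuses the hypothetical `fit_pde` helper from above; the noise scale `sigma` and complexity penalty `eta` are assumed hyperparameters.

```python
import numpy as np

def log_posterior(Theta, u_t, spins, sigma=1.0, eta=1.0):
    """Log of the (unnormalized) Bayesian posterior for one spin configuration."""
    _, residual = fit_pde(Theta, u_t, spins)
    log_likelihood = -np.sum(residual ** 2) / (2.0 * sigma ** 2)
    log_prior = -eta * np.sum(spins)  # penalize PDE complexity
    return log_likelihood + log_prior

def spin_flip_step(Theta, u_t, spins, rng):
    """Propose flipping one spin and accept or reject with the Metropolis rule."""
    proposal = spins.copy()
    i = rng.integers(len(spins))
    proposal[i] ^= 1  # flip one spin: select or deselect one library term
    log_acc = log_posterior(Theta, u_t, proposal) - log_posterior(Theta, u_t, spins)
    if np.log(rng.random()) < log_acc:
        return proposal
    return spins
```

In the sequential Monte Carlo combination, many such chains would run as particles that are reweighted and resampled; the step above is only the spin-flip ingredient.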

We plot the final result for the video shown above as a Pareto frontier of PDE complexity versus mean absolute error (MAE):


The best PDE is a first-order hyperbolic PDE, which is not a surprise: the video shows an interface moving across the frame, and first-order hyperbolic equations describe exactly this kind of advective motion.