### #prob-num

### Bayesian numerical methods for nonlinear PDEs

Junyang Wang, Jon Cockayne, Oksana Chkrebtii, Chris Oates, and I have just uploaded a preprint of our recent work “Bayesian numerical methods for nonlinear partial differential equations” to the arXiv. This paper continues our study of (approximate) Bayesian probabilistic numerical methods (Cockayne et al., 2019), in this case for the challenging setting of nonlinear PDEs, with the goal of realising a posterior distribution over the solution of the PDE that carries a meaningful expression of uncertainty about the solution's true value given the discretisation error that has been incurred.

**Abstract.**
The numerical solution of differential equations can be formulated as an inference problem to which formal statistical approaches can be applied.
However, nonlinear partial differential equations (PDEs) pose substantial challenges from an inferential perspective, most notably the absence of explicit conditioning formula.
This paper extends earlier work on linear PDEs to a general class of initial value problems specified by nonlinear PDEs, motivated by problems for which evaluations of the right-hand-side, initial conditions, or boundary conditions of the PDE have a high computational cost.
The proposed method can be viewed as exact Bayesian inference under an approximate likelihood, which is based on discretisation of the nonlinear differential operator.
Proof-of-concept experimental results demonstrate that meaningful probabilistic uncertainty quantification for the unknown solution of the PDE can be performed, while controlling the number of times the right-hand-side, initial and boundary conditions are evaluated.
A suitable prior model for the solution of the PDE is identified using novel theoretical analysis of the sample path properties of Matérn processes, which may be of independent interest.

Published on Tuesday 27 April 2021 at 10:00 UTC #preprint #prob-num #wang #cockayne #chkrebtii #oates

### Computing with dense kernel matrices at near-linear cost in MMS

The paper “Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity” by Florian Schäfer, Houman Owhadi, and myself has just appeared in print in *Multiscale Modeling and Simulation*.
This paper shows how a surprisingly simple algorithm — the zero fill-in incomplete Cholesky factorisation — with respect to a cleverly-chosen sparsity pattern allows for near-linear complexity compression, inversion, and approximate PCA of square matrices of the form

\( \Theta = \begin{bmatrix} G(x_{1}, x_{1}) & \cdots & G(x_{1}, x_{N}) \\ \vdots & \ddots & \vdots \\ G(x_{N}, x_{1}) & \cdots & G(x_{N}, x_{N}) \end{bmatrix} \in \mathbb{R}^{N \times N} , \)

where \(\{ x_{1}, \dots, x_{N} \} \subset \mathbb{R}^{d}\) is a data set and \(G \colon \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}\) is a covariance kernel function. Such matrices play a key role in, for example, Gaussian process regression and RKHS-based machine learning techniques.

F. Schäfer, T. J. Sullivan, and H. Owhadi. “Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity.” *Multiscale Modeling & Simulation: A SIAM Interdisciplinary Journal* 19(2):688–730, 2021.

**Abstract.**
Dense kernel matrices \(\Theta \in \mathbb{R}^{N \times N}\) obtained from point evaluations of a covariance function \(G\) at locations \(\{ x_{i} \}_{1 \leq i \leq N}\) arise in statistics, machine learning, and numerical analysis.
For covariance functions that are Green's functions of elliptic boundary value problems and homogeneously-distributed sampling points, we show how to identify a subset \(S \subset \{ 1 , \dots , N \}^2\), with \(\# S = O ( N \log (N) \log^{d} ( N /\varepsilon ) )\), such that the zero fill-in incomplete Cholesky factorisation of the sparse matrix \(\Theta_{ij} 1_{( i, j ) \in S}\) is an \(\varepsilon\)-approximation of \(\Theta\).
This factorisation can provably be obtained in complexity \(O ( N \log( N ) \log^{d}( N /\varepsilon) )\) in space and \(O ( N \log^{2}( N ) \log^{2d}( N /\varepsilon) )\) in time; we further present numerical evidence that \(d\) can be taken to be the intrinsic dimension of the data set rather than that of the ambient space.
The algorithm only needs to know the spatial configuration of the \(x_{i}\) and does not require an analytic representation of \(G\).
Furthermore, this factorization straightforwardly provides an approximate sparse PCA with optimal rate of convergence in the operator norm.
Hence, by using only subsampling and the incomplete Cholesky factorization, we obtain, at nearly linear complexity, the compression, inversion and approximate PCA of a large class of covariance matrices.
By inverting the order of the Cholesky factorization we also obtain a solver for elliptic PDE with complexity \(O ( N \log^{d}( N /\varepsilon) )\) in space and \(O ( N \log^{2d}( N /\varepsilon) )\) in time.

Published on Thursday 15 April 2021 at 12:00 UTC #publication #prob-num #schaefer #owhadi

### Randomised integration for deterministic operator differential equations

Han Cheng Lie, Martin Stahn, and I have just uploaded a preprint of our recent work “Randomised one-step time integration methods for deterministic operator differential equations” to the arXiv. In this paper, we extend the analysis of Conrad et al. (2016) and Lie et al. (2019) to the case of evolutionary systems in Banach spaces or even Gel′fand triples, this being the right setting for many evolutionary partial differential equations.

**Abstract.**
Uncertainty quantification plays an important role in applications that involve simulating ensembles of trajectories of dynamical systems.
Conrad et al. (*Stat. Comput.*, 2017) proposed randomisation of deterministic time integration methods as a strategy for quantifying uncertainty due to time discretisation.
We consider this strategy for systems that are described by deterministic, possibly non-autonomous operator differential equations defined on a Banach space or a Gel′fand triple.
We prove pathwise and expected error bounds on the random trajectories, given an assumption on the local truncation error of the underlying deterministic time integration and an assumption that the absolute moments of the random variables decay with the time step. Our analysis shows that the error analysis for differential equations in finite-dimensional Euclidean space carries over to infinite-dimensional settings.

Published on Wednesday 31 March 2021 at 09:00 UTC #preprint #prob-num #lie #stahn

### Convergence rates of Gaussian ODE filters in Statistics and Computing

The paper “Convergence rates of Gaussian ODE filters” by Hans Kersting, Philipp Hennig, and myself has just appeared in the journal *Statistics and Computing*.
In this work, we examine the strong convergence rates of probabilistic solvers for ODEs of the form \(\dot{x}(t) = f(x(t))\) that are based upon Gaussian filtering.
In some sense, this work combines the numerical analysis perspective of Conrad et al. (2016) and Lie et al. (2019) with the filtering perspective on probabilistic numerical methods for ODEs of Schober et al. (2014).

H. Kersting, T. J. Sullivan, and P. Hennig. “Convergence rates of Gaussian ODE filters.” *Statistics and Computing* 30(6):1791–1816, 2020.

**Abstract.**
A recently introduced class of probabilistic (uncertainty-aware) solvers for ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to initial value problems.
These methods model the true solution \(x\) and its first \(q\) derivatives a priori as a Gaussâ€“Markov process \(X\), which is then iteratively conditioned on information about \(\dot{x}\).
This article establishes worst-case local convergence rates of order \(q + 1\) for a wide range of versions of this Gaussian ODE filter, as well as global convergence rates of order \(q\) in the case of \(q = 1\) and an integrated Brownian motion prior, and analyses how inaccurate information on \(\dot{x}\) coming from approximate evaluations of \(f\) affects these rates.
Moreover, we show that, in the globally convergent case, the posterior credible intervals are well calibrated in the sense that they globally contract at the same rate as the truncation error.
We illustrate these theoretical results by numerical experiments which might indicate their generalizability to \(q \in \{ 2, 3 , \dots \}\).

Published on Tuesday 15 September 2020 at 09:00 UTC #publication #stco #prob-num #kersting #hennig

### Optimality of probabilistic numerical methods

The paper “Optimality criteria for probabilistic numerical methods” by Chris Oates, Jon Cockayne, Dennis Prangle, Mark Girolami, and myself has just appeared in print:

C. J. Oates, J. Cockayne, D. Prangle, T. J. Sullivan, and M. Girolami. “Optimality criteria for probabilistic numerical methods” in *Multivariate Algorithms and Information-Based Complexity*, ed. F. J. Hickernell and P. Kritzer. *Radon Series on Computational and Applied Mathematics* 27:65–88, 2020.

**Abstract.**
It is well understood that Bayesian decision theory and average case analysis are essentially identical.
However, if one is interested in performing *uncertainty quantification* for a numerical task, it can be argued that standard approaches from the decision-theoretic framework are neither appropriate nor sufficient.
Instead, we consider a particular optimality criterion from Bayesian experimental design and study its implied optimal information in the numerical context.
This information is demonstrated to differ, in general, from the information that would be used in an average-case-optimal numerical method.
The explicit connection to Bayesian experimental design suggests several distinct regimes, in which optimal probabilistic numerical methods can be developed.

Published on Sunday 31 May 2020 at 08:00 UTC #publication #prob-num #oates #cockayne #prangle #girolami