### Preprint: Computing with dense kernel matrices at near-linear cost

Florian Schäfer, Houman Owhadi, and I have just uploaded a revised and improved version of our preprint “Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity” to the arXiv. This paper shows how a surprisingly simple algorithm, the zero fill-in incomplete Cholesky factorisation, applied with respect to a cleverly chosen sparsity pattern, allows for near-linear complexity compression, inversion, and approximate PCA of square matrices of the form

$$\Theta = \begin{bmatrix} G(x_{1}, x_{1}) & \cdots & G(x_{1}, x_{N}) \\ \vdots & \ddots & \vdots \\ G(x_{N}, x_{1}) & \cdots & G(x_{N}, x_{N}) \end{bmatrix} \in \mathbb{R}^{N \times N} ,$$

where $$\{ x_{1}, \dots, x_{N} \} \subset \mathbb{R}^{d}$$ is a data set and $$G \colon \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}$$ is a covariance kernel function. Such matrices play a key role in, for example, Gaussian process regression and RKHS-based machine learning techniques.

Abstract. Dense kernel matrices $$\Theta \in \mathbb{R}^{N \times N}$$ obtained from point evaluations of a covariance function $$G$$ at locations $$\{ x_{i} \}_{1 \leq i \leq N}$$ arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green's functions of elliptic boundary value problems and homogeneously-distributed sampling points, we show how to identify a subset $$S \subset \{ 1 , \dots , N \}^2$$, with $$\# S = O ( N \log (N) \log^{d} ( N /\varepsilon ) )$$, such that the zero fill-in incomplete Cholesky factorisation of the sparse matrix $$\Theta_{ij} 1_{( i, j ) \in S}$$ is an $$\varepsilon$$-approximation of $$\Theta$$. This factorisation can provably be obtained in complexity $$O ( N \log( N ) \log^{d}( N /\varepsilon) )$$ in space and $$O ( N \log^{2}( N ) \log^{2d}( N /\varepsilon) )$$ in time; we further present numerical evidence that $$d$$ can be taken to be the intrinsic dimension of the data set rather than that of the ambient space. The algorithm only needs to know the spatial configuration of the $$x_{i}$$ and does not require an analytic representation of $$G$$. Furthermore, this factorization straightforwardly provides an approximate sparse PCA with optimal rate of convergence in the operator norm. Hence, by using only subsampling and the incomplete Cholesky factorization, we obtain, at nearly linear complexity, the compression, inversion and approximate PCA of a large class of covariance matrices. By inverting the order of the Cholesky factorization we also obtain a solver for elliptic PDE with complexity $$O ( N \log^{d}( N /\varepsilon) )$$ in space and $$O ( N \log^{2d}( N /\varepsilon) )$$ in time.
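To make the ingredients of the abstract concrete, here is a minimal sketch in Python with NumPy. It is a simplified illustration under stated assumptions, not the implementation from the paper: the exponential kernel, the function names, the greedy coarse-to-fine (maximin) ordering, and the rule "keep $$(i, j)$$ when $$| x_{i} - x_{j} | \leq \rho \min(\ell_{i}, \ell_{j})$$" are my choices for illustration, and the dense storage used here means that this sketch does not attain the near-linear complexity.

```python
import numpy as np

def exp_kernel(x, y, length_scale=0.3):
    """Illustrative covariance kernel G(x, y) = exp(-|x - y| / length_scale)."""
    return np.exp(-np.linalg.norm(x - y, axis=-1) / length_scale)

def kernel_matrix(points, kernel=exp_kernel):
    """Dense N x N matrix with entries Theta[i, j] = G(x_i, x_j)."""
    return kernel(points[:, None, :], points[None, :, :])

def maximin_ordering(points):
    """Greedy coarse-to-fine ordering: always pick the point farthest from
    those already chosen. Returns the order and, for each chosen point,
    its distance to the previously chosen points (its length scale)."""
    n = len(points)
    order = np.empty(n, dtype=int)
    scales = np.empty(n)
    dist = np.full(n, np.inf)  # distance to the set of chosen points
    nxt = 0
    for k in range(n):
        order[k] = nxt
        scales[k] = dist[nxt]
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
        nxt = int(np.argmax(dist))
    return order, scales

def sparsity_pattern(points, scales, rho):
    """Boolean mask of S (assumed rule): |x_i - x_j| <= rho * min(l_i, l_j).
    `points` and `scales` must already be in the maximin order."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return dist <= rho * np.minimum(scales[:, None], scales[None, :])

def incomplete_cholesky(theta, mask):
    """Zero fill-in incomplete Cholesky: the usual right-looking Cholesky
    factorisation, except that entries outside `mask` are never formed."""
    n = theta.shape[0]
    L = np.tril(np.where(mask, theta, 0.0))
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        # rows below the diagonal that belong to the pattern of column k
        rows = np.nonzero(mask[k + 1:, k])[0] + k + 1
        L[rows, k] /= L[k, k]
        for j in rows:
            upd = rows[rows >= j]      # entries (i, k) in the pattern, i >= j
            upd = upd[mask[upd, j]]    # ... whose target (i, j) is also kept
            L[upd, j] -= L[upd, k] * L[j, k]
    return L
```

A typical use would be to reorder the points with `maximin_ordering`, assemble `kernel_matrix` on the reordered points, and compare `L @ L.T` with $$\Theta$$ for several values of `rho`. With `rho = np.inf` the pattern is full and the routine reduces to the exact Cholesky factorisation; for small `rho` this naive version may encounter a non-positive pivot, which a careful implementation would have to guard against.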

### Implicit Probabilistic Integrators in NeurIPS

The paper “Implicit probabilistic integrators for ODEs” by Onur Teymur, Han Cheng Lie, Ben Calderhead and myself has now appeared in Advances in Neural Information Processing Systems 31 (NeurIPS 2018). This paper forms part of an expanding body of work that provides mathematical convergence analysis of probabilistic solvers for initial value problems, in this case implicit methods such as (probabilistic versions of) the multistep Adams–Moulton method.

O. Teymur, H. C. Lie, T. J. Sullivan, and B. Calderhead. “Implicit probabilistic integrators for ODEs” in Advances in Neural Information Processing Systems 31 (NeurIPS 2018), ed. S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi & R. Garnett. 2018.

Abstract. We introduce a family of implicit probabilistic integrators for initial value problems (IVPs), taking as a starting point the multistep Adams–Moulton method. The implicit construction allows for dynamic feedback from the forthcoming time-step, in contrast to previous probabilistic integrators, all of which are based on explicit methods. We begin with a concise survey of the rapidly-expanding field of probabilistic ODE solvers. We then introduce our method, which builds on and adapts the work of Conrad et al. (2016) and Teymur et al. (2016), and provide a rigorous proof of its well-definedness and convergence. We discuss the problem of the calibration of such integrators and suggest one approach. We give an illustrative example highlighting the effect of the use of probabilistic integrators — including our new method — in the setting of parameter inference within an inverse problem.
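For readers unfamiliar with implicit methods, the following is a minimal sketch of the simplest non-trivial Adams–Moulton scheme, the one-step implicit trapezoidal rule, with an optional additive Gaussian perturbation after each step. To be clear, this is not the construction of the paper: the additive-noise randomisation is only in the spirit of the explicit perturbed integrators of Conrad et al. (2016), and the function names, the fixed-point solver, and the `noise_scale` parameter are hypothetical choices for illustration.

```python
import numpy as np

def trapezoidal_step(f, t, y, h, n_iter=50):
    """One step of the implicit trapezoidal rule (one-step Adams-Moulton),
    y_next = y + h/2 * (f(t, y) + f(t + h, y_next)),
    solved by fixed-point iteration (adequate for small h and non-stiff f)."""
    fy = f(t, y)
    y_next = y + h * fy  # explicit Euler predictor as the initial guess
    for _ in range(n_iter):
        y_next = y + 0.5 * h * (fy + f(t + h, y_next))
    return y_next

def randomised_trajectory(f, y0, t0, t1, n_steps, noise_scale=0.0, rng=None):
    """Integrate y' = f(t, y) from t0 to t1, optionally perturbing each step.

    Assumption for illustration: the perturbation has standard deviation
    noise_scale * h**2.5, i.e. variance of order h^(2p+1) for this
    order p = 2 method. With noise_scale = 0 the scheme is deterministic."""
    rng = np.random.default_rng() if rng is None else rng
    h = (t1 - t0) / n_steps
    ys = [np.asarray(y0, dtype=float)]
    for n in range(n_steps):
        y = trapezoidal_step(f, t0 + n * h, ys[-1], h)
        y = y + noise_scale * h**2.5 * rng.standard_normal(y.shape)
        ys.append(y)
    return np.array(ys)
```

Calling `randomised_trajectory` repeatedly with `noise_scale > 0` and different random generators yields an ensemble of trajectories whose spread gives a rough picture of discretisation uncertainty; how such noise should be calibrated is exactly the kind of question the paper discusses.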

### Random Bayesian inverse problems in JUQ

The article “Random forward models and log-likelihoods in Bayesian inverse problems” by Han Cheng Lie, Aretha Teckentrup, and myself has now appeared in its final form in the SIAM/ASA Journal on Uncertainty Quantification, volume 6, issue 4. This paper considers the effect of approximating the likelihood in a Bayesian inverse problem by a random surrogate, as frequently happens in applications, with the aim of showing that the perturbed posterior distribution is close to the exact one in a suitable sense. This article considers general randomisation models, and thereby expands upon the previous investigations of Stuart and Teckentrup (2017) in the Gaussian setting.

H. C. Lie, T. J. Sullivan, and A. L. Teckentrup. “Random forward models and log-likelihoods in Bayesian inverse problems.” SIAM/ASA Journal on Uncertainty Quantification 6(4):1600–1629, 2018. doi:10.1137/18M1166523

Abstract. We consider the use of randomised forward models and log-likelihoods within the Bayesian approach to inverse problems. Such random approximations to the exact forward model or log-likelihood arise naturally when a computationally expensive model is approximated using a cheaper stochastic surrogate, as in Gaussian process emulation (kriging), or in the field of probabilistic numerical methods. We show that the Hellinger distance between the exact and approximate Bayesian posteriors is bounded by moments of the difference between the true and approximate log-likelihoods. Example applications of these stability results are given for randomised misfit models in large data applications and the probabilistic solution of ordinary differential equations.
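For orientation, here is the standard setting in notation that I introduce only for this post and that may differ in detail from the article. Given a prior $$\mu_{0}$$ and a negative log-likelihood (potential) $$\Phi$$, the exact posterior $$\mu$$ and the posterior $$\mu_{N}$$ obtained from an approximate potential $$\Phi_{N}$$ are

$$\frac{\mathrm{d} \mu}{\mathrm{d} \mu_{0}} (u) = \frac{\exp ( - \Phi (u) )}{Z} , \qquad \frac{\mathrm{d} \mu_{N}}{\mathrm{d} \mu_{0}} (u) = \frac{\exp ( - \Phi_{N} (u) )}{Z_{N}} ,$$

where $$Z$$ and $$Z_{N}$$ are normalising constants, and the Hellinger distance between them is

$$d_{\mathrm{H}} ( \mu, \mu_{N} )^{2} = \frac{1}{2} \int \left( \sqrt{ \frac{\mathrm{d} \mu}{\mathrm{d} \mu_{0}} } - \sqrt{ \frac{\mathrm{d} \mu_{N}}{\mathrm{d} \mu_{0}} } \right)^{2} \, \mathrm{d} \mu_{0} .$$

When $$\Phi_{N}$$ is random, $$\mu_{N}$$ is a random measure, and it is this distance (or its expected value) that the stability results control in terms of moments of $$\Phi - \Phi_{N}$$.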

### Weak and strong modes in Inverse Problems

The paper “Equivalence of weak and strong modes of measures on topological vector spaces” by Han Cheng Lie and myself has now appeared in Inverse Problems. This paper addresses a natural question in the theory of modes (or maximum a posteriori estimators, in the case of the posterior measure for a Bayesian inverse problem) in an infinite-dimensional space $$X$$. Such modes can be defined either strongly (à la Dashti et al. (2013), via a global maximisation) or weakly (à la Helin & Burger (2015), via a dense subspace $$E \subset X$$). The question is, when are strong and weak modes equivalent? The answer turns out to be rather subtle: under reasonable uniformity conditions, the two kinds of modes are indeed equivalent, but finite-dimensional counterexamples exist when the uniformity conditions fail.

H. C. Lie and T. J. Sullivan. “Equivalence of weak and strong modes of measures on topological vector spaces.” Inverse Problems 34(11):115013, 2018. doi:10.1088/1361-6420/aadef2

(See also H. C. Lie and T. J. Sullivan. “Erratum: Equivalence of weak and strong modes of measures on topological vector spaces (2018 Inverse Problems 34 115013).” Inverse Problems 34(12):129601, 2018. doi:10.1088/1361-6420/aae55b )

Abstract. A strong mode of a probability measure on a normed space $$X$$ can be defined as a point $$u \in X$$ such that the mass of the ball centred at $$u$$ uniformly dominates the mass of all other balls in the small-radius limit. Helin and Burger weakened this definition by considering only pairwise comparisons with balls whose centres differ by vectors in a dense, proper linear subspace $$E$$ of $$X$$, and posed the question of when these two types of modes coincide. We show that, in a more general setting of metrisable vector spaces equipped with non-atomic measures that are finite on bounded sets, the density of $$E$$ and a uniformity condition suffice for the equivalence of these two types of modes. We accomplish this by introducing a new, intermediate type of mode. We also show that these modes can be inequivalent if the uniformity condition fails. Our results shed light on the relationships among various notions of maximum a posteriori estimator in non-parametric Bayesian inference.
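In symbols, and glossing over technical details such as whether limits or limits superior are used, the two definitions read roughly as follows, where $$B(u, r)$$ denotes the open ball of radius $$r$$ centred at $$u$$. A point $$u \in X$$ is a strong mode of $$\mu$$ if

$$\lim_{r \to 0} \frac{\mu ( B(u, r) )}{\sup_{v \in X} \mu ( B(v, r) )} = 1 ,$$

and it is a weak mode with respect to the subspace $$E$$ if

$$\lim_{r \to 0} \frac{\mu ( B(u - v, r) )}{\mu ( B(u, r) )} \leq 1 \quad \text{for all } v \in E .$$

The strong definition compares the ball at $$u$$ with the best ball anywhere in $$X$$, uniformly, whereas the weak definition only makes pairwise comparisons along shifts in $$E$$; this is why a uniformity condition is needed to pass from the weak notion to the strong one.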

### Preprint: A Shape Trajectories Approach to Longitudinal Statistical Analysis

Esfandiar Nava-Yazdani, Christoph von Tycowicz, Hans-Christian Hege, and I have just published a preprint of our work “A Shape Trajectories Approach to Longitudinal Statistical Analysis”.

For Kendall's shape space we determine analytically Jacobi fields and parallel transport, and compute geodesic regression. Using the derived expressions, we can fully leverage the geometry via Riemannian optimization and reduce the computational expense by several orders of magnitude. The methodology is demonstrated by performing a longitudinal statistical analysis of epidemiological shape data.
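As a small illustration of what Kendall's shape space is, the following sketch computes the geodesic distance between the shapes of two landmark configurations: each configuration is centred and scaled to unit size (its pre-shape), and the distance is taken after optimal rotational alignment. This only illustrates the space itself and not the Jacobi fields, parallel transport, or geodesic regression derived in the preprint; the function names are hypothetical, and the alignment uses the standard singular value decomposition solution of the orthogonal Procrustes problem restricted to rotations.

```python
import numpy as np

def preshape(landmarks):
    """Map a (k landmarks) x (m dimensions) configuration to its pre-shape:
    remove translation by centring and scale to unit Frobenius norm."""
    centred = landmarks - landmarks.mean(axis=0)
    return centred / np.linalg.norm(centred)

def kendall_distance(x, y):
    """Geodesic distance between the shapes of x and y in Kendall's shape
    space, i.e. arccos of the maximal inner product of the pre-shapes over
    all rotations (reflections are excluded)."""
    u, s, vt = np.linalg.svd(preshape(x).T @ preshape(y))
    # flip the smallest singular value if the optimal orthogonal
    # alignment would otherwise be a reflection
    s[-1] *= np.sign(np.linalg.det(u @ vt))
    return np.arccos(np.clip(s.sum(), -1.0, 1.0))
```

By construction, two configurations that differ only by translation, rotation, and scaling are at distance zero (up to rounding), and the distance never exceeds $$\pi / 2$$.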

As an application example we have chosen 3D shapes of knee bones, reconstructed from image data of the Osteoarthritis Initiative. Comparing subject groups with incident and developing osteoarthritis versus normal controls, we find clear differences in the temporal development of femur shapes. This paves the way for early prediction of incident knee osteoarthritis, using geometry data only.

