# Tim Sullivan

## Outlines of Some Research Topics

My general area of research is uncertainty quantification, at the intersection of applied mathematics, probability, statistics, and computational science. The brief descriptions below outline some of my research interests, past work, and ideas for the future. If you are interested in any of these areas, then please get in touch!

A common theme underlying many of my current research interests is to add ‘the right amount of uncertainty’ to an otherwise certain or near-certain prediction, in order to correctly compensate for over-confidence in the modelling process. In the words of Julian Baggini:

The mark of a mature, psychologically healthy mind is indeed the ability to live with uncertainty and ambiguity, but only as much as there really is.

### Probabilistic Numerical Methods

There is a long tradition of giving probabilistic solutions to deterministic problems such as quadrature, linear algebra, optimisation, etc.; early contributions in this direction date back to Poincaré at the start of the 20th Century, continued by the work of Larkin in the 1970s, and seminal papers of Diaconis (1988) and O'Hagan (1992). For example, a Bayesian approach to computing the integral of some function $$f \colon [a, b] \to \mathbb{R}$$ would be to specify a prior probability measure on a suitable space of integrands, treat the evaluations $$y_{i} = f(x_{i})$$ of $$f$$ at $$n$$ quadrature nodes $$x_{i} \in [a, b]$$ as observed data, and then form the posterior distribution

$$\mathbb{P} \left( \int_{a}^{b} f(x) \, \mathrm{d} x \,\middle|\, f(x_{i}) = y_{i} \text{ for } i = 1, \dots, n \right) .$$

The expected value of this distribution gives an estimate of the value of the integral, and the variance can be used as a measure of accuracy. Indeed, a now-classical result of Sul′din (1959) shows that, under the Brownian motion prior on $$f$$, the posterior mean / MAP estimator for $$\int_{a}^{b} f(x) \, \mathrm{d} x$$ is the elementary trapezoidal rule for quadrature — there is an underlying posterior mean estimator for $$f$$, namely the piecewise-linear interpolant of the data $$f(x_{i}) = y_{i}$$, and the integral of this interpolant is the posterior mean estimator of $$\int_{a}^{b} f(x) \, \mathrm{d} x$$. The moral is that many deterministic numerical methods can be seen as point estimators for statistical procedures; conversely, statistical procedures offer a richer quantification of uncertainty in numerical tasks, and this UQ can be fed forward into other parts of scientific pipelines (e.g. inverse problems).

This probabilistic perpective on numerical analysis has received renewed attention in the last five years from the statistical and machine learning communities (Hennig et al., 2015), with an increasing emphasis on mathematical analysis and foundations, as well as new classes of methods. See Oates and Sullivan (2019) for a historical survey of the field that also encompasses recent work.

#### Mathematical and Statistical Foundations

On a general level, I am interested in foundational statistical questions, e.g. what it means for a probabilistic numerical method to be Bayesian, and how to exactly or approximately realise the Bayesian posterior for a probabilistic numerical method. Cockayne et al. (2019) propose the point of view that a (classical, deterministic) numerical method for approximating a quantity of interest $$Q \colon \mathcal{U} \to \mathcal{Q}$$ of an unknown $$u \in \mathcal{U}$$ based on information $$Y \colon \mathcal{U} \to \mathcal{Y}$$ should be seen as a function $$B \colon \mathcal{Y} \to \mathcal{Q}$$; a “good” numerical method is one for which the residual operator $$R = B \circ Y - Q$$ is “small”. A probabilistic numerical method enriches this picture by returning not just a point estimator $$B(y)$$ but a probability distribution $$\beta(y)$$ on $$\mathcal{Q}$$. Furthermore, one can (but does not have to) additionally ask that $$\beta$$ correspond to a Bayesian statistical procedure. The precise relationship between this perspective and the older paradigm of average-case numerical analysis is explored in Cockayne et al. (2019) and Oates et al. (2020).

I am also interested in concrete applications, e.g. to data-centric engineering, molecular dynamics, atomistics, and other multiscale modelling problems.

Revelant Publications

• C. J. Oates, J. Cockayne, D. Prangle, T. J. Sullivan, and M. Girolami. “Optimality criteria for probabilistic numerical methods” in Multivariate Algorithms and Information-Based Complexity, ed. F. J. Hickernell and P. Kritzer. Radon Series on Computational and Applied Mathematics 27:65–88, 2020. doi:10.1515/9783110635461-005 arXiv:1901.04326
• J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Bayesian probabilistic numerical methods.” SIAM Rev. 61(4):756–789, 2019. doi:10.1137/17M1139357 arXiv:1702.03673
• C. J. Oates and T. J. Sullivan. “A modern retrospective on probabilistic numerics.” Stat. Comput. 29(6):1335–1351, 2019. doi:10.1007/s11222-019-09902-z arXiv:1901.04457

#### Applications to Inverse Problems

One motivation for the use of probabilistic numerical methods is to quantify the impact of discretisation error in a statistical way, in particular when they are used as solvers in Bayesian inverse problems. To properly understand this impact, one needs to extend the analysis of Bayesian inverse problems to consider the use of randomised forward models and log-likelihoods (Lie et al., 2018).

Revelant Publications

• H. C. Lie, T. J. Sullivan, and A. L. Teckentrup. “Error bounds for some approximate posterior measures in Bayesian inference” in Numerical Mathematics and Advanced Applications ENUMATH 2019, ed. F. J. Vermolen and C. Vuik. Lecture Notes in Computational Science and Engineering 139:275–283, 2021. doi:10.1007/978-3-030-55874-1_26 arXiv:1911.05669
• H. C. Lie, T. J. Sullivan, and A. L. Teckentrup. “Random forward models and log-likelihoods in Bayesian inverse problems.” SIAM/ASA J. Uncertain. Quantif. 6(4):1600–1629, 2018. doi:10.1137/18M1166523 arXiv:1712.05717
• J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Probabilistic numerical methods for PDE-constrained Bayesian inverse problems” in Proceedings of the 36th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, ed. G. Verdoolaege. AIP Conference Proceedings 1853:060001-1–060001-8, 2017. doi:10.1063/1.4985359 arXiv:1701.04006
• J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Probabilistic meshless methods for partial differential equations and Bayesian inverse problems.” arXiv Preprint, 2016. arXiv:1605.07811

#### Differential Equations

A fully Bayesian perspective on the solution of ODEs was advocated by Skilling (1992). As is often the case with dynamical systems that have a “time's arrow”, a fully Bayesian perspective is often too demanding, and a sequential filtering-based approach is more appropriate (Schober et al., 2014). However, another approach is based in the perturbation of the numerical flow, as advocated by Conrad et al. (2016), who pointed out the problems associated with over-confident inferences arising from under-resolved models, and established an optimal scaling for Gaussian process perturbations of standard numerical schemes (e.g. implicit Euler or RK4) that introduces maximal uncertainty without changing the deterministic convergence rate for the mean. Thus, for an ODE

$$u'(t) = f(u(t))$$

and a time step $$\tau > 0$$, one replaces the deterministic numerical integrator (discrete time flow map $$\Psi^{\tau}$$)

$$U_{k + 1} = \Psi^{\tau}(U_{k}),$$

which generates deterministic approximations $$U_{k} \approx u(k \tau)$$ by a randomised integrator

$$U_{k + 1} = \Psi^{\tau}(U_{k}) + \xi_{k}(\tau),$$

where the $$\xi_{k}(t)$$ are suitably-chosen stochastic processes: one desirable property is that if $$\Psi^{\tau}$$ has truncation error of order $$q$$, then so too should $$\mathbb{E}[U]$$ — though, of course, the randomised trajectories will exhibit some ‘spread’ about the mean. This approach provides a statistical characterisation of the impact of numerical errors in the integration scheme, which is highly desirable in Bayesian inverse problems, in which the numerical solution $$U$$ may be used to help infer important physical parameters.

I am interested in furthering this line of work, both in developing the theory of probabilistic integrators for more general ordinary (Lie et al., 2019; Teymur et al., 2018; Kersting et al., 2020; Lie et al., 2021) and partial differential equations (Cockayne et al., 2016; Cockayne et al., 2017; Wang et al., 2021).

Revelant Publications

• J. Wang, J. Cockayne, O. Chkrebtii, T. J. Sullivan, and C. J. Oates. “Bayesian numerical methods for nonlinear partial differential equations.” Stat. Comput. 31(5):no. 55, 20pp., 2021. doi:10.1007/s11222-021-10030-w arXiv:2104.12587
• H. C. Lie, M. Stahn, and T. J. Sullivan. “Randomised one-step time integration methods for deterministic operator differential equations.” arXiv Preprint, 2021. arXiv:2103.16506
• H. Kersting, T. J. Sullivan, and P. Hennig. “Convergence rates of Gaussian ODE filters.” Stat. Comput. 30(6):1791–1816, 2020. doi:10.1007/s11222-020-09972-4 arXiv:1807.09737
• H. C. Lie, A. M. Stuart, and T. J. Sullivan. “Strong convergence rates of probabilistic integrators for ordinary differential equations.” Stat. Comput. 29(6):1265–1283, 2019. doi:10.1007/s11222-019-09898-6 arXiv:1703.03680
• O. Teymur, H. C. Lie, T. J. Sullivan, and B. Calderhead. “Implicit probabilistic integrators for ODEs” in Advances in Neural Information Processing Systems 31 (NIPS 2018), ed. S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett. 7244–7253, 2018. http://papers.nips.cc/paper/7955-implicit-probabilistic-integrators-for-odes arXiv:1805.07970
• J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Probabilistic numerical methods for PDE-constrained Bayesian inverse problems” in Proceedings of the 36th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, ed. G. Verdoolaege. AIP Conference Proceedings 1853:060001-1–060001-8, 2017. doi:10.1063/1.4985359 arXiv:1701.04006
• J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Probabilistic meshless methods for partial differential equations and Bayesian inverse problems.” arXiv Preprint, 2016. arXiv:1605.07811

#### Linear Algebra

A major computational bottleneck in the application of Gaussian process and reproducing kernel Hilbert space (RKHS) techniques in machine learning and artificial intelligence is the cost of inverting a large square matrix of the form

$$\Theta = \begin{bmatrix} k(x_{1}, x_{1}) & \cdots & k(x_{1}, x_{N}) \\ \vdots & \ddots & \vdots \\ k(x_{N}, x_{1}) & \cdots & k(x_{N}, x_{N}) \end{bmatrix} \in \mathbb{R}^{N \times N} ,$$

where $$\{ x_{1}, \dots, x_{N} \}$$ is a data set in a space $$\mathcal{X}$$ and $$k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$$ is a covariance kernel function. The matrix $$\Theta$$ has $$N^2$$ entries and a naïve inversion has computational complexity $$O(N^3)$$; for many practical problems, these storage and computation costs are infeasible, so there is a strong interest in developing fast approximate methods for kernel matrix operations.

Schäfer et al. (2021) use a probabilistic interpretation of Cholesky factorisation to motivate a constructive choice of near-linearly many entries of the $$N^{2}$$ entries of $$\Theta$$, followed by a zero-fill-in incomplete Cholesky factorisation, yielding an exponentially accurate approximate inverse for $$\Theta$$ at near-linear computational cost.

Revelant Publications

• F. Schäfer, T. J. Sullivan, and H. Owhadi. “Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity.” Multiscale Model. Simul. 19(2):688–730, 2021. doi:10.1137/19M129526X arXiv:1706.02205
• T. J. Sullivan. “Contributed discussion on the article ‘A Bayesian conjugate gradient method’.” Bayesian Anal. 14(3):985–989, 2019. doi:10.1214/19-BA1145 arXiv:1906.10240

### Bayesian Inverse Problems

Bayesian procedures are popular methods for the quantification of uncertainty, especially in inverse problems. In many modern applications, the underlying models are infinite-dimensional, and discretisation only enters as a computational consideration. The general form of an inverse problem is, given spaces $$\mathcal{U}$$ and $$\mathcal{Y}$$ and a known operator $$G \colon \mathcal{U} \to \mathcal{Y}$$, to recover $$u \in \mathcal{U}$$ from $$y \in \mathcal{Y}$$, where $$y$$ is a (perhaps noisily corrupted version of) $$G(u)$$. A simple example is the additive noise model

$$y = G(u) + \eta ,$$

where $$\mathcal{Y}$$ is a vector space and $$\eta$$ is a $$\mathcal{Y}$$-valued random variable. In modern applications, the forward model $$G$$ may correspond to solving an integral equation or an ordinary or partial differential equation, and the parameter $$u$$ may be a scalar, vector, or tensor field — and so $$\mathcal{U}$$ is an infinite-dimensional space. For example, to first approximation, X-ray imaging of a spatial domain $$D \subset \mathbb{R}^d$$ is governed by the X-ray transform $$G$$ from a space $$\mathcal{U}$$ of functions on $$D$$ to a space of functions on $$\mathbb{R}^d \times \mathbb{S}^{d-1}$$,

$$\displaystyle (G u)(x_0, v) = \int_{\mathbb{R}} u(x_0 + t v) \, \mathrm{d} t.$$

Here $$u$$ describes the interior properties of the sample being X-rayed, which must be reconstructed from the average values along all lines, described by their direction $$v$$, emanating from a base point $$x_0$$.

Inverse problems are typically ill-posed in the strict sense (i.e. no such $$u \in \mathcal{U}$$ exists, or there are multiple solutions, or they depend sensitively upon the problem setup. In the Bayesian paradigm, closely related to the tradition of regularisation, the inverse problem is rendered well-posed by seeking a conditional posterior probability distribution for $$u$$ given $$y$$, not just a point estimate. In modern terminology, following seminal contributions of Stuart (2010), the Bayesian posterior distribution $$\mu^{y}$$ on $$\mathcal{U}$$ of $$u$$ given the data $$y$$ is given in terms of its density with respect to the prior distribution $$\mu_{0}$$:

\begin{align*} \dfrac{\mathrm{d} \mu^{y}}{\mathrm{d} \mu_{0}} (u) & = \dfrac{\exp(- \Phi(u; y))}{Z(y)} , \\ Z(y) & = \int_{\mathcal{U}} \exp(- \Phi(u; y)) \, \mu_{0} (\mathrm{d} u) , \end{align*}

where $$\Phi(\cdot; y) \colon \mathcal{U} \to \mathbb{R}$$ denotes the negative log-likelihood function (misfit) for the data $$y$$. The basic example, when the data space $$\mathcal{Y}$$ is finite-dimensional and $$\eta \sim \mathcal{N}(0, \Gamma)$$ is centred additive Gaussian noise with covariance matrix $$\Gamma$$, is the quadratic misfit functional

$$\Phi(u; y) = \dfrac{1}{2} \| \Gamma^{-1/2} ( y - G(u) ) \|^{2} .$$

#### Stability and Consistency of Bayesian inverse problems

From both frequentist and numerical-analytic points of view, it is natural to ask whether Bayesian methods are well-defined, stable, and consistent:

• well-definedness: does $$\mu^{y}$$, as defined above, really correspond to the notion of conditioning of probability measures, and are there any issues of uniquness?
• stability: does $$\mu^{y}$$ change “only a little bit” if the data $$y$$, the log-likelihood $$\Phi$$, forward model $$G$$, or prior $$\mu_{0}$$ are slightly altered?
• consistency: if exposed to enough high-quality data generated by a “true” data-generating parameter value $$u^{\dagger}$$, will the posterior measure $$\mu^{y}$$ concentrate on $$u^{\dagger}$$?

In this sense, what kinds of priors and likelihoods make for good modelling choices, and also lead to well-posed inverse problems? This is a subtle topic with many positive and negative results (Owhadi et al., 2015; Owhadi et al., 2015). I am also interested in the use of heavy-tailed priors (Sullivan, 2017; Lie and Sullivan, 2018) and randomised models (Lie et al., 2018) in Bayesian inverse problems.

Some past student projects that I have supervised in this area include those of Khor (2015) and Feuring (2020).

Revelant Publications

• H. C. Lie, T. J. Sullivan, and A. L. Teckentrup. “Error bounds for some approximate posterior measures in Bayesian inference” in Numerical Mathematics and Advanced Applications ENUMATH 2019, ed. F. J. Vermolen and C. Vuik. Lecture Notes in Computational Science and Engineering 139:275–283, 2021. doi:10.1007/978-3-030-55874-1_26 arXiv:1911.05669
• L. E. Feuring. “Bayesian treatment of a seismological inverse problem.” Bachelor’s thesis, Humboldt-Universität zu Berlin, 2020.
• H. C. Lie, T. J. Sullivan, and A. L. Teckentrup. “Random forward models and log-likelihoods in Bayesian inverse problems.” SIAM/ASA J. Uncertain. Quantif. 6(4):1600–1629, 2018. doi:10.1137/18M1166523 arXiv:1712.05717
• T. J. Sullivan. “Well-posedness of Bayesian inverse problems in quasi-Banach spaces with stable priors” in 88th Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM), Weimar 2017, ed. C. Könke and C. Trunk. Proceedings in Applied Mathematics and Mechanics 17(1):871–874, 2017. doi:10.1002/pamm.201710402 arXiv:1710.05610
• T. J. Sullivan. “Well-posed Bayesian inverse problems and heavy-tailed stable quasi-Banach space priors.” Inverse Probl. Imaging 11(5):857–874, 2017. doi:10.3934/ipi.2017040 arXiv:1605.05898
• H. Owhadi, C. Scovel, and T. J. Sullivan. “On the brittleness of Bayesian inference.” SIAM Rev. 57(4):566–582, 2015. doi:10.1137/130938633 arXiv:1308.6306
• C. Khor. “The Bayesian inverse problem for traffic flow.” MA469 Fourth Year Project, University of Warwick, 2015.
• H. Owhadi, C. Scovel, and T. J. Sullivan. “Brittleness of Bayesian inference under finite information in a continuous world.” Electron. J. Stat. 9(1):1–79, 2015. doi:10.1214/15-EJS989 arXiv:1304.6772

#### Maximum a Posteriori Estimators

A general probability measure $$\mu$$, such as a Bayesian posterior, is generally too complicated to realise exactly, so some kinds of low-order summary statistics must be used. Expected values and covariances are common choices for such summaries; another choice is a “most likely point under $$\mu$$”, a so-called mode or maximum a posteriori estimator. In finite-dimensional spaces $$\mathcal{U}$$, a mode is simply a maximiser of the probability density function of $$\mu$$. In general, though, there is no Lebesgue measure with which to take densities, so modes must be defined intrinsically.

As is often the case in functional analysis, there are multiple ways of defining modes on an infinite-dimensional space $$\mathcal{U}$$, e.g. a separable Banach space. Dashti et al. (2013) proposed defining a strong mode of $$\mu$$ to be any point $$u^{\ast} \in \mathcal{U}$$ such that

$$\displaystyle \lim_{r \to 0} \frac{\sup_{v \in \mathcal{U}} \mu(B(v, r))}{\mu(B(u^{\ast}, r))} = 1 .$$

In later work, Helin and Burger (2015) proposed a weak mode with respect to a dense subspace $$E \subset \mathcal{U}$$. It turns out that, under reasonable conditions, every weak mode is a strong mode and vice versa (Lie and Sullivan, 2018). On the other hand, there are pathological measures, even in finite-dimensional spaces such as $$\mathcal{U} = \mathbb{R}^{2}$$, that have weak modes but no strong modes, and so in general these two notions are distinct.

It is a challenge to relate the robustness theories of Bayesian inverse problems and their MAP estimators to one another, largely because two distributions can be very similar overall and yet have very different MAP estimators, or vice versa. To address this problem, Ayanbayev et al. (2021) proposed a robustness theory for MAP estimators grounded in the Γ-convergence of the Onsager–Machlup functionals of the posteriors, which in turn can be broken down into the Γ-convergence of the prior Onsager–Machlup functionals and continuous convergence of the potentials (negative log-likelihoods).

Revelant Publications

• B. Ayanbayev, I. Klebanov, H. C. Lie, and T. J. Sullivan. “Γ-convergence of Onsager–Machlup functionals. Part I: With applications to maximum a posteriori estimation in Bayesian inverse problems.” arXiv Preprint, 2021. arXiv:2108.04597
• B. Ayanbayev, I. Klebanov, H. C. Lie, and T. J. Sullivan. “Γ-convergence of Onsager–Machlup functionals. Part II: Infinite product measures on Banach spaces.” arXiv Preprint, 2021. arXiv:2108.04598
• H. C. Lie and T. J. Sullivan. “Erratum: Equivalence of weak and strong modes of measures on topological vector spaces (2018 Inverse Problems 34 115013).” Inverse Probl. 34(12):129601, 2018. doi:10.1088/1361-6420/aae55b
• H. C. Lie and T. J. Sullivan. “Equivalence of weak and strong modes of measures on topological vector spaces.” Inverse Probl. 34(11):115013, 2018. doi:10.1088/1361-6420/aadef2 arXiv:1708.02516

#### Sampling and Approximation of Bayesian Posterior Distributions

At the end of the day, one needs to be able to compute with Bayesian posterior distributions. In principle, methods such as Markov chain Monte Carlo (MCMC) can be used to explore and sample an arbitrary probability distribution $$\mu$$ on a space $$\mathcal{U}$$, such as a Bayesian posterior distribution $$\mu^{y}$$. However, these methods can be computationally expensive and often converge poorly — if at all — in practice when $$\mathcal{U}$$ has high dimension, or when $$\mu$$ is concentrated on a low-dimensional subset of $$\mathcal{U}$$. In such situations, it is desirable to approximate $$\mu$$ by any reasonable means in order to obtain an informative approximate solution in finite time. Methods of interest to me include the following:

• Dimension reduction of data and parameters (Lie et al., 2018; Schuster et al., 2017), including active subspace methods (Constantine et al., 2016).
• Filtering techniques, e.g. the ensemble Kálmán filter (Evensen, 1994). Though originally developed for dynamical systems, these methods are also powerful derivative-free techniques for the solution of static inverse problems (Iglesias et al., 2013).
• Reproducing kernel Hilbert space techniques such as conditional mean embedding (Fukumizu et al., 2004; Song et al., 2009; Klebanov et al., 2020). In such methods, random variables and probability distributions on almost arbitrary spaces are represented through their embeddings into a feature space $$\mathcal{H}$$, and conditional probability in the original space can be realised through linear-algebraic operations in $$\mathcal{H}$$.

Revelant Publications

• H. C. Lie, T. J. Sullivan, and A. L. Teckentrup. “Error bounds for some approximate posterior measures in Bayesian inference” in Numerical Mathematics and Advanced Applications ENUMATH 2019, ed. F. J. Vermolen and C. Vuik. Lecture Notes in Computational Science and Engineering 139:275–283, 2021. doi:10.1007/978-3-030-55874-1_26 arXiv:1911.05669
• I. Klebanov, I. Schuster, and T. J. Sullivan. “A rigorous theory of conditional mean embeddings.” SIAM J. Math. Data Sci. 2(3):583–606, 2020. doi:10.1137/19M1305069 arXiv:1912.00671
• H. C. Lie, T. J. Sullivan, and A. L. Teckentrup. “Random forward models and log-likelihoods in Bayesian inverse problems.” SIAM/ASA J. Uncertain. Quantif. 6(4):1600–1629, 2018. doi:10.1137/18M1166523 arXiv:1712.05717
• I. Schuster, P. G. Constantine, and T. J. Sullivan. “Exact active subspace Metropolis–Hastings, with applications to the Lorenz-96 system.” arXiv Preprint, 2017. arXiv:1712.02749

### Optimal Distributionally-Robust Uncertainty Quantification

In its most general form, uncertainty quantification can be cast as a constrained optimisation problem over a large space of parameters, functions, and probability measures, with information acting as constraints on the feasible set (Owhadi et al., 2013; Sullivan et al., 2013). Typically, one is interested in the expected value (and other moments) of some quantity of interest $$q$$, as the probability distribution $$\mu$$ of the inputs $$X$$ and any associated response function $$f$$ range over a general admissible set $$\mathcal{A}$$; that is, we need to calculate and minimise/maximise

$$\mathbb{E}_{X \sim \mu}[q(X, f(X))], \quad (\mu, f) \in \mathcal{A}.$$

The admissible set $$\mathcal{A}$$ corresponds to the available information about the system of interest: for example, one might know that the mean of $$X$$ must lie between $$a$$ and $$b$$, and so the constraint $$a \leq \mathbb{E}_{X \sim \mu}[X] \leq b$$ would form part of the definition of $$\mathcal{A}$$. Similarly, one one could consider higher-order statistics related to $$\mu$$, and information about the response function $$f$$, e.g. numerical models and approximations, archives of legacy data, and so on (Sullivan et al., 2013; Kamga et al., 2014).

In principle, the optimisation problem over $$(\mu, f) \in \mathcal{A}$$ is infinite-dimensional, and therefore computationally intractable. However, extensions of classical Choquet theory (Owhadi et al., 2013) can be used to show that, under generalised independence and moment constraints on the probability measures $$\mu$$, this optimisation problem reduces to a search over a finite-dimensional effective admissible set $$\mathcal{A}_{\Delta} \subset \mathcal{A}$$, and this reduced problem can be solved numerically — in some cases, it is even a convex problem (Han et al., 2015). In the reduced setting $$\mathcal{A}_{\Delta}$$, the measures $$\mu$$ are discrete measures with finite support, and the values of $$f$$ are only needed on the finitely many points where $$\mu$$ assigns positive probability mass:

$$\mu \to \displaystyle \sum_{i = 0}^{N} w_{i} \delta_{x_{i}}, \quad f \to (x_{i}, f(x_{i}))_{i = 0}^{N} .$$

For the forward uncertainty propagation problem, this approach allows for the optimal propagation of distributional uncertainty (e.g. histogram envelopes) through general forward models. In practice, this opens up new possibilities for probabilistic reliability forecasting and quality control. For inverse problems, this approach allows one to study the robustness of Bayesian inversion to perturbations of the prior and likelihood model (Owhadi et al., 2015; Owhadi et al., 2015). See also recent work by Burdzy and Pitman (2019) on the probability of radically different posterior opinions given the same prior probability model.

Revelant Publications

• L. Bonnet, J.-L. Akian, É. Savin, and T. J. Sullivan. “Adaptive reconstruction of imperfectly-observed monotone functions, with applications to uncertainty quantification.” Algorithms 13(8):no. 196, 21pp., 2020. doi:10.3390/a13080196 arXiv:2007.05236
• M. McKerns, F. J. Alexander, K. S. Hickman, T. J. Sullivan, and D. E. Vaughan. “Optimal bounds on nonlinear partial differential equations in model certification, validation, and experimental design” in Handbook on Big Data and Machine Learning in the Physical Sciences, Volume 2: Advanced Analysis Solutions for Leading Experimental Techniques, ed. K. K. van Dam, K. G. Yager, S. I. Campbell, R. Farnsworth, and M. van Dam. World Scientific Series on Emerging Technologies 271–306, 2020. doi:10.1142/9789811204579_0014 arXiv:2009.06626
• H. Owhadi, C. Scovel, and T. J. Sullivan. “On the brittleness of Bayesian inference.” SIAM Rev. 57(4):566–582, 2015. doi:10.1137/130938633 arXiv:1308.6306
• H. Owhadi, C. Scovel, and T. J. Sullivan. “Brittleness of Bayesian inference under finite information in a continuous world.” Electron. J. Stat. 9(1):1–79, 2015. doi:10.1214/15-EJS989 arXiv:1304.6772
• P.-H. T. Kamga, B. Li, M. McKerns, L. H. Nguyen, M. Ortiz, H. Owhadi, and T. J. Sullivan. “Optimal uncertainty quantification with model uncertainty and legacy data.” J. Mech. Phys. Solids 72:1–19, 2014. doi:10.1016/j.jmps.2014.07.007
• T. J. Sullivan. “Optimal Uncertainty Quantification for Hypervelocity Impact” in Uncertainty Quantification in Computational Fluid Dynamics, 15–19 September 2014, von Karman Institute for Fluid Dynamics, Belgium, and 2–3 June 2014, Stanford University, United States. STO-AVT-VKI Lecture Series, AVT-235, , 2014.
• J. Brennan. “Some optimal inequalities for vector-valued random variables.” MA469 Fourth Year Project, University of Warwick, 2014.
• T. J. Sullivan, M. McKerns, M. Ortiz, H. Owhadi, and C. Scovel. “Optimal uncertainty quantification: Distributional robustness versus Bayesian brittleness.” ASME J. Med. Dev. 7(4):040920, 2013. doi:10.1115/1.4025786
• T. J. Sullivan, M. McKerns, D. Meyer, F. Theil, H. Owhadi, and M. Ortiz. “Optimal uncertainty quantification for legacy data observations of Lipschitz functions.” ESAIM Math. Model. Numer. Anal. 47(6):1657–1689, 2013. doi:10.1051/m2an/2013083 arXiv:1202.1928
• H. Owhadi, C. Scovel, T. J. Sullivan, M. McKerns, and M. Ortiz. “Optimal Uncertainty Quantification.” SIAM Rev. 55(2):271–345, 2013. doi:10.1137/10080782X arXiv:1009.0679
• M. Ortiz, M. McKerns, H. Owhadi, T. J. Sullivan, and C. Scovel. “Optimal Uncertainty Quantification” in Advanced Computational Engineering, 12–18 February 2012, ed. O. Allix, C. Carstensen, J. Schröder, and P. Wriggers. Oberwolfach Reports 9(1):537–540, 2012. doi:10.4171/OWR/2012/09
• L. H. Nguyen. “Dimensional collapse in optimal uncertainty quantification.” Summer Undergraduate Research Fellowship, California Institute of Technology, 2012.
• Y. Chen. “On a Chebyshev-type inequality for sums of independent random variables.” Senior thesis, California Institute of Technology, 2012.
• M. McKerns, H. Owhadi, C. Scovel, T. J. Sullivan, and M. Ortiz. “The optimal uncertainty algorithm in the mystic framework.” Caltech CACR Technical Report No. 523, August 2010. arXiv:1202.1055

### Concentration of Measure

A powerful heuristic first observed by Lévy (1951) and Milman (1971) is that a ‘nice’ function of many independent or weakly correlated random variables has overwhelming probability to take values near its mean, median, or a low-dimensional subset of its range. This has implications for uncertainty quantification of complex systems: in situations where concentration of measure results can be proved, a large stochastic system becomes effectively deterministic. In the past, I have worked on applications of concentration-of-measure methods to the certification of terminal ballistics (Sullivan et al., 2011; Kidane et al., 2012; Adams et al., 2012).

Revelant Publications

• A. A. Kidane, A. Lashgari, B. Li, M. McKerns, M. Ortiz, G. Ravichandran, M. Stalzer, and T. J. Sullivan. “Rigorous model-based uncertainty quantification with application to terminal ballistics. Part I: Systems with controllable inputs and small scatter.” J. Mech. Phys. Solids 60(5):983–1001, 2012. doi:10.1016/j.jmps.2011.12.001
• M. Adams, A. Lashgari, B. Li, M. McKerns, J. Mihaly, M. Ortiz, H. Owhadi, A. J. Rosakis, M. Stalzer, and T. J. Sullivan. “Rigorous model-based uncertainty quantification with application to terminal ballistics. Part II: Systems with uncontrollable inputs and large scatter.” J. Mech. Phys. Solids 60(5):1002–1019, 2012. doi:10.1016/j.jmps.2011.12.002
• T. J. Sullivan and H. Owhadi. “Distances and diameters in concentration inequalities: from geometry to optimal assignment of sampling resources.” Int. J. Uncertain. Quantif. 2(1):21–38, 2012. doi:10.1615/Int.J.UncertaintyQuantification.v2.i1.30
• T. J. Sullivan, U. Topcu, M. McKerns, and H. Owhadi. “Uncertainty quantification via codimension-one partitioning.” Internat. J. Numer. Methods Engrg. 85(12):1499–1521, 2011. doi:10.1002/nme.3030
• T. J. Sullivan, M. McKerns, U. Topcu, and H. Owhadi. “Uncertainty quantification via codimension-one domain partitioning and a new concentration inequality.” Proc. Soc. Behav. Sci. 2(6):7751–7752, 2010. doi:10.1016/j.sbspro.2010.05.211

### Statistics in nonlinear spaces

Many important statistical problems are fundamentally nonlinear, in the sense that the parameters of interest, which one must learn or infer, cannot be sensibly thought of as elements of a linear vector space.I am interested in theory and applications of statistical methods, especially Bayesian methods, for such settings.

A prime example here is the field of shape analysis, in which the objects of interest are geometrical shapes, described as elements of manifolds or even quotient spaces of manifolds, e.g. Kendall's shape space, the quotient space

$$\displaystyle \Sigma_{m}^{k} = \frac{ \left\{ x = (x_{1}, \dots, x_{k}) \in \mathbb{R}^{m \times k} \,\middle|\, \sum_{i = 1}^{k} x_{i} = 0 \in \mathbb{R}^{m}, \| x \|_{\mathrm{F}} = 1 \right\} }{ \mathrm{SO}_{m} } .$$

Kernel methods for machine learning are also appropriate tools here, in particular the tools of kernel mean embedding and conditional mean embedding, which can realise Bayesian conditional distributions through linear algebra in the feature space (Fukumizu et al., 2004; Song et al., 2009; Klebanov et al., 2020). In this approach to non-parametric inference, which goes back to at least the early 2000s, a random variable $$Y \sim \mathbb{P}_{Y}$$ in a set $$\mathcal{Y}$$ is represented by its kernel mean embedding, the reproducing kernel Hilbert space element

$$\displaystyle \mu_{Y} = \int_{\mathcal{Y}} \psi(y) \, \mathrm{d} \mathbb{P}_{Y} (y) \in \mathcal{G},$$

and conditioning with respect to an observation $$x$$ of a related random variable $$X \sim \mathbb{P}_{X}$$ in a set $$\mathcal{X}$$ with RKHS $$\mathcal{H}$$ is performed using the Woodbury formula

$$\displaystyle \mu_{Y|X = x} = \mu_Y + (C_{XX}^{\dagger} C_{XY})^\ast \, (\varphi(x) - \mu_X) .$$

Here $$\psi \colon \mathcal{Y} \to \mathcal{G}$$ and $$\varphi \colon \mathcal{X} \to \mathcal{H}$$ are the feature maps and the $$C$$'s denote the appropriate centred (cross-)covariance operators of the embedded random variables $$\psi(Y)$$ in $$\mathcal{G}$$ and $$\varphi(X)$$ in $$\mathcal{H}$$.

While one can analyse the validity of the CME approach in a self-contained way (Klebanov et al., 2020), it turns out that the deeper reasons why this trick works are to do with affine approximation of conditional expectation functions (Klebanov et al., 2021). That is, the RKHS setting one in which the Bayes linear approach to statistics is exact.

Revelant Publications

• I. Klebanov, B. Sprungk, and T. J. Sullivan. “The linear conditional expectation in Hilbert space.” Bernoulli 27(4):2267–2299, 2021. doi:10.3150/20-BEJ1308 arXiv:2008.12070
• I. Klebanov, I. Schuster, and T. J. Sullivan. “A rigorous theory of conditional mean embeddings.” SIAM J. Math. Data Sci. 2(3):583–606, 2020. doi:10.1137/19M1305069 arXiv:1912.00671
• E. Nava-Yazdani, H.-C. Hege, T. J. Sullivan, and C. von Tycowicz. “Geodesic analysis in Kendall's shape space with epidemiological applications.” J. Math. Imaging Vis. 62(4):549–559, 2020. doi:10.1007/s10851-020-00945-w arXiv:1906.11950