Tim Sullivan

Outlines of Some Research Topics

My general area of research is uncertainty quantification, at the intersection of applied mathematics, probability, statistics, and computational science. The brief descriptions below outline some of my research interests, past work, and ideas for the future. If you are interested in any of these areas, then please get in touch!

A common theme underlying many of my current research interests is to add ‘the right amount of uncertainty’ to an otherwise certain or near-certain prediction, in order to correctly compensate for over-confidence in the modelling process. In the words of Julian Baggini:

The mark of a mature, psychologically healthy mind is indeed the ability to live with uncertainty and ambiguity, but only as much as there really is.

Probabilistic Numerical Methods

There is a long tradition of giving probabilistic solutions to deterministic problems such as quadrature, linear algebra, and optimisation; early contributions in this direction date back to Poincaré at the start of the 20th Century, continued by the work of Larkin in the 1970s and the seminal papers of Diaconis (1988) and O'Hagan (1992). For example, a Bayesian approach to computing the integral of some function \(f \colon [a, b] \to \mathbb{R}\) would be to specify a prior probability measure on a suitable space of integrands, treat the evaluations \(y_{i} = f(x_{i})\) of \(f\) at \(n\) quadrature nodes \(x_{i} \in [a, b]\) as observed data, and then form the posterior distribution

\( \mathbb{P} \left( \int_{a}^{b} f(x) \, \mathrm{d} x \,\middle|\, f(x_{i}) = y_{i} \text{ for } i = 1, \dots, n \right) . \)

The expected value of this distribution gives an estimate of the value of the integral, and the variance can be used as a measure of accuracy. Indeed, a now-classical result of Sul′din (1959) shows that, under the Brownian motion prior on \(f\), the posterior mean / MAP estimator for \(\int_{a}^{b} f(x) \, \mathrm{d} x\) is the elementary trapezoidal rule for quadrature — there is an underlying posterior mean estimator for \(f\), namely the piecewise-linear interpolant of the data \(f(x_{i}) = y_{i}\), and the integral of this interpolant is the posterior mean estimator of \(\int_{a}^{b} f(x) \, \mathrm{d} x\). The moral is that many deterministic numerical methods can be seen as point estimators for statistical procedures; conversely, statistical procedures offer a richer quantification of uncertainty in numerical tasks, and this UQ can be fed forward into other parts of scientific pipelines (e.g. inverse problems).
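
As a concrete illustration, the sketch below implements this construction in Python for the Brownian-motion prior on \([0, 1]\), i.e. the kernel \(k(s, t) = \min(s, t)\) of Sul′din's example, and compares the posterior mean of the integral with the trapezoidal rule. The test integrand, node locations, and implementation details are illustrative choices, not taken from the cited works.

```python
# A minimal sketch of Gaussian-process (Bayesian) quadrature on [0, 1]
# under a Brownian-motion prior pinned to f(0) = 0.  The integrand and
# the quadrature nodes are illustrative choices.
import numpy as np

def bayesian_quadrature(f, nodes):
    """Posterior mean and variance of int_0^1 f(x) dx, given f at `nodes`."""
    x = np.asarray(nodes, dtype=float)
    y = f(x)                                      # observed data y_i = f(x_i)
    K = np.minimum.outer(x, x)                    # K_ij = k(x_i, x_j) = min(x_i, x_j)
    z = x - 0.5 * x**2                            # z_i = int_0^1 k(x, x_i) dx
    Kinv_y = np.linalg.solve(K, y)
    mean = z @ Kinv_y                             # posterior mean of the integral
    var = 1.0 / 3.0 - z @ np.linalg.solve(K, z)   # int int k  -  z^T K^{-1} z
    return mean, var

f = lambda x: np.sin(np.pi * x)                   # hypothetical test integrand
nodes = np.array([0.25, 0.5, 0.75, 1.0])
mean, var = bayesian_quadrature(f, nodes)

# Sul'din's result: the posterior mean coincides with the trapezoidal rule
# applied to the piecewise-linear interpolant through (0, 0) and the data.
xs = np.concatenate(([0.0], nodes))
ys = np.concatenate(([0.0], f(nodes)))
trap = np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs))
print(mean, trap, var)   # mean == trap up to rounding; var quantifies accuracy
```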

This probabilistic perspective on numerical analysis has received renewed attention in the last five years from the statistical and machine learning communities (Hennig et al., 2015), with an increasing emphasis on mathematical analysis and foundations, as well as new classes of methods.

Mathematical and Statistical Foundations

On a general level, I am interested in foundational statistical questions, e.g. what it means for a probabilistic numerical method to be Bayesian, and how to exactly or approximately realise the Bayesian posterior for a probabilistic numerical method. I am also interested in concrete applications, e.g. to data-centric engineering, molecular dynamics, atomistics, and other multiscale modelling problems.

Relevant Publications

  • J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Bayesian probabilistic numerical methods.” SIAM Rev., 2019. To appear. arXiv:1702.03673
  • C. J. Oates, J. Cockayne, D. Prangle, T. J. Sullivan, and M. Girolami. “Optimality criteria for probabilistic numerical methods.” arXiv Preprint, 2019. arXiv:1901.04326
  • C. J. Oates and T. J. Sullivan. “A modern retrospective on probabilistic numerics.” Stat. Comput., 2019. To appear. arXiv:1901.04457

Applications to Inverse Problems

One motivation for the use of probabilistic numerical methods is to quantify the impact of discretisation error in a statistical way, in particular when they are used as solvers in Bayesian inverse problems. To properly understand this impact, one needs to extend the analysis of Bayesian inverse problems to consider the use of randomised forward models and log-likelihoods (Lie et al., 2018).

Relevant Publications

  • H. C. Lie, T. J. Sullivan, and A. L. Teckentrup. “Random forward models and log-likelihoods in Bayesian inverse problems.” SIAM/ASA J. Uncertain. Quantif. 6(4):1600–1629, 2018. doi:10.1137/18M1166523 arXiv:1712.05717
  • J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Probabilistic numerical methods for PDE-constrained Bayesian inverse problems” in Proceedings of the 36th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, ed. G. Verdoolaege. AIP Conference Proceedings, 1853:060001-1–060001-8, 2017. doi:10.1063/1.4985359 arXiv:1701.04006
  • J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Probabilistic meshless methods for partial differential equations and Bayesian inverse problems.” arXiv Preprint, 2016. arXiv:1605.07811

Differential Equations

A fully Bayesian perspective on the solution of ODEs was advocated by Skilling (1992). As is often the case with dynamical systems that have a “time's arrow”, the fully Bayesian perspective can be too demanding, and a sequential filtering-based approach is more appropriate (Schober et al., 2014). Another approach is based on perturbing the numerical flow, as advocated by Conrad et al. (2016), who pointed out the problems associated with over-confident inferences arising from under-resolved models, and established an optimal scaling for Gaussian process perturbations of standard numerical schemes (e.g. implicit Euler or RK4) that introduces maximal uncertainty without changing the deterministic convergence rate for the mean. Thus, for an ODE

\( u'(t) = f(u(t)) \)

and a time step \(\tau > 0\), one replaces the deterministic numerical integrator (discrete time flow map \(\Psi^{\tau}\))

\( U_{k + 1} = \Psi^{\tau}(U_{k}), \)

which generates deterministic approximations \(U_{k} \approx u(k \tau)\), by a randomised integrator

\( U_{k + 1} = \Psi^{\tau}(U_{k}) + \xi_{k}(\tau), \)

where the \(\xi_{k}(\cdot)\) are suitably chosen stochastic processes: one desirable property is that if \(\Psi^{\tau}\) has truncation error of order \(q\), then so too should the mean \(\mathbb{E}[U_{k}]\) — though, of course, the randomised trajectories will exhibit some ‘spread’ about the mean. This approach provides a statistical characterisation of the impact of numerical errors in the integration scheme, which is highly desirable in Bayesian inverse problems, in which the numerical solution \(U\) may be used to help infer important physical parameters.
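
As a minimal illustration (not drawn from the cited papers), the following Python sketch perturbs an explicit Euler step with Gaussian noise whose variance scales like \(\tau^{2q+1}\), one common way to realise the property above; the test equation \(u' = -u\), the noise amplitude, and the ensemble size are illustrative choices.

```python
# A randomised (probabilistic) integrator: a deterministic one-step map
# (explicit Euler, order q = 1) plus Gaussian noise with variance ~ tau^(2q+1),
# so that the mean keeps the deterministic convergence rate while the
# ensemble spread reflects discretisation uncertainty.
import numpy as np

def randomised_euler(f, u0, tau, n_steps, sigma=1.0, n_samples=50, seed=None):
    rng = np.random.default_rng(seed)
    q = 1                                    # order of explicit Euler
    std = sigma * tau ** (q + 0.5)           # perturbation std: variance ~ tau^(2q+1)
    U = np.full(n_samples, u0, dtype=float)
    path = [U.copy()]
    for _ in range(n_steps):
        U = U + tau * f(U)                                  # deterministic flow map Psi^tau
        U = U + std * rng.standard_normal(n_samples)        # perturbation xi_k(tau)
        path.append(U.copy())
    return np.array(path)                    # shape (n_steps + 1, n_samples)

# Example: u' = -u, u(0) = 1, exact solution exp(-t).
f = lambda u: -u
paths = randomised_euler(f, u0=1.0, tau=0.05, n_steps=40, seed=0)
print(paths[-1].mean(), paths[-1].std(), np.exp(-0.05 * 40))
```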

I am interested in furthering this line of work by developing the theory of probabilistic integrators for more general ordinary (Lie et al., 2019; Teymur et al., 2018) and partial differential equations (Cockayne et al., 2016; Cockayne et al., 2017).

Relevant Publications

  • H. C. Lie, A. M. Stuart, and T. J. Sullivan. “Strong convergence rates of probabilistic integrators for ordinary differential equations.” Stat. Comput., 2019. To appear. arXiv:1703.03680
  • O. Teymur, H. C. Lie, T. J. Sullivan, and B. Calderhead. “Implicit probabilistic integrators for ODEs” in Advances in Neural Information Processing Systems 31 (NIPS 2018), ed. S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi & R. Garnett. 2018. arXiv:1805.07970
  • H. Kersting, T. J. Sullivan, and P. Hennig. “Convergence rates of Gaussian ODE filters.” arXiv Preprint, 2018. arXiv:1807.09737
  • J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Probabilistic numerical methods for PDE-constrained Bayesian inverse problems” in Proceedings of the 36th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, ed. G. Verdoolaege. AIP Conference Proceedings, 1853:060001-1–060001-8, 2017. doi:10.1063/1.4985359 arXiv:1701.04006
  • J. Cockayne, C. J. Oates, T. J. Sullivan, and M. Girolami. “Probabilistic meshless methods for partial differential equations and Bayesian inverse problems.” arXiv Preprint, 2016. arXiv:1605.07811

Linear Algebra

A major computational bottleneck in the application of Gaussian process techniques in machine learning and artificial intelligence is the cost of inverting a large square matrix of the form

\( \Theta = \begin{bmatrix} k(x_{1}, x_{1}) & \cdots & k(x_{1}, x_{N}) \\ \vdots & \ddots & \vdots \\ k(x_{N}, x_{1}) & \cdots & k(x_{N}, x_{N}) \end{bmatrix} \in \mathbb{R}^{N \times N} , \)

where \(\{ x_{1}, \dots, x_{N} \}\) is a data set and \(k\) is a covariance kernel function.

Schäfer et al. (2017) use a probabilistic interpretation of Cholesky factorisation to motivate a constructive choice of near-linearly many of the \(N^{2}\) entries of \(\Theta\), followed by a zero-fill-in incomplete Cholesky factorisation, yielding an exponentially accurate approximate inverse for \(\Theta\) at near-linear computational cost.
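
The Python sketch below is a much-simplified illustration (with an illustrative one-dimensional point set and exponential kernel) of the screening effect that underlies this construction: in a coarse-to-fine maximin ordering of the points, the overwhelming majority of the entries of the Cholesky factor of \(\Theta\) are negligibly small, which is what a zero-fill-in incomplete factorisation on a near-linearly sized sparsity pattern exploits. It is not an implementation of the algorithm of Schäfer et al., and the dense factorisation used here is only for inspection.

```python
# Screening effect behind sparse Cholesky approximations of kernel matrices:
# order the points coarse-to-fine, factorise the dense kernel matrix, and
# count how many entries of the factor are negligible.
import numpy as np

rng = np.random.default_rng(0)
N = 400
x = rng.uniform(0.0, 1.0, N)

# Greedy coarse-to-fine (maximin) ordering: repeatedly pick the point
# farthest from those already selected.
order = [0]
dist = np.abs(x - x[0])
for _ in range(N - 1):
    i = int(np.argmax(dist))
    order.append(i)
    dist = np.minimum(dist, np.abs(x - x[i]))
x = x[order]

# Dense kernel matrix (exponential kernel, length scale 0.2) and its Cholesky factor.
Theta = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)
L = np.linalg.cholesky(Theta + 1e-10 * np.eye(N))

frac_small = np.mean(np.abs(L[np.tril_indices(N)]) < 1e-6)
print(f"fraction of lower-triangular entries below 1e-6: {frac_small:.2f}")
```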

Relevant Publications

  • F. Schäfer, T. J. Sullivan, and H. Owhadi. “Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity.” arXiv Preprint, 2017. arXiv:1706.02205

Bayesian Inverse Problems

Bayesian procedures are popular methods for the quantification of uncertainty, especially in inverse problems. In many modern applications, the underlying models are infinite-dimensional, and discretisation only enters as a computational consideration. The general form of an inverse problem is, given spaces \(\mathcal{U}\) and \(\mathcal{Y}\) and a known operator \(G \colon \mathcal{U} \to \mathcal{Y}\), to recover \(u \in \mathcal{U}\) from \(y \in \mathcal{Y}\), where \(y\) is a (perhaps noisily corrupted) version of \(G(u)\). A simple example is the additive noise model

\( y = G(u) + \eta , \)

where \(\mathcal{Y}\) is a vector space and \(\eta\) is a \(\mathcal{Y}\)-valued random variable. In modern applications, the forward model \(G\) may correspond to solving an ordinary or partial differential equation, and the parameter \(u\) may be a scalar, vector, or tensor field — in which case \(\mathcal{U}\) is an infinite-dimensional space. Inverse problems are typically ill-posed in the strict sense (i.e. no such \(u \in \mathcal{U}\) exists, or there are multiple solutions, or the solutions depend sensitively upon the problem setup). In the Bayesian paradigm, which is closely related to the tradition of regularisation, the inverse problem is rendered well-posed by seeking a conditional posterior probability distribution for \(u\) given \(y\), not just a point estimate. In modern terminology, following seminal contributions of Stuart (2010), the Bayesian posterior distribution \(\mu^{y}\) on \(\mathcal{U}\) of \(u\) given the data \(y\) is expressed in terms of its density with respect to the prior distribution \(\mu_{0}\):

\( \begin{align*} \dfrac{\mathrm{d} \mu^{y}}{\mathrm{d} \mu_{0}} (u) & = \dfrac{\exp(- \Phi(u; y))}{Z(y)} , \\ Z(y) & = \int_{\mathcal{U}} \exp(- \Phi(u; y)) \, \mu_{0} (\mathrm{d} u) , \end{align*} \)

where \(\Phi(\cdot; y) \colon \mathcal{U} \to \mathbb{R}\) denotes the negative log-likelihood function (misfit) for the data \(y\). The basic example, when the data space \(\mathcal{Y}\) is finite-dimensional and \(\eta \sim \mathcal{N}(0, \Gamma)\) is centred additive Gaussian noise of covariance \(\Gamma\), is the quadratic misfit functional

\( \Phi(u; y) = \dfrac{1}{2} \| \Gamma^{-1/2} ( y - G(u) ) \|^{2} . \)
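
For a toy scalar problem, the formulation above can be realised on a grid in a few lines; in the Python sketch below, the forward map \(G\), the noise level, and the Gaussian prior are all illustrative choices, not taken from any particular application.

```python
# Toy Bayesian inverse problem: scalar nonlinear forward map, additive
# Gaussian noise, Gaussian prior, posterior density proportional to
# exp(-Phi(u; y)) times the prior density, normalised numerically.
import numpy as np

G = lambda u: u ** 3 - u                        # hypothetical forward model
gamma = 0.1                                     # noise standard deviation
rng = np.random.default_rng(1)
y = G(0.8) + gamma * rng.standard_normal()      # noisy observation of G(u_true)

def Phi(u, y):
    """Quadratic misfit Phi(u; y) = |y - G(u)|^2 / (2 gamma^2)."""
    return 0.5 * ((y - G(u)) / gamma) ** 2

u = np.linspace(-2.0, 2.0, 2001)
prior = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # N(0, 1) prior density
unnorm = np.exp(-Phi(u, y)) * prior                  # unnormalised posterior
Z = np.sum(unnorm) * (u[1] - u[0])                   # quadrature for Z(y)
posterior = unnorm / Z
print("posterior mean:", np.sum(u * posterior) * (u[1] - u[0]))
```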

Stability and Consistency of Bayesian Inverse Problems

From both frequentist and numerical analytic points of view, it is natural to ask whether Bayesian methods are well-defined, stable, and consistent:

  • well-definedness: does \(\mu^{y}\), as defined above, really correspond to the notion of conditioning of probability measures, and are there any issues of uniqueness?
  • stability: does \(\mu^{y}\) change “only a little bit” if the data \(y\), the log-likelihood \(\Phi\), forward model \(G\), or prior \(\mu_{0}\) are slightly altered?
  • consistency: if exposed to enough high-quality data generated by a “true” data-generating parameter value \(u^{\dagger}\), will the posterior measure \(\mu^{y}\) concentrate on \(u^{\dagger}\)?

In this context, what kinds of priors and likelihoods make for good modelling choices, and also lead to well-posed inverse problems? This is a subtle topic with many positive and negative results (Owhadi et al., 2015; Owhadi et al., 2015). I am also interested in the use of heavy-tailed priors (Sullivan, 2017; Lie and Sullivan, 2018) and randomised models (Lie et al., 2018) in Bayesian inverse problems.

Relevant Publications

  • H. C. Lie, T. J. Sullivan, and A. L. Teckentrup. “Random forward models and log-likelihoods in Bayesian inverse problems.” SIAM/ASA J. Uncertain. Quantif. 6(4):1600–1629, 2018. doi:10.1137/18M1166523 arXiv:1712.05717
  • T. J. Sullivan. “Well-posed Bayesian inverse problems and heavy-tailed stable quasi-Banach space priors.” Inverse Probl. Imaging 11(5):857–874, 2017. doi:10.3934/ipi.2017040 arXiv:1605.05898
  • T. J. Sullivan. “Well-posedness of Bayesian inverse problems in quasi-Banach spaces with stable priors” in 88th Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM), Weimar 2017, ed. C. Könke and C. Trunk. Proceedings in Applied Mathematics and Mechanics, 17(1):871–874, 2017. doi:10.1002/pamm.201710402 arXiv:1710.05610
  • H. Owhadi, C. Scovel, and T. J. Sullivan. “Brittleness of Bayesian inference under finite information in a continuous world.” Electron. J. Stat. 9(1):1–79, 2015. doi:10.1214/15-EJS989 arXiv:1304.6772
  • H. Owhadi, C. Scovel, and T. J. Sullivan. “On the brittleness of Bayesian inference.” SIAM Rev. 57(4):566–582, 2015. doi:10.1137/130938633 arXiv:1308.6306

Maximum a Posteriori Estimators

A general probability measure \(\mu\), such as a Bayesian posterior, is typically too complicated to realise exactly, so some kind of low-order summary statistic must be used. Expected values and covariances are common choices for such summaries; another choice is a “most likely point under \(\mu\)”, a so-called mode or maximum a posteriori estimator. In finite-dimensional spaces \(\mathcal{U}\), a mode is simply a maximiser of the probability density function of \(\mu\). In general, though, there is no Lebesgue measure with which to take densities, so modes must be defined intrinsically.

As is often the case in functional analysis, there are multiple ways of defining modes on an infinite-dimensional space \(\mathcal{U}\), e.g. a separable Banach space. Dashti et al. (2013) proposed defining a strong mode of \(\mu\) to be any point \(u^{\ast} \in \mathcal{U}\) such that

\( \displaystyle \lim_{r \to 0} \frac{\sup_{v \in \mathcal{U}} \mu(B(v, r))}{\mu(B(u^{\ast}, r))} = 1 . \)

In later work, Helin and Burger (2015) proposed a weak mode with respect to a dense subspace \(E \subset \mathcal{U}\). It turns out that, under reasonable conditions, every weak mode is a strong mode and vice versa (Lie and Sullivan, 2018). On the other hand, there are pathological measures, even in finite-dimensional spaces such as \(\mathcal{U} = \mathbb{R}^{2}\), that have weak modes but no strong modes, and so in general these two notions are distinct.
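
The strong-mode criterion can be checked numerically in simple cases; the Python sketch below does so for the standard Gaussian measure on \(\mathbb{R}\), for which \(u^{\ast} = 0\) is the mode, purely as an illustrative sanity check (the candidate points and radii are arbitrary choices).

```python
# Numerical check of the small-ball ratio in the strong-mode definition,
# for mu = N(0, 1) on the real line.
import numpy as np
from scipy.stats import norm

def ball_prob(v, r):
    """mu(B(v, r)) for mu = N(0, 1)."""
    return norm.cdf(v + r) - norm.cdf(v - r)

vs = np.linspace(-5.0, 5.0, 20001)            # candidate ball centres v
for candidate in [0.0, 1.0]:                  # mode vs. non-mode test point
    for r in [0.5, 0.05, 0.005]:
        ratio = ball_prob(vs, r).max() / ball_prob(candidate, r)
        print(f"u = {candidate}, r = {r}: sup-ball ratio = {ratio:.4f}")
# The ratio tends to 1 for the true mode u* = 0, but to exp(1/2) ~ 1.65
# for the non-mode point u = 1, which therefore fails the criterion.
```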

Relevant Publications

  • H. C. Lie and T. J. Sullivan. “Equivalence of weak and strong modes of measures on topological vector spaces.” Inverse Probl. 34(11):115013, 2018. doi:10.1088/1361-6420/aadef2 arXiv:1708.02516
  • H. C. Lie and T. J. Sullivan. “Erratum: Equivalence of weak and strong modes of measures on topological vector spaces (2018 Inverse Problems 34 115013).” Inverse Probl. 34(12):129601, 2018. doi:10.1088/1361-6420/aae55b

Exploration of Bayesian Posterior Distributions

In principle, methods such as Markov chain Monte Carlo (MCMC) can be used to explore and sample an arbitrary probability distribution \(\mu\) on a space \(\mathcal{U}\), such as a Bayesian posterior distribution \(\mu^{y}\). However, these methods often converge poorly — if at all — in practice when \(\mathcal{U}\) has high dimension, or when \(\mu\) is concentrated on a low-dimensional subset of \(\mathcal{U}\). In such situations, it is desirable to approximate \(\mu\) by any reasonable means in order to obtain an informative approximate solution in finite time. Methods of interest include dimension reduction via active subspaces, as in the exact active subspace Metropolis–Hastings approach of Schuster et al. (2017).
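
For concreteness, the following Python sketch shows the kind of plain random-walk Metropolis–Hastings sampler whose poor mixing in high dimension or on concentrated targets motivates such structure-exploiting methods; the target density and step size are illustrative choices.

```python
# Minimal random-walk Metropolis-Hastings sampler for an unnormalised
# target density on R^d.  Vanilla samplers like this mix poorly when the
# target concentrates near a low-dimensional subset of the space.
import numpy as np

def rw_metropolis(log_target, u0, n_steps, step=0.5, seed=None):
    rng = np.random.default_rng(seed)
    u = np.asarray(u0, dtype=float)
    lp = log_target(u)
    samples = np.empty((n_steps, u.size))
    for k in range(n_steps):
        prop = u + step * rng.standard_normal(u.size)   # Gaussian proposal
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:        # accept/reject step
            u, lp = prop, lp_prop
        samples[k] = u
    return samples

# Example target: a narrow 2-d Gaussian ridge, concentrated near a 1-d subspace.
log_target = lambda u: -0.5 * (u[0] ** 2 + (u[1] - u[0]) ** 2 / 0.01)
samples = rw_metropolis(log_target, u0=np.zeros(2), n_steps=5000, seed=0)
print(samples.mean(axis=0), samples.std(axis=0))
```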

Relevant Publications

  • I. Schuster, P. G. Constantine, and T. J. Sullivan. “Exact active subspace Metropolis–Hastings, with applications to the Lorenz-96 system.” arXiv Preprint, 2017. arXiv:1712.02749

Optimal Distributionally-Robust Uncertainty Quantification

In its most general form, uncertainty quantification can be cast as a constrained optimisation problem over a large space of parameters, functions, and probability measures, with information acting as constraints on the feasible set (Owhadi et al., 2013; Sullivan et al., 2013). Typically, one is interested in the expected value (and other moments) of some quantity of interest \(q\), as the probability distribution \(\mu\) of the inputs \(X\) and any associated response function \(f\) range over a general admissible set \(\mathcal{A}\); that is, we need to calculate and minimise/maximise

\( \mathbb{E}_{X \sim \mu}[q(X, f(X))], \quad (\mu, f) \in \mathcal{A}. \)

The admissible set \(\mathcal{A}\) corresponds to the available information about the system of interest: for example, one might know that the mean of \(X\) must lie between \(a\) and \(b\), and so the constraint \(a \leq \mathbb{E}_{X \sim \mu}[X] \leq b\) would form part of the definition of \(\mathcal{A}\). Similarly, one could consider higher-order statistics related to \(\mu\), and information about the response function \(f\), e.g. numerical models and approximations, archives of legacy data, and so on (Sullivan et al., 2013; Kamga et al., 2014).

In principle, the optimisation problem over \((\mu, f) \in \mathcal{A}\) is infinite-dimensional, and therefore computationally intractable. However, extensions of classical Choquet theory (Owhadi et al., 2013) can be used to show that, under generalised independence and moment constraints on the probability measures \(\mu\), this optimisation problem reduces to a search over a finite-dimensional effective admissible set \(\mathcal{A}_{\Delta} \subset \mathcal{A}\), and this reduced problem can be solved numerically — in some cases, it is even a convex problem (Han et al., 2015). In the reduced setting \(\mathcal{A}_{\Delta}\), the measures \(\mu\) are discrete measures with finite support, and the values of \(f\) are only needed on the finitely many points where \(\mu\) assigns positive probability mass:

\( \mu \to \displaystyle \sum_{i = 0}^{N} w_{i} \delta_{x_{i}}, \quad f \to (x_{i}, f(x_{i}))_{i = 0}^{N} . \)
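
As a toy illustration of this reduction (not taken from the cited papers), the Python sketch below maximises \(\mathbb{P}(X \geq t)\) over all probability measures on \([0, 1]\) with a prescribed mean \(m\), by discretising the atom locations and solving the resulting linear program in the weights; the optimal value recovers the tight Markov bound \(m/t\), and the optimiser places mass on only two atoms, in line with the reduction theorem.

```python
# Finite-dimensional OUQ-style toy problem: maximise P(X >= t) over all
# probability measures on [0, 1] with E[X] = m, as a linear program in the
# weights of discrete candidate atoms.
import numpy as np
from scipy.optimize import linprog

m, t = 0.2, 0.5                       # mean constraint and threshold (illustrative)
xs = np.linspace(0.0, 1.0, 201)       # candidate atom locations x_i

c = -(xs >= t).astype(float)          # maximise the total weight on {x >= t}
A_eq = np.vstack([np.ones_like(xs),   # sum_i w_i = 1      (probability measure)
                  xs])                # sum_i w_i x_i = m  (mean constraint)
b_eq = np.array([1.0, m])
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0.0, None))

print("optimal P(X >= t):", -res.fun)        # equals the Markov bound m/t = 0.4
print("atoms used:", xs[res.x > 1e-9])       # optimal measure uses very few atoms
```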

For the forward uncertainty propagation problem, this approach allows for the optimal propagation of distributional uncertainty (e.g. histogram envelopes) through general forward models. In practice, this opens up new possibilities for probabilistic reliability forecasting and quality control. For inverse problems, this approach allows one to study the robustness of Bayesian inversion to perturbations of the prior and likelihood model (Owhadi et al., 2015; Owhadi et al., 2015). See also recent work by Burdzy and Pitman (2019) on the probability of radically different posterior opinions given the same prior probability model.

Relevant Publications

  • H. Owhadi, C. Scovel, and T. J. Sullivan. “Brittleness of Bayesian inference under finite information in a continuous world.” Electron. J. Stat. 9(1):1–79, 2015. doi:10.1214/15-EJS989 arXiv:1304.6772
  • H. Owhadi, C. Scovel, and T. J. Sullivan. “On the brittleness of Bayesian inference.” SIAM Rev. 57(4):566–582, 2015. doi:10.1137/130938633 arXiv:1308.6306
  • P.-H. T. Kamga, B. Li, M. McKerns, L. H. Nguyen, M. Ortiz, H. Owhadi, and T. J. Sullivan. “Optimal uncertainty quantification with model uncertainty and legacy data.” J. Mech. Phys. Solids 72:1–19, 2014. doi:10.1016/j.jmps.2014.07.007
  • T. J. Sullivan. “Optimal Uncertainty Quantification for Hypervelocity Impact” in Uncertainty Quantification in Computational Fluid Dynamics, 15–19 September 2014, von Karman Institute for Fluid Dynamics, Belgium, and 2–3 June 2014, Stanford University, United States. STO-AVT-VKI Lecture Series, AVT-235, 2014.
  • T. J. Sullivan, M. McKerns, D. Meyer, F. Theil, H. Owhadi, and M. Ortiz. “Optimal uncertainty quantification for legacy data observations of Lipschitz functions.” ESAIM Math. Model. Numer. Anal. 47(6):1657–1689, 2013. doi:10.1051/m2an/2013083 arXiv:1202.1928
  • H. Owhadi, C. Scovel, T. J. Sullivan, M. McKerns, and M. Ortiz. “Optimal Uncertainty Quantification.” SIAM Rev. 55(2):271–345, 2013. doi:10.1137/10080782X arXiv:1009.0679
  • T. J. Sullivan, M. McKerns, M. Ortiz, H. Owhadi, and C. Scovel. “Optimal uncertainty quantification: Distributional robustness versus Bayesian brittleness.” ASME J. Med. Dev. 7(4):040920, 2013. doi:10.1115/1.4025786
  • M. Ortiz, M. McKerns, H. Owhadi, T. J. Sullivan, and C. Scovel. “Optimal Uncertainty Quantification” in Advanced Computational Engineering, 12–18 February 2012, ed. O. Allix, C. Carstensen, J. Schröder & P. Wriggers. Oberwolfach Reports 9(1):537–540, 2012. doi:10.4171/OWR/2012/09
  • M. McKerns, H. Owhadi, C. Scovel, T. J. Sullivan, and M. Ortiz. “The optimal uncertainty algorithm in the mystic framework.” Caltech CACR Technical Report No. 523, August 2010. arXiv:1202.1055

Concentration of Measure

A powerful heuristic, first observed by Lévy (1951) and Milman (1971), is that a ‘nice’ function of many independent or weakly correlated random variables takes values near its mean, median, or a low-dimensional subset of its range with overwhelmingly high probability. This has implications for uncertainty quantification of complex systems: in situations where concentration-of-measure results can be proved, a large stochastic system becomes effectively deterministic. In the past, I have worked on applications of concentration-of-measure methods to the certification of terminal ballistics (Sullivan et al., 2011; Adams et al., 2012; Kidane et al., 2012).
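
By way of illustration, the Python sketch below evaluates a McDiarmid-type certification bound of the kind that appears in this line of work: given a performance margin and componentwise oscillations of the response, it bounds the probability of underperforming a threshold. All numerical values are hypothetical, and the real certification campaigns involve considerably more structure (estimated means, partitioned input domains, and so on).

```python
# McDiarmid (bounded-differences) certification bound: if Y = g(X_1, ..., X_n)
# with independent inputs and componentwise oscillations D_i, then for a
# threshold a <= E[Y],
#     P(Y <= a) <= exp(-2 (E[Y] - a)^2 / sum_i D_i^2).
import numpy as np

def mcdiarmid_failure_bound(mean_performance, threshold, oscillations):
    """Upper bound on P(Y <= threshold), assuming threshold <= E[Y]."""
    margin = mean_performance - threshold          # performance margin
    D2 = np.sum(np.square(oscillations))           # sum of squared oscillations D_i^2
    return float(np.exp(-2.0 * margin ** 2 / D2))

# Hypothetical example: mean performance 5.0, required threshold 3.0,
# three inputs with oscillations (1.0, 0.5, 0.8).
print(mcdiarmid_failure_bound(5.0, 3.0, [1.0, 0.5, 0.8]))
```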

Relevant Publications

  • A. A. Kidane, A. Lashgari, B. Li, M. McKerns, M. Ortiz, G. Ravichandran, M. Stalzer, and T. J. Sullivan. “Rigorous Model-Based Uncertainty Quantification with Application to Terminal Ballistics. Part I: Systems with Controllable Inputs and Small Scatter.” J. Mech. Phys. Solids 60(5):983–1001, 2012. doi:10.1016/j.jmps.2011.12.001
  • M. Adams, A. Lashgari, B. Li, M. McKerns, J. Mihaly, M. Ortiz, H. Owhadi, A. J. Rosakis, M. Stalzer, and T. J. Sullivan. “Rigorous Model-Based Uncertainty Quantification with Application to Terminal Ballistics. Part II: Systems with Uncontrollable Inputs and Large Scatter.” J. Mech. Phys. Solids 60(5):1002–1019, 2012. doi:10.1016/j.jmps.2011.12.002
  • T. J. Sullivan and H. Owhadi. “Distances and diameters in concentration inequalities: from geometry to optimal assignment of sampling resources.” Int. J. Uncertain. Quantif. 2(1):21–38, 2012. doi:10.1615/Int.J.UncertaintyQuantification.v2.i1.30
  • T. J. Sullivan, U. Topcu, M. McKerns, and H. Owhadi. “Uncertainty quantification via codimension-one partitioning.” Internat. J. Numer. Methods Engrg. 85(12):1499–1521, 2011. doi:10.1002/nme.3030
  • T. J. Sullivan, M. McKerns, U. Topcu, and H. Owhadi. “Uncertainty quantification via codimension-one domain partitioning and a new concentration inequality.” Proc. Soc. Behav. Sci. 2(6):7751–7752, 2010. doi:10.1016/j.sbspro.2010.05.211