DSAI One-Day Meeting: Sampling Techniques in High Dimensional Spaces
- Friday 10 December 2021 (13:00-17:00)
We would like to welcome you to attend an hybrid afternoon on 'Sampling Techniques in High Dimensional Spaces', which will take place via Zoom as well as on the University of Birmingham campus (Lecture Room 1, Arts Building). We strongly encourage active audience participation throughout the event.
13:00-13:05 A welcome from Professor Iain Styles
13:05-13:40 Paul Fearnhead (Lancaster University) Virtual
13:40-14:15 Ata Kaban (Computer Science, University of Birmingham) In person
14:45-15:20 Arnaud Doucet (University of Oxford) Virtual
15:20-15:55 Kostas Zygalakis (University of Edinburgh) Virtual
15:55-16:30 Jinglai Li (Mathematics, University of Birmingham) In person
Click the Zoom link to join:
The titles and abstracts for the talks are provided below.
Sampling from high-dimensional distributions using Piecewise Deterministic Markov Processes
This talk will show how we can implement MCMC by simulating a continuous-time process, called a piecewise deterministic Markov process. These processes are non-reversible, and thus have the potential to mix better in high-dimensions than standard MCMC algorithms. We will cover recent research on generic implementations of these samplers, and how they can be used to sample distributions on compact spaces and how they can be used to explore model spaces of differing dimensions.
Taming the dimensionality curse in machine learning: A compressive sampling approach
High dimensional learning problems are increasingly prevalent in practice, but typically ill-posed without some fairly specific prior knowledge such as sparsity, margin, intrinsic low dimension, and others. We are interested in the question of how to discover and exploit such benign structural traits of learning problem instances without knowing what they are in advance. In this talk we look at this question through the lens of compressive sampling -- a computationally inexpensive dimensionality reduction method, whereby data is observed in a randomly projected form. Compressive sampling is the cornerstone of compressive sensing, and random projection has also been well-studied in theoretical computer science and low-distortion embeddings. However, the goal in learning is very different, as we do not need to recover, or even to approximate the seen data, instead we need to produce accurate predictions for unseen data. Our strategy is to develop a general notion of compressive distortion, or compressibility, which yields conditions of geometric nature that guarantee effective learning from randomly projected data by an ERM algorithm. We instantiate and interpret the compressive distortion in various model settings, demonstrating its ability to capture benign traits of problem instances. Moreover, in generalised linear models, we find distributional conditions under which a compressive ERM ensemble algorithm exhibits near-optimality in a precise statistical (i.e. minimax) sense. Interestingly, the conditions we find have rather little in common with compressive sensing conditions, and they also differ from the classic learning theoretic conditions. The talk will draw on joint works with Henry Reeve, and with Bob Durrant.
Generative Modeling through High-Dimensional Sampling
Abstract: Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time inhomogeneous Langevin-type SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrödinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. In particular, we will explain why these diffusions can "mix" even in extremely high-dimensional spaces.
Connection Between Optimization and Sampling Algorithms
Optimization and Sampling problems lie in the heart of Bayesian inverse problems. The ability to solve such inverse problems depends crucially on the efficient calculation of quantities relating to the posterior distribution, giving thus rise to computationally challenging high dimensional optimization and sampling problems.
In this talk, we will connect the corresponding optimization and sampling problems to the large time behaviour of solutions to (stochastic) differential equations. Establishing such a connection allows to utilise existing knowledge from the field of numerical analysis of differential equations. In particular, two very important concepts are numerical stability and numerical contractivity. In the case of linear differential equations these two concepts coincide, but with the exception of some very simple Runge-Kutta methods such the Euler method in the non-linear case numerical stability doesn’t imply numerical contractivity .
However, the recently introduced framework of integral quadratic constraints and Lyapunov functions [2, 3] allows for bridging this gap between linearity and non-linearity in the case of (strongly) convex functions. We will use this framework, to study a large class of strogly convex optimization methods and give an alternative explanation for the good properties of Nesterov method, as well as highlight the reasons behind the failure of the heavy ball method .
In addition, using similar ideas , we will present a general framework for the non-asymptotic study of the 2-Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a unified way a number of different integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. If times allows us we will also talk about the application of these ideas to imaging inverse problems [5, 6].
 J. M. Sanz Serna and K. C. Zygalakis, Contractivity of Runge–Kutta Methods for Convex Gradient Systems, SIAM Journal on Numerical Analysis 58(4):2079-2092, 2020.
 L. Lessard, B. Recht, and A. Packard, Analysis and design of optimization algorithms via integral quadratic constraints, SIAM Journal on Optimization, 26(1):57–95, 2016.
 M. Fazlyab, A. Ribeiro, M. Morari, and V. M. Preciado, Analysis of optimization algorithms via integral quadratic constraints: nonstrongly convex problems, SIAM Journal on Optimization, 28(3):2654–2689, 2018.
 J. M. Sanz Serna and K. C. Zygalakis, Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations, Journal of Machine Learning Research, 22, 1--37, 2021
 Eftekhari, B. Vandereycken, G. Vilmart, and K C. Zygalakis, Explicit Stabilised Gradient Descent for Faster Strongly Convex Optimisation, BIT Numerical Mathematics 61:119–139, 2021.
 M. Pereyra, L. Vargas Mieles, and K. C. Zygalakis, Accelerating Proximal Markov Chain Monte Carlo by Using an Explicit Stabilized Method, SIAM Journal on Imaging Sciences 13(2):905-935, 2020.
Maximum Conditional Entropy Hamiltonian Monte Carlo Sampler
The performance of Hamiltonian Monte Carlo (HMC) sampler depends critically on some algorithm parameters such as the total integration time and the numerical integration stepsize. The parameter tuning is particularly challenging when the mass matrix of the HMC sampler is adapted. We propose in this work a Kolmogorov-Sinai entropy (KSE) based design criterion to optimize these algorithm parameters, which can avoid some potential issues in the often used jumping-distance based measures. For near-Gaussian distributions, we are able to derive the optimal algorithm parameters with respect to the KSE criterion analytically. As a byproduct the KSE criterion also provides a theoretical justification for the need to adapt the mass matrix in HMC sampler. Based on the results, we propose an adaptive HMC algorithm, and we then demonstrate the performance of the proposed algorithm with numerical examples.