Gaussian process regression

And classification. And extensions.

December 3, 2019 — July 28, 2023

functional analysis

Gaussian

generative

Hilbert space

kernel tricks

nonparametric

regression

spatial

stochastic processes

time series

Figure 1: Chi Feng’s GP regression demo.

Gaussian random processes/fields are stochastic processes/fields whose observations are jointly Gaussian. While “Gaussian process regression” is not wrong per se, there is a common convention in stochastic process theory (and also in pedagogy) to use process for something notionally time-indexed and field for something with a space-like index and no special arrow of time. This leads to much confusion, because Gaussian field regression is what we usually want to talk about (although the arrow of time can pop up usefully). Hereafter I use “field” and “process” interchangeably, as everyone does in this corner of the discipline.

In machine learning, Gaussian fields are often used for regression or classification, since it is fairly easy to condition a Gaussian field on data and produce a posterior distribution over functions. Because the resulting posterior over regression functions can be very flexible, we can think of this as a kind of nonparametric Bayesian inference, although as always with that term we should be careful; in fact GP regression typically has parameters.

I would further add that GPs are the crystal meth of machine learning methods, in terms of their addictiveness and the passion of the people who use them.

The central trick is using a clever union of Hilbert space tricks and probability to give a probabilistic interpretation of functional regression as a kind of nonparametric Bayesian inference.

A useful side excursion into representer theorems and Karhunen-Loève expansions gives us a helpful interpretation. Regression using Gaussian processes is common in e.g. spatial statistics, where it arises as kriging. Cressie (1990) traces a history of this idea via Matheron (1963a) to the work of Krige (1951).

1 Lavish intros

I am not the right guy to provide the canonical introduction, because it already exists. Specifically, Rasmussen and Williams (2006). Moreover, because GP regression is so popular and so elegant, there are many excellent interactive introductions online.

This lecture by the late David Mackay is probably good; the man could talk.

There is also a well-illustrated and elementary introduction by Yuge Shi. There are many, many more.

Gaussianprocess.org is a classic.

A Visual Exploration of Gaussian Processes recommends the following:

Interactive visualization of Gaussian processes by ST John that joins together the different concepts introduced throughout this article.

Gaussian process regression demo by Tomi Peltola

Gaussian Processes for Dummies by Katherine Bailey

Intuition behind Gaussian Processes by Mike McCourt

Fitting Gaussian Process Models in Python by Chris Fonnesbeck

A Practical Guide to Gaussian Processes by Marc Peter Deisenroth, Yicheng Luo, and Mark van der Wilk: heuristics for initializing and optimizing Gaussian processes

If you want more of a hands-on experience, there are also many Python notebooks available:

Fitting Gaussian Process Models in Python by Chris Fonnesbeck

Gaussian process lecture by Andreas Damianou

Already read all those? Try the brutally quick intro.

2 Brutally quick intro

J. T. Wilson et al. (2021) have a dense and useful perspective. If you are used to this field, they might reboot your perspective. If you are new to the GP area, see the more instructive intros.

A Gaussian process (GP) is a random function \(f: \mathcal{X} \rightarrow \mathbb{R}\), such that, for any finite collection of points \(\mathbf{X} \subset \mathcal{X}\), the random vector \(\boldsymbol{f}=f(\mathbf{X})\) follows a Gaussian distribution. Such a process is uniquely identified by a mean function \(\mu: \mathcal{X} \rightarrow \mathbb{R}\) and a positive semi-definite kernel \(k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}\). Hence, if \(f \sim \mathcal{G} \mathcal{P}(\mu, k)\), then \(\boldsymbol{f} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K})\) is multivariate normal with mean \(\boldsymbol{\mu}=\mu(\mathbf{X})\) and covariance \(\mathbf{K}=k(\mathbf{X}, \mathbf{X})\).
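The definition is easier to believe once you have drawn from it. Here is a minimal sketch, assuming 1-D inputs, a zero mean function and a squared-exponential kernel (all my illustrative choices, not something the quoted text prescribes). The names `sq_exp_kernel` and `sample_gp_prior` are hypothetical helpers. It draws \(\boldsymbol{f} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K})\) at a finite set of points via a Cholesky factor of \(\mathbf{K}\).

```python
import numpy as np


def sq_exp_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs (an illustrative choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def sample_gp_prior(x, mean_fn, kernel_fn, n_samples, rng, jitter=1e-8):
    """Draw samples of f(X) ~ N(mu(X), k(X, X)) at the finite point set x."""
    mu = mean_fn(x)
    K = kernel_fn(x, x)
    # A small diagonal jitter keeps the Cholesky factorisation numerically
    # stable when K is close to singular; it is not part of the model.
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))
    z = rng.standard_normal((len(x), n_samples))
    # If z ~ N(0, I) then mu + L z ~ N(mu, L L^T), and L L^T = K + jitter * I.
    return mu[:, None] + L @ z


rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
# np.zeros_like plays the role of a zero mean function here.
samples = sample_gp_prior(x, np.zeros_like, sq_exp_kernel, 2000, rng)
```

Each column of `samples` is one draw of the random vector \(\boldsymbol{f}\); the empirical covariance across columns should be close to \(\mathbf{K}\).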

[…] we investigate different ways of reasoning about the random variable \(\boldsymbol{f}_* \mid \boldsymbol{f}_n=\boldsymbol{y}\) for some non-trivial partition \(\boldsymbol{f}=\boldsymbol{f}_n \oplus \boldsymbol{f}_*\). Here, \(\boldsymbol{f}_n=f\left(\mathbf{X}_n\right)\) are process values at a set of training locations \(\mathbf{X}_n \subset \mathbf{X}\) where we would like to introduce a condition \(\boldsymbol{f}_n=\boldsymbol{y}\), while \(\boldsymbol{f}_*=f\left(\mathbf{X}_*\right)\) are process values at a set of test locations \(\mathbf{X}_* \subset \mathbf{X}\) where we would like to obtain a random variable \(\boldsymbol{f}_* \mid \boldsymbol{f}_n=\boldsymbol{y}\).

[…] we may obtain \(\boldsymbol{f}_* \mid \boldsymbol{y}\) by first finding its conditional distribution. Since process values \(\left(\boldsymbol{f}_n, \boldsymbol{f}_*\right)\) are defined as jointly Gaussian, this procedure closely resembles that of [the finite-dimensional case]: we factor out the marginal distribution of \(\boldsymbol{f}_n\) from the joint distribution \(p\left(\boldsymbol{f}_n, \boldsymbol{f}_*\right)\) and, upon canceling, identify the remaining distribution as \(p\left(\boldsymbol{f}_* \mid \boldsymbol{y}\right)\). Having done so, we find that the conditional distribution is the Gaussian \(\mathcal{N}\left(\boldsymbol{\mu}_{* \mid \boldsymbol{y}}, \mathbf{K}_{*, * \mid \boldsymbol{y}}\right)\) with moments \[\begin{aligned} \boldsymbol{\mu}_{* \mid \boldsymbol{y}}&=\boldsymbol{\mu}_*+\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_n\right) \\ \mathbf{K}_{*, * \mid \boldsymbol{y}}&=\mathbf{K}_{*, *}-\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1} \mathbf{K}_{n, *}\end{aligned} \]
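The two conditional moments above translate almost line for line into code. This is a sketch under assumptions, not a reference implementation: inputs are 1-D, the kernel is a squared-exponential that I picked for illustration, observations are noise-free as in the quoted setup, and `gp_condition` is a hypothetical helper name. The only departure from the formula is a tiny diagonal jitter for numerical stability.

```python
import numpy as np


def sq_exp_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs (an illustrative choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_condition(x_train, y, x_test, kernel_fn, mean_fn, jitter=1e-8):
    """Moments of f_* | f_n = y for f ~ GP(mean_fn, kernel_fn)."""
    mu_n = mean_fn(x_train)
    mu_s = mean_fn(x_test)
    # Jitter is a numerical convenience, not part of the model.
    K_nn = kernel_fn(x_train, x_train) + jitter * np.eye(len(x_train))
    K_sn = kernel_fn(x_test, x_train)
    K_ss = kernel_fn(x_test, x_test)
    # Solve against a Cholesky factor rather than forming K_nn^{-1} explicitly.
    L = np.linalg.cholesky(K_nn)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mu_n))  # K_nn^{-1}(y - mu_n)
    V = np.linalg.solve(L, K_sn.T)  # L^{-1} K_{n,*}
    post_mean = mu_s + K_sn @ alpha  # mu_* + K_{*,n} K_nn^{-1} (y - mu_n)
    post_cov = K_ss - V.T @ V  # K_{*,*} - K_{*,n} K_nn^{-1} K_{n,*}
    return post_mean, post_cov


x_train = np.array([-2.0, -0.5, 1.0, 2.5])
y = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 25)
post_mean, post_cov = gp_condition(x_train, y, x_test, sq_exp_kernel, np.zeros_like)
```

A quick sanity check is to pass the training locations as test locations: the posterior mean should reproduce `y` and the posterior variances should collapse to (numerically) zero, because we conditioned on exact function values.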

3 Kernels

a.k.a. covariance models.

GP regression models are kernel machines. As such, covariance kernels are the parameters, more or less. One can also parameterise with a mean function (see the next section), but let us ignore that detail for now because usually we do not use one.

4 Prior with a mean function
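To make "kernels are the parameters" concrete, here is a small sketch of three standard stationary covariance functions on 1-D inputs, each with a lengthscale and a variance hyperparameter. The function names are hypothetical, and the selection of kernels is mine. The snippet also checks numerically that the resulting Gram matrices are positive semi-definite up to round-off, and that sums and elementwise products of kernels still give valid Gram matrices.

```python
import numpy as np


def pairwise_dist(xa, xb):
    """Absolute distances |x - x'| between two sets of 1-D inputs."""
    return np.abs(xa[:, None] - xb[None, :])


def sq_exp(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel."""
    r = pairwise_dist(xa, xb) / lengthscale
    return variance * np.exp(-0.5 * r**2)


def matern32(xa, xb, lengthscale=1.0, variance=1.0):
    """Matern kernel with smoothness 3/2."""
    r = np.sqrt(3.0) * pairwise_dist(xa, xb) / lengthscale
    return variance * (1.0 + r) * np.exp(-r)


def exponential(xa, xb, lengthscale=1.0, variance=1.0):
    """Exponential kernel, i.e. Matern with smoothness 1/2."""
    r = pairwise_dist(xa, xb) / lengthscale
    return variance * np.exp(-r)


x = np.linspace(0.0, 4.0, 30)
kernels = {"sq_exp": sq_exp, "matern32": matern32, "exponential": exponential}
# Smallest eigenvalue of each Gram matrix; PSD means these are >= 0 up to round-off.
min_eigs = {name: np.linalg.eigvalsh(k(x, x)).min() for name, k in kernels.items()}

# Sums and elementwise products of kernels are again kernels.
K_sum = sq_exp(x, x) + matern32(x, x, lengthscale=0.5)
K_prod = sq_exp(x, x) * exponential(x, x, lengthscale=2.0)
```

Changing `lengthscale` and `variance` changes the prior over functions, which is the sense in which the kernel and its hyperparameters act as the model's parameters.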

Almost immediate but not quite trivial (Rasmussen and Williams 2006, 2.7).
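The "almost immediate" part can be sketched as follows, assuming the mean function \(\mu\) is fixed and known (my assumption for this sketch; a mean function with unknown coefficients is where it stops being immediate). Define the residual process \(g=f-\mu\), so that \(g \sim \mathcal{G}\mathcal{P}(0, k)\). Conditioning \(g\) on the residual data \(\boldsymbol{y}-\boldsymbol{\mu}_n\) and adding the mean back at the test locations recovers the conditional moments from the brutally quick intro:

\[\begin{aligned} \boldsymbol{\mu}_{* \mid \boldsymbol{y}}&=\boldsymbol{\mu}_*+\underbrace{\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_n\right)}_{\text{zero-mean GP fit to residuals}} \\ \mathbf{K}_{*, * \mid \boldsymbol{y}}&=\mathbf{K}_{*, *}-\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1} \mathbf{K}_{n, *}\end{aligned} \]

So a fixed mean function shifts the posterior mean but leaves the posterior covariance untouched.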

TODO: discuss identifiability.

5 Using state filtering

When one dimension of the input vector can be interpreted as a time dimension, we are in the territory of Kalman filtering Gaussian processes, which has benefits in terms of speed and hipness.
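A minimal sketch of why this helps, assuming the simplest case I can think of: a scalar time input, the exponential (Matérn-1/2, a.k.a. Ornstein-Uhlenbeck) kernel \(k(t, t')=\sigma^2 \exp(-|t-t'| / \ell)\), and i.i.d. Gaussian observation noise. None of these choices come from the text above, and `ou_kalman_filter` is a hypothetical name. For this kernel the GP is a first-order Markov process, so a scalar Kalman filter processes \(n\) observations in \(O(n)\) time, and its final filtered moments agree with the dense \(O(n^3)\) GP posterior at the last time point.

```python
import numpy as np


def ou_kalman_filter(t, y, lengthscale, variance, noise_var):
    """Kalman filter for a GP with exponential kernel observed in Gaussian noise.

    t must be sorted in increasing order. Returns filtered means and variances
    of the latent process at each time, given the data up to that time.
    """
    m, P = 0.0, variance  # stationary prior on the latent state
    means, variances = [], []
    t_prev = t[0]
    for tk, yk in zip(t, y):
        # Predict: exact discretisation of the OU process over the gap tk - t_prev.
        a = np.exp(-(tk - t_prev) / lengthscale)
        m, P = a * m, a * a * P + variance * (1.0 - a * a)
        # Update with the noisy observation yk = f(tk) + noise.
        gain = P / (P + noise_var)
        m, P = m + gain * (yk - m), (1.0 - gain) * P
        means.append(m)
        variances.append(P)
        t_prev = tk
    return np.array(means), np.array(variances)


rng = np.random.default_rng(1)
lengthscale, variance, noise_var = 1.5, 2.0, 0.1
t = np.sort(rng.uniform(0.0, 10.0, 40))
y = np.sin(t) + np.sqrt(noise_var) * rng.standard_normal(t.shape)

filt_mean, filt_var = ou_kalman_filter(t, y, lengthscale, variance, noise_var)

# Dense GP regression for comparison: posterior of f(t_n) given all noisy data.
K = variance * np.exp(-np.abs(t[:, None] - t[None, :]) / lengthscale)
K_y = K + noise_var * np.eye(len(t))
dense_mean = K[-1] @ np.linalg.solve(K_y, y)
dense_var = K[-1, -1] - K[-1] @ np.linalg.solve(K_y, K[-1])
```

Only the last filtered state has seen all the data; matching the dense posterior at earlier times would additionally need a backward smoothing pass, which I have left out of this sketch.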

6 On lattice observations

Gaussian processes on lattices.

7 On manifolds

I would like to read Terenin on GPs on manifolds; he also makes a suggestive connection to SDEs, which is the filtering GPs trick again.

8 By variational inference

🏗

9 With inducing variables

“Sparse GP”. See Quiñonero-Candela and Rasmussen (2005). 🏗
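Until this section is built, here is a rough sketch of one approximation from that family, the deterministic training conditional (DTC) predictor in the notation of Quiñonero-Candela and Rasmussen (2005), as I understand it. Assumptions are loud here: zero prior mean, 1-D inputs, Gaussian observation noise with variance `noise_var`, a squared-exponential kernel, and inducing inputs `z` placed on a fixed grid rather than optimised. `dtc_predict` is a hypothetical helper name. The point is that only \(m \times m\) systems are solved, where \(m\) is the number of inducing inputs.

```python
import numpy as np


def sq_exp_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs (an illustrative choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def dtc_predict(x_train, y, x_test, z, kernel_fn, noise_var, jitter=1e-8):
    """DTC predictive mean and covariance of f_* using inducing inputs z."""
    K_uu = kernel_fn(z, z) + jitter * np.eye(len(z))  # jitter for stability only
    K_uf = kernel_fn(z, x_train)
    K_su = kernel_fn(x_test, z)
    K_ss = kernel_fn(x_test, x_test)
    # Sigma^{-1} = K_uu + K_uf K_fu / noise_var, an m x m matrix.
    Sigma_inv = K_uu + K_uf @ K_uf.T / noise_var
    mean = K_su @ np.linalg.solve(Sigma_inv, K_uf @ y) / noise_var
    # Q_ss = K_su K_uu^{-1} K_us is the low-rank approximation of K_ss.
    Q_ss = K_su @ np.linalg.solve(K_uu, K_su.T)
    cov = K_ss - Q_ss + K_su @ np.linalg.solve(Sigma_inv, K_su.T)
    return mean, cov


rng = np.random.default_rng(2)
noise_var = 0.05
x_train = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x_train) + np.sqrt(noise_var) * rng.standard_normal(x_train.shape)
x_test = np.linspace(0.0, 10.0, 50)
z = np.linspace(0.0, 10.0, 15)  # 15 inducing inputs standing in for 200 data points

mean_sparse, cov_sparse = dtc_predict(x_train, y, x_test, z, sq_exp_kernel, noise_var)
```

One reassuring property of this sketch: if the inducing inputs are set equal to the training inputs, the predictions coincide with exact GP regression with the same noise variance.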

10 By variational inference with inducing variables

See GP factoring.

11 With vector output

See vector Gaussian process regression.

12 Deep

Layering Gaussian processes.

13 Neural processes

See neural processes.

14 Non-Gaussian prior that looks a bit similar

See Stochastic process regression.

15 Observation likelihoods

Gaussian process models need not have a Gaussian observation likelihood; classification is the obvious example. TBD

16 Density estimation

Can I infer a density using GPs? Yes. One popular method is apparently the logistic Gaussian process (Tokdar 2007; Lenk 2003).
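As I understand the logistic Gaussian process, the idea is to push a GP sample path \(f\) through a normalised exponential, \(p(x)=\exp (f(x)) / \int \exp (f(s)) \,\mathrm{d}s\), so that every sample path becomes a valid density. The sketch below only illustrates that prior construction on a grid; it does no posterior inference, which is what the cited papers are actually about. The kernel, the grid, and the helper names are my assumptions.

```python
import numpy as np


def sq_exp_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs (an illustrative choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def trapezoid(values, dx):
    """Trapezoid-rule integral of values sampled on a uniform grid with spacing dx."""
    return dx * (values.sum() - 0.5 * (values[0] + values[-1]))


rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 200)
dx = grid[1] - grid[0]

# One draw of f on the grid from a zero-mean GP prior.
K = sq_exp_kernel(grid, grid, lengthscale=0.2) + 1e-8 * np.eye(len(grid))
f = np.linalg.cholesky(K) @ rng.standard_normal(len(grid))

# Logistic transform: exponentiate, then normalise so the result integrates to one.
unnormalised = np.exp(f - f.max())  # subtracting the max avoids overflow
density = unnormalised / trapezoid(unnormalised, dx)
integral = trapezoid(density, dx)
```

Each new draw of `f` gives a different random density on \([0, 1]\); inference would then amount to conditioning this prior over densities on observed samples.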

17 Approximation with dropout

Unconvincing in practice. See NN ensembles for some vague notes.

18 Inhomogeneous with covariates

Integrated nested Laplace approximation connects to the GP-as-SDE idea, I think?

19 For dimension reduction

e.g. GP-LVM (N. Lawrence 2005). 🏗

20 Pathwise/Matheron updates

See pathwise GP.

21 Implementations

See GP regression implementations

22 References

Abrahamsen. 1997. “A Review of Gaussian Random Fields and Correlation Functions.”

Abt, and Welch. 1998. “Fisher Information and Maximum-Likelihood Estimation of Covariance Parameters in Gaussian Stochastic Processes.” Canadian Journal of Statistics.

Altun, Smola, and Hofmann. 2004. “Exponential Families for Conditional Random Fields.” In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. UAI ’04.

Alvarado, and Stowell. 2018. “Efficient Learning of Harmonic Priors for Pitch Detection in Polyphonic Music.” arXiv:1705.07104 [Cs, Stat].

Ambikasaran, Foreman-Mackey, Greengard, et al. 2015. “Fast Direct Methods for Gaussian Processes.” arXiv:1403.6015 [Astro-Ph, Stat].

Bachoc, F., Gamboa, Loubes, et al. 2018. “A Gaussian Process Regression Model for Distribution Inputs.” IEEE Transactions on Information Theory.

Bachoc, Francois, Suvorikova, Ginsbourger, et al. 2019. “Gaussian Processes with Multidimensional Distribution Inputs via Optimal Transport and Hilbertian Embedding.” arXiv:1805.00753 [Stat].

Birgé, and Massart. 2006. “Minimal Penalties for Gaussian Model Selection.” Probability Theory and Related Fields.

Bonilla, Chai, and Williams. 2007. “Multi-Task Gaussian Process Prediction.” In Proceedings of the 20th International Conference on Neural Information Processing Systems. NIPS’07.

Bonilla, Krauth, and Dezfouli. 2019. “Generic Inference in Latent Gaussian Process Models.” Journal of Machine Learning Research.

Borovitskiy, Terenin, Mostowsky, et al. 2020. “Matérn Gaussian Processes on Riemannian Manifolds.” arXiv:2006.10160 [Cs, Stat].

Burt, Rasmussen, and Wilk. 2020. “Convergence of Sparse Variational Inference in Gaussian Processes Regression.” Journal of Machine Learning Research.

Calandra, Peters, Rasmussen, et al. 2016. “Manifold Gaussian Processes for Regression.” In 2016 International Joint Conference on Neural Networks (IJCNN).

Cressie. 1990. “The Origins of Kriging.” Mathematical Geology.

———. 2015. Statistics for Spatial Data.

Cressie, and Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics 2.0.

Csató, and Opper. 2002. “Sparse On-Line Gaussian Processes.” Neural Computation.

Csató, Opper, and Winther. 2001. “TAP Gibbs Free Energy, Belief Propagation and Sparsity.” In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. NIPS’01.

Cunningham, Shenoy, and Sahani. 2008. “Fast Gaussian Process Methods for Point Process Intensity Estimation.” In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.

Cutajar, Bonilla, Michiardi, et al. 2017. “Random Feature Expansions for Deep Gaussian Processes.” In PMLR.

Dahl, and Bonilla. 2017. “Scalable Gaussian Process Models for Solar Power Forecasting.” In Data Analytics for Renewable Energy Integration: Informing the Generation and Distribution of Renewable Energy. Lecture Notes in Computer Science.

Dahl, and Bonilla. 2019. “Sparse Grouped Gaussian Processes for Solar Power Forecasting.” arXiv:1903.03986 [Cs, Stat].

Damianou, and Lawrence. 2013. “Deep Gaussian Processes.” In Artificial Intelligence and Statistics.

Damianou, Titsias, and Lawrence. 2011. “Variational Gaussian Process Dynamical Systems.” In Advances in Neural Information Processing Systems 24.

Dezfouli, and Bonilla. 2015. “Scalable Inference for Gaussian Process Models with Black-Box Likelihoods.” In Advances in Neural Information Processing Systems 28. NIPS’15.

Domingos. 2020. “Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.” arXiv:2012.00152 [Cs, Stat].

Dubrule. 2018. “Kriging, Splines, Conditional Simulation, Bayesian Inversion and Ensemble Kalman Filtering.” In Handbook of Mathematical Geosciences: Fifty Years of IAMG.

Dunlop, Girolami, Stuart, et al. 2018. “How Deep Are Deep Gaussian Processes?” Journal of Machine Learning Research.

Dutordoir, Hensman, van der Wilk, et al. 2021. “Deep Neural Networks as Point Estimates for Deep Gaussian Processes.” arXiv:2105.04504 [Cs, Stat].

Dutordoir, Saul, Ghahramani, et al. 2022. “Neural Diffusion Processes.”

Duvenaud. 2014. “Automatic Model Construction with Gaussian Processes.”

Duvenaud, Lloyd, Grosse, et al. 2013. “Structure Discovery in Nonparametric Regression Through Compositional Kernel Search.” In Proceedings of the 30th International Conference on Machine Learning (ICML-13).

Ebden. 2015. “Gaussian Processes: A Quick Introduction.” arXiv:1505.02965 [Math, Stat].

Eleftheriadis, Nicholson, Deisenroth, et al. 2017. “Identification of Gaussian Process State Space Models.” In Advances in Neural Information Processing Systems 30.

Emery. 2007. “Conditioning Simulations of Gaussian Random Fields by Ordinary Kriging.” Mathematical Geology.

Evgeniou, Micchelli, and Pontil. 2005. “Learning Multiple Tasks with Kernel Methods.” Journal of Machine Learning Research.

Ferguson. 1973. “A Bayesian Analysis of Some Nonparametric Problems.” The Annals of Statistics.

Finzi, Bondesan, and Welling. 2020. “Probabilistic Numeric Convolutional Neural Networks.” arXiv:2010.10876 [Cs].

Föll, Haasdonk, Hanselmann, et al. 2017. “Deep Recurrent Gaussian Process with Variational Sparse Spectrum Approximation.” arXiv:1711.00799 [Stat].

Frigola, Chen, and Rasmussen. 2014. “Variational Gaussian Process State-Space Models.” In Advances in Neural Information Processing Systems 27.

Frigola, Lindsten, Schön, et al. 2013. “Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC.” In Advances in Neural Information Processing Systems 26.

Gal, and Ghahramani. 2015. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).

Galliani, Dezfouli, Bonilla, et al. 2017. “Gray-Box Inference for Structured Gaussian Process Models.” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics.

Gal, and van der Wilk. 2014. “Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models - a Gentle Tutorial.” arXiv:1402.1412 [Stat].

Gardner, Pleiss, Bindel, et al. 2018. “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.

Gardner, Pleiss, Wu, et al. 2018. “Product Kernel Interpolation for Scalable Gaussian Processes.” arXiv:1802.08903 [Cs, Stat].

Garnelo, Rosenbaum, Maddison, et al. 2018. “Conditional Neural Processes.” arXiv:1807.01613 [Cs, Stat].

Garnelo, Schwarz, Rosenbaum, et al. 2018. “Neural Processes.”

Ghahramani. 2013. “Bayesian Non-Parametrics and the Probabilistic Approach to Modelling.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

Gilboa, Saatçi, and Cunningham. 2015. “Scaling Multidimensional Inference for Structured Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence.

Girolami, and Rogers. 2005. “Hierarchic Bayesian Models for Kernel Learning.” In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05.

Gramacy. 2016. “laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R.” Journal of Statistical Software.

Gramacy, and Apley. 2015. “Local Gaussian Process Approximation for Large Computer Experiments.” Journal of Computational and Graphical Statistics.

Gratiet, Marelli, and Sudret. 2016. “Metamodel-Based Sensitivity Analysis: Polynomial Chaos Expansions and Gaussian Processes.” In Handbook of Uncertainty Quantification.

Grosse, Salakhutdinov, Freeman, et al. 2012. “Exploiting Compositionality to Explore a Large Space of Model Structures.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

Hartikainen, and Särkkä. 2010. “Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In 2010 IEEE International Workshop on Machine Learning for Signal Processing.

Hensman, Fusi, and Lawrence. 2013. “Gaussian Processes for Big Data.” In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence. UAI’13.

Huber. 2014. “Recursive Gaussian Process: On-Line Regression and Learning.” Pattern Recognition Letters.

Huggins, Campbell, Kasprzak, et al. 2018. “Scalable Gaussian Process Inference with Finite-Data Mean and Variance Guarantees.” arXiv:1806.10234 [Cs, Stat].

Jankowiak, Pleiss, and Gardner. 2020. “Deep Sigma Point Processes.” In Conference on Uncertainty in Artificial Intelligence.

Jordan. 1999. Learning in Graphical Models.

Karvonen, and Särkkä. 2016. “Approximate State-Space Gaussian Processes via Spectral Transformation.” In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).

Kasim, Watson-Parris, Deaconu, et al. 2020. “Up to Two Billion Times Acceleration of Scientific Simulations with Deep Neural Architecture Search.” arXiv:2001.08055 [Physics, Stat].

Kingma, and Welling. 2014. “Auto-Encoding Variational Bayes.” In ICLR 2014 Conference.

Kocijan, Girard, Banko, et al. 2005. “Dynamic Systems Identification with Gaussian Processes.” Mathematical and Computer Modelling of Dynamical Systems.

Ko, and Fox. 2009. “GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models.” In Autonomous Robots.

Krauth, Bonilla, Cutajar, et al. 2016. “AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In UAI17.

Krige. 1951. “A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand.” Journal of the Southern African Institute of Mining and Metallurgy.

Kroese, and Botev. 2013. “Spatial Process Generation.” arXiv:1308.0399 [Stat].

Lawrence, Neil. 2005. “Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models.” Journal of Machine Learning Research.

Lawrence, Neil, Seeger, and Herbrich. 2003. “Fast Sparse Gaussian Process Methods: The Informative Vector Machine.” In Proceedings of the 16th Annual Conference on Neural Information Processing Systems.

Lawrence, Neil D., and Urtasun. 2009. “Non-Linear Matrix Factorization with Gaussian Processes.” In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09.

Lázaro-Gredilla, Quiñonero-Candela, Rasmussen, et al. 2010. “Sparse Spectrum Gaussian Process Regression.” Journal of Machine Learning Research.

Lee, Bahri, Novak, et al. 2018. “Deep Neural Networks as Gaussian Processes.” In ICLR.

Leibfried, Dutordoir, John, et al. 2022. “A Tutorial on Sparse Gaussian Processes and Variational Inference.”

Lenk. 2003. “Bayesian Semiparametric Density Estimation and Model Verification Using a Logistic–Gaussian Process.” Journal of Computational and Graphical Statistics.

Lindgren, Rue, and Lindström. 2011. “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Liutkus, Badeau, and Richard. 2011. “Gaussian Processes for Underdetermined Source Separation.” IEEE Transactions on Signal Processing.

Lloyd, Duvenaud, Grosse, et al. 2014. “Automatic Construction and Natural-Language Description of Nonparametric Regression Models.” In Twenty-Eighth AAAI Conference on Artificial Intelligence.

Louizos, Shi, Schutte, et al. 2019. “The Functional Neural Process.” In Advances in Neural Information Processing Systems.

Lu. 2022. “A Rigorous Introduction to Linear Models.”

MacKay. 1998. “Introduction to Gaussian Processes.” NATO ASI Series. Series F: Computer and System Sciences.

———. 2002. “Gaussian Processes.” In Information Theory, Inference & Learning Algorithms.

Matheron. 1963a. Traité de Géostatistique Appliquée. 2. Le Krigeage.

———. 1963b. “Principles of Geostatistics.” Economic Geology.

Matthews, van der Wilk, Nickson, et al. 2016. “GPflow: A Gaussian Process Library Using TensorFlow.” arXiv:1610.08733 [Stat].

Mattos, Dai, Damianou, et al. 2016. “Recurrent Gaussian Processes.” In Proceedings of ICLR.

Mattos, Dai, Damianou, et al. 2017. “Deep Recurrent Gaussian Processes for Outlier-Robust System Identification.” Journal of Process Control, DYCOPS-CAB 2016.

Micchelli, and Pontil. 2005a. “Learning the Kernel Function via Regularization.” Journal of Machine Learning Research.

———. 2005b. “On Learning Vector-Valued Functions.” Neural Computation.

Minh. 2022. “Finite Sample Approximations of Exact and Entropic Wasserstein Distances Between Covariance Operators and Gaussian Processes.” SIAM/ASA Journal on Uncertainty Quantification.

Mohammadi, Challenor, and Goodfellow. 2021. “Emulating Computationally Expensive Dynamical Simulators Using Gaussian Processes.” arXiv:2104.14987 [Stat].

Moreno-Muñoz, Artés-Rodríguez, and Álvarez. 2019. “Continual Multi-Task Gaussian Processes.” arXiv:1911.00002 [Cs, Stat].

Nagarajan, Peters, and Nevat. 2018. “Spatial Field Reconstruction of Non-Gaussian Random Fields: The Tukey G-and-H Random Process.” SSRN Electronic Journal.

Nickisch, Solin, and Grigorevskiy. 2018. “State Space Gaussian Processes with Non-Gaussian Likelihood.” In International Conference on Machine Learning.

O’Hagan. 1978. “Curve Fitting and Optimal Design for Prediction.” Journal of the Royal Statistical Society: Series B (Methodological).

Papaspiliopoulos, Pokern, Roberts, et al. 2012. “Nonparametric Estimation of Diffusions: A Differential Equations Approach.” Biometrika.

Pinder, and Dodd. 2022. “GPJax: A Gaussian Process Framework in JAX.” Journal of Open Source Software.

Pleiss, Gardner, Weinberger, et al. 2018. “Constant-Time Predictive Distributions for Gaussian Processes.”

Pleiss, Jankowiak, Eriksson, et al. 2020. “Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization.” Advances in Neural Information Processing Systems.

Qi, Abdel-Gawad, and Minka. 2010. “Sparse-Posterior Gaussian Processes for General Likelihoods.” In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence. UAI’10.

Quiñonero-Candela, and Rasmussen. 2005. “A Unifying View of Sparse Approximate Gaussian Process Regression.” Journal of Machine Learning Research.

Raissi, and Karniadakis. 2017. “Machine Learning of Linear Differential Equations Using Gaussian Processes.” arXiv:1701.02440 [Cs, Math, Stat].

Rasmussen, and Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning.

Reece, and Roberts. 2010. “An Introduction to Gaussian Processes for the Kalman Filter Expert.” In 2010 13th International Conference on Information Fusion.

Ritter, Kukla, Zhang, et al. 2021. “Sparse Uncertainty Representation in Deep Learning with Inducing Weights.” arXiv:2105.14594 [Cs, Stat].

Riutort-Mayol, Bürkner, Andersen, et al. 2020. “Practical Hilbert Space Approximate Bayesian Gaussian Processes for Probabilistic Programming.” arXiv:2004.11408 [Stat].

Rossi, Heinonen, Bonilla, et al. 2021. “Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable Approximations.” In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics.

Saatçi. 2012. “Scalable inference for structured Gaussian process models.”

Saatçi, Turner, and Rasmussen. 2010. “Gaussian Process Change Point Models.” In Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10.

Saemundsson, Terenin, Hofmann, et al. 2020. “Variational Integrator Networks for Physically Structured Embeddings.” arXiv:1910.09349 [Cs, Stat].

Salimbeni, and Deisenroth. 2017. “Doubly Stochastic Variational Inference for Deep Gaussian Processes.” In Advances In Neural Information Processing Systems.

Salimbeni, Eleftheriadis, and Hensman. 2018. “Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models.” In International Conference on Artificial Intelligence and Statistics.

Särkkä. 2011. “Linear Operators and Stochastic Partial Differential Equations in Gaussian Process Regression.” In Artificial Neural Networks and Machine Learning – ICANN 2011. Lecture Notes in Computer Science.

———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3.

Särkkä, and Hartikainen. 2012. “Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression.” In Artificial Intelligence and Statistics.

Särkkä, Solin, and Hartikainen. 2013. “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” IEEE Signal Processing Magazine.

Schulam, and Saria. 2017. “Reliable Decision Support Using Counterfactual Models.” In Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17.

Shah, Wilson, and Ghahramani. 2014. “Student-t Processes as Alternatives to Gaussian Processes.” In Artificial Intelligence and Statistics.

Sidén. 2020. Scalable Bayesian Spatial Analysis with Gaussian Markov Random Fields. Linköping Studies in Statistics.

Smith, Alvarez, and Lawrence. 2018. “Gaussian Process Regression for Binned Data.” arXiv:1809.02010 [Cs, Stat].

Snelson, and Ghahramani. 2005. “Sparse Gaussian Processes Using Pseudo-Inputs.” In Advances in Neural Information Processing Systems.

Solin, and Särkkä. 2020. “Hilbert Space Methods for Reduced-Rank Gaussian Process Regression.” Statistics and Computing.

Tait, and Damoulas. 2020. “Variational Autoencoding of PDE Inverse Problems.” arXiv:2006.15641 [Cs, Stat].

Tang, Zhang, and Banerjee. 2019. “On Identifiability and Consistency of the Nugget in Gaussian Spatial Process Models.” arXiv:1908.05726 [Math, Stat].

Titsias, Michalis K. 2009a. “Variational Learning of Inducing Variables in Sparse Gaussian Processes.” In International Conference on Artificial Intelligence and Statistics.

———. 2009b. “Variational Model Selection for Sparse Gaussian Process Regression: Technical Supplement.”

Titsias, Michalis, and Lawrence. 2010. “Bayesian Gaussian Process Latent Variable Model.” In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.

Tokdar. 2007. “Towards a Faster Implementation of Density Estimation With Logistic Gaussian Process Priors.” Journal of Computational and Graphical Statistics.

Turner, Ryan, Deisenroth, and Rasmussen. 2010. “State-Space Inference and Learning with Gaussian Processes.” In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.

Turner, Richard E., and Sahani. 2014. “Time-Frequency Analysis as Probabilistic Inference.” IEEE Transactions on Signal Processing.

van der Wilk, Wilson, and Rasmussen. 2014. “Variational Inference for Latent Variable Modelling of Correlation Structure.” In NIPS 2014 Workshop on Advances in Variational Inference.

Vanhatalo, Riihimäki, Hartikainen, et al. 2013. “GPstuff: Bayesian Modeling with Gaussian Processes.” Journal of Machine Learning Research.

———, et al. 2015. “Bayesian Modeling with Gaussian Processes Using the GPstuff Toolbox.” arXiv:1206.5754 [Cs, Stat].

Walder, Christian, Kim, and Schölkopf. 2008. “Sparse Multiscale Gaussian Process Regression.” In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.

Walder, C., Schölkopf, and Chapelle. 2006. “Implicit Surface Modelling with a Globally Regularised Basis of Compact Support.” Computer Graphics Forum.

Wang, Pleiss, Gardner, et al. 2019. “Exact Gaussian Processes on a Million Data Points.” In Advances in Neural Information Processing Systems.

Wikle, Cressie, and Zammit-Mangion. 2019. Spatio-Temporal Statistics with R.

Wilkinson, Andersen, Reiss, et al. 2019. “End-to-End Probabilistic Inference for Nonstationary Audio Analysis.” arXiv:1901.11436 [Cs, Eess, Stat].

Wilkinson, Särkkä, and Solin. 2021. “Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees.”

Williams, Christopher, Klanke, Vijayakumar, et al. 2009. “Multi-Task Gaussian Process Learning of Robot Inverse Dynamics.” In Advances in Neural Information Processing Systems 21.

Williams, Christopher KI, and Seeger. 2001. “Using the Nyström Method to Speed Up Kernel Machines.” In Advances in Neural Information Processing Systems.

Wilson, Andrew Gordon, and Adams. 2013. “Gaussian Process Kernels for Pattern Discovery and Extrapolation.” In International Conference on Machine Learning.

Wilson, James T, Borovitskiy, Terenin, et al. 2020. “Efficiently Sampling Functions from Gaussian Process Posteriors.” In Proceedings of the 37th International Conference on Machine Learning.

Wilson, James T, Borovitskiy, Terenin, et al. 2021. “Pathwise Conditioning of Gaussian Processes.” Journal of Machine Learning Research.

Wilson, Andrew Gordon, Dann, Lucas, et al. 2015. “The Human Kernel.” arXiv:1510.07389 [Cs, Stat].

Wilson, Andrew Gordon, and Ghahramani. 2011. “Generalised Wishart Processes.” In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. UAI’11.

———. 2012. “Modelling Input Varying Correlations Between Multiple Responses.” In Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science.

Wilson, Andrew Gordon, Knowles, and Ghahramani. 2012. “Gaussian Process Regression Networks.” In Proceedings of the 29th International Conference on International Conference on Machine Learning. ICML’12.

Wilson, Andrew Gordon, and Nickisch. 2015. “Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP).” In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. ICML’15.

Zhang, Walder, Bonilla, et al. 2020. “Quantile Propagation for Wasserstein-Approximate Gaussian Processes.” In Proceedings of NeurIPS 2020.