“Gaussian processes” are stochastic processes whose finite-dimensional marginal distributions are all (jointly) Gaussian, like Brownian motions and suchlike. Very prominent in, e.g., spatial statistics.

Chi Feng’s amazing GP demo: http://chifeng.scripts.mit.edu/stuff/gp-demo/

When you see it capitalised it tends to mean a specific emphasis: the use of these processes for regression, as a nonparametric method with a conveniently Bayesian interpretation. The basic trick is using covariance estimation and/or Gaussian process simulation on some clever Hilbert space to do functional regression.
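To make "functional regression" concrete, here is a minimal sketch of exact GP regression in NumPy. It assumes a zero-mean prior and a squared-exponential kernel with hand-picked hyperparameters; the kernel choice, the noise level and the toy data are my assumptions for illustration, not anything canonical. Note the Cholesky factorisation, which costs O(n³) in the number of data points.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)  # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov

# Toy data (assumed): noisy samples of a sine wave.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(20)
x_test = np.linspace(0.0, 5.0, 50)

mean, cov = gp_posterior(x_train, y_train, x_test)
# Clip tiny negative values caused by round-off before taking the root.
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

The posterior `mean` is the regression function and `std` gives pointwise uncertainty bands, which is the "conveniently Bayesian interpretation" in action.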

I feel this is not too complex but I’ve not looked deeply into it. They reputedly work well with kernel methods to do machine learning stuff. The details of this are still hazy to me, and they aren’t currently on the correct side of the hype curve for me to dive in.

This web site aims to provide an overview of resources concerned with probabilistic modeling, inference and learning based on Gaussian processes. Although Gaussian processes have a long history in the field of statistics, they seem to have been employed extensively only in niche areas. With the advent of kernel machines in the machine learning community, models based on Gaussian processes have become commonplace for problems of regression (kriging) and classification as well as a host of more specialized applications.

I’ve not been very enthusiastic about these in the past because they did not seem worth the effort. It’s nice to have a principled nonparametric Bayesian formalism, but it’s pointless having a formalism that is so computationally demanding that people don’t try to use more than a thousand data points.

However, perhaps I should be persuaded by AutoGP (BoKD16), which breaks a lot of the awful computational deadlocks through clever use of inducing variables and variational approximation. Together these produce a compressed representation of the data with tractable inference and model selection, including kernel selection, and do the whole thing in many dimensions simultaneously.
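To get a feel for what a "compressed representation" via inducing variables buys, here is a rough sketch of the simplest inducing-point predictor I know of, the subset-of-regressors mean in the terminology of QuRa05. This is not AutoGP itself: the inducing inputs `z` are fixed on a grid rather than learned variationally, and the kernel, hyperparameters and toy data are all assumptions. The point is only that the linear system shrinks from n × n to m × m.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def exact_mean(x, y, x_test, noise_var):
    """Exact GP predictive mean: solves an n x n system."""
    K = rbf_kernel(x, x) + noise_var * np.eye(len(x))
    return rbf_kernel(x_test, x) @ np.linalg.solve(K, y)

def inducing_mean(x, y, z, x_test, noise_var, jitter=1e-6):
    """Subset-of-regressors predictive mean: solves an m x m system."""
    K_uu = rbf_kernel(z, z) + jitter * np.eye(len(z))
    K_uf = rbf_kernel(z, x)
    A = noise_var * K_uu + K_uf @ K_uf.T
    return rbf_kernel(x_test, z) @ np.linalg.solve(A, K_uf @ y)

# Toy data (assumed): 500 noisy observations of a sine wave.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 500))
y = np.sin(x) + 0.1 * rng.standard_normal(500)
x_test = np.linspace(0.0, 10.0, 100)
noise_var = 0.01

z = np.linspace(0.0, 10.0, 20)  # 20 inducing inputs standing in for 500 points
gap = np.max(np.abs(exact_mean(x, y, x_test, noise_var)
                    - inducing_mean(x, y, z, x_test, noise_var)))
```

Here `gap` measures how far the compressed predictor strays from the exact one on this toy problem. The cost of the sparse version is O(nm²) rather than O(n³).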

## Implementations

Bayes workhorse Stan can do Gaussian process regression just like everything else; see Michael Betancourt’s blog, parts 1, 2, 3.

The current scikit-learn has semi-fancy Gaussian processes, and an introduction.

Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems.

The advantages of Gaussian processes are:

- The prediction interpolates the observations (at least for regular kernels).
- The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest.
- Versatile: different kernels can be specified. Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of Gaussian processes include:

- They are not sparse, i.e., they use the whole samples/features information to perform the prediction.
- They lose efficiency in high dimensional spaces – namely when the number of features exceeds a few dozens.

Is that last point strictly true? Surely an appropriate kernel could ameliorate the dimensionality problem?
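One way to poke at that question is an anisotropic kernel with one length scale per feature, which scikit-learn's `RBF` accepts as a vector. The sketch below is a toy under assumed conditions: two features, of which only the first matters, and a `WhiteKernel` noise term. It shows the API rather than settling the dimensionality question.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Toy data (assumed): the target depends on feature 0 only;
# feature 1 is a pure distractor.
X = rng.uniform(0.0, 5.0, size=(60, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)

# One length scale per feature, plus a learned observation-noise term.
kernel = 1.0 * RBF(length_scale=[1.0, 1.0]) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X, y)  # hyperparameters are fitted by maximising the marginal likelihood

X_test = rng.uniform(0.0, 5.0, size=(100, 2))
mean, std = gpr.predict(X_test, return_std=True)
print(gpr.kernel_)  # fitted kernel, including the per-feature length scales
```

If the fit behaves, I would expect the printed length scale for the distractor feature to come out much longer than the one for the informative feature, which amounts to the kernel learning to ignore it. I have not checked how far this stretches towards dozens of features.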

There are even fancier Gaussian process toolsets. Chris Fonnesbeck mentions GPflow, autogp, PyMC3, and the scikit-learn implementation. Plus I notice skgmm is a fancified version of the scikit-learn one. So… the message I’m getting here is that it’s easy enough to be bikeshedded. George is another Python GP regression library that claims to handle big data at the cost of lots of C++.

Questions:

Can I infer a density using these? Or is it strictly in a regression/classification setting that the machinery works? (EDIT: Yes, you can. I believe Neil Lawrence has been commended to me for this?)

Can you somehow make them (in some sense) sparse after all, using kernel approximation techniques? Is this what the variational version does?

## Kernels

a.k.a. covariance models.

GP models are the meeting of Covariance estimation and kernel machines.

### Matérn

The Matérn stationary (and in the Euclidean case, isotropic) covariance function is one model for covariance. See Carl Edward Rasmussen’s Gaussian Process lecture notes for a readable explanation, or chapter 4 of his textbook (RaWi06).
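For the half-integer smoothness values that come up most often (ν = 1/2, 3/2, 5/2) the Matérn covariance has simple closed forms, sketched below. The function name and its parameters are mine, and only these three values of ν are handled; the general case needs a modified Bessel function.

```python
import numpy as np

def matern(r, nu, lengthscale=1.0, variance=1.0):
    """Matérn covariance at distance r, for nu in {0.5, 1.5, 2.5} only."""
    s = np.abs(np.asarray(r, dtype=float)) / lengthscale
    if nu == 0.5:
        # Exponential kernel: the Ornstein-Uhlenbeck covariance.
        scale, poly = 1.0, 1.0
    elif nu == 1.5:
        scale = np.sqrt(3.0)
        poly = 1.0 + scale * s
    elif nu == 2.5:
        scale = np.sqrt(5.0)
        poly = 1.0 + scale * s + (5.0 / 3.0) * s**2
    else:
        raise ValueError("only nu in {0.5, 1.5, 2.5} is implemented here")
    return variance * poly * np.exp(-scale * s)

r = np.linspace(0.0, 3.0, 31)
curves = {nu: matern(r, nu) for nu in (0.5, 1.5, 2.5)}
```

Larger ν gives smoother sample paths; as ν grows the kernel approaches the squared-exponential one.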

### Cyclic

TBD

## Approximation with state filtering

Looks interesting. Without knowing enough about either to make an informed judgement, I imagine this makes the Gaussian process regression soluble by making it local, i.e. Markov, with respect to some assumed hidden state, in the same way Kalman filtering does Wiener filtering. But, in fact, I do not know. See Simo Särkkä’s work, I suppose? (HaSä10, SäHa12, SäSH13, KaSä16)
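For at least one special case the guess can be checked directly. A GP on the line with the exponential (Matérn ν = 1/2) kernel is an Ornstein-Uhlenbeck process, which is Markov, so a scalar Kalman filter run over the sorted inputs should reproduce the dense GP posterior at the final point in O(n) time. The sketch below is my own toy check with assumed hyperparameters, not code from the cited papers, which handle more general kernels and also do smoothing.

```python
import numpy as np

def ou_kalman_filter(t, y, lengthscale=1.0, variance=1.0, noise_var=0.1):
    """Filtering means/variances for a GP with kernel variance*exp(-|dt|/lengthscale)."""
    m, P = 0.0, variance  # stationary prior on the first state
    means, variances = [], []
    for k in range(len(t)):
        if k > 0:
            # Predict: exact discretisation of the OU process over the gap.
            a = np.exp(-(t[k] - t[k - 1]) / lengthscale)
            m = a * m
            P = a**2 * P + variance * (1.0 - a**2)
        # Update with the noisy observation y[k].
        gain = P / (P + noise_var)
        m = m + gain * (y[k] - m)
        P = (1.0 - gain) * P
        means.append(m)
        variances.append(P)
    return np.array(means), np.array(variances)

def dense_gp_last_point(t, y, lengthscale=1.0, variance=1.0, noise_var=0.1):
    """O(n^3) GP posterior mean and variance of f at the last input."""
    K = variance * np.exp(-np.abs(t[:, None] - t[None, :]) / lengthscale)
    A = K + noise_var * np.eye(len(t))
    k_last = K[:, -1]
    mean = k_last @ np.linalg.solve(A, y)
    var = variance - k_last @ np.linalg.solve(A, k_last)
    return mean, var

# Toy data (assumed): irregularly spaced noisy samples of a sine wave.
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(t) + np.sqrt(0.1) * rng.standard_normal(200)

kf_means, kf_vars = ou_kalman_filter(t, y)
gp_mean, gp_var = dense_gp_last_point(t, y)
```

The last filtering estimate conditions on all the data, so `kf_means[-1]` and `kf_vars[-1]` should agree with `gp_mean` and `gp_var` up to round-off. Matching the GP posterior at earlier points would additionally need a backward smoothing pass.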

## Approximation with variational inference

TBD.

## Approximation with inducing variables

TBD.

## Approximation with variational inference and inducing variables

This is the trick that makes AutoGP work (BoKD16). TBD.

## Random projection kernel approximation

For now see Kernel approximation.
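As a placeholder, here is a small sketch of one well-known random projection scheme, random Fourier features, applied to the squared-exponential kernel. The technique is my choice of example and everything in the snippet (function name, feature count, toy inputs) is assumed for illustration. The idea is that inner products of a finite random feature map approximate the kernel, so regression can be done in the feature space at cost linear in the number of data points.

```python
import numpy as np

def random_fourier_features(X, n_features, lengthscale=1.0, seed=0):
    """Random cosine features whose inner products approximate an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, I / lengthscale^2).
    W = rng.standard_normal((d, n_features)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Toy inputs (assumed): 50 points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))

Z = random_fourier_features(X, n_features=5000)
K_approx = Z @ Z.T

sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-0.5 * sq_dist)

max_err = np.abs(K_approx - K_exact).max()
```

The approximation error shrinks roughly like one over the square root of the number of features.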

## Readings

This lecture by the late David MacKay is probably good; the man could talk.

## Refs

- Abra97: (1997) A review of Gaussian random fields and correlation functions.
- QuRa05: (2005) A Unifying View of Sparse Approximate Gaussian Process Regression. *Journal of Machine Learning Research*, 6(Dec), 1939–1959.
- KaSä16: (2016) Approximate state-space Gaussian processes via spectral transformation.
- KiWe14: (2014) Auto-Encoding Variational Bayes. In ICLR 2014 conference.
- KBCF16: (2016) AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models. In UAI17.
- LDGT14: (2014) Automatic Construction and Natural-Language Description of Nonparametric Regression Models. *ArXiv:1402.4304 [Cs, Stat]*.
- Särk13: (2013) *Bayesian filtering and smoothing*. Cambridge, U.K.; New York: Cambridge University Press.
- TiLa10: (2010) Bayesian Gaussian Process Latent Variable Model. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 844–851).
- FLSR13: (2013) Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC. In Advances in Neural Information Processing Systems 26 (pp. 3156–3164). Curran Associates, Inc.
- Emer07: (2007) Conditioning Simulations of Gaussian Random Fields by Ordinary Kriging. *Mathematical Geology*, 39(6), 607–623.
- FHHU17: (2017) Deep Recurrent Gaussian Process with Variational Sparse Spectrum Approximation. *ArXiv:1711.00799 [Stat]*.
- MDDB17: (2017) Deep recurrent Gaussian processes for outlier-robust system identification. *Journal of Process Control*, 60, 82–94.
- SaDe17: (2017) Doubly Stochastic Variational Inference for Deep Gaussian Processes. In Advances In Neural Information Processing Systems.
- KGBM05: (2005) Dynamic systems identification with Gaussian processes. *Mathematical and Computer Modelling of Dynamical Systems*, 11(4), 411–424.
- GSFT12: (2012) Exploiting compositionality to explore a large space of model structures. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
- AlSH04: (2004) Exponential Families for Conditional Random Fields. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 2–9). Arlington, Virginia, United States: AUAI Press.
- LaSH03: (2003) Fast sparse Gaussian process methods: The informative vector machine. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems (pp. 609–616).
- WiAd13: (2013) Gaussian Process Kernels for Pattern Discovery and Extrapolation. *ArXiv:1302.4245 [Cs, Stat]*.
- Mack02: (2002) Gaussian Processes. In Information Theory, Inference & Learning Algorithms (p. Chapter 45). Cambridge University Press.
- Ebde15: (2015) Gaussian Processes: A Quick Introduction. *ArXiv:1505.02965 [Math, Stat]*.
- HeFL13: (2013) Gaussian Processes for Big Data. In Uncertainty in Artificial Intelligence (p. 282). Citeseer.
- RaWi06: (2006) *Gaussian processes for machine learning*. Cambridge, Mass: MIT Press.
- BoKD16: (2016) Generic Inference in Latent Gaussian Process Models. *ArXiv:1609.00577 [Stat]*.
- KoFo09: (2009) GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. *Autonomous Robots*, 27(1), 75–90.
- MWNF16: (2016) GPflow: A Gaussian process library using TensorFlow. *ArXiv:1610.08733 [Stat]*.
- ENDH17: (2017) Identification of Gaussian Process State Space Models. In Advances in Neural Information Processing Systems 30 (pp. 5309–5319). Curran Associates, Inc.
- WaSC06: (2006) Implicit Surface Modelling with a Globally Regularised Basis of Compact Support. *Computer Graphics Forum*, 25(3), 635–644.
- SäHa12: (2012) Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression. In Journal of Machine Learning Research.
- Mack98: (1998) Introduction to Gaussian processes. *NATO ASI Series. Series F: Computer and System Sciences*, 133–165.
- HaSä10: (2010) Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In 2010 IEEE International Workshop on Machine Learning for Signal Processing (pp. 379–384).
- Jord99: (1999) *Learning in graphical models*. Cambridge, Mass.: MIT Press.
- EvMP05: (2005) Learning Multiple Tasks with Kernel Methods. *Journal of Machine Learning Research*, 6(Apr), 615–637.
- MiPo05a: (2005a) Learning the Kernel Function via Regularization. *Journal of Machine Learning Research*, 6(Jul), 1099–1125.
- RaKa17: (2017) Machine Learning of Linear Differential Equations using Gaussian Processes. *ArXiv:1701.02440 [Cs, Math, Stat]*.
- BiMa06: (2006) Minimal Penalties for Gaussian Model Selection. *Probability Theory and Related Fields*, 138(1–2), 33–73.
- WKVC09: (2009) Multi-task Gaussian Process Learning of Robot Inverse Dynamics. In Advances in Neural Information Processing Systems 21 (pp. 265–272). Curran Associates, Inc.
- BoCW07: (2007) Multi-task Gaussian Process Prediction. In Proceedings of the 20th International Conference on Neural Information Processing Systems (pp. 153–160). USA: Curran Associates Inc.
- MiPo05b: (2005b) On Learning Vector-Valued Functions. *Neural Computation*, 17(1), 177–204.
- CBMF17: (2017) Random Feature Expansions for Deep Gaussian Processes. In PMLR.
- MDDF16: (2016) Recurrent Gaussian Processes. In Proceedings of ICLR.
- DeBo15: (2015) Scalable Inference for Gaussian Process Models with Black-box Likelihoods. In Advances in Neural Information Processing Systems 28 (pp. 1414–1422). Cambridge, MA, USA: MIT Press.
- SnGh05: (2005) Sparse Gaussian processes using pseudo-inputs. In Advances in neural information processing systems (pp. 1257–1264).
- WaKS08: (2008) Sparse Multiscale Gaussian Process Regression. In Proceedings of the 25th International Conference on Machine Learning (pp. 1112–1119). New York, NY, USA: ACM.
- KrBo13: (2013) Spatial process generation. *ArXiv:1308.0399 [Stat]*.
- SäSH13: (2013) Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering. *IEEE Signal Processing Magazine*, 30(4), 51–61.
- TuDR10: (2010) State-Space Inference and Learning with Gaussian Processes. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 868–875).
- DLGT13: (2013) Structure Discovery in Nonparametric Regression through Compositional Kernel Search. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 1166–1174).
- WDLX15: (2015) The Human Kernel. *ArXiv:1510.07389 [Cs, Stat]*.
- WiSe01: (2001) Using the Nyström Method to Speed Up Kernel Machines. In Advances in Neural Information Processing Systems (pp. 682–688).
- DaTL11: (2011) Variational Gaussian Process Dynamical Systems. In Advances in Neural Information Processing Systems 24 (pp. 2510–2518). Curran Associates, Inc.
- FrCR14: (2014) Variational Gaussian Process State-Space Models. In Advances in Neural Information Processing Systems 27 (pp. 3680–3688). Curran Associates, Inc.
- GaWi14: (2014) Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models - a Gentle Tutorial. *ArXiv:1402.1412 [Stat]*.
- Tits09a: (2009a) Variational learning of inducing variables in sparse Gaussian processes. In International Conference on Artificial Intelligence and Statistics (pp. 567–574).
- Tits09b: (2009b) Variational model selection for sparse Gaussian process regression: Technical Supplement. Technical report, School of Computer Science, University of Manchester.