The Living Thing / Notebooks : Gaussian process regression and classification

“Gaussian processes” are stochastic processes whose finite-dimensional distributions (and hence conditional distributions) are Gaussian, like Brownian motions and suchlike. They are very prominent in, e.g., spatial statistics, where they are used for kriging etc.

However, when you see it capitalised it seems to mean some specific emphasis on the use of these processes for regression. This is, as far as I can tell, a nonparametric method with a conveniently Bayesian interpretation.

I feel this is not too complex but I’ve never looked into it. They work well with kernel methods to do machine learning stuff, apparently. The details of this are still hazy to me, and they aren’t currently on the correct side of the hype curve for me to dive in.

Gaussianprocess.org:

This web site aims to provide an overview of resources concerned with probabilistic modeling, inference and learning based on Gaussian processes. Although Gaussian processes have a long history in the field of statistics, they seem to have been employed extensively only in niche areas. With the advent of kernel machines in the machine learning community, models based on Gaussian processes have become commonplace for problems of regression (kriging) and classification as well as a host of more specialized applications.

The current scikit-learn has fancy Gaussian processes, and an introduction:

Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems.
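For concreteness, here is a minimal sketch of what regression looks like with scikit-learn's `GaussianProcessRegressor`. The toy data (noisy samples of a sine), the RBF-plus-white-noise kernel and all the parameter values are my assumptions for illustration, not something the scikit-learn introduction prescribes.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy data (an assumption for illustration): noisy samples of sin(x).
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 10.0, size=(30, 1))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.standard_normal(30)

# A smooth RBF kernel plus a white-noise term for the observation noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0)

# Fitting tunes the kernel hyperparameters by maximising the marginal likelihood.
gp.fit(X_train, y_train)

# The prediction is probabilistic: a mean and a standard deviation per test point.
X_test = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
```

The `std` array is what you would use to draw an uncertainty band around `mean`.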

The advantages of Gaussian processes are:

The disadvantages of Gaussian processes include:

Questions:

Covariance models

Covariance estimation is weird. The stationary (and, in the Euclidean case, isotropic) Matérn covariance function is one model for covariance. See Carl Edward Rasmussen’s Gaussian Process lecture notes for a readable explanation, or chapter 4 of his textbook (RaWi06).
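As a rough sketch of what that covariance looks like in practice: for the smoothness values ν = 1/2, 3/2 and 5/2 the Matérn function has simple closed forms, which avoids the Bessel function in the general expression. The function name `matern` and its parameter names are mine, chosen for illustration.

```python
import math

def matern(x, y, nu=1.5, length_scale=1.0, variance=1.0):
    """Matérn covariance between two points (sequences of floats).

    Only the closed-form cases nu in {0.5, 1.5, 2.5} are handled here.
    """
    # Stationary and isotropic: depends on x and y only through the distance.
    s = math.dist(x, y) / length_scale
    if nu == 0.5:
        # Exponential covariance (the Ornstein-Uhlenbeck process in 1-D).
        return variance * math.exp(-s)
    if nu == 1.5:
        a = math.sqrt(3.0) * s
        return variance * (1.0 + a) * math.exp(-a)
    if nu == 2.5:
        a = math.sqrt(5.0) * s
        return variance * (1.0 + a + a * a / 3.0) * math.exp(-a)
    raise ValueError("this sketch only supports nu in {0.5, 1.5, 2.5}")
```

Larger ν gives smoother sample paths; ν = 1/2 is the roughest of the three.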

Connection to Kalman filtering

Looks interesting. Without knowing enough about either to make an informed judgement, I imagine this makes Gaussian process regression tractable by making it local, i.e. Markov, by augmenting it with hidden states, in the same way that Kalman filtering does for Wiener filtering. This would address at least some of the criticisms about sparsity etc.

See Simo Särkkä’s work for that. (HaSä10, SäHa12, SäSH13, KaSä16)
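To make the idea concrete, here is a sketch of the simplest case I can think of, under strong assumptions: one-dimensional input, sorted observation times, and the exponential (Matérn ν = 1/2) covariance, for which the process is already Markov with a scalar hidden state. The function name `ou_kalman_filter` and its parameters are hypothetical, not taken from the cited papers.

```python
import math

def ou_kalman_filter(times, ys, length_scale=1.0, variance=1.0, noise_var=0.1):
    """Filtered (mean, variance) of f(t_k) given y_1..y_k.

    Assumes a zero-mean GP with covariance
    variance * exp(-|t - t'| / length_scale), observed with Gaussian noise,
    and `times` sorted in increasing order.
    """
    m, P = 0.0, variance  # stationary prior on the hidden state
    prev_t = None
    filtered = []
    for t, y in zip(times, ys):
        if prev_t is not None:
            # Predict step: exact discretisation of the OU process.
            a = math.exp(-(t - prev_t) / length_scale)
            m = a * m
            P = a * a * P + variance * (1.0 - a * a)
        # Update step with the noisy observation y = f(t) + noise.
        gain = P / (P + noise_var)
        m = m + gain * (y - m)
        P = (1.0 - gain) * P
        filtered.append((m, P))
        prev_t = t
    return filtered
```

The filtered estimate at the final time should agree with the usual GP posterior there, but one pass costs O(n) rather than the O(n³) of the direct matrix solve. Estimates at earlier times would need a backward smoothing pass, which this sketch omits.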

Readings

This lecture by the late David MacKay is probably good; the man could talk.

Refs

Abra97
Abrahamsen, P. (1997) A review of Gaussian random fields and correlation functions.
AlSH04
Altun, Y., Smola, A. J., & Hofmann, T. (2004) Exponential Families for Conditional Random Fields. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 2–9). Arlington, Virginia, United States: AUAI Press.
CBMF16
Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2016) Practical Learning of Deep Gaussian Processes via Random Fourier Features. arXiv:1610.04386 [Stat].
DLGT13
Duvenaud, D., Lloyd, J., Grosse, R., Tenenbaum, J., & Ghahramani, Z. (2013) Structure Discovery in Nonparametric Regression through Compositional Kernel Search. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 1166–1174).
Ebde15
Ebden, M. (2015) Gaussian Processes: A Quick Introduction. arXiv:1505.02965 [Math, Stat].
GaWi14
Gal, Y., & van der Wilk, M. (2014) Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models - a Gentle Tutorial. arXiv:1402.1412 [Stat].
GSFT12
Grosse, R., Salakhutdinov, R. R., Freeman, W. T., & Tenenbaum, J. B. (2012) Exploiting compositionality to explore a large space of model structures. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
HaSä10
Hartikainen, J., & Särkkä, S. (2010) Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In 2010 IEEE International Workshop on Machine Learning for Signal Processing (pp. 379–384).
Jord99
Jordan, M. I. (1999) Learning in graphical models. Cambridge, Mass.: MIT Press.
KaSä16
Karvonen, T., & Särkkä, S. (2016) Approximate state-space Gaussian processes via spectral transformation.
LaSH03
Lawrence, N., Seeger, M., & Herbrich, R. (2003) Fast sparse Gaussian process methods: The informative vector machine. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems (pp. 609–616).
LDGT14
Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J. B., & Ghahramani, Z. (2014) Automatic Construction and Natural-Language Description of Nonparametric Regression Models. arXiv:1402.4304 [Cs, Stat].
Mack98
MacKay, D. J. C. (1998) Introduction to Gaussian processes. NATO ASI Series. Series F: Computer and System Sciences, 133–165.
Mack02
MacKay, D. J. C. (2002) Gaussian Processes. In Information Theory, Inference & Learning Algorithms (Chapter 45). Cambridge University Press.
MWNF16
Matthews, A. G. de G., van der Wilk, M., Nickson, T., Fujii, K., Boukouvalas, A., León-Villagrá, P., … Hensman, J. (2016) GPflow: A Gaussian process library using TensorFlow. arXiv:1610.08733 [Stat].
QuRa05
Quiñonero-Candela, J., & Rasmussen, C. E. (2005) A Unifying View of Sparse Approximate Gaussian Process Regression. Journal of Machine Learning Research, 6, 1939–1959.
RaKa17
Raissi, M., & Karniadakis, G. E. (2017) Machine Learning of Linear Differential Equations using Gaussian Processes. arXiv:1701.02440 [Cs, Math, Stat].
RaWi06
Rasmussen, C. E., & Williams, C. K. I. (2006) Gaussian processes for machine learning. Cambridge, Mass.: MIT Press.
Särk13
Särkkä, S. (2013) Bayesian filtering and smoothing. Cambridge, U.K.; New York: Cambridge University Press.
SäHa12
Särkkä, S., & Hartikainen, J. (2012) Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression. In Journal of Machine Learning Research.
SäSH13
Särkkä, S., Solin, A., & Hartikainen, J. (2013) Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering. IEEE Signal Processing Magazine, 30(4), 51–61.
SnGh05
Snelson, E., & Ghahramani, Z. (2005) Sparse Gaussian processes using pseudo-inputs. In Advances in neural information processing systems (pp. 1257–1264).
WaSC06
Walder, C., Schölkopf, B., & Chapelle, O. (2006) Implicit Surface Modelling with a Globally Regularised Basis of Compact Support. Computer Graphics Forum, 25(3), 635–644.
WaKS08
Walder, C., Kim, K. I., & Schölkopf, B. (2008) Sparse Multiscale Gaussian Process Regression. In Proceedings of the 25th International Conference on Machine Learning (pp. 1112–1119). New York, NY, USA: ACM.
WiAd13
Wilson, A. G., & Adams, R. P. (2013) Gaussian Process Kernels for Pattern Discovery and Extrapolation. arXiv:1302.4245 [Cs, Stat].
WDLX15
Wilson, A. G., Dann, C., Lucas, C. G., & Xing, E. P. (2015) The Human Kernel. arXiv:1510.07389 [Cs, Stat].