
Gaussian process regression and classification

“Gaussian processes” are stochastic processes whose finite-dimensional distributions are all jointly Gaussian, like Brownian motions and suchlike. Very prominent in, e.g., spatial statistics.

GP regression: Chi Feng’s amazing GP

When you see it capitalised it tends to mean a specific emphasis: the use of these processes for regression, as a nonparametric method with a conveniently Bayesian interpretation. The basic trick is using covariance estimation and/or Gaussian process simulation on some clever Hilbert space to do functional regression.
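To make the regression trick concrete, here is a minimal sketch of exact GP regression in plain NumPy. The squared-exponential kernel, the hyperparameter values, the noise level and the toy sine data are all my assumptions for illustration; nothing here is tuned or learned.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, **kernel_args)
    k_ts = rbf_kernel(x_train, x_test, **kernel_args)
    k_ss = rbf_kernel(x_test, x_test, **kernel_args)
    # Cholesky factor of the noisy training covariance, for stable solves.
    chol = np.linalg.cholesky(k_tt + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ alpha
    cov = k_ss - v.T @ v
    return mean, cov

# Toy data (assumed): noisy samples of a sine wave.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.shape)
x_test = np.linspace(0.0, 5.0, 50)

mean, cov = gp_posterior(x_train, y_train, x_test, noise_var=0.01)
# Pointwise posterior standard deviation (clipped against round-off).
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

The Cholesky factorisation is the expensive step: it costs cubic time in the number of training points, which is where the scaling complaints about GPs come from.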

I feel this is not too complex, but I’ve not looked deeply into it. They reputedly work well with kernel methods for machine learning. The details are still hazy to me, and they aren’t currently on the correct side of the hype curve for me to dive in.

The Gaussian Processes Web Site describes itself: “This web site aims to provide an overview of resources concerned with probabilistic modeling, inference and learning based on Gaussian processes. Although Gaussian processes have a long history in the field of statistics, they seem to have been employed extensively only in niche areas. With the advent of kernel machines in the machine learning community, models based on Gaussian processes have become commonplace for problems of regression (kriging) and classification as well as a host of more specialized applications.”

I’ve not been very enthusiastic about these in the past, because they did not seem worth the effort. It’s nice to have a principled nonparametric Bayesian formalism, but it’s pointless having a formalism so computationally demanding (exact inference scales cubically in the number of datapoints) that people don’t try to use more than a thousand datapoints.

However, perhaps I should be persuaded by AutoGP (BoKD16), which breaks a lot of the awful computational deadlocks. It makes clever use of inducing variables and variational approximation to produce a compressed representation of the data with tractable inference and model selection, including kernel selection, and does the whole thing in many dimensions simultaneously.


The current scikit-learn has semi-fancy Gaussian processes, and an introduction.

Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems.

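The advantages of Gaussian processes are: the prediction interpolates the observations (at least for regular kernels); the prediction is probabilistic (Gaussian), so one can compute empirical confidence intervals; and the method is versatile, since different kernels can be specified.

The disadvantages of Gaussian processes include: they are not sparse, i.e. they use the whole sample and feature information to perform the prediction; and they lose efficiency in high-dimensional spaces, namely when the number of features exceeds a few dozen.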

Is that last point, about losing efficiency in high dimensions, strictly true? Surely an appropriate kernel could ameliorate the dimensionality problem?
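One way to probe that question is a kernel with a separate length scale per input dimension, so that irrelevant dimensions can be smoothed away by the hyperparameter fit. Here is a hedged sketch using scikit-learn's `GaussianProcessRegressor` with an anisotropic `RBF` kernel. The synthetic data, in which only the first of five features matters, is my assumption, and this is an illustration rather than evidence either way.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n_samples, n_features = 80, 5
X = rng.uniform(-2.0, 2.0, size=(n_samples, n_features))
# Hypothetical target: only the first feature matters.
y = np.sin(2.0 * X[:, 0]) + 0.05 * rng.standard_normal(n_samples)

# One length scale per input dimension (anisotropic RBF kernel),
# plus a white-noise term whose level is also fitted.
kernel = 1.0 * RBF(length_scale=np.ones(n_features)) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X, y)

X_new = rng.uniform(-2.0, 2.0, size=(5, n_features))
mean, std = gpr.predict(X_new, return_std=True)

# Fitted per-dimension length scales; large values mean "ignore this input".
length_scales = gpr.kernel_.k1.k2.length_scale
print(length_scales)
```

If the fit behaves, the length scales for the irrelevant features should come out much larger than the one for the first feature, but the marginal likelihood is non-convex, so I would not bet on that for every seed.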

There are even fancier Gaussian processes. Chris Fonnesbeck mentions GPflow, AutoGP, PyMC3, and the scikit-learn implementation. Plus I notice skgmm is a fancified version of the scikit-learn one. So… the message I’m getting here is that this is easy enough to be bikeshedded. George is another Python GP regression library that claims to handle big data at the cost of lots of C++.



a.k.a. covariance models.

GP models are the meeting of Covariance estimation and kernel machines.


The Matérn stationary (and in the Euclidean case, isotropic) covariance function is one model for covariance. See Carl Edward Rasmussen’s Gaussian Process lecture notes for a readable explanation, or chapter 4 of his textbook (RaWi06).
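For half-integer smoothness the Matérn covariance has simple closed forms that avoid Bessel functions. A minimal sketch for smoothness 1/2, 3/2 and 5/2 follows; the function name `matern` and its parameter names are mine, not from any particular library.

```python
import numpy as np

def matern(r, length_scale=1.0, variance=1.0, nu=1.5):
    """Matérn covariance at distance r, for nu in {0.5, 1.5, 2.5}."""
    s = np.abs(np.asarray(r, dtype=float)) / length_scale
    if nu == 0.5:
        # Exponential (Ornstein-Uhlenbeck) covariance.
        return variance * np.exp(-s)
    if nu == 1.5:
        t = np.sqrt(3.0) * s
        return variance * (1.0 + t) * np.exp(-t)
    if nu == 2.5:
        t = np.sqrt(5.0) * s
        return variance * (1.0 + t + t**2 / 3.0) * np.exp(-t)
    raise ValueError("only nu in {0.5, 1.5, 2.5} is implemented in this sketch")

# Covariance matrix on a 1-D grid. It depends only on |x - x'|, which is
# what "stationary and isotropic" means here.
x = np.linspace(0.0, 4.0, 30)
K = matern(x[:, None] - x[None, :], length_scale=0.8, nu=2.5)
```

Larger smoothness values give smoother sample paths. In the limit of infinite smoothness the Matérn family recovers the squared-exponential kernel.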



Approximation with state filtering

Looks interesting. Without knowing enough about either to make an informed judgement, I imagine this makes Gaussian process regression tractable by making it local, i.e. Markov, with respect to some assumed hidden state, in the same way that Kalman filtering makes Wiener filtering tractable. This would address at least some of the criticisms about sparsity etc.

See Simo Särkkä’s work for that. (HaSä10, SäHa12, SäSH13, KaSä16)
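The simplest instance I can work out myself is the exponential (Matérn 1/2) kernel, whose GP is the Ornstein-Uhlenbeck process, with a one-dimensional Markov state. Below is a sketch of a scalar Kalman filter for it, checked against the dense cubic-cost computation at the final time point. This is my own toy construction under those assumptions, not code from the cited papers, which treat much more general kernels.

```python
import numpy as np

def kalman_filter_ou(t, y, length_scale=1.0, variance=1.0, noise_var=0.1):
    """Filtering means and variances for a GP with kernel
    variance * exp(-|dt| / length_scale). Assumes t is sorted ascending."""
    m, p = 0.0, variance  # stationary prior on the hidden state
    t_prev = t[0]
    means, variances = [], []
    for t_k, y_k in zip(t, y):
        # Predict: exact discretisation of the Ornstein-Uhlenbeck process.
        a = np.exp(-(t_k - t_prev) / length_scale)
        m = a * m
        p = a**2 * p + variance * (1.0 - a**2)
        # Update with the noisy observation y_k = state + noise.
        gain = p / (p + noise_var)
        m = m + gain * (y_k - m)
        p = (1.0 - gain) * p
        means.append(m)
        variances.append(p)
        t_prev = t_k
    return np.array(means), np.array(variances)

# Toy data (assumed): irregularly sampled noisy sine.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(t) + 0.3 * rng.standard_normal(t.shape)
filt_mean, filt_var = kalman_filter_ou(t, y, noise_var=0.09)

# Dense check at the last time point, where the filtering distribution
# and the full GP posterior coincide.
K = np.exp(-np.abs(t[:, None] - t[None, :]))
A = K + 0.09 * np.eye(len(t))
k_last = K[:, -1]
dense_mean = k_last @ np.linalg.solve(A, y)
dense_var = K[-1, -1] - k_last @ np.linalg.solve(A, k_last)
```

The filter costs linear time in the number of observations. At earlier time points the filter only conditions on the past, so matching the full GP posterior everywhere would need a backward smoothing pass as well.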

Approximation with variational inference


Approximation with inducing variables
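Until I write this up properly, here is a sketch of the basic idea: summarise n datapoints through m much fewer inducing inputs, so that only m-by-m systems need solving. This uses the classical deterministic training conditional (DTC) predictive equations with fixed, hand-placed inducing inputs. It is not the variational scheme AutoGP uses, and the kernel, inducing locations and toy data are my assumptions.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Unit-variance squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length_scale**2)

def sparse_gp_predict(x, y, z, x_test, noise_var=0.01, jitter=1e-8):
    """DTC predictive mean and variance using inducing inputs z."""
    k_uu = rbf_kernel(z, z) + jitter * np.eye(len(z))
    k_uf = rbf_kernel(z, x)
    k_us = rbf_kernel(z, x_test)
    # The only linear systems are m x m; forming them costs O(n m^2).
    a = noise_var * k_uu + k_uf @ k_uf.T
    mean = k_us.T @ np.linalg.solve(a, k_uf @ y)
    # Variance: prior minus the Nystrom part, plus the inducing posterior part.
    q_diag = np.sum(k_us * np.linalg.solve(k_uu, k_us), axis=0)
    s_diag = noise_var * np.sum(k_us * np.linalg.solve(a, k_us), axis=0)
    var = 1.0 - q_diag + s_diag  # the prior variance k(x, x) is 1 here
    return mean, var

# Toy data (assumed): 500 noisy sine samples, 15 evenly spaced inducing inputs.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 500))
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)
z = np.linspace(0.0, 10.0, 15)
x_test = np.linspace(0.0, 10.0, 100)

mean, var = sparse_gp_predict(x, y, z, x_test, noise_var=0.01)
```

Where to put the inducing inputs is the interesting question. Here they sit on a fixed grid, whereas the variational approaches optimise them along with the kernel hyperparameters.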


Approximation with variational inference and inducing variables

This is the trick that makes AutoGP work (BoKD16). TBD.

Random projection kernel approximation

For now see Kernel approximation.
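As a placeholder, here is a sketch of the best-known example I am aware of, random Fourier features for the squared-exponential kernel: project the inputs through random frequencies so that inner products of the features approximate the kernel. I am assuming this is the kind of method meant by the heading; the function name and parameters are mine.

```python
import numpy as np

def random_fourier_features(X, n_features, length_scale, rng):
    """Random features whose inner products approximate an RBF kernel."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density (a Gaussian).
    W = rng.standard_normal((d, n_features)) / length_scale
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
length_scale = 1.5

phi = random_fourier_features(X, 5000, length_scale, rng)
K_approx = phi @ phi.T

# Exact RBF kernel for comparison.
sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-0.5 * sq_dist / length_scale**2)
max_err = np.max(np.abs(K_approx - K_exact))
```

With the features in hand, GP regression reduces to Bayesian linear regression on `phi`, which costs linear time in the number of datapoints for a fixed number of features.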


This lecture by the late David MacKay is probably good; the man could talk.