The Living Thing / Notebooks : (Convolution) kernel function estimation

A nonparametric method of approximating something from data by assuming that it’s “close” to the data distribution convolved with some kernel.

This is especially popular when the target is a probability density function; then you are working with a kernel density estimator. But you can estimate other functions too.
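To make the "data distribution convolved with some kernel" idea concrete, here is a minimal sketch of a one-dimensional kernel density estimate. It assumes a Gaussian kernel and a hand-picked bandwidth; the function name `gaussian_kde` and the simulated sample are illustrative choices of mine, not anything prescribed above.

```python
import numpy as np

def gaussian_kde(x_eval, data, bandwidth):
    """Evaluate a Gaussian kernel density estimate at the points x_eval."""
    x_eval = np.asarray(x_eval, dtype=float)
    data = np.asarray(data, dtype=float)
    # Scaled distance between every evaluation point and every observation.
    z = (x_eval[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale so that the estimate integrates to one.
    return kernel.mean(axis=1) / bandwidth

# Illustrative data: 500 draws from a standard normal (an assumption).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)
grid = np.linspace(-8.0, 8.0, 1601)
density = gaussian_kde(grid, sample, bandwidth=0.3)

# Riemann-sum check that the estimate is (numerically) a density.
dx = grid[1] - grid[0]
total_mass = density.sum() * dx
```

The all-pairs distance matrix costs memory proportional to the number of evaluation points times the number of observations, which is fine for a sketch but not at scale.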

To learn about:

Estimating Densities

What you are usually doing with this tool.

Bandwidth/kernel selection in density estimation

Bernacchia (BePi11) has a neat hack: “self consistency” for simultaneous kernel and distribution inference, i.e. simultaneous deconvolution and bandwidth selection. The idea is removing bias by using simple spectral methods, thereby estimating a kernel which in a certain sense would generate the data that you just observed. The results look similar to finite-sample corrections for Gaussian scale parameter estimates, but are not quite Gaussian.
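For comparison, and explicitly not the self-consistent method of BePi11, here is a sketch of a much cruder baseline: the common rule-of-thumb bandwidth for a Gaussian kernel, which assumes the data are roughly Gaussian. The function name `silverman_bandwidth` and the simulated sample are my own illustrative choices.

```python
import numpy as np

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth for a Gaussian kernel in one dimension.

    Uses 0.9 * min(sd, IQR / 1.34) * n^(-1/5), which is tuned for
    roughly Gaussian data and tends to oversmooth multimodal data.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    sd = data.std(ddof=1)
    q75, q25 = np.percentile(data, [75, 25])
    # The IQR-based scale guards against heavy tails inflating the sd.
    spread = min(sd, (q75 - q25) / 1.34)
    return 0.9 * spread * n ** (-0.2)

# Illustrative data (an assumption): 1000 standard normal draws.
rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
h = silverman_bandwidth(sample)
```

This sketch does not handle degenerate samples, e.g. ones whose interquartile range is zero.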

Question: could it work with mixture models too?

Mixture models

Where the number of kernels does not grow as fast as the number of data points, this becomes a mixture model; or, if you’d like, kernel density estimates are a limiting case of mixture model estimates.

They are so clearly similar that I think it best we not make them both feel awkward by dithering about where the free parameters are. Anyway, they are filed separately. BaLi13, ZeMe97 and Geer96 discuss some useful things common to various convex combination estimators.
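The similarity can be sketched in code, assuming Gaussian components: the same mixture density function gives a kernel density estimate when there is one equally weighted component per observation, and an ordinary mixture model when there are only a few components. All names and parameter values here are illustrative.

```python
import numpy as np

def gaussian_mixture_pdf(x, weights, means, sds):
    """Density of a one-dimensional Gaussian mixture evaluated at x."""
    x = np.asarray(x, dtype=float)[:, None]
    z = (x - means[None, :]) / sds[None, :]
    components = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sds[None, :])
    # Convex combination of the component densities.
    return components @ weights

rng = np.random.default_rng(1)
data = rng.normal(size=50)
h = 0.4
grid = np.linspace(-4.0, 4.0, 201)

# KDE: one component per observation, equal weights, shared scale h.
kde_as_mixture = gaussian_mixture_pdf(
    grid,
    weights=np.full(data.size, 1.0 / data.size),
    means=data,
    sds=np.full(data.size, h),
)

# Mixture model: a fixed, small number of components with free parameters
# (the values here are made up rather than fitted).
small_mixture = gaussian_mixture_pdf(
    grid,
    weights=np.array([0.5, 0.5]),
    means=np.array([-1.0, 1.0]),
    sds=np.array([0.8, 0.8]),
)
```

The difference is only in where the free parameters live: the KDE fixes means at the data and tunes one bandwidth, while the mixture model fits weights, means and scales.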

Does this work with uncertain point locations?

The fact that we can write the kernel density estimate as the convolution of the kernel with a sum of Dirac deltas at the data points immediately suggests that we could convolve it with something else instead, such as Gaussians. Can we recover well-behaved estimates in that case? This would be a kind of hierarchical model, possibly a typical Bayesian one.
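One reading of this, sketched under strong assumptions: if both the kernel and the per-point location uncertainty are Gaussian, then replacing each Dirac delta with a Gaussian of standard deviation `noise_sds[i]` just inflates that point's kernel, since convolving two Gaussians adds their variances. The function name and the simulated inputs are hypothetical, and this says nothing about whether the resulting estimate is well-behaved in any statistical sense.

```python
import numpy as np

def kde_with_location_noise(x_eval, locations, noise_sds, bandwidth):
    """Gaussian KDE where each point location has Gaussian uncertainty."""
    x_eval = np.asarray(x_eval, dtype=float)
    locations = np.asarray(locations, dtype=float)
    noise_sds = np.asarray(noise_sds, dtype=float)
    # Gaussian kernel convolved with Gaussian noise: variances add.
    total_sd = np.sqrt(bandwidth**2 + noise_sds**2)
    z = (x_eval[:, None] - locations[None, :]) / total_sd[None, :]
    components = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * total_sd[None, :])
    return components.mean(axis=1)

# Illustrative inputs (assumptions): noisy locations with known noise scales.
rng = np.random.default_rng(2)
locations = rng.normal(size=100)
noise_sds = rng.uniform(0.1, 0.5, size=100)
grid = np.linspace(-8.0, 8.0, 1601)
density = kde_with_location_noise(grid, locations, noise_sds, bandwidth=0.3)
```

With all `noise_sds` set to zero this reduces to the ordinary Gaussian kernel density estimate.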

Does this work with asymmetric kernels?

Almost all the kernel estimates I’ve seen require the kernels to be symmetric, because of Cline’s argument that asymmetric kernels are inadmissible in the class of all (possibly multivariate) densities. Presumably this implies \(\mathcal{C}_1\) distributions, i.e. once-differentiable ones. In particular, admissible kernels are those which have “nonnegative Fourier transforms bounded by 1”, which implies symmetry about the origin, since a real-valued kernel has a real-valued Fourier transform only if it is even. If we have an a priori constrained class of densities, this might not apply.
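The link between the Fourier-transform condition and symmetry can be checked numerically. This sketch compares a Gaussian kernel with a one-sided exponential kernel; both are my own choice of examples. It only illustrates that an asymmetric kernel has a Fourier transform with a nonzero imaginary part, so it cannot be "nonnegative" in the required sense; it does not reproduce Cline's admissibility argument.

```python
import numpy as np

def fourier_transform(kernel_values, x, t):
    """Approximate the integral of K(x) exp(-i t x) dx by the trapezoid rule."""
    dx = x[1] - x[0]
    integrand = kernel_values[None, :] * np.exp(-1j * t[:, None] * x[None, :])
    endpoints = integrand[:, 0] + integrand[:, -1]
    return dx * (integrand.sum(axis=1) - 0.5 * endpoints)

x = np.linspace(-40.0, 40.0, 80001)
t = np.linspace(-5.0, 5.0, 21)

# Symmetric kernel: standard Gaussian density.
symmetric = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
# Asymmetric kernel: exponential density supported on x >= 0.
asymmetric = np.where(x >= 0.0, np.exp(-np.abs(x)), 0.0)

ft_symmetric = fourier_transform(symmetric, x, t)    # close to exp(-t^2 / 2)
ft_asymmetric = fourier_transform(asymmetric, x, t)  # close to 1 / (1 + i t)
```

The Gaussian's transform is real, nonnegative and bounded by 1; the exponential's transform has an imaginary part of size up to one half.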

Estimating general functions

i.e. not densities. Keyword: Nadaraya-Watson estimator.

For regression this is generally not incredibly popular, but sometimes it is what you need, e.g. conditional intensity point processes, some DSP resampling problems. You can try more sophisticated things like loess or smoothing splines. I won’t stop you. Try using the phrase Sobolev space while you do so; you’ll feel ever so clever. Regardless, at this point it becomes a spline basis method, so read on over there for some extra details.
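For reference, a minimal sketch of the Nadaraya-Watson estimator, assuming a Gaussian kernel and a fixed bandwidth: the regression estimate at each point is a kernel-weighted average of the observed responses. The function name, the simulated sine-curve data and the bandwidth are illustrative assumptions.

```python
import numpy as np

def nadaraya_watson(x_eval, x_obs, y_obs, bandwidth):
    """Kernel-weighted local average of y_obs at each point of x_eval."""
    x_eval = np.asarray(x_eval, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    z = (x_eval[:, None] - x_obs[None, :]) / bandwidth
    # Unnormalised Gaussian weights; the normalising constant cancels
    # in the ratio below.
    weights = np.exp(-0.5 * z**2)
    return (weights @ y_obs) / weights.sum(axis=1)

# Illustrative data (an assumption): a noisy sine curve.
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 2.0 * np.pi, size=400)
y_obs = np.sin(x_obs) + 0.1 * rng.normal(size=400)
grid = np.linspace(1.0, 5.0, 50)
fitted = nadaraya_watson(grid, x_obs, y_obs, bandwidth=0.2)
```

The evaluation grid stays away from the ends of the data range because this estimator is biased near boundaries. Far from any observation the weights can underflow to zero, which this sketch does not guard against.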

Connection to rate estimation

The Ogata special case of the Lamperti transform gives a suggestive connection to rate estimation as a kind of inverse-method kernel estimation. See change-of-time.

Fast Gauss Transform and Fast multipole methods

How to make these methods computationally feasible at scale. See Fast Gauss Transform and other related fast multipole methods.
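To show why the naive approach needs help, here is a sketch that compares direct evaluation, which costs one kernel evaluation per pair of grid point and observation, with a cheaper approximation. Note that the approximation here is simple binning followed by FFT convolution, which is a different and cruder technique than the Fast Gauss Transform or fast multipole methods; it is swapped in because it fits in a few lines. Function names, grid limits and bin counts are my own assumptions.

```python
import numpy as np

def naive_kde(grid, data, h):
    """Direct Gaussian KDE: cost grows as len(grid) * len(data)."""
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (data.size * h * np.sqrt(2.0 * np.pi))

def binned_fft_kde(data, h, lo, hi, n_bins=2048):
    """Approximate Gaussian KDE: histogram the data, then convolve via FFT."""
    counts, edges = np.histogram(data, bins=n_bins, range=(lo, hi))
    dx = edges[1] - edges[0]
    centers = edges[:-1] + 0.5 * dx
    # Kernel sampled at every possible offset between two bin centres.
    offsets = np.arange(-(n_bins - 1), n_bins) * dx
    kernel = np.exp(-0.5 * (offsets / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    # Zero-padded FFTs give the full linear (not circular) convolution.
    size = n_bins + kernel.size - 1
    conv = np.fft.irfft(np.fft.rfft(counts, size) * np.fft.rfft(kernel, size), size)
    # Entry i + n_bins - 1 of the convolution lines up with bin centre i.
    density = conv[n_bins - 1 : 2 * n_bins - 1] / data.size
    return centers, density

rng = np.random.default_rng(3)
data = rng.normal(size=2000)
centers, fast = binned_fft_kde(data, h=0.3, lo=-8.0, hi=8.0)
slow = naive_kde(centers, data, h=0.3)
max_abs_error = np.max(np.abs(fast - slow))
```

Observations outside `[lo, hi]` are silently dropped by the histogram, and the binning error shrinks as the bin width becomes small relative to the bandwidth.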


Bashtannyk, D. M., & Hyndman, R. J. (2001) Bandwidth selection for kernel conditional density estimation. Computational Statistics & Data Analysis, 36(3), 279–298.
Battey, H., & Liu, H. (2013) Smooth projected density estimation. arXiv:1308.3968 [Stat].
Bernacchia, A., & Pigolotti, S. (2011) Self-consistent method for density estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 407–422.
Botev, Z. I., Grotowski, J. F., & Kroese, D. P. (2010) Kernel density estimation via diffusion. The Annals of Statistics, 38(5), 2916–2957.
Cline, D. B. H. (1988) Admissible Kernel Estimators of a Multivariate Density. The Annals of Statistics, 16(4), 1421–1427.
Crisan, D., & Míguez, J. (2014) Particle-kernel estimation of the filter density in state-space models. Bernoulli, 20(4), 1879–1929.
Doosti, H., & Hall, P. (2015) Making a non-parametric density estimator more attractive, and more accurate, by data perturbation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(2), 445–462.
Ellis, S. P. (1991) Density estimation for point processes. Stochastic Processes and Their Applications, 39(2), 345–358.
Geenens, G. (2014) Probit Transformation for Kernel Density Estimation on the Unit Interval. Journal of the American Statistical Association, 109(505), 346–358.
Gisbert, F. J. G. (2003) Weighted samples, kernel density estimators and convergence. Empirical Economics, 28(2), 335–351.
Hall, P. (1987) On Kullback-Leibler Loss and Density Estimation. The Annals of Statistics, 15(4), 1491–1519.
Hall, P., & Park, B. U. (2002) New Methods for Bias Correction at Endpoints and Boundaries. The Annals of Statistics, 30(5), 1460–1479.
Ibragimov, I. (2001) Estimation of analytic functions. In Institute of Mathematical Statistics Lecture Notes - Monograph Series (pp. 359–383). Beachwood, OH: Institute of Mathematical Statistics.
Koenker, R., & Mizera, I. (2006) Density estimation by total variation regularization. Advances in Statistical Modeling and Inference, 613–634.
Malec, P., & Schienle, M. (2014) Nonparametric kernel density estimation near the boundary. Computational Statistics & Data Analysis, 72, 57–76.
Marshall, J. C., & Hazelton, M. L. (2010) Boundary kernels for adaptive density estimators on regions with irregular boundaries. Journal of Multivariate Analysis, 101(4), 949–963.
O’Brien, T. A., Kashinath, K., Cavanaugh, N. R., Collins, W. D., & O’Brien, J. P. (2016) A fast and objective multidimensional kernel density estimation method: fastKDE. Computational Statistics & Data Analysis, 101, 148–160.
Panaretos, V. M., & Konis, K. (2012) Nonparametric Construction of Multivariate Kernels. Journal of the American Statistical Association, 107(499), 1085–1095.
Park, B. U., Jeong, S.-O., Jones, M. C., & Kang, K.-H. (2003) Adaptive variable location kernel density estimators with good performance at boundaries. Journal of Nonparametric Statistics, 15(1), 61–75.
Silverman, B. W. (1982) On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method. The Annals of Statistics, 10(3), 795–810.
Smith, E., & Lewicki, M. S. (2005) Efficient Coding of Time-Relative Structure Using Spikes. Neural Computation, 17(1), 19–45.
van de Geer, S. (1996) Rates of convergence for the maximum likelihood estimator in mixture models. Journal of Nonparametric Statistics, 6(4), 293–310.
Wang, B., & Wang, X. (2007) Bandwidth selection for weighted kernel density estimation. arXiv Preprint arXiv:0709.1616.
Wen, K., & Wu, X. (2015) An Improved Transformation-Based Kernel Estimator of Densities on the Unit Interval. Journal of the American Statistical Association, 110(510), 773–783.
Zeevi, A. J., & Meir, R. (1997) Density Estimation Through Convex Combinations of Densities: Approximation and Estimation Bounds. Neural Networks: The Official Journal of the International Neural Network Society, 10(1), 99–109.
Zhang, S., & Karunamuni, R. J. (2010) Boundary performance of the beta kernel estimators. Journal of Nonparametric Statistics, 22(1), 81–104.