Kernel in the sense of the “kernel trick”.
Not to be confused with density-estimation-type
convolution kernels,
nor the dozens of related-but-slightly-different
clashing definitions of *kernel*;
those can have their own pages.

Kernel tricks use not-necessarily-Euclidean “reproducing” kernels, aka Mercer kernels (Merc09), to implicitly define convenient Hilbert “feature” spaces for your purpose. Alternatively, you might like to make your Hilbert space basis explicit by doing a basis transform, or by taking your implicit feature map and approximating it, but you don't need to. In fact, the implicit space induced by your reproducing kernel might (in general will) look odd indeed: something with no finite-dimensional representation. That's the “trick” part.
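To make the implicit-versus-explicit distinction concrete, here is a minimal sketch using the homogeneous degree-2 polynomial kernel, whose feature space happens to be finite-dimensional so that both routes can be computed and compared. The function names `poly2_kernel` and `poly2_features` are made up for this illustration.

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

def poly2_features(x):
    """Explicit feature map: all pairwise products x_i * x_j, flattened."""
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

# Implicit route: one inner product in the 3-dimensional input space.
implicit = poly2_kernel(x, y)
# Explicit route: an inner product in the 9-dimensional feature space.
explicit = np.dot(poly2_features(x), poly2_features(y))
```

The two numbers agree up to floating-point error, but the implicit route never builds the feature vector. For kernels such as the Gaussian RBF there is no finite feature vector to build at all, so the implicit route is the only exact one.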

TODO: clear explanation, less blather. Until then, see ScSm02, which is a very well-written textbook covering an unbelievable amount of ground without pausing to make a fuss, or MaAm15, which is more narrowly focussed on just the Mercer-kernel part, or ChLi09 for an approximation-theory perspective.

Spoiler: you upgrade your old boring linear algebra on finite (usually low-) dimensional spaces to sexy new curvy algebra on potentially-infinite-dimensional manifolds, which still has a low-dimensional representation. Or, if you'd like, you apply statistical learning methods based on things with an obvious finite vector space representation to things without one (sentences, piano-rolls).

With smallish data, you have lots of sexy guarantees and clever models. Practically, kernel methods have problems with scalability to large data sets; problems, for example, that afflict even little old me with a modest number, $N$, of data points. Since the Gram matrix of inner products does not in general admit an accurate representation in fewer than $\mathcal{O}(N^2)$ dimensions, or inversion in less than $\mathcal{O}(N^3)$ time by standard direct methods, you basically can't handle big data.
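To see where the cost comes from, here is a minimal sketch of kernel ridge regression with a Gaussian RBF kernel on N data points. The helper names, the regulariser `lam` and the toy sine data are all assumptions made for illustration. The N × N Gram matrix is the quadratic memory cost, and the dense linear solve is the cubic time cost.

```python
import numpy as np

def rbf_gram(X, Z, lengthscale=1.0):
    """Gram matrix K[i, j] = exp(-||X[i] - Z[j]||^2 / (2 * lengthscale^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))

def kernel_ridge_fit(X, y, lam=1e-3, lengthscale=1.0):
    """Solve (K + lam * I) alpha = y for the dual weights alpha."""
    K = rbf_gram(X, X, lengthscale)  # N x N entries: O(N^2) memory
    # Dense solve: O(N^3) time
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new, lengthscale=1.0):
    """Predict f(x) = sum_i alpha_i * k(x, x_i)."""
    return rbf_gram(X_new, X_train, lengthscale) @ alpha

# Toy problem: recover a sine wave from N noise-free samples.
N = 100
X = np.linspace(0.0, 2.0 * np.pi, N)[:, None]
y = np.sin(X[:, 0])
alpha = kernel_ridge_fit(X, y)
X_new = np.linspace(0.5, 5.5, 25)[:, None]
y_hat = kernel_ridge_predict(X, alpha, X_new)
```

At N = 100 this is instant. Doubling N roughly quadruples the memory for `K` and multiplies the solve time by about eight, which is the scalability problem in miniature.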

OTOH, see the *inducing set* methods and the
random-projection inversions,
which make this, in principle, more tractable for,
e.g., Gaussian process learning.
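One well-known randomised approximation is the random Fourier feature construction of RaRe07. The sketch below assumes a Gaussian RBF kernel; the function names and the choice of 10000 features are arbitrary illustrative choices, not taken from any particular library. It replaces the N × N Gram matrix with an N × D feature matrix whose inner products approximate the kernel.

```python
import numpy as np

def rbf_gram(X, Z, lengthscale=1.0):
    """Exact Gram matrix of the Gaussian RBF kernel."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))

def random_fourier_features(X, n_features, lengthscale=1.0, rng=None):
    """Map X (N x d) to features (N x D) with E[phi(x) . phi(y)] = k(x, y)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Frequencies drawn from the spectral density of the RBF kernel.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    # Random phases, uniform on [0, 2 * pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Phi = random_fourier_features(X, n_features=10_000, rng=1)
K_exact = rbf_gram(X, X)
K_approx = Phi @ Phi.T
max_err = np.max(np.abs(K_approx - K_exact))
```

Downstream linear methods can then work with `Phi` directly, at a cost that grows linearly rather than quadratically in N. The approximation error shrinks roughly like one over the square root of the number of features.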

The oft-cited grandfather of all the reproducing kernel stuff is Aronszajn's 1950 work (Aron50); see also Mercer (Merc09). It didn't percolate into machine learning for decades, though.

I'm especially interested in

- nonparametric kernel independence tests
- efficient kernel pre-image approximation
- connection between kernel PCA and clustering (SKSB98 and Will01)
- kernel regression with RBFs
- kernel layers in neural networks

## Kernel design

Automating kernel design involves some weird hacks. See the Automatic Statistician project by David Duvenaud, James Robert Lloyd, Roger Grosse and colleagues. The AutoGP approach of Krauth, Bonilla et al. (KBCF16) also uses gradient descent to design kernels for Gaussian processes. For traditionalists, one of the Automatic Statisticians has written a page on doing kernel design by hand. See GSFT12 for a mind-melting compositional kernel diagram.

Grosse, Salakhutdinov, Freeman and Tenenbaum (GSFT12):

“Examples of existing machine learning models which fall under our framework. Arrows represent models reachable using a single production rule. Only a small fraction of the 2496 models reachable within 3 steps are shown, and not all possible arrows are shown.”
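Compositional kernel design rests on a closure property: sums and pointwise products of positive semi-definite kernels are again positive semi-definite kernels. The following is a minimal sketch of that idea on one-dimensional inputs. The base kernels and the `add`/`mul` combinators are illustrative assumptions, not the grammar used by any of the cited projects.

```python
import numpy as np

def rbf(x, z, lengthscale=1.0):
    """Gaussian RBF kernel on 1-D inputs."""
    return np.exp(-((x[:, None] - z[None, :]) ** 2) / (2.0 * lengthscale**2))

def periodic(x, z, period=1.0, lengthscale=1.0):
    """Standard periodic kernel on 1-D inputs."""
    s = np.sin(np.pi * np.abs(x[:, None] - z[None, :]) / period)
    return np.exp(-2.0 * s**2 / lengthscale**2)

def linear(x, z):
    """Linear kernel on 1-D inputs."""
    return x[:, None] * z[None, :]

def add(k1, k2):
    """Sum of two kernels (again a valid kernel)."""
    return lambda x, z: k1(x, z) + k2(x, z)

def mul(k1, k2):
    """Pointwise product of two kernels (again a valid kernel)."""
    return lambda x, z: k1(x, z) * k2(x, z)

# A smooth trend plus a periodic component whose amplitude grows linearly.
composite = add(rbf, mul(linear, periodic))

x = np.linspace(0.0, 5.0, 40)
K = composite(x, x)
# The Gram matrix should be symmetric with no significantly negative eigenvalues.
min_eig = np.linalg.eigvalsh(K).min()
```

An automated search over kernel structures can be thought of as exploring expressions built from such combinators and scoring each one, for example by marginal likelihood.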

Alex Smola (who, with Bernhard Schölkopf, has his name on a terrifying proportion of publications in this area) also has all his publications online.

## Non-scalar kernels

Operator-valued kernels (Micchelli & Pontil 2005, MiPo05b) generalise scalar-valued kernels to vector-valued functions, as seen in multi-task learning (EvMP05).
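Perhaps the simplest operator-valued kernel is the separable one, K(x, x') = k(x, x') B, where k is a scalar kernel and B is a positive semi-definite matrix coupling the output tasks. Below is a minimal sketch under that assumption; the names `separable_gram` and `B`, and the random toy data, are hypothetical choices for illustration.

```python
import numpy as np

def rbf_gram(X, Z, lengthscale=1.0):
    """Gram matrix of the scalar Gaussian RBF kernel."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))

def separable_gram(X, B, lengthscale=1.0):
    """(N*T) x (N*T) Gram matrix of the matrix-valued kernel k(x, x') * B.

    Block (i, j) of the result is the T x T matrix k(x_i, x_j) * B.
    """
    return np.kron(rbf_gram(X, X, lengthscale), B)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # N = 5 inputs in 2 dimensions
A = rng.normal(size=(3, 3))
B = A @ A.T                   # T = 3 tasks; A @ A.T is PSD by construction

G = separable_gram(X, B)
block_01 = G[0:3, 3:6]        # coupling between inputs 0 and 1 across tasks
```

With B equal to the identity the tasks decouple into independent scalar problems. Off-diagonal entries of B are what let information flow between tasks.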

## Kernel approximation

See kernel approximation.

## RKHS distribution embedding

See that page.

## Refs

- KiWa70: George S. Kimeldorf, Grace Wahba (1970) A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines. *The Annals of Mathematical Statistics*, 41(2), 495–502.
- ChLi09: Elliott Ward Cheney, William Allan Light (2009) *A Course in Approximation Theory*. American Mathematical Soc.
- ScHS01: Bernhard Schölkopf, Ralf Herbrich, Alex J. Smola (2001) A Generalized Representer Theorem. In Computational Learning Theory (pp. 416–426). Springer Berlin Heidelberg.
- GFTS08: Arthur Gretton, Kenji Fukumizu, Choon Hui Teo, Le Song, Bernhard Schölkopf, Alexander J Smola (2008) A Kernel Statistical Test of Independence. In Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference. Cambridge, MA: MIT Press.
- Kail71a: T. Kailath (1971a) A note on least-squares estimation by the innovations method. In 1971 IEEE Conference on Decision and Control (pp. 407–411).
- VeTS04: Jean-Philippe Vert, Koji Tsuda, Bernhard Schölkopf (2004) A primer on kernel methods. In Kernel Methods in Computational Biology. MIT Press.
- MaAm15: Jonathan H. Manton, Pierre-Olivier Amblard (2015) A Primer on Reproducing Kernel Hilbert Spaces. *Foundations and Trends® in Signal Processing*, 8(1–2), 1–126.
- DeGL96: Luc Devroye, László Györfi, Gábor Lugosi (1996) *A probabilistic theory of pattern recognition*. New York: Springer.
- XPPP08: Jian-Wu Xu, A.R.C. Paiva, Il Park, J.C. Principe (2008) A Reproducing Kernel Hilbert Space Framework for Information-Theoretic Learning. *IEEE Transactions on Signal Processing*, 56(12), 5891–5902.
- LLHE08: Zhengdong Lu, Todd K. Leen, Yonghong Huang, Deniz Erdogmus (2008) A Reproducing Kernel Hilbert Space Framework for Pairwise Time Series Distances. In Proceedings of the 25th International Conference on Machine Learning (pp. 624–631). New York, NY, USA: ACM.
- ScSm03: Bernhard Schölkopf, Alexander J. Smola (2003) A Short Introduction to Learning with Kernels. In Advanced Lectures on Machine Learning (pp. 41–64). Springer Berlin Heidelberg.
- WeSi78: H. Weinert, G. Sidhu (1978) A stochastic framework for recursive computation of spline functions–Part I: Interpolating splines. *IEEE Transactions on Information Theory*, 24(1), 45–50.
- SmSc04: Alex J. Smola, Bernhard Schölkopf (2004) A tutorial on support vector regression. *Statistics and Computing*, 14(3), 199–222.
- KlRB10: Marius Kloft, Ulrich Rückert, Peter L. Bartlett (2010) A Unifying View of Multiple Kernel Learning. In Machine Learning and Knowledge Discovery in Databases (pp. 66–81). Springer Berlin Heidelberg.
- Kail74: T. Kailath (1974) A view of three decades of linear filtering theory. *IEEE Transactions on Information Theory*, 20(2), 146–181.
- KaGe71: T. Kailath, R. Geesey (1971) An innovations approach to least squares estimation–Part IV: Recursive estimation given lumped covariance functions. *IEEE Transactions on Automatic Control*, 16(6), 720–727.
- KaGe73: T. Kailath, R. Geesey (1973) An innovations approach to least-squares estimation–Part V: Innovations representations and recursive estimation in colored noise. *IEEE Transactions on Automatic Control*, 18(5), 435–453.
- GeKa73: M. Gevers, T. Kailath (1973) An innovations approach to least-squares estimation–Part VI: Discrete-time innovations representations and recursive estimation. *IEEE Transactions on Automatic Control*, 18(6), 588–600.
- AaKa73: H. Aasnaes, T. Kailath (1973) An innovations approach to least-squares estimation–Part VII: Some applications of vector autoregressive-moving average models. *IEEE Transactions on Automatic Control*, 18(6), 601–607.
- MMRT01: K. Muller, S. Mika, G. Ratsch, K. Tsuda, Bernhard Scholkopf (2001) An introduction to kernel-based learning algorithms. *IEEE Transactions on Neural Networks*, 12(2), 181–201.
- KaDu72: T. Kailath, D. Duttweiler (1972) An RKHS approach to detection and estimation problems–III: Generalized innovations representations and a likelihood-ratio formula. *IEEE Transactions on Information Theory*, 18(6), 730–745.
- KaWe75: T. Kailath, H. Weinert (1975) An RKHS approach to detection and estimation problems–II: Gaussian signal detection. *IEEE Transactions on Information Theory*, 21(1), 15–23.
- Jung13: Alexander Jung (2013) An RKHS Approach to Estimation with Sparsity Constraints. In Advances in Neural Information Processing Systems 29.
- McEl11: Brian McFee, Daniel PW Ellis (2011) Analyzing song structure with spectral clustering. In IEEE conference on Computer Vision and Pattern Recognition (CVPR).
- KBCF16: Karl Krauth, Edwin V. Bonilla, Kurt Cutajar, Maurizio Filippone (2016) AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models. In UAI17.
- LDGT14: James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani (2014) Automatic Construction and Natural-Language Description of Nonparametric Regression Models. *ArXiv:1402.4304 [Cs, Stat]*.
- LjKa76: L. Ljung, T. Kailath (1976) Backwards Markovian models for second-order stochastic processes (Corresp). *IEEE Transactions on Information Theory*, 22(4), 488–491.
- SzRi09: Gábor J. Székely, Maria L. Rizzo (2009) Brownian distance covariance. *The Annals of Applied Statistics*, 3(4), 1236–1265.
- YCSS13: Yaoliang Yu, Hao Cheng, Dale Schuurmans, Csaba Szepesvári (2013) Characterizing the representer theorem. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 570–578).
- Gent02: Marc G. Genton (2002) Classes of Kernels for Machine Learning: A Statistics Perspective. *Journal of Machine Learning Research*, 2, 299–312.
- SMFP15: Bernhard Schölkopf, Krikamol Muandet, Kenji Fukumizu, Jonas Peters (2015) Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations. *ArXiv:1501.06794 [Cs, Stat]*.
- CoDu02: Michael Collins, Nigel Duffy (2002) Convolution Kernels for Natural Language. In Advances in Neural Information Processing Systems 14 (pp. 625–632). MIT Press.
- Haus99: David Haussler (1999) Convolution kernels on discrete structures. Technical report, UC Santa Cruz.
- LNCS16: David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, Léon Bottou (2016) Discovering Causal Signals in Images. *ArXiv:1605.08179 [Cs, Stat]*.
- MaBe17: Siyuan Ma, Mikhail Belkin (2017) Diving into the shallows: a computational perspective on large-scale shallow learning. *ArXiv:1703.10622 [Cs, Stat]*.
- VeZi12: A. Vedaldi, A. Zisserman (2012) Efficient Additive Kernels via Explicit Feature Maps. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 34(3), 480–492.
- YaDD04: Changjiang Yang, Ramani Duraiswami, Larry S. Davis (2004) Efficient kernel machines using the improved fast Gauss transform. In Advances in neural information processing systems (pp. 1561–1568).
- GSFT12: Roger Grosse, Ruslan R. Salakhutdinov, William T. Freeman, Joshua B. Tenenbaum (2012) Exploiting compositionality to explore a large space of model structures. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
- Bach00: Francis Bach (n.d.) Exploring large feature spaces with hierarchical multiple kernel learning. In Advances in Neural Information Processing Systems (NIPS (p. 2008).
- AlSH04: Yasemin Altun, Alex J. Smola, Thomas Hofmann (2004) Exponential Families for Conditional Random Fields. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 2–9). Arlington, Virginia, United States: AUAI Press.
- SKSB98: Bernhard Schölkopf, Phil Knirsch, Alex Smola, Chris Burges (1998) Fast Approximation of Support Vector Kernel Expansions, and an Interpretation of Clustering as Approximation in Feature Spaces. In Mustererkennung 1998 (pp. 125–132). Springer Berlin Heidelberg.
- CuSS08: John P. Cunningham, Krishna V. Shenoy, Maneesh Sahani (2008) Fast Gaussian process methods for point process intensity estimation. (pp. 192–199). ACM Press.
- AlMa14: Ahmed El Alaoui, Michael W. Mahoney (2014) Fast Randomized Kernel Methods With Statistical Guarantees. *ArXiv:1411.0306 [Cs, Stat]*.
- LaSH03: Neil Lawrence, Matthias Seeger, Ralf Herbrich (2003) Fast sparse Gaussian process methods: The informative vector machine. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems (pp. 609–616).
- Merc09: J. Mercer (1909) Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations. *Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character*, 209(441–458), 415–446.
- WiAd13: Andrew Gordon Wilson, Ryan Prescott Adams (2013) Gaussian Process Kernels for Pattern Discovery and Extrapolation. *ArXiv:1302.4245 [Cs, Stat]*.
- Burg98: C. J. C. Burges (1998) Geometry and Invariance in Kernel Based Methods. In Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press.
- VSKB10: S. V. N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt (2010) Graph Kernels. *Journal of Machine Learning Research*, 11, 1201–1242.
- FLRP13: Derry FitzGerald, Antoine Liutkus, Zafar Rafii, Bryan Pardo, Laurent Daudet (2013) Harmonic/percussive separation using kernel additive modelling. In Irish Signals & Systems Conference 2014 and 2014 China-Ireland International Conference on Information and Communications Technologies (ISSC 2014/CIICT 2014). 25th IET (pp. 35–40). IET.
- WaSC06: C. Walder, B. Schölkopf, O. Chapelle (2006) Implicit Surface Modelling with a Globally Regularised Basis of Compact Support. *Computer Graphics Forum*, 25(3), 635–644.
- YDGD03: Changjiang Yang, Ramani Duraiswami, Nail A. Gumerov, Larry Davis (2003) Improved Fast Gauss Transform and Efficient Kernel Density Estimation. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2 (pp. 464–). Washington, DC, USA: IEEE Computer Society.
- SGFL08: B. K. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, B. Schölkopf (2008) Injective Hilbert Space Embeddings of Probability Measures. In Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008).
- SMBK99: Bernhard Schölkopf, Sebastian Mika, Chris J. C. Burges, Philipp Knirsch, Klaus-Robert Müller, Gunnar Rätsch, Alexander J. Smola (1999) Input Space Versus Feature Space in Kernel-Based Methods. *IEEE Transactions on Neural Networks*, 10, 1000–1017.
- MFSS17: Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Bernhard Schölkopf (2017) Kernel Mean Embedding of Distributions: A Review and Beyond. *Foundations and Trends® in Machine Learning*, 10(1–2), 1–141.
- MFSG14: Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Arthur Gretton, Bernhard Schölkopf (2014) Kernel Mean Shrinkage Estimators. *ArXiv:1405.5505 [Cs, Stat]*.
- KoCM08: Leonid (Aryeh) Kontorovich, Corinna Cortes, Mehryar Mohri (2008) Kernel methods for learning languages. *Theoretical Computer Science*, 405(3), 223–236.
- HoSS08: Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola (2008) Kernel methods in machine learning. *The Annals of Statistics*, 36(3), 1171–1220.
- ScSM97: Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller (1997) Kernel principal component analysis. In Artificial Neural Networks - ICANN’97 (pp. 583–588). Springer Berlin Heidelberg.
- LRPF14: Antoine Liutkus, Zafar Rafii, Bryan Pardo, Derry Fitzgerald, Laurent Daudet (2014) Kernel spectrogram models for source separation. (pp. 6–10). IEEE.
- ZPJS12: Kun Zhang, Jonas Peters, Dominik Janzing, Bernhard Schölkopf (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. *ArXiv:1202.3775 [Cs, Stat]*.
- DaFG14: Somayeh Danafar, Kenji Fukumizu, Faustino Gomez (2014) Kernel-based Information Criterion. *ArXiv:1408.5810 [Stat]*.
- ClFW06: Alexander Clark, Christophe Costa Florêncio, Chris Watkins (2006) Languages as Hyperplanes: Grammatical Inference with String Kernels. In Machine Learning: ECML 2006 (pp. 90–101). Springer Berlin Heidelberg.
- ZFGS16: Qinyi Zhang, Sarah Filippi, Arthur Gretton, Dino Sejdinovic (2016) Large-Scale Kernel Methods for Independence Testing. *ArXiv:1606.07892 [Stat]*.
- GlLi16: Amir Globerson, Roi Livni (2016) Learning Infinite-Layer Networks: Beyond the Kernel Trick. *ArXiv:1606.05316 [Cs]*.
- KoCM06: Leonid Kontorovich, Corinna Cortes, Mehryar Mohri (2006) Learning Linearly Separable Languages. In Algorithmic Learning Theory (pp. 288–303). Springer Berlin Heidelberg.
- EvMP05: Theodoros Evgeniou, Charles A. Micchelli, Massimiliano Pontil (2005) Learning Multiple Tasks with Kernel Methods. *Journal of Machine Learning Research*, 6(Apr), 615–637.
- HeBu14: Markus Heinonen, Florence d’Alché-Buc (2014) Learning nonparametric differential equations with operator-valued kernels and gradient matching. *ArXiv:1411.5172 [Cs, Stat]*.
- MiPo05a: Charles A. Micchelli, Massimiliano Pontil (2005a) Learning the Kernel Function via Regularization. *Journal of Machine Learning Research*, 6(Jul), 1099–1125.
- BaZT04: Gökhan H. Bakır, Alexander Zien, Koji Tsuda (2004) Learning to Find Graph Pre-images. In Pattern Recognition (pp. 253–261). Springer Berlin Heidelberg.
- ZhZS13: Ke Zhou, Hongyuan Zha, Le Song (2013) Learning triggering kernels for multi-dimensional Hawkes processes. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 1301–1309).
- ScSm02: Bernhard Schölkopf, Alexander J. Smola (2002) *Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond*. MIT Press.
- WuZh08: Qiang Wu, Ding-Xuan Zhou (2008) Learning with sample dependent hypothesis spaces. *Computers & Mathematics with Applications*, 56(11), 2896–2907.
- SzRB07: Gábor J. Székely, Maria L. Rizzo, Nail K. Bakirov (2007) Measuring and testing dependence by correlation of distances. *The Annals of Statistics*, 35(6), 2769–2794.
- PoGi90: T. Poggio, F. Girosi (1990) Networks for approximation and learning. *Proceedings of the IEEE*, 78(9), 1481–1497.
- SeDK75: A. Segall, M. Davis, T. Kailath (1975) Nonlinear filtering with counting observations. *IEEE Transactions on Information Theory*, 21(2), 143–149.
- ShBG16: Yanning Shen, Brian Baingana, Georgios B. Giannakis (2016) Nonlinear Structural Vector Autoregressive Models for Inferring Effective Brain Network Connectivity. *ArXiv:1610.06551 [Stat]*.
- YLMJ12: Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou (2012) Nyström method vs random fourier features: A theoretical and empirical comparison. In Advances in neural information processing systems (pp. 476–484).
- Will01: Christopher K. I. Williams (2001) On a Connection between Kernel PCA and Metric Multidimensional Scaling. In Advances in Neural Information Processing Systems 13 (Vol. 46, pp. 675–681). MIT Press.
- SmSc98: A. J. Smola, B. Schölkopf (1998) On a Kernel-Based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion. *Algorithmica*, 22(1–2), 211–231.
- MiPo05b: Charles A. Micchelli, Massimiliano Pontil (2005b) On Learning Vector-Valued Functions. *Neural Computation*, 17(1), 177–204.
- Bach15: Francis Bach (2015) On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions. *ArXiv Preprint ArXiv:1502.06800*.
- BaIS17: Arturs Backurs, Piotr Indyk, Ludwig Schmidt (2017) On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks. *ArXiv:1704.02958 [Cs, Stat]*.
- CuSm02: Felipe Cucker, Steve Smale (2002) On the mathematical foundations of learning. *Bulletin of the American Mathematical Society*, 39(1), 1–49.
- DrMa05: Petros Drineas, Michael W. Mahoney (2005) On the Nyström method for approximating a Gram matrix for improved kernel-based learning. *Journal of Machine Learning Research*, 6, 2153–2175.
- SeKa76: A. Segall, T. Kailath (1976) Orthogonal functionals of independent-increment processes. *IEEE Transactions on Information Theory*, 22(3), 287–298.
- KWSR16: Alec Koppel, Garrett Warnell, Ethan Stump, Alejandro Ribeiro (2016) Parsimonious Online Learning with Kernels via Sparse Projections in Function Space. *ArXiv:1612.04111 [Cs, Stat]*.
- CFWS06: Alexander Clark, Christophe Costa Florêncio, Chris Watkins, Mariette Serayet (2006) Planar Languages and Learnability. In Grammatical Inference: Algorithms and Applications (pp. 148–160). Springer Berlin Heidelberg.
- FlTS16: Seth Flaxman, Yee Whye Teh, Dino Sejdinovic (2016) Poisson intensity estimation with reproducing kernels. *ArXiv:1610.08623 [Stat]*.
- YSAM14: Jiyan Yang, Vikas Sindhwani, Haim Avron, Michael Mahoney (2014) Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels. *ArXiv:1412.8293 [Cs, Math, Stat]*.
- RaRe07: Ali Rahimi, Benjamin Recht (2007) Random features for large-scale kernel machines. In Advances in neural information processing systems (pp. 1177–1184). Curran Associates, Inc.
- CoHM04: Corinna Cortes, Patrick Haffner, Mehryar Mohri (2004) Rational Kernels: Theory and Algorithms. *Journal of Machine Learning Research*, 5, 1035–1062.
- KaFu00: Motonobu Kanagawa, Kenji Fukumizu (n.d.) Recovering Distributions from Gaussian RKHS Embeddings. In Journal of Machine Learning Research.
- ChSi16: Krzysztof Choromanski, Vikas Sindhwani (2016) Recycling Randomness with Structure for Sublinear time Kernel Expansions. *ArXiv:1605.09049 [Cs, Stat]*.
- Kail71b: T. Kailath (1971b) RKHS approach to detection and estimation problems–I: Deterministic signals in Gaussian noise. *IEEE Transactions on Information Theory*, 17(5), 530–549.
- DuKa73a: D. Duttweiler, T. Kailath (1973a) RKHS approach to detection and estimation problems–IV: Non-Gaussian detection. *IEEE Transactions on Information Theory*, 19(1), 19–28.
- DuKa73b: D. Duttweiler, T. Kailath (1973b) RKHS approach to detection and estimation problems–V: Parameter estimation. *IEEE Transactions on Information Theory*, 19(1), 29–37.
- LjKF75: L. Ljung, T. Kailath, B. Friedlander (1975) Scattering theory and linear least squares estimation: Part I: Continuous-time problems. In 1975 IEEE Conference on Decision and Control including the 14th Symposium on Adaptive Processes (pp. 55–56).
- FrKL75: B. Friedlander, T. Kailath, L. Ljung (1975) Scattering theory and linear least squares estimation: Part II: Discrete-time problems. In 1975 IEEE Conference on Decision and Control including the 14th Symposium on Adaptive Processes (pp. 57–58).
- Bach13: Francis R. Bach (2013) Sharp analysis of low-rank kernel matrix approximations. In COLT (Vol. 30, pp. 185–209).
- KeCh72: R. Kemerait, D. Childers (1972) Signal detection and extraction by cepstrum techniques. *IEEE Transactions on Information Theory*, 18(6), 745–759.
- KBGP16: Nicolas Keriven, Anthony Bourrier, Rémi Gribonval, Patrick Pérez (2016) Sketching for Large-Scale Learning of Mixture Models. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6190–6194).
- ClWa08: Alexander Clark, Chris Watkins (2008) Some Alternatives to Parikh Matrices Using String Kernels. *Fundamenta Informaticae*, 84(3), 291–303.
- KaGW72: T. Kailath, R. Geesey, H. Weinert (1972) Some relations among RKHS norms, Fredholm equations, and innovations representations. *IEEE Transactions on Information Theory*, 18(3), 341–348.
- SnGh05: Edward Snelson, Zoubin Ghahramani (2005) Sparse Gaussian processes using pseudo-inputs. In Advances in neural information processing systems (pp. 1257–1264).
- SmSc00: Alex J. Smola, Bernhard Schölkopf (2000) Sparse greedy matrix approximation for machine learning.
- TiNh01: Michael E. Tipping, Cambridge Cb Nh (2001) Sparse Kernel Principal Component Analysis. In Advances in Neural Information Processing Systems 13 (pp. 633–639). MIT Press.
- WaKS08: Christian Walder, Kwang In Kim, Bernhard Schölkopf (2008) Sparse Multiscale Gaussian Process Regression. In Proceedings of the 25th International Conference on Machine Learning (pp. 1112–1119). New York, NY, USA: ACM.
- Wein78: Howard L. Weinert (1978) Statistical methods in optimal curve fitting. *Communications in Statistics - Simulation and Computation*, 7(4), 417–435.
- BrLi04: Lawrence D. Brown, Yi Lin (2004) Statistical properties of the method of regularization with periodic Gaussian reproducing kernel. *The Annals of Statistics*, 32(4), 1723–1743.
- RaWe14: Aaditya Ramdas, Leila Wehbe (2014) Stein Shrinkage for Cross-Covariance Operators and Kernel Independence Testing. *ArXiv:1406.1922 [Stat]*.
- CaOC01: Rafael C. Carrasco, Jose Oncina, Jorge Calera-Rubio (2001) Stochastic Inference of Regular Tree Languages. *Machine Learning*, 44(1–2), 185–197.
- WeKa74: Howard L. Weinert, Thomas Kailath (1974) Stochastic Interpretations and Recursive Algorithms for Spline Functions. *The Annals of Statistics*, 2(4), 787–794.
- DLGT13: David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, Zoubin Ghahramani (2013) Structure Discovery in Nonparametric Regression through Compositional Kernel Search. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 1166–1174).
- Jain09: Brijnesh J Jain (2009) Structure Spaces. *Journal of Machine Learning Research*, 10.
- BOSS08: Asa Ben-Hur, Cheng Soon Ong, Sören Sonnenburg, Bernhard Schölkopf, Gunnar Rätsch (2008) Support Vector Machines and Kernels for Computational Biology. *PLoS Comput Biol*, 4(10), e1000173.
- LSSC02: Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, Chris Watkins (2002) Text Classification Using String Kernels. *Journal of Machine Learning Research*, 2, 419–444.
- SmSM98: Alex J. Smola, Bernhard Schölkopf, Klaus-Robert Müller (1998) The connection between regularization operators and support vector kernels. *Neural Networks*, 11(4), 637–649.
- CLVZ11: Ken Chatfield, Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman (2011) The devil is in the details: an evaluation of recent feature encoding methods.
- WaST14: Yu-Xiang Wang, Alex Smola, Ryan J. Tibshirani (2014) The Falling Factorial Basis and Its Statistical Applications. *ArXiv:1405.0558 [Stat]*.
- GrSt91: L. Greengard, J. Strain (1991) The Fast Gauss Transform. *SIAM Journal on Scientific and Statistical Computing*, 12(1), 79–94.
- WDLX15: Andrew Gordon Wilson, Christoph Dann, Christopher G. Lucas, Eric P. Xing (2015) The Human Kernel. *ArXiv:1510.07389 [Cs, Stat]*.
- RaDu05: Vikas C. Raykar, Ramani Duraiswami (2005) *The improved fast Gauss transform with applications to machine learning*.
- Pill16: Gianluigi Pillonetto (2016) The interplay between system identification and machine learning. *ArXiv:1612.09158 [Cs, Stat]*.
- BLGR16: Matej Balog, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, Yee Whye Teh (2016) The Mondrian Kernel. *ArXiv:1606.05241 [Stat]*.
- GrDa05: K. Grauman, T. Darrell (2005) The pyramid match kernel: discriminative classification with sets of image features. In Tenth IEEE International Conference on Computer Vision, 2005. ICCV 2005 (Vol. 2, pp. 1458-1465 Vol. 2).
- Kail71: Thomas Kailath (1971) The Structure of Radon-Nikodym Derivatives with Respect to Wiener and Related Measures. *The Annals of Mathematical Statistics*, 42(3), 1054–1067.
- Aron50: N. Aronszajn (1950) Theory of Reproducing Kernels. *Transactions of the American Mathematical Society*, 68(3), 337–404.
- RaRe09: Ali Rahimi, Benjamin Recht (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems (pp. 1313–1320). Curran Associates, Inc.