# (Reproducing) kernel tricks

Kernel in the sense of the “kernel trick”, which augments covariance estimation with clever distance measures. Not to be confused with density-estimation-type convolution kernels, nor with the dozens of other clashing definitions of kernel; those can each have their own pages.

Kernel machines use not-necessarily-Euclidean “reproducing” kernels, a.k.a. Mercer kernels (Merc09), to implicitly define convenient Hilbert “feature” spaces for your purpose. Alternatively, you might like to make your Hilbert-space basis explicit by doing a basis transform, or by taking your implicit feature map and approximating it, but you don’t need to. In fact, the implicit space induced by your reproducing kernel might (in general, will) look odd indeed: something with no finite-dimensional representation. That’s the “trick” part.
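A minimal numpy illustration of the trick, using the homogeneous quadratic kernel as an example: evaluating the kernel directly agrees with an ordinary inner product after an explicit feature map into $\mathbb{R}^3$, except that the kernel never constructs that space.

```python
import numpy as np

# Two points in R^2.
x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# The homogeneous quadratic kernel k(x, y) = (x . y)^2, evaluated implicitly...
k_implicit = np.dot(x, y) ** 2

# ...equals an ordinary inner product after the explicit feature map
# phi(x) = (x1^2, x2^2, sqrt(2) x1 x2) into R^3.
def phi(v):
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

k_explicit = np.dot(phi(x), phi(y))
# Both evaluate to 16 here; for richer kernels (e.g. the RBF kernel) the
# corresponding explicit feature space is infinite-dimensional, and only
# the implicit route remains.
```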

TODO: clear explanation, less blather. Until then, see ScSm02, which is a very well-written textbook covering an unbelievable amount of ground without pausing to make a fuss, or MaAm14, which is more narrowly focussed on just the Mercer-kernel part, or ChLi09, which takes an approximation-theory perspective.

Spoiler: you upgrade your old boring linear algebra on finite- (usually low-) dimensional spaces to sexy new curvy algebra on potentially infinite-dimensional manifolds. Or, if you’d like, you apply statistical learning methods built for things with an obvious finite vector-space representation ($\mathbb{R}^n$) to things without one (sentences, piano rolls, $\mathcal{C}^d_\ell$).

The other use that I’ve seen of the kernel trick is to show something clever about your learning method through formal mathematics in terms of fancy kernel representations. This one is nice to keep in mind because, practically, kernel methods have serious problems with scalability, problems that afflict even me with a mere $N\simeq 10^5$ data points, since the Gram matrix of inner products might not admit an accurate representation in fewer than $N(N-1)/2$ dimensions, nor inversion in less than $\mathcal{O}(N^3)$ time.
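One standard workaround for that scaling wall is to approximate the implicit feature map with an explicit, finite, random one. A sketch of random Fourier features (RaRe07) for the RBF kernel, assuming hypothetical sizes $N$, $d$, $D$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, D = 2000, 5, 500   # N data points, ambient dim d, D random features

X = rng.normal(size=(N, d))
gamma = 0.5              # RBF kernel k(x, y) = exp(-gamma ||x - y||^2)

# Random Fourier features: z(x) such that z(x) . z(y) ~ k(x, y).
# For this kernel the spectral measure is Gaussian with std sqrt(2 * gamma).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Exact Gram entry vs its random-feature estimate for one pair:
i, j = 0, 1
exact = np.exp(-gamma * np.sum((X[i] - X[j]) ** 2))
approx = Z[i] @ Z[j]
# approx is within O(1 / sqrt(D)) of exact, and the N x N Gram matrix is
# never formed: downstream solves cost O(N D^2) instead of O(N^3).
```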

TODO: discuss covariance matrices, positive definite matrices, mention convenient nexus of SVMs, gaussian process learning and learning theory etc.

I’m especially interested in

1. nonparametric kernel independence tests
2. efficient kernel pre-image approximation
3. the connection between kernel PCA and clustering (SKSB98, Will01)
4. kernel regression with RBFs
5. kernel layers in neural networks
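On the first of these, a toy numpy sketch of the (biased) empirical HSIC statistic of GFTS08, assuming RBF kernels on scalar samples; a real test would add a null distribution and threshold.

```python
import numpy as np

def hsic(x, y, gamma=1.0):
    """Biased empirical HSIC (GFTS08) with RBF kernels; a toy sketch."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    def gram(v):
        return np.exp(-gamma * (v[:, None] - v[None, :]) ** 2)
    K, L = gram(x), gram(y)
    H = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(1)
x = rng.normal(size=200)
dep = hsic(x, 2 * x)                   # dependent pair
ind = hsic(x, rng.normal(size=200))    # independent pair
# dep comes out clearly larger than ind, which hovers near zero.
```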

Automating kernel design involves some weird hacks. See the Automatic Statistician project by David Duvenaud, James Robert Lloyd, Roger Grosse and colleagues. For traditionalists, one of the co-authors has written a page on doing kernel design by hand. See GSFT12 for a mind-melting compositional kernel diagram.
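The algebra underlying that compositional search (DLGT13, GSFT12) is simple: sums and pointwise products of positive-definite kernels are again positive definite, so a grammar can combine a few base kernels into arbitrarily elaborate ones. A quick numerical check of that closure property, with hypothetical base kernels and sizes:

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Squared-exponential (RBF) Gram matrix."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def linear(X, Y):
    """Linear-kernel Gram matrix."""
    return X @ Y.T

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))

# Compose: SE + SE * LIN, one expression from the compositional grammar.
K = rbf(X, X) + rbf(X, X, gamma=0.1) * linear(X, X)

# Still a valid Gram matrix: symmetric, eigenvalues nonnegative up to
# floating-point noise (sum and Schur product preserve PSD-ness).
eigs = np.linalg.eigvalsh(K)
```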

Alex Smola (who, with Bernhard Schölkopf, has his name on a terrifying proportion of publications in this area) has all his publications online.

The oft-cited grandfather of all the reproducing-kernel stuff is Aronszajn’s 1950 work (Aron50) — see also Mercer (Merc09) — although it didn’t percolate into machine learning for decades.


## Refs

AaKa73
Aasnaes, H., & Kailath, T. (1973) An innovations approach to least-squares estimation–Part VII: Some applications of vector autoregressive-moving average models. IEEE Transactions on Automatic Control, 18(6), 601–607. DOI.
AlMa14
Alaoui, A. E., & Mahoney, M. W.(2014) Fast Randomized Kernel Methods With Statistical Guarantees. ArXiv:1411.0306 [Cs, Stat].
AlSH04
Altun, Y., Smola, A. J., & Hofmann, T. (2004) Exponential Families for Conditional Random Fields. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 2–9). Arlington, Virginia, United States: AUAI Press
Aron50
Aronszajn, N. (1950) Theory of Reproducing Kernels. Transactions of the American Mathematical Society, 68(3), 337–404. DOI.
Bach15
Bach, F. (2015) On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions. ArXiv Preprint ArXiv:1502.06800.
Bach00
Bach, F. (2008) Exploring large feature spaces with hierarchical multiple kernel learning. In Advances in Neural Information Processing Systems (NIPS 2008).
Bach13
Bach, F. R.(2013) Sharp analysis of low-rank kernel matrix approximations. In COLT (Vol. 30, pp. 185–209).
BaIS17
Backurs, A., Indyk, P., & Schmidt, L. (2017) On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks. ArXiv:1704.02958 [Cs, Stat].
BaZT04
Bakır, G. H., Zien, A., & Tsuda, K. (2004) Learning to Find Graph Pre-images. In C. E. Rasmussen, H. H. Bülthoff, B. Schölkopf, & M. A. Giese (Eds.), Pattern Recognition (pp. 253–261). Springer Berlin Heidelberg
Ball13
Balle Pigem, B. de. (2013, July 12) Learning finite-state machines: statistical and algorithmic aspects (Doctoral thesis). Universitat Politècnica de Catalunya, Departament de Llenguatges i Sistemes Informàtics, Barcelona
BLGR16
Balog, M., Lakshminarayanan, B., Ghahramani, Z., Roy, D. M., & Teh, Y. W.(2016) The Mondrian Kernel. ArXiv:1606.05241 [Stat].
BaRo02
Baxter, B., & Roussos, G. (2002) A New Error Estimate of the Fast Gauss Transform. SIAM Journal on Scientific Computing, 24(1), 257–259. DOI.
BOSS08
Ben-Hur, A., Ong, C. S., Sonnenburg, S., Schölkopf, B., & Rätsch, G. (2008) Support Vector Machines and Kernels for Computational Biology. PLoS Comput Biol, 4(10), e1000173. DOI.
BrLi04
Brown, L. D., & Lin, Y. (2004) Statistical properties of the method of regularization with periodic Gaussian reproducing kernel. The Annals of Statistics, 32(4), 1723–1743. DOI.
Burg98
Burges, C. J. C.(1998) Geometry and Invariance in Kernel Based Methods. In B. Schölkopf, C. J. Burges, & A. J. Smola (Eds.), Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press
CaOC01
Carrasco, R. C., Oncina, J., & Calera-Rubio, J. (2001) Stochastic Inference of Regular Tree Languages. Machine Learning, 44(1–2), 185–197. DOI.
CLVZ11
Chatfield, K., Lempitsky, V., Vedaldi, A., & Zisserman, A. (2011) The devil is in the details: an evaluation of recent feature encoding methods.
ChLi09
Cheney, E. W., & Light, W. A.(2009) A Course in Approximation Theory. American Mathematical Soc.
ChSi16
Choromanski, K., & Sindhwani, V. (2016) Recycling Randomness with Structure for Sublinear time Kernel Expansions. ArXiv:1605.09049 [Cs, Stat].
ClFW06
Clark, A., Florêncio, C. C., & Watkins, C. (2006) Languages as Hyperplanes: Grammatical Inference with String Kernels. In J. Fürnkranz, T. Scheffer, & M. Spiliopoulou (Eds.), Machine Learning: ECML 2006 (pp. 90–101). Springer Berlin Heidelberg
CFWS06
Clark, A., Florêncio, C. C., Watkins, C., & Serayet, M. (2006) Planar Languages and Learnability. In Y. Sakakibara, S. Kobayashi, K. Sato, T. Nishino, & E. Tomita (Eds.), Grammatical Inference: Algorithms and Applications (pp. 148–160). Springer Berlin Heidelberg
ClWa08
Clark, A., & Watkins, C. (2008) Some Alternatives to Parikh Matrices Using String Kernels. Fundamenta Informaticae, 84(3), 291–303.
CoDu02
Collins, M., & Duffy, N. (2002) Convolution Kernels for Natural Language. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14 (pp. 625–632). MIT Press
CoHM04
Cortes, C., Haffner, P., & Mohri, M. (2004) Rational Kernels: Theory and Algorithms. J. Mach. Learn. Res., 5, 1035–1062.
CuSm02
Cucker, F., & Smale, S. (2002) On the mathematical foundations of learning. Bulletin of the American Mathematical Society, 39(1), 1–49. DOI.
CuSS08
Cunningham, J. P., Shenoy, K. V., & Sahani, M. (2008) Fast Gaussian process methods for point process intensity estimation. (pp. 192–199). ACM Press DOI.
DaFG14
Danafar, S., Fukumizu, K., & Gomez, F. (2014) Kernel-based Information Criterion. ArXiv:1408.5810 [Stat].
DeGL96
Devroye, L., Györfi, L., & Lugosi, G. (1996) A probabilistic theory of pattern recognition. New York: Springer
DrMa05
Drineas, P., & Mahoney, M. W.(2005) On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 6, 2153–2175.
DuKa73a
Duttweiler, D., & Kailath, T. (1973a) RKHS approach to detection and estimation problems–IV: Non-Gaussian detection. IEEE Transactions on Information Theory, 19(1), 19–28. DOI.
DuKa73b
Duttweiler, D., & Kailath, T. (1973b) RKHS approach to detection and estimation problems–V: Parameter estimation. IEEE Transactions on Information Theory, 19(1), 29–37. DOI.
DLGT13
Duvenaud, D., Lloyd, J., Grosse, R., Tenenbaum, J., & Zoubin, G. (2013) Structure Discovery in Nonparametric Regression through Compositional Kernel Search. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 1166–1174).
ElDD03
Elgammal, A., Duraiswami, R., & Davis, L. S.(2003) Efficient kernel density estimation using the fast gauss transform with applications to color modeling and tracking. Pattern Analysis and Machine Intelligence, IEEE Transactions On, 25(11), 1499–1504. DOI.
FLRP13
FitzGerald, D., Liukus, A., Rafii, Z., Pardo, B., & Daudet, L. (2013) Harmonic/percussive separation using kernel additive modelling. In Irish Signals & Systems Conference 2014 and 2014 China-Ireland International Conference on Information and Communications Technologies (ISSC 2014/CIICT 2014). 25th IET (pp. 35–40). IET
FlTS16
Flaxman, S., Teh, Y. W., & Sejdinovic, D. (2016) Poisson intensity estimation with reproducing kernels. ArXiv:1610.08623 [Stat].
FrKL75
Friedlander, B., Kailath, T., & Ljung, L. (1975) Scattering theory and linear least squares estimation: Part II: Discrete-time problems. In 1975 IEEE Conference on Decision and Control including the 14th Symposium on Adaptive Processes (pp. 57–58). DOI.
Gent02
Genton, M. G.(2002) Classes of Kernels for Machine Learning: A Statistics Perspective. Journal of Machine Learning Research, 2, 299–312.
GeKa73
Gevers, M., & Kailath, T. (1973) An innovations approach to least-squares estimation–Part VI: Discrete-time innovations representations and recursive estimation. IEEE Transactions on Automatic Control, 18(6), 588–600. DOI.
GlLi16
Globerson, A., & Livni, R. (2016) Learning Infinite-Layer Networks: Beyond the Kernel Trick. ArXiv:1606.05316 [Cs].
GrDa05
Grauman, K., & Darrell, T. (2005) The pyramid match kernel: discriminative classification with sets of image features. In Tenth IEEE International Conference on Computer Vision, 2005. ICCV 2005 (Vol. 2, pp. 1458–1465). DOI.
GrSt91
Greengard, L., & Strain, J. (1991) The Fast Gauss Transform. SIAM Journal on Scientific and Statistical Computing, 12(1), 79–94. DOI.
GFTS08
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J.(2008) A Kernel Statistical Test of Independence. In Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference. Cambridge, MA: MIT Press
GSFT12
Grosse, R., Salakhutdinov, R. R., Freeman, W. T., & Tenenbaum, J. B.(2012) Exploiting compositionality to explore a large space of model structures. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Haus99
Haussler, D. (1999) Convolution kernels on discrete structures. Technical report, UC Santa Cruz
HeBu14
Heinonen, M., & d’Alché-Buc, F. (2014) Learning nonparametric differential equations with operator-valued kernels and gradient matching. ArXiv:1411.5172 [Cs, Stat].
HoSS08
Hofmann, T., Schölkopf, B., & Smola, A. J.(2008) Kernel methods in machine learning. The Annals of Statistics, 36(3), 1171–1220. DOI.
Jung13
Jung, A. (2013) An RKHS Approach to Estimation with Sparsity Constraints. In Advances in Neural Information Processing Systems 29.
Kail71a
Kailath, T. (1971a) A note on least-squares estimation by the innovations method. In 1971 IEEE Conference on Decision and Control (pp. 407–411). DOI.
Kail71b
Kailath, T. (1971b) RKHS approach to detection and estimation problems–I: Deterministic signals in Gaussian noise. IEEE Transactions on Information Theory, 17(5), 530–549. DOI.
Kail71c
Kailath, T. (1971c) The Structure of Radon-Nikodym Derivatives with Respect to Wiener and Related Measures. The Annals of Mathematical Statistics, 42(3), 1054–1067.
Kail74
Kailath, T. (1974) A view of three decades of linear filtering theory. IEEE Transactions on Information Theory, 20(2), 146–181. DOI.
KaDu72
Kailath, T., & Duttweiler, D. (1972) An RKHS approach to detection and estimation problems– III: Generalized innovations representations and a likelihood-ratio formula. IEEE Transactions on Information Theory, 18(6), 730–745. DOI.
KaGe71
Kailath, T., & Geesey, R. (1971) An innovations approach to least squares estimation–Part IV: Recursive estimation given lumped covariance functions. IEEE Transactions on Automatic Control, 16(6), 720–727. DOI.
KaGe73
Kailath, T., & Geesey, R. (1973) An innovations approach to least-squares estimation–Part V: Innovations representations and recursive estimation in colored noise. IEEE Transactions on Automatic Control, 18(5), 435–453. DOI.
KaGW72
Kailath, T., Geesey, R., & Weinert, H. (1972) Some relations among RKHS norms, Fredholm equations, and innovations representations. IEEE Transactions on Information Theory, 18(3), 341–348. DOI.
KaWe75
Kailath, T., & Weinert, H. (1975) An RKHS approach to detection and estimation problems–II: Gaussian signal detection. IEEE Transactions on Information Theory, 21(1), 15–23. DOI.
KaFu00
Kanagawa, M., & Fukumizu, K. (n.d.) Recovering Distributions from Gaussian RKHS Embeddings. In Journal of Machine Learning Research.
KeCh72
Kemerait, R., & Childers, D. (1972) Signal detection and extraction by cepstrum techniques. IEEE Transactions on Information Theory, 18(6), 745–759. DOI.
KBGP16
Keriven, N., Bourrier, A., Gribonval, R., & Pérez, P. (2016) Sketching for Large-Scale Learning of Mixture Models. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6190–6194). DOI.
KiWa70
Kimeldorf, G. S., & Wahba, G. (1970) A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines. The Annals of Mathematical Statistics, 41(2), 495–502. DOI.
KlRB10
Kloft, M., Rückert, U., & Bartlett, P. L.(2010) A Unifying View of Multiple Kernel Learning. In J. L. Balcázar, F. Bonchi, A. Gionis, & M. Sebag (Eds.), Machine Learning and Knowledge Discovery in Databases (pp. 66–81). Springer Berlin Heidelberg DOI.
KoCM08
Kontorovich, L. (Aryeh), Cortes, C., & Mohri, M. (2008) Kernel methods for learning languages. Theoretical Computer Science, 405(3), 223–236. DOI.
KoCM06
Kontorovich, L., Cortes, C., & Mohri, M. (2006) Learning Linearly Separable Languages. In J. L. Balcázar, P. M. Long, & F. Stephan (Eds.), Algorithmic Learning Theory (pp. 288–303). Springer Berlin Heidelberg
KWSR16
Koppel, A., Warnell, G., Stump, E., & Ribeiro, A. (2016) Parsimonious Online Learning with Kernels via Sparse Projections in Function Space. ArXiv:1612.04111 [Cs, Stat].
LaSH03
Lawrence, N., Seeger, M., & Herbrich, R. (2003) Fast sparse Gaussian process methods: The informative vector machine. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems (pp. 609–616).
LiIS10
Li, F., Ionescu, C., & Sminchisescu, C. (2010) Random Fourier Approximations for Skewed Multiplicative Histogram Kernels. In M. Goesele, S. Roth, A. Kuijper, B. Schiele, & K. Schindler (Eds.), Pattern Recognition (pp. 262–271). Springer Berlin Heidelberg DOI.
LRPF14
Liutkus, A., Rafii, Z., Pardo, B., Fitzgerald, D., & Daudet, L. (2014) Kernel spectrogram models for source separation. (pp. 6–10). IEEE DOI.
LjKa76
Ljung, L., & Kailath, T. (1976) Backwards Markovian models for second-order stochastic processes (Corresp). IEEE Transactions on Information Theory, 22(4), 488–491. DOI.
LjKF75
Ljung, L., Kailath, T., & Friedlander, B. (1975) Scattering theory and linear least squares estimation: Part I: Continuous-time problems. In 1975 IEEE Conference on Decision and Control including the 14th Symposium on Adaptive Processes (pp. 55–56). DOI.
LDGT14
Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J. B., & Ghahramani, Z. (2014) Automatic Construction and Natural-Language Description of Nonparametric Regression Models. ArXiv:1402.4304 [Cs, Stat].
LSSC02
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., & Watkins, C. (2002) Text Classification Using String Kernels. Journal of Machine Learning Research, 2, 419–444.
LNCS16
Lopez-Paz, D., Nishihara, R., Chintala, S., Schölkopf, B., & Bottou, L. (2016) Discovering Causal Signals in Images. ArXiv:1605.08179 [Cs, Stat].
LLHE08
Lu, Z., Leen, T. K., Huang, Y., & Erdogmus, D. (2008) A Reproducing Kernel Hilbert Space Framework for Pairwise Time Series Distances. In Proceedings of the 25th International Conference on Machine Learning (pp. 624–631). New York, NY, USA: ACM DOI.
MaBe17
Ma, S., & Belkin, M. (2017) Diving into the shallows: a computational perspective on large-scale shallow learning. ArXiv:1703.10622 [Cs, Stat].
MaAm14
Manton, J. H., & Amblard, P.-O. (2014) A Primer on Reproducing Kernel Hilbert Spaces. ArXiv:1408.0952 [Math].
McEl11
McFee, B., & Ellis, D. P.(2011) Analyzing song structure with spectral clustering. In IEEE conference on Computer Vision and Pattern Recognition (CVPR).
Merc09
Mercer, J. (1909) Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 209(441–458), 415–446. DOI.
MFSG14
Muandet, K., Fukumizu, K., Sriperumbudur, B., Gretton, A., & Schölkopf, B. (2014) Kernel Mean Shrinkage Estimators. ArXiv:1405.5505 [Cs, Stat].
MFSS16
Muandet, K., Fukumizu, K., Sriperumbudur, B., & Schölkopf, B. (2016) Kernel Mean Embedding of Distributions: A Review and Beyond. ArXiv:1605.09522 [Cs, Stat].
MMRT01
Muller, K., Mika, S., Ratsch, G., Tsuda, K., & Scholkopf, B. (2001) An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2), 181–201. DOI.
Pill16
Pillonetto, G. (2016) The interplay between system identification and machine learning. ArXiv:1612.09158 [Cs, Stat].
PoGi90
Poggio, T., & Girosi, F. (1990) Networks for approximation and learning. Proceedings of the IEEE, 78(9), 1481–1497. DOI.
RaRe07
Rahimi, A., & Recht, B. (2007) Random features for large-scale kernel machines. In Advances in neural information processing systems (pp. 1177–1184). Curran Associates, Inc.
RaRe09
Rahimi, A., & Recht, B. (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems (pp. 1313–1320). Curran Associates, Inc.
RaWe14
Ramdas, A., & Wehbe, L. (2014) Stein Shrinkage for Cross-Covariance Operators and Kernel Independence Testing. ArXiv:1406.1922 [Stat].
RaDu05
Raykar, V. C., & Duraiswami, R. (2005) The improved fast Gauss transform with applications to machine learning. Presented at NIPS
ScHS01
Schölkopf, B., Herbrich, R., & Smola, A. J.(2001) A Generalized Representer Theorem. In D. Helmbold & B. Williamson (Eds.), Computational Learning Theory (pp. 416–426). Springer Berlin Heidelberg DOI.
SKSB98
Schölkopf, B., Knirsch, P., Smola, A., & Burges, C. (1998) Fast Approximation of Support Vector Kernel Expansions, and an Interpretation of Clustering as Approximation in Feature Spaces. In P. Levi, M. Schanz, R.-J. Ahlers, & F. May (Eds.), Mustererkennung 1998 (pp. 125–132). Springer Berlin Heidelberg DOI.
SMBK99
Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K.-R., Rätsch, G., & Smola, A. J.(1999) Input Space Versus Feature Space in Kernel-Based Methods. IEEE TRANSACTIONS ON NEURAL NETWORKS, 10, 1000–1017.
SMFP15
Schölkopf, B., Muandet, K., Fukumizu, K., & Peters, J. (2015) Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations. ArXiv:1501.06794 [Cs, Stat].
ScSm02
Schölkopf, B., & Smola, A. J.(2002) Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press
ScSm03
Schölkopf, B., & Smola, A. J.(2003) A Short Introduction to Learning with Kernels. In S. Mendelson & A. J. Smola (Eds.), Advanced Lectures on Machine Learning (pp. 41–64). Springer Berlin Heidelberg DOI.
ScSM97
Schölkopf, B., Smola, A., & Müller, K.-R. (1997) Kernel principal component analysis. In W. Gerstner, A. Germond, M. Hasler, & J.-D. Nicoud (Eds.), Artificial Neural Networks — ICANN’97 (pp. 583–588). Springer Berlin Heidelberg DOI.
SeDK75
Segall, A., Davis, M., & Kailath, T. (1975) Nonlinear filtering with counting observations. IEEE Transactions on Information Theory, 21(2), 143–149. DOI.
SeKa76
Segall, A., & Kailath, T. (1976) Orthogonal functionals of independent-increment processes. IEEE Transactions on Information Theory, 22(3), 287–298. DOI.
ShBG16
Shen, Y., Baingana, B., & Giannakis, G. B.(2016) Nonlinear Structural Vector Autoregressive Models for Inferring Effective Brain Network Connectivity. ArXiv:1610.06551 [Stat].
SmSc98
Smola, A. J., & Schölkopf, B. (1998) On a Kernel-Based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion. Algorithmica, 22(1–2), 211–231. DOI.
SmSc00
Smola, A. J., & Schölkopf, B. (2000) Sparse greedy matrix approximation for machine learning.
SmSc04
Smola, A. J., & Schölkopf, B. (2004) A tutorial on support vector regression. Statistics and Computing, 14(3), 199–222. DOI.
SmSM98
Smola, A. J., Schölkopf, B., & Müller, K.-R. (1998) The connection between regularization operators and support vector kernels. Neural Networks, 11(4), 637–649. DOI.
SnGh05
Snelson, E., & Ghahramani, Z. (2005) Sparse Gaussian processes using pseudo-inputs. In Advances in neural information processing systems (pp. 1257–1264).
SGFL08
Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Lanckriet, G., & Schölkopf, B. (2008) Injective Hilbert Space Embeddings of Probability Measures. In Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008).
SzRi09
Székely, G. J., & Rizzo, M. L.(2009) Brownian distance covariance. The Annals of Applied Statistics, 3(4), 1236–1265. DOI.
SzRB07
Székely, G. J., Rizzo, M. L., & Bakirov, N. K.(2007) Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794. DOI.
TiNh01
Tipping, M. E. (2001) Sparse Kernel Principal Component Analysis. In Advances in Neural Information Processing Systems 13 (pp. 633–639). MIT Press
VeZi12
Vedaldi, A., & Zisserman, A. (2012) Efficient Additive Kernels via Explicit Feature Maps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 480–492. DOI.
VVZJ10
Vempati, S., Vedaldi, A., Zisserman, A., & Jawahar, C. (2010) Generalized RBF feature maps for efficient detection. In BMVC (pp. 1–11).
VeTS04
Vert, J.-P., Tsuda, K., & Schölkopf, B. (2004) A primer on kernel methods. In Kernel Methods in Computational Biology. MIT Press
VSKB10
Vishwanathan, S. V. N., Schraudolph, N. N., Kondor, R., & Borgwardt, K. M.(2010) Graph Kernels. J. Mach. Learn. Res., 11, 1201–1242.
WaKS08
Walder, C., Kim, K. I., & Schölkopf, B. (2008) Sparse Multiscale Gaussian Process Regression. In Proceedings of the 25th International Conference on Machine Learning (pp. 1112–1119). New York, NY, USA: ACM DOI.
WaSC06
Walder, C., Schölkopf, B., & Chapelle, O. (2006) Implicit Surface Modelling with a Globally Regularised Basis of Compact Support. Computer Graphics Forum, 25(3), 635–644. DOI.
WaST14
Wang, Y.-X., Smola, A., & Tibshirani, R. J.(2014) The Falling Factorial Basis and Its Statistical Applications. ArXiv:1405.0558 [Stat].
Wein78
Weinert, H. L.(1978) Statistical methods in optimal curve fitting. Communications in Statistics - Simulation and Computation, 7(4), 417–435. DOI.
WeKa74
Weinert, H. L., & Kailath, T. (1974) Stochastic Interpretations and Recursive Algorithms for Spline Functions. The Annals of Statistics, 2(4), 787–794. DOI.
WeSi78
Weinert, H., & Sidhu, G. (1978) A stochastic framework for recursive computation of spline functions–Part I: Interpolating splines. IEEE Transactions on Information Theory, 24(1), 45–50. DOI.
Will01
Williams, C. K. I.(2001) On a Connection between Kernel PCA and Metric Multidimensional Scaling. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in Neural Information Processing Systems 13 (Vol. 46, pp. 675–681). MIT Press DOI.
WiAd13
Wilson, A. G., & Adams, R. P.(2013) Gaussian Process Kernels for Pattern Discovery and Extrapolation. ArXiv:1302.4245 [Cs, Stat].
WDLX15
Wilson, A. G., Dann, C., Lucas, C. G., & Xing, E. P.(2015) The Human Kernel. ArXiv:1510.07389 [Cs, Stat].
WuZh08
Wu, Q., & Zhou, D.-X. (2008) Learning with sample dependent hypothesis spaces. Computers & Mathematics with Applications, 56(11), 2896–2907. DOI.
XPPP08
Xu, J.-W., Paiva, A. R. C., Park, I., & Principe, J. C.(2008) A Reproducing Kernel Hilbert Space Framework for Information-Theoretic Learning. IEEE Transactions on Signal Processing, 56(12), 5891–5902. DOI.
YaDD04
Yang, C., Duraiswami, R., & Davis, L. S.(2004) Efficient kernel machines using the improved fast Gauss transform. In Advances in neural information processing systems (pp. 1561–1568).
YDGD03
Yang, C., Duraiswami, R., Gumerov, N. A., & Davis, L. (2003) Improved Fast Gauss Transform and Efficient Kernel Density Estimation. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2 (p. 464–). Washington, DC, USA: IEEE Computer Society DOI.
YSAM14
Yang, J., Sindhwani, V., Avron, H., & Mahoney, M. (2014) Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels. ArXiv:1412.8293 [Cs, Math, Stat].
YLMJ12
Yang, T., Li, Y.-F., Mahdavi, M., Jin, R., & Zhou, Z.-H. (2012) Nyström method vs random fourier features: A theoretical and empirical comparison. In Advances in neural information processing systems (pp. 476–484).
YCSS13
Yu, Y., Cheng, H., Schuurmans, D., & Szepesvári, C. (2013) Characterizing the representer theorem. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 570–578).
ZPJS12
Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. ArXiv:1202.3775 [Cs, Stat].
ZFGS16
Zhang, Q., Filippi, S., Gretton, A., & Sejdinovic, D. (2016) Large-Scale Kernel Methods for Independence Testing. ArXiv:1606.07892 [Stat].
ZhZS13
Zhou, K., Zha, H., & Song, L. (2013) Learning triggering kernels for multi-dimensional Hawkes processes. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 1301–1309).