Forget QR and LU decompositions; there are now so many ways of factorising matrices that there are not enough acronyms in the alphabet to hold them. This is especially true if you suspect your matrix is sparse, or could be made sparse because of some underlying constraint, or probably could be, if squinted at in the right fashion (a graph transition matrix, say, or a Laplacian, or a noisy transform of some smooth object), or at least would be very close to sparse, or…

Your big matrix is the product (or sum, or…) of two matrices that are in some way simple (low-rank, low-dimensional, sparse), possibly with additional constraints. Can you find these simple matrices?
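
For the simplest case, where "simple" means low rank and the loss is squared error, the answer is classical: by the Eckart–Young theorem the truncated SVD gives the best rank-\(k\) approximation in Frobenius norm. A minimal NumPy sketch, assuming a dense matrix that fits in memory; `truncated_svd` is just an illustrative name.

```python
import numpy as np

def truncated_svd(A, k):
    """Factors (left, right) such that left @ right is the best rank-k approximation of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k leading singular triplets; fold the singular values into the left factor.
    return U[:, :k] * s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
# A rank-3 "signal" plus a little noise.
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
A_noisy = A + 0.01 * rng.standard_normal(A.shape)

left, right = truncated_svd(A_noisy, 3)
rel_err = np.linalg.norm(A_noisy - left @ right) / np.linalg.norm(A_noisy)
print(left.shape, right.shape, rel_err)
```

The residual equals the root-sum-of-squares of the discarded singular values, which is what makes this the yardstick the fancier factorisations get compared against.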

Here’s an example: GoDec (ZhTa11), a decomposition into low-rank *and* sparse components, which looks like combined multidimensional factorisation and outlier detection. Implementations for MATLAB and R exist.
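
A rough sketch of the GoDec idea, assuming I have read ZhTa11 correctly: alternate between the best rank-\(r\) approximation of \(X - S\) and keeping the \(k\) largest-magnitude entries of \(X - L\). GoDec proper uses bilateral random projections for the low-rank step to go fast; this sketch swaps in an exact SVD for clarity, and `godec_sketch`, the spike data and the iteration count are hypothetical choices of mine.

```python
import numpy as np

def godec_sketch(X, rank, card, n_iter=50):
    """Alternate projections for X ≈ L + S with rank(L) <= rank and S having card nonzeros.

    Simplified: an exact truncated SVD replaces GoDec's bilateral random projections.
    """
    S = np.zeros_like(X)
    for _ in range(n_iter):
        # Low-rank step: best rank-`rank` approximation of X - S.
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: keep only the `card` largest-magnitude entries of X - L.
        R = X - L
        S = np.zeros_like(X)
        idx = np.argpartition(np.abs(R), -card, axis=None)[-card:]
        S.flat[idx] = R.flat[idx]
    return L, S

rng = np.random.default_rng(0)
L0 = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))   # low-rank part
S0 = np.zeros_like(L0)
spikes = rng.choice(L0.size, size=20, replace=False)               # sparse "outliers"
S0.flat[spikes] = 10.0 * rng.choice([-1.0, 1.0], size=20)
L, S = godec_sketch(L0 + S0, rank=2, card=20)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))
```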

A hip one from the previous decade is Non-negative Matrix Factorisation (NMF), which I think is a classic. I should explain why.
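
Part of why NMF is a classic is that the original multiplicative updates of Lee and Seung (LeSe01) fit in a handful of lines and keep both factors non-negative for free, since each step multiplies by a ratio of non-negative quantities. A minimal sketch for the squared Frobenius loss; the uniform random initialisation, the iteration count and the `eps` guard against division by zero are my assumptions, not any particular library's defaults.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for V ≈ W @ H with W, H >= 0 (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplying by a non-negative ratio preserves non-negativity.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((30, 3)) @ rng.random((3, 20))   # exactly rank 3 and non-negative
W, H = nmf_multiplicative(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)
```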

There are so many of these things, depending on your preferred choice of loss function, free variables and whatever else.

Keywords: matrix sketching, low-rank approximation, generalised versions of traditional dimensionality reduction. There seems to be little contact with the related area of \(\mathcal{H}\)-matrix methods, as used for, e.g., covariance matrices. Why is that?

(See hmatrix.org for one lab’s backgrounder and their implementation, h2lib, or hlibpro for a black-box closed-source one.)

Matrix concentration inequalities turn out to be useful in making this work.

I would like to learn more about

- sparse or low-rank matrix approximation as clustering for density estimation, which is how I imagine high-dimensional mixture models would need to work, and thereby also
- Mercer kernel approximation;
- the connection to manifold learning, which is probably also worth examining.
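
On the Mercer kernel approximation point, the standard low-rank trick is the Nyström method (analysed in DrMa05): sample \(m\) landmark points and approximate the Gram matrix as \(K \approx C W^{+} C^\top\), where \(C\) holds the kernel between all points and the landmarks and \(W\) is the landmark-landmark block. A minimal sketch assuming a Gaussian kernel, uniform landmark sampling (DrMa05 use non-uniform sampling probabilities) and an eigenvalue cutoff of my choosing; it returns a factor \(F\) with \(K \approx F F^\top\).

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2) between rows of X and rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_factor(X, n_landmarks, gamma=1.0, rel_tol=1e-10, seed=0):
    """Return F with K ≈ F @ F.T, built from uniformly sampled landmark points.

    K ≈ C W^+ C^T with C = K[:, idx], W = K[idx, idx]; W^+ is applied through an
    eigendecomposition with tiny eigenvalues dropped, for numerical stability.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_landmarks, replace=False)
    C = rbf_kernel(X, X[idx], gamma)   # n x m block of K
    W = C[idx]                         # m x m landmark block
    lam, U = np.linalg.eigh(W)
    keep = lam > rel_tol * lam.max()
    return C @ (U[:, keep] / np.sqrt(lam[keep]))

rng = np.random.default_rng(1)
X = rng.random((300, 2))               # points in the unit square
K = rbf_kernel(X, X)
F = nystrom_factor(X, n_landmarks=40)
rel_err = np.linalg.norm(K - F @ F.T) / np.linalg.norm(K)
print(F.shape, rel_err)
```

Working with the factor \(F\) rather than the \(n \times n\) approximation is the point: downstream kernel methods can then run in \(O(nm^2)\) rather than \(O(n^3)\).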

Igor Carron’s Matrix Factorization Jungle classifies the following problems as matrix-factorisation type.

- Kernel Factorizations: …
- Spectral clustering: \(A = DX\) with unknown \(D\) and \(X\), solve for sparse \(X\) and \(X_i \in \{0, 1\}\)
- K-Means / K-Median clustering: \(A = DX\) with unknown \(D\) and \(X\), solve for \(XX^\top = I\) and \(X_i \in \{0, 1\}\)
- Subspace clustering: \(A = AX\) with unknown \(X\), solve for sparse/other conditions on \(X\)
- Graph Matching: \(A = XBX^\top\) with unknown \(X\), \(B\); solve for \(B\) and \(X\) as a permutation
- NMF: \(A = DX\) with unknown \(D\) and \(X\), solve for elements of \(D\), \(X\) positive
- Generalized Matrix Factorization: \(W \odot L - W \odot UV^\top\) with \(W\) a known mask, \(U\), \(V\) unknown; solve for \(U\), \(V\) and \(L\) of lowest possible rank
- Matrix Completion: \(A = H \odot L\) with \(H\) a known mask, \(L\) unknown; solve for \(L\) of lowest possible rank
- Stable Principal Component Pursuit (SPCP) / Noisy Robust PCA: \(A = L + S + N\) with \(L\), \(S\), \(N\) unknown; solve for \(L\) low rank, \(S\) sparse, \(N\) noise
- Robust PCA: \(A = L + S\) with \(L\), \(S\) unknown; solve for \(L\) low rank, \(S\) sparse
- Sparse PCA: \(A = DX\) with unknown \(D\) and \(X\), solve for sparse \(D\)
- Dictionary Learning: \(A = DX\) with unknown \(D\) and \(X\), solve for sparse \(X\)
- Archetypal Analysis: \(A = DX\) with unknown \(D\) and \(X\), solve for \(D = AB\) with \(D\), \(B\) positive
- Matrix Compressive Sensing (MCS): find a rank-\(r\) matrix \(L\) such that \(A(L) \approx b\), or \(A(L + S) = b\)
- Multiple Measurement Vector (MMV): \(Y = AX\) with unknown \(X\) and rows of \(X\) sparse
- Compressed sensing: \(Y = AX\) with unknown \(X\) and rows of \(X\) sparse, \(X\) one column
- Blind Source Separation (BSS): \(Y = AX\) with unknown \(A\) and \(X\) and statistical independence between columns of \(X\) or subspaces of columns of \(X\)
- Partial and Online SVD/PCA: …
- Tensor Decomposition: … (not sure about this one, but see orthogonally decomposable tensors)
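
To make one row of that list concrete: generalised matrix factorisation with a known mask \(W\) (equivalently, matrix completion) can be attacked by alternating ridge-regularised least squares, fitting each row of \(U\) and \(V\) to the observed entries only. This is a minimal sketch, not libpmf's CCD++ or any other library's method; `masked_als`, the ridge weight `lam`, the random initialisation and the iteration count are all assumptions.

```python
import numpy as np

def masked_als(A, mask, rank, lam=1e-3, n_iter=100, seed=0):
    """Fit A ≈ U @ V.T using only entries where mask is True, by ridge-regularised ALS."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    ridge = lam * np.eye(rank)
    for _ in range(n_iter):
        for i in range(n):   # each row factor, from that row's observed entries
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ A[i, mask[i]])
        for j in range(m):   # each column factor, from that column's observed entries
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ A[mask[:, j], j])
    return U, V

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))   # rank 2
mask = rng.random(A.shape) < 0.6                                 # about 60% observed
U, V = masked_als(np.where(mask, A, 0.0), mask, rank=2)          # hide unobserved values
held_out = ~mask
rel_err = np.linalg.norm((U @ V.T - A)[held_out]) / np.linalg.norm(A[held_out])
print(rel_err)
```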

Truncated classic PCA is clearly also an example of this, but is excluded from the list for some reason. Boringness? The fact that it’s a special case of Sparse PCA?

See also learning on manifolds, compressed sensing, optimisation, random linear algebra, clustering, sparse regression…

## Overviews

- Data mining seminar: Matrix sketching
- Kumar and Schneider have a literature survey on low-rank approximation of matrices (KuSh16)
- Preconditioning tutorial by Erica Klarreich
- Andrew McGregor’s ICML Tutorial Streaming, sampling, sketching
- Spielman’s Laplacian Linear Equations, Graph Sparsification, Local Clustering, Low-Stretch Trees, etc. is the best start and links lots of online textbooks and so on. Is it the same as this other page, Laplacian Linear Equations, Graph Sparsification, Local Clustering, Low-Stretch Trees, etc.? That page says:

  > Shang-Hua Teng and I wrote a large paper on the problem of solving systems of linear equations in the Laplacian matrices of graphs. This paper required many graph-theoretic algorithms, most of which have been greatly improved. This page is an attempt to keep track of the major developments in and applications of these ideas.

- Another one that makes the link to clustering is Chris Ding’s Principal Component Analysis and Matrix Factorizations for Learning.
- Igor Carron’s Advanced Matrix Factorization Jungle.

## Sketching

“Sketching” is what I am going to use to describe the subset of factorisations which reduce the dimensionality of the matrices in question in a way I will make clear shortly.
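
One concrete deterministic example is Frequent Directions (Libe13, GLPW15), which streams the rows of an \(n \times d\) matrix \(A\) into an \(\ell \times d\) sketch \(B\) with \(B^\top B \approx A^\top A\). A minimal NumPy sketch of the simple, non-amortised variant, assuming \(\ell \le d\): whenever the buffer is full, shrink every squared singular value by the smallest one. Each shrink by \(\delta\) removes \(\ell\delta\) of squared Frobenius mass but adds at most \(\delta\) to the spectral error, which gives \(\|A^\top A - B^\top B\|_2 \le \|A\|_F^2/\ell\).

```python
import numpy as np

def frequent_directions(A, ell):
    """Stream rows of A (n x d) into a sketch B (ell x d) with B.T @ B ≈ A.T @ A."""
    n, d = A.shape
    assert ell <= d, "this simple variant assumes ell <= d"
    B = np.zeros((ell, d))
    next_row = 0
    for a in A:
        if next_row == ell:
            # Buffer full: rotate onto the right singular vectors and shrink all squared
            # singular values by the smallest one, which zeroes the last row.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2
            B = np.sqrt(np.maximum(s**2 - delta, 0.0))[:, None] * Vt
            next_row = ell - 1
        B[next_row] = a
        next_row += 1
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 30)) @ np.diag(np.linspace(3.0, 0.1, 30))
B = frequent_directions(A, ell=10)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
print(B.shape, err, np.linalg.norm(A, "fro") ** 2 / 10)
```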

Martinsson (Mart16) mentions CUR and interpolative decompositions. Does preconditioning fit here?
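
For CUR, the idea is to use actual columns \(C\) and rows \(R\) of \(A\) plus a small linking matrix \(U\), so the factors stay interpretable (and sparse, if \(A\) is). A minimal sketch assuming squared-norm sampling without replacement and \(U = C^{+} A R^{+}\); that choice of \(U\) touches all of \(A\), so this illustrates the decomposition rather than a fast algorithm (Mart16 discusses more careful ways of picking the indices). The `rcond` cutoff is my choice, so that numerically rank-deficient \(C\) and \(R\) are handled gracefully.

```python
import numpy as np

def cur(A, n_cols, n_rows, seed=0):
    """CUR decomposition A ≈ C @ U @ R, with C and R actual columns and rows of A."""
    rng = np.random.default_rng(seed)
    total = np.sum(A**2)
    col_p = np.sum(A**2, axis=0) / total   # sample columns by squared norm
    row_p = np.sum(A**2, axis=1) / total   # and rows likewise
    cols = rng.choice(A.shape[1], size=n_cols, replace=False, p=col_p)
    rows = rng.choice(A.shape[0], size=n_rows, replace=False, p=row_p)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C, rcond=1e-10) @ A @ np.linalg.pinv(R, rcond=1e-10)
    return C, U, R

rng = np.random.default_rng(3)
A = rng.standard_normal((60, 4)) @ rng.standard_normal((4, 50))   # rank 4
C, U, R = cur(A, n_cols=8, n_rows=8)
print(C.shape, U.shape, R.shape, np.allclose(C @ U @ R, A))
```

For an exactly low-rank \(A\), any columns and rows that span its column and row spaces reproduce it exactly; for a general matrix the choice of indices is where all the interesting theory lives.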

## Randomized methods

Most of these problems have multiple optima, and the usual algorithms use greedy or local search to find solutions; that is, they are deterministic up to the choice of starting parameters.

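There are also randomised versions, which use random projections or random sampling to find a much smaller matrix that captures most of the action, and then factorise that deterministically (HaMT09, Mart16).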
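
The basic randomised pattern (HaMT09): multiply \(A\) by a random test matrix to capture its range, orthonormalise, then do a small deterministic factorisation of the projected matrix. A minimal sketch of a randomised truncated SVD; the Gaussian test matrix, oversampling of 10 and two power iterations are common choices I am assuming, not prescriptions.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_power_iter=2, seed=0):
    """Rank-k SVD via a randomised range finder followed by a small exact SVD."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)         # orthonormal basis for the sampled range
    for _ in range(n_power_iter):          # subspace iteration sharpens the spectrum
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                            # small (k + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(4)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))   # rank 5
U, s, Vt = randomized_svd(A, k=5)
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)[:5]))
```

The expensive operations are products with \(A\), which is why this style of algorithm parallelises and streams well.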

## Implementations

“Enough theory! Plug the hip new toy into my algorithm!”

OK.

libpmf (HPC for MATLAB, R, Python and C++):

> LIBPMF implements the CCD++ algorithm, which aims to solve large-scale matrix factorization problems such as the low-rank factorization problems for recommender systems.

NMF (R): TBD

laplacians.jl (Julia):

> Laplacians is a package containing graph algorithms, with an emphasis on tasks related to spectral and algebraic graph theory. It contains (and will contain more) code for solving systems of linear equations in graph Laplacians, low stretch spanning trees, sparsification, clustering, local clustering, and optimization on graphs.
>
> All graphs are represented by sparse adjacency matrices. This is both for speed, and because our main concerns are algebraic tasks. It does not handle dynamic graphs. It would be very slow to implement dynamic graphs this way.
>
> Laplacians.jl was started by Daniel A. Spielman. Other contributors include Rasmus Kyng, Xiao Shi, Sushant Sachdeva, Serban Stan and Jackson Thea.
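
The core primitive here is solving \(Lx = b\) for a graph Laplacian \(L\). To illustrate the problem (this is not Laplacians.jl's API, and has none of the preconditioning tricks that make serious Laplacian solvers fast), here is plain conjugate gradient in NumPy, storing the graph as a weighted edge list so that \(L\) is never formed. It assumes a connected graph and \(\sum_i b_i = 0\), so the singular system is consistent; the function names are hypothetical.

```python
import numpy as np

def laplacian_matvec(edges, weights, n, x):
    """Compute L @ x for the weighted graph Laplacian without forming L."""
    i, j = edges[:, 0], edges[:, 1]
    diff = weights * (x[i] - x[j])
    y = np.zeros(n)
    np.add.at(y, i, diff)    # (Lx)_i += w_ij (x_i - x_j)
    np.add.at(y, j, -diff)   # (Lx)_j += w_ij (x_j - x_i)
    return y

def solve_laplacian_cg(edges, weights, n, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient for L x = b; assumes a connected graph and sum(b) == 0."""
    x = np.zeros(n)
    r = b - laplacian_matvec(edges, weights, n, x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Lp = laplacian_matvec(edges, weights, n, p)
        alpha = rs / (p @ Lp)
        x += alpha * p
        r -= alpha * Lp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x - x.mean()      # the solution orthogonal to the constant vector

rng = np.random.default_rng(5)
n = 50
ring = np.stack([np.arange(n), (np.arange(n) + 1) % n], axis=1)   # a cycle, so connected
src = rng.integers(0, n, size=20)
chords = np.stack([src, (src + rng.integers(2, n - 2, size=20)) % n], axis=1)
edges = np.vstack([ring, chords])
weights = rng.random(len(edges)) + 0.5
b = rng.standard_normal(n)
b -= b.mean()                                                     # consistent right-hand side
x = solve_laplacian_cg(edges, weights, n, b)
print(np.linalg.norm(laplacian_matvec(edges, weights, n, x) - b))
```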

MATLAB: Chih-Jen Lin’s nmf.m (Lin07): “This tool solves NMF by alternative non-negative least squares using projected gradients. It converges faster than the popular multiplicative update approach.”

distributed nmf (KaBP16, Kann16, FKPB15):

> In this repository, we offer both MPI and OpenMP implementations for MU, HALS and ANLS/BPP based NMF algorithms. These can run off the shelf as well as be easily integrated into other source code. These are very highly tuned NMF algorithms to work on supercomputers. We have tested this software on NERSC as well as OLCF clusters. The OpenMP implementation is tested on many different Linux variants with Intel processors. The library works well for both sparse and dense matrices.
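
HALS (hierarchical alternating least squares), one of the algorithms listed there, updates one row of \(H\) or one column of \(W\) at a time; each such subproblem has a closed-form non-negative solution, so no step size is needed. A minimal single-machine sketch, nothing like the tuned MPI code; `nmf_hals`, the random initialisation, the iteration count and the small `eps` floor (which keeps columns from collapsing to exactly zero) are my assumptions.

```python
import numpy as np

def nmf_hals(A, rank, n_iter=200, seed=0, eps=1e-12):
    """HALS for A ≈ W @ H with W, H >= 0: exact non-negative updates, one factor row/column at a time."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], rank))
    H = rng.random((rank, A.shape[1]))
    for _ in range(n_iter):
        WtA, WtW = W.T @ A, W.T @ W
        for k in range(rank):   # row k of H, holding the other rows fixed
            H[k] = np.maximum(eps, H[k] + (WtA[k] - WtW[k] @ H) / WtW[k, k])
        AHt, HHt = A @ H.T, H @ H.T
        for k in range(rank):   # column k of W, holding the other columns fixed
            W[:, k] = np.maximum(eps, W[:, k] + (AHt[:, k] - W @ HHt[:, k]) / HHt[k, k])
    return W, H

rng = np.random.default_rng(1)
A = rng.random((30, 3)) @ rng.random((3, 20))   # exactly rank 3 and non-negative
W, H = nmf_hals(A, rank=3)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(rel_err)
```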

SPAMS (C++/MATLAB/Python) includes some matrix factorisations in its sparse approximation toolbox (see optimisation).

`scikit-learn` (Python) does a few matrix factorisations in its inimitable batteries-in-the-kitchen-sink way.

nimfa (Python): “Nimfa is a Python library for nonnegative matrix factorization. It includes implementations of several factorization methods, initialization approaches, and quality scoring. Both dense and sparse matrix representation are supported.”

Tapkee (C++). Pro-tip: even without coding C++, you can run a long list of dimensionality reduction methods from the Tapkee CLI, including

- PCA and randomized PCA
- Kernel PCA (kPCA)
- Random projection
- Factor analysis
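
As a taste of the random projection item, here is the "database-friendly" projection of Achlioptas (Achl03): multiply by a random \(\pm 1\) matrix scaled by \(1/\sqrt{k}\), which preserves squared distances in expectation and, by a Johnson–Lindenstrauss argument, approximately with high probability. A minimal NumPy sketch, not Tapkee's implementation; the dimensions and seed are arbitrary.

```python
import numpy as np

def achlioptas_projection(X, k, seed=0):
    """Project rows of X (n x d) to k dims with a random ±1 matrix scaled by 1/sqrt(k)."""
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 1.0], size=(X.shape[1], k))
    return X @ R / np.sqrt(k)

rng = np.random.default_rng(6)
X = rng.standard_normal((20, 1000))
Y = achlioptas_projection(X, k=400)
iu = np.triu_indices(20, k=1)                   # each pair of points once
d_orig = np.linalg.norm(X[:, None] - X[None, :], axis=-1)[iu]
d_proj = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)[iu]
ratios = d_proj / d_orig
print(ratios.min(), ratios.max())
```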

## Refs

- RoTy08: Vladimir Rokhlin, Mark Tygert (2008) A fast randomized algorithm for overdetermined linear least-squares regression. *Proceedings of the National Academy of Sciences*, 105(36), 13212–13217.
- WLRT08: Franco Woolfe, Edo Liberty, Vladimir Rokhlin, Mark Tygert (2008) A fast randomized algorithm for the approximation of matrices. *Applied and Computational Harmonic Analysis*, 25(3), 335–366.
- KoMP12: Ioannis Koutis, Gary L. Miller, Richard Peng (2012) A fast solver for a class of linear systems. *Communications of the ACM*, 55(10), 99–107.
- FHHP11: Wai Shing Fung, Ramesh Hariharan, Nicholas J.A. Harvey, Debmalya Panigrahi (2011) A General Framework for Graph Sparsification. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing (pp. 71–80). New York, NY, USA: ACM
- HuPC14: Tao Hu, Cengiz Pehlevan, Dmitri B. Chklovskii (2014) A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization. In 2014 48th Asilomar Conference on Signals, Systems and Computers.
- KaBP16: Ramakrishnan Kannan, Grey Ballard, Haesun Park (2016) A High-performance Parallel Algorithm for Nonnegative Matrix Factorization. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 9:1–9:11). New York, NY, USA: ACM
- SpTe08a: Daniel A. Spielman, Shang-Hua Teng (2008a) A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning. *ArXiv:0809.3232 [Cs]*.
- SoCh17: Yong Sheng Soh, Venkat Chandrasekaran (2017) A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers. *ArXiv:1701.01207 [Cs, Math, Stat]*.
- AGHM12: Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, … Michael Zhu (2012) A Practical Algorithm for Topic Modeling with Provable Guarantees. *ArXiv:1212.4777 [Cs, Stat]*.
- CoPe08: Patrick L. Combettes, Jean-Christophe Pesquet (2008) A proximal decomposition method for solving convex variational inverse problems. *Inverse Problems*, 24(6), 065014.
- RoST09: Vladimir Rokhlin, Arthur Szlam, Mark Tygert (2009) A Randomized Algorithm for Principal Component Analysis. *SIAM J. Matrix Anal. Appl.*, 31(3), 1100–1124.
- MaRT06: Per-Gunnar Martinsson, Vladimir Rokhlin, Mark Tygert (2006) A randomized algorithm for the approximation of matrices. DTIC Document
- Türk15: Ali Caner Türkmen (2015) A Review of Nonnegative Matrix Factorization Methods for Clustering. *ArXiv:1507.03194 [Cs, Stat]*.
- Lin00: Zhouchen Lin (n.d.) A Review on Low-Rank Models in Signal and Data Analysis.
- SoVM14: C. O. S. Sorzano, J. Vargas, A. Pascual Montano (2014) A survey of dimensionality reduction techniques. *ArXiv:1403.2877 [Cs, q-Bio, Stat]*.
- Kesh03: Nirmal Keshava (2003) A survey of spectral unmixing algorithms. *Lincoln Laboratory Journal*, 14(1), 55–78.
- SiGo08: Ajit P. Singh, Geoffrey J. Gordon (2008) A unified view of matrix factorization models. In Machine Learning and Knowledge Discovery in Databases (pp. 358–373). Springer
- ZaSh05: Ron Zass, Amnon Shashua (2005) A Unifying Approach to Hard and Probabilistic Clustering. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1 - Volume 01 (pp. 294–301). Washington, DC, USA: IEEE Computer Society
- LiPo02: Chi-Kwong Li, Edward Poon (2002) Additive Decomposition of Real Matrices. *Linear and Multilinear Algebra*, 50(4), 321–326.
- BBLP07: Michael W. Berry, Murray Browne, Amy N. Langville, V. Paul Pauca, Robert J. Plemmons (2007) Algorithms and applications for approximate nonnegative matrix factorization. *Computational Statistics & Data Analysis*, 52(1), 155–173.
- LeSe01: Daniel D. Lee, H. Sebastian Seung (2001) Algorithms for Non-negative Matrix Factorization. In Advances in Neural Information Processing Systems 13 (pp. 556–562). MIT Press
- YuMB17: Chenhan D. Yu, William B. March, George Biros (2017) An \(N \log N\) Parallel Fast Direct Solver for Kernel Matrices. In arXiv:1701.02324 [cs].
- DaGu03: Sanjoy Dasgupta, Anupam Gupta (2003) An elementary proof of a theorem of Johnson and Lindenstrauss. *Random Structures & Algorithms*, 22(1), 60–65.
- KhLM09: B. N. Khoromskij, A. Litvinenko, H. G. Matthies (2009) Application of hierarchical matrices for computing the Karhunen–Loève expansion. *Computing*, 84(1–2), 49–67.
- BlCH10: David M. Blei, Perry R. Cook, Matthew Hoffman (2010) Bayesian nonparametric matrix factorization for recorded music. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 439–446).
- FKPB15: James P. Fairbanks, Ramakrishnan Kannan, Haesun Park, David A. Bader (2015) Behavioral clusters in dynamic graphs. *Parallel Computing*, 47, 38–50.
- DoGr17: Ivan Dokmanić, Rémi Gribonval (2017) Beyond Moore-Penrose Part II: The Sparse Pseudoinverse. *ArXiv:1706.08701 [Cs, Math]*.
- ZDLZ07: Zhongyuan Zhang, Chris Ding, Tao Li, Xiangsun Zhang (2007) Binary matrix factorization with applications. In Seventh IEEE International Conference on Data Mining, 2007. ICDM 2007 (pp. 391–400). IEEE
- CoDF92: Albert Cohen, Ingrid Daubechies, Jean-Christophe Feauveau (1992) Biorthogonal bases of compactly supported wavelets. *Communications on Pure and Applied Mathematics*, 45(5), 485–560.
- HuKL13: G. Huang, M. Kaess, J. J. Leonard (2013) Consistent sparsification for graph optimization. In 2013 European Conference on Mobile Robots (ECMR) (pp. 150–157).
- DiLJ10: C. Ding, Tao Li, M.I. Jordan (2010) Convex and Semi-Nonnegative Matrix Factorizations. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 32(1), 45–55.
- VaTN16: Colin Vaz, Asterios Toutios, Shrikanth S. Narayanan (2016) Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data. (pp. 963–967).
- Bach13: Francis Bach (2013) Convex relaxations of structured matrix factorizations. *ArXiv:1309.3117 [Cs, Math]*.
- DeBV16: Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst (2016) Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Advances In Neural Information Processing Systems.
- Achl03: Dimitris Achlioptas (2003) Database-friendly random projections: Johnson-Lindenstrauss with binary coins. *Journal of Computer and System Sciences*, 66(4), 671–687.
- CSSW00: Bin Cao, Dou Shen, Jian-Tao Sun, Xuanhui Wang, Qiang Yang, Zheng Chen (n.d.) Detect and Track Latent Factors with Online Nonnegative Matrix Factorization.
- LiTX16: Tongliang Liu, Dacheng Tao, Dong Xu (2016) Dimensionality-Dependent Generalization Bounds for \(k\)-Dimensional Coding Schemes. *ArXiv:1601.00238 [Cs, Stat]*.
- BeBV10: N. Bertin, R. Badeau, E. Vincent (2010) Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription. *IEEE Transactions on Audio, Speech, and Language Processing*, 18(3), 538–549.
- TuKu82: D. W. Tufts, R. Kumaresan (1982) Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood. *Proceedings of the IEEE*, 70(9), 975–989.
- YeLi16: Ke Ye, Lek-Heng Lim (2016) Every Matrix is a Product of Toeplitz Matrices. *Foundations of Computational Mathematics*, 16(3), 577–598.
- ElLa92: Robert L. Ellis, David C. Lay (1992) Factorization of finite rank Hankel and Toeplitz matrices. *Linear Algebra and Its Applications*, 173, 19–38.
- TuLi00: Frederick Tung, James J. Little (n.d.) Factorized Binary Codes for Large-Scale Nearest Neighbor Search.
- HeRo11: Georg Heinig, Karla Rost (2011) Fast algorithms for Toeplitz and Hankel matrices. *Linear Algebra and Its Applications*, 435(1), 1–59.
- HsDh11: Cho-Jui Hsieh, Inderjit S. Dhillon (2011) Fast Coordinate Descent Methods with Variable Selection for Non-negative Matrix Factorization. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1064–1072). New York, NY, USA: ACM
- MGVB11: Boris Mailhé, Rémi Gribonval, Pierre Vandergheynst, Frédéric Bimbot (2011) Fast orthogonal sparse approximation algorithms over local dictionaries. *Signal Processing*, 91(12), 2822–2835.
- HaMT09: Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp (2009) Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. *ArXiv:0909.4061 [Math]*.
- WaJi04: Yuan Wang, Yunde Jia (2004) Fisher non-negative matrix factorization for learning local features. In Proc. Asian Conf. on Comp. Vision (pp. 27–30).
- GLPW15: Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff (2015) Frequent Directions: Simple and Deterministic Matrix Sketching. *ArXiv:1501.01711 [Cs]*.
- SrDh06: Suvrit Sra, Inderjit S. Dhillon (2006) Generalized Nonnegative Matrix Approximations with Bregman Divergences. In Advances in Neural Information Processing Systems 18 (pp. 283–290). MIT Press
- ZhTa11: Tianyi Zhou, Dacheng Tao (2011) Godec: Randomized low-rank & sparse matrix decomposition in noisy case.
- SpSr11: D. Spielman, N. Srivastava (2011) Graph Sparsification by Effective Resistances. *SIAM Journal on Computing*, 40(6), 1913–1926.
- ViBB08: E. Vincent, N. Bertin, R. Badeau (2008) Harmonic and inharmonic Nonnegative Matrix Factorization for Polyphonic Pitch transcription. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 109–112).
- Hack15: Wolfgang Hackbusch (2015) *Hierarchical Matrices: Algorithms and Analysis*. Heidelberg New York Dordrecht London: Springer Publishing Company, Incorporated
- YaMM15: Jiyan Yang, Xiangrui Meng, Michael W. Mahoney (2015) Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments. *ArXiv:1502.03032 [Cs, Math, Stat]*.
- DeGP16: A. Desai, M. Ghashami, J. M. Phillips (2016) Improved Practical Matrix Sketching with Guarantees. *IEEE Transactions on Knowledge and Data Engineering*, 28(7), 1678–1690.
- Bauc15: Christian Bauckhage (2015) k-Means Clustering Is Matrix Factorization. *ArXiv:1512.07548 [Stat]*.
- NoLi13: W. Nowak, A. Litvinenko (2013) Kriging and Spatial Design Accelerated by Orders of Magnitude: Combining Low-Rank Covariance Approximations with FFT-Techniques. *Mathematical Geosciences*, 45(4), 411–435.
- YiGL16: M. Yin, J. Gao, Z. Lin (2016) Laplacian Regularized Low-Rank Representation and Its Applications. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 38(3), 504–517.
- GNHS11: Rainer Gemulla, Erik Nijkamp, Peter J. Haas, Yannis Sismanis (2011) Large-scale Matrix Factorization with Distributed Stochastic Gradient Descent. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 69–77). New York, NY, USA: ACM
- LHZC01: S.Z. Li, XinWen Hou, HongJiang Zhang, Qiansheng Cheng (2001) Learning spatially localized, parts-based representation. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001 (Vol. 1, pp. I-207–I-212).
- LeSe99: Daniel D. Lee, H. Sebastian Seung (1999) Learning the parts of objects by non-negative matrix factorization. *Nature*, 401(6755), 788–791.
- KuSh16: N. Kishore Kumar, Jan Schneider (2016) Literature survey on low rank approximation of matrices. *ArXiv:1606.06511 [Cs, Math]*.
- WHGS17: Boyue Wang, Yongli Hu, Junbin Gao, Yanfeng Sun, Haoran Chen, Baocai Yin (2017) Locality Preserving Projections for Grassmann manifold. In Proceedings of IJCAI, 2017.
- CLOP16: A. Cichocki, N. Lee, I. V. Oseledets, A.-H. Phan, Q. Zhao, D. Mandic (2016) Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1. *ArXiv:1609.00893 [Cs]*.
- Vish13: Nisheeth K. Vishnoi (2013) Lx = b. *Foundations and Trends® in Theoretical Computer Science*, 8(1–2), 1–141.
- KoBV09: Yehuda Koren, Robert Bell, Chris Volinsky (2009) Matrix Factorization Techniques for Recommender Systems. *Computer*, 42(8), 30–37.
- Virt07: T. Virtanen (2007) Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria. *IEEE Transactions on Audio, Speech, and Language Processing*, 15(3), 1066–1074.
- ZhTa12: Tianyi Zhou, Dacheng Tao (2012) Multi-label Subspace Ensemble. *Journal of Machine Learning Research*.
- DuMF05: Delbert Dueck, Quaid D. Morris, Brendan J. Frey (2005) Multi-way clustering of microarray data using probabilistic sparse matrix factorization. *Bioinformatics*, 21(suppl 1), i144–i151.
- CVVR11: J. J. Carabias-Orti, T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes, F. J. Canadas-Quesada (2011) Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization. *IEEE Journal of Selected Topics in Signal Processing*, 5(6), 1144–1158.
- HIKP12: Haitham Hassanieh, Piotr Indyk, Dina Katabi, Eric Price (2012) Nearly Optimal Sparse Fourier Transform. In Proceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing (pp. 563–578). New York, NY, USA: ACM
- SpTe04: Daniel A. Spielman, Shang-Hua Teng (2004) Nearly-linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems. In Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing (pp. 81–90). New York, NY, USA: ACM
- SpTe06: Daniel A. Spielman, Shang-Hua Teng (2006) Nearly-Linear Time Algorithms for Preconditioning and Solving Symmetric, Diagonally Dominant Linear Systems. *ArXiv:Cs/0607105*.
- GTLY12: Naiyang Guan, Dacheng Tao, Zhigang Luo, Bo Yuan (2012) NeNMF: an optimal gradient method for nonnegative matrix factorization. *IEEE Transactions on Signal Processing*, 60(6), 2882–2898.
- AgNR16: Alireza Aghasi, Nam Nguyen, Justin Romberg (2016) Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks. *ArXiv:1611.05162 [Cs, Stat]*.
- CiZA06: A. Cichocki, R. Zdunek, S. Amari (2006) New Algorithms for Non-Negative Matrix Factorization in Applications to Blind Source Separation. In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (Vol. 5, pp. V–V).
- LaUr09: Neil D. Lawrence, Raquel Urtasun (2009) Non-linear Matrix Factorization with Gaussian Processes. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 601–608). New York, NY, USA: ACM
- Krus64: J. B. Kruskal (1964) Nonmetric multidimensional scaling: A numerical method. *Psychometrika*, 29(2), 115–129.
- WaZh13: Y. X. Wang, Y. J. Zhang (2013) Nonnegative Matrix Factorization: A Comprehensive Review. *IEEE Transactions on Knowledge and Data Engineering*, 25(6), 1336–1353.
- Deva08: Karthik Devarajan (2008) Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology. *PLoS Comput Biol*, 4(7), e1000029.
- KiPa08: H. Kim, H. Park (2008) Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method. *SIAM Journal on Matrix Analysis and Applications*, 30(2), 713–730.
- FéBD08: Cédric Févotte, Nancy Bertin, Jean-Louis Durrieu (2008) Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis. *Neural Computation*, 21(3), 793–830.
- Hoye02: P.O. Hoyer (2002) Non-negative sparse coding. In Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, 2002 (pp. 557–565).
- RyRy10: Daniil Ryabko, Boris Ryabko (2010) Nonparametric Statistical Inference for Ergodic Processes. *IEEE Transactions on Information Theory*, 56(3), 1430–1435.
- DiHS05: C. Ding, X. He, H. Simon (2005) On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering. In Proceedings of the 2005 SIAM International Conference on Data Mining (pp. 606–610). Society for Industrial and Applied Mathematics
- DrMa05: Petros Drineas, Michael W. Mahoney (2005) On the Nyström method for approximating a Gram matrix for improved kernel-based learning. *Journal of Machine Learning Research*, 6, 2153–2175.
- LiTa15: T. Liu, D. Tao (2015) On the Performance of Manhattan Nonnegative Matrix Factorization. *IEEE Transactions on Neural Networks and Learning Systems*, PP(99), 1–1.
- BrEZ08a: A. M. Bruckstein, Michael Elad, M. Zibulevsky (2008a) On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations. *IEEE Transactions on Information Theory*, 54(11), 4813–4820.
- PZWL14: Gang Pan, Wangsheng Zhang, Zhaohui Wu, Shijian Li (2014) Online Community Detection for Large Complex Networks. *PLoS ONE*, 9(7), e102799.
- MBPS09: Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro (2009) Online Dictionary Learning for Sparse Coding. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 689–696). New York, NY, USA: ACM
- HoBB10: Matthew Hoffman, Francis R. Bach, David M. Blei (2010) Online learning for latent Dirichlet allocation. In Advances in Neural Information Processing Systems (pp. 856–864).
- MBPS10: Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro (2010) Online learning for matrix factorization and sparse coding. *The Journal of Machine Learning Research*, 11, 19–60.
- GTLY12: N. Guan, D. Tao, Z. Luo, B. Yuan (2012) Online Nonnegative Matrix Factorization With Robust Stochastic Approximation. *IEEE Transactions on Neural Networks and Learning Systems*, 23(7), 1087–1099.
- BJMO12: Francis Bach, Rodolphe Jenatton, Julien Mairal, Guillaume Obozinski (2012) Optimization with Sparsity-Inducing Penalties. *Foundations and Trends® in Machine Learning*, 4(1), 1–106.
- YHSD14: Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon (2014) Parallel matrix factorization for recommender systems. *Knowledge and Information Systems*, 41(3), 793–819.
- AbPl04: Samer A. Abdallah, Mark D. Plumbley (2004) Polyphonic Music Transcription by Non-Negative Sparse Coding of Power Spectra.
- PaTa94: Pentti Paatero, Unto Tapper (1994) Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. *Environmetrics*, 5(2), 111–126.
- TYUC17: Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher (2017) Practical sketching algorithms for low-rank matrix approximation. *SIAM Journal on Matrix Analysis and Applications*, 38(4), 1454–1485.
- Lin07: Chih-Jen Lin (2007) Projected Gradient Methods for Nonnegative Matrix Factorization. *Neural Computation*, 19(10), 2756–2779.
- GLFB10: David Gross, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, Jens Eisert (2010) Quantum state tomography via compressed sensing. *Physical Review Letters*, 105(15).
- FGLE12: Steven T. Flammia, David Gross, Yi-Kai Liu, Jens Eisert (2012) Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators. *New Journal of Physics*, 14(9), 095022.
- LaGG16: Subhaneil Lahiri, Peiran Gao, Surya Ganguli (2016) Random projections of random manifolds. *ArXiv:1607.04331 [Cs, q-Bio, Stat]*.
- ZLZX17: Kai Zhang, Chuanren Liu, Jie Zhang, Hui Xiong, Eric Xing, Jieping Ye (2017) Randomization or Condensation?: Linear-Cost Matrix Sketching Via Cascaded Compression Sampling. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 615–623). New York, NY, USA: ACM
- LWMR07: Edo Liberty, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, Mark Tygert (2007) Randomized algorithms for the low-rank approximation of matrices. *Proceedings of the National Academy of Sciences*, 104(51), 20167–20172.
- Mart16: Per-Gunnar Martinsson (2016) Randomized methods for matrix computations and analysis of high dimensional data. *ArXiv:1607.01649 [Math]*.
- TYUC16: Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher (2016) Randomized single-view algorithms for low-rank matrix approximation. *ArXiv:1609.00048 [Cs, Math, Stat]*.
- Gros11: D. Gross (2011) Recovering Low-Rank Matrices From Few Coefficients in Any Basis. *IEEE Transactions on Information Theory*, 57(3), 1548–1566.
- Kann16: Ramakrishnan Kannan (2016) Scalable and distributed constrained low rank approximations.
- YHSD12: Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon (2012) Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems. In IEEE International Conference of Data Mining (pp. 765–774).
- Bach13: Francis R. Bach (2013) Sharp analysis of low-rank kernel matrix approximations. In COLT (Vol. 30, pp. 185–209).
- Libe13: Edo Liberty (2013) Simple and Deterministic Matrix Sketching. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 581–588). New York, NY, USA: ACM
- HIKP12: H. Hassanieh, P. Indyk, D. Katabi, E. Price (2012) Simple and Practical Algorithm for Sparse Fourier Transform. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 1183–1194). Kyoto, Japan: Society for Industrial and Applied Mathematics
- WaGM17: Shusen Wang, Alex Gittens, Michael W. Mahoney (2017) Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging. *ArXiv:1702.04837 [Cs, Stat]*.
- Wood14: David P. Woodruff (2014) Sketching as a Tool for Numerical Linear Algebra. *Foundations and Trends® in Theoretical Computer Science*, 10(1–2), 1–157.
- KBGP16: Nicolas Keriven, Anthony Bourrier, Rémi Gribonval, Patrick Pérez (2016) Sketching for Large-Scale Learning of Mixture Models. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6190–6194).
- DSBN13: Gautam Dasarathy, Parikshit Shah, Badri Narayan Bhaskar, Robert Nowak (2013) Sketching Sparse Matrices. *ArXiv:1303.6544 [Cs, Math]*.
- MaBP14: Julien Mairal, Francis Bach, Jean Ponce (2014) Sparse modeling for image and vision processing. *Foundations and Trends® in Comput Graph. Vis.*, 8(2–3), 85–283.
- BrEZ08b: A. M. Bruckstein, Michael Elad, M. Zibulevsky (2008b) Sparse non-negative solution of a linear system of equations is unique. In 3rd International Symposium on Communications, Control and Signal Processing, 2008. ISCCSP 2008 (pp. 762–767).
- SpTe08b: Daniel A. Spielman, Shang-Hua Teng (2008b) Spectral Sparsification of Graphs. *ArXiv:0808.4134 [Cs]*.
- SuSt16: Ying Sun, Michael L. Stein (2016) Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets. *Journal of Computational and Graphical Statistics*, 25(1), 187–208.
- MMTV17: Arthur Mensch, Julien Mairal, Bertrand Thirion, Gael Varoquaux (2017) Stochastic Subsampling for Factorizing Huge Matrices. *ArXiv:1701.05363 [Math, q-Bio, Stat]*.
- ZhWG17: Xiao Zhang, Lingxiao Wang, Quanquan Gu (2017) Stochastic Variance-reduced Gradient Descent for Low-rank Matrix Recovery from Linear Measurements. *ArXiv:1701.00481 [Stat]*.
- YaXu15: Wenzhuo Yang, Huan Xu (2015) Streaming Sparse Principal Component Analysis. In Journal of Machine Learning Research (pp. 494–503).
- BaMM17: Jean Barbier, Nicolas Macris, Léo Miolane (2017) The Layered Structure of Tensor Estimation and its Mutual Information. *ArXiv:1709.10368 [Cond-Mat, Physics:Math-Ph]*.
- BaSS08: Joshua Batson, Daniel A. Spielman, Nikhil Srivastava (2008) Twice-Ramanujan Sparsifiers. *ArXiv:0808.0163 [Cs]*.
- NeVe09: Deanna Needell, Roman Vershynin (2009) Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit. *Foundations of Computational Mathematics*, 9(3), 317–334.
- OyTr15: Samet Oymak, Joel A. Tropp (2015) Universality laws for randomized dimension reduction, with applications. *ArXiv:1511.09433 [Cs, Math, Stat]*.
- ScLH07: M.N. Schmidt, J. Larsen, Fu-Tien Hsiao (2007) Wind Noise Reduction using Non-Negative Sparse Coding. In 2007 IEEE Workshop on Machine Learning for Signal Processing (pp. 431–436).