# Clustering

Getting a bunch of data points and approximating them (in some sense) as membership (possibly fuzzy) in some groups, or regions of feature space.

For certain definitions this can be the same thing as non-negative and/or low rank matrix factorisations if you use mixture models, and is only really different in emphasis from dimensionality reduction. If you start with a list of features then think about “distances” between observations you have just implicitly intuced a weighted graph from your hitherto non-graphy data and are now looking at a networks problem.

If you really care about clustering as such, spectral clustering feels least inelegant if not fastest. Here is Chris Ding’s tutorial on spectral clustering

• CONCOR induces a cute similarity measure.
• MCL: Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.

There are many useful tricks in here, e.g. BeNi03 shows how to use a graph Laplacian (possibly a contrived or arbitrary one) to construct “natural” Euclidean coordinates for your data, such that nodes that have much traffic between them in the Laplacian representation have a small Euclidean distance (The “Urban Traffic Planner Fantasy Transformation”) Quickly gives you a similarity measure on really non-Euclidean data. Questions: Under which metrics is it equivalent to multidimensional scaling? Is it worthwhile going the other way and constructing density estimates from induced flow graphs?

## Clustering as matrix factorisation

If I know me, I might be looking at this page trying remember which papers situate k-means-type clustering in matrix factorisation literature.

The single-serve paper doing that is Bauc15, but there are broader versions in SiGo08, Türk15, some computer science connections in MiVW16, and an older one in ZaSh05.

Further things I might discuss here are the graph-flow/Laplacian notions of clustering and the density/centroids approach. I will discuss that under mixture models

AuVi15
Auvolat, A., & Vincent, P. (2015) Clustering is Efficient for Approximate Maximum Inner Product Search. arXiv:1507.05910 [cs, Stat].
BaSS08
Batson, J., Spielman, D. A., & Srivastava, N. (2008) Twice-Ramanujan Sparsifiers. arXiv:0808.0163 [cs].
Bauc15
Bauckhage, C. (2015) k-Means Clustering Is Matrix Factorization. arXiv:1512.07548 [stat].
BeNi03
Belkin, M., & Niyogi, P. (2003) Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, 15(6), 1373–1396. DOI.
CJSX14
Chen, Y., Jalali, A., Sanghavi, S., & Xu, H. (2014) Clustering Partially Observed Graphs via Convex Optimization. Journal of Machine Learning Research, 15(1), 2213–2238.
Clau05
Clauset, A. (2005) Finding local community structure in networks.
ClNM04
Clauset, A., Newman, M. E. J., & Moore, C. (2004) Finding community structure in very large networks. Physical Review E, 70(6), 066111. DOI.
Ding00
Ding, C. (n.d.) Spectral Clustering, ICML 2004 Tutorial.
DiHS05
Ding, C., He, X., & Simon, H. (2005) On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering. In Proceedings of the 2005 SIAM International Conference on Data Mining (pp. 606–610). Society for Industrial and Applied Mathematics
DoGr03
Donoho, D. L., & Grimes, C. (2003) Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10), 5591–5596. DOI.
DuMF05
Dueck, D., Morris, Q. D., & Frey, B. J.(2005) Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics, 21(suppl 1), i144–i151. DOI.
ElVi12
Elhamifar, E., & Vidal, R. (2012) Sparse Subspace Clustering: Algorithm, Theory, and Applications. arXiv:1203.1005 [cs, Math, Stat].
FHHP11
Fung, W. S., Hariharan, R., Harvey, N. J. A., & Panigrahi, D. (2011) A General Framework for Graph Sparsification. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing (pp. 71–80). New York, NY, USA: ACM DOI.
HaLB15
Hallac, D., Leskovec, J., & Boyd, S. (2015) Network Lasso: Clustering and Optimization in Large Graphs. arXiv:1507.00280 [cs, Math, Stat]. DOI.
HeNi04
He, X., & Niyogi, P. (2004) Locality preserving projections. In Advances in neural information processing systems 16: proceedings of the 2003 conference (Vol. 16, p. 153).
HINM04
Hoon, D., Imoto, Nolan, & Miyano. (2004) Open source clustering software. Bioinformatics (Oxford, England), 20, 1453–1454. DOI.
HuKL13
Huang, G., Kaess, M., & Leonard, J. J.(2013) Consistent sparsification for graph optimization. In 2013 European Conference on Mobile Robots (ECMR) (pp. 150–157). DOI.
KeLi04
Keogh, E., & Lin, J. (2004) Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowledge and Information Systems, 8(2), 154–177. DOI.
Kris11
Krishnamurthy, A. (2011) High-dimensional clustering with sparse gaussian mixture models. Unpublished Paper, 191–192.
Luxb07
Luxburg, U. von. (2007) A Tutorial on Spectral Clustering.
MiVW16
Mixon, D. G., Villar, S., & Ward, R. (2016) Clustering subgaussian mixtures by semidefinite programming. arXiv:1602.06612 [cs, Math, Stat].
Mohl13
Mohler, G. (2013) Modeling and estimation of multi-source clustering in crime and security data. The Annals of Applied Statistics, 7(3), 1525–1539. DOI.
Newm04
Newman, M. E. J.(2004) Detecting community structure in networks. The European Physical Journal B - Condensed Matter and Complex Systems, 38(2), 321–330. DOI.
Scha07
Schaeffer, S. E.(2007) Graph clustering. Computer Science Review, 1(1), 27–64. DOI.
ShTi11
Shamir, O., & Tishby, N. (2011) Spectral Clustering on a Budget. Journal of Machine Learning Research.
SiGo08
Singh, A. P., & Gordon, G. J.(2008) A unified view of matrix factorization models. In Machine Learning and Knowledge Discovery in Databases (pp. 358–373). Springer
SATB05
Slonim, N., Atwal, G. S., Tkačik, G., & Bialek, W. (2005) Information-based clustering. Proceedings of the National Academy of Sciences of the United States of America, 102, 18297–18302. DOI.
SpTe04
Spielman, D. A., & Teng, S.-H. (2004) Nearly-linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems. In Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing (pp. 81–90). New York, NY, USA: ACM DOI.
SpTe08
Spielman, D. A., & Teng, S.-H. (2008) A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning. arXiv:0809.3232 [cs].
SpSr11
Spielman, D., & Srivastava, N. (2011) Graph Sparsification by Effective Resistances. SIAM Journal on Computing, 40(6), 1913–1926. DOI.
StTe05
Steyvers, M., & Tenenbaum, J. B.(2005) The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth. Cognitive Science, 29(1), 41–78. DOI.
Türk15
Türkmen, A. C.(2015) A Review of Nonnegative Matrix Factorization Methods for Clustering. arXiv:1507.03194 [cs, Stat].
WaXu13
Wang, Y.-X., & Xu, H. (2013) Noisy Sparse Subspace Clustering. (pp. 89–97). Presented at the Proceedings of The 30th International Conference on Machine Learning
YaHJ09
Yan, D., Huang, L., & Jordan, M. I.(2009) Fast Approximate Spectral Clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 907–916). New York, NY, USA: ACM DOI.
Yosh10
Yoshida, T. (2010) A Graph Model for Clustering Based on Mutual Information. In B.-T. Zhang & M. A. Orgun (Eds.), PRICAI 2010: Trends in Artificial Intelligence (Vol. 6230, pp. 339–350). Berlin, Heidelberg: Springer Berlin Heidelberg
ZaSh05
Zass, R., & Shashua, A. (2005) A Unifying Approach to Hard and Probabilistic Clustering. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1 - Volume 01 (pp. 294–301). Washington, DC, USA: IEEE Computer Society DOI.
ZDLZ07
Zhang, Z., Ding, C., Li, T., & Zhang, X. (2007) Binary matrix factorization with applications. In Seventh IEEE International Conference on Data Mining, 2007. ICDM 2007 (pp. 391–400). IEEE DOI.