Getting a bunch of data points and approximating them (in some sense) as membership (possibly fuzzy) in some groups, or regions of feature space.
For certain definitions this can be the same thing as nonnegative and/or low rank matrix factorisations if you use mixture models, and is only really different in emphasis from dimensionality reduction. If you start with a list of features then think about “distances” between observations you have just implicitly intuced a weighted graph from your hitherto nongraphy data and are now looking at a networks problem.
If you really care about clustering as such, spectral clustering feels least inelegant if not fastest. Here is Chris Ding's tutorial on spectral clustering

CONCOR induces a cute similarity measure.

MCL: Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.
There are many useful tricks in here, e.g. BeNi03 shows how to use a graph Laplacian (possibly a contrived or arbitrary one) to construct “natural” Euclidean coordinates for your data, such that nodes that have much traffic between them in the Laplacian representation have a small Euclidean distance (The “Urban Traffic Planner Fantasy Transformation”) Quickly gives you a similarity measure on really nonEuclidean data. Questions: Under which metrics is it equivalent to multidimensional scaling? Is it worthwhile going the other way and constructing density estimates from induced flow graphs?
Clustering as matrix factorisation
If I know me, I might be looking at this page trying remember which papers situate kmeanstype clustering in matrix factorisation literature.
The singleserve paper doing that is Bauc15, but there are broader versions in SiGo08, Türk15, some computer science connections in MiVW16, and an older one in ZaSh05.
Further things I might discuss here are the graphflow/Laplacian notions of clustering and the density/centroids approach. I will discuss that under mixture models