The Living Thing / Notebooks :

Learning of manifolds

Also topological data analysis; other hip names to follow

Berger, Daniels and Yu: Manifolds in Genome search

As in: handling your high-dimensional, or graphical, data by trying to discover a low(er)-dimensional manifold that contains it; that is, inferring a hidden constraint that happens to have the form of a smooth surface of some low-ish dimension. Related: Learning on manifolds.

There are a million different versions of this. Multidimensional scaling seems to be the oldest.
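Classical (Torgerson) MDS is short enough to sketch directly. You double-centre the matrix of squared distances to get a Gram matrix, then keep its top eigenvectors scaled by the square roots of their eigenvalues. Below is a minimal numpy version, assuming `D` holds Euclidean pairwise distances. The name `classical_mds` is only an illustrative name, not from any library.

```python
import numpy as np

def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed n points in R^k given an n x n
    matrix D of pairwise Euclidean distances."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]        # indices of the k largest
    lam = np.clip(evals[order], 0.0, None)     # negative values signal non-Euclidean input
    return evecs[:, order] * np.sqrt(lam)      # coordinates, one row per point
```

On exactly Euclidean input this recovers the original configuration up to rotation, reflection and translation. On non-Euclidean dissimilarities some eigenvalues go negative, and this sketch simply clips them to zero.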

Tangential aside: in dynamical systems we talk about constructing very high-dimensional Takens embeddings for state-space reconstruction of arbitrary nonlinear dynamics. I imagine there are some connections between learning the lower-dimensional manifold on which your data lie and the higher-dimensional manifold in which your data’s state space is naturally expressed. But I would not be the first person to notice this, so hopefully it’s done for me somewhere?
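For concreteness, a delay-coordinate embedding just stacks lagged copies of a scalar observable. For a d-dimensional state manifold, Takens' theorem says a generic delay map is an embedding once the number of delays is at least 2d + 1. The sketch below assumes a uniformly sampled series `x`. The names `delay_embed`, `dim` and `tau` are illustrative choices, not any particular package's API.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Delay-coordinate (Takens-style) embedding of a scalar series x.
    Row t is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    x = np.asarray(x)
    n = len(x) - (dim - 1) * tau          # number of complete delay vectors
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    # column i holds the series shifted by i * tau steps
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)
```

Feeding in a sampled sine wave with a sensible `tau` traces out a closed loop: a 1-dimensional manifold living in a `dim`-dimensional space.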

See also kernel methods, which do regression on an implicit manifold (how do you reconcile these, btw?), and functional regression, where the manifold isn’t even necessarily low-dimensional, although it is typically still smooth in some sense.
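To make "implicit" concrete, here is a minimal kernel ridge regression sketch. The model is fitted and evaluated entirely through kernel evaluations, so the feature space, which is infinite-dimensional for the Gaussian kernel, is never constructed. The Gaussian kernel and the `gamma` and `lam` values are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2): an inner product in an implicit feature space
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, lam=1e-3, gamma=1.0):
    # dual coefficients alpha = (K + lam * I)^{-1} y; the feature map is never formed
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
    # f(x) = sum_i alpha_i * k(x_i, x)
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```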

See also information geometry, which doesn’t give you a manifold for your data, but a manifold in which the parametric model itself is embedded.
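A standard textbook example, chosen here only for illustration, shows what "the model is the manifold" means. The Bernoulli family is a 1-dimensional manifold parametrised by θ ∈ (0, 1), with the Fisher information as its Riemannian metric. The resulting Fisher–Rao geodesic distance has a closed form:

```latex
% Bernoulli family p(x \mid \theta) = \theta^{x} (1-\theta)^{1-x}, \quad x \in \{0, 1\}
g(\theta) = \mathbb{E}_\theta\!\left[\left(\partial_\theta \log p(x \mid \theta)\right)^2\right]
          = \frac{1}{\theta} + \frac{1}{1-\theta}
          = \frac{1}{\theta(1-\theta)}
% substituting \theta = \sin^2\varphi gives ds^2 = g(\theta)\, d\theta^2 = 4\, d\varphi^2, hence
d_{\mathrm{FR}}(\theta_1, \theta_2) = 2 \left| \arcsin\sqrt{\theta_1} - \arcsin\sqrt{\theta_2} \right|
```

So, in this geometry, moving θ from 0.01 to 0.02 covers more distance than moving it from 0.50 to 0.51, even though both changes are the same size in θ.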

To look at: ISOMAP, locally linear embedding, spectral embeddings, multidimensional scaling…

Bioinformatics is leading to some weird uses of data manifolds; see, for example, BeDY16 on the performance implications of knowing the manifold shape for *-omics search, using compressive manifold storage based on both fractal dimension and metric entropy concepts. There is also a suggestive connection with fitness landscapes in evolution.
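The two quantities mentioned there can be illustrated generically. The following is not BeDY16's method, just a toy sketch. The size N(r) of a greedy r-cover of the data gives, through log N(r), a crude metric-entropy estimate at scale r. The rate at which N(r) grows as r shrinks gives a box-counting-style fractal dimension. The function names and radii are hypothetical choices.

```python
import numpy as np

def greedy_cover_size(X, r):
    """Size of a greedy r-net: every point ends up within distance r of a centre.
    log of this count is a crude metric-entropy estimate at scale r."""
    centres = np.empty((0, X.shape[1]))
    for x in X:
        if len(centres) == 0 or np.min(np.linalg.norm(centres - x, axis=1)) > r:
            centres = np.vstack([centres, x])   # x is not yet covered: make it a centre
    return len(centres)

def covering_dimension(X, r_big, r_small):
    # slope of log N(r) against log(1/r), a box-counting-style dimension estimate
    n_big = greedy_cover_size(X, r_big)
    n_small = greedy_cover_size(X, r_small)
    return np.log(n_small / n_big) / np.log(r_big / r_small)
```

For points on a circle embedded in R^10 the estimate should come out close to 1. What it measures is the intrinsic dimension of the data, not the ambient one.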

Neural networks have some implicit manifolds, if you squint right; see Christopher Olah’s visual explanation of how, whose diagrams should be stolen by anyone trying to explain VC dimension.

MoSF13 argue:

Manifold learning algorithms have recently played a crucial role in unsupervised learning tasks such as clustering and nonlinear dimensionality reduction[…] Many such algorithms have been shown to be equivalent to Kernel PCA (KPCA) with data dependent kernels, itself equivalent to performing classical multidimensional scaling (cMDS) in a high dimensional feature space (Schölkopf et al., 1998; Williams, 2002; Bengio et al., 2004).[…] Recently, it has been observed that the majority of manifold learning algorithms can be expressed as a regularized loss minimization of a reconstruction matrix, followed by a singular value truncation (Neufeld et al., 2012)
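The KPCA/cMDS equivalence in that quote is a short calculation, written out here as the standard argument for illustration. Double-centring the matrix of squared feature-space distances gives back the centred kernel matrix, so both methods eigendecompose the same matrix.

```latex
% K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle, \quad k = \operatorname{diag}(K),
% \mathbf{1} = all-ones vector, \quad J = I - \tfrac{1}{n} \mathbf{1}\mathbf{1}^\top
D^{(2)}_{ij} = \lVert \phi(x_i) - \phi(x_j) \rVert^2 = K_{ii} + K_{jj} - 2 K_{ij},
\qquad D^{(2)} = k \mathbf{1}^\top + \mathbf{1} k^\top - 2K
% J\mathbf{1} = 0 and \mathbf{1}^\top J = 0, so both rank-one terms vanish:
-\tfrac{1}{2} J D^{(2)} J = J K J = V \Lambda V^\top
% cMDS on D^{(2)} and KPCA on K return the same top-d coordinates
Y = V_d \Lambda_d^{1/2}
```

A data-dependent kernel (e.g. one built from geodesic distances or neighbourhood graphs) changes K, but not this algebra.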

Implementations

TTK


The Topology ToolKit (TTK) is an open-source library and software collection for topological data analysis in scientific visualization.

TTK can handle scalar data defined either on regular grids or triangulations, either in 2D or in 3D. It provides a substantial collection of generic, efficient and robust implementations of key algorithms in topological data analysis.
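Persistence diagrams of scalar fields are among the standard outputs of such tools. The toy pure-Python sketch below is not TTK's API. It only illustrates the idea in the simplest setting: 0-dimensional sublevel-set persistence of a function on a 1-D regular grid, computed with union-find and the "elder rule". The function name is hypothetical.

```python
import math

def sublevel_persistence_1d(f):
    """0-dimensional sublevel-set persistence of a scalar field sampled on a
    1-D regular grid. Returns (birth, death) pairs; the component of the
    global minimum never dies, so its death is math.inf."""
    n = len(f)
    if n == 0:
        return []
    parent = list(range(n))
    birth = [0.0] * n          # birth value, read at component roots
    added = [False] * n

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for v in sorted(range(n), key=lambda i: f[i]):   # sweep values upwards
        added[v] = True
        birth[v] = f[v]
        for u in (v - 1, v + 1):                      # grid neighbours
            if 0 <= u < n and added[u]:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                # elder rule: the younger component dies when two merge
                young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
                if birth[young] < f[v]:               # drop zero-persistence pairs
                    pairs.append((birth[young], f[v]))
                parent[young] = old
    pairs.append((birth[find(0)], math.inf))
    return pairs
```

For example, `sublevel_persistence_1d([0.0, 2.0, 1.0, 3.0, 0.5])` pairs the local minimum at 1.0 with the saddle at 2.0 and the one at 0.5 with the saddle at 3.0. The global minimum at 0.0 lives forever.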

scikit-learn

scikit-learn implements a grab-bag of manifold learning algorithms.
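A minimal usage sketch of scikit-learn's `sklearn.manifold` estimators on the standard swiss roll toy dataset. `n_neighbors=12` and the noise level are arbitrary, untuned choices.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding

# A 2-D sheet rolled up in R^3; t is each point's position along the roll.
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

embedders = {
    "isomap": Isomap(n_neighbors=12, n_components=2),
    "lle": LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0),
    "spectral": SpectralEmbedding(n_components=2, n_neighbors=12, random_state=0),
}
# Each estimator maps the 3-D points to 2-D coordinates.
embeddings = {name: est.fit_transform(X) for name, est in embedders.items()}
for name, Y in embeddings.items():
    print(name, Y.shape)
```

Colouring each 2-D embedding by `t` is the usual quick check of whether the roll got unrolled or folded onto itself.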

tapkee

C++: Tapkee. Pro-tip: even without writing any code, you can run a long list of nice dimensionality reduction methods from tapkee's CLI, some of which are explicitly manifold learners (the rest are matrix factorisations, which is not so different).

To read