TBD. Various notes on a.e. continuous monotonic changes of index to align two processes, and estimation thereof.

Warping, registration problems. Especially interesting for functional data analysis, and analysis/resynthesis. A deterministic problem related to statistical changes of time.

General setup:

You have two functions \(\phi,\psi\), which we will assume for now are both \(\mathbb{R}^+\rightarrow \mathbb{R}\), and you wish to align them over some interval \(D\), with respect to some pointwise loss \(\ell\) and some class \(\mathcal{F}\) of permissible warping functions \(\mathbb{R}^+\rightarrow\mathbb{R}^+\).

This means that you wish to find

\[ \argmin_{f\in \mathcal{F}} \int_D \ell(\phi(t), \psi(f(t)))\,dt \]
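To make the objective concrete, here is a minimal sketch of how one might approximate the loss integral for a single candidate warp using a midpoint rule. The names `warp_loss` and `squared_error`, the choice of quadrature and the toy functions are all my own assumptions for illustration; nothing here says how the minimisation over \(\mathcal{F}\) should be done.

```python
import math

def warp_loss(phi, psi, f, loss, t0, t1, n=1000):
    """Midpoint-rule approximation of the integral over [t0, t1]
    of loss(phi(t), psi(f(t)))."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        total += loss(phi(t), psi(f(t)))
    return total * h

def squared_error(x, y):
    """One possible pointwise loss (an assumption, not the only choice)."""
    return (x - y) ** 2

# Toy example: psi is phi run at half speed, so the warp f(t) = 2t undoes it.
phi = math.sin
psi = lambda s: math.sin(s / 2.0)
aligned = warp_loss(phi, psi, lambda t: 2.0 * t, squared_error, 0.0, 10.0)
unaligned = warp_loss(phi, psi, lambda t: t, squared_error, 0.0, 10.0)
```

Here `aligned` should be essentially zero while `unaligned` is not, which is the behaviour we want an optimiser over \(f\) to exploit.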

## Affine warps of a real function

Image/sound alignment problems are an interesting special case where we usually want affine warps, i.e. \(f(t)=at+b\) for some constants \(a,b.\) Here the problem is mathematically not so complex, but in practice it can become computationally intractable. In particular, let’s say your functions are interpolated from data, i.e. we observe \(\phi(k), k=1,2,3,\dots,N,\) and \(\psi(k), k=1,2,3,\dots,N,\) and construct nice interpolants \(\hat{\phi}\) and \(\hat{\psi}\). Leaving aside statistical questions about how accurate this is, which is a whole other story, how computationally feasible is it to check \(\hat{\phi}\) for alignment with \(\hat{\psi}\)?

Let’s say we construct our interpolant from some mathematically justified basis, such as the sinc basis:

\[ \hat{\phi}(t)=\sum_{k=1}^N \sinc (t-k)\phi(k) \]

So that’s nicely linear and trivial to calculate. The loss integral then is

\[ \begin{aligned} \int_D\ell(\hat{\phi}(t), \hat{\psi}(f(t)))\,dt = \int_D\ell\left(\sum_{k=1}^N \sinc (t-k)\phi(k), \sum_{k=1}^N \sinc (f(t)-k)\psi(k)\right)dt \end{aligned} \]

We might try to approximate this integral by sampling it at some lattice or other subset, or we might have analytic forms for the integrals of the terms or whatever.

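Either way, though, this basis expansion is expensive: every evaluation of \(\hat{\phi}\) or \(\hat{\psi}\) is an \(N\)-term sum, so approximating the integral at \(\mathcal{O}(N)\) sample points requires \(\mathcal{O}(N^2)\) basis-function evaluations for each candidate warp, which is inconvenient.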
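A minimal sketch of the sampled version of this loss for an affine warp, assuming a squared-error pointwise loss and a hand-picked sampling grid (both my assumptions, as are the function names and the toy Gaussian bump). Each call to `sinc_interp` costs \(N\) sinc evaluations, so a grid of \(\mathcal{O}(N)\) points gives the quadratic cost per candidate \((a,b)\).

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def sinc_interp(samples, t):
    """Evaluate sum_k sinc(t - k) * samples[k-1] for k = 1..N (N sinc calls)."""
    return sum(sinc(t - k) * y for k, y in enumerate(samples, start=1))

def affine_warp_loss(phi_s, psi_s, a, b, grid):
    """Mean squared difference between phi_hat(t) and psi_hat(a*t + b) on grid."""
    total = 0.0
    for t in grid:
        total += (sinc_interp(phi_s, t) - sinc_interp(psi_s, a * t + b)) ** 2
    return total / len(grid)

# Toy data: a Gaussian bump g, with psi(k) = g(k - 3), so f(t) = t + 3 aligns them.
N = 32
g = lambda s: math.exp(-((s - 14.0) / 3.0) ** 2)
phi_s = [g(k) for k in range(1, N + 1)]
psi_s = [g(k - 3) for k in range(1, N + 1)]
grid = [k + 0.5 for k in range(4, 24)]

loss_aligned = affine_warp_loss(phi_s, psi_s, 1.0, 3.0, grid)
loss_identity = affine_warp_loss(phi_s, psi_s, 1.0, 0.0, grid)
```

Searching over \((a,b)\) means repeating this whole computation for every candidate, which is where the expense starts to bite.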

If we’ve smoothed our data somewhat and have a spline basis, this can be better, although then we still have a smoothing problem to solve. As an irritating technical detail, `FITPACK` (the Fortran spline library which seems to be everywhere) has, AFAICT, no function to take the product of two splines, or to do an affine transform of the coordinate, or to take derivatives with respect to a coordinate transform, etc. So despite the fact that everything is simple in principle, you have to do an enormous amount of book-keeping to do this.
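To give a flavour of that book-keeping, here is a sketch of the affine coordinate change for a single polynomial piece, assuming the piece is stored in the power basis (which is my simplification; `FITPACK` itself works with B-spline coefficients). The helper names are hypothetical. It expands \(p(at+b)\) with the binomial theorem to get the coefficients of the warped piece.

```python
from math import comb

def affine_substitute(coeffs, a, b):
    """Given p(s) = sum_i coeffs[i] * s**i, return coefficients of q(t) = p(a*t + b)."""
    out = [0.0] * len(coeffs)
    for i, c in enumerate(coeffs):
        # (a t + b)^i = sum_j C(i, j) a^j b^(i - j) t^j
        for j in range(i + 1):
            out[j] += c * comb(i, j) * a ** j * b ** (i - j)
    return out

def polyval(coeffs, t):
    """Horner evaluation of sum_i coeffs[i] * t**i."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc

# One cubic piece, warped by f(t) = 2t - 1.
p = [1.0, -2.0, 0.5, 3.0]
q = affine_substitute(p, 2.0, -1.0)
```

For a full piecewise spline we would also need to move the knots: a knot at \(s_j\) in the original coordinate sits at \(t_j=(s_j-b)/a\) in the warped one (for \(a>0\)), and every piece needs the substitution above.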

## General linear transforms

Say you are in 2d, or 3d and now you wish to register two objects by finding an optimal linear transform…

## Smooth nonlinear transforms

Dynamic Time Warping. (Does it work in multiple axes?)

Soft-DTW (CuBl17):

To compute DTW, one typically solves a minimal-cost alignment problem between two time series using dynamic programming. Our work takes advantage of a smoothed formulation of DTW, called soft-DTW, that computes the soft-minimum of all alignment costs. We show in this paper that soft-DTW is a differentiable loss function, and that both its value and gradient can be computed with quadratic time/space complexity (DTW has quadratic time but linear space complexity).
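As a rough sketch of the dynamic programme described in that quote: the recursion below fills a cost table where each cell adds a local cost to the (soft) minimum of its three predecessors. The squared-difference local cost, the function names and the `gamma` parameterisation are my assumptions for illustration; this computes values only, not gradients, and for simplicity it stores the whole table even in the hard-min case.

```python
import math

def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def dtw(x, y, gamma=0.0):
    """Alignment cost between sequences x and y with squared-difference local cost.

    gamma == 0 uses the hard min (classic DTW); gamma > 0 uses the soft min.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (r[i - 1][j], r[i][j - 1], r[i - 1][j - 1])
            best = min(prev) if gamma == 0.0 else softmin(prev, gamma)
            r[i][j] = (x[i - 1] - y[j - 1]) ** 2 + best
    return r[n][m]

# y is x with some samples repeated, so a zero-cost alignment exists.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0]
hard = dtw(x, y)
soft = dtw(x, y, gamma=1.0)
```

Since the soft minimum never exceeds the hard minimum, `soft` is at most `hard`, and the two agree in the limit as `gamma` shrinks to zero.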

TBD.

## Other warps

Does the CTC warp-robust loss function (GFGS06) fit here? I think it’s for discretely-indexed language data, which is not quite the kind of continuous form I’m worried about.

CTC:

Connectionist Temporal Classification is a loss function useful for performing supervised learning on sequence data, without needing an alignment between input data and labels. For example, CTC can be used to train end-to-end systems for speech recognition, which is how we have been using it at Baidu’s Silicon Valley AI Lab.

## Warping of point processes

A special interest of mine, at the intersection of functional data analysis, point processes and warping.

Though the study of multiple realisations of point processes has been considered prior to the emergence of FDA (see, e.g., Karr [22]), treating realisations of point processes as individual data objects within a functional data analysis context is a more recent development offering important advantages; a key paper is that of WuMZ13 (also see Chiou and Müller [10] and ChWH05). Such data may be an object of interest in themselves (see, e.g., WuMZ13, ArMü14, WuSr12) but may also arise as landmark data in an otherwise classical functional data analysis (see, e.g., GaKn95, ArMü14). The recent surge of interest is exemplified in an upcoming discussion paper by WuSr14, whose discussion documents early progress and challenges in the field.

## Refs

- CMTB14: (2014) Adaptive Gesture Recognition with Variation Estimation for Interactive Systems. *ACM Trans. Interact. Intell. Syst.*, 4(4), 18:1–18:34.
- WuSr14: (2014) Analysis of spike train data: Alignment and comparisons using the extended Fisher-Rao metric. *Electronic Journal of Statistics*, 8(2), 1776–1785.
- GFGS06: (2006) Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In Proceedings of the 23rd International Conference on Machine Learning (pp. 369–376). New York, NY, USA: ACM.
- Jame07: (2007) Curve alignment by moments. *The Annals of Applied Statistics*, 1(2), 480–501.
- AABC15: (2015) Deep speech 2: end-to-end speech recognition in English and Mandarin. *ArXiv:1512.02595 [Cs]*.
- WuSr12: (2012) Estimating summary statistics in the spike-train space. *Journal of Computational Neuroscience*, 34(3), 391–410.
- WuMZ13: (2013) Functional Data Analysis for Point Processes with Rare Events. *Statistica Sinica*, 23(1), 1–23.
- ArMü14: (2014) Pairwise dynamic time warping for event data. *Computational Statistics & Data Analysis*, 69, 255–268.
- GiKO11: (2011) Recognition of multivariate temporal musical gestures using n-dimensional dynamic time warping.
- ArRo12: (2012) Robust depth-based estimation in the time warping model. *Biostatistics (Oxford, England)*, 13(3), 398–414.
- PaZe16: (2016) Separation of Amplitude and Phase Variation in Point Processes. *The Annals of Statistics*, 44(2), 771–812.
- CuBl17: (2017) Soft-DTW: a Differentiable Loss Function for Time-Series. In PMLR (pp. 894–903).
- BBVK02: (2002) The time-rescaling theorem and its application to neural spike train data analysis. *Neural Computation*, 14(2), 325–346.