The Living Thing / Notebooks :

Warping and registration problems

Matching up of bumps and wibbles in stretchy things

TBD. Various notes on a.e. continuous monotonic changes of index to align two process, and estimation thereof.

Warping, registration problems. Especially interesting for functional data analysis, and analysis/resynthesis. A deterministic problem related to statistical changes of time.

General setup:

You have two functions \(\phi,\psi\), which we will assume for now are both \(\mathbb{R}^+\rightarrow \mathbb{R}\) and you wish to align them with respect to some pointwise loss, \(\ell\) and some class of permissable class \(\mathcal{F}\) of warping functions \(\mathbb{R}^+\rightarrow\mathbb{R}^+\), and you wish to do this over some interval \(D\).

This means that you wish to find

\begin{equation*} \argmin_{f\in \mathcal{F}} \int_D(\ell(\phi(t), \psi(f(t))))dt \end{equation*}

Affine warps of a real function

Image/sound alignment problems are an interesting special case where we usually want affine warps, i.e. \(f(t)=at+b\) for some constants \(a,b.\) Here the problem is mathematically not so complex, but in practice it can become computationally intractable; In particular, let’s say your function is interpolated from data, i.e. we observe \(\phi(k), k=1,2,3\dots,N,\) and \(\phi(k), k=1,2,3\dots,N.\) and construct a nice interpolant \(\hat{\phi}\) Leaving aside statistical questions about how accurate this is, which is a whole other story, how computationally feasible is this to check for alignment with some other ?

Let’s say we construct our interpolant from some mathematically justified basis, such as the sinc basis:

\begin{equation*} \hat{\phi(t)}=\sum_{k=1}^N \sinc (t-k)\phi(k) \end{equation*}

So that’s nicely linear and trivial to calculate. The loss integral then is

\begin{equation*} \begin{aligned} \int_D\ell(\hat{\phi}(t), \hat{\psi}(f(t)))dt = \int_D\ell(\sum_{k=1}^N \sinc (t-k)\phi(k), \sum_{k=1}^N \sinc (f(t-k))\psi(k), ) \end{aligned} \end{equation*}

We might try to approximate this integral by sampling it at some lattice or other subset, or we might have analytic forms for the integrals of the terms or whatever.

Either way, though, this basis expansion is expensive, requiring \(\mathcal{O}(N^2)\) evaluations, which is inconvenient.

If we’ve smoothed our data somewhat and have a spline basis, this can be better, although then we still have a smoothing problem to solve. As an irritating technical detail, FITPACK — which is the Fortan spline library which seems to be everywhere — has, AFAICT, no function to take the product of two splines or to do an affine transform of the coordinate, or take derivatives with respect to a coordinate transform etc, so despite the fact that everything is simple in principle, so you have to do an enormous amount of book-keeping to do this.

General linear transforms

Say you are in 2d, or 3d and now you wish to register two objects by finding an optimal linear transform…

Smooth nonlinear transforms

Dynamic Time Warping. (Does it work in multiple axes?)


To compute DTW, one typically solves a minimal-cost alignment problem between two time series using dynamic programming. Our work takes advantage of a smoothed formulation of DTW, called soft-DTW, that computes the soft-minimum of all alignment costs. We show in this paper that soft-DTW is a differentiable loss function, and that both its value and gradient can be computed with quadratic time/space complexity (DTW has quadratic time but linear space complexity)


Other warps

Does the CTC warp-robust loss function (GFGS06) fit here? I think it’s for discretely-indexed language data, which is not quite the kind of continuous form I’m worried about.


Connectionist Temporal Classification is a loss function useful for performing supervised learning on sequence data, without needing an alignment between input data and labels. For example, CTC can be used to train end-to-end systems for speech recognition, which is how we have been using it at Baidu’s Silicon Valley AI Lab.

Warping of point processes

A special interest of mine, at the intersection of functional data analysis, point processes and warping.


Though the study of multiple realisations of point processes has been considered prior to the emergence of FDA (see, e.g., Karr [22]), treating realisations of point processes as individual data objects within a functional data analysis context is a more recent development offering important advantages; a key paper is that of WuMZ13 (also see Chiou and Müller [10] and ChWH05). Such data may be an object of interest in themselves (see, e.g., WuMZ13, ArMü14, WuSr12) but may also arise as landmark data in an otherwise classical functional data analysis (see, e.g., GaKn95, ArMü14). The recent surge of interest is exemplified in an upcoming discussion paper by WuSr14, whose discussion documents early progress and challenges in the field.


Amodei, D., Anubhai, R., Battenberg, E., Case, C., Casper, J., Catanzaro, B., … Zhu, Z. (2015) Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. arXiv:1512.02595 [cs].
Arribas-Gil, A., & Müller, H.-G. (2014) Pairwise dynamic time warping for event data. Computational Statistics & Data Analysis, 69, 255–268. DOI.
Arribas-Gil, A., & Romo, J. (2012) Robust depth-based estimation in the time warping model. Biostatistics (Oxford, England), 13(3), 398–414. DOI.
Brown, E., Barbieri, R., Ventura, V., Kass, R., & Frank, L. (2002) The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2), 325–346. DOI.
Caramiaux, B., Montecchio, N., Tanaka, A., & Bevilacqua, F. (2014) Adaptive Gesture Recognition with Variation Estimation for Interactive Systems. ACM Trans. Interact. Intell. Syst., 4(4), 18:1–18:34. DOI.
Chiang, C.-T., Wang, M.-C., & Huang, C.-Y. (2005) Kernel Estimation of Rate Function for Recurrent Event Data. Scandinavian Journal of Statistics, 32(1), 77–91. DOI.
Fitzpatrick, J. M., Hill, D. L. G., & Maurer, Jr., C. R.(2000) Image Registration. In M. Sonka & J. M. Fitzpatrick (Eds.), Handbook of Medical Imaging, Volume 2. Medical Image Processing and Analysis (p. Chapter 8). 1000 20th Street, Bellingham, WA 98227-0010 USA: SPIE
Gasser, T., & Kneip, A. (1995) Searching for Structure in Curve Samples. Journal of the American Statistical Association, 90(432), 1179–1188. DOI.
Gillian, N., Knapp, B., & O’Modhrain, S. (2011) Recognition of multivariate temporal musical gestures using n-dimensional dynamic time warping.
Glocker, B., Komodakis, N., Tziritas, G., Navab, N., & Paragios, N. (2008) Dense image registration through MRFs and efficient linear programmingq. Medical Image Analysis, 12(6), 731–741. DOI.
Glocker, B., Sotiras, A., Komodakis, N., & Paragios, N. (2011) Deformable Medical Image Registration: Setting the State of the Art with Discrete Methods. Annual Review of Biomedical Engineering, 13(1), 219–244. DOI.
Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006) Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In Proceedings of the 23rd International Conference on Machine Learning (pp. 369–376). New York, NY, USA: ACM DOI.
James, G. M.(2007) Curve alignment by moments. The Annals of Applied Statistics, 1(2), 480–501. DOI.
Panaretos, V. M., & Zemel, Y. (2016) Separation of Amplitude and Phase Variation in Point Processes. The Annals of Statistics, 44(2), 771–812. DOI.
Wu, S., Müller, H.-G., & Zhang, Z. (2013) Functional Data Analysis for Point Processes with Rare Events. Statistica Sinica, 23(1), 1–23.
Wu, W., & Srivastava, A. (2012) Estimating summary statistics in the spike-train space. Journal of Computational Neuroscience, 34(3), 391–410. DOI.
Wu, W., & Srivastava, A. (2014) Analysis of spike train data: Alignment and comparisons using the extended Fisher-Rao metric. Electronic Journal of Statistics, 8(2), 1776–1785. DOI.
Yi, B.-K., Jagadish, H. V., & Faloutsos, C. (1998) Efficient retrieval of similar time sequences under time warping. In , 14th International Conference on Data Engineering, 1998. Proceedings (pp. 201–208). DOI.