đ§ Various notes on a.e. continuous monotonic changes of index to align two process, and estimation thereof.

Warping, registration problems. Especially interesting for functional data analysis, and analysis/resynthesis. A deterministic problem related to statistical changes of time.

General setup:

You have two functions \(\phi,\psi\), which we will assume for now are both \(\mathbb{R}^+\rightarrow \mathbb{R}\) and you wish to align them with respect to some pointwise loss, \(\ell\) and some class of permissable class \(\mathcal{F}\) of warping functions \(\mathbb{R}^+\rightarrow\mathbb{R}^+\), and you wish to do this over some interval \(D\).

This means that you wish to find

\[ \argmin_{f\in \mathcal{F}} \int_D(\ell(\phi(t), \psi(f(t))))dt \]

## Affine warps of a real function

Image/sound alignment problems are an interesting special case where we usually want affine warps, i.e.Â \(f(t)=at+b) for some constants \(a,b.\) Here the problem is mathematically not so complex, but in practice it can become computationally intractable; In particular, letâs say your function is interpolated from data, i.e.Â we observe \(\phi(k), k=1,2,3\dots,N,\) and \(\phi(k), k=1,2,3\dots,N.\) and construct a nice interpolant \(\hat{\phi}\) Leaving aside statistical questions about how accurate this is, which is a whole other story, how computationally feasible is this to check for alignment with some other ?

Letâs say we construct our interpolant from some mathematically justified basis, such as the sinc basis:

\[ \hat{\phi(t)}=\sum_{k=1}^N \sinc (t-k)\phi(k) \]

So thatâs nicely linear and trivial to calculate. The loss integral then is

\[ \begin{aligned} \int_D\ell(\hat{\phi}(t), \hat{\psi}(f(t)))dt = \int_D\ell(\sum_{k=1}^N \sinc (t-k)\phi(k), \sum_{k=1}^N \sinc (f(t-k))\psi(k), ) \end{aligned} \]

We might try to approximate this integral by sampling it at some lattice or other subset, or we might have analytic forms for the integrals of the terms or whatever.

Either way, though, this basis expansion is expensive, requiring \(\mathcal{O}(N^2)\) evaluations, which is inconvenient.

If weâve smoothed our data somewhat and have a spline basis, this can be better, although then we still have a smoothing problem to solve. As an irritating technical detail, `FITPACK`

â which is the Fortan spline library which seems to be everywhere â has, AFAICT, no function to take the product of two splines or to do an affine transform of the coordinate, or take derivatives with respect to a coordinate transform etc, so despite the fact that everything is simple in principle, so you have to do an enormous amount of book-keeping to do this.

## General linear transforms

Say you are in 2d, or 3d and now you wish to register two objects by finding an optimal linear transformâŠ

## Smooth nonlinear transforms

Dynamic Time Warping. (Does it work in multiple axes?)

To compute DTW, one typically solves a minimal-cost alignment problem between two time series using dynamic programming. Our work takes advantage of a smoothed formulation of DTW, called soft-DTW, that computes the soft-minimum of all alignment costs. We show in this paper that soft-DTW is a differentiable loss function, and that both its value and gradient can be computed with quadratic time/space complexity (DTW has quadratic time but linear space complexity)

đ§

## Other warps

Does the CTC warp-robust loss function (GFGS06) fit here? I think itâs for discretely-indexed language data, which is not quite the kind of continuous form Iâm worried about.

CTC:

Connectionist Temporal Classification is a loss function useful for performing supervised learning on sequence data, without needing an alignment between input data and labels. For example, CTC can be used to train end-to-end systems for speech recognition, which is how we have been using it at Baiduâs Silicon Valley AI Lab.

## Warping of point processes

A special interest of mine, at the intersection of functional data analysis, point processes and warping.

Though the study of multiple realisations of point processes has been considered prior to the emergence of FDA (see, e.g., Karr [22]), treating realisations of point processes as individual data objects within a functional data analysis context is a more recent development offering important advantages; a key paper is that of WuMZ13 (also see Chiou and MuÌller [10] and ChWH05). Such data may be an object of interest in themselves (see, e.g., WuMZ13, ArMĂŒ14, WuSr12) but may also arise as landmark data in an otherwise classical functional data analysis (see, e.g., GaKn95, ArMĂŒ14). The recent surge of interest is exemplified in an upcoming discussion paper by WuSr14, whose discussion documents early progress and challenges in the field.

# Refs

Amodei, Dario, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, et al. 2015. âDeep Speech 2: End-to-End Speech Recognition in English and Mandarin,â December. http://arxiv.org/abs/1512.02595.

Arribas-Gil, Ana, and Hans-Georg MĂŒller. 2014. âPairwise Dynamic Time Warping for Event Data.â *Computational Statistics & Data Analysis* 69 (January): 255â68. https://doi.org/10.1016/j.csda.2013.08.011.

Arribas-Gil, Ana, and Juan Romo. 2012. âRobust Depth-Based Estimation in the Time Warping Model.â *Biostatistics (Oxford, England)* 13 (3): 398â414. https://doi.org/10.1093/biostatistics/kxr037.

Brown, ERVRL, Riccardo Barbieri, ValĂ©rie Ventura, R. Kass, and L. Frank. 2002. âThe Time-Rescaling Theorem and Its Application to Neural Spike Train Data Analysis.â *Neural Computation* 14 (2): 325â46. https://doi.org/10.1162/08997660252741149.

Caramiaux, Baptiste, Nicola Montecchio, Atau Tanaka, and FrĂ©dĂ©ric Bevilacqua. 2014. âAdaptive Gesture Recognition with Variation Estimation for Interactive Systems.â *ACM Trans. Interact. Intell. Syst.* 4 (4): 18:1â18:34. https://doi.org/10.1145/2643204.

Cuturi, Marco, and Mathieu Blondel. 2017. âSoft-DTW: A Differentiable Loss Function for Time-Series.â In *PMLR*, 894â903. http://proceedings.mlr.press/v70/cuturi17a.html.

Gillian, Nicholas, Benjamin Knapp, and Sile OâModhrain. 2011. âRecognition of Multivariate Temporal Musical Gestures Using N-Dimensional Dynamic Time Warping.â In. http://www.nime.org/proceedings/2011/nime2011_337.pdf.

Graves, Alex, Santiago FernĂĄndez, Faustino Gomez, and JĂŒrgen Schmidhuber. 2006. âConnectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks.â In *Proceedings of the 23rd International Conference on Machine Learning*, 369â76. ICML â06. New York, NY, USA: ACM. https://doi.org/10.1145/1143844.1143891.

James, Gareth M. 2007. âCurve Alignment by Moments.â *The Annals of Applied Statistics* 1 (2): 480â501. https://doi.org/10.1214/07-AOAS127.

Panaretos, Victor M., and Yoav Zemel. 2016. âSeparation of Amplitude and Phase Variation in Point Processes.â *The Annals of Statistics* 44 (2): 771â812. https://doi.org/10.1214/15-AOS1387.

Wu, Shuang, Hans-Georg MĂŒller, and Zhen Zhang. 2013. âFunctional Data Analysis for Point Processes with Rare Events.â *Statistica Sinica* 23 (1): 1â23. http://www3.stat.sinica.edu.tw/sstest/oldpdf/A23n11.pdf.

Wu, Wei, and Anuj Srivastava. 2014. âAnalysis of Spike Train Data: Alignment and Comparisons Using the Extended Fisher-Rao Metric.â *Electronic Journal of Statistics* 8 (2): 1776â85. https://doi.org/10.1214/14-EJS865B.

âââ. 2012. âEstimating Summary Statistics in the Spike-Train Space.â *Journal of Computational Neuroscience* 34 (3): 391â410. https://doi.org/10.1007/s10827-012-0427-3.