# Optimal transport distances

### Wasserstein distances, Monge-Kantorovich metrics, Earthmover distances

I presume there are other uses for optimal transport distances apart from as probability metrics, but so far I only care about them in that context, so this will be skewed that way.

Let $$(M,d)$$ be a metric space for which every probability measure on $$M$$ is a Radon measure. For $$p\ge 1$$, let $$\mathcal{P}_p(M)$$ denote the collection of all probability measures $$P$$ on $$M$$ with finite $$p^{\text{th}}$$ moment, meaning that for some $$x_0$$ in $$M$$, $\int_{M} d(x, x_{0})^{p} \, \mathrm{d} P (x) < +\infty.$

Then the $$p^{\text{th}}$$ Wasserstein distance between two probability measures $$P$$ and $$Q$$ in $$\mathcal{P}_p(M)$$ is defined as $W_{p} ( P , Q ):= \left( \inf_{\gamma \in \Pi ( P , Q )} \int_{M \times M} d(x, y)^{p} \, \mathrm{d} \gamma (x, y) \right)^{1/p},$

where $$\Pi( P , Q )$$ denotes the collection of all probability measures on $$M \times M$$ (couplings) whose first and second marginals are $$P$$ and $$Q$$ respectively.

Practically, one usually sees $$p\in\{1,2\}$$. For $$p=1$$ one uses $W_1(P,Q)=\inf_{\gamma \in \Pi( P , Q )}\mathbb{E}_{(x,y)\sim \gamma}\left[d(x,y)\right]$

This is frequently intractable, or at least has no closed form, but you can find it for some useful special cases, or bound/approximate it in others.
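One tractable special case is the real line. For two empirical measures on $$\mathbb{R}$$ with the same number of equally weighted points, pairing the sorted samples in order gives an optimal coupling, so $$W_p$$ reduces to a sort. A minimal sketch of that case only, with the function name `wasserstein_1d` made up for illustration:

```python
def wasserstein_1d(xs, ys, p=1):
    """Empirical W_p between two equal-size 1-D samples with uniform weights.

    Assumes d(x, y) = |x - y| on the real line. In that setting the
    sorted-order pairing is an optimal coupling, so no optimisation is needed.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # Average transport cost under the sorted-order coupling.
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)


w1 = wasserstein_1d([0.0, 2.0], [5.0, 1.0], p=1)  # pairs 0 with 1 and 2 with 5
w2 = wasserstein_1d([0.0, 2.0], [5.0, 1.0], p=2)
```

Unequal sample sizes, non-uniform weights, or dimensions above one need a different method (for instance a linear program over couplings), which this sketch does not cover.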

TODO: discuss favourable properties of this metric (triangle inequality, bounds on moments etc).

Useful: for two Gaussians the Wasserstein-2 distance has a closed form. Write $$W_2(\mu;\nu):=\inf\left(\mathbb{E}\Vert X-Y\Vert_2^2\right)^{1/2}$$, where the infimum is over joint distributions of $$(X,Y)$$ with $$X\sim\mu$$, $$Y\sim\nu$$. Then

$\begin{aligned} d&:= W_2(\mathcal{N}(m_1,\Sigma_1);\mathcal{N}(m_2,\Sigma_2))\\ \Rightarrow d^2&= \Vert m_1-m_2\Vert_2^2 + \mathrm{Tr}\left(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\right). \end{aligned}$ (ref)
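The Gaussian formula is easy to evaluate numerically. The following is a sketch assuming NumPy is available and that both covariances are symmetric positive semi-definite; the helper names `psd_sqrt` and `gaussian_w2` are hypothetical, chosen for this example.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a symmetric PSD matrix via its eigendecomposition."""
    a = 0.5 * (a + a.T)  # remove round-off asymmetry before calling eigh
    eigvals, eigvecs = np.linalg.eigh(a)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_w2(m1, sigma1, m2, sigma2):
    """W_2 between N(m1, sigma1) and N(m2, sigma2) from the closed form."""
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    root1 = psd_sqrt(sigma1)
    cross = psd_sqrt(root1 @ sigma2 @ root1)
    d2 = np.sum((m1 - m2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * cross)
    return float(np.sqrt(max(d2, 0.0)))  # clamp round-off below zero


# Diagonal (commuting) covariances: the trace term is the sum of squared
# differences of standard deviations, (1-3)^2 + (2-4)^2 = 8, so d^2 = 25 + 8.
d = gaussian_w2([0.0, 0.0], np.diag([1.0, 4.0]), [3.0, 4.0], np.diag([9.0, 16.0]))
```

When the covariances commute, as in the usage line, the trace term equals the squared Frobenius norm of $$\Sigma_1^{1/2}-\Sigma_2^{1/2}$$, which makes a handy sanity check.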

But why do you care about such an intractable distance? Because it bounds the errors from approximate distributions.

We know that if $$W_p(\nu, \hat{\nu}) \leq \epsilon$$, then for any $$L$$-Lipschitz function $$\phi$$, $$|\nu(\phi) - \hat{\nu}(\phi)| \leq L\epsilon.$$ See [@HugginsPractical2018] for some specifics.
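The Lipschitz bound can be checked numerically on the real line, where $$W_1$$ between two equal-size empirical measures is computable by sorting. This is an illustrative sketch only: the samples, the test function $$\phi(x)=3\sin(x)$$ (which is 3-Lipschitz), and the helper names are all assumptions made for the example.

```python
import math
import random


def w1_sorted(xs, ys):
    """Empirical W_1 for equal-size 1-D samples via the sorted-order coupling."""
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def mean(values):
    return sum(values) / len(values)


random.seed(0)
# Stand-ins for a target nu and an approximation nu_hat.
xs = [random.gauss(0.0, 1.0) for _ in range(500)]
ys = [random.gauss(0.3, 1.2) for _ in range(500)]

lipschitz_const = 3.0


def phi(x):
    # |phi'(x)| = 3 |cos(x)| <= 3, so phi is 3-Lipschitz.
    return lipschitz_const * math.sin(x)


gap = abs(mean([phi(x) for x in xs]) - mean([phi(y) for y in ys]))
bound = lipschitz_const * w1_sorted(xs, ys)
print(f"|nu(phi) - nu_hat(phi)| = {gap:.4f} <= L * W_1 = {bound:.4f}")
```

The check uses $$W_1$$, which is enough because $$W_1 \le W_p$$ for every $$p \ge 1$$.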

## Kantorovich-Rubinstein duality

Vincent Herrmann gives an excellent practical introduction.
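For reference, the duality is usually stated for $$p=1$$ as below. This is the standard textbook form rather than something taken from the linked introduction, and $$\Vert f\Vert_{\mathrm{Lip}}$$ is notation introduced here for the Lipschitz constant of $$f$$ with respect to $$d$$.

$W_1(P,Q)=\sup_{\Vert f\Vert_{\mathrm{Lip}}\le 1}\left(\mathbb{E}_{x\sim P}\left[f(x)\right]-\mathbb{E}_{y\sim Q}\left[f(y)\right]\right).$

Taking $$f=\phi/L$$ for an $$L$$-Lipschitz $$\phi$$ gives $$|P(\phi)-Q(\phi)|\le L\,W_1(P,Q)$$, which is the error bound for Lipschitz functions quoted earlier.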

## “Neural Net distance”

Wasserstein distance with a baked-in notion of the capacity of the function class which approximates the true Wasserstein distance. (Arjovsky et al) Is this actually used?

## Fisher distance

Specifically $$(p,\nu)$$-Fisher distances, in the terminology of [@HugginsPractical2018]. They use these distances as a computationally tractable proxy for Wasserstein distances. See Fisher distances.