Mirror descent

December 29, 2019 — August 29, 2023

Bregman

functional analysis

optimization

statmech

First order gradient descent that can “adapt to the problem structure”. Q: how does it relate to Natural gradient descent? For that has a similar description.

Beck and Teboulle (2003):

The mirror descent algorithm (MDA) was introduced by Nemirovsky and Yudin for solving convex optimization problems. This method exhibits an efficiency estimate that is mildly dependent in the decision variables dimension, and thus suitable for solving very large scale optimization problems. We present a new derivation and analysis of this algorithm. We show that the MDA can be viewed as a nonlinear projected-subgradient type method, derived from using a general distance-like function instead of the usual Euclidean squared distance. Within this interpretation, we derive in a simple way convergence and efficiency estimates. We then propose an Entropic mirror descent algorithm for convex minimization over the unit simplex, with a global efficiency estimate proven to be mildly dependent in the dimension of the problem.

Bubeck’s lectures are good: Bubeck (2019).

T Lienart, Mirror descent algorithm.

1 Incoming

2 References

Bansal, and Gupta. 2019. “Potential-Function Proofs for First-Order Methods.”

Beck, and Teboulle. 2003. “Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization.” Operations Research Letters.

Bubeck. 2015. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning.

———. 2019. The Five Miracles of Mirror Descent.

Jacobsen, and Cutkosky. 2022. “Parameter-Free Mirror Descent.”

Lee, Panageas, Piliouras, et al. 2017. “First-Order Methods Almost Always Avoid Saddle Points.” arXiv:1710.07406 [Cs, Math, Stat].

Wibisono, and Wilson. 2015. “On Accelerated Methods in Optimization.” arXiv:1509.03616 [Math].

Zhang. 2013. “Bregman Divergence and Mirror Descent.”