A particular optimisation method in statistics that gets you a maximum likelihood estimate despite various annoyances, such as missing data.
Vague description of the algorithm:
We have an experimental process that generates a random vector $(X, Z)$ whose law is parameterised by $\theta$. We wish to estimate the parameter of interest $\theta$ by maximum likelihood. However, we only observe i.i.d. samples $x$ drawn from $X$; the component $Z$ is unobserved. The likelihood function of the incomplete data, $L(\theta; x)$, is tedious or intractable to maximise. But the “complete” joint likelihood of both the observed and unobserved components, $L(\theta; x, z)$, is easier to maximise. Then we are potentially in a situation where expectation maximisation can help.
Call the estimate of $\theta$ at step $k$ $\hat\theta^{(k)}$. Write $\ell(\theta) := \log L(\theta)$, because we virtually always work in log likelihoods, and especially here.
The following form of the algorithm works when the complete-data log likelihood $\ell(\theta; x, z)$ is linear in $z$. (Which is equivalent to the complete-data distribution being in an exponential family, I believe, but I should check.)
At time $k = 0$ we start with an estimate $\hat\theta^{(0)}$ of $\theta$ chosen arbitrarily or by our favourite approximate method.
We attempt to improve our estimate of the parameter of interest by the following iterative algorithm:

“Expectation”: Under the completed-data model with joint distribution $p(x, z \mid \hat\theta^{(k)})$, we estimate the missing data as
$$\hat{z}^{(k)} = \mathbb{E}\left[Z \mid x, \hat\theta^{(k)}\right].$$

“Maximisation”: Solve a (hopefully easier) maximisation problem:
$$\hat\theta^{(k+1)} = \operatorname{arg\,max}_\theta \ell(\theta; x, \hat{z}^{(k)}).$$
In the case that this log likelihood is not linear in $z$, you are supposed to instead take
$$\hat\theta^{(k+1)} = \operatorname{arg\,max}_\theta \mathbb{E}\left[\ell(\theta; x, Z) \mid x, \hat\theta^{(k)}\right].$$
In practice I have seen this latter nicety ignored, apparently without ill effect.
Even if you do the right thing, EM may not converge especially well, or to the global maximum, but damn it can be easy and robust to get started with, and at least it doesn't make things worse.
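As a concrete sketch of the two steps (my own toy example, not taken from any of the cited papers): EM for a two-component one-dimensional Gaussian mixture with unit variances, where the latent $z_i \in \{0, 1\}$ labels the component that generated $x_i$. The E-step computes posterior responsibilities; the M-step updates have closed forms.

```python
import math

def em_mixture(xs, mu0=-1.0, mu1=1.0, pi=0.5, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture with unit variances.

    The latent z_i in {0, 1} labels the component that generated x_i;
    pi is the mixing weight of component 1.
    """
    for _ in range(n_iter):
        # E-step: responsibility r_i = P(z_i = 1 | x_i, current parameters)
        r = []
        for x in xs:
            p0 = (1.0 - pi) * math.exp(-0.5 * (x - mu0) ** 2)
            p1 = pi * math.exp(-0.5 * (x - mu1) ** 2)
            r.append(p1 / (p0 + p1))
        # M-step: maximise the expected complete-data log likelihood;
        # here that reduces to weighted means and a weighted count
        s = sum(r)
        pi = s / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / s
        mu0 = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / (len(xs) - s)
    return mu0, mu1, pi
```

With data simulated from well-separated components (say, centred at $-2$ and $+2$), a few dozen iterations should recover the means and the mixing weight to within sampling error.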
Literature note – apparently the proofs in Dempster, Laird, and Rubin (1977) are dicey; see the Wu (1983) paper for improved (i.e. correct) versions.

A Transparent Interpretation of the EM Algorithm by James Coughlan makes a short point:
We write data $x$, latent variable $z$, parameter of interest $\theta$. Then…
[…] maximizing Neal and Hinton's joint function of $\theta$ and a distribution $q$ on $z$ is equivalent to maximum likelihood estimation.
The key point is to note that maximizing $\log p(x \mid \theta)$ over $\theta$ is equivalent to maximizing
$$\sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)}$$
jointly over $\theta$ and $q$. […]
[…We rewrite this cost function as]
$$\sum_z q(z) \log p(x, z \mid \theta) + H(q),$$
where $H(q)$ is the entropy of $q$. This expression is in turn equivalent to
$$F(q, \theta) = \mathbb{E}_q[\log p(x, z \mid \theta)] + H(q),$$
which is the same as the function given in Neal and Hinton. This function is maximized iteratively, where each iteration consists of two separate maximizations, one over $q$ and another over $\theta$.
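To make the quoted claim concrete, here is a minimal numeric check (my own toy numbers, not from Coughlan's paper) that the Neal–Hinton function $F(q, \theta) = \mathbb{E}_q[\log p(x, z \mid \theta)] + H(q)$ equals the incomplete-data log likelihood $\log p(x \mid \theta)$ exactly when $q$ is the posterior $p(z \mid x, \theta)$, and is smaller for any other $q$:

```python
import math

def free_energy(q, log_joint):
    """Neal-Hinton objective F(q, theta) = E_q[log p(x, z | theta)] + H(q)
    for a discrete latent z; log_joint[z] holds log p(x, z | theta)."""
    return sum(q[z] * (log_joint[z] - math.log(q[z])) for z in q if q[z] > 0)

# Toy model (arbitrary numbers): x fixed, z in {0, 1}
log_joint = {0: math.log(0.12), 1: math.log(0.28)}
evidence = 0.12 + 0.28            # p(x | theta) = sum_z p(x, z | theta)

# E-step optimum: q(z) = p(z | x, theta), the exact posterior
posterior = {z: math.exp(lj) / evidence for z, lj in log_joint.items()}

# At the posterior, F(q, theta) equals the incomplete-data log likelihood
assert abs(free_energy(posterior, log_joint) - math.log(evidence)) < 1e-9

# Any other q gives a strictly smaller F (the gap is KL(q || posterior))
assert free_energy({0: 0.5, 1: 0.5}, log_joint) < math.log(evidence)
```

The gap between $\log p(x \mid \theta)$ and $F(q, \theta)$ is exactly $\mathrm{KL}(q \,\|\, p(z \mid x, \theta))$, which is why alternating maximization over $q$ (E-step) and $\theta$ (M-step) never decreases the likelihood.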

Dan Piponi, Expectation-Maximization with Less Arbitrariness
My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary.
Refs
 CCFM01: Gilles Celeux, Stéphane Chrétien, Florence Forbes, Abdallah Mkhadri (2001) A Component-Wise EM Algorithm for Mixtures. Journal of Computational and Graphical Statistics, 10(4), 697–712. DOI
 Bilm98: Jeff A. Bilmes (1998) A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. International Computer Science Institute, 4(510), 126.
 Navi97: William Navidi (1997) A Graphical Illustration of the EM Algorithm. The American Statistician, 51(1), 29–31. DOI
 WeTa90: Greg C. G. Wei, Martin A. Tanner (1990) A Monte Carlo Implementation of the EM Algorithm and the Poor Man’s Data Augmentation Algorithms. Journal of the American Statistical Association, 85(411), 699–704. DOI
 Pres04: Detlef Prescher (2004) A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars. arXiv:cs/0412015.
 NeHi98: Radford M. Neal, Geoffrey E. Hinton (1998) A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants. In Learning in Graphical Models (pp. 355–368). Springer Netherlands
 DeLM99: Bernard Delyon, Marc Lavielle, Eric Moulines (1999) Convergence of a stochastic approximation version of the EM algorithm. The Annals of Statistics, 27(1), 94–128. DOI
 KuLa04: Estelle Kuhn, Marc Lavielle (2004) Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM: Probability and Statistics, 8, 115–131. DOI
 Roch11: Alexis Roche (2011) EM algorithm and variants: an informal tutorial. arXiv:1105.1476 [stat].
 LeSc12: Gyemin Lee, Clayton Scott (2012) EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Computational Statistics & Data Analysis, 56(9), 2816–2829. DOI
 CeFP03: Gilles Celeux, Florence Forbes, Nathalie Peyrard (2003) EM procedures using mean field-like approximations for Markov model-based image segmentation. Pattern Recognition, 36(1), 131–144. DOI
 DeLR77: A. P. Dempster, N. M. Laird, D. B. Rubin (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.
 CeCD95: Gilles Celeux, Didier Chauveau, Jean Diebolt (1995) On Stochastic Versions of the EM Algorithm (report)
 Wu83: C. F. Jeff Wu (1983) On the Convergence Properties of the EM Algorithm. The Annals of Statistics, 11(1), 95–103. DOI
 MiTS16: Hideyuki Miyahara, Koji Tsumura, Yuki Sughiyama (2016) Relaxation of the EM Algorithm via Quantum Annealing for Gaussian Mixture Models. In arXiv:1701.03268 [cond-mat, physics:quant-ph, stat] (pp. 4674–4679). DOI
 McKr08: Geoffrey J. McLachlan, T. Krishnan (2008) The EM algorithm and extensions. Hoboken, N.J.: Wiley-Interscience
 McKN04: Geoffrey J. McLachlan, Thriyambakam Krishnan, See Ket Ng (2004) The EM Algorithm (No. 2004,24). Humboldt-Universität Berlin, Center for Applied Statistics and Economics (CASE)