Estimating a quantity by choosing it to be an extremum of some objective function or, if that function is well-behaved enough, a zero of its derivative.
Very popular in machine learning, where loss-function-based methods are ubiquitous. In statistics we see this implicitly in maximum likelihood estimation, robust estimation, and least-squares fitting, for all of which M-estimation provides a unifying formalism based on asymptotic theory.
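To pin down the notation, here is the usual textbook form of an M-estimator. The symbols ρ (the per-observation loss) and ψ (its derivative in θ) are the conventional names, assumed here rather than taken from the text above.

```latex
\hat{\theta} = \operatorname*{arg\,min}_{\theta} \sum_{i=1}^{n} \rho(x_i, \theta),
\qquad
\sum_{i=1}^{n} \psi(x_i, \hat{\theta}) = 0,
\quad
\psi(x, \theta) = \frac{\partial \rho(x, \theta)}{\partial \theta}.
```

Maximum likelihood is the case ρ(x, θ) = -log f(x; θ); least squares for a location parameter is ρ(x, θ) = (x - θ)²/2, whose estimating equation returns the sample mean.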
TODO: Discuss large-sample theory and the influence-function motivation.
Robust Loss functions
Discuss representation (and implementation) in terms of weight functions for least-squares loss.
Mallows, Schweppe etc.
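As a placeholder for the weight-function representation, here is a minimal sketch of iteratively reweighted least squares (IRLS) for the Huber loss, assuming a linear model and a MAD-based scale estimate. The function names `huber_weights` and `irls_huber`, the tuning constant 1.345, and the synthetic data are illustrative choices, not anything fixed by these notes, and this is the plain Huber case rather than the Mallows or Schweppe variants.

```python
import numpy as np


def huber_weights(r, c=1.345):
    """Huber weight w(r) = psi(r) / r: 1 for |r| <= c, c / |r| otherwise."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))


def irls_huber(X, y, c=1.345, n_iter=100, tol=1e-8):
    """Huber regression by repeatedly solving a weighted least-squares problem."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for Gaussian consistency
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        w = huber_weights(r / s, c)
        sw = np.sqrt(w)
        # Weighted least squares is ordinary least squares on sqrt(w)-scaled rows
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Synthetic example (assumed data): a line with gross outliers at the high-x end
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
y[-10:] += 30.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = irls_huber(X, y)
print("OLS:  ", beta_ols)
print("Huber:", beta_huber)
```

Each pass solves an ordinary least-squares problem on reweighted rows, so the robust loss enters only through the weight function w(r) = ψ(r)/r. Swapping in another ψ changes `huber_weights` and nothing else.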