The classic, surprisingly deep.

A few non-comprehensive notes here.

As used in, e.g., lasso regression.
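For the lasso specifically, the workhorse is coordinate descent with a soft-thresholding update (the approach of FrHT10 in the refs). A minimal pure-Python sketch, not a production implementation — it assumes dense data and does the naive O(np) residual recomputation per coordinate:

```python
# Coordinate descent for the lasso: minimize
#   (1/(2n)) * sum_i (y_i - x_i . b)^2 + lam * sum_j |b_j|
# Each coordinate update is a univariate least-squares fit pushed
# through the soft-thresholding operator.

def soft_threshold(z, g):
    """S(z, g) = sign(z) * max(|z| - g, 0)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, zz = 0.0, 0.0
            for i in range(n):
                # partial residual with feature j removed
                r_i = y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * r_i
                zz += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (zz / n)
    return b
```

With an orthogonal design the updates decouple, so one pass already lands on the exact solution; for large enough `lam` every coefficient is thresholded to zero, which is the sparsity the ℓ1 penalty buys.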

Nonlinear least squares with ceres-solver:

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve non-linear least squares problems with bounds constraints and general unconstrained optimization problems. It is a mature, feature-rich, and performant library that has been used in production at Google since 2010.
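Ceres itself is C++, but the Gauss–Newton step that such solvers elaborate on (with damping, trust regions, sparsity, etc.) fits in a few lines. A toy pure-Python sketch for a one-parameter model, with no safeguards, so illustrative only:

```python
# Gauss-Newton for fitting y = exp(a * x) by nonlinear least squares.
# Residuals r_i(a) = exp(a * x_i) - y_i; Jacobian J_i = d r_i / d a.
# Update: a <- a - (J^T r) / (J^T J).
import math

def gauss_newton_exp(xs, ys, a=0.0, n_iter=20):
    for _ in range(n_iter):
        r = [math.exp(a * x) - y for x, y in zip(xs, ys)]  # residuals
        J = [x * math.exp(a * x) for x in xs]              # d r_i / d a
        a -= sum(j * ri for j, ri in zip(J, r)) / sum(j * j for j in J)
    return a
```

Near a zero-residual solution this converges quadratically; the machinery in Ceres and friends is largely about making the step safe far from the solution and cheap in high dimensions.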

- Minimal Python iteratively reweighted least squares, by A.E. Haynes
- Ricardo Carvalho, Adaptive Lasso: What it is and how to implement in R
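The IRLS idea in miniature: minimize a non-quadratic loss by repeatedly solving a weighted least-squares problem whose weights are recomputed from the current residuals. A pure-Python sketch for the least-absolute-deviations location problem min_m Σ|y_i − m|, whose solution is the sample median (the `eps` floor and fixed iteration count are ad hoc choices here):

```python
# IRLS for least absolute deviations: |r| = r^2 / |r|, so each sweep
# solves a weighted least-squares problem with weights w_i = 1 / |r_i|.
# For a pure location model the weighted LS solution is a weighted mean.

def irls_median(ys, n_iter=100, eps=1e-8):
    m = sum(ys) / len(ys)  # initialize at the mean (the unweighted LS fit)
    for _ in range(n_iter):
        w = [1.0 / max(abs(y - m), eps) for y in ys]   # reweight
        m = sum(wi * y for wi, y in zip(w, ys)) / sum(w)  # re-solve
    return m
```

The same recipe, with design matrices in place of the plain mean and different weight functions, gives robust regression, ℓp minimization, and the GLM fitting loop.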

## Refs

- YuTo09: (2009) A coordinate gradient descent method for ℓ1-regularized convex minimization. *Computational Optimization and Applications*, 48(2), 273–307.
- Röve11: (2011) A Student-t based filter for robust signal detection. *Physical Review D*, 84(12).
- CGWY12: (2012) Complexity of unconstrained L_2-L_p. *Mathematical Programming*, 143(1–2), 371–383.
- Orr96: (1996) Introduction to radial basis function networks. Technical Report, Center for Cognitive Science, University of Edinburgh.
- ChYi08: (2008) Iteratively reweighted algorithms for compressive sensing. In *IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008* (pp. 3869–3872).
- MaNT04: (2004) Methods for non-linear least squares problems.
- ChSh16: (2016) Modeling Big Count Data: An IRLS Framework for CMP Regression and GAM. *arXiv:1610.08244 [stat]*.
- KaLa10: (2010) Online Importance Weight Aware Updates. *arXiv:1011.1576 [cs]*.
- FHHT07: (2007) Pathwise coordinate optimization. *The Annals of Applied Statistics*, 1(2), 302–332.
- RoZh07: (2007) Piecewise linear regularized solution paths. *The Annals of Statistics*, 35(3), 1012–1030.
- GaRC09: (2009) Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming. *IEEE Transactions on Signal Processing*, 57(12), 4686–4698.
- FrHT10: (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. *Journal of Statistical Software*, 33(1), 1–22.
- FlBa17: (2017) Stochastic Composite Least-Squares Regression with convergence rate O(1/n). *arXiv:1702.06429 [math, stat]*.
- Frie02: (2002) Stochastic gradient boosting. *Computational Statistics & Data Analysis*, 38(4), 367–378.
- PoKo97: (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. *Statistical Science*, 12(4), 279–300.
- BeLT17: (2017) Towards the study of least squares estimators with convex penalty. *arXiv:1701.09120 [math, stat]*.
- RhGl15: (2015) Unbiased Estimation with Square Root Convergence for SDE Models. *Operations Research*, 63(5), 1026–1043.