The classic, surprisingly deep.
A few non-comprehensive notes here.
As used in, e.g., lasso regression.
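Least squares can sit inside lasso fitting in more than one way. The sketch below assumes the local quadratic approximation: each penalty term `lam*|b_j|` is replaced by `lam*b_j^2 / (2*|b_j_old|)`. Every iteration then becomes a weighted ridge (that is, least-squares) solve, `(X^T X + diag(lam / |b_old|)) b = X^T y`. This is iteratively reweighted least squares, not the coordinate descent that, e.g., glmnet uses. The function name `lasso_irls` and the smoothing constant `eps`, which keeps the weights finite, are illustrative choices.

```python
import numpy as np


def lasso_irls(X, y, lam, eps=1e-8, max_iter=500, tol=1e-10):
    """Approximately minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by IRLS."""
    XtX = X.T @ X
    Xty = X.T @ y
    # Start from the ordinary least-squares solution.
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        # Local quadratic approximation: lam*|b_j| ~ lam*b_j^2 / (2*|b_j_old|),
        # i.e. a ridge penalty whose weight grows as b_j shrinks.
        w = lam / (np.abs(b) + eps)
        b_new = np.linalg.solve(XtX + np.diag(w), Xty)
        done = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if done:
            break
    return b
```

With an orthonormal design the lasso solution is soft-thresholding. So `lasso_irls(np.eye(3), np.array([3.0, -2.0, 0.5]), 1.0)` should land close to `[2, -1, 0]`. Coefficients that ought to be exactly zero only shrink to something of order `eps`. That is one practical drawback relative to methods that soft-threshold directly.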
Nonlinear least squares with ceres-solver:
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve Non-linear Least Squares problems with bounds constraints and general unconstrained optimization problems. It is a mature, feature rich, and performant library that has been used in production at Google since 2010.
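Ceres itself is a C++ library, so the code below is not its API. It is a minimal NumPy sketch of Levenberg-Marquardt, one of the trust-region strategies Ceres offers, showing roughly what such a solver does on each iteration. The function name `levenberg_marquardt`, the crude multiply-by-0.3 / multiply-by-10 damping schedule, and the exponential-decay test model are illustrative assumptions.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, theta0, mu=1e-3, max_iter=200, tol=1e-12):
    """Minimise 0.5 * ||residual(theta)||^2 with a basic Levenberg-Marquardt loop."""
    theta = np.asarray(theta0, dtype=float)
    r = residual(theta)
    cost = 0.5 * r @ r
    for _ in range(max_iter):
        J = jacobian(theta)
        # Damped Gauss-Newton step: solve (J^T J + mu I) delta = -J^T r.
        delta = np.linalg.solve(J.T @ J + mu * np.eye(theta.size), -J.T @ r)
        if np.linalg.norm(delta) < tol * (np.linalg.norm(theta) + tol):
            break  # step too small to matter: treat as converged
        r_new = residual(theta + delta)
        cost_new = 0.5 * r_new @ r_new
        if cost_new < cost:
            # Step reduced the cost: accept it and move towards Gauss-Newton.
            theta, r, cost = theta + delta, r_new, cost_new
            mu *= 0.3
        else:
            # Step failed: reject it and move towards short gradient-descent steps.
            mu *= 10.0
    return theta
```

For example, consider fitting `y = a * exp(b * t)` to noise-free samples generated with `a = 2`, `b = -1.5`. The residual is `theta[0] * np.exp(theta[1] * t) - y` and the Jacobian columns are `exp(b t)` and `a t exp(b t)`. Starting from `(1, 0)`, the fit should recover those values. A production solver such as Ceres adds a great deal on top of this loop: robust loss functions, bounds constraints, sparse linear algebra, and automatic differentiation in place of a hand-written Jacobian.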
- Ricardo Carvalho, Adaptive Lasso: What it is and how to implement in R
- How to correctly implement iteratively reweighted least squares algorithm for multiple logistic regression?
- YuTo09: (2009) A coordinate gradient descent method for ℓ1-regularized convex minimization. Computational Optimization and Applications, 48(2), 273–307. DOI
- Röve11: (2011) A Student-t based filter for robust signal detection. Physical Review D, 84(12). DOI
- CGWY12: (2012) Complexity of unconstrained L_2-L_p. Mathematical Programming, 143(1–2), 371–383. DOI
- Orr96: (1996) Introduction to radial basis function networks. Technical Report, Center for Cognitive Science, University of Edinburgh
- ChYi08: (2008) Iteratively reweighted algorithms for compressive sensing. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008 (pp. 3869–3872). DOI
- MaNT04: (2004) Methods for non-linear least squares problems
- ChSh16: (2016) Modeling Big Count Data: An IRLS Framework for CMP Regression and GAM. ArXiv:1610.08244 [Stat].
- KaLa10: (2010) Online Importance Weight Aware Updates. ArXiv:1011.1576 [Cs].
- FHHT07: (2007) Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302–332. DOI
- RoZh07: (2007) Piecewise linear regularized solution paths. The Annals of Statistics, 35(3), 1012–1030. DOI
- GaRC09: (2009) Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming. IEEE Transactions on Signal Processing, 57(12), 4686–4698. DOI
- FrHT10: (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI
- FlBa17: (2017) Stochastic Composite Least-Squares Regression with convergence rate O(1/n). ArXiv:1702.06429 [Math, Stat].
- Frie02: (2002) Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378. DOI
- PoKo97: (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science, 12(4), 279–300. DOI
- BeLT17: (2017) Towards the study of least squares estimators with convex penalty. ArXiv:1701.09120 [Math, Stat].
- RhGl15: (2015) Unbiased Estimation with Square Root Convergence for SDE Models. Operations Research, 63(5), 1026–1043. DOI