The classic, surprisingly deep.
A few non-comprehensive notes here.
As used in, e.g., lasso regression.
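One way the lasso connection plays out: the ℓ1 penalty can be handled by iteratively reweighted least squares, replacing it with an adaptively weighted ridge penalty each pass. A minimal sketch in Python; the function name, the reweighting rule w_j = 1/(|b_j| + ε), and the toy data are my illustration, in the spirit of the reweighting schemes in the refs below, not a production solver:

```python
import numpy as np

def irls_lasso(X, y, lam=1.0, eps=1e-6, n_iter=50):
    """IRLS sketch for l1-penalized least squares.

    Each pass approximates the l1 penalty by a weighted ridge penalty
    with weights w_j = 1 / (|b_j| + eps), then solves in closed form.
    """
    _, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        w = 1.0 / (np.abs(b) + eps)
        # weighted ridge step: (X'X + lam * diag(w)) b = X'y
        b = np.linalg.solve(X.T @ X + lam * np.diag(w), X.T @ y)
    return b

# toy demo: sparse truth, noiseless response
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
b_true = np.array([2.0, 0.0, 0.0, -3.0, 0.0])
y = X @ b_true
b_hat = irls_lasso(X, y, lam=0.1)
```

The huge weights on near-zero coefficients drive them essentially to zero, mimicking the sparsity of the exact lasso solution.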

Nonlinear least squares with Ceres Solver:
Ceres Solver is an open-source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve nonlinear least squares problems with bounds constraints and general unconstrained optimization problems. It is a mature, feature-rich, and performant library that has been used in production at Google since 2010.
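Ceres itself is C++; for a quick feel of the same problem class from Python, SciPy's `scipy.optimize.least_squares` handles small nonlinear least squares fits. This is a stand-in sketch with made-up toy data, not Ceres:

```python
import numpy as np
from scipy.optimize import least_squares

# toy problem: fit y ~ a * exp(b * t) to noisy data
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * t) + 0.01 * rng.standard_normal(t.size)

def residuals(params):
    a, b = params
    return a * np.exp(b * t) - y

# trust-region nonlinear least squares from a rough initial guess
fit = least_squares(residuals, x0=[1.0, 0.0])
a_hat, b_hat = fit.x
```

As in Ceres, the user supplies a residual function and an initial guess, and the solver handles the iteration.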

 Minimal Python Iteratively reweighted least squares by A.E. Haynes
 Ricardo Carvalho, Adaptive Lasso: What it is and how to implement in R
 How to correctly implement iteratively reweighted least squares algorithm for multiple logistic regression?
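For concreteness, IRLS for logistic regression is Newton-Raphson with each step rewritten as a weighted least squares solve. A minimal numpy sketch (function name and demo data are my own, assuming well-separated, non-degenerate data):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Logistic regression by IRLS (Newton-Raphson as weighted least squares)."""
    _, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))   # predicted probabilities
        w = mu * (1.0 - mu)               # IRLS weights
        z = eta + (y - mu) / w            # working response
        WX = X * w[:, None]
        # weighted least squares step: (X'WX) beta = X'Wz
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))
    return beta

# toy demo: intercept plus one covariate
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([0.5, -1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
beta_hat = irls_logistic(X, y)
```

Note the sketch has no safeguards: under perfect separation the weights collapse toward zero and the solve degenerates, which is exactly the failure mode the question above is about.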
Refs
 YuTo09: (2009) A coordinate gradient descent method for ℓ1-regularized convex minimization. Computational Optimization and Applications, 48(2), 273–307. DOI
 Röve11: (2011) A Student-t based filter for robust signal detection. Physical Review D, 84(12). DOI
 CGWY12: (2012) Complexity of unconstrained L_2-L_p minimization. Mathematical Programming, 143(1–2), 371–383. DOI
 Orr96: (1996) Introduction to radial basis function networks. Technical Report, Center for Cognitive Science, University of Edinburgh
 ChYi08: (2008) Iteratively reweighted algorithms for compressive sensing. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008 (pp. 3869–3872). DOI
 MaNT04: (2004) Methods for nonlinear least squares problems
 ChSh16: (2016) Modeling Big Count Data: An IRLS Framework for CMP Regression and GAM. ArXiv:1610.08244 [Stat].
 KaLa10: (2010) Online Importance Weight Aware Updates. ArXiv:1011.1576 [Cs].
 FHHT07: (2007) Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302–332. DOI
 RoZh07: (2007) Piecewise linear regularized solution paths. The Annals of Statistics, 35(3), 1012–1030. DOI
 GaRC09: (2009) Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming. IEEE Transactions on Signal Processing, 57(12), 4686–4698. DOI
 FrHT10: (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI
 FlBa17: (2017) Stochastic Composite Least-Squares Regression with convergence rate O(1/n). ArXiv:1702.06429 [Math, Stat].
 Frie02: (2002) Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378. DOI
 PoKo97: (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science, 12(4), 279–300. DOI
 BeLT17: (2017) Towards the study of least squares estimators with convex penalty. ArXiv:1701.09120 [Math, Stat].
 RhGl15: (2015) Unbiased Estimation with Square Root Convergence for SDE Models. Operations Research, 63(5), 1026–1043. DOI