(Weighted) least squares fits

September 22, 2016 — October 4, 2022

Hilbert space

linear algebra

optimization

regression

statistics

A classic. Surprisingly deep.

A few non-comprehensive notes to approximating functions from data by the arbitrary-but-convenient expedient of minimising the sum of the squares of the deviances between two things; The linear algebra of least squares fits seems well-trodden and perenially classic. Used in many many problems. e.g. lasso regression, Gaussian belief propagation. Least squares problems are a kind of platonic ideal of convex optimisation.

Francis Bach is interested in least squares relaxations, which he connects to Fourier domain. See his Sums-of-squares for dummies: a view from the Fourier domain.

1 Introduction

the Ceres Solver Bibliography is a good start.
Boyd and Vandenberghe’s Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares is a solid introduction to both linear algebra and least squares. Julia Companion. Python Companion.

2 Iteratively reweighted

3 Nonlinear least squares

Trust region and Levenberg-Marquardt methods in 2nd order optimisation.

4 Tools

Nonlinear least squares with ceres-solver:

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve Non-linear Least Squares problems with bounds constraints and general unconstrained optimization problems. It is a mature, feature rich, and performant library that has been used in production at Google since 2010.
gradslam gradslam/gradslam: gradslam is an open source differentiable dense SLAM library for PyTorch (Jatavallabhula, Iyer, and Paull 2020)
angshine/pyslam: Non-linear least-squares SLAM in Python using scipy and numpy. Modelled after Google’s Ceres solver. Supporting automatic differentiation with pytorch.

OpenSLAM.org

5 Jaxopt

jax toolkit JAXopt includes lots of neatg Nonlinear least squares tooling.

6 KeOps

The KeOps library lets you compute reductions of large arrays whose entries are given by a mathematical formula or a neural network. It combines efficient C++ routines with an automatic differentiation engine and can be used with Python (NumPy, PyTorch), Matlab and R.

It is perfectly suited to the computation of kernel matrix-vector products, K-nearest neighbors queries, N-body interactions, point cloud convolutions and the associated gradients. Crucially, it performs well even when the corresponding kernel or distance matrices do not fit into the RAM or GPU memory. Compared with a PyTorch GPU baseline, KeOps provides a x10-x100 speed-up on a wide range of geometric applications, from kernel methods to geometric deep learning.

7 Incoming

The Gauss–Markov Theorem “under certain conditions, the least squares estimator is the minimum-variance linear unbiased estimator of the model parameters.”
Weighted Least Squares

8 References

Bagge Carlson. 2018. “Machine Learning and System Identification for Estimation in Physical Systems.”

Bellec, Lecué, and Tsybakov. 2017. “Towards the Study of Least Squares Estimators with Convex Penalty.” arXiv:1701.09120 [Math, Stat].

Bhardwaj, Klep, and Magron. 2021. “Noncommutative Polynomial Optimization.”

Boyd, and Vandenberghe. 2021. Julia Companion to Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares.

Buterin, Hitzig, and Weyl. 2019. “A Flexible Design for Funding Public Goods.” Management Science.

Charlier, Feydy, Glaunès, et al. 2021. “Kernel Operations on the GPU, with Autodiff, Without Memory Overflows.” Journal of Machine Learning Research.

Chartrand, and Yin. 2008. “Iteratively Reweighted Algorithms for Compressive Sensing.” In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008.

Chatla, and Shmueli. 2016. “Modeling Big Count Data: An IRLS Framework for CMP Regression and GAM.” arXiv:1610.08244 [Stat].

Chen, Xiaojun, Ge, Wang, et al. 2012. “Complexity of Unconstrained L_2-L_p.” Mathematical Programming.

Chen, Yan, and Oliver. 2013. “Levenberg–Marquardt Forms of the Iterative Ensemble Smoother for Efficient History Matching and Uncertainty Quantification.” Computational Geosciences.

Flammarion. n.d. “Stochastic Approximation and Least-Squares Regression, with Applications to Machine Learning.”

Flammarion, and Bach. 2017. “Stochastic Composite Least-Squares Regression with Convergence Rate O(1/n).” arXiv:1702.06429 [Math, Stat].

Friedman, Jerome H. 2002. “Stochastic Gradient Boosting.” Computational Statistics & Data Analysis, Nonlinear Methods and Data Mining,.

Friedman, Jerome, Hastie, Höfling, et al. 2007. “Pathwise Coordinate Optimization.” The Annals of Applied Statistics.

Friedman, Jerome, Hastie, and Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software.

Gasso, Rakotomamonjy, and Canu. 2009. “Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming.” IEEE Transactions on Signal Processing.

Golub, and Meurant. 2010. Matrices, Moments and Quadrature with Applications.

Györfi, ed. 2002. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics.

Huang, Huang, and Sun. 2021. “DeepLM: Large-Scale Nonlinear Least Squares on Deep Learning Frameworks Using Stochastic Domain Decomposition.” In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Jatavallabhula, Iyer, and Paull. 2020. “∇SLAM: Dense SLAM Meets Automatic Differentiation.” In 2020 IEEE International Conference on Robotics and Automation (ICRA).

Karampatziakis, and Langford. 2010. “Online Importance Weight Aware Updates.” arXiv:1011.1576 [Cs].

Leung, and Matsypura. 2019. Python Language Companion to Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares.

Madsen, Nielsen, and Tingleff. 2004. “Methods for Non-Linear Least Squares Problems.”

Mahoney. 2010. Randomized Algorithms for Matrices and Data.

Orr. 1996. “Introduction to Radial Basis Function Networks.”

Portnoy, and Koenker. 1997. “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators.” Statistical Science.

Rhee, and Glynn. 2015. “Unbiased Estimation with Square Root Convergence for SDE Models.” Operations Research.

Rosset, and Zhu. 2007. “Piecewise Linear Regularized Solution Paths.” The Annals of Statistics.

Transtrum, Machta, and Sethna. 2011. “The Geometry of Nonlinear Least Squares with Applications to Sloppy Models and Optimization.” Physical Review E.

Yun, and Toh. 2009. “A Coordinate Gradient Descent Method for ℓ 1-Regularized Convex Minimization.” Computational Optimization and Applications.