Penalised regression where the penalties are sparsifying. The prediction losses could be anything – negative log-likelihood, least squares, robust Huberised losses, absolute deviation etc.

I will play fast and loose with terminology here regarding theoretical and empirical losses, and the statistical models we attempt to fit.

In nonparametric statistics we might simultaneously estimate what look like many, many parameters, constrained in some clever fashion that usually boils down to something we can interpret as a smoothing parameter, controlling how many of the original factors we still have to consider.

I will usually discuss our intent to minimise prediction error, but one could also try to minimise model selection error too.

Then we have a simultaneous estimation and model selection procedure, probably a specific sparse model selection procedure, and we possibly have to choose a clever optimisation method to do the whole thing fast. Related to compressed sensing, but here we consider sampling complexity and measurement error.

See also matrix factorisations, optimisation, multiple testing, concentration inequalities, sparse flavoured icecream.

TODO: make comprehensible

TODO: examples

TODO: disambiguate the optimisation technologies at play – iteratively reweighted least squares etc.

Now! A set of headings under which I will try to understand some things, mostly the LASSO variants.

## LASSO

Quadratic prediction loss, absolute (ℓ1) coefficient penalty.

The original, and the one actually included in all the statistics software.

This is not a *model*, per se, but a fitting procedure.
We can, however, try to see what models and designs this procedure happens to
work with.
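To fix ideas, here is a minimal sketch of the procedure on a synthetic problem whose true coefficient vector is mostly zeros. The use of scikit-learn is my own choice, not anything endorsed above; `alpha` plays the role of the penalty weight λ in (1/2n)‖y − Xβ‖² + λ‖β‖₁.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]  # only 3 of 20 coefficients are nonzero
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# alpha is the penalty weight; larger alpha gives a sparser fit
model = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(model.coef_)  # indices of selected variables
```

Note that the surviving coefficients are shrunk towards zero relative to least squares; that bias is what the adaptive and debiased variants below try to repair.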

## Adaptive LASSO

TBD. This is the one with the famous oracle properties.
Hui Zou's paper on this (Zou06) is very readable.
I am having trouble digesting Sara van de Geer's paper (Geer08) on
the Generalised Lasso, but it *seems* to offer me guarantees
for something *very similar* to the Adaptive Lasso,
but with far more general assumptions on the model and loss functions,
and some finite-sample guarantees.
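A sketch of the standard two-stage recipe, my own construction from the description in Zou06 rather than code from the paper: get a pilot estimate, set weights w_j = 1/|β̂_j|^γ, then solve the weighted LASSO, which reduces to an ordinary LASSO after rescaling the design columns.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n, p = 200, 30
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Stage 1: a consistent pilot estimate; ridge is a common choice
beta_init = Ridge(alpha=1.0).fit(X, y).coef_
gamma = 1.0
w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)  # adaptive weights

# Stage 2: weighted LASSO. Dividing column j by w_j and solving an
# ordinary LASSO in the rescaled variables penalises coefficient j
# by alpha * w_j, so strong pilot coefficients are barely shrunk.
fit = Lasso(alpha=0.1).fit(X / w, y)
beta_adaptive = fit.coef_ / w
```

The column-rescaling trick works because substituting b_j = w_j β_j leaves the fit term unchanged while turning the weighted penalty into a plain ℓ1 penalty on b.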

## LARS

A confusing one; LASSO and LARS are not the same thing, but they are closely related: a small modification of the LARS algorithm traces out the entire LASSO regularisation path (EHJT04). I still need to work this one through with pencil and paper.
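scikit-learn (again, my assumed toolkit) exposes both: `lars_path` with `method='lar'` runs plain least-angle regression, while `method='lasso'` applies the LASSO modification and returns the exact LASSO path.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 10))
beta_true = np.zeros(10)
beta_true[:2] = [2.0, -1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)

# alphas: penalty values (knots) where the active set changes;
# coefs[:, k]: the full coefficient vector at knot k
alphas, active, coefs = lars_path(X, y, method='lasso')
```

The path is piecewise linear in the penalty, which is why a finite list of knots describes the whole thing.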

## Graph LASSO

As used in graphical models. TBD
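Pending that write-up, a placeholder sketch: the graphical LASSO puts an ℓ1 penalty on the off-diagonal entries of an estimated precision (inverse covariance) matrix, whose zeros encode conditional independences. scikit-learn's `GraphicalLasso` (an assumed dependency, not something the note above names) does this:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)
# True precision matrix of a chain graph 0-1-2-3: only adjacent
# variables are conditionally dependent
prec = np.eye(4)
for i in range(3):
    prec[i, i + 1] = prec[i + 1, i] = 0.4
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(4), cov, size=2000)

# alpha penalises the l1 norm of the off-diagonal precision entries
est_prec = GraphicalLasso(alpha=0.05).fit(X).precision_
```

With enough data the estimated precision entry for the non-edge (0, 3) should be shrunk well below the chain edges.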

## Elastic net

Combination of ℓ1 and ℓ2 penalties (ZoHa05). TBD.
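A sketch of the grouping effect that motivates it (scikit-learn again, my assumption): with two nearly identical predictors the plain LASSO tends to pick one arbitrarily, whereas the elastic net's ridge component spreads weight across both.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
n = 100
X = rng.standard_normal((n, 10))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)  # near-duplicate column
y = X[:, 0] + 0.1 * rng.standard_normal(n)

# l1_ratio interpolates: 1.0 is pure LASSO, 0.0 is pure ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
```

Both correlated columns end up with roughly equal coefficients, rather than one being zeroed out.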

## Grouped LASSO

AFAICT this is the usual LASSO but with the penalty applied to predefined groups of coefficients, so a whole group of factors enters or leaves the model together. See Yuan and Lin (YuLi06).
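The penalty is a sum of unsquared ℓ2 norms of coefficient blocks, and its proximal operator is block-wise soft-thresholding. A rough sketch of that operator (one standard building block for proximal-gradient solvers, not Yuan and Lin's original algorithm):

```python
import numpy as np

def prox_group_lasso(beta, groups, t):
    """Proximal operator of t * sum_g ||beta_g||_2.

    Each group's coefficient block is shrunk toward zero together,
    so a whole group enters or leaves the model at once.
    """
    out = np.zeros_like(beta)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        if norm > t:
            out[g] = (1 - t / norm) * beta[g]
    return out

beta = np.array([3.0, 4.0, 0.1, -0.1])
groups = [[0, 1], [2, 3]]
# First group (norm 5) survives, shrunk by factor 0.8;
# second group (norm ~0.14) is zeroed entirely
shrunk = prox_group_lasso(beta, groups, t=1.0)
```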

## Model selection

Can be fiddly with sparse regression, which couples variable selection tightly with parameter estimation. See sparse model selection.

## Debiased LASSO

See GBRD14 and Geer14c. Same as Geer08, or not?

## Sparse basis expansions

Wavelets etc; mostly handled under sparse dictionary bases.

## Sparse neural nets

That is, sparse regressions as the layers in a neural network? Sure thing. (WCLH16)

## SCAD

“Smoothly Clipped Absolute Deviation”. TBD.
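For reference while that TBD stands, here is Fan and Li's (2001) penalty written out as code (a = 3.7 is their suggested default). It behaves like ℓ1 near zero but flattens out beyond aλ, so large coefficients are not shrunk, which is the source of its near-unbiasedness.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: l1 near zero, quadratic taper on (lam, a*lam],
    constant beyond a*lam."""
    b = np.abs(beta)
    small = lam * b
    mid = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
    large = lam**2 * (a + 1) / 2
    return np.where(b <= lam, small, np.where(b <= a * lam, mid, large))
```

One can check that the three pieces agree at |β| = λ and |β| = aλ, so the penalty is continuous (though not convex).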

## More general nonconvex coefficient penalties

TBD

## Other prediction losses

This should work for heavy-tailed noise: an absolute-deviation prediction loss with a LASSO-style ℓ1 coefficient penalty. What do you call this even?

See PoKo97 and WaLJ07 for some implementations. I believe the Dantzig selector uses the ℓ∞ norm of the correlation between residuals and predictors. Robust prediction losses? How do they relate?

## Bayesian Lasso

Laplace priors on linear regression coefficients; the usual LASSO estimate coincides with the MAP estimate under this prior. The posterior is not particularly sparse, though. I have no need for this right now, but I did read Dan Simpson's critique.

## Implementations

Hastie, Friedman et al.'s glmnet for R is fast and well-regarded, and has a MATLAB version. Here's how to use it for the adaptive LASSO.

SPAMS (C++, with MATLAB, R and Python interfaces) by Mairal looks interesting. It's an optimisation library for many, many sparse problems.

liblinear also includes LASSO-type solvers, as well as support-vector regression.

## Tidbits

Sparse regression as a universal classifier explainer?
*Local Interpretable Model-agnostic Explanations* might do it (RiSG16).

## Refs

- YuTo09: Sangwoon Yun, Kim-Chuan Toh (2009) A coordinate gradient descent method for ℓ1-regularized convex minimization. *Computational Optimization and Applications*, 48(2), 273–307.
- Tibs14: Ryan J. Tibshirani (2014) A General Framework for Fast Stagewise Algorithms. *ArXiv:1408.5801 [Stat]*.
- HuPC14: Tao Hu, Cengiz Pehlevan, Dmitri B. Chklovskii (2014) A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization. In *2014 48th Asilomar Conference on Signals, Systems and Computers*.
- SoCh17: Yong Sheng Soh, Venkat Chandrasekaran (2017) A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers. *ArXiv:1701.01207 [Cs, Math, Stat]*.
- HeIS15: Chinmay Hegde, Piotr Indyk, Ludwig Schmidt (2015) A nearly-linear time framework for graph-structured sparsity. In *Proceedings of the 32nd International Conference on Machine Learning (ICML-15)* (pp. 928–937).
- ChLW14: Michaël Chichignoud, Johannes Lederer, Martin Wainwright (2014) A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees. *ArXiv:1410.0247 [Math, Stat]*.
- LTTT14: Richard Lockhart, Jonathan Taylor, Ryan J. Tibshirani, Robert Tibshirani (2014) A significance test for the lasso. *The Annals of Statistics*, 42(2), 413–468.
- UTAK14: M. Unser, P. D. Tafti, A. Amini, H. Kirshner (2014) A unified formulation of Gaussian vs sparse stochastic processes - Part II: Discrete-domain theory. *IEEE Transactions on Information Theory*, 60(5), 3036–3051.
- UnTS14: M. Unser, P. D. Tafti, Q. Sun (2014) A unified formulation of Gaussian vs sparse stochastic processes - Part I: Continuous-domain theory. *IEEE Transactions on Information Theory*, 60(3), 1945–1962.
- YaXu13: Wenzhuo Yang, Huan Xu (2013) A Unified Robust Regression Model for Lasso-like Algorithms. In *ICML (3)* (pp. 585–593).
- Giro01: Mark Girolami (2001) A Variational Method for Learning Sparse and Overcomplete Representations. *Neural Computation*, 13(11), 2517–2532.
- GhLa13a: Saeed Ghadimi, Guanghui Lan (2013a) Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming. *ArXiv:1310.3787 [Math]*.
- ABDJ06: Felix Abramovich, Yoav Benjamini, David L. Donoho, Iain M. Johnstone (2006) Adapting to unknown sparsity by controlling the false discovery rate. *The Annals of Statistics*, 34(2), 584–653.
- ReSc10: Patricia Reynaud-Bouret, Sophie Schbath (2010) Adaptive estimation for Hawkes processes; application to genome analysis. *The Annals of Statistics*, 38(5), 2781–2822.
- Reyn03: Patricia Reynaud-Bouret (2003) Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. *Probability Theory and Related Fields*, 126(1).
- GuFZ14: Jiaying Gu, Fei Fu, Qing Zhou (2014) Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data. *ArXiv:1403.2310 [Stat]*.
- BüGe11: Peter Bühlmann, Sara van de Geer (2011) Additive models and many smooth univariate functions. In *Statistics for High-Dimensional Data* (pp. 77–97). Springer Berlin Heidelberg.
- BCCZ14: Christian Borgs, Jennifer T. Chayes, Henry Cohn, Yufei Zhao (2014) An L^p theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions. *ArXiv:1401.2906 [Math]*.
- UnTa14: Michael A. Unser, Pouya Tafti (2014) *An introduction to sparse stochastic processes*. New York: Cambridge University Press.
- Jung13: Alexander Jung (2013) An RKHS Approach to Estimation with Sparsity Constraints. In *Advances in Neural Information Processing Systems 29*.
- HaKD13: S. Hawe, M. Kleinsteuber, K. Diepold (2013) Analysis operator learning and its application to image reconstruction. *IEEE Transactions on Image Processing*, 22(6), 2138–2150.
- BCDD08: Andrew R. Barron, Albert Cohen, Wolfgang Dahmen, Ronald A. DeVore (2008) Approximation and learning by greedy algorithms. *The Annals of Statistics*, 36(1), 64–94.
- DiFr84: Persi Diaconis, David Freedman (1984) Asymptotics of Graphical Projection Pursuit. *The Annals of Statistics*, 12(3), 793–815.
- BaSB10: Dror Baron, Shriram Sarvotham, Richard G. Baraniuk (2010) Bayesian compressive sensing via belief propagation. *IEEE Transactions on Signal Processing*, 58(1), 269–280.
- YoWe10: Ryo Yoshida, Mike West (2010) Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing. *Journal of Machine Learning Research*, 11(May), 1771–1798.
- Brei95: Leo Breiman (1995) Better subset regression using the nonnegative garrote. *Technometrics*, 37(4), 373–384.
- BeTs16: Pierre C. Bellec, Alexandre B. Tsybakov (2016) Bounds on the prediction error of penalized least squares estimators with convex penalty. *ArXiv:1609.06675 [Math, Stat]*.
- MüGe15: Patric Müller, Sara van de Geer (2015) Censored linear model in high dimensions: Penalised linear regression on high-dimensional data with left-censored response variable. *TEST*.
- BiCY14: Wei Bian, Xiaojun Chen, Yinyu Ye (2014) Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. *Mathematical Programming*, 149(1–2), 301–327.
- WoCh15: Joseph Woodworth, Rick Chartrand (2015) Compressed Sensing Recovery via Nonconvex Shrinkage Penalties. *ArXiv:1504.02923 [Cs, Math]*.
- CSPW10: Minhua Chen, J. Silva, J. Paisley, Chunping Wang, D. Dunson, L. Carin (2010) Compressive Sensing on Manifolds Using a Nonparametric Mixture of Factor Analyzers: Algorithm and Performance Bounds. *IEEE Transactions on Signal Processing*, 58(12), 6140–6155.
- Carm14: Avishy Y. Carmi (2014) Compressive System Identification. In *Compressed Sensing & Sparse Filtering* (pp. 281–324). Springer Berlin Heidelberg.
- Carm13: Avishy Y. Carmi (2013) Compressive system identification: sequential methods and entropy bounds. *Digital Signal Processing*, 23(3), 751–770.
- TrWr10: J. A. Tropp, S. J. Wright (2010) Computational Methods for Sparse Solution of Linear Inverse Problems. *Proceedings of the IEEE*, 98(6), 948–958.
- DuBS17: Simon S. Du, Sivaraman Balakrishnan, Aarti Singh (2017) Computationally Efficient Robust Estimation of Sparse Functionals. In *ICML*.
- JaMo14: Adel Javanmard, Andrea Montanari (2014) Confidence Intervals and Hypothesis Testing for High-dimensional Regression. *Journal of Machine Learning Research*, 15(1), 2869–2909.
- ZhZh14: Cun-Hui Zhang, Stephanie S. Zhang (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 76(1), 217–242.
- EwSc15: Karl Ewald, Ulrike Schneider (2015) Confidence Sets Based on the Lasso Estimator. *ArXiv:1507.05315 [Math, Stat]*.
- NiGe13: Richard Nickl, Sara van de Geer (2013) Confidence sets in sparse regression. *The Annals of Statistics*, 41(6), 2852–2876.
- BaCa15: Rina Foygel Barber, Emmanuel J. Candès (2015) Controlling the false discovery rate via knockoffs. *The Annals of Statistics*, 43(5), 2055–2085.
- WuLa08: Tong Tong Wu, Kenneth Lange (2008) Coordinate descent algorithms for lasso penalized regression. *The Annals of Applied Statistics*, 2(1), 224–244.
- NeTr08: D. Needell, J. A. Tropp (2008) CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. *ArXiv:0803.2392 [Cs, Math]*.
- Pour11: Mohsen Pourahmadi (2011) Covariance Estimation: The GLM and Regularization Perspectives. *Statistical Science*, 26(3), 369–387.
- SoHe16: Mohammadreza Soltani, Chinmay Hegde (2016) Demixing Sparse Signals from Nonlinear Observations. *Statistics*, 7, 9.
- BrPK16: Steven L. Brunton, Joshua L. Proctor, J. Nathan Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. *Proceedings of the National Academy of Sciences*, 113(15), 3932–3937.
- Stin04: Robert A. Stine (2004) Discussion of “Least angle regression” by Efron et al. *The Annals of Statistics*, 32(2), 407–499.
- ChWa00: Yen-Chi Chen, Yu-Xiang Wang (n.d.) Discussion on ‘Confidence Intervals and Hypothesis Testing for High-Dimensional Regression.’
- TrGe16: Ilya Trofimov, Alexander Genkin (2016) Distributed Coordinate Descent for Generalized Linear Models with Regularization. *ArXiv:1611.02101 [Cs, Stat]*.
- TrGe15: Ilya Trofimov, Alexander Genkin (2015) Distributed Coordinate Descent for L1-regularized Logistic Regression. In *Analysis of Images, Social Networks and Texts* (pp. 243–254). Springer International Publishing.
- HRLV10: A. Hormati, O. Roy, Y. M. Lu, M. Vetterli (2010) Distributed Sampling of Signals Linked by Sparse Filtering: Theory and Applications. *IEEE Transactions on Signal Processing*, 58(3), 1095–1109.
- JaFH15: Lucas Janson, William Fithian, Trevor J. Hastie (2015) Effective degrees of freedom: a flawed metaphor. *Biometrika*, 102(2), 479–485.
- FlHS13: Cheryl J. Flynn, Clifford M. Hurvich, Jeffrey S. Simonoff (2013) Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models. *ArXiv:1302.2068 [Stat]*.
- RaBr15: Saiprasad Ravishankar, Yoram Bresler (2015) Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI. *ArXiv:1501.02923 [Cs, Stat]*.
- QiSG13: Zhiwei Qin, Katya Scheinberg, Donald Goldfarb (2013) Efficient block-coordinate descent algorithms for the Group Lasso. *Mathematical Programming Computation*, 5(2), 143–169.
- LiLe16: Néhémy Lim, Johannes Lederer (2016) Efficient Feature Selection With Large and High-dimensional Data. *ArXiv:1609.07195 [Stat]*.
- CaWB08: Emmanuel J. Candès, Michael B. Wakin, Stephen P. Boyd (2008) Enhancing Sparsity by Reweighted ℓ1 Minimization. *Journal of Fourier Analysis and Applications*, 14(5–6), 877–905.
- DGSS14: Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, Bernhard Schölkopf (2014) Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm. In *ICML*.
- Geer16: Sara van de Geer (2016) *Estimation and Testing Under Sparsity* (Vol. 2159). Cham: Springer International Publishing.
- ScBD11: Jürg Schelldorfer, Peter Bühlmann, Sara van de Geer (2011) Estimation for High-Dimensional Linear Mixed-Effects Models Using ℓ1-Penalization. *Scandinavian Journal of Statistics*, 38(2), 197–214.
- BCFS14: Arindam Banerjee, Sheng Chen, Farideh Fazayeli, Vidyashankar Sivakumar (2014) Estimation with Norm Regularization. In *Advances in Neural Information Processing Systems 27* (pp. 1556–1564). Curran Associates, Inc.
- LSST13: Jason D. Lee, Dennis L. Sun, Yuekai Sun, Jonathan E. Taylor (2013) Exact post-selection inference, with application to the lasso. *ArXiv:1311.6238 [Math, Stat]*.
- PeEE10: Tomer Peleg, Yonina C. Eldar, Michael Elad (2010) Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery. *IEEE Transactions on Signal Processing*, 60(5), 2286–2303.
- AzKS15: Martin Azizyan, Akshay Krishnamurthy, Aarti Singh (2015) Extreme Compressive Sampling for Covariance Estimation. *ArXiv:1506.00898 [Cs, Math, Stat]*.
- SuBC15: Weijie Su, Malgorzata Bogdan, Emmanuel J. Candès (2015) False Discoveries Occur Early on the Lasso Path. *ArXiv:1511.01957 [Cs, Math, Stat]*.
- FoSr11: Rina Foygel, Nathan Srebro (2011) Fast-rate and optimistic-rate error bounds for L1-regularized regression. *ArXiv:1108.0373 [Math, Stat]*.
- Batt92: Roberto Battiti (1992) First- and second-order methods for learning: between steepest descent and Newton’s method. *Neural Computation*, 4(2), 141–166.
- Nest12: Yu. Nesterov (2012) Gradient methods for minimizing composite functions. *Mathematical Programming*, 140(1), 125–161.
- Mont12: Andrea Montanari (2012) Graphical models concepts in compressed sensing. *Compressed Sensing: Theory and Applications*, 394–438.
- RWRY11: Pradeep Ravikumar, Martin J. Wainwright, Garvesh Raskutti, Bin Yu (2011) High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. *Electronic Journal of Statistics*, 5, 935–980.
- Geer08: Sara A. van de Geer (2008) High-dimensional generalized linear models and the lasso. *The Annals of Statistics*, 36(2), 614–645.
- MeBü06: Nicolai Meinshausen, Peter Bühlmann (2006) High-dimensional graphs and variable selection with the lasso. *The Annals of Statistics*, 34(3), 1436–1462.
- BüGe15: Peter Bühlmann, Sara van de Geer (2015) High-dimensional inference in misspecified linear models. *ArXiv:1503.06426 [Stat]*, 9(1), 1449–1473.
- CaDa11: Emmanuel J. Candès, Mark A. Davenport (2011) How well can we estimate a sparse vector? *ArXiv:1104.5246 [Cs, Math, Stat]*.
- StED05: J. L. Starck, Michael Elad, David L. Donoho (2005) Image decomposition via the combination of sparse representations and a variational approach. *IEEE Transactions on Image Processing*, 14(10), 1570–1582.
- Mair15: J. Mairal (2015) Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning. *SIAM Journal on Optimization*, 25(2), 829–855.
- PoHo15: Jean Pouget-Abadie, Thibaut Horel (2015) Inferring Graphs from Cascades: A Sparse Recovery Framework. In *Proceedings of The 32nd International Conference on Machine Learning*.
- BuLe17: Yunqi Bu, Johannes Lederer (2017) Integrating Additional Knowledge Into Estimation of Graphical Models. *ArXiv:1704.02739 [Stat]*.
- WPPA16: Scott Wisdom, Thomas Powers, James Pitton, Les Atlas (2016) Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery. In *Advances in Neural Information Processing Systems 29*.
- ChYi08: R. Chartrand, Wotao Yin (2008) Iteratively reweighted algorithms for compressive sensing. In *IEEE International Conference on Acoustics, Speech and Signal Processing, 2008 (ICASSP 2008)* (pp. 3869–3872).
- BoKG10: Howard D. Bondell, Arun Krishna, Sujit K. Ghosh (2010) Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models. *Biometrics*, 66(4), 1069–1077.
- SFJJ15: Virginia Smith, Simone Forte, Michael I. Jordan, Martin Jaggi (2015) L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework. *ArXiv:1512.04011 [Cs]*.
- BLZS15: Adam Bloniarz, Hanzhong Liu, Cun-Hui Zhang, Jasjeet Sekhon, Bin Yu (2015) Lasso adjustments of treatment effect estimates in randomized experiments. *ArXiv:1507.03652 [Math, Stat]*.
- HaRR15: Niels Richard Hansen, Patricia Reynaud-Bouret, Vincent Rivoirard (2015) Lasso and probabilistic inequalities for multivariate point processes. *Bernoulli*, 21(1), 83–143.
- ThAH15: Christos Thrampoulidis, Ehsan Abbasi, Babak Hassibi (2015) LASSO with Non-linear Measurements is Equivalent to One With Linear Measurements. In *Advances in Neural Information Processing Systems 28* (pp. 3402–3410). Curran Associates, Inc.
- MeYu09: Nicolai Meinshausen, Bin Yu (2009) Lasso-type recovery of sparse representations for high-dimensional data. *The Annals of Statistics*, 37(1), 246–270.
- ArAZ15: Bryon Aragam, Arash A. Amini, Qing Zhou (2015) Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression. *ArXiv:1511.08963 [Cs, Math, Stat]*.
- FuZh13: Fei Fu, Qing Zhou (2013) Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent. *Journal of the American Statistical Association*, 108(501), 288–300.
- MoBa17: Ali Mousavi, Richard G. Baraniuk (2017) Learning to Invert: Signal Recovery via Deep Convolutional Networks. In *ICASSP*.
- HCMF08: Tim Hesterberg, Nam Hee Choi, Lukas Meier, Chris Fraley (2008) Least angle and ℓ1 penalized regression: A review. *Statistics Surveys*, 2, 61–93.
- EHJT04: Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani (2004) Least angle regression. *The Annals of Statistics*, 32(2), 407–499.
- Maho16: Michael W. Mahoney (2016) Lecture Notes on Spectral Graph Methods. *ArXiv:1608.04845*.
- FCHW08: Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin (2008) LIBLINEAR: A Library for Large Linear Classification. *Journal of Machine Learning Research*, 9, 1871–1874.
- BaRo14: Sohail Bahmani, Justin Romberg (2014) Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation. *ArXiv:1501.00046 [Cs, Math, Stat]*.
- ZhTW11: Tianyi Zhou, Dacheng Tao, Xindong Wu (2011) Manifold elastic net: a unified framework for sparse dimension reduction. *Data Mining and Knowledge Discovery*, 22(3), 340–371.
- CaPl10: Emmanuel J. Candès, Y. Plan (2010) Matrix Completion With Noise. *Proceedings of the IEEE*, 98(6), 925–936.
- BHLL08: Andrew R. Barron, Cong Huang, Jonathan Q. Li, Xi Luo (2008) MDL, penalized likelihood, and statistical risk. In *Information Theory Workshop, 2008 (ITW ’08)* (pp. 247–257). IEEE.
- NeOW14: Sarah E. Neville, John T. Ormerod, M. P. Wand (2014) Mean field variational Bayes for continuous sparse signal shrinkage: Pitfalls and remedies. *Electronic Journal of Statistics*, 8(1), 1113–1151.
- YuLi06: Ming Yuan, Yi Lin (2006) Model selection and estimation in regression with grouped variables. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 68(1), 49–67.
- YuLi07: Ming Yuan, Yi Lin (2007) Model selection and estimation in the Gaussian graphical model. *Biometrika*, 94(1), 19–35.
- BaGA08: Onureena Banerjee, Laurent El Ghaoui, Alexandre d’Aspremont (2008) Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. *Journal of Machine Learning Research*, 9(Mar), 485–516.
- QiYa12: Wei Qian, Yuhong Yang (2012) Model selection via standard error adjusted adaptive lasso. *Annals of the Institute of Statistical Mathematics*, 65(2), 295–318.
- Bach09: Francis Bach (2009) Model-Consistent Sparse Estimation through the Bootstrap.
- Zhan10: Cun-Hui Zhang (2010) Nearly unbiased variable selection under minimax concave penalty. *The Annals of Statistics*, 38(2), 894–942.
- AgNR16: Alireza Aghasi, Nam Nguyen, Justin Romberg (2016) Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks. *ArXiv:1611.05162 [Cs, Stat]*.
- HaLB15: David Hallac, Jure Leskovec, Stephen Boyd (2015) Network Lasso: Clustering and Optimization in Large Graphs. *ArXiv:1507.00280 [Cs, Math, Stat]*.
- Kabá14: Ata Kabán (2014) New Bounds on Compressive Linear Least Squares Regression. In *Journal of Machine Learning Research* (pp. 448–456).
- YNGD12: M. Yaghoobi, Sangnam Nam, R. Gribonval, M. E. Davies (2012) Noise aware analysis operator learning for approximately cosparse signals. In *2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)* (pp. 5409–5412).
- OJFH13: S. Oymak, A. Jalali, M. Fazel, B. Hassibi (2013) Noisy estimation of simultaneously structured models: Limitations of convex relaxation. In *2013 IEEE 52nd Annual Conference on Decision and Control (CDC)* (pp. 6019–6024).
- TsBö16: Michael Tschannen, Helmut Bölcskei (2016) Noisy subspace clustering via matching pursuits. *ArXiv:1612.03450 [Cs, Math, Stat]*.
- BGLM18: Jacob Bien, Irina Gaynanova, Johannes Lederer, Christian L. Müller (2018) Non-Convex Global Minimization and False Discovery Rate Control for the TREX. *Journal of Computational and Graphical Statistics*, 27(1), 23–33.
- BGLM16: Jacob Bien, Irina Gaynanova, Johannes Lederer, Christian Müller (2016) Non-convex Global Minimization and False Discovery Rate Control for the TREX. *ArXiv:1604.06815 [Cs, Stat]*.
- GBRD14: Sara van de Geer, Peter Bühlmann, Ya’acov Ritov, Ruben Dezeure (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. *The Annals of Statistics*, 42(3), 1166–1202.
- ZoHT07: Hui Zou, Trevor Hastie, Robert Tibshirani (2007) On the “degrees of freedom” of the lasso. *The Annals of Statistics*, 35(5), 2173–2192.
- BaRE17: Dmitry Batenkov, Yaniv Romano, Michael Elad (2017) On the Global-Local Dichotomy in Sparsity Modeling. *ArXiv:1702.03446 [Cs, Math, Stat]*.
- LuGF12: W. Lu, Y. Goldberg, J. P. Fine (2012) On the robustness of the adaptive lasso to model misspecification. *Biometrika*, 99(3), 717–731.
- GiSB14: Raja Giryes, Guillermo Sapiro, Alex M. Bronstein (2014) On the Stability of Deep Networks. *ArXiv:1412.5896 [Cs, Math, Stat]*.
- BrEZ08: A. M. Bruckstein, Michael Elad, M. Zibulevsky (2008) On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations. *IEEE Transactions on Information Theory*, 54(11), 4813–4820.
- Tadd13: Matt Taddy (2013) One-step estimator paths for concave regularization. *ArXiv:1308.5623 [Stat]*.
- ZoLi08: Hui Zou, Runze Li (2008) One-step sparse estimates in nonconcave penalized likelihood models. *The Annals of Statistics*, 36(4), 1509–1533.
- BoCN16: Léon Bottou, Frank E. Curtis, Jorge Nocedal (2016) Optimization Methods for Large-Scale Machine Learning. *ArXiv:1606.04838 [Cs, Math, Stat]*.
- BJMO12: Francis Bach, Rodolphe Jenatton, Julien Mairal, Guillaume Obozinski (2012) Optimization with Sparsity-Inducing Penalties. *Foundations and Trends in Machine Learning*, 4(1), 1–106.
- ShOw10: Yiyuan She, Art B. Owen (2010) Outlier Detection Using Nonconvex Penalized Regression.
- CFJL16: Emmanuel J. Candès, Yingying Fan, Lucas Janson, Jinchi Lv (2016) Panning for Gold: Model-free Knockoffs for High-dimensional Controlled Variable Selection. *ArXiv:1610.02351*.
- KWSR16: Alec Koppel, Garrett Warnell, Ethan Stump, Alejandro Ribeiro (2016) Parsimonious Online Learning with Kernels via Sparse Projections in Function Space. *ArXiv:1612.04111 [Cs, Stat]*.
- FHHT07: Jerome Friedman, Trevor Hastie, Holger Höfling, Robert Tibshirani (2007) Pathwise coordinate optimization. *The Annals of Applied Statistics*, 1(2), 302–332.
- ZhLZ18: Tuo Zhao, Han Liu, Tong Zhang (2018) Pathwise coordinate optimization for sparse learning: Algorithm and theory. *The Annals of Statistics*, 46(1), 180–218.
- GuLi05: Jiang Gui, Hongzhe Li (2005) Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. *Bioinformatics*, 21(13), 3001–3008.
- Uema15: Yoshimasa Uematsu (2015) Penalized Likelihood Estimation in High-Dimensional Time Series Models and its Application. *ArXiv:1504.06706 [Math, Stat]*.
- NaGr12: Sangnam Nam, R. Gribonval (2012) Physics-driven structured cosparse modeling for source localization. In *2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)* (pp. 5397–5400).
- HSDR14: Cho-Jui Hsieh, Mátyás A. Sustik, Inderjit S. Dhillon, Pradeep D. Ravikumar (2014) QUIC: quadratic approximation for sparse inverse covariance estimation. *Journal of Machine Learning Research*, 15(1), 2911–2947.
- GaRC09: G. Gasso, A. Rakotomamonjy, S. Canu (2009) Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming. *IEEE Transactions on Signal Processing*, 57(12), 4686–4698.
- ChHe12: Y. Chen, A. O. Hero (2012) Recursive ℓ1,∞ Group Lasso. *IEEE Transactions on Signal Processing*, 60(8), 3978–3987.
- Tibs96: Robert Tibshirani (1996) Regression Shrinkage and Selection via the Lasso. *Journal of the Royal Statistical Society. Series B (Methodological)*, 58(1), 267–288.
- Tibs11: Robert Tibshirani (2011) Regression shrinkage and selection via the lasso: a retrospective. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 73(3), 273–282.
- ZoHa05: Hui Zou, Trevor Hastie (2005) Regularization and variable selection via the elastic net. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 67(2), 301–320.
- SFHT11: Noah Simon, Jerome Friedman, Trevor Hastie, Rob Tibshirani (2011) Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. *Journal of Statistical Software*, 39(5).
- KrSB09: Nicole Krämer, Juliane Schäfer, Anne-Laure Boulesteix (2009) Regularized estimation of large-scale gene association networks using graphical Gaussian models. *BMC Bioinformatics*, 10(1), 384.
- WaGZ06: L. Wang, M. D. Gordon, J. Zhu (2006) Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning. In *Sixth International Conference on Data Mining (ICDM ’06)* (pp. 690–700).
- HuCB08: Cong Huang, G. L. H. Cheang, Andrew R. Barron (2008) Risk of penalized least squares, greedy selection and ℓ1 penalization for flexible function libraries.
- XuCM10: H. Xu, C. Caramanis, S. Mannor (2010) Robust Regression and Lasso. *IEEE Transactions on Information Theory*, 56(7), 3561–3574.
- WaLJ07: Hansheng Wang, Guodong Li, Guohua Jiang (2007) Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso. *Journal of Business & Economic Statistics*, 25(3), 347–355.
- LaZw11: Sophie Lambert-Lacroix, Laurent Zwald (2011) Robust regression through the Huber’s criterion and adaptive lasso penalty. *Electronic Journal of Statistics*, 5, 1015–1053.
- HeBa12: Chinmay Hegde, Richard G. Baraniuk (2012) Signal Recovery on Incoherent Manifolds. *IEEE Transactions on Information Theory*, 58(12), 7204–7214.
- Chen12: Xiaojun Chen (2012) Smoothing methods for nonsmooth, nonconvex minimization. *Mathematical Programming*, 134(1), 71–99.
- GuPe16: Pawan Gupta, Marianna Pensky (2016) Solution of linear ill-posed problems using random dictionaries. *ArXiv:1605.07913 [Math, Stat]*.
- NCBK11: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng (2011) Sparse filtering. In *Advances in Neural Information Processing Systems 24* (pp. 1125–1133). Curran Associates, Inc.
- FrHT08: Jerome Friedman, Trevor Hastie, Robert Tibshirani (2008) Sparse inverse covariance estimation with the graphical lasso. *Biostatistics*, 9(3), 432–441.
- ZYJZ15: Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou (2015) Sparse Learning for Large-scale and High-dimensional Data: A Randomized Convex-concave Optimization Approach. *ArXiv:1511.03766 [Cs]*.
- RiGr15: Irina Rish, Genady Ya. Grabarnik (2015) *Sparse modeling: theory, algorithms, and applications*. Boca Raton, FL: CRC Press, Taylor & Francis Group.
- LaLZ09: John Langford, Lihong Li, Tong Zhang (2009) Sparse Online Learning via Truncated Gradient. In *Advances in Neural Information Processing Systems 21* (pp. 905–912). Curran Associates, Inc.
- WrNF09: S. J. Wright, R. D. Nowak, M. A. T. Figueiredo (2009) Sparse Reconstruction by Separable Approximation. *IEEE Transactions on Signal Processing*, 57(7), 2479–2493.
- CDHB09: Volkan Cevher, Marco F. Duarte, Chinmay Hegde, Richard Baraniuk (2009) Sparse Signal Recovery Using Markov Random Fields. In *Advances in Neural Information Processing Systems* (pp. 257–264). Curran Associates, Inc.
- RiGr14: Irina Rish, Genady Grabarnik (2014) Sparse Signal Recovery with Exponential-Family Noise. In *Compressed Sensing & Sparse Filtering* (pp. 77–93). Springer Berlin Heidelberg.
- ElVi13: E. Elhamifar, R. Vidal (2013) Sparse Subspace Clustering: Algorithm, Theory, and Applications. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 35(11), 2765–2781.
- MaFH09: Rahul Mazumder, Jerome H. Friedman, Trevor J. Hastie (2009) SparseNet: Coordinate Descent with Non-Convex Penalties. Stanford University.
- RaBr15: S. Ravishankar, Y. Bresler (2015) Sparsifying Transform Learning With Efficient Optimal Updates and Convergence Guarantees. *IEEE Transactions on Signal Processing*, 63(9), 2389–2404.
- LaFa09: Clifford Lam, Jianqing Fan (2009) Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation. *Annals of Statistics*, 37(6B), 4254–4278.
- Wahb90: Grace Wahba (1990) *Spline Models for Observational Data*. SIAM.
- BeCW11: Alexandre Belloni, Victor Chernozhukov, Lie Wang (2011) Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming. *Biometrika*, 98(4), 791–806.
- CaRT06: Emmanuel J. Candès, Justin K. Romberg, Terence Tao (2006) Stable signal recovery from incomplete and inaccurate measurements. *Communications on Pure and Applied Mathematics*, 59(8), 1207–1223.
- WCLH16: Zhangyang Wang, Shiyu Chang, Qing Ling, Shuai Huang, Xia Hu, Honghui Shi, Thomas S. Huang (2016) Stacked Approximated Regression Machine: A Simple Deep Learning Approach.
- HaTW15: Trevor J. Hastie, Rob Tibshirani, Martin J. Wainwright (2015) *Statistical Learning with Sparsity: The Lasso and Generalizations*. Boca Raton: Chapman and Hall/CRC.
- Barb15: Jean Barbier (2015) Statistical physics and approximate message-passing algorithms for sparse linear estimation problems in signal processing and coding theory. *ArXiv:1511.01650 [Cs, Math]*.
- Geer14a: Sara van de Geer (2014a) Statistical Theory for High-Dimensional Models. *ArXiv:1409.8557 [Math, Stat]*.
- GhLa13b: Saeed Ghadimi, Guanghui Lan (2013b) Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming. *SIAM Journal on Optimization*, 23(4), 2341–2368.
- SSBA16: Sashank J. Reddi, Suvrit Sra, Barnabás Póczós, Alex Smola (2016) Stochastic Frank-Wolfe Methods for Nonconvex Optimization.
- KoTo09: Matthieu Kowalski, Bruno Torrésani (2009) Structured Sparsity: from Mixed Norms to Structured Shrinkage. In *SPARS’09 (Signal Processing with Adaptive Sparse Structured Representations)*.
- CaFe13: Emmanuel J. Candès, Carlos Fernandez-Granda (2013) Super-Resolution from Noisy Data. *Journal of Fourier Analysis and Applications*, 19(6), 1229–1254.
- GeBZ11: Sara A. van de Geer, Peter Bühlmann, Shuheng Zhou (2011) The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). *Electronic Journal of Statistics*, 5, 688–749.
- Zou06: Hui Zou (2006) The Adaptive Lasso and Its Oracle Properties. *Journal of the American Statistical Association*, 101(476), 1418–1429.
- GIKM16: Catherine Greenhill, Mikhail Isaev, Matthew Kwan, Brendan D. McKay (2016) The average number of spanning trees in sparse graphs with given degrees. *ArXiv:1606.01586 [Math]*.
- VeRo15: Victor Veitch, Daniel M. Roy (2015) The Class of Random Graphs Arising from Exchangeable Random Measures.
*ArXiv:1512.03099 [Cs, Math, Stat]*. - ZhRY09: Peng Zhao, Guilherme Rocha, Bin Yu (2009) The composite absolute penalties family for grouped and hierarchical variable selection.
*The Annals of Statistics*, 37(6A), 3468–3497. DOI - Geer07: Sara van de Geer (2007) The deterministic Lasso
- PoKo97: Stephen Portnoy, Roger Koenker (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. *Statistical Science*, 12(4), 279–300. DOI
- This97: Ronald A. Thisted (1997) [The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error versus Absolute-Error Estimators]: Comment. *Statistical Science*, 12(4), 296–298.
- MeGB08: Lukas Meier, Sara van de Geer, Peter Bühlmann (2008) The group lasso for logistic regression. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 70(1), 53–71.
- DaBa16: Ran Dai, Rina Foygel Barber (2016) The knockoff filter for FDR control in group-sparse and multitask regression. *ArXiv Preprint ArXiv:1602.03589*.
- BaMo12: M. Bayati, A. Montanari (2012) The LASSO Risk for Gaussian Matrices. *IEEE Transactions on Information Theory*, 58(4), 1997–2017. DOI
- HeGe11: Mohamed Hebiri, Sara A. van de Geer (2011) The Smooth-Lasso and other ℓ1+ℓ2-penalized methods. *Electronic Journal of Statistics*, 5, 1184–1226. DOI
- BeLT17: Pierre C. Bellec, Guillaume Lecué, Alexandre B. Tsybakov (2017) Towards the study of least squares estimators with convex penalty. *ArXiv:1701.09120 [Math, Stat]*.
- HeRP14: Dan He, Irina Rish, Laxmi Parida (2014) Transductive HSIC Lasso. In Proceedings of the 2014 SIAM International Conference on Data Mining (pp. 154–162). Philadelphia, PA: Society for Industrial and Applied Mathematics
- FaLi01: Jianqing Fan, Runze Li (2001) Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. *Journal of the American Statistical Association*, 96(456), 1348–1360. DOI
- MoAV17: Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov (2017) Variational Dropout Sparsifies Deep Neural Networks. In Proceedings of ICML.
- Geer14b: Sara van de Geer (2014b) Weakly decomposable regularization penalties and structured sparsity. *Scandinavian Journal of Statistics*, 41(1), 72–86. DOI
- RaRe09: Ali Rahimi, Benjamin Recht (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in Neural Information Processing Systems (pp. 1313–1320). Curran Associates, Inc.
- RiSG16: Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. (pp. 1135–1144). ACM Press DOI
- Geer14c: Sara van de Geer (2014c) Worst possible sub-directions in high-dimensional models. In arXiv:1403.7023 [math, stat] (Vol. 131).