The Living Thing / Notebooks :

Sparse regression

Penalised regression where the penalties are sparsifying. The prediction losses could be anything - likelihood, least-squares, robust Huberised losses, absolute deviation etc.

I will play fast and loose with terminology here regarding theoretical and empirical losses, and the statistical models we attempt to fit.

In nonparametric statistics we might estimate simultaneously what look like many, many parameters, which we constrain in some clever fashion, which usually boils down to something we can interpret as a smoothing parameters, controlling how many factors we still have to consider, from a subset of the original.

I will usually discuss our intent to minimise prediction error, but one could also try to minimise model selection error too.

Then we have a simultaneous estimation and model selection procedure, probably a specific sparse model selection procedure and we possibly have the choice clever optimisation method to do the whole thing fast. Related to compressed sensing, but here we consider sampling complexity and and errors.

See also matrix factorisations, optimisation, multiple testing, concentration inequalities, sparse flavoured icecream.

TODO: make comprehensible

TODO: examples

TODO: disambiguate the optimisation technologies at play - iteratively reweighted least squares etc.

Now! A set of headings under which I will try to understand some things, mostly the LASSO variants.


Quadratic loss penalty, absolute coefficient penalty.

The original, and the one actually included in all the statistics software.

This is not a model, per se, but a fitting procedure. We can, however, try to see what models and designs this procedure happens to work with.

Adaptive LASSO

TBD. This is the one with famous oracle properties. Hsi Zou’s paper on this (Zou06) is very readable. I am having trouble digesting Sara van de Geer’s paper (Geer08) on the Generalised Lasso, but it seems to offer me guarantees for something very similar to the Adaptive Lasso, but with far more general assumptions on the model and loss functions, and some finite sample guarnatees.


A confusing one; LASSO and LARS are not the same thing but you can use one to calculate the other? Something like that? I need to work this one through with a pencil and paper.


As used in graphical models. TBD

Elastic net

Combination of \(L_1\) and \(L_2\) penalties. TBD.

Grouped LASSO

AFAICT this is the usual LASSO but with grouped factors. See Yuan and Lin (YuLi06).

Model selection

Can be fiddly with sparse regression, which couples variable selection tightly with parameter estimation. See sparse model selection.

Debiassed LASSO

See GBRD14 and Geer14c. Same as Geer08, or not?

Sparse basis expansions

Wavelets etc; mostly handled under sparse dictionary bases.

Sparse neural nets

That is, sparse regressions as the layers in a neural network? Sure thing. (WCLH16)


“Smoothly Clipped Absolute Deviation”. TBD.

More general nonconvex coefficient penalties


Other prediction losses

This should work for heavy-tailed noise. MAD prediction penalty, lasso-coefficient penalty. What do you call this even?

See PoKo97 and WaLJ07 for some implementations using maximum prediction error. I believe the Dantzig selector uses the infinity-norm. Robust prediction losses? How do they relate?

Bayesian Lasso

Laplace priors on linear regression coefficients, includes normal lasso as a MAP estimate. Not particularly sparse. I have not need for this right now, but I did read Dan Simpson’s critique


Hastie, Friedman eta’s glmnet for R is fast and well-regarded, and has a MATLAB version. Here’s how to use it for adaptive lasso.

SPAMS (C++, MATLAB, R, python) by Mairal, looks interesting. It’s an optimisation library for many, many sparse problems.

liblinear also include lasso-type solvers, as well as support-vector regression.


Sparse regression as a universal classifier explainer? *Local Interpretable Model-agnostic Explanations* might do it (RiSG16).


Abramovich, F., Benjamini, Y., Donoho, D. L., & Johnstone, I. M.(2006) Adapting to unknown sparsity by controlling the false discovery rate. The Annals of Statistics, 34(2), 584–653. DOI.
Ahmad, R., & Schniter, P. (2015) Iteratively Reweighted $ell_1$ Approaches to Sparse Composite Regularization. arXiv:1504.05110 [Cs, Math].
Alfons, A., Croux, C., & Gelper, S. (2013) Sparse least trimmed squares regression for analyzing high-dimensional large data sets. The Annals of Applied Statistics, 7(1), 226–248. DOI.
Azizyan, M., Krishnamurthy, A., & Singh, A. (2015) Extreme Compressive Sampling for Covariance Estimation. arXiv:1506.00898 [Cs, Math, Stat].
Bach, F. (2009) Model-Consistent Sparse Estimation through the Bootstrap.
Bach, F., Jenatton, R., Mairal, J., & Obozinski, G. (2012) Optimization with Sparsity-Inducing Penalties. Found. Trends Mach. Learn., 4(1), 1–106. DOI.
Bahmani, S., & Romberg, J. (2014) Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation. arXiv:1501.00046 [Cs, Math, Stat].
Banerjee, A., Chen, S., Fazayeli, F., & Sivakumar, V. (2014) Estimation with Norm Regularization. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27 (pp. 1556–1564). Curran Associates, Inc.
Banerjee, O., Ghaoui, L. E., & d’Aspremont, A. (2008) Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. Journal of Machine Learning Research, 9(Mar), 485–516.
Barbier, J. (2015) Statistical physics and approximate message-passing algorithms for sparse linear estimation problems in signal processing and coding theory. arXiv:1511.01650 [Cs, Math].
Baron, D., Sarvotham, S., & Baraniuk, R. G.(2010) Bayesian compressive sensing via belief propagation. Signal Processing, IEEE Transactions on, 58(1), 269–280. DOI.
Barron, A. R., Huang, C., Li, J. Q., & Luo, X. (2008) MDL, penalized likelihood, and statistical risk. In Information Theory Workshop, 2008. ITW’08. IEEE (pp. 247–257). IEEE DOI.
Battiti, R. (1992) First-and second-order methods for learning: between steepest descent and Newton’s method. Neural Computation, 4(2), 141–166. DOI.
Bayati, M., & Montanari, A. (2012) The LASSO Risk for Gaussian Matrices. IEEE Transactions on Information Theory, 58(4), 1997–2017. DOI.
Belloni, A., Chernozhukov, V., & Wang, L. (2011) Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming. Biometrika, 98(4), 791–806. DOI.
Bian, W., Chen, X., & Ye, Y. (2014) Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. Mathematical Programming, 149(1–2), 301–327. DOI.
Bloniarz, A., Liu, H., Zhang, C.-H., Sekhon, J., & Yu, B. (2015) Lasso adjustments of treatment effect estimates in randomized experiments. arXiv:1507.03652 [Math, Stat].
Borgs, C., Chayes, J. T., Cohn, H., & Zhao, Y. (2014) An $L^p$ theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions. arXiv:1401.2906 [Math].
Bottou, L., Curtis, F. E., & Nocedal, J. (2016) Optimization Methods for Large-Scale Machine Learning. arXiv:1606.04838 [Cs, Math, Stat].
Breiman, L. (1995) Better subset regression using the nonnegative garrote. Technometrics, 37(4), 373–384.
Brunton, S. L., Proctor, J. L., & Kutz, J. N.(2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15), 3932–3937. DOI.
Bühlmann, P., & Geer, S. van de. (2011) Additive models and many smooth univariate functions. In Statistics for High-Dimensional Data (pp. 77–97). Springer Berlin Heidelberg
Bühlmann, P., & van de Geer, S. (2015) High-dimensional inference in misspecified linear models. arXiv:1503.06426 [Stat], 9(1), 1449–1473. DOI.
Candès, E. J., & Fernandez-Granda, C. (2013) Super-Resolution from Noisy Data. Journal of Fourier Analysis and Applications, 19(6), 1229–1254. DOI.
Candès, E. J., & Plan, Y. (2010) Matrix Completion With Noise. Proceedings of the IEEE, 98(6), 925–936. DOI.
Candès, E. J., Romberg, J. K., & Tao, T. (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 1207–1223. DOI.
Carmi, A. Y.(2013) Compressive system identification: Sequential methods and entropy bounds. Digital Signal Processing, 23(3), 751–770. DOI.
Carmi, A. Y.(2014) Compressive System Identification. In A. Y. Carmi, L. Mihaylova, & S. J. Godsill (Eds.), Compressed Sensing & Sparse Filtering (pp. 281–324). Springer Berlin Heidelberg
Cevher, V., Duarte, M. F., Hegde, C., & Baraniuk, R. (2009) Sparse Signal Recovery Using Markov Random Fields. In Advances in Neural Information Processing Systems (pp. 257–264). Curran Associates, Inc.
Chartrand, R., & Yin, W. (2008) Iteratively reweighted algorithms for compressive sensing. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008 (pp. 3869–3872). DOI.
Chen, M., Silva, J., Paisley, J., Wang, C., Dunson, D., & Carin, L. (2010) Compressive Sensing on Manifolds Using a Nonparametric Mixture of Factor Analyzers: Algorithm and Performance Bounds. IEEE Transactions on Signal Processing, 58(12), 6140–6155. DOI.
Chen, X. (2012) Smoothing methods for nonsmooth, nonconvex minimization. Mathematical Programming, 134(1), 71–99. DOI.
Chen, Y., & Hero, A. O.(2012) Recursive ℓ1,∞ Group Lasso. IEEE Transactions on Signal Processing, 60(8), 3978–3987. DOI.
Chen, Y.-C., & Wang, Y.-X. (n.d.) Discussion on “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression”.
Daneshmand, H., Gomez-Rodriguez, M., Song, L., & Schölkopf, B. (2014) Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm. In ICML.
Diaconis, P., & Freedman, D. (1984) Asymptotics of Graphical Projection Pursuit. The Annals of Statistics, 12(3), 793–815.
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004) Least angle regression. The Annals of Statistics, 32(2), 407–499. DOI.
Elhamifar, E., & Vidal, R. (2013) Sparse Subspace Clustering: Algorithm, Theory, and Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2765–2781. DOI.
Ewald, K., & Schneider, U. (2015) Confidence Sets Based on the Lasso Estimator. arXiv:1507.05315 [Math, Stat].
Fan, J., & Li, R. (2001) Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456), 1348–1360. DOI.
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008) LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 9, 1871–1874.
Flynn, C. J., Hurvich, C. M., & Simonoff, J. S.(2013) Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models. arXiv:1302.2068 [Stat].
Foygel, R., & Srebro, N. (2011) Fast-rate and optimistic-rate error bounds for L1-regularized regression. arXiv:1108.0373 [Math, Stat].
Friedman, J., Hastie, T., & Tibshirani, R. (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441. DOI.
Gasso, G., Rakotomamonjy, A., & Canu, S. (2009) Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming. IEEE Transactions on Signal Processing, 57(12), 4686–4698. DOI.
Ghadimi, S., & Lan, G. (2013a) Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming. arXiv:1310.3787 [Math].
Ghadimi, S., & Lan, G. (2013b) Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming. SIAM Journal on Optimization, 23(4), 2341–2368. DOI.
Ghazi, B., Hassanieh, H., Indyk, P., Katabi, D., Price, E., & Shi, L. (2013) Sample-optimal average-case sparse Fourier Transform in two dimensions. In 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 1258–1265). DOI.
Girolami, M. (2001) A Variational Method for Learning Sparse and Overcomplete Representations. Neural Computation, 13(11), 2517–2532. DOI.
Giryes, R., Sapiro, G., & Bronstein, A. M.(2014) On the Stability of Deep Networks. arXiv:1412.5896 [Cs, Math, Stat].
Greenhill, C., Isaev, M., Kwan, M., & McKay, B. D.(2016) The average number of spanning trees in sparse graphs with given degrees. arXiv:1606.01586 [Math].
Gui, J., & Li, H. (2005) Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics, 21(13), 3001–3008. DOI.
Hallac, D., Leskovec, J., & Boyd, S. (2015) Network Lasso: Clustering and Optimization in Large Graphs. arXiv:1507.00280 [Cs, Math, Stat]. DOI.
Hansen, N. R., Reynaud-Bouret, P., & Rivoirard, V. (2015) Lasso and probabilistic inequalities for multivariate point processes. Bernoulli, 21(1), 83–143. DOI.
Hastie, T. J., Tibshirani, Rob, & Wainwright, M. J.(2015) Statistical Learning with Sparsity: The Lasso and Generalizations. . Boca Raton: Chapman and Hall/CRC
Hawe, S., Kleinsteuber, M., & Diepold, K. (2013) Analysis operator learning and its application to image reconstruction. IEEE Transactions on Image Processing, 22(6), 2138–2150. DOI.
He, D., Rish, I., & Parida, L. (2014) Transductive HSIC Lasso. In M. Zaki, Z. Obradovic, P. N. Tan, A. Banerjee, C. Kamath, & S. Parthasarathy (Eds.), Proceedings of the 2014 SIAM International Conference on Data Mining (pp. 154–162). Philadelphia, PA: Society for Industrial and Applied Mathematics
Hebiri, M., & van de Geer, S. A.(2011) The Smooth-Lasso and other ℓ1+ℓ2-penalized methods. Electronic Journal of Statistics, 5, 1184–1226. DOI.
Hegde, C., Indyk, P., & Schmidt, L. (2015) A nearly-linear time framework for graph-structured sparsity. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 928–937).
Hesterberg, T., Choi, N. H., Meier, L., & Fraley, C. (2008) Least angle and ℓ1 penalized regression: A review. Statistics Surveys, 2, 61–93. DOI.
Hormati, A., Roy, O., Lu, Y. M., & Vetterli, M. (2010) Distributed Sampling of Signals Linked by Sparse Filtering: Theory and Applications. IEEE Transactions on Signal Processing, 58(3), 1095–1109. DOI.
Hsieh, C.-J., Sustik, M. A., Dhillon, I. S., & Ravikumar, P. D.(2014) QUIC: quadratic approximation for sparse inverse covariance estimation. Journal of Machine Learning Research, 15(1), 2911–2947.
Hu, T., Pehlevan, C., & Chklovskii, D. B.(2015) A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization. arXiv:1503.00690 [Cs, Q-Bio, Stat].
Huang, C., Cheang, G. L. H., & Barron, A. R.(2008) Risk of penalized least squares, greedy selection and l1 penalization for flexible function libraries.
Janson, L., Fithian, W., & Hastie, T. J.(2015) Effective degrees of freedom: a flawed metaphor. Biometrika, 102(2), 479–485. DOI.
Javanmard, A., & Montanari, A. (2014) Confidence Intervals and Hypothesis Testing for High-dimensional Regression. Journal of Machine Learning Research, 15(1), 2869–2909.
Kabán, A. (2014) New Bounds on Compressive Linear Least Squares Regression. (pp. 448–456). Presented at the Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
Krämer, N., Schäfer, J., & Boulesteix, A.-L. (2009) Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinformatics, 10(1), 384. DOI.
Lam, C., & Fan, J. (2009) Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation. Annals of Statistics, 37(6B), 4254–4278. DOI.
Lambert-Lacroix, S., & Zwald, L. (2011) Robust regression through the Huber’s criterion and adaptive lasso penalty. Electronic Journal of Statistics, 5, 1015–1053. DOI.
Langford, J., Li, L., & Zhang, T. (2009) Sparse Online Learning via Truncated Gradient. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems 21 (pp. 905–912). Curran Associates, Inc.
Lee, J. D., Sun, D. L., Sun, Y., & Taylor, J. E.(2013) Exact post-selection inference, with application to the lasso. arXiv:1311.6238 [Math, Stat].
Lockhart, R., Taylor, J., Tibshirani, R. J., & Tibshirani, R. (2014) A significance test for the lasso. The Annals of Statistics, 42(2), 413–468. DOI.
Mahoney, M. W.(2016) Lecture Notes on Spectral Graph Methods. arXiv Preprint arXiv:1608.04845.
Mairal, J. (2015) Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning. SIAM Journal on Optimization, 25(2), 829–855. DOI.
Mazumder, R., Friedman, J. H., & Hastie, T. J.(2009) SparseNet: Coordinate Descent with Non-Convex Penalties. . Stanford University
Meier, L., van de Geer, S., & Bühlmann, P. (2008) The group lasso for logistic regression. Group, 70(Part 1), 53–71.
Meinshausen, N., & Bühlmann, P. (2006) High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462. DOI.
Meinshausen, N., & Yu, B. (2009) Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics, 37(1), 246–270. DOI.
Montanari, A. (2012) Graphical models concepts in compressed sensing. Compressed Sensing: Theory and Applications, 394–438.
Müller, P., & van de Geer, S. (2015) Censored linear model in high dimensions: Penalised linear regression on high-dimensional data with left-censored response variable. TEST. DOI.
Nam, S., & Gribonval, R. (2012) Physics-driven structured cosparse modeling for source localization. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5397–5400). DOI.
Needell, D., & Tropp, J. A.(2008) CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. arXiv:0803.2392 [Cs, Math].
Nesterov, Y. (2012) Gradient methods for minimizing composite functions. Mathematical Programming, 140(1), 125–161. DOI.
Ngiam, J., Chen, Z., Bhaskar, S. A., Koh, P. W., & Ng, A. Y.(2011) Sparse Filtering. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 24 (pp. 1125–1133). Curran Associates, Inc.
Nickl, R., & van de Geer, S. (2013) Confidence sets in sparse regression. The Annals of Statistics, 41(6), 2852–2876. DOI.
Oymak, S., Jalali, A., Fazel, M., & Hassibi, B. (2013) Noisy estimation of simultaneously structured models: Limitations of convex relaxation. In 2013 IEEE 52nd Annual Conference on Decision and Control (CDC) (pp. 6019–6024). DOI.
Peleg, T., Eldar, Y. C., & Elad, M. (2010) Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery. IEEE Transactions on Signal Processing, 60(5), 2286–2303. DOI.
Peng, Z., Gurram, P., Kwon, H., & Yin, W. (2015) Optimal Sparse Kernel Learning for Hyperspectral Anomaly Detection. arXiv:1506.02585 [Cs].
Portnoy, S., & Koenker, R. (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science, 12(4), 279–300. DOI.
Pouget-Abadie, J., & Horel, T. (2015) Inferring Graphs from Cascades: A Sparse Recovery Framework. In Proceedings of The 32nd International Conference on Machine Learning.
Qian, W., & Yang, Y. (2012) Model selection via standard error adjusted adaptive lasso. Annals of the Institute of Statistical Mathematics, 65(2), 295–318. DOI.
Qin, Z., Scheinberg, K., & Goldfarb, D. (2013) Efficient block-coordinate descent algorithms for the Group Lasso. Mathematical Programming Computation, 5(2), 143–169. DOI.
Rahimi, A., & Recht, B. (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems (pp. 1313–1320). Curran Associates, Inc.
Ravikumar, P., Wainwright, M. J., Raskutti, G., & Yu, B. (2011) High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980. DOI.
Ravishankar, S., & Bresler, Y. (2015a) Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI. arXiv:1501.02923 [Cs, Stat].
Ravishankar, S., & Bresler, Y. (2015b) Sparsifying Transform Learning With Efficient Optimal Updates and Convergence Guarantees. IEEE Transactions on Signal Processing, 63(9), 2389–2404. DOI.
Reynaud-Bouret, P. (2003) Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probability Theory and Related Fields, 126(1). DOI.
Reynaud-Bouret, P., & Schbath, S. (2010) Adaptive estimation for Hawkes processes; application to genome analysis. The Annals of Statistics, 38(5), 2781–2822. DOI.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. (pp. 1135–1144). ACM Press DOI.
Rish, I., & Grabarnik, G. (2014) Sparse Signal Recovery with Exponential-Family Noise. In A. Y. Carmi, L. Mihaylova, & S. J. Godsill (Eds.), Compressed Sensing & Sparse Filtering (pp. 77–93). Springer Berlin Heidelberg
Rish, I., & Grabarnik, G. Y.(2015) Sparse modeling: theory, algorithms, and applications. . Boca Raton, FL: CRC Press, Taylor & Francis Group
Sashank J. Reddi, Suvrit Sra, Barnabás Póczós, & Alex Smola. (1995) Stochastic Frank-Wolfe Methods for Nonconvex Optimization.
Schelldorfer, J., Bühlmann, P., & De Geer, S. V.(2011) Estimation for High-Dimensional Linear Mixed-Effects Models Using ℓ1-Penalization. Scandinavian Journal of Statistics, 38(2), 197–214. DOI.
She, Y., & Owen, A. B.(2010) Outlier Detection Using Nonconvex Penalized Regression.
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2011) Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. Journal of Statistical Software, 39(5).
Smith, V., Forte, S., Jordan, M. I., & Jaggi, M. (2015) L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework. arXiv:1512.04011 [Cs].
Stine, R. A.(2004) Discussion of “Least angle regression” by Efron et al. The Annals of Statistics, 32(2), 407–499.
Su, W., Bogdan, M., & Candès, E. J.(2015) False Discoveries Occur Early on the Lasso Path. arXiv:1511.01957 [Cs, Math, Stat].
Taddy, M. (2013) One-step estimator paths for concave regularization. arXiv:1308.5623 [Stat].
Thisted, R. A.(1997) [The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error versus Absolute-Error Estimators]: Comment. Statistical Science, 12(4), 296–298.
Thrampoulidis, C., Abbasi, E., & Hassibi, B. (2015) The LASSO with Non-linear Measurements is Equivalent to One With Linear Measurements. arXiv:1506.02181 [Cs, Math, Stat].
Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
Tibshirani, R. (2011) Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 273–282. DOI.
Tibshirani, R. J.(2014) A General Framework for Fast Stagewise Algorithms. arXiv:1408.5801 [Stat].
Tropp, J. A., & Wright, S. J.(2010) Computational Methods for Sparse Solution of Linear Inverse Problems. Proceedings of the IEEE, 98(6), 948–958. DOI.
Uematsu, Y. (2015) Penalized Likelihood Estimation in High-Dimensional Time Series Models and its Application. arXiv:1504.06706 [Math, Stat].
van de Geer, S. (2007) The deterministic Lasso.
van de Geer, S. (2014a) Statistical Theory for High-Dimensional Models. arXiv:1409.8557 [Math, Stat].
van de Geer, S. (2014b) Weakly decomposable regularization penalties and structured sparsity. Scandinavian Journal of Statistics, 41(1), 72–86. DOI.
van de Geer, S. (2014c) Worst possible sub-directions in high-dimensional models. In arXiv:1403.7023 [math, stat] (Vol. 131).
van de Geer, S. (2016) Estimation and Testing Under Sparsity. (Vol. 2159). Cham: Springer International Publishing
van de Geer, S. A.(2008) High-dimensional generalized linear models and the lasso. The Annals of Statistics, 36(2), 614–645. DOI.
van de Geer, S. A., Bühlmann, P., & Zhou, S. (2011) The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electronic Journal of Statistics, 5, 688–749. DOI.
van de Geer, S., Bühlmann, P., Ritov, Y. ’acov, & Dezeure, R. (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3), 1166–1202. DOI.
Veitch, V., & Roy, D. M.(2015) The Class of Random Graphs Arising from Exchangeable Random Measures. arXiv:1512.03099 [Cs, Math, Stat].
Wahba, G. (1990) Spline Models for Observational Data. . SIAM
Wang, D., Wu, P., Zhao, P., & Hoi, S. C. H.(2015) A Framework of Sparse Online Learning and Its Applications. arXiv:1507.07146 [Cs].
Wang, H., Li, G., & Jiang, G. (2007) Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso. Journal of Business & Economic Statistics, 25(3), 347–355. DOI.
Wang, Z., Chang, S., Ling, Q., Huang, S., Hu, X., Shi, H., & Huang, T. S.(2016) Stacked Approximated Regression Machine: A Simple Deep Learning Approach. . Presented at the NIPS
Woodworth, J., & Chartrand, R. (2015) Compressed Sensing Recovery via Nonconvex Shrinkage Penalties. arXiv:1504.02923 [Cs, Math].
Wright, S. J., Nowak, R. D., & Figueiredo, M. A. T.(2009) Sparse Reconstruction by Separable Approximation. IEEE Transactions on Signal Processing, 57(7), 2479–2493. DOI.
Wu, T. T., & Lange, K. (2008) Coordinate descent algorithms for lasso penalized regression. The Annals of Applied Statistics, 2(1), 224–244. DOI.
Yaghoobi, M., Nam, S., Gribonval, R., & Davies, M. E.(2012) Noise aware analysis operator learning for approximately cosparse signals. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5409–5412). DOI.
Yuan, M., & Lin, Y. (2006) Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67. DOI.
Yuan, M., & Lin, Y. (2007) Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35. DOI.
Yun, S., & Toh, K.-C. (2009) A coordinate gradient descent method for ℓ 1-regularized convex minimization. Computational Optimization and Applications, 48(2), 273–307. DOI.
Zhang, C.-H. (2010) Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2), 894–942. DOI.
Zhang, C.-H., & Zhang, S. S.(2014) Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 217–242. DOI.
Zhang, L., Yang, T., Jin, R., & Zhou, Z.-H. (2015) Sparse Learning for Large-scale and High-dimensional Data: A Randomized Convex-concave Optimization Approach. arXiv:1511.03766 [Cs].
Zhao, P., Rocha, G., & Yu, B. (2009) The composite absolute penalties family for grouped and hierarchical variable selection. The Annals of Statistics, 37(6A), 3468–3497. DOI.
Zhou, T., Tao, D., & Wu, X. (2011) Manifold elastic net: a unified framework for sparse dimension reduction. Data Mining and Knowledge Discovery, 22(3), 340–371.
Zou, H. (2006) The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101(476), 1418–1429. DOI.
Zou, H., & Hastie, T. (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. DOI.
Zou, H., Hastie, T., & Tibshirani, R. (2007) On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. DOI.
Zou, H., & Li, R. (2008) One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36(4), 1509–1533. DOI.