# Sparse model selection

On choosing the right model and regularisation parameter in sparse regression, which turn out to be very nearly the same, and closely coupled to doing the regression. There are some wrinkles.

TBD: Explain my laborious reasoning that generalised Akaike information criteria don’t seem work when the penalty term is not continuous (e.g. $L_1$ ), and the issues that therefore arise in model selection for such cases.

Present alternatives for choosing the optimal regularisation coefficient, especially outside cross-validation, especially computationally tractable ones. Methods based on statistical learning theory or concentration inequalities win gratitude.

TBD

TBD.

## Degrees-of-freedom penalties

### Refs

ABDJ06
Abramovich, F., Benjamini, Y., Donoho, D. L., & Johnstone, I. M.(2006) Adapting to unknown sparsity by controlling the false discovery rate. The Annals of Statistics, 34(2), 584–653. DOI.
AhSc15
Ahmad, R., & Schniter, P. (2015) Iteratively Reweighted $ell_1$ Approaches to Sparse Composite Regularization. arXiv:1504.05110 [Cs, Math].
AlCG13
Alfons, A., Croux, C., & Gelper, S. (2013) Sparse least trimmed squares regression for analyzing high-dimensional large data sets. The Annals of Applied Statistics, 7(1), 226–248. DOI.
AzKS15
Azizyan, M., Krishnamurthy, A., & Singh, A. (2015) Extreme Compressive Sampling for Covariance Estimation. arXiv:1506.00898 [Cs, Math, Stat].
Bach09
Bach, F. (2009) Model-Consistent Sparse Estimation through the Bootstrap.
BJMO12
Bach, F., Jenatton, R., Mairal, J., & Obozinski, G. (2012) Optimization with Sparsity-Inducing Penalties. Found. Trends Mach. Learn., 4(1), 1–106. DOI.
BaRo14
Bahmani, S., & Romberg, J. (2014) Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation. arXiv:1501.00046 [Cs, Math, Stat].
BCFS14
Banerjee, A., Chen, S., Fazayeli, F., & Sivakumar, V. (2014) Estimation with Norm Regularization. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27 (pp. 1556–1564). Curran Associates, Inc.
BaGA08
Banerjee, O., Ghaoui, L. E., & d’Aspremont, A. (2008) Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. Journal of Machine Learning Research, 9(Mar), 485–516.
Barb15
Barbier, J. (2015) Statistical physics and approximate message-passing algorithms for sparse linear estimation problems in signal processing and coding theory. arXiv:1511.01650 [Cs, Math].
BaSB10
Baron, D., Sarvotham, S., & Baraniuk, R. G.(2010) Bayesian compressive sensing via belief propagation. Signal Processing, IEEE Transactions on, 58(1), 269–280. DOI.
BHLL08
Barron, A. R., Huang, C., Li, J. Q., & Luo, X. (2008) MDL, penalized likelihood, and statistical risk. In Information Theory Workshop, 2008. ITW’08. IEEE (pp. 247–257). IEEE DOI.
Batt92
Battiti, R. (1992) First-and second-order methods for learning: between steepest descent and Newton’s method. Neural Computation, 4(2), 141–166. DOI.
BaMo12
Bayati, M., & Montanari, A. (2012) The LASSO Risk for Gaussian Matrices. IEEE Transactions on Information Theory, 58(4), 1997–2017. DOI.
BiCY14
Bian, W., Chen, X., & Ye, Y. (2014) Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. Mathematical Programming, 149(1–2), 301–327. DOI.
BLZS15
Bloniarz, A., Liu, H., Zhang, C.-H., Sekhon, J., & Yu, B. (2015) Lasso adjustments of treatment effect estimates in randomized experiments. arXiv:1507.03652 [Math, Stat].
BCCZ14
Borgs, C., Chayes, J. T., Cohn, H., & Zhao, Y. (2014) An $L^p$ theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions. arXiv:1401.2906 [Math].
BoCN16
Bottou, L., Curtis, F. E., & Nocedal, J. (2016) Optimization Methods for Large-Scale Machine Learning. arXiv:1606.04838 [Cs, Math, Stat].
Brei95
Breiman, L. (1995) Better subset regression using the nonnegative garrote. Technometrics, 37(4), 373–384.
BrPK16
Brunton, S. L., Proctor, J. L., & Kutz, J. N.(2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15), 3932–3937. DOI.
BüGe11
Bühlmann, P., & Geer, S. van de. (2011) Additive models and many smooth univariate functions. In Statistics for High-Dimensional Data (pp. 77–97). Springer Berlin Heidelberg
BüGe15
Bühlmann, P., & van de Geer, S. (2015) High-dimensional inference in misspecified linear models. arXiv:1503.06426 [Stat], 9(1), 1449–1473. DOI.
CaFe13
Candès, E. J., & Fernandez-Granda, C. (2013) Super-Resolution from Noisy Data. Journal of Fourier Analysis and Applications, 19(6), 1229–1254. DOI.
CaPl10
Candès, E. J., & Plan, Y. (2010) Matrix Completion With Noise. Proceedings of the IEEE, 98(6), 925–936. DOI.
CaRT06
Candès, E. J., Romberg, J. K., & Tao, T. (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 1207–1223. DOI.
Carm13
Carmi, A. Y.(2013) Compressive system identification: Sequential methods and entropy bounds. Digital Signal Processing, 23(3), 751–770. DOI.
Carm14
Carmi, A. Y.(2014) Compressive System Identification. In A. Y. Carmi, L. Mihaylova, & S. J. Godsill (Eds.), Compressed Sensing & Sparse Filtering (pp. 281–324). Springer Berlin Heidelberg
CDHB09
Cevher, V., Duarte, M. F., Hegde, C., & Baraniuk, R. (2009) Sparse Signal Recovery Using Markov Random Fields. In Advances in Neural Information Processing Systems (pp. 257–264). Curran Associates, Inc.
ChYi08
Chartrand, R., & Yin, W. (2008) Iteratively reweighted algorithms for compressive sensing. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008 (pp. 3869–3872). DOI.
CSPW10
Chen, M., Silva, J., Paisley, J., Wang, C., Dunson, D., & Carin, L. (2010) Compressive Sensing on Manifolds Using a Nonparametric Mixture of Factor Analyzers: Algorithm and Performance Bounds. IEEE Transactions on Signal Processing, 58(12), 6140–6155. DOI.
Chen12
Chen, X. (2012) Smoothing methods for nonsmooth, nonconvex minimization. Mathematical Programming, 134(1), 71–99. DOI.
ChWa00
Chen, Y.-C., & Wang, Y.-X. (n.d.) Discussion on “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression”.
DGSS14
Daneshmand, H., Gomez-Rodriguez, M., Song, L., & Schölkopf, B. (2014) Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm. In ICML.
DiFr84
Diaconis, P., & Freedman, D. (1984) Asymptotics of Graphical Projection Pursuit. The Annals of Statistics, 12(3), 793–815.
EHJT04
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004) Least angle regression. The Annals of Statistics, 32(2), 407–499. DOI.
ElVi13
Elhamifar, E., & Vidal, R. (2013) Sparse Subspace Clustering: Algorithm, Theory, and Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2765–2781. DOI.
EwSc15
Ewald, K., & Schneider, U. (2015) Confidence Sets Based on the Lasso Estimator. arXiv:1507.05315 [Math, Stat].
FaLi01
Fan, J., & Li, R. (2001) Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456), 1348–1360. DOI.
FCHW08
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008) LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 9, 1871–1874.
FlHS13
Flynn, C. J., Hurvich, C. M., & Simonoff, J. S.(2013) Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models. arXiv:1302.2068 [Stat].
FoSr11
Foygel, R., & Srebro, N. (2011) Fast-rate and optimistic-rate error bounds for L1-regularized regression. arXiv:1108.0373 [Math, Stat].
FrHT08
Friedman, J., Hastie, T., & Tibshirani, R. (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441. DOI.
GhLa13a
Ghadimi, S., & Lan, G. (2013a) Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming. arXiv:1310.3787 [Math].
GhLa13b
Ghadimi, S., & Lan, G. (2013b) Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming. SIAM Journal on Optimization, 23(4), 2341–2368. DOI.
GHIK13
Ghazi, B., Hassanieh, H., Indyk, P., Katabi, D., Price, E., & Shi, L. (2013) Sample-optimal average-case sparse Fourier Transform in two dimensions. In 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 1258–1265). DOI.
Giro01
Girolami, M. (2001) A Variational Method for Learning Sparse and Overcomplete Representations. Neural Computation, 13(11), 2517–2532. DOI.
GiSB14
Giryes, R., Sapiro, G., & Bronstein, A. M.(2014) On the Stability of Deep Networks. arXiv:1412.5896 [Cs, Math, Stat].
GIKM16
Greenhill, C., Isaev, M., Kwan, M., & McKay, B. D.(2016) The average number of spanning trees in sparse graphs with given degrees. arXiv:1606.01586 [Math].
GuLi05
Gui, J., & Li, H. (2005) Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics, 21(13), 3001–3008. DOI.
HaLB15
Hallac, D., Leskovec, J., & Boyd, S. (2015) Network Lasso: Clustering and Optimization in Large Graphs. arXiv:1507.00280 [Cs, Math, Stat]. DOI.
HaRR15
Hansen, N. R., Reynaud-Bouret, P., & Rivoirard, V. (2015) Lasso and probabilistic inequalities for multivariate point processes. Bernoulli, 21(1), 83–143. DOI.
HaTW15
Hastie, T. J., Tibshirani, Rob, & Wainwright, M. J.(2015) Statistical Learning with Sparsity: The Lasso and Generalizations. . Boca Raton: Chapman and Hall/CRC
HaKD13
Hawe, S., Kleinsteuber, M., & Diepold, K. (2013) Analysis operator learning and its application to image reconstruction. IEEE Transactions on Image Processing, 22(6), 2138–2150. DOI.
HeRP14
He, D., Rish, I., & Parida, L. (2014) Transductive HSIC Lasso. In M. Zaki, Z. Obradovic, P. N. Tan, A. Banerjee, C. Kamath, & S. Parthasarathy (Eds.), Proceedings of the 2014 SIAM International Conference on Data Mining (pp. 154–162). Philadelphia, PA: Society for Industrial and Applied Mathematics
HeGe11
Hebiri, M., & van de Geer, S. A.(2011) The Smooth-Lasso and other ℓ1+ℓ2-penalized methods. Electronic Journal of Statistics, 5, 1184–1226. DOI.
HeIS15
Hegde, C., Indyk, P., & Schmidt, L. (2015) A nearly-linear time framework for graph-structured sparsity. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 928–937).
HCMF08
Hesterberg, T., Choi, N. H., Meier, L., & Fraley, C. (2008) Least angle and ℓ1 penalized regression: A review. Statistics Surveys, 2, 61–93. DOI.
HRLV10
Hormati, A., Roy, O., Lu, Y. M., & Vetterli, M. (2010) Distributed Sampling of Signals Linked by Sparse Filtering: Theory and Applications. IEEE Transactions on Signal Processing, 58(3), 1095–1109. DOI.
HSDR14
Hsieh, C.-J., Sustik, M. A., Dhillon, I. S., & Ravikumar, P. D.(2014) QUIC: quadratic approximation for sparse inverse covariance estimation. Journal of Machine Learning Research, 15(1), 2911–2947.
HuPC15
Hu, T., Pehlevan, C., & Chklovskii, D. B.(2015) A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization. arXiv:1503.00690 [Cs, Q-Bio, Stat].
HuCB08
Huang, C., Cheang, G. L. H., & Barron, A. R.(2008) Risk of penalized least squares, greedy selection and l1 penalization for flexible function libraries.
JaFH15
Janson, L., Fithian, W., & Hastie, T. J.(2015) Effective degrees of freedom: a flawed metaphor. Biometrika, 102(2), 479–485. DOI.
JaMo14
Javanmard, A., & Montanari, A. (2014) Confidence Intervals and Hypothesis Testing for High-dimensional Regression. Journal of Machine Learning Research, 15(1), 2869–2909.
Kabá14
Kabán, A. (2014) New Bounds on Compressive Linear Least Squares Regression. (pp. 448–456). Presented at the Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
KiKC12
Kim, Y., Kwon, S., & Choi, H. (2012) Consistent model selection criteria on high dimensions. Journal of Machine Learning Research, 13(Apr), 1037–1057.
KrSB09
Krämer, N., Schäfer, J., & Boulesteix, A.-L. (2009) Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinformatics, 10, 384. DOI.
LaFa09
Lam, C., & Fan, J. (2009) Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation. Annals of Statistics, 37(6B), 4254–4278. DOI.
LaZw11
Lambert-Lacroix, S., & Zwald, L. (2011) Robust regression through the Huber’s criterion and adaptive lasso penalty. Electronic Journal of Statistics, 5, 1015–1053. DOI.
LaLZ09
Langford, J., Li, L., & Zhang, T. (2009) Sparse Online Learning via Truncated Gradient. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems 21 (pp. 905–912). Curran Associates, Inc.
LSST13
Lee, J. D., Sun, D. L., Sun, Y., & Taylor, J. E.(2013) Exact post-selection inference, with application to the lasso. arXiv:1311.6238 [Math, Stat].
LTTT14
Lockhart, R., Taylor, J., Tibshirani, R. J., & Tibshirani, R. (2014) A significance test for the lasso. The Annals of Statistics, 42(2), 413–468. DOI.
Maho16
Mahoney, M. W.(2016) Lecture Notes on Spectral Graph Methods. arXiv Preprint arXiv:1608.04845.
Mair15
Mairal, J. (2015) Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning. SIAM Journal on Optimization, 25(2), 829–855. DOI.
MaFH09
Mazumder, R., Friedman, J. H., & Hastie, T. J.(2009) SparseNet: Coordinate Descent with Non-Convex Penalties. . Stanford University
MeGB08
Meier, L., van de Geer, S., & Bühlmann, P. (2008) The group lasso for logistic regression. Group, 70(Part 1), 53–71.
MeBü06
Meinshausen, N., & Bühlmann, P. (2006) High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462. DOI.
MeYu09
Meinshausen, N., & Yu, B. (2009) Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics, 37(1), 246–270. DOI.
Mont12
Montanari, A. (2012) Graphical models concepts in compressed sensing. Compressed Sensing: Theory and Applications, 394–438.
MüGe15
Müller, P., & van de Geer, S. (2015) Censored linear model in high dimensions: Penalised linear regression on high-dimensional data with left-censored response variable. TEST. DOI.
NaGr12
Nam, S., & Gribonval, R. (2012) Physics-driven structured cosparse modeling for source localization. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5397–5400). DOI.
NeTr08
Needell, D., & Tropp, J. A.(2008) CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. arXiv:0803.2392 [Cs, Math].
Nest12
Nesterov, Y. (2012) Gradient methods for minimizing composite functions. Mathematical Programming, 140(1), 125–161. DOI.
NCBK11
Ngiam, J., Chen, Z., Bhaskar, S. A., Koh, P. W., & Ng, A. Y.(2011) Sparse Filtering. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 24 (pp. 1125–1133). Curran Associates, Inc.
NiGe13
Nickl, R., & van de Geer, S. (2013) Confidence sets in sparse regression. The Annals of Statistics, 41(6), 2852–2876. DOI.
OJFH13
Oymak, S., Jalali, A., Fazel, M., & Hassibi, B. (2013) Noisy estimation of simultaneously structured models: Limitations of convex relaxation. In 2013 IEEE 52nd Annual Conference on Decision and Control (CDC) (pp. 6019–6024). DOI.
PeEE10
Peleg, T., Eldar, Y. C., & Elad, M. (2010) Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery. IEEE Transactions on Signal Processing, 60(5), 2286–2303. DOI.
PGKY15
Peng, Z., Gurram, P., Kwon, H., & Yin, W. (2015) Optimal Sparse Kernel Learning for Hyperspectral Anomaly Detection. arXiv:1506.02585 [Cs].
PoHo15
Pouget-Abadie, J., & Horel, T. (2015) Inferring Graphs from Cascades: A Sparse Recovery Framework. In Proceedings of The 32nd International Conference on Machine Learning.
QiYa12
Qian, W., & Yang, Y. (2012) Model selection via standard error adjusted adaptive lasso. Annals of the Institute of Statistical Mathematics, 65(2), 295–318. DOI.
RaRe09
Rahimi, A., & Recht, B. (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems (pp. 1313–1320). Curran Associates, Inc.
RWRY11
Ravikumar, P., Wainwright, M. J., Raskutti, G., & Yu, B. (2011) High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980. DOI.
RaBr15a
Ravishankar, S., & Bresler, Y. (2015a) Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI. arXiv:1501.02923 [Cs, Stat].
RaBr15b
Ravishankar, S., & Bresler, Y. (2015b) Sparsifying Transform Learning With Efficient Optimal Updates and Convergence Guarantees. IEEE Transactions on Signal Processing, 63(9), 2389–2404. DOI.
Reyn03
Reynaud-Bouret, P. (2003) Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probability Theory and Related Fields, 126(1). DOI.
ReSc10
Reynaud-Bouret, P., & Schbath, S. (2010) Adaptive estimation for Hawkes processes; application to genome analysis. The Annals of Statistics, 38(5), 2781–2822. DOI.
RiSG16
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. (pp. 1135–1144). ACM Press DOI.
RiGr14
Rish, I., & Grabarnik, G. (2014) Sparse Signal Recovery with Exponential-Family Noise. In A. Y. Carmi, L. Mihaylova, & S. J. Godsill (Eds.), Compressed Sensing & Sparse Filtering (pp. 77–93). Springer Berlin Heidelberg
RiGr15
Rish, I., & Grabarnik, G. Y.(2015) Sparse modeling: theory, algorithms, and applications. . Boca Raton, FL: CRC Press, Taylor & Francis Group
SSBA95
Sashank J. Reddi, Suvrit Sra, Barnabás Póczós, & Alex Smola. (1995) Stochastic Frank-Wolfe Methods for Nonconvex Optimization.
ScBD11
Schelldorfer, J., Bühlmann, P., & De Geer, S. V.(2011) Estimation for High-Dimensional Linear Mixed-Effects Models Using ℓ1-Penalization. Scandinavian Journal of Statistics, 38(2), 197–214. DOI.
ShOw10
She, Y., & Owen, A. B.(2010) Outlier Detection Using Nonconvex Penalized Regression.
SFHT11
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2011) Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. Journal of Statistical Software, 39(5).
SFJJ15
Smith, V., Forte, S., Jordan, M. I., & Jaggi, M. (2015) L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework. arXiv:1512.04011 [Cs].
Stin04
Stine, R. A.(2004) Discussion of “Least angle regression” by Efron et al. The Annals of Statistics, 32(2), 407–499.
SuBC15
Su, W., Bogdan, M., & Candès, E. J.(2015) False Discoveries Occur Early on the Lasso Path. arXiv:1511.01957 [Cs, Math, Stat].
Taddy, M. (2013) One-step estimator paths for concave regularization. arXiv:1308.5623 [Stat].
ThAH15
Thrampoulidis, C., Abbasi, E., & Hassibi, B. (2015) The LASSO with Non-linear Measurements is Equivalent to One With Linear Measurements. arXiv:1506.02181 [Cs, Math, Stat].
Tibs96
Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
Tibs14
Tibshirani, R. J.(2014) A General Framework for Fast Stagewise Algorithms. arXiv:1408.5801 [Stat].
TrWr10
Tropp, J. A., & Wright, S. J.(2010) Computational Methods for Sparse Solution of Linear Inverse Problems. Proceedings of the IEEE, 98(6), 948–958. DOI.
Uema15
Uematsu, Y. (2015) Penalized Likelihood Estimation in High-Dimensional Time Series Models and its Application. arXiv:1504.06706 [Math, Stat].
Geer07
van de Geer, S. (2007) The deterministic Lasso.
Geer14a
van de Geer, S. (2014a) Statistical Theory for High-Dimensional Models. arXiv:1409.8557 [Math, Stat].
Geer14b
van de Geer, S. (2014b) Weakly decomposable regularization penalties and structured sparsity. Scandinavian Journal of Statistics, 41(1), 72–86. DOI.
Geer14c
van de Geer, S. (2014c) Worst possible sub-directions in high-dimensional models. In arXiv:1403.7023 [math, stat] (Vol. 131).
Geer16
van de Geer, S. (2016) Estimation and Testing Under Sparsity. (Vol. 2159). Cham: Springer International Publishing
GeBZ11
van de Geer, S. A., Bühlmann, P., & Zhou, S. (2011) The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electronic Journal of Statistics, 5, 688–749. DOI.
GBRD14
van de Geer, S., Bühlmann, P., Ritov, Y. ’acov, & Dezeure, R. (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3), 1166–1202. DOI.
VeRo15
Veitch, V., & Roy, D. M.(2015) The Class of Random Graphs Arising from Exchangeable Random Measures. arXiv:1512.03099 [Cs, Math, Stat].
Wahb90
Wahba, G. (1990) Spline Models for Observational Data. . SIAM
WWZH15
Wang, D., Wu, P., Zhao, P., & Hoi, S. C. H.(2015) A Framework of Sparse Online Learning and Its Applications. arXiv:1507.07146 [Cs].
WCLH16
Wang, Z., Chang, S., Ling, Q., Huang, S., Hu, X., Shi, H., & Huang, T. S.(2016) Stacked Approximated Regression Machine: A Simple Deep Learning Approach. . Presented at the NIPS
WoCh15
Woodworth, J., & Chartrand, R. (2015) Compressed Sensing Recovery via Nonconvex Shrinkage Penalties. arXiv:1504.02923 [Cs, Math].
WuLa08
Wu, T. T., & Lange, K. (2008) Coordinate descent algorithms for lasso penalized regression. The Annals of Applied Statistics, 2(1), 224–244. DOI.
YNGD12
Yaghoobi, M., Nam, S., Gribonval, R., & Davies, M. E.(2012) Noise aware analysis operator learning for approximately cosparse signals. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5409–5412). DOI.
YuLi07
Yuan, M., & Lin, Y. (2007) Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35. DOI.
YuTo09
Yun, S., & Toh, K.-C. (2009) A coordinate gradient descent method for ℓ 1-regularized convex minimization. Computational Optimization and Applications, 48(2), 273–307. DOI.
Zhan10
Zhang, C.-H. (2010) Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2), 894–942. DOI.
ZhZh14
Zhang, C.-H., & Zhang, S. S.(2014) Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 217–242. DOI.
ZYJZ15
Zhang, L., Yang, T., Jin, R., & Zhou, Z.-H. (2015) Sparse Learning for Large-scale and High-dimensional Data: A Randomized Convex-concave Optimization Approach. arXiv:1511.03766 [Cs].
ZhRY09
Zhao, P., Rocha, G., & Yu, B. (2009) The composite absolute penalties family for grouped and hierarchical variable selection. The Annals of Statistics, 37(6A), 3468–3497. DOI.
ZhTW11
Zhou, T., Tao, D., & Wu, X. (2011) Manifold elastic net: a unified framework for sparse dimension reduction. Data Mining and Knowledge Discovery, 22(3), 340–371.
Zou06
Zou, H. (2006) The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101(476), 1418–1429. DOI.
ZoHa05
Zou, H., & Hastie, T. (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. DOI.
ZoHT07
Zou, H., Hastie, T., & Tibshirani, R. (2007) On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. DOI.
ZoLi08
Zou, H., & Li, R. (2008) One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36(4), 1509–1533. DOI.