Multiple testing

How to go data mining for models without, accidentally or otherwise, dredging for models. If you keep testing models until you find some that fit (which you usually will), how do you know that the fit is in some sense interesting? How sharp will your conclusions be? And how does it work when you are testing against a possibly uncountable continuum of hypotheses? (One perspective on sparsity penalties is precisely this, I am told.)

Model selection is this writ small: deciding how many variables to include in your model.

In modern high-dimensional models, where you have potentially many explanatory variables, handling the combinatorial explosion of possible variables to include can also be considered a multiple testing problem, though we tend to treat it as a smoothing and model selection problem.
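For concreteness, here is a minimal sketch of the classic Benjamini–Hochberg step-up procedure (BeHo95), the workhorse false discovery rate correction. The implementation and toy data are my own illustration, not code from any of the papers below.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure (BeHo95).

    Rejects the k smallest p-values, where k is the largest index with
    p_(k) <= (k/m) * q; controls the FDR at level q under independence.
    """
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = q * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = below.nonzero()[0].max()  # largest passing index
        reject[order[:k + 1]] = True
    return reject

# Toy example: 90 true nulls, 10 real effects with tiny p-values.
rng = np.random.default_rng(1)
pvals = np.concatenate([rng.uniform(size=90), rng.uniform(size=10) * 1e-4])
print(benjamini_hochberg(pvals).sum())  # roughly the 10 real effects
```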

This all gets more complicated when you think about many people testing many hypotheses in many different experiments; then you run into many more issues than these, publication bias and suchlike among them.

Nice hack:

The reusable holdout: Preserving validity in adaptive data analysis, which, like everything these days, uses differential privacy methods. Soon I will have my smoothies made by ensuring differential privacy for my bananas’ identities.
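The core mechanism, Thresholdout, is simple enough to sketch. This is my paraphrase of the algorithm’s shape in DFHP15; the threshold and noise scale below are illustrative placeholders, not tuned recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)

def thresholdout(train_stat, holdout_stat, threshold=0.04, sigma=0.01):
    """Answer one adaptive query a la Thresholdout (DFHP15).

    train_stat and holdout_stat are the same statistic evaluated on the
    training and holdout sets; threshold and sigma are tuning parameters.
    """
    eta = rng.laplace(scale=2 * sigma)
    if abs(train_stat - holdout_stat) > threshold + eta:
        # The training set disagrees with the holdout: overfitting is
        # afoot, so answer from the holdout, noised so the analyst
        # cannot start overfitting to it as well.
        return holdout_stat + rng.laplace(scale=sigma)
    # The two agree, so the cheap training-set answer is safe to reuse.
    return train_stat
```

The point is that the analyst may query the holdout many times, adaptively, and validity degrades gracefully rather than collapsing after the first peek.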

Suggestive connection:

Moritz Hardt, The machine learning leaderboard problem:

In this post, I will describe a method to climb the public leaderboard without even looking at the data. The algorithm is so simple and natural that an unwitting analyst might just run it. We will see that in Kaggle’s famous Heritage Health Prize competition this might have propelled a participant from rank around 150 into the top 10 on the public leaderboard without making progress on the actual problem.[…]

I get super excited. I keep climbing the leaderboard! Who would’ve thought that this machine learning thing was so easy? So, I go write a blog post on Medium about Big Data and score a job at DeepCompeting.ly, the latest data science startup in the city. Life is pretty sweet. I pick up indoor rock climbing, sign up for wood working classes; I read Proust and books about espresso. Two months later the competition closes and Kaggle releases the final score. What an embarrassment! Wacky boosting did nothing whatsoever on the final test set. I get fired from DeepCompeting.ly days before the buyout. My spouse dumps me. The lease expires. I get evicted from my apartment in the Mission. Inevitably, I hike the Pacific Crest Trail and write a novel about it.

See BlHa15 and DFHP15 for more of that.
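For intuition, here is a toy reconstruction of the trick Hardt describes, assuming a binary-label competition scored by accuracy on a public test set; the function names and constants are mine. Random submissions that happen to beat chance are each weakly correlated with the public labels, and majority-voting them accumulates that correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaderboard_score(y_pred, y_public):
    # Stand-in for the accuracy the public leaderboard reports.
    return np.mean(y_pred == y_public)

def wacky_boosting(n, k, y_public):
    """Climb a 0/1-loss leaderboard using only its reported scores."""
    candidates = rng.integers(0, 2, size=(k, n))  # k random submissions
    scores = np.array([leaderboard_score(c, y_public) for c in candidates])
    lucky = candidates[scores > 0.5]  # keep the ones that beat chance...
    # ...and majority-vote them; the coordinate-wise bias accumulates.
    return (lucky.mean(axis=0) > 0.5).astype(int)

y_public = rng.integers(0, 2, size=2000)
y_hat = wacky_boosting(2000, 200, y_public)
print(leaderboard_score(y_hat, y_public))  # well above 0.5, data unseen
```

None of this transfers to the final test set, of course, which is the punchline above.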

P-value hacking

A subtopic of the above: if you keep testing until something comes up significant, your nominal error rates mean little (GeLo14, Ioan05).
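A quick simulation (my own toy, not from the readings) shows the mechanism: under the global null, test twenty outcomes per study and report the best p-value, and spurious “discoveries” come almost for free.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_studies, n_tests, n_obs = 1000, 20, 30
false_positives = 0
for _ in range(n_studies):
    data = rng.normal(size=(n_tests, n_obs))  # every null hypothesis true
    pvals = stats.ttest_1samp(data, 0.0, axis=1).pvalue
    false_positives += pvals.min() < 0.05     # report the best p-value
print(false_positives / n_studies)  # about 1 - 0.95**20 = 0.64, not 0.05
```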

To read

ABDJ06
Abramovich, F., Benjamini, Y., Donoho, D. L., & Johnstone, I. M.(2006) Adapting to unknown sparsity by controlling the false discovery rate. The Annals of Statistics, 34(2), 584–653. DOI.
AiGe96
Aickin, M., & Gensler, H. (1996) Adjusting for multiple testing when reporting research results: the Bonferroni vs Holm methods. American Journal of Public Health, 86(5), 726–728. DOI.
ArEm11
Arnold, T. B., & Emerson, J. W.(2011) Nonparametric goodness-of-fit tests for discrete null distributions. The R Journal, 3(2), 34–39.
Bach00
Bach, F. (n.d.) Model-Consistent Sparse Estimation through the Bootstrap.
BaHy01
Bashtannyk, D. M., & Hyndman, R. J.(2001) Bandwidth selection for kernel conditional density estimation. Computational Statistics & Data Analysis, 36(3), 279–298. DOI.
Benj10
Benjamini, Y. (2010) Simultaneous and selective inference: Current successes and future challenges. Biometrical Journal, 52(6), 708–721. DOI.
BeGa09
Benjamini, Y., & Gavrilov, Y. (2009) A simple forward selection procedure based on false discovery rate control. The Annals of Applied Statistics, 3(1), 179–198. DOI.
BeHo95
Benjamini, Y., & Hochberg, Y. (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.
BeYe05
Benjamini, Y., & Yekutieli, D. (2005) False Discovery Rate–Adjusted Multiple Confidence Intervals for Selected Parameters. Journal of the American Statistical Association, 100(469), 71–81. DOI.
BBBZ13
Berk, R., Brown, L., Buja, A., Zhang, K., & Zhao, L. (2013) Valid post-selection inference. The Annals of Statistics, 41(2), 802–837. DOI.
BlHa15
Blum, A., & Hardt, M. (2015) The Ladder: A Reliable Leaderboard for Machine Learning Competitions. arXiv:1502.04585 [cs].
BuBA97
Buckland, S. T., Burnham, K. P., & Augustin, N. H.(1997) Model Selection: An Integral Part of Inference. Biometrics, 53(2), 603–618. DOI.
BüGe15
Bühlmann, P., & van de Geer, S. (2015) High-dimensional inference in misspecified linear models. arXiv:1503.06426 [stat], 9(1), 1449–1473. DOI.
Bune04
Bunea, F. (2004) Consistent covariate selection and post model selection inference in semiparametric regression. The Annals of Statistics, 32(3), 898–927. DOI.
BuAn04
Burnham, K. P., & Anderson, D. R.(2004) Multimodel Inference Understanding AIC and BIC in Model Selection. Sociological Methods & Research, 33(2), 261–304. DOI.
Came13
Cameron, A. C.(2013) Inference for Health Econometrics: Inference, Model Tests, Diagnostics, Multiple Tests, and Bootstrap.
CaRT06
Candès, E. J., Romberg, J., & Tao, T. (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489–509. DOI.
Cava97
Cavanaugh, J. E.(1997) Unifying the derivations for the Akaike and corrected Akaike information criteria. Statistics & Probability Letters, 33(2), 201–208. DOI.
CaSh98
Cavanaugh, J. E., & Shumway, R. H.(1998) An Akaike information criterion for model selection in the presence of incomplete data. Journal of Statistical Planning and Inference, 67(1), 45–65. DOI.
ChHS15
Chernozhukov, V., Hansen, C., & Spindler, M. (2015) Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach. Annual Review of Economics, 7(1), 649–688. DOI.
ClKO09
Claeskens, G., Krivobokova, T., & Opsomer, J. D.(2009) Asymptotic properties of penalized spline estimators. Biometrika, 96(3), 529–544. DOI.
ClZi75
Clevenson, M. L., & Zidek, J. V.(1975) Simultaneous Estimation of the Means of Independent Poisson Laws. Journal of the American Statistical Association, 70(351a), 698–705. DOI.
CoMa85
Collings, B. J., & Margolin, B. H.(1985) Testing Goodness of Fit for the Poisson Assumption When Observations Are Not Identically Distributed. Journal of the American Statistical Association, 80(390), 411–418. DOI.
CuVD11
Cule, E., Vineis, P., & De Iorio, M. (2011) Significance testing in ridge regression for genetic data. BMC Bioinformatics, 12, 372. DOI.
Dasg08
DasGupta, A. (2008) Asymptotic Theory of Statistics and Probability. New York: Springer New York.
DeHM08
Delaigle, A., Hall, P., & Meister, A. (2008) On Deconvolution with Repeated Measurements. The Annals of Statistics, 36(2), 665–685. DOI.
DBMM14
Dezeure, R., Bühlmann, P., Meier, L., & Meinshausen, N. (2014) High-dimensional Inference: Confidence intervals, p-values and R-Software hdi. arXiv:1408.4026 [stat].
DoJo95
Donoho, D. L., & Johnstone, I. M.(1995) Adapting to Unknown Smoothness via Wavelet Shrinkage. Journal of the American Statistical Association, 90(432), 1200–1224. DOI.
DJKP95
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G., & Picard, D. (1995) Wavelet Shrinkage: Asymptopia?. Journal of the Royal Statistical Society. Series B (Methodological), 57(2), 301–369.
DFHP14
Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., & Roth, A. (2014) Preserving Statistical Validity in Adaptive Data Analysis. arXiv:1411.2664 [cs].
DFHP15
Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., & Roth, A. (2015) The reusable holdout: Preserving validity in adaptive data analysis. Science, 349(6248), 636–638. DOI.
EfNi08
Efird, J. T., & Nielsen, S. S.(2008) A method to compute multiplicity corrected confidence intervals for odds ratios and other relative effect estimates. International Journal of Environmental Research and Public Health, 5(5), 394–398.
Efro79
Efron, B. (1979) Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1), 1–26. DOI.
Efro86
Efron, B. (1986) How biased is the apparent error rate of a prediction rule?. Journal of the American Statistical Association, 81(394), 461–470. DOI.
Efro04a
Efron, B. (2004a) Selection and Estimation for Large-Scale Simultaneous Inference.
Efro04b
Efron, B. (2004b) The Estimation of Prediction Error. Journal of the American Statistical Association, 99(467), 619–632. DOI.
Efro07
Efron, B. (2007) Doing thousands of hypothesis tests at the same time. Metron - International Journal of Statistics, LXV(1), 3–21.
Efro08
Efron, B. (2008) Simultaneous Inference: When Should Hypothesis Testing Problems Be Combined?. The Annals of Applied Statistics, 2(1), 197–223. DOI.
Efro09
Efron, B. (2009) Empirical Bayes Estimates for Large-Scale Prediction Problems. Journal of the American Statistical Association, 104(487), 1015–1028. DOI.
Efro10a
Efron, B. (2010a) Correlated z-values and the accuracy of large-scale statistical estimates. Journal of the American Statistical Association, 105(491), 1042–1055. DOI.
Efro10b
Efron, B. (2010b) The Future of Indirect Evidence. Statistical Science, 25(2), 145–157. DOI.
Efro13
Efron, B. (2013) Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction (Reprint edition). Cambridge: Cambridge University Press.
EvDi00
Evans, R. J., & Didelez, V. (n.d.) Recovering from Selection Bias using Marginal Structure in Discrete Models.
EwSc15
Ewald, K., & Schneider, U. (2015) Confidence Sets Based on the Lasso Estimator. arXiv:1507.05315 [math, Stat].
FaLi01
Fan, J., & Li, R. (2001) Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456), 1348–1360. DOI.
FaLv10
Fan, J., & Lv, J. (2010) A Selective Overview of Variable Selection in High Dimensional Feature Space. Statistica Sinica, 20(1), 101–148.
FrLu14
Franz, V. H., & von Luxburg, U. (2014) Unconscious lie detection as an example of a widespread fallacy in the Neurosciences. arXiv:1407.4240 [q-Bio, Stat].
FrHT10
Friedman, J., Hastie, T., & Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI.
GLAB14
Garreau, D., Lajugie, R., Arlot, S., & Bach, F. (2014) Metric Learning for Temporal Sequence Alignment. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27 (pp. 1817–1825). Curran Associates, Inc.
GeLo14
Gelman, A., & Loken, E. (2014) The Statistical Crisis in Science. American Scientist, 102(6), 460. DOI.
GeWa08
Genovese, C., & Wasserman, L. (2008) Adaptive confidence bands. The Annals of Statistics, 36(2), 875–905. DOI.
GhBe01
Ghosh, A., & Bera, A. K.(2001) Neyman’s Smooth Test and Its Applications in Econometrics (SSRN Scholarly Paper No. ID 272888). Rochester, NY: Social Science Research Network.
GoWh04
Gonçalves, S., & White, H. (2004) Maximum likelihood and the bootstrap for nonlinear dynamic models. Journal of Econometrics, 119(1), 199–219. DOI.
HKMR92
Hanson, B., Klink, K., Matsuura, K., Robeson, S. M., & Willmott, C. J.(1992) Vector Correlation: Review, Exposition, and Geographic Application. Annals of the Association of American Geographers, 82(1), 103–116. DOI.
HaGe09
Harrison, M., & Geman, S. (2009) A Rate and History-Preserving Resampling Algorithm for Neural Spike Trains. Neural Computation, 21(5), 1244–1258. DOI.
HaAK13
Harrison, M. T., Amarasingham, A., & Kass, R. E.(2013) Statistical Identification of Synchronous Spiking. In P. M. DiLorenzo & J. D. Victor (Eds.), Spike Timing: Mechanisms and Function. CRC Press
HCMF08
Hesterberg, T., Choi, N. H., Meier, L., & Fraley, C. (2008) Least angle and ℓ1 penalized regression: A review. Statistics Surveys, 2, 61–93. DOI.
Hjor92
Hjort, N. L.(1992) On Inference in Parametric Survival Data Models. International Statistical Review / Revue Internationale de Statistique, 60(3), 355–387. DOI.
HjJo96
Hjort, N. L., & Jones, M. C.(1996) Locally parametric nonparametric density estimation. The Annals of Statistics, 24(4), 1619–1647. DOI.
HjWL92
Hjort, N. L., West, M., & Leurgans, S. (1992) Semiparametric Estimation Of Parametric Hazard Rates. In J. P. Klein & P. K. Goel (Eds.), Survival Analysis: State of the Art (pp. 211–236). Springer Netherlands
HuTs89
Hurvich, C. M., & Tsai, C.-L. (1989) Regression and time series model selection in small samples. Biometrika, 76(2), 297–307. DOI.
Ichi93
Ichimura, H. (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58(1–2), 71–120. DOI.
Ioan05
Ioannidis, J. P.(2005) Why most published research findings are false. PLoS Medicine, 2(8), e124. DOI.
IyGr88
Iyengar, S., & Greenhouse, J. B.(1988) Selection Models and the File Drawer Problem. Statistical Science, 3(1), 109–117. DOI.
JaGe15
Janková, J., & van de Geer, S. (2015) Honest confidence regions and optimality in high-dimensional precision matrix estimation. arXiv:1507.02061 [math, Stat].
JaFH13
Janson, L., Fithian, W., & Hastie, T. (2013) Effective Degrees of Freedom: A Flawed Metaphor. arXiv:1312.7851 [stat].
KaRo14
Kaufman, S., & Rosset, S. (2014) When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples. Biometrika, 101(4), 771–784. DOI.
KoKi96
Konishi, S., & Kitagawa, G. (1996) Generalised information criteria in model selection. Biometrika, 83(4), 875–890. DOI.
KoCW15
Korattikara, A., Chen, Y., & Welling, M. (2015) Sequential Tests for Large-Scale Learning. Neural Computation, 28(1), 45–70. DOI.
Küns86
Künsch, H. R.(1986) Discrimination between monotonic trends and long-range dependence. Journal of Applied Probability, 23(4), 1025–1030.
LSWA15
Lancichinetti, A., Sirer, M. I., Wang, J. X., Acuna, D., Körding, K., & Amaral, L. A. N.(2015) High-Reproducibility and High-Accuracy Method for Automated Topic Classification. Physical Review X, 5(1), 011007. DOI.
LaMP15
Lavergne, P., Maistre, S., & Patilea, V. (2015) A significance test for covariates in nonparametric regression. Electronic Journal of Statistics, 9, 643–678. DOI.
LaRa12
Lazzeroni, L. C., & Ray, A. (2012) The cost of large numbers of hypothesis tests on power, effect size and sample size. Molecular Psychiatry, 17(1), 108–114. DOI.
LSST13
Lee, J. D., Sun, D. L., Sun, Y., & Taylor, J. E.(2013) Exact post-selection inference, with application to the lasso. arXiv:1311.6238 [math, Stat].
LiLi08
Li, R., & Liang, H. (2008) Variable selection in semiparametric regression modeling. The Annals of Statistics, 36(1), 261–286. DOI.
LTTT14
Lockhart, R., Taylor, J., Tibshirani, R. J., & Tibshirani, R. (2014) A significance test for the lasso. The Annals of Statistics, 42(2), 413–468. DOI.
Mein06
Meinshausen, N. (2006) False Discovery Control for Multiple Tests of Association Under General Dependence. Scandinavian Journal of Statistics, 33(2), 227–237. DOI.
Mein07
Meinshausen, N. (2007) Relaxed Lasso. Computational Statistics & Data Analysis, 52(1), 374–393. DOI.
Mein14
Meinshausen, N. (2014) Group bound: confidence intervals for groups of variables in sparse high dimensional regression without assumptions on the design. Journal of the Royal Statistical Society: Series B (Statistical Methodology), n/a–n/a. DOI.
MeBü05
Meinshausen, N., & Bühlmann, P. (2005) Lower bounds for the number of false null hypotheses for multiple testing of associations under general dependence structures. Biometrika, 92(4), 893–907. DOI.
MeBü06
Meinshausen, N., & Bühlmann, P. (2006) High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462. DOI.
MeBü10
Meinshausen, N., & Bühlmann, P. (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. DOI.
MeMB09
Meinshausen, N., Meier, L., & Bühlmann, P. (2009) p-Values for High-Dimensional Regression. Journal of the American Statistical Association, 104(488), 1671–1681. DOI.
MeRi06
Meinshausen, N., & Rice, J. (2006) Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. The Annals of Statistics, 34(1), 373–393. DOI.
MeYu09
Meinshausen, N., & Yu, B. (2009) Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics, 37(1), 246–270. DOI.
MüBe14
Müller, A. C., & Behnke, S. (2014) pystruct - Learning Structured Prediction in Python. Journal of Machine Learning Research, 15, 2055–2060.
NiGe13
Nickl, R., & van de Geer, S. (2013) Confidence sets in sparse regression. The Annals of Statistics, 41(6), 2852–2876. DOI.
Nobl09
Noble, W. S.(2009) How does multiple testing correction work?. Nature Biotechnology, 27(12), 1135–1137. DOI.
RoZh07
Rosset, S., & Zhu, J. (2007) Piecewise linear regularized solution paths. The Annals of Statistics, 35(3), 1012–1030. DOI.
Roth90
Rothman, K. J.(1990) No adjustments are needed for multiple comparisons. Epidemiology (Cambridge, Mass.), 1(1), 43–46.
RFFE15
Rzhetsky, A., Foster, J. G., Foster, I. T., & Evans, J. A.(2015) Choosing experiments to accelerate collective discovery. Proceedings of the National Academy of Sciences, 112(47), 14569–14574. DOI.
SiLi14
Siegmund, D. O., & Li, J. (2014) Higher Criticism: p-values and Criticism. arXiv:1411.1437 [math, Stat].
Ston77
Stone, M. (1977) An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 44–47.
Stor02
Storey, J. D.(2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 479–498. DOI.
SuBC15
Su, W., Bogdan, M., & Candès, E. J.(2015) False Discoveries Occur Early on the Lasso Path. arXiv:1511.01957 [cs, Math, Stat].
Tadd13
Taddy, M. (2013) One-step estimator paths for concave regularization. arXiv:1308.5623 [stat].
TKPS14
Tansey, W., Koyejo, O., Poldrack, R. A., & Scott, J. G.(2014) False discovery rate smoothing. arXiv:1411.6144 [stat].
TPSR15
Tansey, W., Padilla, O. H. M., Suggala, A. S., & Ravikumar, P. (2015) Vector-Space Markov Random Fields via Exponential Families. (pp. 684–692). Presented at the Proceedings of The 32nd International Conference on Machine Learning
Tarm11
Tarmaratram, K. (2011) Robust Estimation and Model Selection in Semiparametric Regression Models. Katholieke Universiteit Leuven.
TLTT14
Taylor, J., Lockhart, R., Tibshirani, R. J., & Tibshirani, R. (2014) Exact Post-selection Inference for Forward Stepwise and Least Angle Regression. arXiv:1401.3889 [stat].
Tibs14
Tibshirani, R. J.(2014) A General Framework for Fast Stagewise Algorithms. arXiv:1408.5801 [stat].
TRTW15
Tibshirani, R. J., Rinaldo, A., Tibshirani, R., & Wasserman, L. (2015) Uniform Asymptotic Inference and the Bootstrap After Model Selection. arXiv:1506.06266 [math, Stat].
GBRD14
van de Geer, S., Bühlmann, P., Ritov, Y., & Dezeure, R. (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3), 1166–1202. DOI.
GeLe11
van de Geer, S., & Lederer, J. (2011) The Lasso, correlated design, and improved oracle inequalities. arXiv:1107.0189 [stat].
WaRo09
Wasserman, L., & Roeder, K. (2009) High-dimensional variable selection. Annals of Statistics, 37(5A), 2178–2201. DOI.
ZhZh14
Zhang, C.-H., & Zhang, S. S.(2014) Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 217–242. DOI.
ZoHT07
Zou, H., Hastie, T., & Tibshirani, R. (2007) On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. DOI.