How to go data mining for models without “dredging” for models (accidentally or otherwise). If you keep on testing models until you find some that fit (which you usually will), how do you know that the fit is in some sense interesting? How sharp will your conclusions be? How does it work when you are testing against a possibly uncountable continuum of hypotheses? (One perspective on sparsity penalties is precisely this, I am told.)
Model selection is this writ small: you are testing how many variables to include in your model.
In modern high-dimensional models, where you have potentially many explanatory variables, the question of how to handle the combinatorial explosion of possible variables to include can also be considered a multiple testing problem. We tend to regard this as a smoothing and model selection problem, though.
This all gets more complicated when you think about many people testing many hypotheses in many different experiments; then you run into many more issues than just these, such as publication bias and suchlike.
Suggestive connection:
Moritz Hardt, The machine learning leaderboard problem:
In this post, I will describe a method to climb the public leaderboard without even looking at the data. The algorithm is so simple and natural that an unwitting analyst might just run it. We will see that in Kaggle’s famous Heritage Health Prize competition this might have propelled a participant from rank around 150 into the top 10 on the public leaderboard without making progress on the actual problem.[…]
I get super excited. I keep climbing the leaderboard! Who would’ve thought that this machine learning thing was so easy? So, I go write a blog post on Medium about Big Data and score a job at DeepCompeting.ly, the latest data science startup in the city. Life is pretty sweet. I pick up indoor rock climbing, sign up for wood working classes; I read Proust and books about espresso. Two months later the competition closes and Kaggle releases the final score. What an embarrassment! Wacky boosting did nothing whatsoever on the final test set. I get fired from DeepCompeting.ly days before the buyout. My spouse dumps me. The lease expires. I get evicted from my apartment in the Mission. Inevitably, I hike the Pacific Crest Trail and write a novel about it.
See BlHa15 and DFHP15 for more of that.
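The leaderboard effect quoted above can be reproduced on purely synthetic data. What follows is a minimal sketch in the spirit of the post, not Hardt's actual code: the function names, the sample sizes and the "keep anything that beats chance, then majority-vote" rule are all assumptions made for illustration. The labels are coin flips, so there is nothing to learn, yet the public score still climbs.

```python
import random


def accuracy(pred, truth):
    """Fraction of positions where the prediction matches the label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def climb_leaderboard(n_public=400, n_private=400, n_queries=300, seed=0):
    """Hypothetical illustration: overfit a public leaderboard with random guesses.

    Returns (public_accuracy, private_accuracy) of the final submission.
    """
    rng = random.Random(seed)
    n = n_public + n_private
    # Coin-flip labels: no submission can genuinely beat 50% on fresh data.
    labels = [rng.randint(0, 1) for _ in range(n)]
    public, private = labels[:n_public], labels[n_public:]

    kept = []
    for _ in range(n_queries):
        guess = [rng.randint(0, 1) for _ in range(n)]
        # The only feedback used is the public leaderboard score.
        if accuracy(guess[:n_public], public) > 0.5:
            kept.append(guess)

    # Final submission: coordinate-wise majority vote over the "lucky" guesses.
    vote = [int(2 * sum(g[i] for g in kept) > len(kept)) for i in range(n)]
    return accuracy(vote[:n_public], public), accuracy(vote[n_public:], private)


if __name__ == "__main__":
    public_score, private_score = climb_leaderboard()
    print(f"public: {public_score:.3f}  private: {private_score:.3f}")
```

The public score rises because each kept guess was selected using the public labels themselves; the private score has no such leak and stays near chance.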
P-value hacking

I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here's How. See also the journalism problem, the journals problem, the vacuous-fluff-that-passes-for-public-discussion problem…
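One common route to a hacked p-value is to measure many outcomes and report whichever one clears the threshold. Here is a rough simulation of that, assuming independent outcomes, a two-sided z-test with known unit variance, and no true effect anywhere. The function names and the choice of 20 outcomes and 15 subjects are arbitrary illustrations, not taken from the linked article.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def smallest_p(n_outcomes, n_subjects, rng):
    """Smallest p-value across several outcomes that are all pure noise."""
    best = 1.0
    for _ in range(n_outcomes):
        mean = sum(rng.gauss(0, 1) for _ in range(n_subjects)) / n_subjects
        z = mean * math.sqrt(n_subjects)  # known unit variance, so this is N(0, 1)
        best = min(best, two_sided_p(z))
    return best


def false_positive_rate(n_outcomes, n_studies=2000, n_subjects=15,
                        alpha=0.05, seed=1):
    """Share of null studies reporting at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = sum(smallest_p(n_outcomes, n_subjects, rng) < alpha
               for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    for m in (1, 5, 20):
        print(m, false_positive_rate(m), 1 - 0.95 ** m)
```

Under these assumptions the simulated rate should track 1 - (1 - alpha)^m, which is already well over one half at m = 20.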
False discovery rate
FDR control…
 Testing Millions of Hypotheses is Larry Wasserman's introduction to controlling the false discovery rate. See also Screening and the false discovery rate. The man can explain clearly.
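As a concrete instance of FDR control, here is a small sketch of the Benjamini-Hochberg step-up procedure (BeHo95). It assumes independent (or positively dependent) p-values; the function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank  # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    ps = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(ps, q=0.05))
```

On these ten example p-values the procedure rejects the two smallest, whereas a Bonferroni threshold of 0.05/10 would reject only the first. Note the step-up behaviour: a p-value that misses its own threshold is still rejected if a larger one further down the sorted list passes.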
Familywise error rate
Šidák correction, Bonferroni correction…
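For reference, the two corrections named above can be written out as per-test thresholds. This is a sketch in notation introduced here rather than taken from the original: m hypotheses with p-values p_i, tested at a target familywise level α.

```latex
\text{Bonferroni: reject } H_i \text{ if } p_i \le \frac{\alpha}{m},
\qquad
\text{Šidák: reject } H_i \text{ if } p_i \le 1 - (1 - \alpha)^{1/m}.
```

Bonferroni follows from the union bound and holds under arbitrary dependence between the tests. The Šidák threshold is exact when the tests are independent and is always slightly larger, hence slightly less conservative. For m = 20 and α = 0.05 the two thresholds are 0.0025 and roughly 0.00256.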
Post-selection inference
Misc applied
http://kadavy.net/blog/posts/aatesting/
http://businessofsoftware.org/2013/06/jasoncohenceowpenginewhydatacanmakeyoudothewrongthing/
http://www.evanmiller.org/thelowbaserateproblem.html
Refs
 Stor02: (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 479–498. DOI
 Tibs14: (2014) A General Framework for Fast Stagewise Algorithms. ArXiv:1408.5801 [Stat].
 EfNi08: (2008) A method to compute multiplicity corrected confidence intervals for odds ratios and other relative effect estimates. International Journal of Environmental Research and Public Health, 5(5), 394–398.
 RGSG17: (2017) A million variables and more: the Fast Greedy Equivalence Search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. International Journal of Data Science and Analytics, 3(2), 121–129. DOI
 FaLv10: (2010) A Selective Overview of Variable Selection in High Dimensional Feature Space. Statistica Sinica, 20(1), 101–148.
 LaMP15: (2015) A significance test for covariates in nonparametric regression. Electronic Journal of Statistics, 9, 643–678. DOI
 LTTT14: (2014) A significance test for the lasso. The Annals of Statistics, 42(2), 413–468. DOI
 BeGa09: (2009) A simple forward selection procedure based on false discovery rate control. The Annals of Applied Statistics, 3(1), 179–198. DOI
 DoJo95: (1995) Adapting to Unknown Smoothness via Wavelet Shrinkage. Journal of the American Statistical Association, 90(432), 1200–1224. DOI
 ABDJ06: (2006) Adapting to unknown sparsity by controlling the false discovery rate. The Annals of Statistics, 34(2), 584–653. DOI
 GeWa08: (2008) Adaptive confidence bands. The Annals of Statistics, 36(2), 875–905. DOI
 AiGe96: (1996) Adjusting for multiple testing when reporting research results: the Bonferroni vs Holm methods. American Journal of Public Health, 86(5), 726–728. DOI
 CaSh98: (1998) An Akaike information criterion for model selection in the presence of incomplete data. Journal of Statistical Planning and Inference, 67(1), 45–65. DOI
 Ston77: (1977) An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 44–47.
 ClKO09: (2009) Asymptotic properties of penalized spline estimators. Biometrika, 96(3), 529–544. DOI
 Dasg08: (2008) Asymptotic Theory of Statistics and Probability. New York: Springer New York
 BaHy01: (2001) Bandwidth selection for kernel conditional density estimation. Computational Statistics & Data Analysis, 36(3), 279–298. DOI
 Efro79: (1979) Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1), 1–26. DOI
 RFFE15: (2015) Choosing experiments to accelerate collective discovery. Proceedings of the National Academy of Sciences, 112(47), 14569–14574. DOI
 ZhZh14: (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 217–242. DOI
 EwSc15: (2015) Confidence Sets Based on the Lasso Estimator. ArXiv:1507.05315 [Math, Stat].
 NiGe13: (2013) Confidence sets in sparse regression. The Annals of Statistics, 41(6), 2852–2876. DOI
 Bune04: (2004) Consistent covariate selection and post model selection inference in semiparametric regression. The Annals of Statistics, 32(3), 898–927. DOI
 BeHo95: (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.
 BaCa15: (2015) Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43(5), 2055–2085. DOI
 Efro10a: (2010a) Correlated z-values and the accuracy of large-scale statistical estimates. Journal of the American Statistical Association, 105(491), 1042–1055. DOI
 Küns86: (1986) Discrimination between monotonic trends and long-range dependence. Journal of Applied Probability, 23(4), 1025–1030.
 Efro07: (2007) Doing thousands of hypothesis tests at the same time. Metron - International Journal of Statistics, LXV(1), 3–21.
 JaFH15: (2015) Effective degrees of freedom: a flawed metaphor. Biometrika, 102(2), 479–485. DOI
 Efro09: (2009) Empirical Bayes Estimates for Large-Scale Prediction Problems. Journal of the American Statistical Association, 104(487), 1015–1028. DOI
 CaWB08: (2008) Enhancing Sparsity by Reweighted ℓ1 Minimization. Journal of Fourier Analysis and Applications, 14(5–6), 877–905. DOI
 MeRi06: (2006) Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. The Annals of Statistics, 34(1), 373–393. DOI
 AnKo85: (1985) Estimation, Filtering, and Smoothing in State Space Models with Incompletely Specified Initial Conditions. The Annals of Statistics, 13(4), 1286–1316. DOI
 TLTT14: (2014) Exact Post-selection Inference for Forward Stepwise and Least Angle Regression. ArXiv:1401.3889 [Stat].
 LSST13: (2013) Exact post-selection inference, with application to the lasso. ArXiv:1311.6238 [Math, Stat].
 SuBC15: (2015) False Discoveries Occur Early on the Lasso Path. ArXiv:1511.01957 [Cs, Math, Stat].
 Mein06: (2006) False Discovery Control for Multiple Tests of Association Under General Dependence. Scandinavian Journal of Statistics, 33(2), 227–237. DOI
 TKPS14: (2014) False discovery rate smoothing. ArXiv:1411.6144 [Stat].
 BeYe05: (2005) False Discovery Rate–Adjusted Multiple Confidence Intervals for Selected Parameters. Journal of the American Statistical Association, 100(469), 71–81. DOI
 KoKi96: (1996) Generalised information criteria in model selection. Biometrika, 83(4), 875–890. DOI
 Mein14: (2014) Group bound: confidence intervals for groups of variables in sparse high dimensional regression without assumptions on the design. Journal of the Royal Statistical Society: Series B (Statistical Methodology), n/a–n/a. DOI
 MeBü06: (2006) High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462. DOI
 DBMM14: (2014) High-dimensional Inference: Confidence intervals, p-values and R-Software hdi. ArXiv:1408.4026 [Stat].
 BüGe15: (2015) High-dimensional inference in misspecified linear models. ArXiv:1503.06426 [Stat], 9(1), 1449–1473. DOI
 WaRo09: (2009) High-dimensional variable selection. Annals of Statistics, 37(5A), 2178–2201. DOI
 SiLi14: (2014) Higher Criticism: p-values and Criticism. ArXiv:1411.1437 [Math, Stat].
 LSWA15: (2015) High-Reproducibility and High-Accuracy Method for Automated Topic Classification. Physical Review X, 5(1), 011007. DOI
 JaGe15: (2015) Honest confidence regions and optimality in high-dimensional precision matrix estimation. ArXiv:1507.02061 [Math, Stat].
 Efro86: (1986) How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81(394), 461–470. DOI
 Nobl09: (2009) How does multiple testing correction work? Nature Biotechnology, 27(12), 1135–1137. DOI
 CoBa17: (2017) Large numbers of explanatory variables, a semi-descriptive analysis. Proceedings of the National Academy of Sciences, 114(32), 8592–8595. DOI
 CaSu17: (2017) Large-Scale Global and Simultaneous Inference: Estimation and Testing in Very High Dimensions. Annual Review of Economics, 9(1), 411–439. DOI
 Efro13: (2013) Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge: Cambridge University Press
 MeYu09: (2009) Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics, 37(1), 246–270. DOI
 HCMF08: (2008) Least angle and ℓ1 penalized regression: A review. Statistics Surveys, 2, 61–93. DOI
 HjJo96: (1996) Locally parametric nonparametric density estimation. The Annals of Statistics, 24(4), 1619–1647. DOI
 MeBü05: (2005) Lower bounds for the number of false null hypotheses for multiple testing of associations under general dependence structures. Biometrika, 92(4), 893–907. DOI
 GoWh04: (2004) Maximum likelihood and the bootstrap for nonlinear dynamic models. Journal of Econometrics, 119(1), 199–219. DOI
 GLAB14: (2014) Metric Learning for Temporal Sequence Alignment. In Advances in Neural Information Processing Systems 27 (pp. 1817–1825). Curran Associates, Inc.
 BuBA97: (1997) Model Selection: An Integral Part of Inference. Biometrics, 53(2), 603–618. DOI
 Bach09: (2009) Model-Consistent Sparse Estimation through the Bootstrap
 BuAn04: (2004) Multimodel Inference Understanding AIC and BIC in Model Selection. Sociological Methods & Research, 33(2), 261–304. DOI
 Roth90: (1990) No adjustments are needed for multiple comparisons. Epidemiology (Cambridge, Mass.), 1(1), 43–46.
 ArEm11: (2011) Nonparametric goodness-of-fit tests for discrete null distributions. The R Journal, 3(2), 34–39.
 GBRD14: (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3), 1166–1202. DOI
 DeHM08: (2008) On Deconvolution with Repeated Measurements. The Annals of Statistics, 36(2), 665–685. DOI
 Hjor92: (1992) On Inference in Parametric Survival Data Models. International Statistical Review / Revue Internationale de Statistique, 60(3), 355–387. DOI
 ZoHT07: (2007) On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. DOI
 Tadd13: (2013) One-step estimator paths for concave regularization. ArXiv:1308.5623 [Stat].
 CFJL16: (2016) Panning for Gold: Model-free Knockoffs for High-dimensional Controlled Variable Selection. ArXiv Preprint ArXiv:1610.02351.
 RoZh07: (2007) Piecewise linear regularized solution paths. The Annals of Statistics, 35(3), 1012–1030. DOI
 DFHP14: (2014) Preserving Statistical Validity in Adaptive Data Analysis. ArXiv:1411.2664 [Cs].
 MeMB09: (2009) p-Values for High-Dimensional Regression. Journal of the American Statistical Association, 104(488), 1671–1681. DOI
 MüBe14: (2014) pystruct - Learning Structured Prediction in Python. Journal of Machine Learning Research, 15, 2055–2060.
 EvDi00: (n.d.) Recovering from Selection Bias using Marginal Structure in Discrete Models.
 HuTs89: (1989) Regression and time series model selection in small samples. Biometrika, 76(2), 297–307. DOI
 FrHT10: (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI
 Mein07: (2007) Relaxed Lasso. Computational Statistics & Data Analysis, 52(1), 374–393. DOI
 CaRT06: (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489–509. DOI
 Efro04a: (2004a) Selection and Estimation for Large-Scale Simultaneous Inference.
 IyGr88: (1988) Selection Models and the File Drawer Problem. Statistical Science, 3(1), 109–117. DOI
 HjWL92: (1992) Semiparametric Estimation Of Parametric Hazard Rates. In Survival Analysis: State of the Art (pp. 211–236). Springer Netherlands
 Ichi93: (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58(1–2), 71–120. DOI
 KoCW15: (2015) Sequential Tests for Large-Scale Learning. Neural Computation, 28(1), 45–70. DOI
 CuVD11: (2011) Significance testing in ridge regression for genetic data. BMC Bioinformatics, 12, 372. DOI
 Benj10: (2010) Simultaneous and selective inference: Current successes and future challenges. Biometrical Journal, 52(6), 708–721. DOI
 ClZi75: (1975) Simultaneous Estimation of the Means of Independent Poisson Laws. Journal of the American Statistical Association, 70(351a), 698–705. DOI
 Efro08: (2008) Simultaneous Inference: When Should Hypothesis Testing Problems Be Combined? The Annals of Applied Statistics, 2(1), 197–223. DOI
 MeBü10: (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. DOI
 CoMa85: (1985) Testing Goodness of Fit for the Poisson Assumption When Observations Are Not Identically Distributed. Journal of the American Statistical Association, 80(390), 411–418. DOI
 LaRa12: (2012) The cost of large numbers of hypothesis tests on power, effect size and sample size. Molecular Psychiatry, 17(1), 108–114. DOI
 Efro04b: (2004b) The Estimation of Prediction Error. Journal of the American Statistical Association, 99(467), 619–632. DOI
 Efro10b: (2010b) The Future of Indirect Evidence. Statistical Science, 25(2), 145–157. DOI
 DaBa16: (2016) The knockoff filter for FDR control in group-sparse and multitask regression. ArXiv Preprint ArXiv:1602.03589.
 BlHa15: (2015) The Ladder: A Reliable Leaderboard for Machine Learning Competitions. ArXiv:1502.04585 [Cs].
 GeLe11: (2011) The Lasso, correlated design, and improved oracle inequalities. ArXiv:1107.0189 [Stat].
 DFHP15: (2015) The reusable holdout: Preserving validity in adaptive data analysis. Science, 349(6248), 636–638. DOI
 GeLo14: (2014) The Statistical Crisis in Science. American Scientist, 102(6), 460. DOI
 FrLu14: (2014) Unconscious lie detection as an example of a widespread fallacy in the Neurosciences. ArXiv:1407.4240 [q-Bio, Stat].
 TRTW15: (2015) Uniform Asymptotic Inference and the Bootstrap After Model Selection. ArXiv:1506.06266 [Math, Stat].
 Cava97: (1997) Unifying the derivations for the Akaike and corrected Akaike information criteria. Statistics & Probability Letters, 33(2), 201–208. DOI
 ChHS15: (2015) Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach. Annual Review of Economics, 7(1), 649–688. DOI
 BBBZ13: (2013) Valid post-selection inference. The Annals of Statistics, 41(2), 802–837. DOI
 LiLi08: (2008) Variable selection in semiparametric regression modeling. The Annals of Statistics, 36(1), 261–286. DOI
 FaLi01: (2001) Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456), 1348–1360. DOI
 TPSR15: (2015) Vector-Space Markov Random Fields via Exponential Families. In Journal of Machine Learning Research (pp. 684–692).
 DJKP95: (1995) Wavelet Shrinkage: Asymptopia? Journal of the Royal Statistical Society. Series B (Methodological), 57(2), 301–369.
 KaRo14: (2014) When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples. Biometrika, 101(4), 771–784. DOI
 Ioan05: (2005) Why most published research findings are false. PLoS Medicine, 2(8), 124. DOI