# Covariance estimation for stochastic processes

Usefulness: 🔧
Novelty: 💡
Uncertainty: 🤪 🤪 🤪
Incompleteness: 🚧 🚧 🚧

Estimating the thing that oracles always hand you in statistics homework assignments: the covariance matrix, its inverse (the precision, or concentration, matrix), or more generally the covariance kernel. A complement to Gaussian process simulation, and a thing that can make Gaussian process regression go better.
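
To fix ideas, here is a minimal sketch of the plain-vanilla case: the empirical covariance of i.i.d. vector samples, and the precision matrix obtained by inverting it. The 2×2 `true_cov`, the sample size and the seed are made up for illustration; nothing here is specific to stochastic processes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "oracle" covariance that we pretend not to know.
true_cov = np.array([[2.0, 0.6],
                     [0.6, 1.0]])
samples = rng.multivariate_normal(mean=np.zeros(2), cov=true_cov, size=5000)

# Empirical covariance; rows are observations, columns are variables.
sample_cov = np.cov(samples, rowvar=False)

# Precision (concentration) matrix as the inverse of the covariance.
# This only works when sample_cov is well conditioned, e.g. many more
# observations than dimensions.
sample_precision = np.linalg.inv(sample_cov)

print(sample_cov)
print(sample_precision)
```

With many observations and few dimensions this works fine; direct inversion stops being sensible once the dimension approaches the sample size.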

Estimating it turns out to be a lot more involved than estimating means, in various ways and at various times. Long story.

NB I am not doing a complete theory of covariance estimation here, just mentioning a couple of tidbits for future reference.

Wishart priors 🚧

## Sandwich estimators

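For robust covariances of vector data. AKA heteroskedasticity-consistent (HC) covariance estimators, or heteroskedasticity-and-autocorrelation-consistent (HAC) ones when serial correlation is also allowed for. This family includes the Eicker-Huber-White sandwich estimator, the Andrews kernel HAC estimator, Newey-West and others. For an intro see Achim Zeileis, Open-Source Econometric Computing in R; see also Zeileis (2004, 2006b).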
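
As a rough illustration of the simplest member of the family, here is a sketch of the HC0 (Eicker-Huber-White) estimator for ordinary least squares, written directly in NumPy. The function name `ols_hc0` and the simulated heteroskedastic data are assumptions for this example, not taken from any of the packages mentioned above. The "bread" is $(X^\top X)^{-1}$ and the "meat" is $X^\top \operatorname{diag}(e_i^2) X$, with $e$ the OLS residuals.

```python
import numpy as np


def ols_hc0(X, y):
    """OLS coefficients plus the HC0 sandwich covariance of the coefficients."""
    bread = np.linalg.inv(X.T @ X)            # (X'X)^{-1}
    beta = bread @ X.T @ y                    # OLS estimate
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X    # X' diag(e_i^2) X
    return beta, bread @ meat @ bread


rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])

# Simulated data whose noise standard deviation grows with x.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + x, size=n)

beta, cov_hc0 = ols_hc0(X, y)

# Classical homoskedastic covariance estimate, for comparison.
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov_classical = sigma2 * np.linalg.inv(X.T @ X)

print("HC0 standard errors:      ", np.sqrt(np.diag(cov_hc0)))
print("classical standard errors:", np.sqrt(np.diag(cov_classical)))
```

The small-sample corrections (HC1 to HC3) rescale the squared residuals in the meat, and the HAC variants add weighted cross-products between residuals at different lags; neither is attempted here.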

## Refs

Aragam, Bryon, Jiaying Gu, and Qing Zhou. 2017. “Learning Large-Scale Bayesian Networks with the Sparsebn Package,” March. http://arxiv.org/abs/1703.04025.

Azizyan, Martin, Akshay Krishnamurthy, and Aarti Singh. 2015. “Extreme Compressive Sampling for Covariance Estimation,” June. http://arxiv.org/abs/1506.00898.

Baik, Jinho, Gérard Ben Arous, and Sandrine Péché. 2005. “Phase Transition of the Largest Eigenvalue for Nonnull Complex Sample Covariance Matrices.” The Annals of Probability 33 (5): 1643–97.

Banerjee, Onureena, Laurent El Ghaoui, and Alexandre d’Aspremont. 2008. “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data.” Journal of Machine Learning Research 9 (Mar): 485–516. http://www.jmlr.org/papers/v9/banerjee08a.html.

Barnard, John, Robert McCulloch, and Xiao-Li Meng. 2000. “Modeling Covariance Matrices in Terms of Standard Deviations and Correlations, with Application to Shrinkage.” Statistica Sinica 10 (4): 1281–1311. http://www3.stat.sinica.edu.tw/statistica/password.asp?vol=10&num=4&art=16.

Ben Arous, Gérard, and Sandrine Péché. 2005. “Universality of Local Eigenvalue Statistics for Some Sample Covariance Matrices.” Communications on Pure and Applied Mathematics 58 (10): 1316–57. https://doi.org/10.1002/cpa.20070.

Cai, T. Tony, Cun-Hui Zhang, and Harrison H. Zhou. 2010. “Optimal Rates of Convergence for Covariance Matrix Estimation.” The Annals of Statistics 38 (4): 2118–44. https://doi.org/10.1214/09-AOS752.

Chan, G., and A. T. A. Wood. 1999. “Simulation of Stationary Gaussian Vector Fields.” Statistics and Computing 9 (4): 265–68. https://doi.org/10.1023/A:1008903804954.

Chan, Tony F., Gene H. Golub, and Randall J. Leveque. 1983. “Algorithms for Computing the Sample Variance: Analysis and Recommendations.” The American Statistician 37 (3): 242–47. https://doi.org/10.1080/00031305.1983.10483115.

Cook, R. Dennis. 2018. “Principal Components, Sufficient Dimension Reduction, and Envelopes.” Annual Review of Statistics and Its Application 5 (1): 533–59. https://doi.org/10.1146/annurev-statistics-031017-100257.

Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008. “Fast Gaussian Process Methods for Point Process Intensity Estimation.” In Proceedings of the 25th International Conference on Machine Learning, 192–99. ICML ’08. New York, NY, USA: ACM Press. https://doi.org/10.1145/1390156.1390181.

Daniels, M. J., and M. Pourahmadi. 2009. “Modeling Covariance Matrices via Partial Autocorrelations.” Journal of Multivariate Analysis 100 (10): 2352–63. https://doi.org/10.1016/j.jmva.2009.04.015.

Dasgupta, Sanjoy, and Daniel Hsu. 2007. “On-Line Estimation with the Multivariate Gaussian Distribution.” In Learning Theory, edited by Nader H. Bshouty and Claudio Gentile, 4539:278–92. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_21.

Davies, Tilman M., and David Bryant. 2013. “On Circulant Embedding for Gaussian Random Fields in R.” Journal of Statistical Software 55 (9). https://doi.org/10.18637/jss.v055.i09.

Dietrich, C. R., and G. N. Newsam. 1993. “A Fast and Exact Method for Multidimensional Gaussian Stochastic Simulations.” Water Resources Research 29 (8): 2861–69. https://doi.org/10.1029/93WR01070.

Efron, Bradley. 2010. “Correlated Z-Values and the Accuracy of Large-Scale Statistical Estimates.” Journal of the American Statistical Association 105 (491): 1042–55. https://doi.org/10.1198/jasa.2010.tm09129.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41. https://doi.org/10.1093/biostatistics/kxm045.

Fuentes, Montserrat. 2006. “Testing for Separability of Spatial–Temporal Covariance Functions.” Journal of Statistical Planning and Inference 136 (2): 447–66. https://doi.org/10.1016/j.jspi.2004.07.004.

Gneiting, Tilmann, William Kleiber, and Martin Schlather. 2010. “Matérn Cross-Covariance Functions for Multivariate Random Fields.” Journal of the American Statistical Association 105 (491): 1167–77. https://doi.org/10.1198/jasa.2010.tm09420.

Gray, Robert M. 2006. “Toeplitz and Circulant Matrices: A Review.” Foundations and Trends® in Communications and Information Theory 2 (3): 155–239. https://doi.org/10.1561/0100000006.

Guinness, Joseph, and Montserrat Fuentes. 2016. “Circulant Embedding of Approximate Covariances for Inference from Gaussian Data on Large Lattices.” Journal of Computational and Graphical Statistics 26 (1): 88–97. https://doi.org/10.1080/10618600.2016.1164534.

Hackbusch, Wolfgang. 2015. Hierarchical Matrices: Algorithms and Analysis. 1st ed. Springer Series in Computational Mathematics 49. Heidelberg New York Dordrecht London: Springer Publishing Company, Incorporated.

Hansen, Christian B. 2007. “Generalized Least Squares Inference in Panel and Multilevel Models with Serial Correlation and Fixed Effects.” Journal of Econometrics 140 (2): 670–94. https://doi.org/10.1016/j.jeconom.2006.07.011.

Heinrich, Claudio, and Mark Podolskij. 2014. “On Spectral Distribution of High Dimensional Covariation Matrices,” October. http://arxiv.org/abs/1410.6764.

Hsieh, Cho-Jui, Mátyás A. Sustik, Inderjit S. Dhillon, and Pradeep D. Ravikumar. 2014. “QUIC: Quadratic Approximation for Sparse Inverse Covariance Estimation.” Journal of Machine Learning Research 15 (1): 2911–47. http://www.jmlr.org/papers/volume15/hsieh14a/hsieh14a.pdf.

Huang, Jianhua Z., Naiping Liu, Mohsen Pourahmadi, and Linxu Liu. 2006. “Covariance Matrix Selection and Estimation via Penalised Normal Likelihood.” Biometrika 93 (1): 85–98. https://doi.org/10.1093/biomet/93.1.85.

James, William, and Charles Stein. 1961. “Estimation with Quadratic Loss.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:361–79. http://projecteuclid.org/euclid.bsmsp/1200512173.

Janková, Jana, and Sara van de Geer. 2015. “Honest Confidence Regions and Optimality in High-Dimensional Precision Matrix Estimation,” July. http://arxiv.org/abs/1507.02061.

Kauermann, Göran, and Raymond J. Carroll. 2001. “A Note on the Efficiency of Sandwich Covariance Matrix Estimation.” Journal of the American Statistical Association 96 (456): 1387–96. https://www.jstor.org/stable/3085907.

Khoromskij, B. N., A. Litvinenko, and H. G. Matthies. 2009. “Application of Hierarchical Matrices for Computing the Karhunen–Loève Expansion.” Computing 84 (1-2): 49–67. https://doi.org/10.1007/s00607-008-0018-3.

Khoshgnauz, Ehsan. 2012. “Learning Markov Network Structure Using Brownian Distance Covariance,” June. http://arxiv.org/abs/1206.6361.

Krumin, Michael, and Shy Shoham. 2009. “Generation of Spike Trains with Controlled Auto- and Cross-Correlation Functions.” Neural Computation 21 (6): 1642–64. https://doi.org/10.1162/neco.2009.08-08-847.

Lam, Clifford, and Jianqing Fan. 2009. “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.” The Annals of Statistics 37 (6B): 4254–78. https://doi.org/10.1214/09-AOS720.

Ledoit, Olivier, and Michael Wolf. 2004. “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices.” Journal of Multivariate Analysis 88 (2): 365–411. https://doi.org/10.1016/S0047-259X(03)00096-4.

Ling, Robert F. 1974. “Comparison of Several Algorithms for Computing Sample Means and Variances.” Journal of the American Statistical Association 69 (348): 859–66. https://doi.org/10.1080/01621459.1974.10480219.

Loh, Wei-Liem. 1991. “Estimating Covariance Matrices II.” Journal of Multivariate Analysis 36 (2): 163–74. https://doi.org/10.1016/0047-259X(91)90055-7.

MacKay, David J C. 2002. “Gaussian Processes.” In Information Theory, Inference & Learning Algorithms, Chapter 45. Cambridge University Press. http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/534.548.pdf.

Mardia, K. V., and R. J. Marshall. 1984. “Maximum Likelihood Estimation of Models for Residual Covariance in Spatial Regression.” Biometrika 71 (1): 135–46. https://doi.org/10.1093/biomet/71.1.135.

Meinshausen, Nicolai, and Peter Bühlmann. 2006. “High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics 34 (3): 1436–62. https://doi.org/10.1214/009053606000000281.

Minasny, Budiman, and Alex. B. McBratney. 2005. “The Matérn Function as a General Model for Soil Variograms.” Geoderma, Pedometrics 2003, 128 (3–4): 192–207. https://doi.org/10.1016/j.geoderma.2005.04.003.

Nowak, W., and A. Litvinenko. 2013. “Kriging and Spatial Design Accelerated by Orders of Magnitude: Combining Low-Rank Covariance Approximations with FFT-Techniques.” Mathematical Geosciences 45 (4): 411–35. https://doi.org/10.1007/s11004-013-9453-6.

Pébay, Philippe. 2008. “Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments.” Sandia Report SAND2008-6212, Sandia National Laboratories. http://prod.sandia.gov/techlib/access-control.cgi/2008/086212.pdf.

Pourahmadi, Mohsen. 2011. “Covariance Estimation: The GLM and Regularization Perspectives.” Statistical Science 26 (3): 369–87. https://doi.org/10.1214/11-STS358.

Powell, Catherine E. 2014. “Generating Realisations of Stationary Gaussian Random Fields by Circulant Embedding.” Matrix 2 (2): 1.

Ramdas, Aaditya, and Leila Wehbe. 2014. “Stein Shrinkage for Cross-Covariance Operators and Kernel Independence Testing,” June. http://arxiv.org/abs/1406.1922.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press. http://www.gaussianprocess.org/gpml/.

Ravikumar, Pradeep, Martin J. Wainwright, Garvesh Raskutti, and Bin Yu. 2011. “High-Dimensional Covariance Estimation by Minimizing ℓ1-Penalized Log-Determinant Divergence.” Electronic Journal of Statistics 5: 935–80. https://doi.org/10.1214/11-EJS631.

Rosenblatt, M. 1984. “Asymptotic Normality, Strong Mixing and Spectral Density Estimates.” The Annals of Probability 12 (4): 1167–80. https://doi.org/10.1214/aop/1176993146.

Sampson, P D, and P Guttorp. 1992. “Nonparametric Estimation of Nonstationary Spatial Covariance Structure.” Journal of the American Statistical Association 87 (417): 108–19.

Schäfer, Juliane, and Korbinian Strimmer. 2005. “A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics.” Statistical Applications in Genetics and Molecular Biology 4: Article32. https://doi.org/10.2202/1544-6115.1175.

Shao, Xiaofeng, and Wei Biao Wu. 2007. “Asymptotic Spectral Theory for Nonlinear Time Series.” The Annals of Statistics 35 (4): 1773–1801. https://doi.org/10.1214/009053606000001479.

Stein, Michael L. 2005. “Space-Time Covariance Functions.” Journal of the American Statistical Association 100 (469): 310–21. https://doi.org/10.1198/016214504000000854.

Sun, Ying, and Michael L. Stein. 2016. “Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets.” Journal of Computational and Graphical Statistics 25 (1): 187–208. https://doi.org/10.1080/10618600.2014.975230.

Takemura, Akimichi. 1984. “An Orthogonally Invariant Minimax Estimator of the Covariance Matrix of a Multivariate Normal Population.” Tsukuba Journal of Mathematics 8 (2): 367–76.

Whittle, P. 1952. “Tests of Fit in Time Series.” Biometrika 39 (3-4): 309–18. https://doi.org/10.1093/biomet/39.3-4.309.

———. 1953a. “The Analysis of Multiple Stationary Time Series.” Journal of the Royal Statistical Society. Series B (Methodological) 15 (1): 125–39.

———. 1953b. “Estimation and Information in Stationary Time Series.” Arkiv För Matematik 2 (5): 423–34. https://doi.org/10.1007/BF02590998.

Whittle, Peter. 1952. “Some Results in Time Series Analysis.” Scandinavian Actuarial Journal 1952 (1-2): 48–60. https://doi.org/10.1080/03461238.1952.10414182.

Yuan, Ming, and Yi Lin. 2007. “Model Selection and Estimation in the Gaussian Graphical Model.” Biometrika 94 (1): 19–35. https://doi.org/10.1093/biomet/asm018.

Zeileis, Achim. 2004. “Econometric Computing with HC and HAC Covariance Matrix Estimators.” Journal of Statistical Software 11 (10). https://doi.org/10.18637/jss.v011.i10.

———. 2006a. “Implementing a Class of Structural Change Tests: An Econometric Computing Approach.” Computational Statistics & Data Analysis 50 (11): 2987–3008. https://doi.org/10.1016/j.csda.2005.07.001.

———. 2006b. “Object-Oriented Computation of Sandwich Estimators.” Journal of Statistical Software 16 (9): 1–16. https://doi.org/10.18637/jss.v016.i09.

Zhang, T., and H. Zou. 2014. “Sparse Precision Matrix Estimation via Lasso Penalized D-Trace Loss.” Biometrika 101 (1): 103–20. https://doi.org/10.1093/biomet/ast059.