# The Living Thing / Notebooks : Covariance estimation

Estimating the thing that is always given to you by oracles in homework assignments. The meat of Gaussian process regression.

Estimating the covariance, precision, or concentration matrices of things. Turns out to be a lot more involved than estimating means, in various ways and at various times. Long story.
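A quick illustration of why it is involved: in high dimensions the naive sample covariance is already singular, and the crudest fix is to shrink it towards a scaled identity. This is a minimal numpy sketch with a hand-picked shrinkage intensity, not the data-driven Ledoit-Wolf choice (LeWo04):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 100                     # fewer samples than dimensions
X = rng.standard_normal((n, p))

S = np.cov(X, rowvar=False)        # sample covariance: singular when n <= p
mu = np.trace(S) / p               # average eigenvalue, the shrinkage target scale
alpha = 0.1                        # shrinkage intensity, picked by hand here
S_shrunk = (1 - alpha) * S + alpha * mu * np.eye(p)
# S has rank at most n - 1; S_shrunk is strictly positive definite
```

Any convex combination with a positive multiple of the identity restores invertibility; the whole art is in choosing `alpha`.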

Now, why did I want to know this again? I think it may have been something about minimalist GRF inference for the Synestizer project. Right.

Connections also to random matrix theory (Ben Arous et al.) and to $$\mathcal{H}$$-matrix methods.

## Parametric covariance models

I don’t know anything about this, but for spatial statistics I am told I should look up the Matérn covariance function as a parametric model for the covariance field.
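To see what that family looks like, here is a hand-rolled Matérn covariance in the standard parameterisation (smoothness $$\nu$$, length scale $$\ell$$); an illustrative sketch, not anyone's canonical implementation:

```python
import numpy as np
from scipy.special import gamma, kv  # kv is the modified Bessel function K_nu

def matern(d, length=1.0, nu=1.5, sigma2=1.0):
    """Matérn covariance as a function of distance d."""
    d = np.asarray(d, dtype=float)
    scaled = np.sqrt(2 * nu) * d / length
    # dodge the 0/0 at d == 0; the limit there is sigma2
    safe = np.where(scaled == 0.0, 1e-12, scaled)
    c = sigma2 * (2 ** (1 - nu) / gamma(nu)) * safe ** nu * kv(nu, safe)
    return np.where(d == 0.0, sigma2, c)

# covariance matrix of points on a line
x = np.linspace(0, 5, 6)
d = np.abs(x[:, None] - x[None, :])
K = matern(d, length=2.0, nu=0.5)
# nu = 1/2 recovers the exponential covariance exp(-d / length)
```

The smoothness parameter $$\nu$$ interpolates between the rough exponential covariance ($$\nu = 1/2$$) and the squared-exponential limit ($$\nu \to \infty$$).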

## Non-stationary covariance models

Particular reference to dynamically updating covariance estimates for a possibly-evolving system. This is not quite the Kalman filter problem, since that presumes the (co)variance of our estimates (which is to say, the precision) gets updated, but that the (co)variance of the underlying process is stationary. I just learned, thanks to the retirement lecture of Hans-Ruedi Künsch, that one solution to this problem might in fact be the Ensemble Kalman Filter.
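A minimal sketch of the ensemble idea, assuming the textbook perturbed-observation analysis step; the state dimension, ensemble size, and noise levels below are all made up:

```python
import numpy as np

rng = np.random.default_rng(1)

def enkf_analysis(X, y, H, R):
    """One EnKF analysis step with perturbed observations.

    X : (p, N) ensemble of state vectors
    y : (m,) observation
    H : (m, p) observation operator
    R : (m, m) observation-noise covariance
    """
    p, N = X.shape
    C = np.cov(X)                                   # ensemble covariance estimate
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)    # Kalman gain
    # one perturbed copy of the observation per ensemble member
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X + K @ (Y - H @ X)

# toy example: 3-d state, observe only the first coordinate
X = rng.standard_normal((3, 200)) * 2.0
H = np.array([[1.0, 0.0, 0.0]])
R = np.array([[0.1]])
Xa = enkf_analysis(X, np.array([0.5]), H, R)
# the analysed ensemble has reduced spread in the observed direction
```

The point for this notebook: the process covariance is never parameterised at all, it is re-estimated at each step from the ensemble, which is what lets it track a non-stationary system.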

## An inverse problem

• given a time-series autocorrelation function, devise an algorithm to produce a sequence with this structure:

  • matching the autocorrelation deterministically
  • matching it probabilistically, in expectation

Surely the ARIMA jockeys do this, yes? Granger et al.? But that presumes short memory. Should I check the fractal/multifractal literature here?
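One standard answer for the Gaussian case, sketched here: build the Toeplitz covariance matrix implied by the target autocovariance and colour white noise with its Cholesky factor. This matches the target in expectation; the empirical autocovariance of any single realisation is noisy.

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky

rng = np.random.default_rng(2)

# target autocovariance at lags 0..n-1; here an AR(1)-like rho^|k|
n, rho = 500, 0.8
acov = rho ** np.arange(n)

# the covariance matrix of the series is Toeplitz in the lags;
# colouring white noise with its Cholesky factor gives a series
# whose covariance is exactly the target, in expectation
L = cholesky(toeplitz(acov), lower=True)
x = L @ rng.standard_normal(n)

# empirical check at lag 1 (noisy for a single realisation)
xc = x - x.mean()
lag1 = (xc[:-1] * xc[1:]).mean() / xc.var()
```

This needs the target autocovariance to be positive definite, otherwise the Cholesky factorisation fails, which is the honest diagnostic that no stationary Gaussian process has that structure.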

## To read

• Basic inference using the Inverse Wishart: by having a very basic “process model” that increases uncertainty of the covariance estimate as some convenient monotonic function of time, I should be able to get this one.

• general moment combination tricks

• John Cook’s version

• AzKS15 have a neat snark:

> Our work deviates from the majority of work on compressive covariance estimation in that we do not make structural assumptions on the estimand, in this case the target covariance. A number of papers assume that the target covariance is low rank, sparse, or that the inverse covariance is sparse. The broad theme of this line of work is that when the target covariance has some low-dimensional structure, far fewer total measurements (via random projection) are necessary to achieve the same error as direct observation in the unstructured case. However, when the target covariance does not have low-dimensional structure, these methods can fail dramatically, as we show with our lower bounds.
>
> In contrast, our work instead examines the statistical price one pays for compressing the data vectors when the covariance matrix does not exhibit any low dimensional structure. Instead of using fewer measurements than direct observation, in this setting, compressing the data requires that one use significantly more measurements to achieve the same level of accuracy as direct observation. We precisely quantify this increase in measurement, showing that the effective sample size shifts from $$n$$ to $$nm^2/d^2$$, where the projection dimension is $$m$$ and the ambient dimension is $$d$$. Since we must have $$m \leq d$$, this means that one needs more samples to achieve a specified accuracy under our measurement model, when compared with direct observation. This effective sample size is present in all of our upper and lower bounds, showing that indeed, there is a price to pay for compression without structural assumptions. Note that this quadratic growth in effective sample size also matches recent results on covariance estimation from missing data.
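The inverse-Wishart idea from the reading list above might look like this: a conjugate update for the covariance of zero-mean data, with a crude exponential discount standing in for the “process model” that inflates uncertainty over time. The discount factor is an arbitrary choice, and this is a sketch, not a recommendation:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 2
true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])

# inverse-Wishart prior on the covariance: nu degrees of freedom, scale Psi
nu, Psi = p + 2.0, np.eye(p)
beta = 0.99   # discount factor; a guess standing in for a real process model

for _ in range(100):
    x = rng.multivariate_normal(np.zeros(p), true_cov, size=5)
    # conjugate update: accumulate the batch scatter and batch size
    nu += len(x)
    Psi += x.T @ x
    # "process model": discount accumulated evidence back towards the
    # prior, so the estimate could track a drifting covariance
    nu = beta * nu + (1 - beta) * (p + 2.0)
    Psi = beta * Psi + (1 - beta) * np.eye(p)

# posterior mean of an inverse-Wishart, valid for nu > p + 1
post_mean = Psi / (nu - p - 1)
```

With geometric discounting the effective sample size saturates rather than growing without bound, which is exactly the “uncertainty increases with time” behaviour the bullet point asks for.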

## Refs

Abra97
Abrahamsen, P. (1997) A review of Gaussian random fields and correlation functions.
AzKS15
Azizyan, M., Krishnamurthy, A., & Singh, A. (2015) Extreme Compressive Sampling for Covariance Estimation. arXiv:1506.00898 [Cs, Math, Stat].
BaAP05
Baik, J., Arous, G. B., & Péché, S. (2005) Phase Transition of the Largest Eigenvalue for Nonnull Complex Sample Covariance Matrices. The Annals of Probability, 33(5), 1643–1697.
BaGA08
Banerjee, O., Ghaoui, L. E., & d’Aspremont, A. (2008) Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. Journal of Machine Learning Research, 9(Mar), 485–516.
BaMM00
Barnard, J., McCulloch, R., & Meng, X.-L. (2000) Modeling Covariance Matrices in Terms of Standard Deviations and Correlations, with Application to Shrinkage. Statistica Sinica, 10(4), 1281–1311.
BePé05
Ben Arous, G., & Péché, S. (2005) Universality of local eigenvalue statistics for some sample covariance matrices. Communications on Pure and Applied Mathematics, 58(10), 1316–1357. DOI.
CaZZ10
Cai, T. T., Zhang, C.-H., & Zhou, H. H. (2010) Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics, 38(4), 2118–2144. DOI.
DaPo09
Daniels, M. J., & Pourahmadi, M. (2009) Modeling covariance matrices via partial autocorrelations. Journal of Multivariate Analysis, 100(10), 2352–2363. DOI.
Efro10
Efron, B. (2010) Correlated z-values and the accuracy of large-scale statistical estimates. Journal of the American Statistical Association, 105(491), 1042–1055. DOI.
FrHT08
Friedman, J., Hastie, T., & Tibshirani, R. (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441. DOI.
Fuen06
Fuentes, M. (2006) Testing for separability of spatial–temporal covariance functions. Journal of Statistical Planning and Inference, 136(2), 447–466. DOI.
Hack15
Hackbusch, W. (2015) Hierarchical Matrices: Algorithms and Analysis. (1st ed.). Heidelberg: Springer.
Hans07
Hansen, C. B. (2007) Generalized least squares inference in panel and multilevel models with serial correlation and fixed effects. Journal of Econometrics, 140(2), 670–694. DOI.
HePo14
Heinrich, C., & Podolskij, M. (2014) On spectral distribution of high dimensional covariation matrices. arXiv:1410.6764 [Math].
HSDR14
Hsieh, C.-J., Sustik, M. A., Dhillon, I. S., & Ravikumar, P. D.(2014) QUIC: quadratic approximation for sparse inverse covariance estimation. Journal of Machine Learning Research, 15(1), 2911–2947.
HLPL06
Huang, J. Z., Liu, N., Pourahmadi, M., & Liu, L. (2006) Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1), 85–98. DOI.
JaSt61
James, W., & Stein, C. (1961) Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 361–379).
JaGe15
Janková, J., & van de Geer, S. (2015) Honest confidence regions and optimality in high-dimensional precision matrix estimation. arXiv:1507.02061 [Math, Stat].
KhLM09
Khoromskij, B. N., Litvinenko, A., & Matthies, H. G. (2009) Application of hierarchical matrices for computing the Karhunen–Loève expansion. Computing, 84(1–2), 49–67. DOI.
Khos12
Khoshgnauz, E. (2012) Learning Markov Network Structure using Brownian Distance Covariance. arXiv:1206.6361 [Cs, Stat].
KrSh09
Krumin, M., & Shoham, S. (2009) Generation of Spike Trains with Controlled Auto- and Cross-Correlation Functions. Neural Computation, 21(6), 1642–1664. DOI.
LaFa09
Lam, C., & Fan, J. (2009) Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation. Annals of Statistics, 37(6B), 4254–4278. DOI.
LeWo04
Ledoit, O., & Wolf, M. (2004) A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411. DOI.
Loh91
Loh, W.-L. (1991) Estimating covariance matrices II. Journal of Multivariate Analysis, 36(2), 163–174. DOI.
MaMa84
Mardia, K. V., & Marshall, R. J. (1984) Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71(1), 135–146. DOI.
MeBü06
Meinshausen, N., & Bühlmann, P. (2006) High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462. DOI.
MiMc05
Minasny, B., & McBratney, A. B.(2005) The Matérn function as a general model for soil variograms. Geoderma, 128(3–4), 192–207. DOI.
NoLi13
Nowak, W., & Litvinenko, A. (2013) Kriging and Spatial Design Accelerated by Orders of Magnitude: Combining Low-Rank Covariance Approximations with FFT-Techniques. Mathematical Geosciences, 45(4), 411–435. DOI.
Péba08
Pébay, P. (2008) Formulas for robust, one-pass parallel computation of covariances and arbitrary-order statistical moments. Sandia Report SAND2008-6212, Sandia National Laboratories.
RaWe14
Ramdas, A., & Wehbe, L. (2014) Stein Shrinkage for Cross-Covariance Operators and Kernel Independence Testing. arXiv:1406.1922 [Stat].
RaWi06
Rasmussen, C. E., & Williams, C. K. I. (2006) Gaussian processes for machine learning. Cambridge, Mass.: MIT Press.
RWRY11
Ravikumar, P., Wainwright, M. J., Raskutti, G., & Yu, B. (2011) High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980. DOI.
Rose84
Rosenblatt, M. (1984) Asymptotic Normality, Strong Mixing and Spectral Density Estimates. The Annals of Probability, 12(4), 1167–1180. DOI.
SaGu92
Sampson, P. D., & Guttorp, P. (1992) Nonparametric estimation of nonstationary spatial covariance structure. Journal of the American Statistical Association, 87(417), 108–119.
ScSt05
Schäfer, J., & Strimmer, K. (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4, Article32. DOI.
ShWu07
Shao, X., & Wu, W. B. (2007) Asymptotic spectral theory for nonlinear time series. The Annals of Statistics, 35(4), 1773–1801. DOI.
Stei05
Stein, M. L. (2005) Space-time covariance functions. Journal of the American Statistical Association, 100(469), 310–321. DOI.
SuSt16
Sun, Y., & Stein, M. L. (2016) Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets. Journal of Computational and Graphical Statistics, 25(1), 187–208. DOI.
Take84
Takemura, A. (1984) An Orthogonally Invariant Minimax Estimator of the Covariance Matrix of a Multivariate Normal Population. Tsukuba Journal of Mathematics, 8(2), 367–376.
YuLi07
Yuan, M., & Lin, Y. (2007) Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35. DOI.
ZhZo14
Zhang, T., & Zou, H. (2014) Sparse precision matrix estimation via lasso penalized D-trace loss. Biometrika, 101(1), 103–120. DOI.