# Feedback system identification, linear

In system identification, we infer the parameters of a stochastic dynamical system, usually one with feedback, so that we can e.g. simulate it, or deconvolve it to find the inputs and hidden states, perhaps using state filters. In statistical terms, this is the parameter inference problem for dynamical systems.

Moreover, it totally works without Gaussian noise; that assumption is just convenient in optimal linear filtering (Kalman filtering isn’t rocket science, after all). Mathematically, Gaussianity is also a useful crutch if you decide to go to a continuous time index, cf Gaussian processes.

This is the offline version. For online, recursive estimates, see recursive estimation, which is handled separately.

## Intros

Oppenheim and Verghese, Signals, Systems, and Inference is free online.

Martin (Mart99a):

Consider the basic autoregressive model,

\begin{equation*} Y(k) + \sum_{j=1}^pa_jY(k-j)=\epsilon(k). \end{equation*}

Estimating AR(p) coefficients:

The [power] spectrum is easily obtained from [the above] as

\begin{align*} P(f) &= \frac{\sigma^2}{\left|1+ \sum_{j=1}^pa_jz^{-j}\right|^2},\\ z &= \exp(2\pi i f\,\delta t) \end{align*}

with $\delta t$ the intersample spacing.[…] for any given set of data, we need to be able to estimate the AR coefficients $\{a_j\}_{j=1}^p$ conveniently. Three methods for achieving this are the Yule-Walker, Burg and Covariance methods. The Yule-Walker technique uses the sample autocovariance to obtain the coefficients; the Covariance method defines, for a set of numbers $\mathbf{a}=\{a_j\}_{j=1}^p,$ a quantity known as the total forward and backward prediction error power:

\begin{equation*} E(Y,\mathbf{a}) = \frac{1}{2(N-p)}\sum_{n=p+1}^N\left\{ \left|Y(n)+\sum_{j=1}^pa_jY(n-j)\right|^2 + \left|Y(n-p)+\sum_{j=1}^pa^*_jY(n-p+j)\right|^2 \right\} \end{equation*}

and minimises this w.r.t. $\mathbf{a}$. As $E(Y, \mathbf{a})$ is a quadratic function of $\mathbf{a}$, $\partial E(Y, \mathbf{a})/\partial \mathbf{a}$ is linear in $\mathbf{a}$ and so this is a linear least-squares problem. The Burg method is a constrained minimisation of $E(Y, \mathbf{a})$ using the Levinson recursion, a computational device derived from the Yule-Walker method.
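The Yule-Walker method mentioned above is short enough to sketch from scratch: form sample autocovariances, solve the resulting Toeplitz system for the coefficients. A minimal numpy sketch, with illustrative (hypothetical) coefficients and the sign convention of the equations above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stationary AR(2): Y(k) + a1*Y(k-1) + a2*Y(k-2) = eps(k).
# a_true is an arbitrary illustrative choice, not from any dataset.
a_true = np.array([-0.75, 0.5])
N, p = 5000, 2
y = np.zeros(N)
eps = rng.standard_normal(N)
for k in range(p, N):
    y[k] = -a_true[0] * y[k - 1] - a_true[1] * y[k - 2] + eps[k]

# Yule-Walker: sample autocovariances r(0..p), then solve R a = -r[1:].
r = np.array([np.dot(y[: N - j], y[j:]) / N for j in range(p + 1)])
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
a_hat = np.linalg.solve(R, -r[1:])   # coefficient estimates
sigma2_hat = r[0] + a_hat @ r[1:]    # innovation variance estimate
```

In practice you would exploit the Toeplitz structure of $R$ (e.g. via the Levinson recursion, which is what the Burg method builds on) rather than a dense solve, but for small $p$ it makes no difference.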

## Misc

Gradient descent learns linear dynamical systems (HaMR16).

### Linear Predictive Coding

LPC introductions traditionally start with a physical model of the human vocal tract as a resonating pipe, then mumble away the details. This confused the hell out of me. AFAICT, an LPC model is just a list of AR regression coefficients and a driving noise source coefficient. This is “coding” because you can round the numbers, pack them down a smidgen and then use it to encode certain time series, such as the human voice, compactly. But it’s still a regression analysis, and can be treated as such.
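To make the “it’s just regression” point concrete, here is LPC as ordinary least squares on lagged samples (the covariance-method framing), applied to a synthetic stand-in for a speech frame; the signal and order here are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "frame": two sinusoids plus noise, standing in for voiced speech.
n, p = 1024, 8
t = np.arange(n)
signal = np.sin(0.3 * t) + 0.5 * np.sin(0.11 * t) + 0.1 * rng.standard_normal(n)

# Regress each sample on its p predecessors: plain least squares.
X = np.column_stack([signal[p - j - 1 : n - j - 1] for j in range(p)])
y = signal[p:]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

# The residual is what the driving-noise term of the model has to explain.
residual = y - X @ coeffs
```

The “coding” step is then just quantising `coeffs` and the (much smaller) `residual` instead of the raw samples.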

The twists are that

1. we usually think about it in a compression context
2. traditionally one performs many regressions to get time-varying models.

It’s commonly described as a physical model because we can imagine the regression coefficients corresponding to a simplified physical model of the human vocal tract. But the regression coefficients correspond equally well to any all-pole linear system, so I don’t think the physical story brings special insight; especially as a model of, say, a resonating pipe would intuitively be described by time-delays corresponding to the length of the pipe, not by time-lags at the sampling interval chosen for computational convenience. Sure, linear systems theory tells us this model can match the spectral response of a pipe, but if you are going to assume that much linear systems theory anyway, and mix it with crappy physics, why not just start with the linear systems and ditch the physics?

To discuss: these coefficients as spectrogram smoothing.

## Refs

Akai73
Akaike, H. (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika, 60(2), 255–265. DOI.
AnWa16
Antoniano-Villalobos, I., & Walker, S. G.(2016) A Nonparametric Model for Stationary Time Series. Journal of Time Series Analysis, 37(1), 126–142. DOI.
Bart46
Bartlett, M. S.(1946) On the Theoretical Specification and Sampling Properties of Autocorrelated Time-Series. Supplement to the Journal of the Royal Statistical Society, 8(1), 27–41. DOI.
BeZa76
Berkhout, A. J., & Zaanen, P. R.(1976) A Comparison Between Wiener Filtering, Kalman Filtering, and Deterministic Least Squares Estimation. Geophysical Prospecting, 24(1), 141–197. DOI.
BJRL16
Box, George E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M.(2016) Time series analysis: forecasting and control. (Fifth edition.). Hoboken, New Jersey: John Wiley & Sons, Inc
BoPi70
Box, G. E. P., & Pierce, D. A.(1970) Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models. Journal of the American Statistical Association, 65(332), 1509–1526. DOI.
Broe06
Broersen, P. M.(2006) Automatic autocorrelation and spectral analysis. Secaucus, NJ, USA: Springer Science & Business Media
BrBo06
Broersen, P. M. T., & Bos, R. (2006) Estimating time-series models from irregularly spaced data. IEEE Transactions on Instrumentation and Measurement, 55(4), 1124–1131. DOI.
BrWB04
Broersen, Piet M. T., de Waele, S., & Bos, R. (2004) Autoregressive spectral analysis when observations are missing. Automatica, 40(9), 1495–1504. DOI.
BüKü99
Bühlmann, P., & Künsch, H. R.(1999) Block length selection in the bootstrap for time series. Computational Statistics & Data Analysis, 31(3), 295–310. DOI.
Carm13
Carmi, A. Y.(2013) Compressive system identification: Sequential methods and entropy bounds. Digital Signal Processing, 23(3), 751–770. DOI.
Carm14
Carmi, A. Y.(2014) Compressive System Identification. In A. Y. Carmi, L. Mihaylova, & S. J. Godsill (Eds.), Compressed Sensing & Sparse Filtering (pp. 281–324). Springer Berlin Heidelberg DOI.
ChHo12
Chen, B., & Hong, Y. (2012) Testing for the Markov Property in Time Series. Econometric Theory, 28(01), 130–178. DOI.
ChKF16
Christ, M., Kempa-Liehr, A. W., & Feindt, M. (2016) Distributed and parallel time series feature extraction for industrial big data applications. arXiv:1610.07717 [Cs].
MaFe07
de Matos, J. A., & Fernandes, M. (2007) Testing the Markov property with high frequency data. Journal of Econometrics, 141(1), 44–64. DOI.
DuKo97
Durbin, J., & Koopman, S. J.(1997) Monte Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika, 84(3), 669–684. DOI.
DuKo12
Durbin, J., & Koopman, S. J.(2012) Time series analysis by state space methods. (2nd ed.). Oxford: Oxford University Press
GeMe81
Geweke, J., & Meese, R. (1981) Estimating regression models of finite but unknown order. Journal of Econometrics, 16(1), 162. DOI.
HaMR16
Hardt, M., Ma, T., & Recht, B. (2016) Gradient Descent Learns Linear Dynamical Systems. arXiv:1609.05191 [Cs, Math, Stat].
HaKo05
Harvey, A., & Koopman, S. J.(2005) Structural Time Series Models. In Encyclopedia of Biostatistics. John Wiley & Sons, Ltd
HeDG15
Hefny, A., Downey, C., & Gordon, G. (2015) A New View of Predictive State Methods for Dynamical System Learning. arXiv:1505.05310 [Cs, Stat].
HeGo15
Hencic, A., & Gouriéroux, C. (2015) Noncausal Autoregressive Model in Application to Bitcoin/USD Exchange Rates. In V.-N. Huynh, V. Kreinovich, S. Sriboonchitta, & K. Suriya (Eds.), Econometrics of Risk (pp. 17–40). Springer International Publishing DOI.
HoLD10
Holan, S. H., Lund, R., & Davis, G. (2010) The ARMA alphabet soup: A tour of ARMA model variants. Statistics Surveys, 4, 232–274. DOI.
Jone81
Jones, R. H.(1981) Fitting a continuous time autoregression to discrete data. In Applied time series analysis II (pp. 651–682).
Jone84
Jones, R. H.(1984) Fitting multivariate models to unequally spaced data. In Time series analysis of irregularly observed data (pp. 158–188). Springer
KaSH00
Kailath, T., Sayed, A. H., & Hassibi, B. (2000) Linear estimation. Upper Saddle River, N.J: Prentice Hall
KMBT11
Kalouptsidis, N., Mileounis, G., Babadi, B., & Tarokh, V. (2011) Adaptive algorithms for sparse system identification. Signal Processing, 91(8), 1910–1919. DOI.
Kay93
Kay, S. M.(1993) Fundamentals of statistical signal processing, volume I: estimation theory.
Küns86
Künsch, H. R.(1986) Discrimination between monotonic trends and long-range dependence. Journal of Applied Probability, 23(4), 1025–1030.
LaFR04
Lahalle, E., Fleury, G., & Rivoira, A. (2004) Continuous ARMA spectral estimation from irregularly sampled observations. In Proceedings of the 21st IEEE Instrumentation and Measurement Technology Conference, 2004. IMTC 04 (Vol. 2, p. 923–927 Vol.2). DOI.
LaSö02
Larsson, E. K., & Söderström, T. (2002) Identification of continuous-time AR processes from unevenly sampled data. Automatica, 38(4), 709–718. DOI.
LiMa92
Lii, K.-S., & Masry, E. (1992) Model fitting for continuous-time stationary processes from discrete-time data. Journal of Multivariate Analysis, 41(1), 56–79. DOI.
Ljun99
Ljung, L. (1999) System identification: theory for the user. (2nd ed.). Upper Saddle River, NJ: Prentice Hall PTR
Makh75
Makhoul, J. (1975) Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4), 561–580. DOI.
MaKP98
Manton, J. H., Krishnamurthy, V., & Poor, H. V.(1998) James-Stein state filtering algorithms. IEEE Transactions on Signal Processing, 46(9), 2431–2447. DOI.
Mart98
Martin, R. J.(1998) Autoregression and irregular sampling: Filtering. Signal Processing, 69(3), 229–248. DOI.
Mart99a
Martin, R. J.(1999a) Autoregression and irregular sampling: Spectral estimation. Signal Processing, 77(2), 139–157. DOI.
Mart99b
Martin, Richard James. (1999b, April 2) Irregularly Sampled Signals: Theories and Techniques for Analysis.
McSS11a
McDonald, D. J., Shalizi, C. R., & Schervish, M. (2011a) Generalization error bounds for stationary autoregressive models. arXiv:1103.0942 [Cs, Stat].
McSS11b
McDonald, D. J., Shalizi, C. R., & Schervish, M. (2011b) Risk bounds for time series without strong mixing. arXiv:1106.0730 [Cs, Stat].
Mcle98
McLeod, A. I.(1998) Hyperbolic decay time series. Journal of Time Series Analysis, 19(4), 473–483. DOI.
McZh08
McLeod, A. I., & Zhang, Y. (2008) Faster ARMA maximum likelihood estimation. Computational Statistics & Data Analysis, 52(4), 2166–2176. DOI.
MiVi93
Milanese, M., & Vicino, A. (1993) Information-Based Complexity and Nonparametric Worst-Case System Identification. Journal of Complexity, 9(4), 427–446. DOI.
PSCP16
Pereyra, M., Schniter, P., Chouzenoux, É., Pesquet, J. C., Tourneret, J. Y., Hero, A. O., & McLaughlin, S. (2016) A Survey of Stochastic Simulation and Optimization Methods in Signal Processing. IEEE Journal of Selected Topics in Signal Processing, 10(2), 224–241. DOI.
PlDY15
Plis, S., Danks, D., & Yang, J. (2015) Mesochronal Structure Learning. Uncertainty in Artificial Intelligence : Proceedings of the … Conference. Conference on Uncertainty in Artificial Intelligence, 31.
RuSW05
Rudary, M., Singh, S., & Wingate, D. (2005) Predictive Linear-Gaussian Models of Stochastic Dynamical Systems. In arXiv:1207.1416 [cs].
Scar81
Scargle, J. D.(1981) Studies in astronomical time series analysis I. Modeling random processes in the time domain. The Astrophysical Journal Supplement Series, 45, 1–71.
SöMo00
Söderström, T., & Mossberg, M. (2000) Performance evaluation of methods for identifying continuous-time autoregressive processes. Automatica, 1(36), 53–59. DOI.
StMo05
Stoica, P., & Moses, R. L.(2005) Spectral Analysis of Signals. (1st edition.). Upper Saddle River, N.J: Prentice Hall
TaKa00
Taniguchi, M., & Kakizawa, Y. (2000) Asymptotic theory of statistical inference for time series. New York: Springer
TuKu82
Tufts, D. W., & Kumaresan, R. (1982) Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood. Proceedings of the IEEE, 70(9), 975–989. DOI.
UnTa14
Unser, M. A., & Tafti, P. (2014) An introduction to sparse stochastic processes. New York: Cambridge University Press
Geer02
van de Geer, S. (2002) On Hoeffding’s inequality for dependent random variables. In Empirical Process Techniques for Dependent Data. Birkhäuser
ZhMc06
Zhang, Y., & McLeod, A. I.(2006) Computer Algebra Derivation of the Bias of Burg Estimators. Journal of Time Series Analysis, 27(2), 157–165. DOI.