In *system identification*, we estimate the parameters of a
stochastic dynamical system of a given type,
usually one with feedback,
so that we can e.g. simulate it,
or deconvolve it to find the inputs and hidden state, maybe using a
state filter.
In statistical terms, this is the *parameter inference* problem for time series data.

Moreover, it totally works without Gaussian noise;
Gaussianity is just *convenient* in optimal linear filtering.
(Kalman filtering isn’t rocket science, after all.)
Mathematically, Gaussian noise is also a useful crutch if you decide to go to a continuous time index,
cf. Gaussian processes.

## Incoherent note chaos

### Nonuniformly sampled data

### Chaos

Parametric and “nonparametric” models.

Infinite divisibility in noise model.

Spectral (Wiener) and time-domain filters.

Welch’s method. Durbin-Levinson.

Experiment design.

Identifiability.

Sometimes you can do standard DSP system identification, esp. for linear systems with i.i.d. noise. Some parameters can sometimes be left as unobserved state variables and estimated dynamically.
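A minimal sketch of that last point, assuming an AR(1) model whose coefficient is treated as a slowly drifting hidden state and tracked with a scalar Kalman filter. The function name `track_ar1_coefficient`, the random-walk model for the coefficient and the noise variances `q` and `r` are all assumptions chosen for illustration.

```python
import numpy as np

def track_ar1_coefficient(y, q=1e-6, r=1.0, theta0=0.0, p0=1.0):
    """Scalar Kalman filter that treats the AR(1) coefficient as a state.

    Assumed model (hypothetical, for this sketch only):
        theta[t] = theta[t-1] + w[t],          w ~ N(0, q)
        y[t]     = theta[t] * y[t-1] + v[t],   v ~ N(0, r)
    """
    theta, p = theta0, p0
    estimates = np.empty(len(y))
    estimates[0] = theta
    for t in range(1, len(y)):
        p = p + q                    # predict: the coefficient follows a random walk
        h = y[t - 1]                 # the lagged observation plays the role of H
        s = h * p * h + r            # innovation variance
        k = p * h / s                # Kalman gain
        theta = theta + k * (y[t] - h * theta)   # update with the innovation
        p = (1.0 - k * h) * p
        estimates[t] = theta
    return estimates

# Simulate an AR(1) series with a fixed coefficient and track it online.
rng = np.random.default_rng(0)
true_theta = 0.8
y = np.zeros(5000)
for t in range(1, len(y)):
    y[t] = true_theta * y[t - 1] + rng.standard_normal()

theta_hat = track_ar1_coefficient(y)
print(theta_hat[-1])
```

Setting `q=0` reduces this to recursive least squares; a larger `q` lets the estimate follow a drifting coefficient at the cost of more variance.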

### Linear Predictive Coding

LPC introductions traditionally start with a physical model of the human vocal tract as a resonating pipe, then mumble away the details. This confused the hell out of me. AFAICT, an LPC model is just a list of AR regression coefficients and a driving noise source coefficient. This is “coding” because you can round the numbers, pack them down a smidgen and then use it to encode certain time series, such as the human voice, compactly. But it’s still a regression analysis, and can be treated as such.

The twists are that

- we usually think about it in a compression context
- traditionally one performs many regressions to get time-varying models
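To make the "it’s just regression" reading concrete, here is a minimal sketch that fits the AR coefficients by ordinary least squares and then repeats the fit over frames. The function names `lpc_least_squares` and `lpc_frames`, the non-overlapping frame layout and the rounding step are assumptions for illustration; production LPC codecs typically use autocorrelation methods and windowing instead.

```python
import numpy as np

def lpc_least_squares(x, order):
    """Fit x[t] ~ sum_k a[k] * x[t-1-k] by ordinary least squares.

    Returns the AR coefficients and the residual (driving noise) std.
    """
    x = np.asarray(x, dtype=float)
    # Column k of the design matrix is the signal lagged by k+1 samples.
    X = np.column_stack(
        [x[order - 1 - k : len(x) - 1 - k] for k in range(order)]
    )
    target = x[order:]
    a, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ a
    return a, resid.std()

def lpc_frames(x, order, frame_len):
    """One regression per non-overlapping frame, giving a time-varying model."""
    return [
        lpc_least_squares(x[i : i + frame_len], order)
        for i in range(0, len(x) - frame_len + 1, frame_len)
    ]

# Simulate a stationary AR(2) process and recover its coefficients.
rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.75])
x = np.zeros(20000)
for t in range(2, len(x)):
    x[t] = true_a[0] * x[t - 1] + true_a[1] * x[t - 2] + rng.standard_normal()

a_hat, sigma_hat = lpc_least_squares(x, order=2)
coded = np.round(a_hat, 2)          # the "coding" step: quantize the numbers
frames = lpc_frames(x, order=2, frame_len=2000)
print(a_hat, sigma_hat, coded, len(frames))
```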

It’s commonly described as a physical model because
we can imagine these regression coefficients corresponding
to a simplified physical model of the human vocal tract.
But we can think of the regression coefficients as corresponding to *any*
all-pole linear system, so I don’t think that brings special insight,
especially as a model of, say, a resonating pipe
would intuitively be described by time-delays corresponding
to the *length* of the pipe,
not by time-lags chosen to match the sample period
plus computational convenience.
Sure, we can get a *similar* spectral response from this model as from a pipe,
according to linear systems theory,
but if you are going to assume so much advanced linear systems theory anyway,
and mix it with crappy physics,
why not just start with the linear systems and ditch the physics?

To discuss: these coefficients as spectrogram smoothing.

### Harmonic regression

A random thing I saw mentioned - I wonder if this is just another smoother for regressions?

Estimating the magnitude of individual cyclic components in a signal. For example:

Rather than count peaks to guess the period or frequency […] fit regressions at many frequencies to find hidden sinusoids. Use the estimated amplitude at these frequencies to locate hidden periodic components. It is particularly easy to estimate the amplitude at a grid of evenly spaced frequencies from 0 to 1/2.
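A minimal sketch of the procedure in that quote, assuming a unit sampling interval so that frequencies are in cycles per sample. The function name `harmonic_regression_amplitudes` and the simulated signal are hypothetical; the grid deliberately excludes the endpoints 0 and 1/2, where the sine regressor vanishes and the fit degenerates.

```python
import numpy as np

def harmonic_regression_amplitudes(y, freqs):
    """Regress y on an intercept, a cosine and a sine at each frequency.

    Returns the estimated amplitude sqrt(b_cos^2 + b_sin^2) per frequency.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    amps = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        X = np.column_stack([
            np.ones(len(y)),
            np.cos(2 * np.pi * f * t),
            np.sin(2 * np.pi * f * t),
        ])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        amps[i] = np.hypot(coef[1], coef[2])
    return amps

# A sinusoid of amplitude 2 buried in unit-variance noise.
rng = np.random.default_rng(0)
n = 512
t = np.arange(n)
true_freq = 40 / n
y = 2.0 * np.cos(2 * np.pi * true_freq * t + 0.3) + rng.standard_normal(n)

freqs = np.arange(1, n // 2) / n    # evenly spaced grid strictly inside (0, 1/2)
amps = harmonic_regression_amplitudes(y, freqs)
peak_freq = freqs[np.argmax(amps)]
print(peak_freq, amps.max())
```

On this evenly spaced grid the regressors are orthogonal, so the amplitudes are, up to scaling, what a periodogram would give; the regression view is mostly useful because it extends to off-grid frequencies and uneven sampling.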

## Marginal versus conditional regression

Pereyra et al. (PSCP16):

Modern signal processing (SP) methods rely very heavily on probability and statistics to solve challenging SP problems. Expectations and demands are constantly rising, and SP methods are now expected to deal with ever more complex models, requiring ever more sophisticated computational inference techniques. This has driven the development of statistical SP methods based on stochastic simulation and optimization. Stochastic simulation and optimization algorithms are computationally intensive tools for performing statistical inference in models that are analytically intractable and beyond the scope of deterministic inference methods. They have been recently successfully applied to many difficult problems involving complex statistical models and sophisticated (often Bayesian) statistical inference techniques. This paper presents a tutorial on stochastic simulation and optimization methods in signal and image processing and points to some interesting research problems. The paper addresses a variety of high-dimensional Markov chain Monte Carlo (MCMC) methods […] It also discusses a range of optimization methods that have been adopted to solve stochastic problems, as well as stochastic methods for deterministic optimization. Subsequently, areas of overlap between simulation and optimization, in particular optimization-within-MCMC and MCMC-driven optimization, are discussed.

## Cepstral and generalised cepstral transforms

See also machine listening.

Just as you can generalise linear models for i.i.d. observations, you can do it with time series. You can also do it for the power-spectral representation of the time series, which includes the cepstral representation of the series as a special case.

I haven’t actually read the foundational literature here, just used some algorithms; but it seems to be mostly a hack for rapid identification of correlation lags where said lags are long.
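As a rough illustration of the "long correlation lags" use, here is a sketch that computes a cepstrum as the inverse FFT of the log power spectrum and reads an echo delay off its peak. The function name `log_power_cepstrum`, the circular echo model and the small constant added before the log are assumptions for this example only.

```python
import numpy as np

def log_power_cepstrum(x):
    """Inverse FFT of the log power spectrum (a power-cepstrum variant)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.fft(x)
    # The small constant guards against log(0) in empty frequency bins.
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)
    return np.fft.ifft(log_power).real

# White noise plus a single attenuated (circular) echo of itself.
rng = np.random.default_rng(1)
n, delay, gain = 4096, 50, 0.5
source = rng.standard_normal(n)
echoed = source + gain * np.roll(source, delay)

c = log_power_cepstrum(echoed)
search = np.arange(10, n // 2)      # skip the low-quefrency region near zero
estimated_delay = search[np.argmax(c[search])]
print(estimated_delay)
```

The echo multiplies the spectrum by a factor that is periodic in frequency, so after the log it becomes an additive ripple, and the second transform turns that ripple into a spike at the delay.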

Proietti, T., & Luati, A. (2013). Generalised Linear Cepstral Models for the Spectrum of a Time Series.

In this chapter we consider a class of parametric spectrum estimators based on a generalized linear model for exponential random variables with power link. The power transformation of the spectrum of a stationary process can be expanded in a Fourier series, with the coefficients representing generalised autocovariances. Direct Whittle estimation of the coefficients is generally unfeasible, as they are subject to constraints (the autocovariances need to be a positive semidefinite sequence). The problem can be overcome by using an ARMA representation for the power transformation of the spectrum. Estimation is carried out by maximising the Whittle likelihood, whereas the selection of a spectral model, as a function of the power transformation parameter and the ARMA orders, can be carried out by information criteria. The proposed methods are applied to the estimation of the inverse autocorrelation function and the related problem of selecting the optimal interpolator, and for the identification of spectral peaks. More generally, they can be applied to spectral estimation with possibly misspecified models.

## Instrumental variable regression

See also causal DAGs for an extended perspective.

(Open loop) Linear systems have particular tricks unavailable in the general case, e.g. 2 stage least squares. TODO: list, quantify dangers if not actually linear.
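For reference, a minimal sketch of two-stage least squares on a static linear toy problem with one endogenous regressor and one instrument. The function name `two_stage_least_squares` and the simulated confounded data are hypothetical; this ignores time-series structure entirely and only shows the mechanics.

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """2SLS slope estimate for y = b0 + b1 * x + error, with instrument z."""
    Z = np.column_stack([np.ones(len(z)), z])
    # Stage 1: project the endogenous regressor onto the instrument.
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma
    # Stage 2: regress the outcome on the projected regressor.
    X_hat = np.column_stack([np.ones(len(x_hat)), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta[1]

# Simulated data: u confounds x and y, while z affects y only through x.
rng = np.random.default_rng(0)
n = 20000
true_beta = 2.0
z = rng.standard_normal(n)
u = rng.standard_normal(n)
x = z + u + rng.standard_normal(n)
y = true_beta * x + u + rng.standard_normal(n)

# Naive OLS slope, biased upward by the confounder.
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]
beta_iv = two_stage_least_squares(y, x, z)
print(beta_ols, beta_iv)
```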

## Identifiability

Two cases here. If the sequence is stationary, can we identify it?

If it is not stationary, we need to appeal to ergodic/mixing properties.

## Convergence/sample complexity

### Asymptotic

### Finite-sample

## Refs

- Akai73: Akaike, H. (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. *Biometrika*, 60(2), 255–265.
- AnWa16: Antoniano-Villalobos, I., & Walker, S. G. (2016) A Nonparametric Model for Stationary Time Series. *Journal of Time Series Analysis*, 37(1), 126–142.
- Bart46: Bartlett, M. S. (1946) On the Theoretical Specification and Sampling Properties of Autocorrelated Time-Series. *Supplement to the Journal of the Royal Statistical Society*, 8(1), 27–41.
- BeZa76: Berkhout, A. J., & Zaanen, P. R. (1976) A Comparison Between Wiener Filtering, Kalman Filtering, and Deterministic Least Squares Estimation. *Geophysical Prospecting*, 24(1), 141–197.
- BJRL16: Box, George E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2016) Time series analysis: forecasting and control. (Fifth edition.). Hoboken, New Jersey: John Wiley & Sons, Inc.
- BoPi70: Box, G. E. P., & Pierce, D. A. (1970) Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models. *Journal of the American Statistical Association*, 65(332), 1509–1526.
- Broe06: Broersen, P. M. (2006) Automatic autocorrelation and spectral analysis. Secaucus, NJ, USA: Springer Science & Business Media.
- BrBo06: Broersen, P. M. T., & Bos, R. (2006) Estimating time-series models from irregularly spaced data. *IEEE Transactions on Instrumentation and Measurement*, 55(4), 1124–1131.
- BrWB04: Broersen, Piet M. T., de Waele, S., & Bos, R. (2004) Autoregressive spectral analysis when observations are missing. *Automatica*, 40(9), 1495–1504.
- BüKü99: Bühlmann, P., & Künsch, H. R. (1999) Block length selection in the bootstrap for time series. *Computational Statistics & Data Analysis*, 31(3), 295–310.
- Carm13: Carmi, A. Y. (2013) Compressive system identification: Sequential methods and entropy bounds. *Digital Signal Processing*, 23(3), 751–770.
- Carm14: Carmi, A. Y. (2014) Compressive System Identification. In A. Y. Carmi, L. Mihaylova, & S. J. Godsill (Eds.), Compressed Sensing & Sparse Filtering (pp. 281–324). Springer Berlin Heidelberg.
- ChHo12: Chen, B., & Hong, Y. (2012) Testing for the Markov Property in Time Series. *Econometric Theory*, 28(01), 130–178.
- ChKF16: Christ, M., Kempa-Liehr, A. W., & Feindt, M. (2016) Distributed and parallel time series feature extraction for industrial big data applications. *arXiv:1610.07717 [Cs]*.
- MaFe07: de Matos, J. A., & Fernandes, M. (2007) Testing the Markov property with high frequency data. *Journal of Econometrics*, 141(1), 44–64.
- DuKo97: Durbin, J., & Koopman, S. J. (1997) Monte Carlo maximum likelihood estimation for non-Gaussian state space models. *Biometrika*, 84(3), 669–684.
- DuKo12: Durbin, J., & Koopman, S. J. (2012) Time series analysis by state space methods. (2nd ed.). Oxford: Oxford University Press.
- GeMe81: Geweke, J., & Meese, R. (1981) Estimating regression models of finite but unknown order. *Journal of Econometrics*, 16(1), 162.
- HaMR16: Hardt, M., Ma, T., & Recht, B. (2016) Gradient Descent Learns Linear Dynamical Systems. *arXiv:1609.05191 [Cs, Math, Stat]*.
- HaKo05: Harvey, A., & Koopman, S. J. (2005) Structural Time Series Models. In Encyclopedia of Biostatistics. John Wiley & Sons, Ltd.
- HeDG15: Hefny, A., Downey, C., & Gordon, G. (2015) A New View of Predictive State Methods for Dynamical System Learning. *arXiv:1505.05310 [Cs, Stat]*.
- HeGo15: Hencic, A., & Gouriéroux, C. (2015) Noncausal Autoregressive Model in Application to Bitcoin/USD Exchange Rates. In V.-N. Huynh, V. Kreinovich, S. Sriboonchitta, & K. Suriya (Eds.), Econometrics of Risk (pp. 17–40). Springer International Publishing.
- HoLD10: Holan, S. H., Lund, R., & Davis, G. (2010) The ARMA alphabet soup: A tour of ARMA model variants. *Statistics Surveys*, 4, 232–274.
- Jone81: Jones, R. H. (1981) Fitting a continuous time autoregression to discrete data. In Applied time series analysis II (pp. 651–682).
- Jone84: Jones, R. H. (1984) Fitting multivariate models to unequally spaced data. In Time series analysis of irregularly observed data (pp. 158–188). Springer.
- KaSH00: Kailath, T., Sayed, A. H., & Hassibi, B. (2000) Linear estimation. Upper Saddle River, N.J: Prentice Hall.
- KMBT11: Kalouptsidis, N., Mileounis, G., Babadi, B., & Tarokh, V. (2011) Adaptive algorithms for sparse system identification. *Signal Processing*, 91(8), 1910–1919.
- Kay93: Kay, S. M. (1993) Fundamentals of statistical signal processing, volume I: estimation theory.
- Küns86: Künsch, H. R. (1986) Discrimination between monotonic trends and long-range dependence. *Journal of Applied Probability*, 23(4), 1025–1030.
- LaFR04: Lahalle, E., Fleury, G., & Rivoira, A. (2004) Continuous ARMA spectral estimation from irregularly sampled observations. In Proceedings of the 21st IEEE Instrumentation and Measurement Technology Conference, 2004. IMTC 04 (Vol. 2, pp. 923–927).
- LaSö02: Larsson, E. K., & Söderström, T. (2002) Identification of continuous-time AR processes from unevenly sampled data. *Automatica*, 38(4), 709–718.
- LiMa92: Lii, K.-S., & Masry, E. (1992) Model fitting for continuous-time stationary processes from discrete-time data. *Journal of Multivariate Analysis*, 41(1), 56–79.
- Ljun99: Ljung, L. (1999) System identification: theory for the user. (2nd ed.). Upper Saddle River, NJ: Prentice Hall PTR.
- Makh75: Makhoul, J. (1975) Linear prediction: A tutorial review. *Proceedings of the IEEE*, 63(4), 561–580.
- MaKP98: Manton, J. H., Krishnamurthy, V., & Poor, H. V. (1998) James-Stein state filtering algorithms. *IEEE Transactions on Signal Processing*, 46(9), 2431–2447.
- Mart98: Martin, R. J. (1998) Autoregression and irregular sampling: Filtering. *Signal Processing*, 69(3), 229–248.
- Mart99a: Martin, R. J. (1999a) Autoregression and irregular sampling: Spectral estimation. *Signal Processing*, 77(2), 139–157.
- Mart99b: Martin, Richard James. (1999b, April 2) Irregularly Sampled Signals: Theories and Techniques for Analysis.
- McSS11a: McDonald, D. J., Shalizi, C. R., & Schervish, M. (2011a) Generalization error bounds for stationary autoregressive models. *arXiv:1103.0942 [Cs, Stat]*.
- McSS11b: McDonald, D. J., Shalizi, C. R., & Schervish, M. (2011b) Risk bounds for time series without strong mixing. *arXiv:1106.0730 [Cs, Stat]*.
- Mcle98: McLeod, A. I. (1998) Hyperbolic decay time series. *Journal of Time Series Analysis*, 19(4), 473–483.
- McZh08: McLeod, A. I., & Zhang, Y. (2008) Faster ARMA maximum likelihood estimation. *Computational Statistics & Data Analysis*, 52(4), 2166–2176.
- MiVi93: Milanese, M., & Vicino, A. (1993) Information-Based Complexity and Nonparametric Worst-Case System Identification. *Journal of Complexity*, 9(4), 427–446.
- PSCP16: Pereyra, M., Schniter, P., Chouzenoux, É., Pesquet, J. C., Tourneret, J. Y., Hero, A. O., & McLaughlin, S. (2016) A Survey of Stochastic Simulation and Optimization Methods in Signal Processing. *IEEE Journal of Selected Topics in Signal Processing*, 10(2), 224–241.
- PlDY15: Plis, S., Danks, D., & Yang, J. (2015) Mesochronal Structure Learning. *Uncertainty in Artificial Intelligence : Proceedings of the … Conference. Conference on Uncertainty in Artificial Intelligence*, 31.
- RuSW05: Rudary, M., Singh, S., & Wingate, D. (2005) Predictive Linear-Gaussian Models of Stochastic Dynamical Systems. In arXiv:1207.1416 [cs].
- Scar81: Scargle, J. D. (1981) Studies in astronomical time series analysis I-Modeling random processes in the time domain. *The Astrophysical Journal Supplement Series*, 45, 1–71.
- SöMo00: Söderström, T., & Mossberg, M. (2000) Performance evaluation of methods for identifying continuous-time autoregressive processes. *Automatica*, 36(1), 53–59.
- StMo05: Stoica, P., & Moses, R. L. (2005) Spectral Analysis of Signals. (1 edition.). Upper Saddle River, N.J: Prentice Hall.
- TaKa00: Taniguchi, M., & Kakizawa, Y. (2000) Asymptotic theory of statistical inference for time series. New York: Springer.
- TuKu82: Tufts, D. W., & Kumaresan, R. (1982) Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood. *Proceedings of the IEEE*, 70(9), 975–989.
- UnTa14: Unser, M. A., & Tafti, P. (2014) An introduction to sparse stochastic processes. New York: Cambridge University Press.
- Geer02: van de Geer, S. (2002) On Hoeffding’s inequality for dependent random variables. In Empirical Process Techniques for Dependent Data. Birkhäuser.
- ZhMc06: Zhang, Y., & McLeod, A. I. (2006) Computer Algebra Derivation of the Bias of Burg Estimators. *Journal of Time Series Analysis*, 27(2), 157–165.