After all, if you have a system whose future evolution is important to predict, why not try to infer a plausible model instead of a convenient one?

I am still taxonomising this material. Methods that fit the particular (likelihood) model of recursive estimation and so on are kept under that heading; miscellaneous other approaches go here.

A compact overview appears, incidentally, in Cosma’s review of Fan and Yao (FaYa03), wherein he also recommends Bosq98, TaKa00 and BoBl07.

To reconstruct the state, as opposed to the parameters, you do state filtering. There can be interplay between these steps, if you are doing simulation-based online parameter inference, as in recursive estimation.

Anyway, for what kind of systems can you infer parameters? Mutually exciting point processes? Yep, EFBS04 do that.

From an engineering/control perspective, we have BrPK16, who give a sparse regression version. More generally, it seems parameter inference can be done by indirect inference, or by recursive hierarchical generalised linear models, which generalise the procedure used for linear time series.
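The sparse-regression idea can be sketched roughly as follows. This is a minimal illustration of sequentially thresholded least squares, not the BrPK16 authors' code: the candidate library, the threshold, the toy linear system and all function names (`build_library`, `stlsq`) are assumptions made for the example, and the derivatives are taken as directly observed with small noise, whereas in practice they would usually be estimated from the series.

```python
import numpy as np

def build_library(states):
    """Candidate terms [1, x, y, x^2, x*y, y^2] for a 2-D state (a hypothetical choice)."""
    x, y = states[:, 0], states[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit the rest."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each state dimension using only the terms that survived.
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, k], rcond=None
                )[0]
    return xi

# Toy system (an assumption): dx/dt = -0.1 x + 2 y, dy/dt = -2 x - 0.1 y.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))
A = np.array([[-0.1, 2.0], [-2.0, -0.1]])
# Derivatives assumed observed directly, with small additive noise.
dxdt = states @ A.T + 1e-3 * rng.standard_normal((200, 2))

theta = build_library(states)
xi = stlsq(theta, dxdt)  # rows: library terms, columns: dx/dt and dy/dt
print(np.round(xi, 3))
```

The recovered coefficient matrix should be nonzero only in the rows for `x` and `y`; the threshold is the main tuning knob, trading sparsity against fidelity.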

There are many highly general formulations; Kita96 gives a Bayesian “smooth” one.

See e.g. the HeDG15 paper:

We address […] these problems with a new view of predictive state methods for dynamical system learning. In this view, a dynamical system learning problem is reduced to a sequence of supervised learning problems. So, we can directly apply the rich literature on supervised learning methods to incorporate many types of prior knowledge about problem structure. We give a general convergence rate analysis that allows a high degree of flexibility in designing estimators. And finally, implementing a new estimator becomes as simple as rearranging our data and calling the appropriate supervised learning subroutines.

[…] More specifically, our contribution is to show that we can use much-more-general supervised learning algorithms in place of linear regression, and still get a meaningful theoretical analysis. In more detail:

- we point out that we can equally well use any well-behaved supervised learning algorithm in place of linear regression in the first stage of instrumental-variable regression;
- for the second stage of instrumental-variable regression, we generalize ordinary linear regression to its RKHS counterpart;
- we analyze the resulting combination, and show that we get convergence to the correct answer, with a rate that depends on how quickly the individual supervised learners converge.
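To make the two-stage (instrumental-variable) regression idea concrete, here is a minimal sketch using plain linear regression in both stages. It is not the HeDG15 algorithm: the toy model (a scalar AR(1) state observed with additive noise), the choice of the previous observation as the instrument, and the function names are all assumptions made for illustration. Regressing the next observation directly on the current one underestimates the AR coefficient because of observation noise; regressing first on the past and then on the fitted values removes that bias in this setting.

```python
import numpy as np

def simulate_noisy_ar1(a, n, rng):
    """Latent x[t] = a * x[t-1] + w[t], observed as y[t] = x[t] + v[t] (unit-variance noises)."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + rng.standard_normal()
    return x + rng.standard_normal(n)

def ols_slope(u, v):
    """Least-squares slope of v on u, no intercept (the series is zero-mean)."""
    return float(u @ v / (u @ u))

def two_stage_estimate(y):
    """Return (naive, two-stage) estimates of the AR coefficient."""
    past, present, future = y[:-2], y[1:-1], y[2:]
    # Naive one-step regression: biased towards zero by observation noise.
    naive = ols_slope(present, future)
    # Stage 1: predict the present observation from the past (the instrument).
    present_hat = ols_slope(past, present) * past
    # Stage 2: regress the future on the stage-1 prediction.
    iv = ols_slope(present_hat, future)
    return naive, iv

rng = np.random.default_rng(0)
y = simulate_noisy_ar1(a=0.8, n=50_000, rng=rng)
naive, iv = two_stage_estimate(y)
print(f"naive: {naive:.3f}, two-stage: {iv:.3f}")
```

In the quoted paper's framing, either stage could be swapped for another supervised learner; here both are ordinary least squares so that the bias correction is easy to see.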

Also, sparsely or unevenly observed series are tricky. I’m looking at those at the moment.

## Awaiting filing

- Pereyra et al. (PSCP16):

Modern signal processing (SP) methods rely very heavily on probability and statistics to solve challenging SP problems. Expectations and demands are constantly rising, and SP methods are now expected to deal with ever more complex models, requiring ever more sophisticated computational inference techniques. This has driven the development of statistical SP methods based on stochastic simulation and optimization. Stochastic simulation and optimization algorithms are computationally intensive tools for performing statistical inference in models that are analytically intractable and beyond the scope of deterministic inference methods. They have been recently successfully applied to many difficult problems involving complex statistical models and sophisticated (often Bayesian) statistical inference techniques. This paper presents a tutorial on stochastic simulation and optimization methods in signal and image processing and points to some interesting research problems. The paper addresses a variety of high-dimensional Markov chain Monte Carlo (MCMC) methods […] It also discusses a range of optimization methods that have been adopted to solve stochastic problems, as well as stochastic methods for deterministic optimization. Subsequently, areas of overlap between simulation and optimization, in particular optimization-within-MCMC and MCMC-driven optimization are discussed.

## Refs

- BeAt16: Souhaib Ben Taieb, Amir F. Atiya (2016) A Bias and Variance Analysis for Multistep-Ahead Time Series Forecasting. *IEEE Transactions on Neural Networks and Learning Systems*, 27(1), 62–76.
- WiZi89: Ronald J. Williams, David Zipser (1989) A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. *Neural Computation*, 1(2), 270–280.
- WeTN17: Ruofeng Wen, Kari Torkkola, Balakrishnan Narayanaswamy (2017) A Multi-Horizon Quantile Recurrent Forecaster. *ArXiv:1711.11053 [Stat]*.
- HeDG15: Ahmed Hefny, Carlton Downey, Geoffrey Gordon (2015) A New View of Predictive State Methods for Dynamical System Learning. *ArXiv:1505.05310 [Cs, Stat]*.
- AnWa16: Isadora Antoniano-Villalobos, Stephen G. Walker (2016) A Nonparametric Model for Stationary Time Series. *Journal of Time Series Analysis*, 37(1), 126–142.
- PSCP16: M. Pereyra, P. Schniter, É Chouzenoux, J. C. Pesquet, J. Y. Tourneret, A. O. Hero, S. McLaughlin (2016) A Survey of Stochastic Simulation and Optimization Methods in Signal Processing. *IEEE Journal of Selected Topics in Signal Processing*, 10(2), 224–241.
- AMGC02: M. S. Arulampalam, S. Maskell, N. Gordon, T. Clapp (2002) A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. *IEEE Transactions on Signal Processing*, 50(2), 174–188.
- UnTa14: Michael A. Unser, Pouya Tafti (2014) *An introduction to sparse stochastic processes*. New York: Cambridge University Press.
- FiSi16: Axel Finke, Sumeetpal S. Singh (2016) Approximate Smoothing and Parameter Estimation in High-Dimensional State-Space Models. *ArXiv:1606.08650 [Stat]*.
- TaKa00: Masanobu Taniguchi, Yoshihide Kakizawa (2000) *Asymptotic theory of statistical inference for time series*. New York: Springer.
- RoTa10: Tirza Routtenberg, Joseph Tabrikian (2010) Blind MIMO-AR System Identification and Source Separation with Finite-alphabet. *IEEE Transactions on Signal Processing*, 58(3), 990–1000.
- BüKü99: Peter Bühlmann, Hans R Künsch (1999) Block length selection in the bootstrap for time series. *Computational Statistics & Data Analysis*, 31(3), 295–310.
- CaRS15: Ben Cassidy, Caroline Rae, Victor Solo (2015) Brain Activity: Connectivity, Sparsity, and Mutual Information. *IEEE Transactions on Medical Imaging*, 34(4), 846–860.
- Carm14: Avishy Y. Carmi (2014) Compressive System Identification. In Compressed Sensing & Sparse Filtering (pp. 281–324). Springer Berlin Heidelberg.
- KAAB18: Robert E. Kass, Shun-Ichi Amari, Kensuke Arai, Emery N. Brown, Casey O. Diekman, Markus Diesmann, … Mark A. Kramer (2018) Computational Neuroscience: Mathematical and Statistical Perspectives. *Annual Review of Statistics and Its Application*, 5(1), 183–214.
- FlSG17: Valentin Flunkert, David Salinas, Jan Gasthaus (2017) DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. *ArXiv:1704.04110 [Cs, Stat]*.
- DoJR13: Arnaud Doucet, Pierre E. Jacob, Sylvain Rubenthaler (2013) Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models. *ArXiv:1304.5768 [Stat]*.
- Chev07: Guillaume Chevillon (2007) Direct Multi-Step Estimation and Forecasting. *Journal of Economic Surveys*, 21(4), 746–785.
- BrPK16: Steven L. Brunton, Joshua L. Proctor, J. Nathan Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. *Proceedings of the National Academy of Sciences*, 113(15), 3932–3937.
- EFBS04: U Eden, L Frank, R Barbieri, V Solo, E Brown (2004) Dynamic Analysis of Neural Encoding by Point Process Adaptive Filtering. *Neural Computation*, 16(5), 971–998.
- Andr94: Donald W. K. Andrews (1994) Empirical process methods in econometrics. In Handbook of Econometrics (Vol. 4, pp. 2247–2294). Elsevier.
- COMG07: Alex R. Cook, Wilfred Otten, Glenn Marion, Gavin J. Gibson, Christopher A. Gilligan (2007) Estimation of multiple transmission rates for epidemics in heterogeneous populations. *Proceedings of the National Academy of Sciences*, 104(51), 20392–20397.
- Tani01: Hisashi Tanizaki (2001) Estimation of unknown parameters in nonlinear and non-Gaussian state-space models. *Journal of Statistical Planning and Inference*, 96(2), 301–323.
- ChLY16: Ngai Hang Chan, Ye Lu, Chun Yip Yau (2016) Factor Modelling for High-Dimensional Time Series: Inference and Model Selection. *Journal of Time Series Analysis*.
- Werb88: Paul J. Werbos (1988) Generalization of backpropagation with application to a recurrent gas market model. *Neural Networks*, 1(4), 339–356.
- Fras08: Andrew M. Fraser (2008) *Hidden Markov models and dynamical systems*. Philadelphia, PA: Society for Industrial and Applied Mathematics.
- BoBl07: Denis Bosq, Delphine Blanke (2007) *Inference and prediction in large dimensions*. Chichester, England; Hoboken, NJ: John Wiley/Dunod.
- IoBK06: E. L. Ionides, C. Bretó, A. A. King (2006) Inference for nonlinear dynamical systems. *Proceedings of the National Academy of Sciences*, 103(49), 18438–18443.
- IBAK11: Edward L. Ionides, Anindya Bhadra, Yves Atchadé, Aaron King (2011) Iterated filtering. *The Annals of Statistics*, 39(3), 1776–1802.
- HaSZ17: Elad Hazan, Karan Singh, Cyril Zhang (2017) Learning Linear Dynamical Systems via Spectral Filtering. In NIPS.
- PlDY15: Sergey Plis, David Danks, Jianyu Yang (2015) Mesochronal Structure Learning. *Uncertainty in Artificial Intelligence: Proceedings of the … Conference. Conference on Uncertainty in Artificial Intelligence*, 31.
- PhPa16: Tung Pham, Victor Panaretos (2016) Methodology and Convergence Rates for Functional Time Series Regression. *ArXiv:1612.07197 [Math, Stat]*.
- HMCH08: X. Hong, R. J. Mitchell, S. Chen, C. J. Harris, K. Li, G. W. Irwin (2008) Model selection approaches for non-linear system identification: a review. *International Journal of Systems Science*, 39(10), 925–946.
- Kita96: Genshiro Kitagawa (1996) Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models. *Journal of Computational and Graphical Statistics*, 5(1), 1–25.
- DuKo97: J. Durbin, S. J. Koopman (1997) Monte Carlo maximum likelihood estimation for non-Gaussian state space models. *Biometrika*, 84(3), 669–684.
- NRPD93: O. Nerrand, P. Roussel-Ragot, L. Personnaz, G. Dreyfus, S. Marcos (1993) Neural Networks and Nonlinear Adaptive Filtering: Unifying Concepts and New Algorithms. *Neural Computation*, 5(2), 165–199.
- Kita87: Genshiro Kitagawa (1987) Non-Gaussian State-Space Modeling of Nonstationary Time Series. *Journal of the American Statistical Association*, 82(400), 1032–1041.
- SZLB95: Jonas Sjöberg, Qinghua Zhang, Lennart Ljung, Albert Benveniste, Bernard Delyon, Pierre-Yves Glorennec, … Anatoli Juditsky (1995) Nonlinear black-box modeling in system identification: a unified overview. *Automatica*, 31(12), 1691–1724.
- KaSc04: Holger Kantz, Thomas Schreiber (2004) *Nonlinear time series analysis*. Cambridge, UK; New York: Cambridge University Press.
- FaYa03: Jianqing Fan, Qiwei Yao (2003) *Nonlinear time series: nonparametric and parametric methods*. New York: Springer.
- Robi83: P. M. Robinson (1983) Nonparametric Estimators for Time Series. *Journal of Time Series Analysis*, 4(3), 185–207.
- HoLi05: Yongmiao Hong, Haitao Li (2005) Nonparametric Specification Testing for Continuous-Time Models with Applications to Term Structure of Interest Rates. *Review of Financial Studies*, 18(1), 37–84.
- Bosq98: Denis Bosq (1998) *Nonparametric statistics for stochastic processes: estimation and prediction*. New York: Springer.
- Särk07: Simo Särkkä (2007) On Unscented Kalman Filtering for State Estimation of Continuous-Time Nonlinear Systems. *IEEE Transactions on Automatic Control*, 52(9), 1631–1641.
- RuDK15: Jakob Runge, Reik V. Donner, Jürgen Kurths (2015) Optimal model-free prediction from multivariate time series. *Physical Review E*, 91(5).
- FeKü18: Paul Fearnhead, Hans R. Künsch (2018) Particle Filters and Data Assimilation. *Annual Review of Statistics and Its Application*, 5(1), 421–449.
- StMu13: Nicolas Städler, Sach Mukherjee (2013) Penalized estimation in high-dimensional hidden Markov models with state-specific graphical models. *The Annals of Applied Statistics*, 7(4), 2157–2179.
- Ljun10: Lennart Ljung (2010) Perspectives on system identification. *Annual Reviews in Control*, 34(1), 1–12.
- HeIK10: Daihai He, Edward L. Ionides, Aaron A. King (2010) Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. *Journal of The Royal Society Interface*, 7(43), 271–283.
- KEMW05: Bruce E. Kendall, Stephen P. Ellner, Edward McCauley, Simon N. Wood, Cheryl J. Briggs, William W. Murdoch, Peter Turchin (2005) Population cycles in the pine looper moth: Dynamical tests of mechanistic hypotheses. *Ecological Monographs*, 75(2), 259–276.
- ClBj04: James S. Clark, Ottar N. Bjørnstad (2004) Population time series: process variability, observation errors, missing values, lags, and hidden states. *Ecology*, 85(11), 3140–3150.
- LGZZ16: Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, Yoshua Bengio (2016) Professor Forcing: A New Algorithm for Training Recurrent Networks. In Advances In Neural Information Processing Systems.
- BVJS15: Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer (2015) Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems 28 (pp. 1171–1179). Cambridge, MA, USA: Curran Associates, Inc.
- KeCh72: R. Kemerait, D. Childers (1972) Signal detection and extraction by cepstrum techniques. *IEEE Transactions on Information Theory*, 18(6), 745–759.
- KiGe96: Genshiro Kitagawa, Will Gersch (1996) *Smoothness Priors Analysis of Time Series*. New York, NY: Springer.
- HaKo05: A. Harvey, S. J. Koopman (2005) Structural Time Series Models. In Encyclopedia of Biostatistics. John Wiley & Sons, Ltd.
- Levi17: David N. Levin (2017) The Inner Structure of Time-Dependent Signals. *ArXiv:1703.08596 [Cs, Math, Stat]*.
- Pill16: Gianluigi Pillonetto (2016) The interplay between system identification and machine learning. *ArXiv:1612.09158 [Cs, Stat]*.
- DuKo12: J. Durbin, S. J. Koopman (2012) *Time series analysis by state space methods*. Oxford: Oxford University Press.
- AASS18: Anish Agarwal, Muhammad Jehangir Amjad, Devavrat Shah, Dennis Shen (2018) Time Series Analysis via Matrix Estimation. *ArXiv:1802.09064 [Cs, Stat]*.
- BHIK09: Carles Bretó, Daihai He, Edward L. Ionides, Aaron A. King (2009) Time series analysis via mechanistic models. *The Annals of Applied Statistics*, 3(1), 319–348.
- TaOl17: Corentin Tallec, Yann Ollivier (2017) Unbiasing Truncated Backpropagation Through Time. *ArXiv:1705.08209 [Cs]*.