a.k.a. recursive estimation, state space model calibration, recursive identification.

State filters are cool for estimating time-varying hidden states.
How about learning the *parameters* of the model generating your states?
Classic ways to do this in dynamical systems include basic linear system identification and general system identification.
But can you identify the fixed parameters (not just hidden states) with a state filter?

Yes.

According to LIFM12, here are some landmark papers:

Augmenting the unobserved state vector is a well known technique, used in the system identification community for decades, see e.g. Ljung (Ljun79); Söderström and Stoica (SöSt88); Lindström et al. (LSBW08). Similar ideas, using Sequential Monte Carlo methods, were suggested by Kitagawa (Kita98); Liu and West (LiWe01). Combined state and parameter estimation is also the standard technique for data assimilation in high-dimensional systems, see Moradkhani et al. (MSGH05); Evensen (Even09a, Even09b).

However, introducing random walk dynamics to the parameters with fixed variance leads to a new dynamical stochastic system with properties that may be different from the properties of the original system. That implies that the variance of the random walk should be decreased when the method is used for offline parameter estimation, cf. Hürzeler and Künsch (HüKü01).
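
To make the state-augmentation idea concrete, here is a minimal sketch of my own, not the method of any of the papers above. It assumes a toy AR(1) model with Gaussian noise, a bootstrap particle filter, and a geometric decay schedule for the parameter random walk; the function names, the schedule and all the tuning constants are hypothetical choices for illustration.

```python
import numpy as np


def simulate_ar1(a, q, r, T, rng):
    """Simulate x_t = a*x_{t-1} + q*eps_t, observed as y_t = x_t + r*eta_t."""
    x = np.zeros(T)
    y = np.zeros(T)
    x_prev = 0.0
    for t in range(T):
        x_prev = a * x_prev + q * rng.standard_normal()
        x[t] = x_prev
        y[t] = x_prev + r * rng.standard_normal()
    return x, y


def augmented_particle_filter(y, q, r, n_particles, sigma0, decay, rng):
    """Bootstrap particle filter on the augmented state (x, a).

    The unknown coefficient a is given artificial random-walk dynamics
    whose standard deviation shrinks over time: sigma_t = sigma0 * decay**t
    (an assumed schedule). Returns the filtered mean of a at each step.
    """
    x = np.zeros(n_particles)
    a = rng.uniform(-1.0, 1.0, size=n_particles)  # vague initial guess for a
    a_mean = np.zeros(len(y))
    for t, y_t in enumerate(y):
        # Perturb the parameter particles, then propagate the state particles.
        a = a + sigma0 * decay**t * rng.standard_normal(n_particles)
        x = a * x + q * rng.standard_normal(n_particles)
        # Weight by the Gaussian observation likelihood, in the log domain.
        logw = -0.5 * ((y_t - x) / r) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        a_mean[t] = np.sum(w * a)
        # Multinomial resampling of the joint (x, a) particles.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x, a = x[idx], a[idx]
    return a_mean


rng = np.random.default_rng(0)
_, y = simulate_ar1(a=0.8, q=1.0, r=0.5, T=500, rng=rng)
a_hat = augmented_particle_filter(
    y, q=1.0, r=0.5, n_particles=2000, sigma0=0.05, decay=0.99, rng=rng
)
print("final filtered estimate of a:", a_hat[-1])
```

With `decay=1.0` the parameter keeps jittering forever, which is the fixed-variance case the quote warns about; shrinking the jitter lets the parameter particles settle, at the price of losing diversity if it shrinks too fast.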

## Iterated filtering

Related: indirect inference. Precise relation will have to wait, since I currently do not care enough about indirect inference.

## Questions

Is this how Särkkä uses state filters to do Gaussian process regression?

Ionides and King dominate my citations, at least for the frequentist stuff. Surely other people do this method too? But what are the keywords? This research is suspiciously concentrated in U Michigan, but the idea is not so esoteric. I think I am caught in a citation bubble.

Update: the oceanographic crew of Even03 etc. seem to do this with Bayes a lot.

Can I estimate regularisation this way, despite the lack of probabilistic interpretation?

How does this work with non-Markov systems? Do we need to bother, or can we just do the Hamiltonian trick and augment the state vector? Can we talk about mixing, or correlation decay? Should I then shoot for the new-wave mixing approaches of Kuznetsov and Mohri etc?

### Basic Construction

There are a few variations.

But we start with the basic continuous time state space model.

Here we have an unobserved Markov state process \(x(t)\) on \(\mathcal{X}\) and an observation process \(y(t)\) on \(\mathcal{Y}\). For now they will be assumed to be finite dimensional vectors over \(\mathbb{R}.\) They will additionally depend upon a vector of parameters \(\theta.\) We observe the process at discrete times \(t(1:T)=(t_1, t_2,\dots, t_T),\) and we will write the observations \(y(1:T)=(y(t_1), y(t_2),\dots, y(t_T)).\)

We presume our processes are completely specified by the following conditional densities (which might not have closed-form expression)

The transition density

\[f(x(t_i)|x(t_{i-1}), \theta)\]

The observation density (which seems overgeneral TBH…)

TBD.

## Awaiting filing

Recently enjoyed:
Sahani Pathiraja’s state filter does something cool, in attempting to identify
process *model* noise - a conditional nonparametric density of process errors, that may be used to come up with some neat process models.
I’m not convinced about her use of
kernel density estimators, since these scale badly precisely when you need them most, in high dimension; but any nonparametric density estimator would, I assume, work.

## Implementations

pomp does state filtering inference in R.

For some examples of doing this in Stan see Sinhrks’ stan-statespace.

## Refs

- AnDH10
- Andrieu, C., Doucet, A., & Holenstein, R. (2010) Particle Markov chain Monte Carlo methods. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 72(3), 269–342.
- APBC15
- Archer, E., Park, I. M., Buesing, L., Cunningham, J., & Paninski, L. (2015) Black box variational inference for state space models. *ArXiv:1511.07367 [Stat]*.
- BaMa17
- Bamler, R., & Mandt, S. (2017) Structured Black Box Variational Inference for Latent Time Series Models. *ArXiv:1707.01069 [Cs, Stat]*.
- BHIK09
- Bretó, C., He, D., Ionides, E. L., & King, A. A. (2009) Time series analysis via mechanistic models. *The Annals of Applied Statistics*, 3(1), 319–348.
- BrPK16
- Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. *Proceedings of the National Academy of Sciences*, 113(15), 3932–3937.
- CKDG15
- Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A. C., & Bengio, Y. (2015) A Recurrent Latent Variable Model for Sequential Data. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 28 (pp. 2980–2988). Curran Associates, Inc.
- DeDJ06
- Del Moral, P., Doucet, A., & Jasra, A. (2006) Sequential Monte Carlo samplers. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 68(3), 411–436.
- DeDJ11
- Del Moral, P., Doucet, A., & Jasra, A. (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. *Statistics and Computing*, 22(5), 1009–1020.
- DoFG01
- Doucet, A., Freitas, N., & Gordon, N. (2001) Sequential Monte Carlo Methods in Practice. New York, NY: Springer New York.
- DoJR13
- Doucet, A., Jacob, P. E., & Rubenthaler, S. (2013) Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models. *ArXiv:1304.5768 [Stat]*.
- Even03
- Evensen, G. (2003) The Ensemble Kalman Filter: theoretical formulation and practical implementation. *Ocean Dynamics*, 53(4), 343–367.
- Even09a
- Evensen, G. (2009a) Data Assimilation - The Ensemble Kalman Filter. Berlin; Heidelberg: Springer.
- Even09b
- Evensen, G. (2009b) The ensemble Kalman filter for combined state and parameter estimation. *IEEE Control Systems*, 29(3), 83–104.
- HeIK10
- He, D., Ionides, E. L., & King, A. A. (2010) Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. *Journal of The Royal Society Interface*, 7(43), 271–283.
- HüKü01
- Hürzeler, M., & Künsch, H. R. (2001) Approximating and Maximising the Likelihood for a General State-Space Model. In Sequential Monte Carlo Methods in Practice (pp. 159–175). Springer, New York, NY.
- InMa17
- Ingraham, J., & Marks, D. (2017) Variational Inference for Sparse and Undirected Models. In PMLR (pp. 1607–1616).
- IBAK11
- Ionides, E. L., Bhadra, A., Atchadé, Y., & King, A. (2011) Iterated filtering. *The Annals of Statistics*, 39(3), 1776–1802.
- IoBK06
- Ionides, E. L., Bretó, C., & King, A. A. (2006) Inference for nonlinear dynamical systems. *Proceedings of the National Academy of Sciences*, 103(49), 18438–18443.
- INAS15
- Ionides, E. L., Nguyen, D., Atchadé, Y., Stoev, S., & King, A. A. (2015) Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. *Proceedings of the National Academy of Sciences*, 112(3), 719–724.
- KDSM09
- Kantas, N., Doucet, A., Singh, S. S., & Maciejowski, J. M. (2009) An Overview of Sequential Monte Carlo Methods for Parameter Estimation in General State-Space Models. *IFAC Proceedings Volumes*, 42(10), 774–785.
- Kita98
- Kitagawa, G. (1998) A self-organizing state-space model. *Journal of the American Statistical Association*, 1203–1215.
- KrSS15
- Krishnan, R. G., Shalit, U., & Sontag, D. (2015) Deep kalman filters. *ArXiv Preprint ArXiv:1511.05121*.
- KrSS17
- Krishnan, R. G., Shalit, U., & Sontag, D. (2017) Structured Inference Networks for Nonlinear State Space Models. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.
- LIJR17
- Le, T. A., Igl, M., Jin, T., Rainforth, T., & Wood, F. (2017) Auto-Encoding Sequential Monte Carlo. *ArXiv Preprint ArXiv:1705.10306*.
- LIFM12
- Lindström, E., Ionides, E., Frydendall, J., & Madsen, H. (2012) Efficient Iterated Filtering. In IFAC-PapersOnLine (System Identification, Volume 16) (Vol. 45, pp. 1785–1790). IFAC & Elsevier Ltd.
- LSBW08
- Lindström, E., Ströjby, J., Brodén, M., Wiktorsson, M., & Holst, J. (2008) Sequential calibration of options. *Computational Statistics & Data Analysis*, 52(6), 2877–2891.
- LiWe01
- Liu, J., & West, M. (2001) Combined Parameter and State Estimation in Simulation-Based Filtering. In Sequential Monte Carlo Methods in Practice (pp. 197–223). Springer, New York, NY.
- Ljun79
- Ljung, L. (1979) Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. *IEEE Transactions on Automatic Control*, 24(1), 36–50.
- LjPW12
- Ljung, L., Pflug, G. C., & Walk, H. (2012) Stochastic approximation and optimization of random systems. (Vol. 17). Birkhäuser.
- MLTH17
- Maddison, C. J., Lawson, D., Tucker, G., Heess, N., Norouzi, M., Mnih, A., … Teh, Y. W. (2017) Filtering Variational Objectives. *ArXiv Preprint ArXiv:1705.09279*.
- MSGH05
- Moradkhani, H., Sorooshian, S., Gupta, H. V., & Houser, P. R. (2005) Dual state–parameter estimation of hydrological models using ensemble Kalman filter. *Advances in Water Resources*, 28(2), 135–147.
- NLRB17
- Naesseth, C. A., Linderman, S. W., Ranganath, R., & Blei, D. M. (2017) Variational Sequential Monte Carlo. *ArXiv Preprint ArXiv:1705.11140*.
- OlPS17
- Oliva, J. B., Poczos, B., & Schneider, J. (2017) The Statistical Recurrent Unit. *ArXiv:1703.00381 [Cs, Stat]*.
- RMAR07
- Rasmussen, J. G., Møller, J., Aukema, B. H., Raffa, K. F., & Zhu, J. (2007) Continuous time modelling of dynamical spatial lattice data observed at sparsely distributed times. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 69(4), 701–713.
- SZLB95
- Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P.-Y., … Juditsky, A. (1995) Nonlinear black-box modeling in system identification: a unified overview. *Automatica*, 31(12), 1691–1724.
- SöSt88
- Söderström, T., & Stoica, P. (Eds.). (1988) System Identification. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.
- TaOl17
- Tallec, C., & Ollivier, Y. (2017) Unbiasing Truncated Backpropagation Through Time. *ArXiv:1705.08209 [Cs]*.
- TABH03
- Tippett, M. K., Anderson, J. L., Bishop, C. H., Hamill, T. M., & Whitaker, J. S. (2003) Ensemble square root filters. *Monthly Weather Review*, 131(7), 1485–1490.
- Werb88
- Werbos, P. J. (1988) Generalization of backpropagation with application to a recurrent gas market model. *Neural Networks*, 1(4), 339–356.