a.k.a. recursive estimation, state-space model calibration, recursive identification. Possibly the same as, but differently framed from, online estimation.
State filters are cool for estimating time-varying hidden states. How about learning the non-time-varying parameters of the model generating your states? Classic ways to do this in dynamical systems include basic linear system identification and general system identification. But can you identify the fixed parameters (not just the hidden states) with a state filter?
Yes.
According to LIFM12, here are some landmark papers:
Augmenting the unobserved state vector is a well-known technique, used in the system identification community for decades; see e.g. Ljung (Ljun79); Söderström and Stoica (SöSt88); Lindström et al. (LSBW08). Similar ideas, using sequential Monte Carlo methods, were suggested by Kitagawa (Kita98); Liu and West (LiWe01). Combined state and parameter estimation is also the standard technique for data assimilation in high-dimensional systems; see Moradkhani et al. (MSGH05); Evensen (Even09a, Even09b).
However, introducing random-walk dynamics for the parameters with fixed variance leads to a new stochastic dynamical system whose properties may differ from those of the original system. That implies that the variance of the random walk should be decreased when the method is used for offline parameter estimation, cf. Hürzeler and Künsch (HüKü01).
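The shrinking-variance trick can be sketched with a toy bootstrap particle filter: a scalar AR(1) state with an unknown coefficient, where the coefficient is appended to the state vector and given artificial random-walk dynamics whose variance decays over time. The model, decay schedule, and all numbers below are my own illustrative choices, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: x_t = a * x_{t-1} + N(0, 1), y_t = x_t + N(0, 0.25),
# with true a = 0.8. We estimate a by appending it to the state vector.
true_a, n_steps, n_particles = 0.8, 500, 2000
xs = np.zeros(n_steps)
ys = np.zeros(n_steps)
for t in range(1, n_steps):
    xs[t] = true_a * xs[t - 1] + rng.normal()
    ys[t] = xs[t] + rng.normal(scale=0.5)

# Augmented particle set: each particle carries a state x and a parameter a.
x_p = rng.normal(size=n_particles)
a_p = rng.uniform(0.0, 1.0, size=n_particles)

for t in range(1, n_steps):
    # Artificial random-walk dynamics for a, with decaying variance so the
    # parameter marginal can actually concentrate (the point made by HüKü01).
    sigma_a = 0.1 / (1 + t) ** 0.6
    a_p = a_p + rng.normal(scale=sigma_a, size=n_particles)
    x_p = a_p * x_p + rng.normal(size=n_particles)
    # Bootstrap step: weight by the observation density, then resample.
    w = np.exp(-0.5 * ((ys[t] - x_p) / 0.5) ** 2)
    w /= w.sum()
    idx = rng.choice(n_particles, size=n_particles, p=w)
    x_p, a_p = x_p[idx], a_p[idx]

print(a_p.mean())  # filtered point estimate of a
```

With a fixed (non-decaying) `sigma_a` the parameter marginal would stay diffuse, which is exactly the changed-dynamics problem the quoted passage warns about.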
Classic recursive estimation
TBD.
Iterated filtering
Related: indirect inference. Precise relation will have to wait, since I currently do not care enough about indirect inference.
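As a heavily simplified illustration of the iterated-filtering idea (IBAK11, INAS15): run a particle filter over the whole data set repeatedly, perturbing the parameter at every step, and geometrically cool the perturbation scale between passes so the parameter swarm contracts toward the likelihood maximiser. The AR(1) toy model, the cooling factor, and restarting each pass from the swarm mean are my own illustrative simplifications, not the algorithms as published.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy AR(1) data standing in for a general state-space model; true a = 0.8.
true_a, T = 0.8, 300
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = true_a * x[t - 1] + rng.normal()
    y[t] = x[t] + rng.normal(scale=0.5)

def perturbed_filter(theta0, sigma, n=1000):
    """One bootstrap-filter pass with the parameter perturbed at every step."""
    th = theta0 + rng.normal(scale=sigma, size=n)
    xp = rng.normal(size=n)
    for t in range(1, T):
        th = th + rng.normal(scale=sigma, size=n)  # parameter perturbation
        xp = th * xp + rng.normal(size=n)
        w = np.exp(-0.5 * ((y[t] - xp) / 0.5) ** 2)
        w /= w.sum()
        idx = rng.choice(n, size=n, p=w)
        xp, th = xp[idx], th[idx]
    return th

# Iterate whole-data filtering passes with geometrically cooled perturbations.
theta_mean, sigma = 0.3, 0.1
for m in range(20):
    swarm = perturbed_filter(theta_mean, sigma)
    theta_mean = swarm.mean()
    sigma *= 0.8

print(theta_mean)
```

Note the contrast with the online augmented-state approach: here the perturbation variance is cooled across repeated passes over the same data, which is what buys frequentist (maximum-likelihood) rather than Bayesian estimates.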
Questions

Ionides and King dominate my citations, at least for the frequentist stuff. Surely other people use this method too? But what are the keywords? This research is suspiciously concentrated at the University of Michigan, but the idea is not so esoteric; I think I am caught in a citation bubble.
Update: the oceanographic crew of Even03 etc seem to do this with Bayes a lot.

A lot of the variational filtering literature turns out to be about attempting this with, effectively, neural nets.

Can I estimate regularisation this way, despite the lack of a probabilistic interpretation? (Leveraging the Bayesian prior–parameter relations.)

How does this work with non-Markov systems? Do we need to bother, or can we just do the Hamiltonian trick and augment the state vector? Can we talk about mixing, or correlation decay? Should I then shoot for the new-wave mixing approaches of Kuznetsov and Mohri etc.?
Basic Construction
There are a few variations. We start with the basic continuous-time state-space model.
Here we have an unobserved Markov state process $x(t)$ on $\mathcal{X}$ and an observation process $y(t)$ on $\mathcal{Y}$. For now they will be assumed to be finite-dimensional vectors over $\mathbb{R}$. They will additionally depend upon a vector of parameters $\theta$. We observe the process at discrete times $t_1 < t_2 < \dots < t_K$, and we will write the observations $y_k := y(t_k)$.
We presume our processes are completely specified by the following conditional densities (which might not have closed-form expressions):
The transition density $f\bigl(x(t_k) \mid x(t_{k-1}), \theta\bigr)$
The observation density $g\bigl(y_k \mid x(t_k), \theta\bigr)$
TBC.
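To make those two densities concrete, here is a minimal sketch for a hypothetical scalar mean-reverting diffusion observed in Gaussian noise. The model, the Euler discretisation, the parameter layout, and the function names are all illustrative assumptions, not taken from any of the cited work.

```python
import numpy as np

# theta = (rate, proc_scale, obs_scale); this layout is an illustrative choice.

def transition_logdensity(x_new, x_old, dt, theta):
    """log f(x(t_k) | x(t_{k-1}), theta) for the scalar diffusion
    dx = -rate * x dt + proc_scale dW, Euler-discretised over step dt."""
    rate, proc_scale, _ = theta
    mean = x_old - rate * x_old * dt
    var = proc_scale ** 2 * dt
    return -0.5 * (np.log(2 * np.pi * var) + (x_new - mean) ** 2 / var)

def observation_logdensity(y, x, theta):
    """log g(y_k | x(t_k), theta): direct observation in Gaussian noise."""
    *_, obs_scale = theta
    var = obs_scale ** 2
    return -0.5 * (np.log(2 * np.pi * var) + (y - x) ** 2 / var)
```

A particle filter only ever needs to simulate from the transition and evaluate the observation density, which is why "plug-and-play" methods (HeIK10) can work even when the transition density itself has no closed form.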
Awaiting filing
Recently enjoyed: Sahani Pathiraja's state filter does something cool, attempting to identify process model noise (a conditional nonparametric density of process errors) that may be used to come up with some neat process models. I'm not convinced by her use of kernel density estimation, since it scales badly precisely when you need it most, in high dimension; but any nonparametric density estimator would, I assume, work, and that would be awesome.
Implementations
pomp does state filtering inference in R.
For an example of doing this in Stan, see sinhrks' stan-statespace.
Refs
 CKDG15: Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C Courville, Yoshua Bengio (2015) A Recurrent Latent Variable Model for Sequential Data. In Advances in Neural Information Processing Systems 28 (pp. 2980–2988). Curran Associates, Inc.
 Kita98: Genshiro Kitagawa (1998) A self-organizing state-space model. Journal of the American Statistical Association, 1203–1215.
 DeDJ11: Pierre Del Moral, Arnaud Doucet, Ajay Jasra (2011) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, 22(5), 1009–1020. DOI
 KDSM09: N. Kantas, A. Doucet, S. S. Singh, J. M. Maciejowski (2009) An Overview of Sequential Monte Carlo Methods for Parameter Estimation in General State-Space Models. IFAC Proceedings Volumes, 42(10), 774–785. DOI
 HüKü01: Markus Hürzeler, Hans R. Künsch (2001) Approximating and Maximising the Likelihood for a General State-Space Model. In Sequential Monte Carlo Methods in Practice (pp. 159–175). Springer, New York, NY DOI
 Ljun79: L. Ljung (1979) Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Transactions on Automatic Control, 24(1), 36–50. DOI
 LIJR17: Tuan Anh Le, Maximilian Igl, Tom Jin, Tom Rainforth, Frank Wood (2017) Auto-Encoding Sequential Monte Carlo. ArXiv Preprint ArXiv:1705.10306.
 APBC15: Evan Archer, Il Memming Park, Lars Buesing, John Cunningham, Liam Paninski (2015) Black box variational inference for state space models. ArXiv:1511.07367 [Stat].
 LiWe01: Jane Liu, Mike West (2001) Combined Parameter and State Estimation in Simulation-Based Filtering. In Sequential Monte Carlo Methods in Practice (pp. 197–223). Springer, New York, NY DOI
 Even09a: Geir Evensen (2009) Data Assimilation: The Ensemble Kalman Filter. Berlin; Heidelberg: Springer
 LeDL07: S. R. Lele, B. Dennis, F. Lutscher (2007) Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters, 10(7), 551. DOI
 KrSS15: Rahul G. Krishnan, Uri Shalit, David Sontag (2015) Deep kalman filters. ArXiv Preprint ArXiv:1511.05121.
 DoJR13: Arnaud Doucet, Pierre E. Jacob, Sylvain Rubenthaler (2013) Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models. ArXiv:1304.5768 [Stat].
 BrPK16: Steven L. Brunton, Joshua L. Proctor, J. Nathan Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15), 3932–3937. DOI
 MSGH05: Hamid Moradkhani, Soroosh Sorooshian, Hoshin V. Gupta, Paul R. Houser (2005) Dual state–parameter estimation of hydrological models using ensemble Kalman filter. Advances in Water Resources, 28(2), 135–147. DOI
 LIFM12: Erik Lindström, Edward Ionides, Jan Frydendall, Henrik Madsen (2012) Efficient Iterated Filtering. In IFAC-PapersOnLine (System Identification, Volume 16) (Vol. 45, pp. 1785–1790). IFAC & Elsevier Ltd. DOI
 TABH03: Michael K. Tippett, Jeffrey L. Anderson, Craig H. Bishop, Thomas M. Hamill, Jeffrey S. Whitaker (2003) Ensemble square root filters. Monthly Weather Review, 131(7), 1485–1490.
 LeNS10: Subhash R. Lele, Khurram Nadeem, Byron Schmuland (2010) Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association, 105(492), 1617–1625. DOI
 DrPM16: Christopher C. Drovandi, Anthony N. Pettitt, Roy A. McCutchan (2016) Exact and Approximate Bayesian Inference for Low IntegerValued Time Series Models with Intractable Likelihoods. Bayesian Analysis, 11(2), 325–352. DOI
 MLTH17: Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, … Yee Whye Teh (2017) Filtering Variational Objectives. ArXiv Preprint ArXiv:1705.09279.
 Werb88: Paul J. Werbos (1988) Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4), 339–356. DOI
 INAS15: Edward L. Ionides, Dao Nguyen, Yves Atchadé, Stilian Stoev, Aaron A. King (2015) Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. Proceedings of the National Academy of Sciences, 112(3), 719–724. DOI
 IoBK06: E. L. Ionides, C. Bretó, A. A. King (2006) Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 103(49), 18438–18443. DOI
 IBAK11: Edward L. Ionides, Anindya Bhadra, Yves Atchadé, Aaron King (2011) Iterated filtering. The Annals of Statistics, 39(3), 1776–1802. DOI
 HeBu14: Markus Heinonen, Florence d'Alché-Buc (2014) Learning nonparametric differential equations with operator-valued kernels and gradient matching. ArXiv:1411.5172 [Cs, Stat].
 SZLB95: Jonas Sjöberg, Qinghua Zhang, Lennart Ljung, Albert Benveniste, Bernard Delyon, Pierre-Yves Glorennec, … Anatoli Juditsky (1995) Nonlinear black-box modeling in system identification: a unified overview. Automatica, 31(12), 1691–1724. DOI
 KDSM15: Nikolas Kantas, Arnaud Doucet, Sumeetpal S. Singh, Jan Maciejowski, Nicolas Chopin (2015) On Particle Methods for Parameter Estimation in State-Space Models. Statistical Science, 30(3), 328–351. DOI
 FeKü18: Paul Fearnhead, Hans R. Künsch (2018) Particle Filters and Data Assimilation. Annual Review of Statistics and Its Application, 5(1), 421–449. DOI
 AnDH10: Christophe Andrieu, Arnaud Doucet, Roman Holenstein (2010) Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 72(3), 269–342. DOI
 HeIK10: Daihai He, Edward L. Ionides, Aaron A. King (2010) Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. Journal of The Royal Society Interface, 7(43), 271–283. DOI
 LSBW08: Erik Lindström, Jonas Ströjby, Mats Brodén, Magnus Wiktorsson, Jan Holst (2008) Sequential calibration of options. Computational Statistics & Data Analysis, 52(6), 2877–2891. DOI
 DoFG01: Arnaud Doucet, Nando de Freitas, Neil Gordon (2001) Sequential Monte Carlo Methods in Practice. New York, NY: Springer New York
 DeDJ06: Pierre Del Moral, Arnaud Doucet, Ajay Jasra (2006) Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 68(3), 411–436. DOI
 LjPW12: Lennart Ljung, Georg Ch Pflug, Harro Walk (2012) Stochastic approximation and optimization of random systems (Vol. 17). Birkhäuser
 BaMa17: Robert Bamler, Stephan Mandt (2017) Structured Black Box Variational Inference for Latent Time Series Models. ArXiv:1707.01069 [Cs, Stat].
 KrSS17: Rahul G. Krishnan, Uri Shalit, David Sontag (2017) Structured Inference Networks for Nonlinear State Space Models. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (pp. 2101–2109).
 SöSt88: Torsten Söderström, Petre Stoica (1988) System Identification. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.
 Even09b: G. Evensen (2009) The ensemble Kalman filter for combined state and parameter estimation. IEEE Control Systems, 29(3), 83–104. DOI
 Even03: Geir Evensen (2003) The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics, 53(4), 343–367. DOI
 OlPS17: Junier B. Oliva, Barnabas Poczos, Jeff Schneider (2017) The Statistical Recurrent Unit. ArXiv:1703.00381 [Cs, Stat].
 LjSö83: Lennart Ljung, Torsten Söderström (1983) Theory and practice of recursive identification. Cambridge, Mass: MIT Press
 DuKo12: J. Durbin, S. J. Koopman (2012) Time series analysis by state space methods. Oxford: Oxford University Press
 BHIK09: Carles Bretó, Daihai He, Edward L. Ionides, Aaron A. King (2009) Time series analysis via mechanistic models. The Annals of Applied Statistics, 3(1), 319–348. DOI
 TaOl17: Corentin Tallec, Yann Ollivier (2017) Unbiasing Truncated Backpropagation Through Time. ArXiv:1705.08209 [Cs].
 InMa17: John Ingraham, Debora Marks (2017) Variational Inference for Sparse and Undirected Models. In PMLR (pp. 1607–1616).
 NLRB17: Christian A. Naesseth, Scott W. Linderman, Rajesh Ranganath, David M. Blei (2017) Variational Sequential Monte Carlo. ArXiv Preprint ArXiv:1705.11140.