Working through some generalisations of the Galton-Watson process as an INAR process.
Consider

van Harn & Steutel's work on “F-stable branching processes.” Also bounded influence kernel?

Lee, Hopcraft, Jakeman and Williams on discrete stable processes. Discrete state, continuous time. How do these differ from the usual Hawkes processes, if at all?
Long Memory Galton-Watson
For my own edification and amusement I would like to walk through the construction of a particular analogue of the continuous-time Hawkes point process on a discrete index set.
Specifically, a non-Markovian generalisation of the Galton-Watson process which still operates in quantised time, but has interesting, possibly unbounded influence kernels, like the Hawkes process.
I denote a realisation of the process $\{N_t\}_{t \in \mathbb{N}}$, the associated non-negative increment process $\{X_t\} \equiv \{N_t - N_{t-1}\}$, and a conditional non-negative pseudo-intensity process $\lambda_t \equiv \lambda(t \mid \{N_s\}_{s \lt t})$. By “pseudo-intensity” I mean that the innovation law is parameterised (solely, for now) by some scalar-valued process $\lambda_t$. That is, $X_t \mid \{N_s\}_{s \lt t} \sim \mathcal{L}(\lambda_t)$. For the moment I will take this law to be Poisson.
To complete the analogy with the Hawkes process I choose the dependence on the past values of the process to be linear, with influence kernel $\phi$. This is also close to clustering, and indeed there are lots of papers noticing the connection.
Then a linear conditional intensity process would be
$$\lambda_t = \mu + \eta \sum_{0 \leq s \lt t} \phi(t-s-1) X_s$$
with background rate $\mu \geq 0$ and kernel weight $\eta \geq 0$.
The $-1$ in $\phi(t-s-1)$ is to make sure our influence kernel is defined on $\mathbb{N}_0$, which is convenient for typical count distribution functions.
If the kernel has bounded support such that
$$\phi(k) = 0 \quad \text{for all } k \geq p$$
then we have an autoregressive count process of order $p$. More on that in a moment.
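A minimal simulation sketch of the linear pseudo-intensity recursion described above, assuming Poisson innovations. The names `mu` (background rate), `eta` (kernel weight), `phi` (kernel, indexed from lag 0) and the helper functions are illustrative assumptions, and the geometric kernel is only a placeholder shape. The recursion implemented is `lam[t] = mu + eta * sum over s < t of phi[t-s-1] * x[s]`.

```python
import numpy as np

def geometric_kernel(p, n_lags):
    """Geometric pmf on {0, 1, ...}, truncated to n_lags terms: phi(k) = p (1-p)^k."""
    k = np.arange(n_lags)
    return p * (1.0 - p) ** k

def simulate_counts(mu, eta, phi, n_steps, seed=0):
    """Simulate x[t] ~ Poisson(lam[t]) with
    lam[t] = mu + eta * sum_{s<t} phi[t-s-1] * x[s] (hypothetical parameterisation)."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    x = np.zeros(n_steps, dtype=np.int64)
    lam = np.zeros(n_steps)
    for t in range(n_steps):
        n_hist = min(t, len(phi))
        # Reverse the recent history so the latest count meets phi[0],
        # the one before it meets phi[1], and so on.
        past = x[t - n_hist:t][::-1]
        lam[t] = mu + eta * np.dot(phi[:n_hist], past)
        x[t] = rng.poisson(lam[t])
    return x, lam

phi = geometric_kernel(p=0.3, n_lags=50)
x, lam = simulate_counts(mu=2.0, eta=0.5, phi=phi, n_steps=5000)
print(x[:20], x.mean())
```

With a kernel that sums to (nearly) one and `eta < 1`, the long-run mean count should sit near `mu / (1 - eta)`, which gives a quick sanity check on a simulated path.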
What influence kernel shape will we use?
Geometric distributions are natural, although the kernel doesn't have to be strictly monotonic, or even unimodal. Poisson or negative binomial would also work. We could in general take any arbitrary probability mass function as the influence kernel, or use a nonparametric form.
for some .
If we expect to be using sparsifying lasso penalties for such a kernel, we probably want to decompose the kernel in a way that minimises correlation between mixture components, to improve our odds of correctly identifying dependency at different scales. If we constrain the components to be non-negative, the only way for them to be completely orthogonal is to have disjoint support.
Intermediately, we could choose a Poisson mixture
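A Poisson-mixture kernel could look something like the following sketch. The weights, rates and function names are hypothetical choices, not taken from the text. The cosine overlap is one crude way to see how correlated two components are, which matters for the lasso concern raised above: widely separated rates overlap little, but never exactly zero.

```python
import math

def poisson_pmf(k, rate):
    """Poisson probability mass at integer k >= 0."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def poisson_mixture_kernel(weights, rates, n_lags):
    """phi(k) = sum_j w_j * Poisson(k; rate_j) for k = 0, ..., n_lags - 1."""
    return [
        sum(w * poisson_pmf(k, r) for w, r in zip(weights, rates))
        for k in range(n_lags)
    ]

def cosine_overlap(a, b):
    """Cosine similarity between two non-negative kernel components."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

n_lags = 80
rates = [1.0, 4.0, 16.0]      # assumed component timescales
weights = [0.5, 0.3, 0.2]     # assumed mixture weights, summing to one
phi = poisson_mixture_kernel(weights, rates, n_lags)
components = [[poisson_pmf(k, r) for k in range(n_lags)] for r in rates]
print(cosine_overlap(components[0], components[1]))
print(cosine_overlap(components[0], components[2]))
```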
There is a subtlety here with regard to the filtration: do we set up the kernel strictly to regard triggering events at previous timesteps? If so, no problem. If we want to allow same-day triggering, we might allow the exogenous events to also contribute to the kernel, in which case we might have to estimate an extra influence parameter, or find some principled way to include it in the kernel weights.
TODO: unconditional distribution using, e.g. generator fns.
Autoregressive characterisation
Turns out Steutel and van Harn saw me coming here, and characterised this process in 1979; see StHa79. (Wait, is this strictly true, that we can make this go with a thinning operator? There are many related definitions here, muddying the waters.)
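We need their binomial thinning operator $\circ$, which is defined for some count RV $X$ and $\alpha \in [0, 1]$ by
$$\alpha \circ X = \sum_{i=1}^{X} B_i$$
for $B_i$ i.i.d. Bernoulli($\alpha$) RVs, independent of $X$.
In terms of generating functions,
$$G_{\alpha \circ X}(s) = G_X(1 - \alpha + \alpha s).$$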
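As a quick sketch, binomial thinning of a count is just a binomial draw with the count as the number of trials. The function name `binomial_thin` and the parameter values are illustrative assumptions. Thinning a Poisson count with mean 10 at retention probability 0.3 should give something with mean and variance both close to 3, since a thinned Poisson is again Poisson.

```python
import numpy as np

def binomial_thin(alpha, x, rng):
    """Binomial thinning: keep each of the x individuals independently
    with probability alpha. Works elementwise on integer arrays."""
    return rng.binomial(x, alpha)

rng = np.random.default_rng(1)
x = rng.poisson(10.0, size=100_000)
y = binomial_thin(0.3, x, rng)
print(y.mean(), y.var())
```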
There are many generalisations of this operator; see Weiß08 for an overview.
Anyway, you can use this thinning operator to construct an autoregressive time series model driven by thinned versions of its history.
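One way such a construction might be sketched is an order-p recursion in which each of the last p counts is thinned independently and a Poisson innovation is added. This is an assumption-laden illustration: the thinnings are drawn independently at every step, the innovations are taken to be Poisson with mean `mu`, and the history is started at zero. Definitions in the literature differ in how the thinnings are coupled, so this should not be read as any particular paper's model.

```python
import numpy as np

def simulate_inar(alphas, mu, n_steps, seed=0):
    """x[t] = sum_i (alphas[i-1] thinned x[t-i]) + Poisson(mu) innovation."""
    rng = np.random.default_rng(seed)
    p = len(alphas)
    # The first p entries are a zero-padded history, dropped on return.
    x = np.zeros(n_steps + p, dtype=np.int64)
    for t in range(p, n_steps + p):
        survivors = sum(
            rng.binomial(x[t - i], a) for i, a in enumerate(alphas, start=1)
        )
        x[t] = survivors + rng.poisson(mu)
    return x[p:]

x_inar = simulate_inar(alphas=[0.3, 0.2], mu=2.0, n_steps=5000)
print(x_inar[:20], x_inar.mean())
```

When the thinning probabilities sum to less than one, the long-run mean should be near `mu / (1 - sum(alphas))`. Note that under binomial thinning each past individual contributes at most one event per lag, whereas a Poisson pseudo-intensity allows more, so the two constructions agree in mean but not necessarily in law.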
(Maybe it would be simpler to use Fokianos' GLM characterisation? I think they are equivalent or nearly equivalent in this case; certainly with stable distributions they are.)
Estimation of parameters
Well studied for finite-order GINAR(p) processes.
Influence kernels
Hardiman et al propose multiple-scale exponential kernels to simultaneously estimate decays and branching ratios. Bacry et al 2012 have a related nonparametric method based on estimating the kernel in the spectral domain. Convergence properties are unclear.
We are also free to use a sum-of-exponentials kernel, possibly calculating the branching ratio from that alone, and some measure of tail-heaviness from that.
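A discrete-time sum-of-exponentials kernel might be sketched as below. The parameterisation is an assumption made for illustration, not Hardiman et al's: each component is a normalised geometric pmf with decay `r = exp(-1 / tau)`, so the branching ratio is simply the sum of the weights, and the mass-weighted mean lag serves as one crude tail-heaviness summary.

```python
import math

def sum_of_exponentials_kernel(weights, timescales, n_lags):
    """phi(k) = sum_j w_j * (1 - r_j) * r_j**k, with r_j = exp(-1 / tau_j)."""
    decays = [math.exp(-1.0 / tau) for tau in timescales]
    return [
        sum(w * (1.0 - r) * r ** k for w, r in zip(weights, decays))
        for k in range(n_lags)
    ]

def branching_ratio(weights):
    """Total kernel mass; each component is normalised, so this is sum_j w_j."""
    return sum(weights)

def mean_lag(weights, timescales):
    """Mass-weighted mean delay of the kernel (a geometric pmf has mean r / (1 - r))."""
    decays = [math.exp(-1.0 / tau) for tau in timescales]
    total = sum(w * r / (1.0 - r) for w, r in zip(weights, decays))
    return total / sum(weights)

weights = [0.4, 0.3, 0.1]          # assumed component weights
timescales = [1.0, 10.0, 100.0]    # assumed decay timescales, in time steps
phi = sum_of_exponentials_kernel(weights, timescales, n_lags=5000)
print(branching_ratio(weights), mean_lag(weights, timescales))
```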
Possibly smooth-lasso (penalises component CHANGE)
Endo-exo models
Note that we can still recover the endo-exo model with this by simply calculating the projected ratio between exogenous and endogenous events. It would be interesting to derive the properties of this as a single parameter of interest.
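Under the assumption of a stationary linear model with background rate `mu` and total kernel mass `eta < 1` (both names are illustrative), the mean count per step is `mu / (1 - eta)`, so the endogenous share of events is just `eta`. A small worked sketch:

```python
def stationary_mean(mu, eta):
    """Mean count per step of a stationary linear model: mu / (1 - eta)."""
    if mu <= 0.0:
        raise ValueError("need mu > 0")
    if not 0.0 <= eta < 1.0:
        raise ValueError("need 0 <= eta < 1 for a stationary mean")
    return mu / (1.0 - eta)

def endo_exo_split(mu, eta):
    """Return (exogenous mean, endogenous mean, endogenous fraction)."""
    total = stationary_mean(mu, eta)
    exogenous = mu
    endogenous = total - exogenous
    return exogenous, endogenous, endogenous / total

exo, endo, frac = endo_exo_split(mu=2.0, eta=0.5)
print(exo, endo, frac)
```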
Short timescale process
We want the distribution within a bin to be plausibly a cluster process.
The distributions of subcritical processes are generally tedious to calculate, although we can get a nice form for the generating function for a geometric offspring distribution from HaJV05, p115.
Set and . We write for the G). Then the (noncritical) geometric offspring distribution branching process obeys the identity
This can get us a formula for the first two factorial moments, and hence the mean and variance, which is all we will bother with here.
Although, reading HaOa74 I see that the actual offspring distribution is Poisson. Maybe I should use Dwas69 to get the moments? Dominic Yeo has a great explanation as always.
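For the Poisson-offspring case, a hedged sketch: assuming each individual has Poisson(`eta`) offspring with `eta < 1`, the total progeny of a single ancestor follows the Borel distribution, with mean `1 / (1 - eta)` and variance `eta / (1 - eta)**3`. The simulation below checks those two moments by brute force. The function names and the `max_size` safety cap are illustrative assumptions.

```python
import numpy as np

def total_progeny(eta, rng, max_size=10**6):
    """Total size of a Galton-Watson cluster started by one ancestor,
    with Poisson(eta) offspring per individual."""
    total = alive = 1
    while alive > 0 and total < max_size:
        # The sum of `alive` i.i.d. Poisson(eta) draws is Poisson(eta * alive).
        alive = rng.poisson(eta * alive)
        total += alive
    return total

def borel_moments(eta):
    """Mean and variance of the total progeny for subcritical Poisson offspring."""
    return 1.0 / (1.0 - eta), eta / (1.0 - eta) ** 3

rng = np.random.default_rng(2)
sizes = np.array([total_progeny(0.5, rng) for _ in range(20_000)])
print(sizes.mean(), sizes.var(), borel_moments(0.5))
```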
Ideas
Consider the contagion process with immigration, where the immigration rate is proportional to another contagion process with a law from the same family (possibly with different parameters). Possibly many such processes, on a graph, e.g. a model for multiple “cities” or other discrete populations with some contagion between them. (I'm sure there is some evolutionary biology on this point, not just epidemiology.)
Can this be linked to general theory of coarse graining?
Refs
 SoSA09: (2009) A class of discrete distributions induced by stable laws. Statistics & Probability Letters, 79(14), 1608–1614.
 HaOa74: (1974) A cluster process representation of a self-exciting process. Journal of Applied Probability, 11(3), 493.
 Weiß09: (2009) A New Class of Autoregressive Models for Time Series of Binomial Counts. Communications in Statistics - Theory and Methods, 38(4), 447–460.
 CuLu09: (2009) A new look at time series of counts. Biometrika, 96(4), 781–792.
 Zege88: (1988) A regression model for time series of counts. Biometrika, 75(4), 621–629.
 BaSø94: (1994) A Review of Some Aspects of Asymptotic Likelihood Theory for Stochastic Processes. International Statistical Review / Revue Internationale de Statistique, 62(1), 133–165.
 FrMc04: (2004) Analysis of low count time series data by Poisson autoregression. Journal of Time Series Analysis, 25(5), 701–722.
 Arag12: (2012) Applied epidemiology using R. MedEpi Publishing. http://www.medepi.net/epir/index.html. Calendar Time. Accessed
 KvPa11: (2011) Asymptotic inference for partially observed branching processes. Advances in Applied Probability, 43(4), 1166–1190.
 Mcke86: (1986) Autoregressive Moving-Average Processes with Negative-Binomial and Geometric Marginal Distributions. Advances in Applied Probability, 18(3), 679–705.
 NaWa84: (1984) Branching processes. Stochastic Processes and Their Applications, 18(2), 189.
 LeHJ08: (2008) Continuous and discrete stable processes. Physical Review E, 77(1), 011109.
 StHa79: (1979) Discrete Analogues of Self-Decomposability and Stability. The Annals of Probability, 7(5), 893–899.
 Mcke03: (2003) Discrete variate time series. In Handbook of Statistics (Vol. 21, pp. 573–606). Elsevier.
 DrAW09: (2009) Efficient estimation of autoregression parameters and innovation distributions for semiparametric integer-valued AR(p) models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 467–485.
 WeWi90: (1990) Estimation of the Means in the Branching Process with Immigration. The Annals of Statistics, 18(4), 1757–1773.
 Winn91: (1991) Estimation of the variances in the branching process with immigration. Probability Theory and Related Fields, 88(1), 77–106.
 Lato98: (1998) Existence and Stochastic Structure of a Non-negative Integer-valued Autoregressive Process. Journal of Time Series Analysis, 19(4), 439–455.
 HaSC09: (2009) Extremes of integer-valued moving average sequences. TEST, 19(2), 359–374.
 AlAl92: (1992) First order autoregressive time series with negative binomial and geometric marginals. Communications in Statistics - Theory and Methods, 21(9), 2483–2492.
 AlAl88: (1988) First-Order Integer-Valued Autoregressive (INAR (1)) Process: Distributional and Regression Properties. Statistica Neerlandica, 42(1), 53–61.
 AlAl87: (1987) First-Order Integer-Valued Autoregressive (INAR(1)) Process. Journal of Time Series Analysis, 8(3), 261–275.
 ZhBa08: (2008) First-order observation-driven integer-valued autoregressive processes. Statistics & Probability Letters, 78(1), 1–9.
 ZhBD07: (2007) First-order random coefficient integer-valued autoregressive processes. Journal of Statistical Planning and Inference, 137(1), 212–229.
 KrPa14: (2014) Frequentist estimation of an epidemic’s spreading potential when observations are scarce. Biometrika, 101(1), 141–154.
 EiDD16: (2016) Graphical Modeling for Multivariate Hawkes Processes with Nonparametric Link Functions. Journal of Time Series Analysis, n/a-n/a.
 SaEb94: (1994) Identification and characterization of rhythmic nociceptive and non-nociceptive spinal dorsal horn neurons in the rat. Neuroscience, 61(4), 991–1006.
 LaDG09: (2009) Inference for Partially Observed Multitype Branching Processes and Ecological Applications. ArXiv:0902.4520 [Stat].
 MoSP12: (2012) Integer-Valued Self-Exciting Threshold Autoregressive Processes. Communications in Statistics - Theory and Methods, 41(15), 2717–2737.
 PaSa17: (2017) Large deviation principle for epidemic models. Journal of Applied Probability, 54(3), 905–920.
 PaSa16: (2016) Large deviation principle for Poisson driven SDEs in epidemic models. ArXiv:1606.01619 [Math].
 KrPa16: (2016) Large deviations for infectious diseases models. ArXiv:1602.02803 [Math].
 ZeQa88: (1988) Markov Regression Models for Time Series: A Quasi-Likelihood Approach. Biometrics, 44(4), 1019–1031.
 BiSø95: (1995) Martingale Estimation Functions for Discretely Observed Diffusion Processes. Bernoulli, 1(1/2), 17–39.
 BhAd81: (1981) Maximum Likelihood Estimation for Branching Processes with Immigration. Advances in Applied Probability, 13(3), 498–509.
 Böck98: (1998) Mixed INAR(1) Poisson regression models: Analyzing heterogeneity and serial dependencies in longitudinal count data. Journal of Econometrics, 89(1–2), 317–338.
 TuSB14: (2014) Models for Integer-Valued Time Series. In Non-Linear Time Series (pp. 199–244). Springer International Publishing.
 KeFo02: (2002) Regression models for time series analysis. Chichester; Hoboken, NJ: John Wiley & Sons.
 HaSV82: (1982) Self-decomposable discrete distributions and branching processes. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete, 61(1), 97–118.
 Mcke88: (1988) Some ARMA Models for Dependent Sequences of Poisson Counts. Advances in Applied Probability, 20(4), 822–835.
 Foki11: (2011) Some recent progress in count time series. Statistics, 45(1), 49–58.
 HaSt93: (1993) Stability equations for processes with stationary independent increments using branching processes and Poisson mixtures. Stochastic Processes and Their Applications, 45(2), 209–230.
 AlBo05: (2005) Stationary solutions for integer-valued autoregressive processes. International Journal of Mathematics and Mathematical Sciences, 2005(1), 1–18.
 PrMW09: (2009) Subsampling effects in neuronal avalanche distributions recorded in vivo. BMC Neuroscience, 10(1), 40.
 GeHW06: (2006) The Rate Adapting Poisson Model for Information Retrieval and Object Recognition. In Proceedings of the 23rd International Conference on Machine Learning (pp. 337–344). New York, NY, USA: ACM.
 GeKa04: (2004) The Shape of Large Galton-Watson Trees with Possibly Infinite Variance. Random Struct. Algorithms, 25(3), 311–335.
 Bhat87: (1987) The Time to Extinction of Branching Processes and Log-Convexity: I. Probability in the Engineering and Informational Sciences, 1(03), 265–278.
 Dwas69: (1969) The Total Progeny in a Branching Process and a Related Random Walk. Journal of Applied Probability, 6(3), 682–686.
 Weiß08: (2008) Thinning operations for modeling time series of counts—a survey. Advances in Statistical Analysis, 92(3), 319–341.