# Generalized Galton-Watson processes

Working through some generalisations of the Galton-Watson process as an INAR process.

Consider

• van Harn & Steutel’s work on “F-stable branching processes.” Also bounded influence kernel?
• Lee, Hopcraft, Jakeman and Williams on discrete stable processes. Discrete state, continuous time - How do these differ from the usual Hawkes processes, if at all?

## Long Memory Galton-Watson

For my own edification and amusement I would like to walk through the construction of a particular analogue of the continuous time Hawkes point process on a discrete index set.

Specifically, a non-Markovian generalisation of the Galton-Watson process which still operates in quantised time, but has interesting, possibly-unbounded influence kernels, like the Hawkes process.

I denote a realisation of the process $\{N_t\}_{t\in\mathbb{N}}$, the associated non-negative increment process $\{X_t\}\equiv\{N_t-N_{t-1}\}$, and a conditional non-negative pseudo-intensity process $\lambda_t\equiv g(\{N_s\}_{s \lt t})$, adapted to the whole history $\{N_s\}_{s \lt t}$. By “pseudo-intensity” I mean that the innovation law $X_t\sim\mathcal{L}_t$ is parameterised (solely, for now) by the scalar-valued process $\lambda_t$. That is, $X_t|\{N_s\}_{s \lt t}\sim \mathcal{L}(\lambda_t)$. For the moment I will take this to be Poisson. To complete the analogy with the Hawkes process I choose the dependence on the past values of the process to be linear, with influence kernel $\phi$ (this is also close to clustering, and indeed there are lots of papers noticing the connection):

\begin{equation*} \lambda_t\equiv \phi * X \end{equation*}

Then a linear conditional intensity process $\lambda_t$ would be

\begin{equation*} \lambda_t := \mu + \eta\sum_{0 \leq s <t} \phi(t-s-1)X_s \end{equation*}

The sum runs over the increments $X_s$, matching the convolution $\phi * X$. The $-1$ in $\phi(t-s-1)$ is to make sure our influence kernel is defined on $\mathbb{N}_0$, which is convenient for typical count distribution functions.
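
A minimal simulation sketch of this recursion, assuming Poisson innovations, a lag index of $t-s-1$, and a kernel acting on the increments $X_s$ (as in $\phi * X$). The function names, the geometric kernel and the parameter values are my own illustrative choices. The Poisson sampler is Knuth's simple multiplication method, which is only sensible for small rates.

```python
import math
import random


def poisson(rate, rng):
    """Draw one Poisson(rate) variate by Knuth's multiplication method."""
    if rate <= 0:
        return 0
    threshold = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def intensity(mu, eta, phi, xs):
    """lambda_t = mu + eta * sum_{s<t} phi(t-s-1) * X_s, where t = len(xs)."""
    t = len(xs)
    return mu + eta * sum(phi(t - s - 1) * x for s, x in enumerate(xs))


def simulate(mu, eta, phi, n_steps, seed=0):
    """Simulate increments X_0..X_{n_steps-1} with X_t | past ~ Poisson(lambda_t)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n_steps):
        xs.append(poisson(intensity(mu, eta, phi, xs), rng))
    return xs


def geometric_kernel(i, r=0.5):
    """Hypothetical kernel: geometric pmf on N_0, which sums to one."""
    return (1 - r) * r ** i


# eta times the kernel mass is 0.5 here, so the process is subcritical.
xs = simulate(mu=1.0, eta=0.5, phi=geometric_kernel, n_steps=200)
```

Recomputing the full sum at each step costs $O(t)$ per step. That is fine for a sketch, but a geometric kernel would allow an $O(1)$ recursive update.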

If the kernel has bounded support such that

\begin{equation*} s\geq p\Rightarrow\phi(s)=0 \end{equation*}

then $\lambda_t$ depends only on the $p$ most recent time steps and we have an autoregressive count process of order $p$. More on that in a moment.

What influence kernel shape will we use?

Geometric distributions are natural, although the kernel doesn’t have to be strictly monotonic, or even unimodal. Poisson or negative binomial would also work. We could in general use an arbitrary probability mass function as the influence kernel, or use a nonparametric form. One flexible choice is a mixture of exponentials (that is, of geometric decays),

\begin{equation*} \phi_\text{Exp}(i) = \sum_{0 \leq k <K} b_ke^{a_ki} \end{equation*}

for some $\{a_k, b_k\}$, with $a_k<0$ so that the kernel is summable.
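
A small sketch of this kernel and its total mass, assuming every $a_k<0$ so that each component is a convergent geometric series with sum $b_k/(1-e^{a_k})$. The helper names and the two-scale parameter values are hypothetical, chosen only for illustration.

```python
import math


def exp_mixture_kernel(a, b):
    """Return phi(i) = sum_k b[k] * exp(a[k] * i) for lag i in N_0."""
    def phi(i):
        return sum(bk * math.exp(ak * i) for ak, bk in zip(a, b))
    return phi


def kernel_mass(a, b):
    """Total mass sum_{i>=0} phi(i); finite only if every a[k] < 0."""
    if any(ak >= 0 for ak in a):
        raise ValueError("every a_k must be negative for the kernel to be summable")
    return sum(bk / (1.0 - math.exp(ak)) for ak, bk in zip(a, b))


# Hypothetical two-scale kernel: one fast and one slow decay.
a = [math.log(0.5), math.log(0.9)]
b = [0.25, 0.05]
phi = exp_mixture_kernel(a, b)
mass = kernel_mass(a, b)  # 0.25/0.5 + 0.05/0.1, i.e. about 1.0
```

With the mass normalised to one, the multiplier $\eta$ in the intensity plays the role of the branching ratio.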

If we expect to be using sparsifying lasso penalties for such a kernel we probably want to decompose the kernel in a way that minimises correlation between mixture components, to improve our odds of correctly identifying dependency at different scales. If we constrain the components to be non-negative, the only way for them to be completely orthogonal is to have disjoint supports.

Intermediately, we could choose a Poisson mixture

\begin{equation*} \phi_\text{Pois}(i) = \sum_{0 \leq k <K} \frac{a_k^i}{i!} e^{-a_k} \end{equation*}

There is a subtlety here with regard to the filtration - do we set up the kernel strictly to regard triggering events at previous timesteps? If so, no problem. If we want to allow same-day triggering, we might allow the exogenous events to also contribute to the kernel, in which case we might have to estimate an extra influence parameter, or find some principled way to include it in the kernel weights.

TODO: unconditional distribution using, e.g. generator fns.

## Autoregressive characterisation

Turns out Steutel and van Harn saw me coming here, and characterised this process in 1979 - see StHa79. (Wait - is this strictly true, that we can make this go with a thinning operator? Many related definitions here, muddying the waters)

We need their binomial thinning operator $\odot$, which is defined for some count RV $X$ by

\begin{equation*} \alpha\odot X = \sum_{i=1}^X N_i \end{equation*}

for $N_i$ independent $\text{Bernoulli}(\alpha)$ RVs.

In terms of generating functions,

$G_{\alpha\odot X}(s)=G_{X}(1-\alpha+\alpha s)$
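
A quick sketch of the operator and of the generating-function identity, using a Poisson count as the example, where the identity says that thinning a $\text{Poisson}(\lambda)$ variable gives a $\text{Poisson}(\alpha\lambda)$ one. The function names and numerical values are my own choices.

```python
import math
import random


def thin(alpha, x, rng):
    """Binomial thinning: keep each of the x individuals independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson_pgf(rate, s):
    """Probability generating function of Poisson(rate): E[s^X] = exp(rate * (s - 1))."""
    return math.exp(rate * (s - 1.0))


rng = random.Random(1)
alpha, rate, s = 0.3, 4.0, 0.6

# Left side of the identity: G_X evaluated at 1 - alpha + alpha * s, X ~ Poisson(rate).
lhs = poisson_pgf(rate, 1.0 - alpha + alpha * s)
# Right side: pgf of Poisson(alpha * rate), so thinning maps Poisson to Poisson.
rhs = poisson_pgf(alpha * rate, s)

# Thinning a fixed count of 10000 should keep roughly alpha * 10000 individuals.
kept = thin(alpha, 10_000, rng)
```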

There are many generalisations of this operator - see Weiß08 for an overview.

Anyway, you can use this thinning operator to construct an autoregressive time series model driven by thinned versions of its history.
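
As a sketch of what such a model looks like, here is the commonly used INAR($p$) form $X_t = \sum_{k=1}^{p}\alpha_k\odot X_{t-k} + \varepsilon_t$ with Poisson innovations. I am assuming this form for illustration rather than claiming it is exactly the StHa79 construction, and the function names are hypothetical. Its conditional mean is $\mu+\sum_k\alpha_k X_{t-k}$, which matches the linear intensity if $\alpha_k=\eta\phi(k-1)$. The conditional law is a sum of binomials plus a Poisson rather than a single Poisson, though.

```python
import math
import random


def thin(alpha, x, rng):
    """Binomial thinning: keep each of the x individuals independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(rate, rng):
    """Draw one Poisson(rate) variate by Knuth's multiplication method (small rates only)."""
    if rate <= 0:
        return 0
    threshold = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar(alphas, mu, n_steps, init, seed=0):
    """Simulate X_t = sum_k alphas[k] (thinned) X_{t-1-k} + eps_t, eps_t ~ Poisson(mu)."""
    if len(init) < len(alphas):
        raise ValueError("need at least len(alphas) initial values")
    rng = random.Random(seed)
    xs = list(init)
    for _ in range(n_steps):
        # alphas[0] acts on the most recent count, alphas[1] on the one before, etc.
        offspring = sum(thin(a, xs[-1 - k], rng) for k, a in enumerate(alphas))
        xs.append(offspring + poisson(mu, rng))
    return xs[len(init):]


# Hypothetical INAR(2) with total thinning weight 0.7, so the series is stable.
path = simulate_inar(alphas=[0.5, 0.2], mu=1.0, n_steps=100, init=[0, 0])
```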

(Maybe it would be simpler to use Fokianos’ GLM characterisation? I think they are equivalent or nearly equivalent in this case - certainly with stable distributions they are.)

## Estimation of parameters

Well studied for finite-order GINAR(p) processes.

## Influence kernels

Hardiman et al propose multiple-scale exponential kernels to simultaneously estimate decays and branching ratios. Bacry et al 2012 have a related nonparametric method based on estimating the kernel in the spectral domain. Convergence properties are unclear.

We are also free to use a sum-of-exponentials kernel, possibly calculating the branching ratio from that alone, and some measure of tail-heaviness from that.

Possibly Smooth-lasso (penalises component CHANGE)

## Endo-exo models

Note that we can still recover the endo-exo model with this by simply calculating the projected ratio between exogenous and endogenous events. It would be interesting to derive the properties of this as a single parameter of interest.
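
A rough sketch of that ratio, assuming the linear intensity $\lambda_t = \mu + \eta\sum_{s<t}\phi(\cdot)X_s$ with a summable kernel and a process that has settled into a stationary regime. The symbols $n$ and $m$ are my own shorthand. Write $n=\eta\sum_{i\geq 0}\phi(i)$ for the branching ratio and $m=\mathbb{E}X_t$ for the stationary mean increment. Taking expectations of the intensity gives

\begin{equation*} m = \mu + n\,m \quad\Rightarrow\quad m = \frac{\mu}{1-n}, \qquad n<1 \end{equation*}

so the exogenous share of events is $\mu/m = 1-n$ and the endogenous share is $n$. For example, $\mu=1$ and $n=0.8$ give $m=5$ events per step, of which on average one is exogenous.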

## Short timescale process

We want the distribution within a bin to be plausibly a cluster process.

The distributions of subcritical processes are generally tedious to calculate, although we can get a nice form for the generating function for a geometric offspring distribution from HaJV05, p115.

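Set $\frac{1}{\lambda+1}=p$ and $q=1-p$, so that the offspring law $pq^k$, $k\in\mathbb{N}_0$, has mean $\lambda$ and generating function $G(s)=p/(1-qs)$. We write $G^n\equiv G\circ G\circ \dots \circ G\circ G$ for the $n$-fold composition of $G$. Then the (non-critical) geometric offspring distribution branching process obeys the identity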

\begin{equation*} 1-G^n(s;\lambda) = \frac{\lambda^n(\lambda-1)(1-s)}{\lambda(\lambda^n-1)(1-s)+\lambda-1} \end{equation*}

This can get us a formula for the first two factorial moments, and hence the mean and variance, which is all we will bother with here.
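
As a sanity check, and a sketch of the moments the identity implies: expanding the right-hand side in powers of $(1-s)$ gives, if I have the algebra right, $\mathbb{E}Z_n=\lambda^n$ and $\operatorname{Var}Z_n=\lambda^n(\lambda+1)(\lambda^n-1)/(\lambda-1)$, where $Z_n$ (my notation) is the size of generation $n$ started from one individual. The code below composes $G(s)=p/(1-qs)$ numerically and compares it with the closed form. All function names and test values are illustrative.

```python
import math


def offspring_pgf(s, lam):
    """pgf of the geometric offspring law on N_0 with mean lam: G(s) = p / (1 - q*s)."""
    p = 1.0 / (lam + 1.0)
    q = 1.0 - p
    return p / (1.0 - q * s)


def iterate_pgf(s, lam, n):
    """n-fold composition G^n(s), i.e. the pgf of the n-th generation size."""
    for _ in range(n):
        s = offspring_pgf(s, lam)
    return s


def closed_form(s, lam, n):
    """Right-hand side of the identity for 1 - G^n(s; lam), valid for lam != 1."""
    u = 1.0 - s
    return lam**n * (lam - 1.0) * u / (lam * (lam**n - 1.0) * u + lam - 1.0)


def mean_and_variance(lam, n):
    """Mean and variance of generation n, read off from the expansion in (1 - s)."""
    mean = lam**n
    var = lam**n * (lam + 1.0) * (lam**n - 1.0) / (lam - 1.0)
    return mean, var


lam, n, s = 0.8, 5, 0.3
# Difference between the iterated pgf and the closed form; should be ~0.
gap = (1.0 - iterate_pgf(s, lam, n)) - closed_form(s, lam, n)
```

For $n=1$ the variance reduces to $\lambda(\lambda+1)$, the variance of the geometric offspring law itself.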

Although, reading HaOa74 I see that the actual offspring distribution is Poisson. Maybe I should use Dwas69 to get the moments? Dominic Yeo has a great explanation as always.

## Ideas

Consider the contagion process with immigration, where the immigration rate is itself proportional to a contagion process with a law from the same family (possibly with different parameters). Possibly many such, on a graph, e.g. a model for multiple “cities” or other discrete populations with some contagion between them. (I’m sure there is some evolutionary biology on this point, not just epidemiology.)

Can this be linked to general theory of coarse graining?

## References

AlAl92
Al-Osh, M. A., & Aly, E.-E. A. A. (1992) First order autoregressive time series with negative binomial and geometric marginals. Communications in Statistics - Theory and Methods, 21(9), 2483–2492. DOI.
AlAl87
Al-Osh, M. A., & Alzaid, A. A. (1987) First-Order Integer-Valued Autoregressive (INAR(1)) Process. Journal of Time Series Analysis, 8(3), 261–275. DOI.
AlBo05
Aly, E.-E. A. A., & Bouzar, N. (2005) Stationary solutions for integer-valued autoregressive processes. International Journal of Mathematics and Mathematical Sciences, 2005(1), 1–18. DOI.
AlAl88
Alzaid, A., & Al-Osh, M. (1988) First-Order Integer-Valued Autoregressive (INAR (1)) Process: Distributional and Regression Properties. Statistica Neerlandica, 42(1), 53–61. DOI.
BaSø94
Barndorff-Nielsen, O. E., & Sørensen, M. (1994) A Review of Some Aspects of Asymptotic Likelihood Theory for Stochastic Processes. International Statistical Review / Revue Internationale de Statistique, 62(1), 133–165. DOI.
BhAd81
Bhat, B. R., & Adke, S. R. (1981) Maximum Likelihood Estimation for Branching Processes with Immigration. Advances in Applied Probability, 13(3), 498–509. DOI.
Bhat87
Bhattacharjee, M. C. (1987) The Time to Extinction of Branching Processes and Log-Convexity: I. Probability in the Engineering and Informational Sciences, 1(03), 265–278. DOI.
BiSø95
Bibby, B. M., & Sørensen, M. (1995) Martingale Estimation Functions for Discretely Observed Diffusion Processes. Bernoulli, 1(1/2), 17–39. DOI.
Böck98
Böckenholt, U. (1998) Mixed INAR(1) Poisson regression models: Analyzing heterogeneity and serial dependencies in longitudinal count data. Journal of Econometrics, 89(1–2), 317–338. DOI.
CuLu09
Cui, Y., & Lund, R. (2009) A new look at time series of counts. Biometrika, 96(4), 781–792. DOI.
DrAW09
Drost, F. C., Akker, R. van den, & Werker, B. J. M. (2009) Efficient estimation of auto-regression parameters and innovation distributions for semiparametric integer-valued AR(p) models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 467–485. DOI.
Dwas69
Dwass, M. (1969) The Total Progeny in a Branching Process and a Related Random Walk. Journal of Applied Probability, 6(3), 682–686. DOI.
Foki11
Fokianos, K. (2011) Some recent progress in count time series. Statistics, 45(1), 49–58. DOI.
FrMc04
Freeland, R. K., & McCabe, B. P. M. (2004) Analysis of low count time series data by Poisson autoregression. Journal of Time Series Analysis, 25(5), 701–722. DOI.
FuBa02
Fukasawa, T., & Basawa, I. V. (2002) Estimation for a class of generalized state-space time series models. Statistics & Probability Letters, 60(4), 459–473. DOI.
GeHW06
Gehler, P. V., Holub, A. D., & Welling, M. (2006) The Rate Adapting Poisson Model for Information Retrieval and Object Recognition. In Proceedings of the 23rd International Conference on Machine Learning (pp. 337–344). New York, NY, USA: ACM. DOI.
GeKa04
Geiger, J., & Kauffmann, L. (2004) The Shape of Large Galton-Watson Trees with Possibly Infinite Variance. Random Struct. Algorithms, 25(3), 311–335. DOI.
HaJV05
Haccou, P., Jagers, P., & Vatutin, V. A. (2005) Branching processes: variation, growth, and extinction of populations. (Digitally printed version). Cambridge; New York: Cambridge University Press.
HaSC09
Hall, A., Scotto, M., & Cruz, J. (2009) Extremes of integer-valued moving average sequences. TEST, 19(2), 359–374. DOI.
HaOa74
Hawkes, A. G., & Oakes, D. (1974) A cluster process representation of a self-exciting process. Journal of Applied Probability, 11(3), 493. DOI.
KrPa14
Kraus, A., & Panaretos, V. M. (2014) Frequentist estimation of an epidemic’s spreading potential when observations are scarce. Biometrika, 101(1), 141–154. DOI.
KvPa11
Kvitkovičová, A., & Panaretos, V. M. (2011) Asymptotic inference for partially observed branching processes. Advances in Applied Probability, 43(4), 1166–1190. DOI.
LaDG09
Laredo, C., David, O., & Garnier, A. (2009) Inference for Partially Observed Multitype Branching Processes and Ecological Applications. arXiv:0902.4520 [stat].
Lato98
Latour, A. (1998) Existence and Stochastic Structure of a Non-negative Integer-valued Autoregressive Process. Journal of Time Series Analysis, 19(4), 439–455. DOI.
LeHJ08
Lee, W. H., Hopcraft, K. I., & Jakeman, E. (2008) Continuous and discrete stable processes. Physical Review E, 77(1), 011109. DOI.
Mcke86
McKenzie, E. (1986) Autoregressive Moving-Average Processes with Negative-Binomial and Geometric Marginal Distributions. Advances in Applied Probability, 18(3), 679–705. DOI.
Mcke88
McKenzie, E. (1988) Some ARMA Models for Dependent Sequences of Poisson Counts. Advances in Applied Probability, 20(4), 822–835. DOI.
Mcke03
McKenzie, E. (2003) Discrete variate time series. In Handbook of Statistics, C. Rao and D. Shanbhag, Eds., Elsevier Science, Amsterdam, 573–606. MR1973555. Citeseer.
MoSP12
Monteiro, M., Scotto, M. G., & Pereira, I. (2012) Integer-Valued Self-Exciting Threshold Autoregressive Processes. Communications in Statistics - Theory and Methods, 41(15), 2717–2737. DOI.
NaWa84
Nanthi, K., & Wasan, M. T. (1984) Branching processes. Stochastic Processes and Their Applications, 18(2), 189. DOI.
NaRB12
Nastić, A. S., Ristić, M. M., & Bakouch, H. S. (2012) A combined geometric INAR(p) model based on negative binomial thinning. Mathematical and Computer Modelling, 55(5–6), 1665–1672. DOI.
PrMW09
Priesemann, V., Munk, M. H., & Wibral, M. (2009) Subsampling effects in neuronal avalanche distributions recorded in vivo. BMC Neuroscience, 10(1), 40. DOI.
RiNB12
Ristić, M. M., Nastić, A. S., & Bakouch, H. S. (2012) Estimation in an Integer-Valued Autoregressive Process with Negative Binomial Marginals (NBINAR(1)). Communications in Statistics - Theory and Methods, 41(4), 606–618. DOI.
SiSi06
Silva, I., & Silva, M. E. (2006) Asymptotic distribution of the Yule–Walker estimator for INAR processes. Statistics & Probability Letters, 76(15), 1655–1663. DOI.
SoSA09
Soltani, A. R., Shirvani, A., & Alqallaf, F. (2009) A class of discrete distributions induced by stable laws. Statistics & Probability Letters, 79(14), 1608–1614. DOI.
StHa79
Steutel, F. W., & van Harn, K. (1979) Discrete Analogues of Self-Decomposability and Stability. The Annals of Probability, 7(5), 893–899. DOI.
TuSB14
Turkman, K. F., Scotto, M. G., & Bermudez, P. de Z. (2014) Models for Integer-Valued Time Series. In Non-Linear Time Series (pp. 199–244). Springer International Publishing.
HaSt93
van Harn, K., & Steutel, F. W. (1993) Stability equations for processes with stationary independent increments using branching processes and Poisson mixtures. Stochastic Processes and Their Applications, 45(2), 209–230. DOI.
HaSV82
van Harn, K., Steutel, F. W., & Vervaat, W. (1982) Self-decomposable discrete distributions and branching processes. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete, 61(1), 97–118. DOI.
WeWi90
Wei, C. Z., & Winnicki, J. (1990) Estimation of the Means in the Branching Process with Immigration. The Annals of Statistics, 18(4), 1757–1773. DOI.
Weiß08
Weiß, C. H. (2008) Thinning operations for modeling time series of counts—a survey. AStA Advances in Statistical Analysis, 92(3), 319–341. DOI.
Weiß09
Weiß, C. H. (2009) A New Class of Autoregressive Models for Time Series of Binomial Counts. Communications in Statistics - Theory and Methods, 38(4), 447–460. DOI.
Winn91
Winnicki, J. (1991) Estimation of the variances in the branching process with immigration. Probability Theory and Related Fields, 88(1), 77–106. DOI.
Zege88
Zeger, S. L. (1988) A regression model for time series of counts. Biometrika, 75(4), 621–629. DOI.
ZeQa88
Zeger, S. L., & Qaqish, B. (1988) Markov Regression Models for Time Series: A Quasi-Likelihood Approach. Biometrics, 44(4), 1019–1031. DOI.
ZhBa08
Zheng, H., & Basawa, I. V. (2008) First-order observation-driven integer-valued autoregressive processes. Statistics & Probability Letters, 78(1), 1–9. DOI.
ZhBD06
Zheng, H., Basawa, I. V., & Datta, S. (2006) Inference for pth-order random coefficient integer-valued autoregressive processes. Journal of Time Series Analysis, 27(3), 411–440. DOI.
ZhBD07
Zheng, H., Basawa, I. V., & Datta, S. (2007) First-order random coefficient integer-valued autoregressive processes. Journal of Statistical Planning and Inference, 137(1), 212–229. DOI.