# The interpretation of densities as intensities and vice versa

## Densities

Consider the problem of estimating the common density \(f(x)=dF(x)\) of i.i.d. random variables \(\{X_i\}_{i\leq n}\in \mathbb{R}^d\), with distribution function \(F:\mathbb{R}^d\rightarrow[0,1]\), from \(n\) realisations of those variables, \(\{x_i\}_{i\leq n}\). We assume the distribution is absolutely continuous with respect to the Lebesgue measure \(\mu\), i.e. \(\mu(A)=0\Rightarrow P(X_i\in A)=0\). Amongst other things, this implies that \(P(X_i=X_j)=0\text{ for }i\neq j\) and that the density exists as a standard function (i.e. we do not need to consider generalised functions such as distributions to handle atoms in \(F\), etc.)

Here we will give the density a finite parameter vector \(\theta\), i.e. \(f(x;\theta)=dF(x;\theta)\), whose value completely characterises the density; the problem of estimating the density is then the same as the one of estimating \(\theta.\)

In the method of maximum likelihood estimation we seek the parameter value under which the observed data are most probable. That is, we choose a parameter estimate \(\hat{\theta}\) to satisfy

\[ \begin{align*} \hat{\theta} &:=\operatorname{argmax}_\theta\prod_i f(x_i;\theta)\\\\ &=\operatorname{argmax}_\theta\sum_i \log f(x_i;\theta) \end{align*} \]
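As a concrete (hypothetical) sketch of this maximisation, here is a minimal numerical MLE for a univariate Gaussian, parameterised as \(\theta=(\mu,\log\sigma)\) and fitted by minimising the negative log-likelihood; all names and the choice of family are illustrative, not part of the text above.

```python
# Minimal sketch: maximum-likelihood estimation of theta by minimising
# the negative log-likelihood, for an illustrative Gaussian family.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=500)  # n realisations {x_i}

def neg_log_likelihood(theta):
    mu, log_sigma = theta
    # -sum_i log f(x_i; theta); log-sigma parameterisation keeps sigma > 0
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)  # close to the true (2.0, 0.5)
```

The product form and the log form of the objective give the same maximiser, which is why the optimisation is done on the sum of log-densities.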

Let’s consider the case where we try to estimate this function by constructing it from some basis of \(p\) functions \(\phi_j:\mathbb{R}^d\rightarrow\mathbb{R}\). Two questions arise:

1. Can [functional data analysis](functional_data.md) get me here? (see Ch21 of Ramsay and Silverman)
2. Can I use point process estimation theory to improve density estimation? After all, normal point-process estimation claims to be an un-normalised version of density estimation. Lies11 draws some parallels there, esp. with mixture models.
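One common way to build a density from a basis of \(p\) functions is an exponential-family (log-linear) expansion, \(f(x;\theta)\propto\exp\bigl(\sum_j \theta_j\phi_j(x)\bigr)\), with the normaliser handled numerically. The sketch below is a hypothetical one-dimensional illustration with a polynomial basis on \([0,1]\); the basis choice, data, and quadrature are all assumptions for the example.

```python
# Hypothetical sketch: density on [0, 1] built from a basis expansion,
#   f(x; theta) = exp( sum_j theta_j * phi_j(x) ) / Z(theta),
# with polynomial basis phi_j(x) = x^j and a Riemann-sum normaliser,
# fitted by maximum likelihood.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.beta(2.0, 5.0, size=400)        # data on [0, 1]
grid = np.linspace(0.0, 1.0, 2001)      # quadrature grid
dx = grid[1] - grid[0]

def basis(t, p=4):
    # phi_j(t) = t^j for j = 1..p (phi_0 is absorbed into Z)
    return np.stack([t ** j for j in range(1, p + 1)], axis=-1)

def neg_log_likelihood(theta):
    z = np.sum(np.exp(basis(grid) @ theta)) * dx  # normaliser Z(theta)
    return -(np.sum(basis(x) @ theta) - len(x) * np.log(z))

theta_hat = minimize(neg_log_likelihood, x0=np.zeros(4)).x
f_hat = np.exp(basis(grid) @ theta_hat)
f_hat /= np.sum(f_hat) * dx             # normalised density estimate on grid
```

Because the model is an exponential family in \(\theta\), the negative log-likelihood is convex, and at the optimum the model's moments \(\int\phi_j(x)f(x;\hat\theta)\,dx\) match the empirical moments of the data.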

## References

- Reyn03: (2003) Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. *Probability Theory and Related Fields*, 126(1).
- Nore10: (2010) Approximation of conditional densities by smooth mixtures of regressions. *The Annals of Statistics*, 38(3), 1733–1766.
- BaSh91: (1991) Approximation of Density Functions by Sequences of Exponential Families. *The Annals of Statistics*, 19(3), 1347–1369.
- Efro07: (2007) Conditional density estimation in a regression setting. *The Annals of Statistics*, 35(6), 2504–2535.
- STSK10: (2010) Conditional density estimation via least-squares density ratio estimation. In International Conference on Artificial Intelligence and Statistics (pp. 781–788).
- Scho05: (2005) Consistent parametric estimation of the intensity of a spatial–temporal point process. *Journal of Statistical Planning and Inference*, 128(1), 79–93.
- HDYD12: (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. *IEEE Signal Processing Magazine*, 29(6), 82–97.
- Elli91: (1991) Density estimation for point processes. *Stochastic Processes and Their Applications*, 39(2), 345–358.
- Cast03: (2003) Density estimation via exponential model selection. *IEEE Transactions on Information Theory*, 49(8), 2052–2060.
- BeDi89: (1989) Estimating Weighted Integrals of the Second-Order Intensity of a Spatial Point Process. *Journal of the Royal Statistical Society. Series B (Methodological)*, 51(1), 81–92.
- CuSS08: (2008) Fast Gaussian process methods for point process intensity estimation (pp. 192–199). ACM Press.
- EiMa96: (1996) Flexible smoothing with B-splines and penalties. *Statistical Science*, 11(2), 89–121.
- LeBa06: (2006) Information Theory and Mixing Least-Squares Regressions. *IEEE Transactions on Information Theory*, 52(8), 3396–3410.
- TTSN15: (2015) Integrating Gaussian mixtures into deep neural networks: softmax layer with hidden variables. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4285–4289). IEEE.
- ShSa06a: (2006a) Large Margin Gaussian Mixture Modeling for Phonetic Classification and Recognition. In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (Vol. 1, pp. I–I).
- ShSa06b: (2006b) Large margin hidden Markov models for automatic speech recognition. In Advances in neural information processing systems (pp. 1249–1256).
- AnRi79: (1979) Logistic Discrimination and Bias Correction in Maximum Likelihood Estimation. *Technometrics*, 21(1), 71–78.
- SaLe01: (2001) Multiplicative Updates for Classification by Mixture Models. In Advances in Neural Information Processing Systems (pp. 897–904).
- WiNo07: (2007) Multiscale Poisson Intensity and Density Estimation. *IEEE Transactions on Information Theory*, 53(9), 3171–3187.
- BrCZ10: (2010) Nonparametric regression in exponential families. *The Annals of Statistics*, 38(4), 2005–2046.
- Lies11: (2011) On Estimation of the Intensity Function of a Point Process. *Methodology and Computing in Applied Probability*, 14(3), 567–578.
- Efro96: (1996) On nonparametric regression for IID observations in a general setting. *The Annals of Statistics*, 24(3), 1126–1144.
- HeSN07: (2007) On the Equivalence of Gaussian HMM and Gaussian HMM-Like Hidden Conditional Random Fields. In Eighth Annual Conference of the International Speech Communication Association.
- Cox65: (1965) On the Estimation of the Intensity Function of a Stationary Point Process. *Journal of the Royal Statistical Society. Series B (Methodological)*, 27(2), 332–337.
- Ande75: (1975) Quadratic logistic discrimination. *Biometrika*, 62(1), 149–154.
- PaZe16: (2016) Separation of Amplitude and Phase Variation in Point Processes. *The Annals of Statistics*, 44(2), 771–812.
- GiKM08: (2008) Simulating point processes by intensity projection. In Simulation Conference, 2008. WSC 2008. Winter (pp. 560–568).
- Gu93: (1993) Smoothing Spline Density Estimation: A Dimensionless Automatic Algorithm. *Journal of the American Statistical Association*, 88(422), 495–504.
- Papa74: (1974) The conditional intensity of general point processes and an application to line processes. *Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete*, 28(3), 207–226.