You have data and/or predictions made up of nonnegative integers. What models can you fit to them?
I'm collecting some appropriate models for such data so that I can do regression which does not reduce to approximating them as Gaussian, or as Bernoulli, the extreme cases usually dealt with.
Also, is there a count-based formulation for nonnegative regression? Nonnegative matrix factorisations with an appropriate loss function, perhaps?
TODO:
 raid the document topic model literature for this. Surely those models are implicitly about count data? See string bags, and compare with Steyvers and Tenenbaum's semantic network model (StTe05).
 robust regression for all of these
All the distributions I discuss here have support unbounded above. Bounded distributions (e.g. vanilla Binomial) are for some other time. The exception is the Bernoulli RV, a.k.a. the biased coin, which is simple enough to sneak in.
For the details of these models in a time series context, see
 count time series
 or, a special case, Galton–Watson processes
A lot of this material is probably in JoKK05.
Poisson
The Poisson is reminiscent of the Gaussian for count data, in terms of the number of places that it pops up, and the vast number of things that have a limiting Poisson distribution.
Conveniently, it has only one parameter, which means you don't need to justify how you chose any other parameters, saving valuable time and thought. It's useful as a “null model”, in that the number of particles in a realisation of a point process without interaction will be Poisson-distributed, conditional upon the mean measure. Conversely, non-Poisson residuals are evidence that your model has failed to take out some kind of interaction or hidden variable.
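 Spelled
 $X \sim \operatorname{Poisson}(\lambda)$
 Pmf
 $P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!},\quad k = 0, 1, 2, \dots$
 Mean
 $\lambda$
 Variance
 $\lambda$
 Pgf
 $G(s) = \exp(\lambda(s-1))$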
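Since a Poisson variable has equal mean and variance, one crude way to act on the “non-Poisson residuals” remark is to look at the index of dispersion (variance over mean) of some counts. The following is a minimal sketch only; the function name and the two toy samples are hypothetical illustrations, not anything from the references.

```python
import numpy as np


def dispersion_index(counts):
    """Sample variance divided by sample mean (the Fano factor).

    Values near 1 are compatible with a Poisson model; values well
    above 1 suggest overdispersion, well below 1 underdispersion.
    """
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


clumpy = [0, 1, 1, 2, 3, 5, 8, 13]  # hypothetical, bursty-looking counts
regular = [2, 2, 3, 2, 3, 2, 2, 3]  # hypothetical, suspiciously even counts
print(dispersion_index(clumpy), dispersion_index(regular))
```

This is a descriptive statistic rather than a test; with small samples it is noisy.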
Negative Binomial
Nearly a century old! (GrYu20) A generic count data model which, unlike the Poisson, has both location-like and scale-like parameters, instead of only one parameter. This makes one feel less dirty about using a model more restrictive than standard linear regression, which sets the benchmark for being castigated for being too restrictive. It has a traditional rationale in terms of drawing balls from urns and such, which is of little interest here. The key point is that it is both flexible and uncontroversial.
It includes the Geometric as a special case when $r=1$, and the Gamma and Poisson as limiting cases. More precisely, the Poisson is a limiting case of the Polya distribution. For example, in the large-$k$ limit, it approximates the Gamma distribution, and, when the mean is held constant, in the large-$r$ limit, it approaches the Poisson. For fixed $r$ it is an exponential family.
For all that, it's still contrived, this model, and tedious to fit.
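 Spelled
 $X \sim \operatorname{NB}(r, p)$, counting the failures $k$ before the $r$-th success when each trial succeeds with probability $p$
 Pmf
 $P(X=k) = \binom{k+r-1}{k} p^r (1-p)^k,\quad k = 0, 1, 2, \dots$
 Mean
 $\frac{r(1-p)}{p}$
 Variance
 $\frac{r(1-p)}{p^2}$
 Pgf
 $G(s) = \left(\frac{p}{1-(1-p)s}\right)^r$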
Mean/dispersion parameterisation (Polya)
Commonly, where the parameter $r$ is not required to be a nonnegative integer, we call it a Polya model, and use it for overdispersed data, i.e. data that looks like a Poisson process if we are drunk, but whose variance is too big in comparison to its mean for soberer minds.
To see how that works we will reparameterise the model in terms of a “location” parameter $\lambda$ and a “dispersion”/scale-ish parameter $\alpha$, such that we can rewrite it.
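 Spelled
 $X \sim \operatorname{Polya}(\lambda, \alpha)$
 Pmf
 $P(X=k) = \frac{\Gamma(k+1/\alpha)}{\Gamma(k+1)\,\Gamma(1/\alpha)} \left(\frac{1}{1+\alpha\lambda}\right)^{1/\alpha} \left(\frac{\alpha\lambda}{1+\alpha\lambda}\right)^{k},\quad k = 0, 1, 2, \dots$
 Mean
 $\lambda$
 Variance
 $\lambda + \alpha\lambda^2$
 Pgf
 $G(s) = \left(1+\alpha\lambda(1-s)\right)^{-1/\alpha}$
The log Pmf is then
$\log P(X=k) = \log\Gamma(k+1/\alpha) - \log\Gamma(k+1) - \log\Gamma(1/\alpha) + k\log(\alpha\lambda) - (k+1/\alpha)\log(1+\alpha\lambda).$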
It will be apparent that all these log-gamma differences will be numerically unstable, so we need to use different approximations for them depending on the combination of $k$ and $\alpha$ values in play.
Aieee! Tedious!
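For concreteness, here is a minimal sketch of the naive evaluation of that log Pmf. It assumes the mean/dispersion parameterisation in which the mean is `lam` and the variance is `lam + alpha * lam**2`; the function name is hypothetical. It leans on `scipy.special.gammaln` and `numpy.log1p`, which is fine for moderate values, but it does nothing clever about the cancellation between the log-gamma terms when `1 / alpha` is huge relative to `k`, so treat it as a baseline rather than a robust implementation.

```python
import numpy as np
from scipy.special import gammaln


def polya_logpmf(k, lam, alpha):
    """Naive log Pmf of a Polya/negative binomial variable.

    Assumed parameterisation: mean lam, variance lam + alpha * lam**2.
    """
    k = np.asarray(k, dtype=float)
    r = 1.0 / alpha  # the (real-valued) classic negative binomial r
    return (
        gammaln(k + r) - gammaln(k + 1.0) - gammaln(r)
        + k * np.log(alpha * lam)
        - (k + r) * np.log1p(alpha * lam)
    )


# Sanity check on a long but finite grid: total mass and the first two moments.
k = np.arange(0, 2000)
p = np.exp(polya_logpmf(k, lam=3.0, alpha=0.5))
mean = (k * p).sum()
var = ((k - mean) ** 2 * p).sum()
print(p.sum(), mean, var)
```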
I can't understand why anyone bothers using the negative binomial as a model unless they have strong reason to think it is a true model; the generalised Poisson distribution (GPD, see below) is easier to fit, in that the log Pmf is numerically tractable for all parameter combinations even with large count values, and it's no less natural; IMO it usually has more plausible justifications than the Polya/NB model.
Geometric
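A discrete analogue of the exponential, i.e. the probability distribution of the number $X$ of Bernoulli trials before the first success, supported on the set $\{0, 1, 2, \dots\}$. Here $p$ is the success probability of each trial.
 Spelled
 $X \sim \operatorname{Geom}(p)$
 Pmf
 $P(X=k) = (1-p)^k p$
 Pgf
 $G(s) = \frac{p}{1-(1-p)s}$
 Mean
 $\frac{1-p}{p}$
 Variance
 $\frac{1-p}{p^2}$
Note that $\operatorname{Var}(X) = \mathbb{E}X + (\mathbb{E}X)^2$.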
Mean parameterisation
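We can parameterise this in terms of the mean $\mu = (1-p)/p$, i.e. $p = 1/(1+\mu)$.
 Spelled
 $X \sim \operatorname{Geom}(\mu)$
 Pmf
 $P(X=k) = \frac{1}{1+\mu}\left(\frac{\mu}{1+\mu}\right)^k$
 Pgf
 $G(s) = \frac{1}{1+\mu(1-s)}$
 Mean
 $\mu$
 Variance
 $\mu + \mu^2$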
Lagrangian distributions
Another clade of distributions where we work backwards from the pgf, although we generate the distribution from a function this time, or rather, two functions. This family includes various others on this page; I will check which some day. For now, let's get to the interesting new ones.
It is more like a jungle of distributions, requiring a map to hack through it. There are various parameter estimation methods and properties, each applicable to different subfamilies. Sometimes the forms of the mass function are explicit and easy. Others, not so much.
See CoFa06 for the authoritative list, plus JoKK05 Ch7.2 for a brusquer version.
It's interesting to me because (CoFa06 Ch 6.2, CoSh88) the total cascade size of a subcritical branching process has a “delta Lagrangian” or “general Lagrangian” distribution, depending on whether the cluster has a deterministic or random starting population. We will write $G$ for the offspring distribution of such a branching process, with mean $\mathbb{E}G := \eta < 1$.
Let's get specific.
Poisson-Poisson Lagrangian
See Consul and Famoye (CoFa06, 9.3). Also known as the Generalised Poisson, although there are many things called that.
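There are many possible interpretations for this; I will choose the interpretation in terms of cascade sizes of branching processes, in which case we have
 Poisson($\mu$) initial distribution,
 Poisson($\eta$) offspring distribution.
Then…
 Spelled
 $X \sim \operatorname{GPD}(\mu, \eta)$
 Pmf
 $P(X=k) = \frac{\mu(\mu+\eta k)^{k-1}}{k!} e^{-\mu-\eta k},\quad k = 0, 1, 2, \dots$
 Mean
 $\frac{\mu}{1-\eta}$
 Variance
 $\frac{\mu}{(1-\eta)^3}$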
Notice that this can produce long tails, in the sense that it can have a large variance with finite mean, but not heavy tails, in the sense of the variance becoming infinite while retaining a finite mean. (Q: What nonnegative distribution has the quality of parameterised explosion of moments, apart from the inconvenient discrete-stable?)
Here, I implemented the Poisson-Poisson GPD in Python for you.
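Independently of that implementation, here is a minimal sketch of what such code might look like. It assumes the Consul-style parameterisation with Pmf $\mu(\mu+\eta k)^{k-1}e^{-\mu-\eta k}/k!$ for $\mu > 0$ and $0 \le \eta < 1$; the function names are hypothetical. The sampler uses the branching-process reading directly: Poisson($\mu$) founders, each individual leaving Poisson($\eta$) offspring, and the variate is the total number of individuals ever born.

```python
import numpy as np
from scipy.special import gammaln


def gpd_logpmf(k, mu, eta):
    """Log Pmf of the Poisson-Poisson Lagrangian (generalised Poisson).

    Assumes mu > 0 and 0 <= eta < 1.
    """
    k = np.asarray(k, dtype=float)
    return (
        np.log(mu)
        + (k - 1.0) * np.log(mu + eta * k)
        - (mu + eta * k)
        - gammaln(k + 1.0)
    )


def gpd_sample(mu, eta, rng):
    """Draw one total cascade size by simulating generation by generation."""
    total = alive = rng.poisson(mu)
    while alive > 0:
        # The offspring of `alive` individuals sum to a Poisson(eta * alive).
        alive = rng.poisson(eta * alive)
        total += alive
    return total


# The mass should sum to 1 and the mean should be mu / (1 - eta).
k = np.arange(0, 2000)
p = np.exp(gpd_logpmf(k, mu=2.0, eta=0.5))
print(p.sum(), (k * p).sum())
print(gpd_sample(2.0, 0.5, np.random.default_rng(0)))
```

Note that there is only a single log-gamma term, so there are no large differences of log-gammas to cancel.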
Basic Lagrangian distribution
Summarised in CoSh72.
One parameter: a differentiable (infinitely?) function, not necessarily a pgf, $g(z)$ with $g(0) \neq 0$ and $g(1) = 1$. Now we define a pgf $z = \psi(s)$ implicitly by the smallest root of the Lagrange transformation $z = s\,g(z)$. The paradigmatic example of such a function is ; let's check this fella out.
 Spelled

 ?
 Pmf

 ?
 Mean

 ?
 Variance
 ?
TBD
Delta Lagrangian distributions
TBD
General Lagrangian distributions
TBD
Discrete Stable
Another generalisation of Poisson, with until-recently hip features such as a power-law tail.
By analogy with the continuous-stable distribution, a “stable” family for count data.
In particular, this is stable in the sense that it is a limiting distribution for sums of count random variables, analogous to the continuous stable family for real-valued RVs.
No (convenient) closed form for the Pmf in the general case, but the Pgf is simple, so that's something.
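 Spelled
 $X \sim \operatorname{DS}(\nu, \alpha)$
 Pmf
 $P(X=k) = \frac{1}{k!} \left.\frac{d^k G(s)}{ds^k}\right|_{s=0}$
 (which is simply the usual formula for extracting the Pmf from any Pgf.)
 Pgf
 $G(s) = \exp\left(-\nu(1-s)^\alpha\right)$
 Mean
 $\nu$ if $\alpha = 1$, otherwise $\infty$
 Variance
 $\nu$ if $\alpha = 1$, otherwise $\infty$
Here, $\nu > 0$ is a scale parameter and $\alpha \in (0, 1]$ a dispersion parameter describing in particular a power-law tail such that when $\alpha < 1$, $P(X=k) \propto k^{-(\alpha+1)}$ asymptotically as $k \to \infty$.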
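Although there is no convenient closed form, the Pmf can be extracted from the Pgf numerically. The following is a minimal sketch, assuming the Pgf $G(s) = \exp(-\nu(1-s)^\alpha)$ with $\nu > 0$ and $0 < \alpha \le 1$; the function name is hypothetical. Rather than differentiating $k$ times, it writes $G = \exp(H)$, expands $H(s) = -\nu(1-s)^\alpha = \sum_j h_j s^j$ with the binomial series, and uses the identity $G' = H'G$, which gives the recursion $k p_k = \sum_{j=1}^{k} j h_j p_{k-j}$.

```python
import numpy as np


def discrete_stable_pmf(n_terms, nu, alpha):
    """First n_terms Pmf values for the Pgf G(s) = exp(-nu * (1 - s)**alpha)."""
    # Power-series coefficients c_j of (1 - s)**alpha, via the binomial recursion.
    c = np.empty(n_terms)
    c[0] = 1.0
    for j in range(1, n_terms):
        c[j] = c[j - 1] * (j - 1 - alpha) / j
    h = -nu * c  # coefficients of H(s) = log G(s)
    p = np.zeros(n_terms)
    p[0] = np.exp(h[0])  # P(X = 0) = G(0) = exp(-nu)
    # Matching coefficients in G' = H' G gives k p_k = sum_{j=1..k} j h_j p_{k-j}.
    for k in range(1, n_terms):
        j = np.arange(1, k + 1)
        p[k] = np.dot(j * h[j], p[k - j]) / k
    return p


poisson_like = discrete_stable_pmf(10, nu=2.5, alpha=1.0)  # reduces to Poisson(2.5)
heavy = discrete_stable_pmf(200, nu=1.0, alpha=0.5)
print(poisson_like[:3], heavy[:3])
```

The cost is quadratic in the number of terms, which is tolerable for a few thousand terms; the exact variate generation method mentioned below is a different technique.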
Question: the Pgf formulation implies this is a nonnegative distribution. Does that mean that symmetric discrete RVs cannot be stable? Possibly-negative ones?
Nola99 and Nola01 give some approximate ML estimators of the parameter. Lee10 does some interesting stuff:
This thesis considers the interplay between the continuous and discrete
properties of random stochastic processes.
It is shown that the special cases of the one-sided Lévy-stable
distributions
can be connected to the class of discrete-stable distributions through a
doubly-stochastic Poisson transform.
This facilitates the creation of a one-sided stable process for which the
N-fold statistics can be factorised explicitly. […]
Using the same Poisson transform interrelationship, an exact method for
generating discrete-stable variates is found.
It has already been shown that discrete-stable distributions occur in the
crossing statistics of continuous processes whose autocorrelation exhibits
fractal properties.
The statistical properties of a nonlinear filter analogue of a phase-screen
model are calculated, and the level crossings of the intensity analysed.
[…]
The asymptotic properties of the inter-event density of the process are
found to be accurately approximated by a function of the Fano factor
and the mean of the crossings alone.
Zipf/Zeta models
The discrete version of the basic power-law models.
While we are here, the plainest explanation of the relation of Zipf to Pareto distributions that I know is Lada Adamic's Zipf, Power-laws, and Pareto - a ranking tutorial.
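 Spelled
 $X \sim \operatorname{Zipf}(s)$
 Pmf
 $P(X=k) = \frac{k^{-s}}{\zeta(s)},\quad k = 1, 2, \dots,\ s > 1$
 Mean
 $\frac{\zeta(s-1)}{\zeta(s)}$ for $s > 2$
 Variance
 $\frac{\zeta(s)\zeta(s-2) - \zeta(s-1)^2}{\zeta(s)^2}$ for $s > 3$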
This has unbounded support. In the bounded case, it becomes the Zipf–Mandelbrot law, which is too fiddly for me to discuss here unless I turn out to really need it, which would likely be for ranking statistics.
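Yule–Simon
 Spelled
 $X \sim \operatorname{YS}(\rho)$
 Pmf
 $P(X=k) = \rho\, B(k, \rho+1),\quad k = 1, 2, \dots$
 Mean
 $\frac{\rho}{\rho-1}$ for $\rho > 1$
 Variance
 $\frac{\rho^2}{(\rho-1)^2(\rho-2)}$ for $\rho > 2$
where $B$ is the beta function.
Zipf law in the tail. See also the two-parameter version, which replaces the beta function with an incomplete beta function, giving Pmf $P(X=k) = \frac{\rho}{1-\alpha^\rho} B_{1-\alpha}(k, \rho+1)$ for $0 \le \alpha < 1$.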
I'm bored with this one too.
Conway–Maxwell–Poisson
Exponential family count model with free variance parameter. See ChSh16.
Decomposability properties
For background, see decomposability.
Stability
By analogy with the continuous case we may construct a stability equation for count RVs: for i.i.d. copies $X_1, \dots, X_n$ of $X$,
$X \overset{d}{=} (X_1 + \dots + X_n) \odot n^{-1/\alpha}.$
Here $\odot$ is Steutel and van Harn's discrete multiplication operator, which I won't define here exhaustively because there are variously complex formulations of it, and I don't care enough to wrangle them. In the simplest case it gives us a binomial thinning of the left operand by the right:
$A \odot p \mid A \sim \operatorname{Binomial}(A, p).$
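Self-divisibility
Poisson RVs are self-divisible, in the sense that
$X_1 \sim \operatorname{Poisson}(\lambda_1),\ X_2 \sim \operatorname{Poisson}(\lambda_2),\ X_1 \perp X_2 \Rightarrow X_1 + X_2 \sim \operatorname{Poisson}(\lambda_1 + \lambda_2).$
Polya RVs likewise are self-divisible, if uglier under this parameterisation: for independent $X_i \sim \operatorname{Polya}(\lambda_i, \alpha_i)$ with mean $\lambda_i$, dispersion $\alpha_i$ and $\alpha_1\lambda_1 = \alpha_2\lambda_2$,
$X_1 + X_2 \sim \operatorname{Polya}\left(\lambda_1 + \lambda_2, \frac{\alpha_1\alpha_2}{\alpha_1 + \alpha_2}\right).$
So are GPDs, in $\mu$: for independent $X_i \sim \operatorname{GPD}(\mu_i, \eta)$ with a common offspring mean $\eta$, $X_1 + X_2 \sim \operatorname{GPD}(\mu_1 + \mu_2, \eta)$.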
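The Poisson case is easy to check numerically, since the Pmf of a sum of independent count variables is the discrete convolution of their Pmfs. This is a small sketch with arbitrarily chosen rates; it relies only on `scipy.stats.poisson.pmf` and `numpy.convolve`.

```python
import numpy as np
from scipy.stats import poisson

k = np.arange(0, 60)
# Entry m of the convolution only involves terms with indices <= m,
# so truncating both inputs at 60 terms leaves the first 60 outputs exact.
pmf_sum = np.convolve(poisson.pmf(k, 2.0), poisson.pmf(k, 3.0))[: k.size]
max_err = np.max(np.abs(pmf_sum - poisson.pmf(k, 5.0)))
print(max_err)
```

The same check works for any other family with a computable Pmf by swapping in the corresponding function.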
Refs
 SoSA09: A. R. Soltani, A. Shirvani, F. Alqallaf (2009) A class of discrete distributions induced by stable laws. Statistics & Probability Letters, 79(14), 1608–1614.
 CoJa73: P. C. Consul, G. C. Jain (1973) A Generalization of the Poisson Distribution. Technometrics, 15(4), 791–799.
 SMKB05: Galit Shmueli, Thomas P. Minka, Joseph B. Kadane, Sharad Borle, Peter Boatwright (2005) A Useful Distribution for Fitting Discrete Data: Revival of the Conway–Maxwell–Poisson Distribution. Journal of the Royal Statistical Society. Series C (Applied Statistics), 54(1), 127–142.
 GrYu20: Major Greenwood, G. Udny Yule (1920) An Inquiry into the Nature of Frequency Distributions Representative of Multiple Happenings with Particular Reference to the Occurrence of Multiple Attacks of Disease or of Repeated Accidents. Journal of the Royal Statistical Society, 83(2), 255–279.
 SiMS94: Masaaki Sibuya, Norihiko Miyawaki, Ushio Sumita (1994) Aspects of Lagrangian Probability Distributions. Journal of Applied Probability, 31, 185–197.
 SaPa05: Krishna Saha, Sudhir Paul (2005) Bias-corrected maximum likelihood estimator of the negative binomial dispersion parameter. Biometrics, 61(1), 179–185.
 Neym65: Jerzy Neyman (1965) Certain Chance Mechanisms Involving Discrete Distributions. Sankhyā: The Indian Journal of Statistics, Series A (1961-2002), 27(2/4), 249–258.
 Blak00: Helge Blaker (2000) Confidence Curves and Improved Exact Confidence Intervals for Discrete Distributions. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 28(4), 783–798.
 LeHJ08: W. H. Lee, K. I. Hopcraft, E. Jakeman (2008) Continuous and discrete stable processes. Physical Review E, 77(1), 011109.
 StHa79: F. W. Steutel, K. van Harn (1979) Discrete Analogues of Self-Decomposability and Stability. The Annals of Probability, 7(5), 893–899.
 WeBK03: Michel Wedel, Ulf Böckenholt, Wagner A. Kamakura (2003) Factor models for multivariate count data. Journal of Multivariate Analysis, 87(2), 356–369.
 Cons88: P. C. Consul (1988) Generalized Poisson Distributions. New York: CRC Press
 CoFa92: P. C. Consul, Felix Famoye (1992) Generalized Poisson regression model. Communications in Statistics - Theory and Methods, 21(1), 89–109.
 CoFa06: P. C. Consul, Felix Famoye (2006) Lagrangian probability distributions. Boston: Birkhäuser
 Muta95: Ljuben Mutafchiev (1995) Local limit approximations for Lagrangian distributions. Aequationes Mathematicae, 49(1), 57–85.
 Nola01: John P. Nolan (2001) Maximum Likelihood Estimation and Diagnostics for Stable Distributions. In Lévy Processes (pp. 379–400). Birkhäuser Boston
 CoSh84: P. C. Consul, M. M. Shoukri (1984) Maximum likelihood estimation for the generalized Poisson distribution. Communications in Statistics - Theory and Methods, 13(12), 1533–1547.
 Lloy07: James O. Lloyd-Smith (2007) Maximum Likelihood Estimation of the Negative Binomial Dispersion Parameter for Highly Overdispersed Data, with Applications to Infectious Diseases. PLoS ONE, 2(2), e180.
 CoFe89: P. C. Consul, Felix Famoye (1989) Minimum variance unbiased estimation for the Lagrange power series distributions. Statistics, 20(3), 407–415.
 ChSh16: Suneel Babu Chatla, Galit Shmueli (2016) Modeling Big Count Data: An IRLS Framework for CMP Regression and GAM. ArXiv:1610.08244 [Stat].
 TuSB14: Kamil Feridun Turkman, Manuel González Scotto, Patrícia de Zea Bermudez (2014) Models for Integer-Valued Time Series. In Non-Linear Time Series (pp. 199–244). Springer International Publishing
 Jana84: K. Janardan (1984) Moments of Certain Series Distributions and Their Applications. SIAM Journal on Applied Mathematics, 44(4), 854–868.
 Nola97: John P. Nolan (1997) Numerical calculation of stable densities and distribution functions. Communications in Statistics. Stochastic Models, 13(4), 759–774.
 LiFL10: S. Li, F. Famoye, C. Lee (2010) On the generalized Lagrangian probability distributions. Journal of Probability and Statistical Science, 8(1), 113–123.
 Tuen00: Hans J. H. Tuenter (2000) On the Generalized Poisson Distribution. Statistica Neerlandica, 54(3), 374–376.
 CoSh75: P. C. Consul, L. R. Shenton (1975) On the Probabilistic Structure and Properties of Discrete Lagrangian Distributions. In A Modern Course on Statistical Distributions in Scientific Work (pp. 41–57). Springer Netherlands
 Nola97: J. P. Nolan (1997) Parameter estimation and data analysis for stable distributions. (Vol. 1, pp. 443–447). IEEE Comput. Soc
 CaXi15: Yang Cao, Yao Xie (2015) Poisson Matrix Recovery and Completion. ArXiv:1504.05229 [Cs, Math, Stat].
 ScWZ16: Aaron Schein, Hanna Wallach, Mingyuan Zhou (2016) Poisson-Gamma dynamical systems. In Advances In Neural Information Processing Systems (pp. 5006–5014).
 GoMY04: M. L. Goldstein, S. A. Morris, G. G. Yen (2004) Problems with fitting to the power-law distribution. The European Physical Journal B - Condensed Matter and Complex Systems, 41(2), 255–258.
 Imot16: Tomoaki Imoto (2016) Properties of Lagrangian distributions. Communications in Statistics - Theory and Methods, 45(3), 712–721.
 Tsou06: Tsung-Shan Tsou (2006) Robust Poisson regression. Journal of Statistical Planning and Inference, 136(9), 3173–3186.
 HaSV82: K. van Harn, F. W. Steutel, W. Vervaat (1982) Self-decomposable discrete distributions and branching processes. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete, 61(1), 97–118.
 ShCo87: M. M. Shoukri, P. C. Consul (1987) Some Chance Mechanisms Generating the Generalized Poisson Probability Models. In Biostatistics (pp. 259–268). Dordrecht: Springer Netherlands
 CoSh88: P. C. Consul, M. M. Shoukri (1988) Some Chance Mechanisms Related to a Generalized Poisson Probability Model. American Journal of Mathematical and Management Sciences, 8(1–2), 181–202.
 CoSh73: P. C. Consul, L. R. Shenton (1973) Some interesting properties of Lagrangian distributions. Communications in Statistics, 2(3), 263–272.
 Cox83: D. R. Cox (1983) Some remarks on overdispersion. Biometrika, 70(1), 269–274.
 DoJL09: Louis G. Doray, Shu Mei Jiang, Andrew Luong (2009) Some Simple Method of Estimation for the Parameters of the Discrete Stable Distribution with the Probability Generating Function. Communications in Statistics - Simulation and Computation, 38(9), 2004–2017.
 HaSt93: K. van Harn, F. W. Steutel (1993) Stability equations for processes with stationary independent increments using branching processes and Poisson mixtures. Stochastic Processes and Their Applications, 45(2), 209–230.
 StTe05: Mark Steyvers, Joshua B. Tenenbaum (2005) The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth. Cognitive Science, 29(1), 41–78.
 JoKK05: Norman Lloyd Johnson, Adrienne W. Kemp, Samuel Kotz (2005) Univariate discrete distributions. Hoboken, N.J: Wiley
 CoSh72: P. Consul, L. Shenton (1972) Use of Lagrange Expansion for Generating Discrete Generalized Probability Distributions. SIAM Journal on Applied Mathematics, 23(2), 239–248.