The Living Thing / Notebooks : Hierarchical models

The classic setup: Your process generates observations. These are corrupted by noise. You would like to work out the true parameter values of the process despite the noise.

Hierarchical setup: You are observing many processes with different parameters. You don’t see these; rather, another layer of processes lies between you and these underlying processes, perturbing the output in some way. Repeat the last line to taste. These processes in turn generate observations. The observations might be corrupted by noise. You would like to know the distribution of the process parameters across the whole population, and possibly that of the noise too.
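To make the layering concrete, here is a minimal simulation sketch of the simplest such setup, with two Gaussian levels. The function name `simulate_hierarchy` and all parameter values (`mu`, `tau`, `sigma`, the group and observation counts) are illustrative assumptions, not anything fixed by the description above.

```python
import numpy as np

def simulate_hierarchy(n_groups=200, n_obs=30, mu=1.0, tau=2.0, sigma=0.5, seed=0):
    """Draw data from a two-level Gaussian hierarchy.

    Population level:  theta_j ~ Normal(mu, tau^2)       (hidden per-process parameter)
    Observation level: y_ij    ~ Normal(theta_j, sigma^2) (noisy observations)
    """
    rng = np.random.default_rng(seed)
    # One hidden parameter per observed process.
    theta = rng.normal(mu, tau, size=n_groups)
    # Each row holds the noisy observations of one process.
    y = rng.normal(theta[:, None], sigma, size=(n_groups, n_obs))
    return theta, y

theta, y = simulate_hierarchy()
print(y.shape, y.mean())
```

The inferential problem is the reverse direction: given only `y`, recover `mu`, `tau` and possibly `sigma`, without ever seeing `theta`.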

Also known as mixed effects models, hierarchical models, nested models (careful! that term has many definitions, not necessarily from ecology per se), random coefficient models… and directed graphical models.

Directed graphical models are formally the same thing, but retold with different emphasis for reasons of tradition. When you mention graphical models, the emphasis is frequently on the independence graph itself; when you mention hierarchical models, it seems to be assumed that you wish to estimate parameters, or sample from posteriors, or what-have-you.

In certain cute cases (i.e. linear) these can be inferred by deconvolution. See ANOVA for an important special case. More generally, frequentists sometimes find it convenient to use hierarchical generalised linear models to get tractable parameter estimates for such models. If your structural equations are very much not linear, things can get tedious, and special-cased, e.g. one might do indirect inference.
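As a sketch of the linear special case, the balanced one-way random effects model can be untangled by the classical ANOVA method of moments: the within-group mean square estimates the noise variance, and the between-group mean square estimates the noise variance plus `n_obs` times the population variance. The function name `anova_variance_components` and the simulated values are assumptions made for illustration.

```python
import numpy as np

def anova_variance_components(y):
    """Method-of-moments estimates for a balanced one-way random effects model.

    y has shape (n_groups, n_obs); each row is one group.
    Returns (mu_hat, tau2_hat, sigma2_hat).
    """
    n_groups, n_obs = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Within-group mean square: its expectation is sigma^2.
    msw = ((y - group_means[:, None]) ** 2).sum() / (n_groups * (n_obs - 1))
    # Between-group mean square: its expectation is sigma^2 + n_obs * tau^2.
    msb = n_obs * ((group_means - grand_mean) ** 2).sum() / (n_groups - 1)
    # Solve the two moment equations; truncate at zero since a variance cannot be negative.
    tau2_hat = max((msb - msw) / n_obs, 0.0)
    return grand_mean, tau2_hat, msw

# Simulated data with illustrative true values mu=1, tau=2, sigma=0.5.
rng = np.random.default_rng(1)
theta = rng.normal(1.0, 2.0, size=500)
y = rng.normal(theta[:, None], 0.5, size=(500, 20))
mu_hat, tau2_hat, sigma2_hat = anova_variance_components(y)
print(mu_hat, tau2_hat, sigma2_hat)
```

The subtraction `msb - msw` is the deconvolution step: the spread of the group means mixes population spread and averaged noise, and removing the noise contribution leaves the population variance.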

Bayesians don’t generally do anything special in that vein; this is bread-and-butter Bayes stuff.
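To illustrate how routine this is, here is a minimal Gibbs sampler sketch for a two-level Gaussian hierarchy, using only NumPy. The priors (flat on `mu`, inverse-gamma with hypothetical hyperparameters `a` and `b` on both variances), the function name `gibbs_normal_hierarchy` and the simulated data are all assumptions chosen for convenience; a real analysis would more likely use one of the probabilistic programming systems.

```python
import numpy as np

def gibbs_normal_hierarchy(y, n_iter=2000, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for y_ij ~ N(theta_j, sigma2), theta_j ~ N(mu, tau2).

    Assumed priors: flat on mu, InvGamma(a, b) on sigma2 and on tau2.
    y has shape (n_groups, n_obs). Returns draws of mu, tau2 and sigma2.
    """
    rng = np.random.default_rng(seed)
    n_groups, n_obs = y.shape
    ybar = y.mean(axis=1)
    # Crude starting values taken from the data.
    mu, tau2, sigma2 = ybar.mean(), ybar.var() + 1e-6, y.var() + 1e-6
    draws = {k: np.empty(n_iter) for k in ("mu", "tau2", "sigma2")}
    for t in range(n_iter):
        # theta_j | rest: precision-weighted blend of the group mean and mu.
        prec = n_obs / sigma2 + 1.0 / tau2
        mean = (n_obs * ybar / sigma2 + mu / tau2) / prec
        theta = rng.normal(mean, np.sqrt(1.0 / prec))
        # mu | rest: Normal centred on the average of the thetas.
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / n_groups))
        # Variances | rest: inverse-gamma, drawn as 1 / Gamma(shape, scale=1/rate).
        rate_tau = b + 0.5 * ((theta - mu) ** 2).sum()
        tau2 = 1.0 / rng.gamma(a + 0.5 * n_groups, 1.0 / rate_tau)
        rate_sig = b + 0.5 * ((y - theta[:, None]) ** 2).sum()
        sigma2 = 1.0 / rng.gamma(a + 0.5 * y.size, 1.0 / rate_sig)
        draws["mu"][t], draws["tau2"][t], draws["sigma2"][t] = mu, tau2, sigma2
    return draws

# Simulated data with illustrative true values mu=1, tau=2, sigma=0.5.
rng = np.random.default_rng(2)
theta_true = rng.normal(1.0, 2.0, size=200)
y = rng.normal(theta_true[:, None], 0.5, size=(200, 10))
draws = gibbs_normal_hierarchy(y)
# Discard the first 500 iterations as burn-in, then average.
posterior_means = {k: v[500:].mean() for k, v in draws.items()}
print(posterior_means)
```

Nothing here is specific to hierarchy: each step is an ordinary conditional posterior, and the hidden `theta` values are just more unknowns to sample.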

In the case that you have many layers of hidden variables and don’t expect any of them to correspond to a “real” state so much as simply to approximate the unknown function better, you have just discovered a deep neural network, possibly even a probabilistic neural network. Ranz13 (for example) does this explicitly.


  • The Best Of Both Worlds: Hierarchical Linear Regression in PyMC3

  • Why hierarchical models are awesome, tricky, and Bayesian:

    […]I want to take the opportunity to make another point that is not directly related to hierarchical models but can be demonstrated quite well here. Usually when talking about the perils of Bayesian statistics we talk about priors, uncertainty, and flexibility when coding models using Probabilistic Programming. However, an even more important property is rarely mentioned because it is much harder to communicate. @rosstaylor touched on this point in his tweet

    It’s interesting that many summarize Bayes as being about priors; but real power is its focus on integrals/expectations over maxima/modes

    Michael Betancourt makes a similar point when he says “Expectations are the only thing that make sense.”

    But what’s wrong with maxima/modes? Aren’t those really close to the posterior mean (i.e. the expectation)? Unfortunately, that’s only the case for the simple models we teach to build up intuitions. In complex models, like the hierarchical one, the MAP can be far away and not be interesting or meaningful at all.[…]

    This strong divergence of the MAP and the Posterior Mean does not only happen in hierarchical models but also in high dimensional ones, where our intuitions from low-dimensional spaces gets twisted in serious ways. […]

    […] Final disclaimer: This might provide the impression that this is a property of being in a Bayesian framework, which is not true. Technically, we can talk about Expectations vs Modes irrespective of that. Bayesian statistics just happens to provide a very intuitive and flexible framework for expressing and estimating these models.
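The high-dimensional part of that argument can be checked numerically with a small sketch. It uses a standard Gaussian as a stand-in for a posterior (an assumption made for simplicity, not a hierarchical model): the mode sits at the origin, yet as the dimension grows, essentially no draws land anywhere near it.

```python
import numpy as np

rng = np.random.default_rng(0)
typical_radius = {}
closest_draw = {}
for dim in (1, 10, 100, 1000):
    # Draws from a standard Gaussian, whose mode is the origin.
    x = rng.standard_normal(size=(2000, dim))
    # Distance of each draw from the mode.
    radius = np.linalg.norm(x, axis=1)
    typical_radius[dim] = radius.mean()
    closest_draw[dim] = radius.min()
    print(f"dim={dim:5d}  mean radius={radius.mean():6.2f}  min radius={radius.min():6.2f}")
```

The mean distance from the mode grows roughly like the square root of the dimension, so expectations are dominated by a shell of points far from the mode. A point estimate at the mode therefore describes a region that the distribution almost never visits.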


Various systems implement specialist fitting for general hierarchical models, notably every Bayesian package worth using, e.g. Edward, PyMC3, Stan etc.

For special cases such as generalized linear, jointly Gaussian, or jointly Bernoulli variables etc., see your favourite statistical language’s standard library.


Blackwell, M., Honaker, J., & King, G. (2015a) A Unified Approach to Measurement Error and Missing Data: Details and Extensions. Sociological Methods & Research, 0049124115589052.
Blackwell, M., Honaker, J., & King, G. (2015b) A Unified Approach to Measurement Error and Missing Data: Overview and Applications. Sociological Methods & Research, 1, 0049124115585360.
Bolker, B. M., Brooks, M. E., Clark, C. J., Geange, S. W., Poulsen, J. R., Stevens, M. H. H., & White, J.-S. S. (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology & Evolution, 24(3), 127–135.
Breslow, N. E., & Clayton, D. G. (1993) Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association, 88(421), 9–25.
Efron, B. (2009) Empirical Bayes Estimates for Large-Scale Prediction Problems. Journal of the American Statistical Association, 104(487), 1015–1028.
Gelman, A. (2006) Multilevel (Hierarchical) Modeling: What It Can and Cannot Do. Technometrics, 48(3), 432–435.
Gelman, A., Lee, D., & Guo, J. (2015) Stan: a probabilistic programming language for Bayesian inference and optimization. Journal of Educational and Behavioral Statistics, 40(5), 1076998615606113.
Hansen, C. B. (2007) Generalized least squares inference in panel and multilevel models with serial correlation and fixed effects. Journal of Econometrics, 140(2), 670–694.
Lee, Y., & Nelder, J. A. (2001) Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika, 88(4), 987–1006.
Lee, Y., & Nelder, J. A. (2006) Double hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society: Series C (Applied Statistics), 55(2), 139–185.
Li, Y., & Mykland, P. A. (2007) Are volatility estimators robust with respect to modeling assumptions? Bernoulli, 13(3), 601–622.
Mallet, A. (1986) A maximum likelihood estimation method for random coefficient regression models. Biometrika, 73(3), 645–656.
Ranzato, M. (2013) Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine Intell., 35(9), 2206–2222.
Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang, D., & Blei, D. M. (2016) Edward: A library for probabilistic modeling, inference, and criticism. arXiv:1610.09787 [Cs, Stat].
Valpine, P. de. (2011) Frequentist analysis of hierarchical models for population dynamics and demographic data. Journal of Ornithology, 152(2), 393–408.
Venables, W. N., & Dichmont, C. M. (2004) GLMs, GAMs and GLMMs: an overview of theory for applications in fisheries research. Fisheries Research, 70(2–3), 319–337.