This is a Preprint and has not been peer reviewed. This is version 5 of this Preprint.
Dispersion tests in generalized linear mixed-effects models: a methods comparison and practical guide for ecologists
Abstract
Underdispersion and overdispersion are common issues when analysing ecological data with generalized linear (mixed) models (GLMs/GLMMs). Overdispersion, the phenomenon where observations are spread more widely than expected under the fitted model, usually leads to anti-conservative p-values and, thus, to inflated type I error. In contrast, underdispersion, a narrower spread of the data than expected, causes overly conservative p-values and, therefore, reduced power. A range of tests has been proposed to detect such dispersion problems, but there are few comparative studies of their performance across models and analysis settings and, most importantly, few recommendations for ecologists on how to check for dispersion issues. Our goal was to identify a general dispersion test for GLMs/GLMMs applicable to the standard distributions and random-effects structures commonly used in ecological data analysis. Following an initial review of available tests, we selected two classes of dispersion tests: (1) parametric and nonparametric tests based on Pearson residuals and (2) simulation-based tests that compare the expected and observed residual variance. Comparing their performance in terms of type I error, power, and dispersion estimate across a range of Poisson and binomial GLMs/GLMMs, we found that the nonparametric Pearson residuals test performed best across all metrics, particularly for data with low incidence or count rates and/or small samples, albeit at high computational cost. The parametric Pearson residuals test, recommended in many books and guidelines, was fast and effective for GLMs, but biased towards underdispersion in GLMMs due to the naïve computation of the random-effect degrees of freedom. The simulation-based residual variance test was slightly less powerful, but showed good calibration overall. It thus offers a compromise between the strengths and weaknesses of the two Pearson-based tests.
We conclude that for GLMs, the parametric Pearson residuals test offers the best balance of speed and accuracy. For GLMMs, we recommend either the computationally demanding nonparametric Pearson residuals test or the faster, although somewhat less powerful, simulation-based residual variance test. We also analyse two ecological case studies of differing complexity and provide recommendations for addressing dispersion issues in ecological data analysis with the most commonly used R packages, with the aim of avoiding pitfalls and improving model fit and the interpretation of ecological datasets.
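As a rough illustration of the idea behind a Pearson-based dispersion check, the sketch below computes the Pearson dispersion ratio (the Pearson chi-square statistic divided by the residual degrees of freedom) for an intercept-only Poisson model and derives a one-sided p-value by parametric bootstrap with refitting. It is a minimal sketch under stated assumptions, not the implementation compared in the preprint: the function names (`pearson_dispersion`, `poisson_draw`, `bootstrap_p_value`), the toy data, and the restriction to an intercept-only model without random effects are all chosen here for illustration.

```python
import math
import random


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    For a Poisson model the variance equals the mean, so each squared
    residual is scaled by the fitted mean. Values well above 1 suggest
    overdispersion, values well below 1 suggest underdispersion.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def bootstrap_p_value(y, n_sim=200, seed=1):
    """One-sided (overdispersion) p-value for an intercept-only Poisson model.

    Illustrative assumption: the fitted model has a single parameter, the
    mean, so "refitting" reduces to recomputing the mean of each simulated
    data set.
    """
    rng = random.Random(seed)
    n = len(y)
    mu_hat = sum(y) / n
    observed = pearson_dispersion(y, [mu_hat] * n, n_params=1)

    n_extreme = 0
    for _ in range(n_sim):
        sim = [poisson_draw(mu_hat, rng) for _ in range(n)]
        sim_mean = sum(sim) / n
        if sim_mean == 0:
            # All-zero sample: no spread at all, treat the statistic as 0.
            stat = 0.0
        else:
            stat = pearson_dispersion(sim, [sim_mean] * n, n_params=1)
        if stat >= observed:
            n_extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (n_extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    counts = [0, 1, 2, 3, 4, 20]  # toy data with one extreme count
    fitted = [sum(counts) / len(counts)] * len(counts)
    print("dispersion ratio:", pearson_dispersion(counts, fitted, n_params=1))
    print("bootstrap p-value:", bootstrap_p_value(counts))
```

For the toy counts above, the fitted mean is 5 and the Pearson statistic is 56 on 5 residual degrees of freedom, giving a dispersion ratio of 11.2. In a GLMM the residual degrees of freedom are not simply the number of observations minus the number of fixed-effect parameters, which is the issue the abstract raises for the parametric test.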
DOI
https://doi.org/10.32942/X23M14
Subjects
Applied Statistics, Ecology and Evolutionary Biology, Statistical Models
Keywords
overdispersion/underdispersion, multilevel/hierarchical models, hypothesis test, Pearson residuals, type I error, power, dispersion parameter
Dates
Published: 2025-11-14 18:05
Last Updated: 2026-03-30 11:37
License
CC BY Attribution 4.0 International
Additional Metadata
Data and Code Availability Statement:
Code and simulations available at Zenodo: https://doi.org/10.5281/zenodo.17611061
Language:
English