This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.
Dispersion tests in generalised linear mixed-effects models - a methods comparison and practical guide for ecologists
Abstract
1. Underdispersion and overdispersion are common issues when analysing ecological data with generalised linear (mixed) models (GLMs/GLMMs). Overdispersion, the phenomenon where observations vary more widely than expected under the fitted model, usually leads to anti-conservative p-values and, thus, to inflated type I error. In contrast, underdispersion, a narrower spread of the data than expected, causes overly conservative p-values and, therefore, reduced power. A range of tests has been proposed to detect such dispersion problems, but there are few comparative studies of their performance across models and analysis settings.
2. The goal of this study is to identify a general dispersion test for GLMs/GLMMs that is applicable across all standard distributions and random-effects structures. Following an initial assessment of available tests, we selected two classes of dispersion tests as candidates: (1) parametric and nonparametric tests based on Pearson residuals and (2) simulation-based tests that compare the expected and observed variance in the response.
3. Comparing their performance by type I error, power, and dispersion estimate across a range of GLMs and GLMMs, we found that the nonparametric Pearson residuals test performed best across all metrics, especially for data with low incidence or count rates and/or small samples, although at high computational cost. The parametric Pearson residuals test, recommended in many books and guidelines, was fast and effective for GLMs, but biased towards underdispersion in GLMMs due to the naïve computation of the random-effect degrees of freedom. The simulation-based response variance test was slightly less powerful, but showed good overall calibration and was much faster to compute. It thus offers a compromise between the strengths and weaknesses of the two Pearson-based tests.
4. We conclude that for GLMs, the parametric Pearson residuals test offers the best balance of speed and accuracy. For GLMMs, we recommend either the computationally demanding nonparametric Pearson residuals test or the faster, although somewhat less powerful, simulation-based response variance test. We also provide additional recommendations for ecological data analysis to address dispersion issues using the most commonly used R packages, avoiding pitfalls and improving model fit and the interpretation of ecological datasets.
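To make the two classes of tests in the abstract concrete, the following is a minimal sketch in Python, assuming the simplest possible case: an intercept-only Poisson model without random effects, whose fitted mean is the sample mean. It is not the authors' code and not the implementation in any R package. The function names (`poisson_draw`, `pearson_dispersion`, `simulated_variance_test`) and the example counts are hypothetical and chosen only for illustration. For a GLMM, the simulation step would have to draw new data from the full fitted model, including the random effects.

```python
import math
import random
import statistics


def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate with Knuth's method (adequate for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def pearson_dispersion(y, mu, n_params=1):
    """Pearson chi-square statistic divided by the residual degrees of freedom.

    Assumes a Poisson model with a common fitted mean mu > 0 (variance = mu).
    Values near 1 are expected under the model; > 1 suggests overdispersion.
    """
    chi2 = sum((yi - mu) ** 2 / mu for yi in y)
    return chi2 / (len(y) - n_params)


def simulated_variance_test(y, n_sim=500, seed=1):
    """Compare the observed response variance with variances of simulated data.

    Returns (ratio, p): the observed variance over the mean simulated variance,
    and a two-sided Monte Carlo p-value.
    """
    rng = random.Random(seed)
    mu_hat = statistics.fmean(y)  # fitted mean of the intercept-only model
    observed = statistics.pvariance(y)
    simulated = [
        statistics.pvariance([poisson_draw(rng, mu_hat) for _ in y])
        for _ in range(n_sim)
    ]
    ratio = observed / statistics.fmean(simulated)
    upper = (1 + sum(s >= observed for s in simulated)) / (n_sim + 1)
    lower = (1 + sum(s <= observed for s in simulated)) / (n_sim + 1)
    return ratio, min(1.0, 2 * min(upper, lower))


# Hypothetical counts with many zeros and a few large values.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 12, 0, 0, 9, 0, 0, 15, 0, 1, 0, 0, 10]
dispersion = pearson_dispersion(counts, statistics.fmean(counts))
ratio, p_value = simulated_variance_test(counts)
print(f"Pearson dispersion: {dispersion:.2f}")
print(f"variance ratio: {ratio:.2f}, simulation p-value: {p_value:.3f}")
```

In this sketch both quantities are read the same way: values well above 1 point to overdispersion and values well below 1 to underdispersion. The parametric Pearson test would additionally compare the Pearson statistic with a chi-square distribution on the residual degrees of freedom, which is omitted here to keep the example free of external dependencies.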
DOI
https://doi.org/10.32942/X23M14
Subjects
Applied Statistics, Ecology and Evolutionary Biology, Statistical Models
Keywords
overdispersion/underdispersion, multilevel/hierarchical models, hypothesis test, Pearson residuals, type I error, power, dispersion parameter
Dates
Published: 2025-11-14 19:05
Last Updated: 2026-02-10 10:25
License
CC BY Attribution 4.0 International
Additional Metadata
Data and Code Availability Statement:
Code and simulations available at Zenodo: https://doi.org/10.5281/zenodo.17611061
Language:
English