Dispersion tests in generalised linear mixed-effects models - a methods comparison and practical guide for ecologists

Melina de Souza Leite; Daniel Rettelbach; Florian Hartig

This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.

Abstract

1. Underdispersion and overdispersion are common issues when analysing ecological data with generalised linear (mixed) models (GLMs/GLMMs). Overdispersion, the phenomenon where observations are spread more widely than expected under the fitted model, usually leads to anti-conservative p-values and thus to inflated type I error. In contrast, underdispersion, a narrower spread of the data than expected, causes overly conservative p-values and therefore reduced power. A range of tests has been proposed to detect such dispersion problems, but there are few comparative studies of their performance across models and analysis settings.

2. The goal of this study is to identify a general dispersion test for GLMs/GLMMs that is applicable across all standard distributions and random-effects structures. Following an initial assessment of available tests, we selected two classes of dispersion tests as candidates: (1) parametric and nonparametric tests based on Pearson residuals and (2) simulation-based tests that compare the expected and observed variance in the response.

3. Comparing their performance in terms of type I error, power, and dispersion estimates across a range of GLMs and GLMMs, we found that the nonparametric Pearson residuals test performed best across all metrics, especially for data with low incidence or count rates and/or small samples, albeit at high computational cost. The parametric Pearson residuals test, recommended in many books and guidelines, was fast and effective for GLMs, but biased towards underdispersion in GLMMs due to the naïve computation of the random-effect degrees of freedom. The simulation-based response variance test was slightly less powerful, but showed overall good calibration and was much faster to compute. It thus offers a compromise between the strengths and weaknesses of the two Pearson-based tests.

4. We conclude that for GLMs, the parametric Pearson residuals test offers the best balance of speed and accuracy. For GLMMs, we recommend either the computationally demanding nonparametric Pearson residuals test or the faster, although somewhat less powerful, simulation-based response variance test. We also provide additional recommendations for ecological data analysis to address dispersion issues using the most commonly used R packages, avoiding pitfalls and improving model fit and the interpretation of ecological datasets.
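
As a rough illustration of the two classes of tests named in the abstract, the following Python sketch computes a Pearson dispersion statistic and a simplified simulation-based comparison of observed and expected response spread. It is not the authors' implementation (their code is in R and archived on Zenodo). It assumes a Poisson model whose fitted means are already known, and the function names, the toy counts, and the use of squared deviations from the fitted mean as the spread statistic are all assumptions made for this example. No refitting is done and no random effects are handled.

```python
import math
import random


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Assumes a Poisson model, so the variance function is V(mu) = mu.
    Values well above 1 suggest overdispersion, well below 1 underdispersion.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def rpois(lam, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulated_variance_test(y, mu, n_sim=500, seed=1):
    """Compare observed spread around the fitted means with simulated spread.

    Hypothetical simplification: data are simulated from Poisson(mu) and the
    mean squared deviation from mu is used as the spread statistic.
    Returns (ratio of observed to mean simulated spread, two-sided p-value).
    """
    rng = random.Random(seed)

    def spread(values):
        return sum((v - m) ** 2 for v, m in zip(values, mu)) / len(mu)

    observed = spread(y)
    simulated = [spread([rpois(m, rng) for m in mu]) for _ in range(n_sim)]
    ratio = observed / (sum(simulated) / n_sim)
    # Rank-based p-value with the usual +1 correction for Monte Carlo tests.
    upper = (1 + sum(s >= observed for s in simulated)) / (n_sim + 1)
    lower = (1 + sum(s <= observed for s in simulated)) / (n_sim + 1)
    return ratio, min(1.0, 2 * min(upper, lower))


# Toy counts (invented for illustration): many zeros and a few large values.
y = [0, 0, 1, 0, 9, 0, 0, 12, 0, 1, 0, 0, 8, 0, 0, 15, 0, 2, 0, 0]
# Intercept-only Poisson fit: every fitted mean equals the sample mean.
mu = [sum(y) / len(y)] * len(y)

dispersion = pearson_dispersion(y, mu, n_params=1)
ratio, p_value = simulated_variance_test(y, mu)
print(f"Pearson dispersion: {dispersion:.2f}")
print(f"Simulated spread ratio: {ratio:.2f}, p-value: {p_value:.4f}")
```

For these toy counts both statistics lie far above 1, which is the pattern expected for overdispersed data. In a GLMM, the residual degrees of freedom used in `pearson_dispersion` are not straightforward to define, which is the source of the bias described in point 3.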

DOI

https://doi.org/10.32942/X23M14

Subjects

Applied Statistics, Ecology and Evolutionary Biology, Statistical Models

Keywords

overdispersion/underdispersion, multilevel/hierarchical models, hypothesis test, Pearson residuals, type I error, power, dispersion parameter

Dates

Published: 2025-11-14 19:05

Last Updated: 2026-02-10 10:25

License

CC BY Attribution 4.0 International

Additional Metadata

Data and Code Availability Statement:
Code and simulations available at Zenodo: https://doi.org/10.5281/zenodo.17611061

Language:
English