Dispersion tests in generalised linear mixed-effects models - a methods comparison and practical guide

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Melina de Souza Leite, Daniel Rettelbach, Florian Hartig

Abstract

1. Underdispersion and overdispersion are common issues when analysing ecological data with generalised linear (mixed) models (GLMs/GLMMs). Overdispersion, the phenomenon where observations spread wider than expected by the fitted model, leads to anti-conservative p-values and, thus, to inflated type I error. In contrast, underdispersion, a narrower spread of the data than expected, causes overly conservative p-values and, therefore, a reduction in power. A range of tests has been suggested to detect such dispersion problems, but there are few comparative studies of their performance across a range of models and analysis situations.
2. The goal of this study is to identify a general dispersion test for GLMs/GLMMs that is applicable across all standard distributions and random-effects structures. After an initial assessment of available tests, we selected two classes of dispersion tests as candidates: (1) parametric and nonparametric tests based on Pearson residuals and (2) simulation-based tests that compare the expected to the observed variance in the response.
3. Comparing their performance in terms of type I error, power, and dispersion estimates across a range of GLMs and GLMMs, we find that a nonparametric Pearson residuals test performed best across all metrics, especially for data with low incidence or count rates and/or small sample sizes, albeit at a high computational cost. The parametric Pearson residuals test, which is recommended in many books and guidelines, is faster and performs excellently for GLMs, but can be seriously biased towards underdispersion for GLMMs. We show that the reason for this bias, which increases with the number of random-effect clusters/groups, lies in the naïve computation of the degrees of freedom for the random effects. The simulation-based response variance test is slightly less powerful than the nonparametric Pearson test, but it showed good overall calibration and is much faster to compute. It offers a compromise between the strengths and weaknesses of the two Pearson-based tests.
4. We conclude that for GLMs, the parametric Pearson residuals test offers the best combination of speed and accuracy. For GLMMs, we recommend either the computationally demanding nonparametric Pearson residuals test or the faster, although somewhat less powerful, simulation-based response variance test.
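
As an illustration of the kind of statistic discussed above, the following is a minimal sketch of a Pearson dispersion ratio for the simplest possible case, an intercept-only Poisson model with no random effects, together with a simulation-based p-value obtained by refitting the model to data simulated from the fit. It is not the authors' implementation and may differ from the tests compared in the preprint; the function names (`pearson_dispersion`, `rpois`, `simulated_dispersion_test`) and the example counts are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson_dispersion(y, mu, n_params):
    """Sum of squared Pearson residuals divided by the residual degrees of freedom.

    For a Poisson model the variance equals the mean, so each squared
    Pearson residual is (y_i - mu_i)^2 / mu_i. Values near 1 indicate
    no dispersion problem; >1 overdispersion, <1 underdispersion.
    """
    stat = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return stat / (len(y) - n_params)


def rpois(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulated_dispersion_test(y, n_sim=500, seed=1):
    """Two-sided simulation p-value for an intercept-only Poisson model (assumption: no covariates)."""
    rng = random.Random(seed)
    n = len(y)
    mu_hat = sum(y) / n  # maximum-likelihood estimate of the Poisson mean
    observed = pearson_dispersion(y, [mu_hat] * n, n_params=1)

    simulated = []
    for _ in range(n_sim):
        y_sim = [rpois(mu_hat, rng) for _ in range(n)]
        mu_sim = sum(y_sim) / n  # refit the model to each simulated data set
        if mu_sim == 0:
            continue  # statistic undefined when all simulated counts are zero
        simulated.append(pearson_dispersion(y_sim, [mu_sim] * n, n_params=1))

    lower = sum(s <= observed for s in simulated) / len(simulated)
    upper = sum(s >= observed for s in simulated) / len(simulated)
    return observed, min(1.0, 2 * min(lower, upper))


# Hypothetical counts with many zeros and a few large values.
counts = [0, 0, 1, 0, 12, 9, 0, 2, 15, 0, 1, 11] * 5
dispersion, p_value = simulated_dispersion_test(counts)
print(f"dispersion = {dispersion:.2f}, p = {p_value:.3f}")
```

In this sketch the residual degrees of freedom are simply the number of observations minus one fitted parameter. For GLMMs there is no equally simple count, which is where the abstract locates the bias of the parametric Pearson test.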

DOI

https://doi.org/10.32942/X23M14

Subjects

Applied Statistics, Ecology and Evolutionary Biology, Statistical Models

Keywords

overdispersion/underdispersion, multilevel/hierarchical models, hypothesis test, Pearson residuals, type I error, power, dispersion parameter

Dates

Published: 2025-11-14 13:05

Last Updated: 2025-11-14 13:05

License

CC BY Attribution 4.0 International

Additional Metadata

Language:
English

Data and Code Availability Statement:
Code and simulations available at Zenodo: https://doi.org/10.5281/zenodo.17611061