This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
Bayesian adaptive design for citizen science data collection: Exploring tensions between data and design
Abstract
1. Bayesian adaptive design can be applied in spatial settings where future survey locations need to be selected based on already available data. An important use case of adaptive design is the recommendation of locations for opportunistic, citizen science collection of species observation data, where some areas are already overrepresented and others are severely undersampled.
2. This work conducts an extensive comparative study of adaptive Bayesian design approaches, specifically for recommending survey locations for citizen science data collection. We explore compromises between design-based, model-based, and exchange-based approaches in order to better understand the statistical implications of using adaptive design for citizen science projects.
3. To evaluate the adaptive design approaches effectively, we work in a simulated environment, which allows us to compare performance across fourteen design strategies. For each method, we vary the design size and the initial data conditions, and in a separate analysis we show the effect of weighting the site recommendations by varying degrees of citizen scientist preference. Additionally, we conduct a sensitivity experiment on the approximation used to integrate out the uncertainty associated with future data collection.
4. Our results highlight the tension between the data and the design methodology: model- and exchange-based approaches perform better when more prior data is available, particularly when estimation or prediction accuracy is the priority. In contrast, design-based methods are more stable when prior information is limited. However, it remains a challenge to balance all design priorities and data scenarios in a single approach. Additionally, we show that including citizen scientist preference in the utility function can impact the design in surprising ways. These outcomes are not uniform: they depend on the method used to optimize the design, the metric of interest for optimization, the amount of prior data, and the desired size of the future survey. Taken together, our results show that effective survey design must be carefully matched to both the available baseline information and the study objectives.
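To make the idea of a preference-weighted utility concrete, the following is a minimal sketch of one possible model-based adaptive design loop. It is not the authors' implementation. Everything in it is an assumption chosen for illustration: the Gaussian process model with a squared-exponential kernel, the greedy one-site-at-a-time search, the utility (predictive variance multiplied by `preference**alpha`), and all function and parameter names. Under a Gaussian model the predictive variance does not depend on the responses that have yet to be observed, so this sketch sidesteps the integration over future data that the abstract mentions; non-Gaussian species observation models would generally need an approximation at that step.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two sets of 2-D points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def posterior_variance(candidates, surveyed, noise=0.1):
    """GP predictive variance at each candidate, given surveyed locations."""
    prior_var = np.diag(rbf_kernel(candidates, candidates))
    if len(surveyed) == 0:
        return prior_var
    K = rbf_kernel(surveyed, surveyed) + noise * np.eye(len(surveyed))
    k_star = rbf_kernel(candidates, surveyed)
    solved = np.linalg.solve(K, k_star.T)
    # Subtract diag(k_star @ K^{-1} @ k_star.T) without forming the full matrix.
    return prior_var - np.einsum("ij,ji->i", k_star, solved)

def greedy_design(candidates, surveyed, preference, n_new, alpha=0.0):
    """Greedily pick n_new sites maximising variance * preference**alpha.

    alpha = 0 ignores volunteer preference; larger alpha (hypothetical
    tuning parameter) pulls recommendations toward preferred sites.
    """
    surveyed = np.array(surveyed, dtype=float).reshape(-1, 2)
    chosen = []
    for _ in range(n_new):
        utility = posterior_variance(candidates, surveyed) * preference**alpha
        if chosen:
            utility[chosen] = -np.inf  # never recommend the same site twice
        best = int(np.argmax(utility))
        chosen.append(best)
        # Treat the recommended site as surveyed before choosing the next one.
        surveyed = np.vstack([surveyed, candidates[best]])
    return chosen

# Illustrative setup: a 10 x 10 grid with three existing surveys in one corner,
# and a made-up preference that decays with distance from that corner.
grid = np.linspace(0.0, 1.0, 10)
candidates = np.array([(x, y) for x in grid for y in grid])
surveyed = candidates[:3]
preference = np.exp(-np.linalg.norm(candidates - candidates[0], axis=1))
design = greedy_design(candidates, surveyed, preference, n_new=5, alpha=1.0)
```

Comparing the returned indices for `alpha=0.0` and for larger values of `alpha` gives a rough feel for how a preference weight can shift recommended sites toward already well-visited areas, which is the kind of trade-off the abstract describes.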
DOI
https://doi.org/10.32942/X2F092
Subjects
Life Sciences, Physical Sciences and Mathematics
Keywords
bayesian adaptive design, adaptive sampling, spatial design, opportunistic data collection, citizen science, biodiversity monitoring, species distribution modelling
Dates
Published: 2026-05-27 23:17
Last Updated: 2026-05-27 23:17
License
CC BY Attribution 4.0 International
Additional Metadata
Conflict of interest statement:
None
Data and Code Availability Statement:
All data used in this work can be found in the anonymous figshare repository at https://figshare.com/s/61fe2afc676a2a725531. Upon publication of this article in a peer-reviewed journal, the figshare repository will be made public at DOI 10.6084/m9.figshare.31915401, along with a GitHub repository for public use of the code and data.
Language:
English