Bayesian adaptive design for citizen science data collection: Exploring tensions between data and design

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Max Savery, Stjin Luca

Abstract

1. Bayesian adaptive design can be applied in spatial settings where future survey locations need to be selected based on already available data. An important use case of adaptive design is the recommendation of locations for opportunistic, citizen science collection of species observation data, where some areas are already overrepresented and others are severely undersampled.

2. This work conducts an extensive comparative study of adaptive Bayesian design approaches, specifically for recommending survey locations for citizen science data collection. We explore compromises between design-based, model-based, and exchange-based approaches in order to better understand the statistical implications of using adaptive design for citizen science projects.

3. To evaluate the adaptive design approaches effectively, we work in a simulated environment, which allows us to compare the performance of fourteen design strategies. For each method, we vary the design size and the initial data conditions, and in a separate analysis we show the effect of weighting the site recommendations by varying degrees of citizen scientist preference. Additionally, we conduct a sensitivity experiment on the approximation used to integrate out the uncertainty associated with the future data collection.

4. Our results highlight the tension between the data and the design methodology: we show that the model- and exchange-based approaches perform better when more prior data is available, particularly when estimation or prediction accuracy is the priority. In contrast, design-based methods are more stable when prior information is limited. However, balancing all design priorities and data scenarios in a single approach remains a challenge. Additionally, we show that including citizen scientist preference in the utility function can affect the design in surprising ways. These effects are not uniform: they depend on the method used to optimize the design, the metric being optimized, the amount of prior data, and the desired size of the future survey. Taken together, our results show that effective survey design must be carefully matched to both the available baseline information and the study objectives.
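
To make the idea of a preference-weighted utility concrete, the following is a minimal sketch and not the authors' method. It assumes a linear-Gaussian model, for which the expected information gain of a candidate site has a closed form, so no approximation over future data is needed; the models studied in the preprint may require the kind of approximation described in point 3. The function name `greedy_design`, the multiplicative weighting `gain * pref ** pref_weight`, and all parameter names are hypothetical choices made for illustration.

```python
import numpy as np

def greedy_design(X_cand, Sigma_prior, noise_var, n_select,
                  pref=None, pref_weight=0.0):
    """Greedily pick n_select candidate sites (hypothetical helper).

    X_cand      : (n_sites, p) covariates of the candidate survey sites
    Sigma_prior : (p, p) current covariance of the model coefficients
    noise_var   : observation noise variance
    pref        : optional (n_sites,) citizen scientist preference scores in (0, 1]
    pref_weight : exponent controlling how strongly preference enters (assumption)
    """
    Sigma = np.array(Sigma_prior, dtype=float)
    available = list(range(len(X_cand)))
    chosen = []
    for _ in range(n_select):
        best_idx, best_score = None, -np.inf
        for i in available:
            x = X_cand[i]
            # Closed-form expected information gain for a linear-Gaussian model.
            gain = 0.5 * np.log1p(x @ Sigma @ x / noise_var)
            # Assumed weighting scheme: scale the gain by preference ** weight.
            score = gain if pref is None else gain * pref[i] ** pref_weight
            if score > best_score:
                best_idx, best_score = i, score
        x = X_cand[best_idx]
        # Rank-one covariance update (Sherman-Morrison) after surveying site x.
        Sx = Sigma @ x
        Sigma = Sigma - np.outer(Sx, Sx) / (noise_var + x @ Sx)
        chosen.append(best_idx)
        available.remove(best_idx)
    return chosen, Sigma

# Toy example: three candidate sites, two covariates.
X_cand = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
Sigma_prior = np.diag([4.0, 1.0])

sites_no_pref, _ = greedy_design(X_cand, Sigma_prior, 1.0, 2)
sites_pref, _ = greedy_design(X_cand, Sigma_prior, 1.0, 2,
                              pref=np.array([0.1, 1.0, 1.0]), pref_weight=1.0)
print(sites_no_pref, sites_pref)
```

In this toy setup, a low preference score for the first site shifts the first recommendation to an equally informative but more preferred site, which illustrates how preference weighting can change a design without changing the underlying model.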

DOI

https://doi.org/10.32942/X2F092

Subjects

Life Sciences, Physical Sciences and Mathematics

Keywords

bayesian adaptive design, adaptive sampling, spatial design, opportunistic data collection, citizen science, biodiversity monitoring, species distribution modelling

Dates

Published: 2026-05-27 23:17

Last Updated: 2026-05-27 23:17

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
None

Data and Code Availability Statement:
All data used in this work can be found in the anonymous figshare repository at https://figshare.com/s/61fe2afc676a2a725531. Upon publication of this article in a peer-reviewed journal, the figshare repository will be made public at DOI 10.6084/m9.figshare.31915401, along with a GitHub repository for public use of the code and data.

Language:
English