Slow improvement to the archiving quality of open datasets shared by researchers in ecology and evolution

This is a preprint and has not been peer reviewed. A published version of this preprint is available. This is version 2 of this preprint.



Dominique Roche, Ilias Berberi, Fares Dhane, Félix Lauzon, Sandrine Soeharjono, Roslyn Dakin, Sandra Binning


Many leading journals in evolution and ecology now mandate open data upon publication. Yet there is very little oversight to ensure the completeness and reusability of archived datasets, and we currently have a poor understanding of the factors associated with high-quality (FAIR) data sharing. We assessed 362 open datasets linked to first- or senior-authored papers published by 100 principal investigators (PIs) in the fields of evolution and ecology over a period of seven years to identify predictors of data completeness and reusability (‘data archiving quality’). Datasets scored low on these metrics: 56.4% were complete and 45.9% were reusable. Data reusability, but not completeness, was slightly higher for more recently archived datasets and for PIs with less seniority. Journal open data policy, PI gender, and PI corresponding-author status were unrelated to data archiving quality. However, PI identity explained a large proportion of the variance in data completeness (27.8%) and reusability (22.0%), indicating consistent inter-individual differences in data-sharing practices by PIs across time and contexts. Several PIs consistently shared data of either high or low archiving quality, but most PIs were inconsistent in how well they shared data. One explanation for the high intra-individual variation we observed is that PIs often conduct research through students and postdocs, who may be responsible for data collection, curation and archiving. Levels of data literacy vary among trainees, and PIs may not regularly perform quality control over archived files. Our findings suggest that research data management training and culture within a PI’s group are likely to be more important determinants of data archiving quality than other factors such as a journal’s open data policy. Greater incentives and training for individual researchers at all career stages could improve data-sharing practices and enhance data transparency and reusability.
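The percentages of variance attributed to PI identity can be read as an intraclass correlation (repeatability), meaning the share of total variance captured by a PI-level random intercept. The sketch below is illustrative only. It assumes that each binary outcome (complete or not, reusable or not) was modelled with a logit-link mixed model, in which the residual variance on the latent scale is fixed at π²/3. The abstract does not state the model structure or report variance components, so the preprint may use a different model or estimator. The function names `variance_explained` and `var_pi_for` are hypothetical.

```python
import math

# Latent-scale residual variance of the logistic distribution (pi^2 / 3).
# ASSUMPTION: binary outcome analysed with a logit-link mixed model.
LOGIT_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def variance_explained(var_pi: float, var_residual: float = LOGIT_RESIDUAL_VARIANCE) -> float:
    """Proportion of total variance attributable to a PI random intercept
    (an intraclass correlation, or repeatability)."""
    if var_pi < 0 or var_residual <= 0:
        raise ValueError("var_pi must be >= 0 and var_residual must be > 0")
    return var_pi / (var_pi + var_residual)


def var_pi_for(proportion: float, var_residual: float = LOGIT_RESIDUAL_VARIANCE) -> float:
    """Inverse of variance_explained: the PI-level variance that would
    produce a given proportion, holding the residual variance fixed."""
    if not 0 <= proportion < 1:
        raise ValueError("proportion must be in [0, 1)")
    return proportion / (1 - proportion) * var_residual


# Back-calculate the PI-level variance implied by the reported percentages
# under the logit assumption above (hypothetical; not the authors' estimates).
for label, r in [("completeness", 0.278), ("reusability", 0.220)]:
    v = var_pi_for(r)
    print(f"{label}: implied var_PI = {v:.2f}, round-trip proportion = {variance_explained(v):.3f}")
```

Under this assumption, a larger PI-level variance means that datasets from the same PI resemble each other more closely in archiving quality than datasets drawn at random from different PIs.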



Subjects: Biology, Ecology and Evolutionary Biology, Life Sciences, Other Ecology and Evolutionary Biology


Keywords: data sharing, FAIR data, metascience, open science, public data archiving, reproducibility


Published: 2021-05-18 21:20

Last Updated: 2022-04-29 16:14


License: CC BY-SA 4.0 (Creative Commons Attribution-ShareAlike 4.0 International)

Additional Metadata

Conflict of interest statement:
This study was funded by the Natural Sciences and Engineering Research Council of Canada (grant no. UIF-537860–2018). DGR was supported by the European Union’s Horizon 2020 research and innovation program under Marie Skłodowska-Curie grant agreement no. 838237-OPTIMISE. DGR is an ambassador for the Center for Open Science and the data repository Figshare; he sits on the Policy Committee for Research Data Canada, the Open Science Working Group for Canadian Science Publishing, the Data Rescue Committee for the NSERC-CREATE Living Data Project, and the Canada National Committee for CODATA (the Committee on Data of the International Science Council). DGR is an expert reviewer for the Open Science Program of swissuniversities. He serves on the Executive Committee of the Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology and chairs the Society’s fundraising and overlay journal committees.
