This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Ten simple rules to follow when cleaning occurrence data in palaeobiology
Abstract
Large datasets of fossil occurrences, often downloaded from online community-maintained databases, are a vital resource for understanding broad-scale evolutionary patterns, such as how biodiversity has changed through time and space. Such datasets, however, are not infallible and must be ‘cleaned’ of inaccurate, incomplete, or duplicate data prior to analysis. Researchers must decide which data cleaning steps to perform, weighing the extent, feasibility, and value of each. Guides are available for working with neontological occurrences, but there is currently no clear procedure for palaeobiological data, despite its unique attributes. Here, we outline ten rules that aim to aid the process of cleaning fossil occurrence data for downstream analysis. These rules cover the major steps involved in processing data prior to analysis, including project setup, data exploration and cleaning, and finalising and reporting work. We provide accompanying examples and a vignette covering the entire data cleaning process to demonstrate the application of each rule. We believe that these rules will serve as a useful guideline to support data cleaning and foster new standards for the palaeobiological community.
DOI
https://doi.org/10.32942/X2FS8M
Subjects
Paleobiology
Keywords
palaeontology, fossils, biodiversity, reproducibility, data cleaning
Dates
Published: 2025-03-21 16:30
Last Updated: 2025-03-21 16:30
License
CC BY Attribution 4.0 International
Additional Metadata
Conflict of interest statement:
We declare we have no competing interests.
Data and Code Availability Statement:
The data and code generated for this article have been included within a dedicated GitHub repository: https://github.com/palaeoverse/ten-rules. In addition, they have been uploaded to a Zenodo repository through integrated version control: https://doi.org/10.5281/zenodo.14938533.
Language:
English