High Data Quality Enhances Microplastic Toxicity Prediction

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.

Authors

Ana L. Antonio Vital, Scott Coffin, Andrea Bonisoli-Alquati, Maaike Vercauteren, Luan de Souza Leite, Maximilian Pichler, Magdalena M. Mair 

Abstract

Unlike chemicals, microplastics (MPs) lack standardized identifiers, limiting the applicability of traditional predictive ecotoxicology methods such as quantitative structure-activity relationship (QSAR) models. This study aimed to predict MP toxicity using MP properties, MP concentration, organismal traits, endpoints, and experimental design, and to evaluate how data pre-processing, dataset size, and quality influence model performance. We applied the Boosted Regression Tree (BRT) machine learning algorithm to four datasets derived from the Toxicity of Microplastics Explorer database (ToMEx 2.0): (i) imputed missing values, (ii) complete-case (missing values removed), (iii) high-quality data, and (iv) low-quality data. The high-quality dataset yielded the best final predictions for both random cross-validation (AUC = 0.93) and blocked cross-validation by particle identifier (AUC = 0.87). Explainable artificial intelligence (xAI) analyses showed that predictive performance was primarily determined by endpoints and concentration, with MP properties contributing despite limited reporting. Our findings demonstrate the feasibility of machine learning to predict and identify key drivers of MP toxicity, highlighting that high-quality data improves predictive performance while reducing data mining and computational costs. Standardized experiments, detailed MP characterization, and high reporting standards would better support risk assessment frameworks and inform the design of safer materials.
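
The abstract contrasts random cross-validation with cross-validation blocked by particle identifier. The sketch below illustrates that distinction only. It is not the authors' code, which is not yet public. The function names (`random_folds`, `blocked_folds`, `groups_shared`, `auc`) and the toy particle identifiers are hypothetical, and no model is fitted. In blocked cross-validation, all rows sharing a particle identifier land in the same fold, so a model is always evaluated on particles it has not seen during training. The `auc` helper uses the rank-based definition of AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
import random
from collections import defaultdict


def random_folds(n_rows, k, seed=0):
    """Plain random CV: shuffle row indices and deal them into k folds."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    folds = [[] for _ in range(k)]
    for pos, i in enumerate(idx):
        folds[pos % k].append(i)
    return folds


def blocked_folds(groups, k, seed=0):
    """Blocked CV: deal whole groups into k folds so no group is split."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    unique = sorted(by_group)
    rng.shuffle(unique)
    folds = [[] for _ in range(k)]
    for pos, g in enumerate(unique):
        folds[pos % k].extend(by_group[g])
    return folds


def groups_shared(folds, groups):
    """Count groups whose rows appear in more than one fold."""
    seen = defaultdict(set)
    for f, rows in enumerate(folds):
        for i in rows:
            seen[groups[i]].add(f)
    return sum(1 for fs in seen.values() if len(fs) > 1)


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 6 particle identifiers with 4 rows each (assumption).
particle_ids = [f"P{i // 4}" for i in range(24)]

rand = random_folds(len(particle_ids), k=3)
blocked = blocked_folds(particle_ids, k=3)

print("groups split across folds (random): ", groups_shared(rand, particle_ids))
print("groups split across folds (blocked):", groups_shared(blocked, particle_ids))
```

With blocked folds the count of split groups is zero by construction, whereas random folds usually scatter a particle's rows across several folds. That leakage is one plausible reason a random-CV AUC (0.93 in the abstract) can exceed the blocked-CV AUC (0.87).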

DOI

https://doi.org/10.32942/X2C96D

Subjects

Life Sciences

Keywords

ecotoxicology, explainable artificial intelligence, predictive modeling, microplastic properties, risk assessment

Dates

Published: 2026-03-23 04:06

Last Updated: 2026-03-23 04:06

License

CC-By Attribution-ShareAlike 4.0 International

Additional Metadata

Conflict of interest statement:
The authors declare that they have no conflicts of interest.

Data and Code Availability Statement:
All data and code will be made publicly available upon publication.

Language:
English