Guidelines for the prediction of species interactions through binary classification

Timothée Poisot

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.1111/2041-210X.14071. This is version 2 of this Preprint.

Authors

Timothée Poisot

Abstract

1. The prediction of species interactions is gaining momentum as a way to circumvent limitations in data volume. Yet, ecological networks are challenging to predict because they are typically small and sparse. Dealing with extreme class imbalance is a challenge for most binary classifiers, and there are currently no guidelines as to how predictive models can be trained for this specific problem.
2. Using simple mathematical arguments and numerical experiments in which a variety of classifiers (for supervised learning) are trained on simulated networks, we develop a series of guidelines related to the choice of measures to use for model selection, and the degree of unbiasing to apply to the training dataset.
3. Neither classifier accuracy nor ROC-AUC is an informative measure of the performance of interaction prediction. PR-AUC is a fairer assessment of performance. In some cases, even standard measures can lead to selecting a more biased classifier because the effect of connectance is strong. The amount of correction to apply to the training dataset depends on network connectance, on the measure to be optimized, and only weakly on the classifier.
4. These results reveal that training machines to predict networks is a challenging task, and that in virtually all cases, the composition of the training set needs to be tuned experimentally before performing the actual training. We discuss these consequences in the context of the low volume of data.
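
The point about performance measures can be illustrated with a minimal sketch. It is a toy example under stated assumptions, not the simulations used in the paper: the 20 species pairs, the scores, and the helper names `accuracy`, `roc_auc`, and `average_precision` are all hypothetical, and average precision is used here as one common estimate of PR-AUC.

```python
def accuracy(labels, predictions):
    """Fraction of species pairs whose interaction status is predicted correctly."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random interaction outscores a random non-interaction.

    Ties count for one half (rank-based definition of ROC-AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values reached at the rank of each true interaction."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits


# Toy network (assumption): 20 species pairs, 2 interactions, connectance 0.1.
labels = [False, False, True, True] + [False] * 16
# Hypothetical classifier scores, strictly decreasing so that there are no ties;
# the two interactions are ranked third and fourth.
scores = [1.0 - 0.05 * i for i in range(20)]

# A classifier with no skill that always predicts "no interaction".
no_skill = [False] * len(labels)

acc = accuracy(labels, no_skill)
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)

print(f"accuracy of the no-skill classifier: {acc:.3f}")
print(f"ROC-AUC: {auc:.3f}, PR-AUC (average precision): {ap:.3f}")
```

In this toy case the no-skill classifier reaches an accuracy of 0.900 (one minus connectance) without recovering a single interaction, and the scored classifier has a ROC-AUC of 0.889 but an average precision of only 0.417, because two non-interactions are ranked above both interactions.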

DOI

https://doi.org/10.32942/osf.io/aty7n

Subjects

Biodiversity, Life Sciences

Keywords

binary classification, ecological networks, machine learning

Dates

Published: 2022-01-11 06:27

Last Updated: 2022-06-11 23:21

Older Versions

Version 1 - 2022-01-11

License

CC-By Attribution-ShareAlike 4.0 International