This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.1111/2041-210X.14071. This is version 2 of this Preprint.
Downloads
Supplementary Files
Authors
Abstract
1. The prediction of species interactions is gaining momentum as a way to circumvent limitations in data volume. Yet, ecological networks are challenging to predict because they are typically small and sparse. Dealing with extreme class imbalance is a challenge for most binary classifiers, and there are currently no guidelines as to how predictive models can be trained for this specific problem.
2. Using simple mathematical arguments and numerical experiments in which a variety of classifiers (for supervised learning) are trained on simulated networks, we develop a series of guidelines related to the choice of measures to use for model selection, and the degree of unbiasing to apply to the training dataset.
3. Neither classifier accuracy nor the ROC-AUC are informative measures for the performance of interaction prediction. PR-AUC is a fairer assessment of performance. In some cases, even standard measures can lead to selecting a more biased classifier because the effect of connectance is strong. The amount of correction to apply to the training dataset depends on network connectance, on the measure to be optimized, and only weakly on the classifier.
4. These results reveal that training machines to predict networks is a challenging task, and that in virtually all cases, the composition of the training set needs to be experimented on before performing the actual training. We discuss these consequences in the context of the low volume of data.
DOI
https://doi.org/10.32942/osf.io/aty7n
Subjects
Biodiversity, Life Sciences
Keywords
binary classification, ecological networks, machine learning
Dates
Published: 2022-01-11 04:27
Last Updated: 2022-06-11 22:21
There are no comments or no comments have been made public for this article.