This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
A Niche in the Machine: The Promise of AI Foundation Models for Species Distribution Modeling
Abstract
Species distribution models (SDMs) are fundamental tools for conservation, yet methodological progress has stalled. Despite two decades of refinement, traditional approaches (MaxEnt, boosted regression trees, random forests) have approached a performance ceiling, and deep learning has failed to deliver a breakthrough on species distribution data. TabPFN, a foundation model that learns to perform Bayesian inference through pretraining on millions of synthetic classification tasks, represents a different paradigm for deep learning. Rather than being trained on each dataset, it applies learned inference patterns to a new problem in a single forward pass, a process known as ‘in-context learning’. We ask whether this new class of learning algorithm can perform well on SDM tasks, despite strong differences between typical SDM datasets and TabPFN’s training data.
We evaluated TabPFN against established SDM methods across 226 species in six geographic regions using a standardized SDM benchmark. To address the mismatch between TabPFN's pretraining context and the structure of presence-background data, we developed two adaptations: (1) ensemble class balancing, which partitions pseudo-absences across ensemble members while every member retains all presence records; and (2) domain-specific finetuning through a two-step training process on SDM tasks.
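The ensemble class-balancing idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in the TabPFN-SDM repository: the helper names `balanced_ensemble_splits` and `ensemble_predict` are hypothetical, and we assume that background points (coded 0) are shuffled into disjoint, near-equal chunks, that every member sees all presences (coded 1), and that member predictions are combined by simple averaging. The model is passed in as a generic `fit_predict` callable, so the sketch does not rely on any TabPFN API.

```python
import numpy as np


def balanced_ensemble_splits(y, n_members, seed=0):
    """Hypothetical helper: return one array of row indices per ensemble member.

    Every member keeps all presence rows (y == 1); background rows (y == 0)
    are shuffled and split into n_members disjoint, near-equal chunks.
    """
    y = np.asarray(y)
    presence = np.flatnonzero(y == 1)
    background = np.flatnonzero(y == 0)
    rng = np.random.default_rng(seed)
    chunks = np.array_split(rng.permutation(background), n_members)
    return [np.concatenate([presence, chunk]) for chunk in chunks]


def ensemble_predict(fit_predict, X, y, X_new, n_members, seed=0):
    """Hypothetical helper: average member predictions (averaging is an assumption).

    fit_predict(X_train, y_train, X_new) must return one probability per row of X_new.
    """
    X, y = np.asarray(X), np.asarray(y)
    preds = [
        fit_predict(X[idx], y[idx], X_new)
        for idx in balanced_ensemble_splits(y, n_members, seed)
    ]
    return np.mean(preds, axis=0)


# Toy usage: 20 presences and 200 background points split across 10 members,
# so each member trains on 20 presences and 20 background points.
y = np.array([1] * 20 + [0] * 200)
splits = balanced_ensemble_splits(y, n_members=10)
print([int((y[s] == 1).sum()) for s in splits][:3],
      [int((y[s] == 0).sum()) for s in splits][:3])
```

Setting `n_members` near the background-to-presence ratio gives each member an approximately 1:1 class balance; the ratio actually used in the preprint is not stated here.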
Finetuned TabPFN achieved the highest discrimination on this benchmark, with a mean ROC-AUC of 0.762, exceeding MaxNet (0.732), Random Forest (0.727), BRT (0.724), and GAM (0.717), a 4.1% relative improvement over the best traditional method. On held-out species not seen during finetuning, performance was essentially identical (0.763), indicating generalization rather than memorization. Under spatially separated evaluation, finetuned TabPFN maintained its advantage over all traditional methods (mean ROC-AUC 0.699 vs. 0.656–0.683 for traditional models). Using Miller’s calibration slope (a ratio calibration measure appropriate for presence-only models), finetuned TabPFN achieved strong probability calibration (slope 1.110), comparable to the best traditional methods.
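For readers unfamiliar with the calibration measure, the sketch below computes a Miller-style calibration slope: the slope b of a logistic regression of the observed labels on the logit of the predicted probabilities, fitted here by Newton-Raphson in NumPy. The function name is hypothetical, and treating presence/background labels as the binary outcome while leaving the intercept uninterpreted is our assumption; the preprint's exact procedure may differ. A slope of 1 indicates well-calibrated relative predictions, values below 1 suggest predictions that are too extreme, and values above 1 suggest predictions that are too conservative.

```python
import numpy as np


def calibration_slope(y, p, n_iter=50, eps=1e-6):
    """Hypothetical helper: Miller-style calibration slope.

    Fits y ~ Bernoulli(sigmoid(a + b * logit(p))) by Newton-Raphson and
    returns b. The intercept a is estimated but not returned.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    z = np.log(p / (1 - p))                     # logit of the predictions
    X = np.column_stack([np.ones_like(z), z])   # intercept column + logit(p)
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-X @ beta))        # current fitted probabilities
        w = mu * (1 - mu)                       # logistic-regression weights
        grad = X.T @ (y - mu)                   # score vector
        hess = X.T @ (X * w[:, None])           # observed information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:        # stop once updates are negligible
            break
    return beta[1]


# Toy check: labels drawn from the predicted probabilities themselves,
# so the estimated slope should be close to 1.
rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, 5000)
y = (rng.uniform(size=p.size) < p).astype(int)
print(round(calibration_slope(y, p), 2))
```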
These results demonstrate that foundation models, when appropriately adapted, can outperform traditional SDM methods. The combination of strong discrimination, sub-second inference, and substantial room for extension positions TabPFN as a strong alternative for presence-only species distribution modeling.
DOI
https://doi.org/10.32942/X2VQ10
Subjects
Ecology and Evolutionary Biology, Life Sciences
Keywords
species distribution modeling, computational ecology, foundation model, machine learning, prior-data fitted networks, TabPFN, AI, transfer learning, presence-background data, benchmarking
Dates
Published: 2026-02-24 08:18
License
CC BY Attribution 4.0 International
Additional Metadata
Data and Code Availability Statement:
Code associated with this preprint can be found at: https://github.com/rdinnager/TabPFN-SDM. All data used in the preprint are publicly available.
Language:
English