This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.

Computer Vision Models Offer Scalable Species Detection From Social Media Photographs
Abstract
Social media platforms have emerged as a promising source of data for biodiversity monitoring because of the vast amounts of user-generated visual content they host. However, the unstructured and noisy nature of social media data poses challenges for accurate species identification. Foundation vision models offer a way to identify a large diversity of species from photographs, but they have yet to be robustly tested on messy social media data. This study explores the utility of foundation vision models in identifying species from social media images, focusing on charismatic species such as lions, cheetahs, and gorillas. We manually labeled a dataset of images from Flickr, taken in zoos across the United States, to establish a ground truth for species presence. We evaluated the performance of three models: (i) CLIP with binary prompts ("species name is present"/"species name is not present"), (ii) a categorical model with common object categories (e.g., "plant," "building," "vehicle," and "expected species name"), and (iii) BioCLIP, a fine-tuned version of CLIP designed specifically for species identification. Our analysis revealed that the binary presence/absence model struggled with the noisy social media data, leading to low accuracy. The categorical model improved the true positive rate but continued to produce a large number of false positives. BioCLIP, while not achieving the highest accuracy, was the most effective at minimizing false positives, which is crucial for biodiversity monitoring, where incorrect detections can have significant consequences. Precision-recall analysis using presence-only data indicates the potential of these models in real-world applications where presence detection is prioritized. Our findings suggest that foundation vision
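As an illustration of the two CLIP prompt setups described in the abstract, the following is a minimal sketch, not the authors' code. It assumes that an image-text model has already produced one similarity score per prompt for a given photograph; the model call itself is omitted, and all function names, the default distractor categories, and the argmax decision rule are hypothetical choices made for this example.

```python
import math


def binary_prompts(species):
    # Two-prompt setup: "<species> is present" vs. "<species> is not present".
    return [f"{species} is present", f"{species} is not present"]


def categorical_prompts(species, distractors=("plant", "building", "vehicle")):
    # Categorical setup: common object categories plus the expected species.
    # The distractor list is an assumption based on the examples in the abstract.
    return list(distractors) + [species]


def softmax(scores):
    # Numerically stable softmax over per-prompt similarity scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_present(similarities, target_index):
    # Assumed decision rule: the species is called "present" when its prompt
    # receives the highest probability among all prompts.
    probs = softmax(similarities)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best == target_index


def precision_recall(predicted, actual):
    # Precision and recall from paired boolean predictions and manual labels.
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

For example, with the categorical prompts for "lion", the species prompt is the last entry, so `predict_present(scores, target_index=3)` would flag a photograph as containing a lion only when that prompt outscores "plant," "building," and "vehicle." Low precision under this rule corresponds to the many false positives the abstract reports for the categorical model.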
DOI
https://doi.org/10.32942/X21935
Subjects
Life Sciences
Keywords
Artificial Intelligence, social media, biodiversity
Dates
Published: 2025-04-22 20:33
Last Updated: 2025-04-22 20:33
License
CC BY Attribution 4.0 International
Additional Metadata
Conflict of interest statement:
None
Data and Code Availability Statement:
Open data/code are not available.
Language:
English