Computer Vision Models Offer Scalable Species Detection From Social Media Photographs

This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.

Authors

Nathan Fox, Summer Mengarelli, Sabina Tomkins, Derek Van Berkel

Abstract

Social media platforms have emerged as a promising source of data for biodiversity monitoring because of the vast amount of user-generated visual content they host. However, the unstructured and noisy nature of social media data poses challenges for accurate species identification. Foundation vision models offer a new way to identify a wide diversity of species from photographs, but they have yet to be robustly tested on messy social media data. This study explores the utility of foundation vision models for identifying species in social media images, focusing on charismatic species such as lions, cheetahs, and gorillas. We manually labeled a dataset of Flickr images taken in zoos across the United States to establish a ground truth for species presence. We evaluated the performance of three models: (i) CLIP with binary prompts ("species name is present"/"species name is not present"), (ii) a categorical model with common object categories (e.g., "plant," "building," "vehicle," and "expected species name"), and (iii) BioCLIP, a fine-tuned version of CLIP designed specifically for species identification. Our analysis revealed that the binary presence/absence model struggled with the noisy social media data, leading to low accuracy. The categorical model improved the true positive rate but continued to produce a large number of false positives. BioCLIP, while not achieving the highest accuracy, produced the fewest false positives, which is crucial for biodiversity monitoring, where incorrect detections can have significant consequences. Precision-recall analysis using presence-only data indicates the models' potential in real-world applications where presence detection is prioritized. Our findings suggest that foundation vision
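As a rough illustration of the binary-prompt setup and the precision/recall evaluation described in the abstract, the sketch below scores an image embedding against two prompt embeddings and compares the resulting detections with manual labels. It is not the authors' code: the function names, the toy 2-D vectors, and the `scale` value of 100 are assumptions made for this example, and a real pipeline would obtain the embeddings from CLIP or BioCLIP image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_probs(image_vec, prompt_vecs, scale=100.0):
    """Softmax over scaled image-prompt similarities.

    `scale` is an assumed temperature for this sketch, not a value
    taken from the preprint.
    """
    logits = [scale * cosine(image_vec, p) for p in prompt_vecs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def species_detected(image_vec, prompt_vecs, target_index=0):
    """True when the prompt at `target_index` gets the highest probability."""
    probs = zero_shot_probs(image_vec, prompt_vecs)
    return probs.index(max(probs)) == target_index


def precision_recall(predicted, truth):
    """Precision and recall of boolean detections against boolean labels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical embeddings standing in for encoder outputs.
# Index 0 plays the role of "species is present", index 1 "not present".
binary_prompts = [[0.9, 0.1], [0.1, 0.9]]
images = [[1.0, 0.2], [0.3, 1.0], [0.8, 0.5]]
labels = [True, False, False]  # stand-in for manual ground truth

predictions = [species_detected(img, binary_prompts) for img in images]
print(predictions, precision_recall(predictions, labels))
```

The categorical variant would follow the same pattern with a longer prompt list (for example "plant", "building", "vehicle", and the expected species name), with `target_index` pointing at the species prompt.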

DOI

https://doi.org/10.32942/X21935

Subjects

Life Sciences

Keywords

Artificial Intelligence, social media, biodiversity

Dates

Published: 2025-04-22 20:33

Last Updated: 2025-04-22 20:33

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
None

Data and Code Availability Statement:
Open data/code are not available.

Language:
English