Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Jennifer D'Souza, Zachary M. Laubach, Tarek Al Mustafa, Sina Zarrieß, Robert Frühstückl, Phyllis Illari

Abstract

This paper presents an exploratory study that harnesses the capabilities of large language models (LLMs) to mine key ecological entities from invasion biology literature. Specifically, we focus on extracting species names, their locations, associated habitats, and ecosystems. This information is critical for understanding species spread, predicting future invasions, and informing conservation efforts. Traditional text mining approaches often struggle with the complexity of ecological terminology and the subtle linguistic patterns found in these texts. By applying general-purpose LLMs without domain-specific fine-tuning, we uncover both the promise and the limitations of using these models for ecological entity extraction. In doing so, this study lays the groundwork for more advanced, automated knowledge extraction tools that can aid researchers and practitioners in understanding and managing biological invasions.

DOI

https://doi.org/10.32942/X29D1X

Subjects

Engineering, Life Sciences

Keywords

large language models, information extraction, generative AI, invasion biology, literature review, prompt engineering, schema-based information extraction

Dates

Published: 2025-03-05 14:05

License

CC-By Attribution-ShareAlike 4.0 International

Additional Metadata

Language:
English

Conflict of interest statement:
None

Data and Code Availability Statement:
https://doi.org/10.5281/zenodo.13956882