Liberating host-virus knowledge from biological dark data

This is version 2 of this Preprint. It has not been peer reviewed. A published version of this Preprint is available.


Nathan Upham, Jorrit H. Poelen, Deborah Leo Paul, Quentin John Groom, Nancy B. Simmons, Maarten P. M. Vanhove, Sandro Bertolino, DeeAnn M. Reeder, Cristiane Bastos-Silveira, Atriya Sen


Connecting basic data about bats and other potential hosts of SARS-CoV-2 with their ecological context is critical for understanding the emergence and spread of COVID-19. However, when global lockdowns began in March 2020, the world’s bat experts were locked out of their research laboratories, which in turn locked up large volumes of offline ecological and taxonomic data. Pandemic lockdowns have put a magnifying glass on the long-standing problem of biological ‘dark data’: data that are published but disconnected from digital knowledge resources, and thus unavailable for high-throughput analysis. Knowledge of host-to-virus ecological interactions will remain biased until this challenge is addressed. Here we outline two viable solutions: (i) how to interconnect published data about host organisms, viruses, and other pathogens in the short term; and (ii) how to shift the publishing paradigm beyond unstructured text (‘PDF prison’) to labeled networks of digital knowledge. Biological taxonomy, as the indexing system for biodiversity data, is foundational to both solutions. Building digitally connected ‘knowledge graphs’ of host-pathogen interactions will provide the agility needed to quickly identify reservoir hosts of novel zoonoses, allow more robust predictions of emergence, and thereby strengthen planetary health systems.



Subjects: Diseases, Epidemiology, International Public Health, Medicine and Health Sciences, Organisms, Public Health


Keywords: artificial intelligence, FAIR, mammal, open access, spillover, virus, zoonosis


Published: 2021-01-15 23:45

Last Updated: 2021-05-28 00:29


License: CC BY-SA 4.0 (Creative Commons Attribution-ShareAlike 4.0 International)