A machine learning approach to estimating the geographical origin of timber

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.




Jakub Michal Truszkowski, Roi Maor, Raquib Bin Yousuf, Subhodip Biswas, Caspar Chater, Peter Gasson, Scot McQueen, Marigold Norman, Jade Saunders, John Simeone, Naren Ramakrishnan, Alexandre Antonelli, Victor Deklerck


1: Determining the harvest location of timber is crucial to enforcing international regulations designed to tackle illegal logging and the associated trade in forest products. However, complex supply chains obscure harvest sources, which often leaves paper-based traceability systems as the sole tool for demonstrating provenance, despite their vulnerability to fraud. Stable Isotope Ratio Analysis (SIRA) can be used to verify claims of timber harvest location by matching the levels of naturally occurring stable isotopes within wood tissue to location-specific stable isotope ratios (SIR) predicted from reference data (‘isoscapes’). The primary challenge in developing reliable isoscapes is the need to accurately predict stable isotope ratios in areas where no physical reference samples are available. Existing attempts to predict isoscapes from reference data have been hampered by the use of simple, ad hoc statistical models, which limits both the precision of the estimated isoscapes and the confidence in the derived estimates of geographical origin.
2: We present a new SIRA data analysis pipeline designed to infer timber harvest location. We use Gaussian Processes to robustly estimate isoscapes from reference wood samples. These isoscapes are then combined with species distribution range data to compute, for every pixel in the study area, the probability that it is the origin of the sample. Finally, we present a methodology for identifying priority locations at which to collect new reference samples in future field expeditions.
3: As a proof of concept, we demonstrate our approach on a data set of n=87 wood samples from seven oak species in the USA. Our method is able to determine the harvest location to within 520-870 km, depending on the model parameterisation. Incorporating species distribution information improves accuracy by up to 36%. The new sampling locations proposed by our method decrease the variance of the resultant isoscapes by up to 86% more than sampling the same number of locations at random.
4: The pipeline we present here combines the prediction of isoscapes with derivation of geographical origin estimates quickly and efficiently. It advances the toolset available to authorities addressing illegal trade in forest products and enforcing anti-deforestation legislation. Importantly, reference data can be added as available, allowing for the expansion of reference collections and increasing prediction accuracy.
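
The steps summarised in the abstract (a Gaussian Process isoscape, a per-pixel origin probability restricted by species range, and a choice of where to sample next) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the data are synthetic, a single isotope ratio is used, the kernel is a fixed squared-exponential with hand-picked hyperparameters, the species range is a made-up rectangular mask, and "highest predictive variance inside the range" is only a simple stand-in for the sampling prioritisation the abstract mentions. The function names are hypothetical.

```python
import numpy as np

SIGNAL_VAR = 1.0     # assumed prior variance of the isotope field
LENGTH_SCALE = 5.0   # assumed spatial correlation length (grid units)
NOISE_VAR = 0.1      # assumed measurement noise variance

def rbf_kernel(a, b):
    """Squared-exponential covariance between two sets of (lon, lat) points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return SIGNAL_VAR * np.exp(-0.5 * sq_dist / LENGTH_SCALE**2)

def gp_isoscape(train_xy, train_y, grid_xy):
    """GP posterior mean and variance of the isotope value at each pixel."""
    mean_y = train_y.mean()
    K = rbf_kernel(train_xy, train_xy) + NOISE_VAR * np.eye(len(train_xy))
    K_star = rbf_kernel(grid_xy, train_xy)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_y - mean_y))
    mean = mean_y + K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = SIGNAL_VAR - (v**2).sum(axis=0)
    return mean, np.maximum(var, 0.0)

def origin_posterior(sample_value, mean, var, range_mask):
    """Per-pixel probability of origin: Gaussian likelihood times range prior."""
    pred_var = var + NOISE_VAR
    log_lik = -0.5 * (np.log(2 * np.pi * pred_var)
                      + (sample_value - mean) ** 2 / pred_var)
    # Pixels outside the species range get zero prior probability.
    log_post = np.where(range_mask, log_lik, -np.inf)
    log_post = log_post - log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

# Synthetic reference samples (87 points, matching only the sample count
# quoted in the abstract; the values themselves are made up).
rng = np.random.default_rng(0)
train_xy = rng.uniform(0.0, 20.0, size=(87, 2))
train_y = (0.3 * train_xy[:, 0] - 0.2 * train_xy[:, 1]
           + rng.normal(0.0, NOISE_VAR**0.5, size=87))

# Study area as a 20 x 20 pixel grid, with a hypothetical species range mask.
lon, lat = np.meshgrid(np.linspace(0, 20, 20), np.linspace(0, 20, 20))
grid_xy = np.column_stack([lon.ravel(), lat.ravel()])
range_mask = grid_xy[:, 0] < 12.0

mean, var = gp_isoscape(train_xy, train_y, grid_xy)

# A test sample whose (synthetic) true origin is (5, 5).
sample_value = 0.3 * 5.0 - 0.2 * 5.0
posterior = origin_posterior(sample_value, mean, var, range_mask)

# Stand-in for sampling prioritisation: most uncertain pixel inside the range.
next_idx = int(np.argmax(np.where(range_mask, var, -np.inf)))
next_location = grid_xy[next_idx]
```

With several isotope ratios, one natural extension under the same assumptions would be to fit one isoscape per isotope and sum the per-isotope log-likelihoods before normalising, which treats the isotopes as conditionally independent given location.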




Subjects: Biodiversity, Biostatistics, Forest Management, Other Forestry and Forest Sciences, Statistical Methodology, Statistical Models


Keywords: SIRA, origin traceability, timber provenance, illegal logging, isoscapes, Gaussian Processes


Published: 2023-02-22 11:54

Last Updated: 2023-02-27 05:00


CC BY Attribution 4.0 International

Additional Metadata

Data and Code Availability Statement:
Code not available (yet)
