Integrated distribution modelling to estimate the national population size of an alpine bird


2) Here we integrate information from two distinct citizen science data sources, opportunistic occurrence data and targeted standardised distance-sampling survey data, to estimate the population size of an alpine bird, the willow ptarmigan (Lagopus lagopus), in Norway between 2008 and 2017. Our model combines the strengths of the occurrence data (widespread but coarse) and standardised survey data (spatially restricted but detailed) to estimate ptarmigan population size at both local and national scales. Using simulations, we also examined the sensitivity of the population size estimates to each data type to guide future data collection.

3) An occupancy-detection model fitted to the occurrence data predicted that willow ptarmigan were present in 29% of 5 x 5 km grid cells across Norway. Occupancy probability was most strongly affected by habitat covariates. The distance-sampling model predicted that ptarmigan density in the area covered by the line-transect surveys was, on average, 13 individuals per km², and most strongly affected by climatic

indices, there is a risk that the value of absolute population abundance estimates is overlooked for understanding species' population dynamics and trends.

The main challenge to the quantification of species abundances at large spatial scales is imperfect detection and spatial heterogeneity in abundance (

best regarded as opportunistic, i.e., without a consistent sampling protocol. We downloaded two sets of data: (1) occurrence data for willow ptarmigan and (2) occurrence data for all birds (Fig. 1). Data for all bird occurrences were used in the statistical analysis to control for spatial and temporal variation in the sampling effort of ornithologists across Norway. The willow ptarmigan occurrence dataset included some observations from the line-transect surveys; however, we did not discard them from the occurrence dataset since they still provided valid occurrence observations. Both sets of data were filtered by removing: duplicate observations (with the same date, species and geographic coordinates); those with coordinate uncertainty greater than 5 km; those with geographic coordinates with fewer than three decimal places; and those outside our temporal scope of 2008-2017. We focused on records during the breeding season between May and September. The occurrence data were mapped to a reference grid comprising 5 x 5 km grid cells that covered the extent of Norway (limited to grid cells that overlapped at least 50% with mainland Norway). This resolution should account for

(Table S1).
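The record filters and the 5 x 5 km gridding described above can be illustrated with a minimal sketch. The record layout (a dictionary with `species`, `date`, `lon`, `lat` and `uncertainty_m` keys, with coordinates kept as text so decimal places can be counted) and all function names are hypothetical and chosen only for illustration; the projected coordinates passed to `grid_cell` are likewise an assumption, since the text does not state the projection used.

```python
from datetime import date

def decimal_places(value_text):
    """Count digits after the decimal point in a coordinate given as text."""
    return len(value_text.split(".")[1]) if "." in value_text else 0

def keep_record(rec):
    """Apply the per-record filters described in the text (hypothetical layout)."""
    in_years = 2008 <= rec["date"].year <= 2017       # temporal scope
    in_season = 5 <= rec["date"].month <= 9           # May to September
    precise = rec["uncertainty_m"] <= 5000            # uncertainty of at most 5 km
    enough_digits = min(decimal_places(rec["lon"]),
                        decimal_places(rec["lat"])) >= 3
    return in_years and in_season and precise and enough_digits

def filter_records(records):
    """Drop duplicates (same date, species, coordinates) and failing records."""
    seen, kept = set(), []
    for rec in records:
        key = (rec["date"], rec["species"], rec["lon"], rec["lat"])
        if key in seen or not keep_record(rec):
            continue
        seen.add(key)
        kept.append(rec)
    return kept

def grid_cell(x_m, y_m, size_m=5000):
    """Column/row index of the 5 x 5 km cell containing a projected point (metres)."""
    return (int(x_m // size_m), int(y_m // size_m))
```

Applying `filter_records` to both the ptarmigan records and the all-bird records would give the two filtered datasets; counting all-bird records per `grid_cell` is one simple way to obtain a proxy for sampling effort, although the effort model actually used is not specified in this excerpt.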
The models were run in JAGS with 20,000 iterations and a burn-in of 10,000, using vague priors.

The Rhat statistics and traceplots were used to check for convergence.

Out-of-sample: The latitudinal range of the data was split into 25 blocks that were systematically assigned to one of five folds (Fig. S1). We repeated the models described above five times, training using four of the folds (e.g., folds 1-4) and using the remaining
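The spatially blocked fold assignment used for the out-of-sample check can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the blocks are assumed to be of equal latitudinal width, and "systematically assigned" is interpreted as cycling the blocks through folds 1 to 5 from south to north. All function names are hypothetical.

```python
def latitude_block(lat, lat_min, lat_max, n_blocks=25):
    """0-based index of the equal-width latitudinal block containing lat (assumed scheme)."""
    width = (lat_max - lat_min) / n_blocks
    # Clamp so that lat == lat_max falls in the last block.
    return min(int((lat - lat_min) / width), n_blocks - 1)

def block_fold(block, n_folds=5):
    """Assumed systematic assignment: blocks cycle through folds 1..n_folds."""
    return block % n_folds + 1

def train_test_split(lats, test_fold, n_blocks=25, n_folds=5):
    """Return (train, test) lists of site indices for one held-out fold."""
    lat_min, lat_max = min(lats), max(lats)
    train, test = [], []
    for i, lat in enumerate(lats):
        fold = block_fold(latitude_block(lat, lat_min, lat_max, n_blocks), n_folds)
        (test if fold == test_fold else train).append(i)
    return train, test
```

Interleaving the folds along latitude in this way means each fold spans the full north-south gradient while neighbouring sites mostly remain in the same fold, which reduces the leakage that spatial autocorrelation would cause under random splitting.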

Mean occupancy probability across all grid cells was 0.29, but there was substantial spatial variation (Fig. 2). Occupancy probability was most positively affected by tree line

(Fig. 3). Variable indicator selection supported the importance of variables related to temperature (maximum and minimum temperature) and tree line (Fig. S3). For fit measures, the model predictions were strongly correlated with the observed data (r = 0.93; Fig. S4); the mean absolute deviation was 2.39 (for the line-transect mean count) or 4.9 (for year-specific transect predicted count), and the Bayesian p-value was 0.58, suggesting no fit problems.
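The three fit measures reported above (correlation, mean absolute deviation and Bayesian p-value) can be computed as in the following sketch. The function names are hypothetical, and the discrepancy statistic behind the Bayesian p-value is left open because it is not specified in this excerpt; the sketch assumes that one discrepancy value for the observed data and one for replicated data are available per posterior draw.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between observed and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_absolute_deviation(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def bayesian_p_value(disc_observed, disc_replicated):
    """Share of posterior draws in which the replicated-data discrepancy
    exceeds the observed-data discrepancy (assumed: one pair per draw)."""
    exceed = sum(r > o for o, r in zip(disc_observed, disc_replicated))
    return exceed / len(disc_observed)
```

A Bayesian p-value near 0.5, such as the 0.58 reported, indicates that replicated data are about as discrepant from the model as the observed data, whereas values close to 0 or 1 would point to lack of fit.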

Also, cross-validation suggested no great loss of fit between the test and training datasets (Table S2). Abundance was highest in central Norway and lowest in the southeast and north (Fig. 4).

Summed across all grids, total abundance, on average across years, was 1,164,379 (95% CI = 1,053,149 to 1,307,195) (Fig. 5a). This estimate was generally similar for all three approaches taken to select the environmental variables for the most parsimonious model (Fig. S5). Year-

Grids with high uncertainty in both occupancy and abundance were found in central Norway.

Further simulation studies could explore how the optimal integration approach varies with the spatial scale and coverage of each data stream.
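As an illustration of how a national total with a 95% interval, such as the one reported in the results, can be obtained from posterior draws, the sketch below sums expected abundance over grid cells within each draw and then summarises across draws. It assumes that expected abundance in a cell is occupancy probability times density times cell area (25 km² for a 5 x 5 km cell); this combination rule and all function names are assumptions made for illustration and may differ from the integration actually used.

```python
def total_per_draw(occupancy_draws, density_draws, cell_area_km2=25.0):
    """Sum expected abundance over cells for each posterior draw.

    occupancy_draws[d][c] and density_draws[d][c] hold the occupancy
    probability and density (individuals per km^2) of cell c in draw d.
    Assumed rule: abundance = occupancy * density * area.
    """
    totals = []
    for psi, dens in zip(occupancy_draws, density_draws):
        totals.append(sum(p * d * cell_area_km2 for p, d in zip(psi, dens)))
    return totals

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def summarise(totals):
    """Posterior mean and 95% interval of the national total."""
    s = sorted(totals)
    return {"mean": sum(s) / len(s),
            "lower": quantile(s, 0.025),
            "upper": quantile(s, 0.975)}
```

Summing within each draw before taking quantiles keeps the uncertainty in occupancy and density linked for every cell, so the interval on the total reflects their joint uncertainty rather than a sum of cell-level intervals.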

Regardless of the data integration approach, population size estimates will always contain some uncertainty, which might limit the application for conservation and

is not found in these habitats, and this can be easily modelled with the right covariates. Our analysis suggested that uncertainty in occupancy was especially high in western central and southern Norway, where habitat might be suitable but there are fewer data. Hence, targeted data collection in these areas may be most beneficial. However, as a caveat, this analysis did not consider other causes of uncertainty, including model structure, which might also be investigated further, although we used a typical range of habitat and climate covariates.

We applied our method to the willow ptarmigan in Norway, which currently has the IUCN status of "least concern", but like similar montane species, has been declining across