IAP-24-037

Novel statistical models accounting for misidentification in ecological data

Biodiversity (Maclaurin and Sterelny, 2008) refers to the variety of all living things on Earth, which often interact with each other in complex ways. Humans rely heavily on biodiversity to survive; the current rapid loss of biodiversity (Cardinale et al., 2012) is therefore a serious threat to our ecosystems and needs to be addressed effectively. One immediate remedy is wildlife conservation, which has received considerable attention in recent decades. Before taking any measures, decision makers and conservation managers need accurate information (e.g., species abundance and occurrence) about the current status of the target species. This information can be obtained by collecting appropriate data on the target population and then analysing the data with suitable statistical models, either existing or newly developed. Capture-recapture (CR; McCrea and Morgan, 2014), spatial capture-recapture (SCR; Borchers and Efford, 2008) and presence/absence data (MacKenzie et al., 2002) are often collected to estimate key demographic parameters and population density, and to investigate the patterns and drivers of species occurrence.
In recent decades, advanced sampling technologies and data submission platforms have made it possible to collect new forms of these ecological data, including but not limited to genetic, camera-trapping and acoustic data (e.g., Royle et al., 2009; Stevenson et al., 2021). These innovations in sampling enhance our capacity to collect data, leading to more reliable investigations of elusive species, such as large carnivores in Scandinavia (Bischof et al., 2020). While the new sampling technologies greatly help with data collection, they pose non-trivial challenges for data analysis. The challenges are twofold. First, new models need to be developed to account for the new and often complicated features of the data collected. For example, ecological data are often prone to misidentification (e.g., McClintock et al., 2010; Kodi et al., 2024), while correct identification of individuals and species is crucial to most existing models for these data (MacKenzie et al., 2002; Kodi et al., 2024). The few models in the literature that account for misidentification in CR data rely on unrealistic assumptions (e.g., Link et al., 2010), and those for SCR data simply condition on multiple detections of the same individual, which discards data from individuals with only a single encounter (e.g., Kodi et al., 2024; Petersma et al., 2024). Second, statistical estimation and inference become more challenging: models accounting for misidentification in ecological data are usually complicated and difficult to fit using existing inference methods (e.g., Link et al., 2010; Zhang et al., 2019).
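As a minimal illustration of how misidentification distorts CR data, the Python sketch below simulates closed-population capture histories in which each capture is correctly identified with probability alpha. It assumes that every misidentified capture creates a new "ghost" identity that is never seen again, a simplifying assumption of the kind the existing literature relies on. The function name, parameters and values are hypothetical and serve illustration only; this is not one of the models to be developed in the project.

```python
import random


def simulate_histories(n_animals, n_occasions, p_capture, alpha, seed=0):
    """Simulate recorded capture histories with misidentification.

    Assumption (illustrative only): a misidentified capture is recorded
    under a brand-new ghost identity that is never matched again.
    Returns (histories, n_ghosts), where histories maps a recorded
    identity to a 0/1 list over occasions.
    """
    rng = random.Random(seed)
    histories = {}
    n_ghosts = 0
    for animal in range(n_animals):
        for t in range(n_occasions):
            if rng.random() >= p_capture:
                continue  # animal not captured on this occasion
            if rng.random() < alpha:
                rec_id = ("animal", animal)  # correct identification
            else:
                rec_id = ("ghost", n_ghosts)  # new ghost identity
                n_ghosts += 1
            histories.setdefault(rec_id, [0] * n_occasions)[t] = 1
    return histories, n_ghosts


# Hypothetical settings: 100 animals, 5 occasions, 20% of captures misread.
histories, n_ghosts = simulate_histories(100, 5, 0.3, 0.8, seed=42)
n_real = sum(1 for key in histories if key[0] == "animal")
print("recorded identities:", len(histories))
print("real animals among them:", n_real, "| ghosts:", n_ghosts)
```

Each ghost adds a spurious single-capture history, so the number of recorded identities can exceed the number of animals actually captured. A model that treats every recorded identity as a distinct animal would therefore tend to overestimate abundance.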
This PhD project will develop novel statistical models to deal with misidentification in CR, SCR and occupancy data, together with computationally efficient inference methods and software tools to fit these new models. More specifically, we aim to develop a general modelling framework for ecological data with misidentification that relies on fewer unrealistic assumptions and should therefore perform better in real data analysis. We also aim to develop general and computationally efficient inference methods for these models in the Bayesian paradigm, including approximate but efficient Bayesian inference methods. The models and methods developed will be made accessible to non-specialists through user-friendly, open-access software packages. The overall aim of the project is to develop an effective statistical framework for gaining a more accurate and reliable understanding of wildlife populations, particularly those of endangered species.


Methodology

This PhD project relies on the following main methodologies:
• mathematical/statistical derivations using probability and statistics knowledge
• statistical programming and computing in R/Python or other suitable platforms
• simulation-based analysis
• real data applications
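
To give a flavour of the simulation-based analysis listed above, the following Python sketch compares the classical two-occasion Chapman estimator of abundance with and without misidentification. It is a toy example under stated assumptions: a misidentified capture is still counted as a capture but can never be matched across occasions, and all function names and parameter values are hypothetical. It does not represent the models or inference methods this project will develop.

```python
import random
from statistics import mean


def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen abundance estimator."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1


def simulate_two_occasions(n_animals, p_capture, alpha, rng):
    """Return (n1, n2, m2) for one simulated two-occasion survey.

    Assumption (illustrative only): a capture is correctly identified
    with probability alpha, and a recapture is recognised only if both
    captures of that animal were correctly identified.
    """
    n1 = n2 = m2 = 0
    for _ in range(n_animals):
        seen1 = rng.random() < p_capture
        seen2 = rng.random() < p_capture
        ok1 = seen1 and rng.random() < alpha
        ok2 = seen2 and rng.random() < alpha
        n1 += seen1  # recorded captures on occasion 1 (real or ghost)
        n2 += seen2  # recorded captures on occasion 2 (real or ghost)
        m2 += ok1 and ok2  # recognised recaptures
    return n1, n2, m2


def mean_estimate(n_animals, p_capture, alpha, n_reps, seed=0):
    """Average the Chapman estimate over n_reps simulated surveys."""
    rng = random.Random(seed)
    estimates = [
        chapman_estimate(*simulate_two_occasions(n_animals, p_capture, alpha, rng))
        for _ in range(n_reps)
    ]
    return mean(estimates)


# Hypothetical settings: true abundance 200, capture probability 0.4.
print("no misidentification :", round(mean_estimate(200, 0.4, 1.0, 200, seed=1)))
print("20% misidentification:", round(mean_estimate(200, 0.4, 0.8, 200, seed=1)))
```

Because misidentification removes recognised recaptures while leaving the number of recorded captures unchanged, the naive estimator is pushed upwards. A benchmarking study of a new model would follow the same pattern, replacing the naive estimator with the model under evaluation.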

Project Timeline

Year 1

• Conduct a thorough literature review on ecological data analysis, particularly models for capture-recapture and occupancy data with misidentification
• Develop initial models for closed-population CR/occupancy data with misidentification, relaxing the unrealistic assumptions of existing models
• Run simulations to check the performance of the new models and compare them to existing ones

Year 2

• Extend the CR misidentification model framework to open populations and evaluate the performance of the model for parameter estimation by simulation and real data analysis
• Write up the work on CR data with misidentification and prepare to submit it to a suitable statistical or ecological journal
• Develop models for SCR data with misidentification and carry out a comprehensive benchmarking analysis, comparing their performance to the multiple-detection approach
• Develop efficient Bayesian inference methods (e.g. the integrated nested Laplace approximation – INLA) for these new models

Year 3

• Write up the work on SCR models with misidentification and prepare to submit it to a suitable statistical or ecological journal
• Formulate a latent multinomial model (Link et al., 2010) for occupancy data with misidentification and/or species dynamics and fit the model using the efficient saddlepoint approximation method developed by Zhang et al. (2019)
• Conduct a comprehensive benchmarking analysis comparing the performance of the new model with existing ones, and the efficiency of the inference methods
• Write up the work on occupancy models and prepare to submit it to a suitable statistical or ecological journal

Year 3.5

• Develop a comprehensive and user-friendly R package to provide practitioners and conservationists with a flexible yet accessible tool that implements the methods derived from this PhD
• Write up the PhD thesis and submit it for examination
• Complete viva and address potential revisions

Training & Skills

The PhD student will receive training in statistical analysis and modelling, with a focus on methods for CR, SCR and occupancy modelling. Additionally, training will cover computationally efficient inference approaches and software such as TMB, NIMBLE and INLA. This training will help the PhD student to enhance their statistical, modelling and ecological skills. The PhD student will gain expertise in preparing research papers and presentations, enabling effective communication of findings to both scientific and non-scientific audiences. The PhD programme will encourage critical thinking and problem-solving skills to address challenges and uncertainties that may arise during the research process.

References & further reading

Borchers, D. L., & Efford, M. G. (2008). Spatially explicit maximum likelihood methods for capture–recapture studies. Biometrics, 64(2), 377-385.

Cardinale, B. J., Duffy, J. E., Gonzalez, A., Hooper, D. U., Perrings, C., Venail, P., … & Naeem, S. (2012). Biodiversity loss and its impact on humanity. Nature, 486(7401), 59-67.

Link, W. A., Yoshizaki, J., Bailey, L. L., & Pollock, K. H. (2010). Uncovering a latent multinomial: analysis of mark–recapture data with misidentification. Biometrics, 66(1), 178-185.

MacKenzie, D. I., Nichols, J. D., Lachman, G. B., Droege, S., Royle, J. A., & Langtimm, C. A. (2002). Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83(8), 2248-2255.

Maclaurin, J., & Sterelny, K. (2008). What is biodiversity? University of Chicago Press.

McCrea, R. S., & Morgan, B. J. (2014). Analysis of capture-recapture data. CRC Press.

McClintock, B. T., Bailey, L. L., Pollock, K. H., & Simons, T. R. (2010). Experimental investigation of observation error in anuran call surveys. The Journal of Wildlife Management, 74(8), 1882-1893.

Kodi, A. R., Howard, J., Borchers, D. L., Worthington, H., Alexander, J. S., Lkhagvajav, P., … & Sharma, K. (2024). Ghostbusting—Reducing bias due to identification errors in spatial capture‐recapture histories. Methods in Ecology and Evolution, 15(6), 1060-1070.

Petersma, F. T., Thomas, L., Thode, A. M., Harris, D., Marques, T. A., Cheoo, G. V., & Kim, K. H. (2024). Accommodating false positives within acoustic spatial capture–recapture, with variable source levels, noisy bearings and an inhomogeneous spatial density. Journal of Agricultural, Biological and Environmental Statistics, 29(3), 471-490.

Stevenson, B. C., van Dam‐Bates, P., Young, C. K., & Measey, J. (2021). A spatial capture–recapture model to estimate call rate and population density from passive acoustic surveys. Methods in Ecology and Evolution, 12(3), 432-442.

Zhang, W., Bravington, M. V., & Fewster, R. M. (2019). Fast likelihood‐based inference for latent count models using the saddlepoint approximation. Biometrics, 75(3), 723-733.
