Epi-evolutionary modelling of pathogen dynamics using high-throughput genomic sequencing data in multi-layer networks

Supervisors: Rowland Kao, Glenn Marion

Project Description:

Overview: High-throughput genomic sequencing of pathogens has great potential to inform our understanding and management of infectious disease. However, realising these benefits requires new computational tools and an improved theoretical understanding of pathogen epi-evolutionary dynamics. This project aims to develop and apply state-of-the-art data analysis tools, encompassing mathematical, statistical and computational models, to realise this potential.

Background: In epidemiology, identifying 'who infected whom' (the transmission tree) allows us to quantify key characteristics such as incubation periods, heterogeneity in transmission rates and duration of infectiousness, and to target disease control. Although direct knowledge of the transmission tree is rare, modern computational tools enable statistical inference of transmission dynamics from the limited data typically available, e.g. reported disease cases. Unfortunately, the information that can be obtained from such data is often highly uncertain. However, new technologies such as high-throughput genomic sequencing of pathogens have great potential to inform inference of transmission dynamics, ultimately leading to better disease control [1]. For example, we have recently applied whole-genome sequencing to investigate transmission of bovine tuberculosis between livestock and wildlife [2]. In addition, we have developed a novel Bayesian framework [3] that combines traditional epidemiological observations with pathogen sequence data to infer simultaneously the transmission tree and the unobserved sequences of the transmitted pathogen, even when sequence and epidemiological data are incomplete.

Aims: Despite such developments, much remains to be learned, and new tools are required to address open questions. For example, current statistical tools focus on explicitly spatial models but need to be extended to more general networks, e.g. to represent migratory routes of birds, cattle movements or social networks. The successful student will develop both excellent transferable communication skills, through interaction with an experienced interdisciplinary team, and highly sought-after technical skills in developing state-of-the-art data analytics tools at the intersection of mathematical, statistical and computational modelling. Example applications include:
 
•    Investigation of how the separation between the timescales of genetic change and disease transmission affects the quality of information that can be inferred from joint analysis of sequence data and epidemiological observations, e.g. symptoms and diagnostic tests.
•    Understanding and prediction of the impact of network structure on the ability to infer transmission dynamics, and investigation of the extent to which inference is improved by knowledge of the network.
•    Exploration of the relationship between ideas from coalescent theory, e.g. estimation of the most recent common ancestor (MRCA), and identification of the source of infection in epi-evolutionary problems.
•    Application of these ideas to the complex multi-species transmission dynamics of bovine tuberculosis, for which an exceptionally dense body of pathogen sequence and contact network data is available.

References:

[1] Kao R.R., Haydon D.T., Lycett S.J., Murcia P.R. (2014) Supersize me: how whole-genome sequencing and big data are transforming epidemiology. Trends in Microbiology 22(5):282-291. doi: 10.1016/j.tim.2014.02.011.
[2] Crispell J., Zadoks R.N., Harris S.R., Paterson B., Collins D.M., de-Lisle G.W., Livingstone P., Neill M.A., Biek R., Lycett S.J., Kao R.R., Price-Carter M. (2017) Using whole genome sequencing to investigate transmission in a multi-host system: bovine tuberculosis in New Zealand. BMC Genomics 18(1):180. doi: 10.1186/s12864-017-3569-x.
[3] Lau M.S.Y., Marion G., Streftaris G., Gibson G. (2015) A systematic Bayesian integration of epidemiological and genetic data. PLoS Computational Biology 11(11):e1004633.

If you wish to apply for this project, please check this link and send your application to this email.
