Massively Parallel Phylogeny Reconstruction for the Age of DNA Big Data

Supervisors: Dr Daniel Barker, Professor Thomas Meagher

Project description:

The goal is to create reusable, effective, portable parallel algorithms and open-source software for reconstructing large phylogenies. This would represent a step change for phylogeny reconstruction, which is now central to many areas of the life sciences, from biodiversity and conservation to cancer research. Phylogenetic research has been transformed by the availability of DNA sequences, with an explosion in the generation and use of such data. Moreover, access to extended data platforms and new scientific challenges have increased the scale, in terms of numbers of species, at which phylogenies are reconstructed. The resulting Big Data is placing a strain on computational phylogenetics. To find the optimal tree-like phylogeny according to current criteria, one would ideally evaluate all possible tree topologies. However, the number of topologies grows super-exponentially (as a double factorial) with the number of taxa in the tree. For just 54 species, a small study by current standards, there are more possible topologies than atoms in the observable universe.

The project will address these Big Data challenges through theoretical and computational approaches. These will include the use and characterisation of nature-inspired advanced heuristics (e.g. Strobl and Barker 2016); programming and use of massively parallel computer systems; optimisation of algorithms and their implementations; and cross-site, distributed machine learning techniques.

Research training will be provided in phylogeny reconstruction, programming, software engineering, high-performance computing and machine learning. This will be delivered through meetings with the supervisors; meetings, visits and discussions with Dr Martyn Winn (STFC Scientific Computing Department); attendance at short courses and conferences; and attendance at local and regional seminars and discussion groups. It is anticipated that the successful candidate will gain valuable skills and insight for employment in phylogenetics, high-performance computing, life sciences research and/or machine learning, in academia or industry.
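The claim that 54 species already yield more candidate topologies than there are atoms in the observable universe can be checked with a short calculation. The following is a minimal Python sketch, assuming the standard count of (2n - 5)!! unrooted, fully bifurcating topologies for n labelled taxa (rooted trees give (2n - 3)!!) and the commonly quoted order-of-magnitude estimate of about 10^80 atoms. The function names are hypothetical and illustrative only; they are not part of any software from this project.

```python
import math

# Commonly quoted order-of-magnitude estimate (an assumption, not a measured value).
ATOMS_IN_OBSERVABLE_UNIVERSE = 10 ** 80


def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * (k - 4) * ..., with k!! = 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_topologies(n: int) -> int:
    """Count unrooted, fully bifurcating topologies for n labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    return double_factorial(2 * n - 5)


def rooted_topologies(n: int) -> int:
    """Count rooted, fully bifurcating topologies for n labelled taxa: (2n - 3)!!."""
    if n < 2:
        raise ValueError("a rooted binary tree needs at least 2 taxa")
    return double_factorial(2 * n - 3)


def smallest_n_exceeding(limit: int) -> int:
    """Return the smallest number of taxa whose unrooted topology count exceeds `limit`."""
    n = 3
    while unrooted_topologies(n) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    for n in (4, 10, 20, 54):
        count = unrooted_topologies(n)
        # math.log10 accepts arbitrarily large Python ints, so no float overflow occurs here.
        print(f"{n:>3} taxa: ~10^{math.log10(count):.1f} unrooted topologies")
    print("first n above 10^80:", smallest_n_exceeding(ATOMS_IN_OBSERVABLE_UNIVERSE))
```

Because Python integers have arbitrary precision, the counts are computed exactly rather than approximated. Four taxa give 3 unrooted topologies and ten give 2,027,025; the count first exceeds 10^80 at 53 taxa, so the 54-species figure holds. Exhaustive evaluation is therefore infeasible at realistic scales, which is why heuristic search and parallel computation matter.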


Strobl MAR, Barker D (2016) On simulated annealing phase transitions in phylogeny reconstruction. Molecular Phylogenetics and Evolution, 101, 46-55.