Molecular Phylogenomics
This work focuses on inferring the evolutionary relationships and divergence times of strains, clones, and species. We are interested in developing new methods and approaches that make evolutionary computations feasible for large datasets while preserving the accuracy of inference. We conduct computer simulations and large-scale empirical data analyses to validate our innovations. For example, we developed a novel method called PhyClass to establish the optimal position in a reference phylogenetic tree of each read generated by next-generation sequencing. We have also recently developed a calibration-free phylogenetic method (RelTime) and applied it to the timetree of metazoan diversification. In addition, we are now developing evolution-aware approaches for the analysis of tumor profiling data, with the aim of inferring individual clone genotypes and estimating the clonal composition of tumor samples.
The Timetree of Life
For decades, my group has engaged in large-scale empirical data analyses to establish a timescale for the tree of life. Most recently, in a massive meta-analytic synthesis of results from over 2,000 publications, we have constructed the largest timetree of life, which contains more than 50,000 species. It shows that the time to speciation is similar among major groups of species (mammals, birds, and insects) and that overall species diversity is expanding at a rapid rate. These findings suggest that the accumulation of genetic incompatibilities in geographically isolated populations is the primary cause of speciation and that the processes of speciation and adaptation are largely uncoupled.
We have also developed the Timetree of Life website, a resource through which scientists and the general public can access molecular clock-based divergence time estimates for vertebrate species. This work (research and synthesis) represents our pioneering efforts to build a tree of life scaled to time, which is necessary for estimating the rates at which genes and genomes change. In Phylomedicine, species timetrees are critical for generating evolutionary parameters normalized by time, which enable the simultaneous use of all proteins, even those that evolve at vastly different rates.