Welcome to the home page of Bruce Rannala. I develop statistical methods, computer software, and sometimes new theory. I also teach a course on human genetic variation (EVE 131) each Fall. My research focuses on statistical genetics, population genetics, and phylogenetics, and our methods are often applied to real-world problems such as pandemics (HIV, COVID-19), cancer genetics, conservation biology, and disease gene mapping.
Examples of Current Projects
Bayesian inference of ages of latent lineages of HIV using sequence data (PhD student Anna Nagel)
Bayesian inference of hybrid and backcross individuals using genomic sequences (Postdoc Sneha Chakraborty)
New models aimed at early identification of SARS-CoV-2 variants of concern using phylogenetic information (Postdoc Mike May)
Proceedings of the National Academy of Sciences
Model misspecification misleads inference of the spatial dynamics of disease outbreaks
Epidemiology has been transformed by the advent of Bayesian phylodynamic models that allow researchers to infer the geographic history of pathogen dispersal over a set of discrete geographic areas [1, 2]. These models provide powerful tools for understanding the spatial dynamics of disease outbreaks, but contain many parameters that are inferred from minimal geographic information (i.e., the single area in which each pathogen was sampled). Consequently, inferences under these models are inherently sensitive to our prior assumptions about the model parameters. Here, we demonstrate that the default priors used in empirical phylodynamic studies make strong and biologically unrealistic assumptions about the underlying geographic process. We provide empirical evidence that these unrealistic priors strongly (and adversely) impact commonly reported aspects of epidemiological studies, including: 1) the relative rates of dispersal between areas; 2) the importance of dispersal routes for the spread of pathogens among areas; 3) the number of dispersal events between areas; and 4) the ancestral area in which a given outbreak originated. We offer strategies to avoid these problems, and develop tools to help researchers specify more biologically reasonable prior models that will realize the full potential of phylodynamic methods to elucidate pathogen biology and, ultimately, inform surveillance and monitoring policies to mitigate the impacts of disease outbreaks.
@article{gao2023model,title={Model misspecification misleads inference of the spatial dynamics of disease outbreaks},author={Gao, Jiansi and May, Michael R and Rannala, Bruce and Moore, Brian R},journal={Proceedings of the National Academy of Sciences},volume={120},number={11},pages={e2213913120},year={2023},doi={10.1073/pnas.2213913120},publisher={National Academy of Sciences},}
Bioinformatics
PrioriTree: a utility for improving phylodynamic analyses in BEAST
Jiansi Gao, Michael R May, Bruce Rannala, and Brian R Moore
Phylodynamic methods are central to studies of the geographic and demographic history of disease outbreaks. Inference under discrete-geographic phylodynamic models—which involve many parameters that must be inferred from minimal information—is inherently sensitive to our prior beliefs about the model parameters. We present an interactive utility, PrioriTree, to help researchers identify and accommodate prior sensitivity in discrete-geographic inferences. Specifically, PrioriTree provides a suite of functions to generate input files for—and summarize output from—BEAST analyses for performing robust Bayesian inference, data-cloning analyses and assessing the relative and absolute fit of candidate discrete-geographic (prior) models to empirical datasets.
@article{gao2023prioritree,title={PrioriTree: a utility for improving phylodynamic analyses in BEAST},author={Gao, Jiansi and May, Michael R and Rannala, Bruce and Moore, Brian R},journal={Bioinformatics},volume={39},number={1},pages={btac849},year={2023},url={https://academic.oup.com/bioinformatics},doi={10.1093/bioinformatics/btac849},publisher={Oxford University Press},}
Genetics
An efficient exact algorithm for identifying hybrids using population genomic sequences
The identification of individuals that have a recent hybrid ancestry (between populations or species) has been a goal of naturalists for centuries. Since the 1960s, codominant genetic markers have been used with statistical and computational methods to identify F1 hybrids and backcrosses. Existing hybrid inference methods assume that alleles at different loci undergo independent assortment (are unlinked or in population linkage equilibrium). Genomic datasets include thousands of markers that are located on the same chromosome and are in population linkage disequilibrium, which violates this assumption. Existing methods may therefore be viewed as composite likelihoods when applied to genomic datasets, and their performance in identifying hybrid ancestry (which is a model-choice problem) is unknown. Here, we develop a new program, Mongrail, that implements a full-likelihood Bayesian hybrid inference method that explicitly models linkage and recombination, generating the posterior probability of different F1 or F2 hybrid, or backcross, genealogical classes. We use simulations to compare the statistical performance of Mongrail with that of an existing composite likelihood method (NewHybrids) and apply the method to analyze genome sequence data for hybridizing species of barred and spotted owls.
@article{chakraborty2023efficient,title={An efficient exact algorithm for identifying hybrids using population genomic sequences},doi={10.1093/genetics/iyad011},author={Chakraborty, Sneha and Rannala, Bruce},journal={Genetics},pages={iyad011},year={2023},publisher={Oxford University Press},}
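To make the independence assumption discussed in the hybrid-inference abstract above concrete, here is a minimal Python sketch of a posterior over genealogical classes when loci are treated as unlinked. It is not Mongrail's algorithm (which models linkage and recombination) and not the NewHybrids implementation; it only illustrates the kind of per-locus product that becomes a composite likelihood when markers are actually linked. The function names, the biallelic genotype coding (0, 1, or 2 copies of a counted allele), the known allele frequencies, and the uniform prior over classes are all assumptions made for illustration.

```python
import math

# Expected fractions of loci whose two gene copies descend from
# populations (A, A), (A, B), and (B, B) for each genealogical class.
CLASSES = {
    "pure_A": (1.0, 0.0, 0.0),
    "pure_B": (0.0, 0.0, 1.0),
    "F1": (0.0, 1.0, 0.0),
    "F2": (0.25, 0.5, 0.25),
    "backcross_A": (0.5, 0.5, 0.0),
    "backcross_B": (0.0, 0.5, 0.5),
}


def genotype_probs(p_a, p_b):
    """Return P(g | ancestry) for g = 0, 1, 2 copies of the counted allele.

    p_a and p_b are the (assumed known) frequencies of that allele in
    populations A and B.
    """
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    both_a = (q_a * q_a, 2.0 * p_a * q_a, p_a * p_a)
    one_each = (q_a * q_b, p_a * q_b + p_b * q_a, p_a * p_b)
    both_b = (q_b * q_b, 2.0 * p_b * q_b, p_b * p_b)
    return both_a, one_each, both_b


def class_posteriors(genotypes, freqs_a, freqs_b):
    """Posterior over CLASSES under a uniform prior, treating loci as independent."""
    log_lik = {}
    for name, weights in CLASSES.items():
        total = 0.0
        for g, p_a, p_b in zip(genotypes, freqs_a, freqs_b):
            per_ancestry = genotype_probs(p_a, p_b)
            # Mix over the three ancestry configurations for this class.
            lik = sum(w * probs[g] for w, probs in zip(weights, per_ancestry))
            total += math.log(lik) if lik > 0.0 else -math.inf
        log_lik[name] = total

    best = max(log_lik.values())
    if best == -math.inf:
        raise ValueError("genotypes have zero likelihood under every class")
    # Subtract the maximum before exponentiating to avoid underflow.
    unnorm = {name: math.exp(ll - best) for name, ll in log_lik.items()}
    norm = sum(unnorm.values())
    return {name: u / norm for name, u in unnorm.items()}


if __name__ == "__main__":
    n_loci = 20
    # Hypothetical data: heterozygous at every locus, with the counted
    # allele common in population A and rare in population B.
    post = class_posteriors([1] * n_loci, [0.99] * n_loci, [0.01] * n_loci)
    for name, prob in sorted(post.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {prob:.4f}")
```

Multiplying per-locus terms is only exact when loci are unlinked and in linkage equilibrium; with thousands of markers on the same chromosome the product overstates how much independent evidence the data contain, which is the concern the abstract raises.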