publications
Publications in reverse chronological order.
For a complete list, see Google Scholar.
2021
-
LanceOtron: a deep learning peak caller for ATAC-seq, ChIP-seq, and DNase-seq Hentges, Lance D., Sergeant, Martin J., Cole, Christopher B., Downes, Damien J., Hughes, Jim R., and Taylor, Stephen bioRxiv 2021
ATAC-seq, ChIP-seq, and DNase-seq have revolutionized molecular biology by allowing researchers to identify important DNA-encoded elements genome-wide. Regions where these elements are found appear as peaks in the analog signal of an assay’s coverage track, and despite the ease with which humans can visually categorize these regions, meaningful peak calls from whole genome datasets require complex analytical techniques. Current methods focus on statistical tests to classify peaks, reducing the information-dense peak shapes to simply maximum height, and discounting that background signals do not completely follow any known probability distribution for significance testing. Deep learning has been shown to be highly accurate for image recognition, on par with or exceeding human ability, providing an opportunity to reimagine and improve peak calling. We present the peak calling framework LanceOtron, which combines multifaceted enrichment measurements with deep learning image recognition techniques for assessing peak shape. In benchmarking transcription factor binding, chromatin modification, and open chromatin datasets, LanceOtron outperforms the long-standing, gold-standard peak caller MACS2 through its improved selectivity and near perfect sensitivity. In addition to command line accessibility, a graphical web application was designed to give any researcher the ability to generate optimal peak calls and interactive visualizations in a single step. Competing Interest Statement: S.T. is a founder and CSO of Zegami. J.R.H. is a founder and shareholder of Nucleome Therapeutics. D.J.D. is a paid consultant of Nucleome Therapeutics. No other authors have competing interests.
-
Demographic inference from multiple whole genomes using a particle filter for continuous Markov jump processes Henderson, Donna, Zhu, Sha (Joe), Cole, Christopher B, and Lunter, Gerton PLOS ONE 2021
Demographic events shape a population’s genetic diversity, a process described by the coalescent-with-recombination model that relates demography and genetics by an unobserved sequence of genealogies along the genome. As the space of genealogies over genomes is large and complex, inference under this model is challenging. Formulating the coalescent-with-recombination model as a continuous-time and -space Markov jump process, we develop a particle filter for such processes, and use waypoints that under appropriate conditions allow the problem to be reduced to the discrete-time case. To improve inference, we generalise the Auxiliary Particle Filter for discrete-time models, and use Variational Bayes to model the uncertainty in parameter estimates for rare events, avoiding biases seen with Expectation Maximization. Using real and simulated genomes, we show that past population sizes can be accurately inferred over a larger range of epochs than was previously possible, opening the possibility of jointly analyzing multiple genomes under complex demographic models. Code is available at https://github.com/luntergroup/smcsmc.
2020
-
Ancient Admixture into Africa from the ancestors of non-Africans Cole, Christopher B., Zhu, Sha Joe, Mathieson, Iain, Prüfer, Kay, and Lunter, Gerton bioRxiv 2020
Genetic diversity across human populations has been shaped by demographic history, making it possible to infer past demographic events from extant genomes. However, demographic inference in the ancient past is difficult, particularly around the out-of-Africa event in the Late Middle Paleolithic, a period of profound importance to our species’ history. Here we present SMCSMC, a Bayesian method for inference of time-varying population sizes and directional migration rates under the coalescent-with-recombination model, to study ancient demographic events. We find evidence for substantial migration from the ancestors of present-day Eurasians into African groups between 40 and 70 thousand years ago, predating the divergence of Eastern and Western Eurasian lineages. This event accounts for previously unexplained genetic diversity in African populations, and supports the existence of novel population substructure in the Late Middle Paleolithic. Our results indicate that our species’ demographic history around the out-of-Africa event is more complex than previously appreciated. Competing Interest Statement: The authors have declared no competing interest.
-
A community-maintained standard library of population genetic models Adrion, Jeffrey R., Cole, Christopher B., Dukler, Noah, Galloway, Jared G., Gladstein, Ariella L., Gower, Graham, Kyriazis, Christopher C., Ragsdale, Aaron P., Tsambos, Georgia, Baumdicker, Franz, Carlson, Jedidiah, Cartwright, Reed A., Durvasula, Arun, Gronau, Ilan, Kim, Bernard Y., McKenzie, Patrick, Messer, Philipp W., Noskova, Ekaterina, Ortega Del Vecchyo, Diego, Racimo, Fernando, Struck, Travis J., Gravel, Simon, Gutenkunst, Ryan N., Lohmueller, Kirk E., Ralph, Peter L., Schrider, Daniel R., Siepel, Adam, Kelleher, Jerome, and Kern, Andrew D. eLife 2020
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to duplication of effort and the possibility for error. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a standard catalog of published simulation models from a wide range of organisms and supports multiple simulation engine backends. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage an even broader community of developers to contribute to this growing resource.
2018
-
Genome-wide association study of a nicotine metabolism biomarker in African American smokers: impact of chromosome 19 genetic influences Chenoweth, M.J., Ware, J.J., Zhu, A.Z.X., Cole, C.B., Cox, L.S., Nollen, N., Ahluwalia, J.S., Benowitz, N.L., Schnoll, R.A., Hawk, L.W., Cinciripini, P.M., George, T.P., Lerman, C., Knight, J., and Tyndale, R.F. Addiction 2018
© 2017 Society for the Study of Addiction. Background and aims: The activity of CYP2A6, the major nicotine-inactivating enzyme, is measurable in smokers using the nicotine metabolite ratio (NMR; 3′hydroxycotinine/cotinine). Due to its role in nicotine clearance, the NMR is associated with smoking behaviours and response to pharmacotherapies. The NMR is highly heritable (~80%), and on average lower in African Americans (AA) versus whites. We previously identified several reduced- and loss-of-function CYP2A6 variants common in individuals of African descent. Our current aim was to identify novel genetic influences on the NMR in AA smokers using genome-wide approaches. Design: Genome-wide association study (GWAS). Setting: Multiple sites within Canada and the United States. Participants: AA smokers from two clinical trials: Pharmacogenetics of Nicotine Addiction Treatment (PNAT)-2 (NCT01314001; n = 504) and Kick-it-at-Swope (KIS)-3 (NCT00666978; n = 450). Measurements: Genome-wide SNP genotyping, the NMR (phenotype) and population substructure and NMR covariates. Findings: Meta-analysis revealed three independent chromosome 19 signals (rs12459249, rs111645190 and rs185430475) associated with the NMR. The top overall hit, rs12459249 (P = 1.47e-39; beta = 0.59 per C (versus T) allele, SE = 0.045), located ~9.5 kb 3′ of CYP2A6, remained genome-wide significant after controlling for the common (~10% in AA) non-functional CYP2A6*17 allele. In contrast, rs111645190 and rs185430475 were not genome-wide significant when controlling for CYP2A6*17. In total, 96 signals associated with the NMR were identified; many were not found in prior NMR GWASs in individuals of European descent. The top hits were also associated with the NMR in a third cohort of AA (KIS2; n = 480). None of the hits were in UGT or OCT2 genes. Conclusions: Three independent chromosome 19 signals account for ~20% of the variability in the nicotine metabolite ratio in African American smokers. The hits identified may contribute to inter-ethnic variability in nicotine metabolism, smoking behaviours and tobacco-related disease risk.
2016
-
Rapporteur summaries of plenary, symposia, and oral sessions from the XXIIIrd World Congress of Psychiatric Genetics Meeting in Toronto, Canada, 16-20 October 2015. Zai, Gwyneth, Alberry, Bonnie, Arloth, Janine, Bánlaki, Zsófia, Bares, Cristina, Boot, Erik, Camilo, Caroline, Chadha, Kartikay, Chen, Qi, Cole, Christopher B, Cost, Katherine T, Crow, Megan, Ekpor, Ibene, Fischer, Sascha B, Flatau, Laura, Gagliano, Sarah, Kirli, Umut, Kukshal, Prachi, Labrie, Viviane, Lang, Maren, Lett, Tristram A, Maffioletti, Elisabetta, Maier, Robert, Mihaljevic, Marina, Mittal, Kirti, Monson, Eric T, O’Brien, Niamh L, Østergaard, Søren D, Ovenden, Ellen, Patel, Sejal, Peterson, Roseann E, Pouget, Jennie G, Rovaris, Diego L, Seaman, Lauren, Shankarappa, Bhagya, Tsetsos, Fotis, Vereczkei, Andrea, Wang, Chenyao, Xulu, Khethelo, Yuen, Ryan K C, Zhao, Jingjing, Zai, Clement C, and Kennedy, James L Psychiatric Genetics 2016
The XXIIIrd World Congress of Psychiatric Genetics meeting, sponsored by the International Society of Psychiatric Genetics, was held in Toronto, ON, Canada, on 16-20 October 2015. Approximately 700 participants attended to discuss the latest state-of-the-art findings in this rapidly advancing and evolving field. The following report was written by trainee travel awardees. Each was assigned one session as a rapporteur. This manuscript represents the highlights and topics that were covered in the plenary sessions, symposia, and oral sessions during the conference, and contains major notable and new findings. Copyright © 2016 Wolters Kluwer Health, Inc. All rights reserved.
-
Polygenic risk score prediction of antipsychotic dosage in schizophrenia Hettige, Nuwan C, Cole, Christopher B, Khalid, Sarah, and De Luca, Vincenzo Schizophrenia Research 2016
-
Semi-Automated Identification of Ontological Labels in the Biomedical Literature with goldi Cole, Christopher B., Patel, Sejal, and Knight, Jo bioRxiv 2016
Recent growth in both the scale and the scope of large publicly available ontologies has spurred the development of computational methodologies which can leverage structured information to answer important questions. However, ontological labels, or "terms" have thus far proved difficult to use in practice; text mining, one crucial aspect of electronically understanding and parsing the biomedical literature, has historically had difficulty identifying terms in literature. In this article, we present goldi, an open source R package whose goal it is to identify terms of variable length in free form text. It is available at https://github.com/Chris1221/goldi. The algorithm works through identifying words or synonyms of words present in individual terms and comparing the number of present words to an acceptance function for decision making. In this article we present the theoretical rationale behind the algorithm, as well as practical advice for its usage applied to Gene Ontology term identification and quantification. We additionally detail the options available and describe their respective computational efficiencies.
2015
-
Increased genetic risk for obesity in premature coronary artery disease. Cole, Christopher B, Nikpay, Majid, Stewart, Alexandre F R, and McPherson, Ruth European Journal of Human Genetics: EJHG 2015
There is ongoing controversy as to whether obesity confers risk for CAD independently of associated risk factors including diabetes mellitus. We have carried out a Mendelian randomization study using a genetic risk score (GRS) for body mass index (BMI) based on 35 risk alleles to investigate this question in a population of 5831 early onset CAD cases without diabetes mellitus and 3832 elderly healthy control subjects, all of strictly European ancestry, with adjustment for traditional risk factors (TRFs). We then estimated the genetic correlation between BMI and CAD (rg) by relating the pairwise genetic similarity matrix to a phenotypic covariance matrix between these two traits. GRSBMI significantly (P=2.12 × 10^-12) associated with CAD status in a multivariate model adjusted for TRFs, with a per allele odds ratio (OR) of 1.06 (95% CI 1.042-1.076). The addition of GRSBMI to TRFs explained 0.75% of CAD variance and yielded a continuous net reclassification index of 16.54% (95% CI=11.82-21.26%, P<0.0001). To test whether GRSBMI explained CAD status when adjusted for measured BMI, separate models were constructed in which the score and BMI were either included as covariates or not. The addition of BMI explained ~1.9% of CAD variance and GRSBMI plus BMI explained 2.65% of CAD variance. Finally, using bivariate restricted maximum likelihood analysis, we provide strong evidence of genome-wide pleiotropy between obesity and CAD. This analysis supports the hypothesis that obesity is a causal risk factor for CAD. European Journal of Human Genetics advance online publication, 29 July 2015; doi:10.1038/ejhg.2015.162.
-
Gene–environment interaction in dyslipidemia Cole, Christopher B, Nikpay, Majid, and McPherson, Ruth 2015
Purpose of review: Recent genome-wide association studies have identified numerous common genetic variants associated with plasma lipid traits and have provided new insights into the regulation of lipoprotein metabolism including the identification of novel biological processes. These findings add to a body of existing data on dietary and environmental factors affecting plasma lipids. Here we explore how interactions between genetic risk factors and other phenotypes may explain some of the missing heritability of plasma lipid traits. Recent findings: Recent studies have identified true statistical interaction between several environmental and genetic risk factors and their effects on plasma lipid fractions. These include interactions between behaviors such as smoking or exercise, as well as specific dietary nutrients, and the effect size of specific genetic variants on plasma lipid traits, and modifying effects of measures of adiposity on the cumulative impact of a number of common genetic variants on each of plasma triglycerides and HDL cholesterol. Summary: Interactions between genetic risk factors and clinical phenotypes may account for some of the unexplained heritability of plasma lipid traits. Recent studies provide biological insight into specific genetic associations and may aid in the identification of dyslipidemic patients for whom specific lifestyle interventions are likely to be most effective.
2014
-
Adiposity significantly modifies genetic risk for dyslipidemia Cole, C. B., Nikpay, M., Lau, P., Stewart, A. F. R., Davies, R. W., Wells, G. A., Dent, R., and McPherson, R. The Journal of Lipid Research 2014
Recent genome-wide association studies (GWAS) have identified multiple loci robustly associated with plasma lipids, which also contribute to extreme lipid phenotypes. However, these common genetic variants explain less than 12% of variation in lipid traits. Adiposity is also an important determinant of plasma lipoproteins, particularly plasma triglycerides (TG) and high density lipoprotein cholesterol (HDLc) concentrations. Thus, interactions between genes and clinical phenotypes may contribute to this unexplained heritability. We have applied a weighted genetic risk score (GRS) for each of plasma TGs and HDLc in two large cohorts at the extremes of body mass index (BMI). Both BMI and GRS were strongly associated with these lipid traits. A significant interaction between obese/lean status and GRS was noted for each of TG (PInteraction = 2.87 × 10^-4) and HDLc (PInteraction = 1.05 × 10^-3). These interactions were largely driven by single nucleotide polymorphisms (SNPs) tagging APOA5, GCKR and LPL for TG, and CETP, GALNT2, LIPG and PLTP for HDLc. In contrast, the GRSLDLc × adiposity interaction was not significant. Sexual dimorphism was evident for the GRSHDL on HDLc in obese (PInteraction = 0.016) but not lean subjects. SNP by BMI interactions may provide biological insight into specific genetic associations and missing heritability.