Anshul Kundaje, Assistant Professor
Department of Genetics
Department of Computer Science
Deep learning approaches to denoise, integrate, impute and decode functional genomic data
Abstract: We present interpretable deep learning approaches to address three key challenges in integrative analysis of functional genomic data. (1) Data denoising: The quality of functional genomic data is affected by a myriad of experimental parameters, which makes it challenging to draw accurate inferences from chromatin profiling experiments performed under diverse conditions. We introduce a convolutional denoising algorithm that learns a mapping from suboptimal to high-quality datasets. It overcomes various sources of noise and variability, substantially enhancing and recovering signal when applied to low-quality chromatin profiling datasets across individuals, cell types, and species. Our method has the potential to improve data quality at reduced cost. (2) Data imputation: It is largely infeasible to perform 100s of genome-wide assays targeting diverse transcription factors (TFs) and epigenomic marks in 100s of cellular contexts due to cost and material constraints. We have developed multi-task, multi-modal deep neural networks to predict chromatin marks and binding landscapes of 100s of TFs by integrating regulatory DNA sequence with two low-cost assays, namely ATAC-seq (or DNase-seq) and RNA-seq, performed in a target cell type of interest. We train our models on large reference compendia and predict accurate TF and chromatin landscapes in new cellular contexts, thereby significantly expanding the context-specific annotation of the non-coding genome. (3) Decoding context-specific regulatory architecture: Finally, we develop novel, efficient interpretation engines for extracting predictive and biologically meaningful patterns from integrative deep learning models of TF binding and chromatin accessibility. We obtain new insights into TF binding sequence affinity models (e.g. significance of flanking sequences and fusion motifs), infer high-resolution point binding events of TFs, dissect higher-order cis-regulatory sequence grammars (including density and spatial constraints) and unravel dynamic regulatory drivers of cellular differentiation.
Bio: Anshul Kundaje is an Assistant Professor of Genetics and Computer Science at Stanford University. The Kundaje lab develops statistical and machine learning methods for integrative analysis of functional genomic data to decode regulatory elements and transcriptional regulatory networks across diverse cell types and tissues in healthy and diseased states. Anshul completed his Ph.D. in Computer Science at Columbia University in 2008. As a postdoc at Stanford University from 2008 to 2012 and a research scientist at MIT and the Broad Institute from 2012 to 2014, he led the integrative analysis efforts of two functional genomics consortia: the Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project. Anshul is a recipient of the 2016 NIH Director’s New Innovator Award and the 2014 Alfred Sloan Foundation Fellowship.