Emily Miraldi, assistant professor in the Divisions of Immunobiology and Biomedical Informatics at Cincinnati Children’s Hospital and in the Department of Pediatrics at the University of Cincinnati School of Medicine, leads an “immune-engineering” research group that uses mathematical modeling of the immune system to predict immune responses and understand disease.
The Ohio Supercomputer Center (OSC) plays an important role in the research, because Miraldi’s work requires high performance computing resources to solve computationally demanding mathematical problems.
“The biological question motivating my work at OSC is a very famous one: Cells in the human body share a common DNA blueprint but have a great diversity of functions and behaviors,” Miraldi said.
The diversity of cell types in the human body is driven by unique patterns of gene expression, which are controlled by proteins called transcription factors. Aberrant gene expression patterns are a hallmark of many diseases and can be traced to altered gene regulation by transcription factors, Miraldi explained.
“Discovering the transcription factors that control disease-associated gene expression provides an opportunity to develop therapies that might target those transcription factors to improve disease outcomes in the ‘poorly behaving’ cell types,” Miraldi said.
In an article published in the journal Genome Research, Miraldi’s team recently showed that a new data type, called “Assay for Transposase-Accessible Chromatin using sequencing” (ATAC-seq), could identify transcription factor regulators of gene expression across cell types (Miraldi et al., 2019, Genome Research; Pokrovskii et al., 2019, Immunity).
Before having access to OSC’s high performance computing resources, the team used simple mathematical models to predict transcription factor binding from ATAC-seq data. With more computational capability, Miraldi began using deep neural network models, which improved the accuracy of those transcription factor binding predictions.
“We initially used ATAC-seq data in a crude way to infer transcription factor binding sites, but, taking advantage of the high performance computing resources at OSC, we were able to use the latest advances in deep neural network modeling to more accurately predict transcription factor binding events from ATAC-seq,” Miraldi said.
The resulting collection of open-source, user-friendly deep neural network models is called “maxATAC.” Other research groups can use the maxATAC models to predict transcription factor binding from ATAC-seq in any human cell type, including single-cell ATAC-seq (scATAC-seq), which is now a standard technology at many research institutions.
“Transcription factor binding prediction from scATAC-seq is especially valuable at Cincinnati Children’s Hospital, where there is great desire to understand gene regulation and disease mechanisms from scarce patient samples (e.g., cancer tumor biopsies, transplant rejection) that can only be analyzed by single-cell technologies,” she said.