Predicting the effect of CRISPR-Cas9-based epigenome editing

All cells within a multicellular organism share the same genetic sequence, apart from a minuscule number of somatic mutations. Yet many cell types exist, with diverse morphological and functional traits. Epigenetics is an important regulator and driver of this diversity, allowing cells to differ in state and gene expression despite sharing the same genotype (Taherian Fard and Ragan, 2019). Indeed, cells traversing the trajectory from pluripotency to terminal differentiation retain essentially the same genotype.

Epigenetic modifications such as post-translational modifications (PTMs) to histone proteins are involved in many vital regulatory processes influencing genomic accessibility, nuclear compartmentalization, and transcription factor binding and recognition (Reik et al., 2001; Kouzarides, 2007; Gibney and Nolan, 2010; Klemm et al., 2019; Hafner and Boettiger, 2023; Zhang and Reinberg, 2001). The Histone Code Hypothesis suggests that combinations of different histone PTMs specify distinct chromatin states, thereby regulating gene expression (Strahl and Allis, 2000; Jenuwein and Allis, 2001).

The field of epigenome editing has produced new tools for studying the outcomes of epigenetic perturbations, and these tools promise to be useful for therapeutics by enabling fine-tuned control of gene expression (Matharu and Ahituv, 2020; Thakore et al., 2016; Goell and Hilton, 2021; Stricker et al., 2017). Currently, small molecule drugs are used to potently interfere with epigenetic regulation of gene expression. For example, Vorinostat inhibits histone deacetylases, thereby altering the epigenetic landscape (Estey, 2013; Yoon and Eom, 2016). However, small molecules disrupt the epigenome and transcriptome globally and are therefore suitable neither for targeting individual dysregulated genes nor for clarifying epigenetic regulatory mechanisms (Swaminathan et al., 2007). Meanwhile, numerous tools have been designed that use catalytically dead Cas9 (dCas9) to recruit epigenetic modifiers to DNA sequences specified by guide RNAs (gRNAs) (Jinek et al., 2012; Mali et al., 2013; Hilton et al., 2015; Stepper et al., 2017; Kwon et al., 2017; Li et al., 2021). CRISPR-Cas9-based epigenome editing strategies enable unprecedented, precise control of the epigenome and of gene activation, providing a path toward epigenetics-based therapeutics (Cheng et al., 2019).

A major challenge for epigenome editing is designing gRNAs that achieve a desired level of transcriptional or epigenetic modulation. Finding effective gRNAs currently requires expensive, low-throughput experimental strategies (Mohr et al., 2016; Liu et al., 2020; Mahata et al., 2023). An alternative approach is to computationally model both how epigenome editing alters histone PTMs and how these altered PTMs in turn affect gene expression.

To understand how histone PTMs relate to gene expression, large epigenetic and transcriptomic datasets are required. Advances in high-throughput sequencing have enabled quantification of gene expression and profiling of histone PTMs. Large consortia have performed a large number of assays across a wide variety of cell types (The ENCODE Project Consortium, 2012; Kundaje et al., 2015; Barrett et al., 2012). These include measurements of histone PTMs, transcription factor binding, gene expression, and chromatin accessibility. These data have enhanced our understanding of how histone PTMs and other chromatin dynamics impact transcriptional regulation (Keung et al., 2015; Rao et al., 2014; Holoch and Moazed, 2015).

Studying the function of these histone PTMs, however, has been largely limited to statistical associations with gene expression, which may not capture causal relationships (Karlić et al., 2010; Stillman, 2018; Singh et al., 2016). For example, deep learning models have successfully predicted gene expression from epigenetic features such as transcription factor binding (Schmidt et al., 2017), chromatin accessibility (Schmidt et al., 2020), histone PTMs (Singh et al., 2016; Sekhon et al., 2018; Frasca et al., 2022; Singh et al., 2017; Hamdy et al., 2022; Chen et al., 2022), and DNA methylation (Zhong et al., 2019). However, these studies predict gene expression as binary levels rather than as a continuous quantity. Moreover, because statistical associations can be driven by non-causal mechanisms, it is unclear whether such models learn mechanistic, causal relationships between epigenetic modifications and gene expression. Beyond modeling the relationship between histone PTMs and gene expression, fully describing how a particular gRNA would affect gene expression also requires a model of how epigenome editing affects histone PTMs. To our knowledge, there are currently no computational models that can accurately model, in silico, the impact of epigenome editing on histone PTMs.

Motivated by these observations, we explored models of how epigenome editing impacts histone PTMs and of how histone PTMs impact gene expression. We used data available through ENCODE (Schreiber et al., 2020a; The ENCODE Project Consortium, 2012) to train a model of how histone PTMs impact gene expression. Our model is highly predictive of endogenous expression and learns relationships consistent with known patterns of various histone PTMs (Kimura, 2013). To test this model in the context of epigenome editing, we generated perturbation data using the dCas9-p300 histone acetyltransferase system (Hilton et al., 2015). The dCas9-p300 system is thought to act primarily through local acetylation of histone lysine residues, particularly lysine 27 of histone H3 (H3K27ac). We therefore modeled the impact of dCas9-p300 on the epigenome as a local increase in the H3K27ac profile near the target site; because the precise effect of these perturbations is unknown, we tested a variety of potential modification patterns. We then applied our trained model to predict the impact of these putative H3K27ac modifications on gene expression (Figure 1). We found that our models, although designed to predict gene expression values, were effective at ranking relative fold-changes among genes in response to dCas9-p300, achieving a Spearman's rank correlation of ∼0.8. However, they were less successful at ranking fold-changes within individual genes than at predicting gene expression across cell types from native epigenetic signatures. We offer possible explanations in the Discussion.
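To make the idea of a "local increase in the H3K27ac profile near the target site" concrete, the sketch below applies three candidate perturbation shapes to a binned H3K27ac track. It is a minimal illustration under stated assumptions, not the procedure used in this work: the binned-array representation, the function name `perturb_h3k27ac`, the pattern names (`flat`, `gaussian`, `scale`), and the `width` and `amplitude` parameters are all hypothetical.

```python
import numpy as np

def perturb_h3k27ac(track, target_bin, pattern="gaussian", width=5, amplitude=1.0):
    """Return a copy of a binned H3K27ac track with a local increase near target_bin.

    Hypothetical helper: the patterns and parameters are illustrative assumptions,
    not a measured model of dCas9-p300 activity.
    """
    track = np.asarray(track, dtype=float)
    bins = np.arange(track.size)
    dist = np.abs(bins - target_bin)  # distance (in bins) from the gRNA target site

    if pattern == "flat":
        # Add a constant amount to every bin within `width` bins of the target.
        delta = np.where(dist <= width, amplitude, 0.0)
        return track + delta
    if pattern == "gaussian":
        # Add a Gaussian bump centred on the target with standard deviation `width` bins.
        delta = amplitude * np.exp(-0.5 * (dist / width) ** 2)
        return track + delta
    if pattern == "scale":
        # Multiply existing signal within `width` bins by (1 + amplitude).
        factor = np.where(dist <= width, 1.0 + amplitude, 1.0)
        return track * factor
    raise ValueError(f"unknown pattern: {pattern}")
```

For example, a `flat` perturbation with `width=2` and `amplitude=3.0` on an all-zero 40-bin track raises the five bins centred on bin 20 to 3.0. A purely multiplicative (`scale`) pattern leaves regions with no baseline H3K27ac signal unchanged, which is one reason to compare it against additive patterns when the true effect is unknown.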

Figure 1. Schematic of the epigenome editing prediction pipeline.

The pipeline uses epigenetic data to train models that predict endogenous gene expression. These models were then used to predict fold-changes in gene expression from perturbed histone PTM input data, and their predictions were validated using CRISPR-Cas9-based epigenome editing data.
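The fold-change prediction and validation steps of this pipeline can be sketched as follows. This is a hypothetical illustration, not the authors' code: `model` stands for any trained callable that returns predicted expression on a linear scale, the pseudocount of 1.0 is an arbitrary choice, and splitting evaluation into an across-gene correlation (one value per gene) and within-gene correlations (across gRNAs targeting the same gene) is an assumption about how the two kinds of ranking could be scored. Spearman's rank correlation is computed with NumPy as the Pearson correlation of average ranks, so tied values are handled.

```python
import numpy as np

def predicted_log2_fold_change(model, baseline_track, perturbed_track, pseudocount=1.0):
    """Log2 ratio of predicted expression after vs. before an in silico perturbation.

    `model` is a hypothetical callable mapping an input track to predicted
    expression on a linear scale; the pseudocount avoids division by zero.
    """
    before = model(baseline_track)
    after = model(perturbed_track)
    return np.log2((after + pseudocount) / (before + pseudocount))

def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size)
    ranks[order] = np.arange(1, x.size + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

def within_gene_spearman(genes, predicted, observed, min_items=3):
    """Spearman correlation across gRNAs, computed separately for each gene.

    Genes targeted by fewer than `min_items` gRNAs are skipped.
    """
    genes = np.asarray(genes)
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    result = {}
    for gene in np.unique(genes):
        mask = genes == gene
        if mask.sum() >= min_items:
            result[str(gene)] = spearman(predicted[mask], observed[mask])
    return result
```

With one predicted and one observed log2 fold-change per gene, `spearman(predicted, observed)` gives an across-gene ranking score; `within_gene_spearman` instead returns one correlation per gene and is only meaningful when several gRNAs target that gene.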
