RAD Collaboration
Deep Learning-Driven Integration of Unpaired Single-Cell Transcriptomic and Epigenomic Data for Discovering Novel Therapeutic Targets in T Cells
Project Summary
Gene regulatory networks (GRNs) are crucial to understanding how genes are expressed in a cell: in response to intracellular and extracellular signals, gene transcription is dynamically regulated to coordinate cellular activities. GRNs are computational models of gene regulation that take the form of a mathematically defined network, or graph. At their simplest, these models capture the relationships between transcription factors (TFs) and their target genes: the nodes are genes or TFs, and the edges represent the regulatory relationships between them. Historically, GRNs took center stage as experimental techniques and computational algorithms for GRN construction and inference multiplied. With the amount of transcriptomic data now available, GRNs can be inferred to better understand the biological problem at hand, although expression data alone may not capture the underlying regulatory mechanisms directly. By considering epigenetic aspects of gene regulation, such as chromatin conformation (e.g., Hi-C, HiChIP) and TF motif accessibility (e.g., chromVAR), we can generate GRNs that have the potential to better represent gene regulation in vivo. Bulk profiling, which mixes measurements across cell types, poses a further problem: GRNs specific to a single cell type cannot be distinguished. Single-cell technologies offer a solution, allowing GRNs to be inferred across different cell types and states and opening the way to multi-modal profiling. Single-cell RNA sequencing (scRNA-seq) measures gene expression in single cells by detecting and quantifying messenger RNAs, which makes it useful for studying cellular responses. It reveals transcriptional differences between individual cells, including those in rare cell populations, that would otherwise go undetected.
Single-nucleus ATAC sequencing (snATAC-seq), the single-nucleus form of the Assay for Transposase-Accessible Chromatin, uses the Tn5 transposase to measure chromatin accessibility across the genome, revealing regulatory genomic regions in each nucleus. While each modality can independently identify cell types and states, matching RNA and ATAC profiles remains challenging. RNA expression changes often lag behind chromatin accessibility alterations, and identical ATAC profiles can correspond to different RNA profiles due to variations in TF binding. Therefore, joint profiling of gene expression and chromatin accessibility is necessary for resolving bona fide GRNs and revealing new insights into the cells. However, joint-profiling data are significantly scarcer than single-modality data due to high costs and limited capture efficiency. This scarcity necessitates accurate computational methods to integrate scRNA-seq and snATAC-seq for effective GRN analysis. Integration methods aim to align cells profiled by separate technologies and project them into a shared low-dimensional space, enabling the analysis of open chromatin regions (primarily distal elements like enhancers) alongside their corresponding RNA expression levels, a challenge known as “linking”. Various integration methodologies have been developed, which can be categorized into five main approaches: (1) Matrix Factorization & Factor Analysis (e.g., LIGER); (2) Manifold Alignment & Optimal Transport (e.g., SCOT, UnionCom); (3) Deep Learning & Variational Autoencoders (VAEs) (e.g., BABEL, scJoint, GLUE); (4) Gene Activity Scoring & Feature Matching, where tools like Cicero predict gene activity scores from snATAC-seq peaks, and Seurat converts peaks to gene activity scores; and (5) Reference Mapping & Label Transfer, where tools like BindSC, FigR, and Seurat use feature correlation in various forms. Despite advancements, challenges remain in this rapidly evolving field.
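To make the gene activity scoring idea in approach (4) concrete, the following is a minimal sketch, not the Cicero or Seurat implementation. It assumes a simplified rule in which a gene's activity in a cell is the sum of counts from peaks overlapping the gene body plus a fixed window upstream of the transcription start site. The function name `gene_activity`, the 2,000 bp default window, and the toy peaks, genes, and counts are all hypothetical and chosen only for illustration.

```python
def gene_activity(counts, peaks, genes, upstream=2000):
    """Sum peak counts over each gene body plus an upstream promoter window.

    counts: list of per-cell lists, one count per peak (cells x peaks)
    peaks:  list of (chrom, start, end) tuples, same order as the count columns
    genes:  dict mapping gene name -> (chrom, start, end, strand)
    Returns (gene_names, activity) where activity is cells x genes.
    """
    names = sorted(genes)
    activity = [[0] * len(names) for _ in counts]
    for j, name in enumerate(names):
        chrom, start, end, strand = genes[name]
        # Extend the interval upstream of the TSS (strand-aware).
        if strand == "+":
            lo, hi = start - upstream, end
        else:
            lo, hi = start, end + upstream
        for i, (p_chrom, p_start, p_end) in enumerate(peaks):
            # Half-open interval overlap test on the same chromosome.
            if p_chrom == chrom and p_start < hi and p_end > lo:
                for c, row in enumerate(counts):
                    activity[c][j] += row[i]
    return names, activity


# Toy data (hypothetical): three peaks, two genes, two cells.
peaks = [("chr1", 900, 1100), ("chr1", 5000, 5200), ("chr2", 100, 300)]
genes = {
    "GENE_A": ("chr1", 2000, 4000, "+"),
    "GENE_B": ("chr2", 50, 400, "-"),
}
counts = [
    [1, 0, 2],  # cell 0
    [0, 3, 1],  # cell 1
]
names, activity = gene_activity(counts, peaks, genes)
```

In this toy case the second peak lies downstream of GENE_A and contributes to neither gene, which illustrates why distal elements such as enhancers are lost under a purely proximity-based score.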
A key issue is the mismatch between feature spaces in different modalities, leading to potential information loss. As data volume grows, computational methods must be both scalable and accurate to capture the non-linear relationship between chromatin accessibility and gene expression. Additionally, they must be robust to the high dropout rates and technical noise inherent to scRNA-seq and snATAC-seq. Deep learning algorithms are increasingly applied to these problems. They use neural networks loosely modeled on the brain, built from stacked processing layers that learn representations of the data at different levels of abstraction. Deep learning has improved performance in the analysis of bulk multi-omics data, owing to its flexible architectures and its ability to capture latent features from the combined high-dimensional omics feature space. We prioritize scJoint and GLUE for their superior performance in benchmarking studies, particularly in handling unpaired data and addressing key integration challenges. In this proposal, we will evaluate scJoint and GLUE using scRNA-seq and snATAC-seq data from individual T cells in our metastatic melanoma cohort. T cells are pivotal in immunotherapy response, with “stem-like” and “exhausted” states critically influencing outcomes. We hypothesize that epigenetic regulators like TOX (linked to exhaustion) and TCF7 (associated with stemness) will emerge as key GRN hubs. Successful integration must preserve cell-subtype specificity (e.g., distinguishing effector vs. memory regulatory logic) and align with orthogonal datasets (e.g., CITE-seq). By rigorously evaluating scJoint and GLUE on our metastatic melanoma T cell dataset, this work will advance our understanding of how epigenetic regulation governs T cell functional states in tumor immunity.
Successful integration will enable the identification of key TFs and enhancer-promoter interactions driving exhaustion and stem-like phenotypes, offering mechanistic insights into immunotherapy resistance.
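As a rough illustration of what identifying key TFs as GRN hubs could look like once a network has been inferred, the sketch below ranks TFs by the number of distinct target genes they regulate above a weight threshold. This is one simple hub definition (out-degree) assumed for illustration, not the method of scJoint or GLUE. The function name `rank_hubs`, the threshold, the placeholder targets (`gene_1`, ...), the regulator `TF_X`, and all edge weights are hypothetical toy values, not results.

```python
from collections import defaultdict


def rank_hubs(edges, min_weight=0.0):
    """Rank TFs by out-degree in a weighted TF -> target edge list.

    edges: iterable of (tf, target, weight); the sign of the weight may
           encode activation or repression, so the threshold is applied
           to its absolute value.
    Returns a list of (tf, n_targets), largest first, ties broken by name.
    """
    targets = defaultdict(set)
    for tf, target, weight in edges:
        if abs(weight) >= min_weight:
            targets[tf].add(target)
    ranked = ((tf, len(genes)) for tf, genes in targets.items())
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy edge list (hypothetical weights and placeholder target names).
toy_edges = [
    ("TOX", "gene_1", 0.9),
    ("TOX", "gene_2", 0.7),
    ("TOX", "gene_3", 0.6),
    ("TCF7", "gene_3", 0.8),
    ("TCF7", "gene_4", -0.6),
    ("TF_X", "gene_1", 0.1),  # falls below the threshold and is dropped
]
hubs = rank_hubs(toy_edges, min_weight=0.5)
```

Out-degree is only one choice; weighted degree or other centrality measures could be substituted without changing the overall structure.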

Furthermore, by incorporating prior knowledge (e.g., Cicero-predicted peak-gene links, chromatin conformation data, or TF motif accessibility) and exploring time-series alignment or dynamical models (e.g., RNA velocity extensions) to account for the lag between RNA and ATAC, we aim to refine these models to resolve distal regulatory connections and reduce spurious associations. We will prioritize identified targets for in vitro T cell assays and in vivo mouse models with collaborators. The resulting GRNs will not only shed light on T cell plasticity but also nominate combinatorial therapeutic targets to reinvigorate dysfunctional T cells. This approach establishes a framework for leveraging unpaired multi-omics data to decode regulatory biology in complex tissues, with broad applicability beyond cancer immunology.


