When scientists want to understand how genes are regulated beyond just the sequence of RNA, they look to the epitranscriptome—the collection of chemical modifications on RNA molecules that influence their stability, translation, and function. One of the most common and well-studied modifications is N6-methyladenosine (m6A), which plays key roles in brain development, immune responses, and cancer. However, detecting these modifications requires specialized technologies that are expensive, technically demanding, and not widely accessible.
To make matters more complicated, most existing computational tools for predicting RNA modifications rely on matching datasets: they require epitranscriptome data paired with RNA sequencing (RNA-seq) data from the same condition, such as a specific tissue type or disease state. This restricts their usefulness to only those few contexts where such paired data exist.
ExpressRM is a new machine learning framework developed by researchers at Nanjing University. It is designed to predict RNA modifications in entirely new biological conditions, even when no epitranscriptome data from those conditions are available for training. It does this by combining multimodal learning with zero-shot learning, a strategy that lets the model generalize from the conditions it was trained on to conditions it has never seen.
Overview of zero-shot learning in ExpressRM
(A) A zero-shot learning model is trained to recognize classes or conditions that it has never encountered during the training phase [104–108]. There is no overlap between the classes used for training and those used for testing. (B) The same feature format is used across all conditions to enable data integration, combining universal features (genome information) with condition-specific features (condition-dependent transcriptome information). By leveraging similarities or correlations between seen and unseen conditions, the zero-shot learning framework identifies RNA modification sites in an unseen condition using only that condition's matched RNA-seq data, without requiring its epitranscriptome data as training labels.
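To make the setup in panel (B) more concrete, here is a minimal Python sketch of the data layout. It is an illustration under stated assumptions, not ExpressRM's actual code (which is written with PyTorch). It assumes each sample is a (condition, sequence window, expression features, label) tuple; the names one_hot, build_features and split_by_condition, the 5-nucleotide window and the two made-up expression values per condition are all hypothetical. The sketch shows the two ideas from the caption: every condition shares one feature format (universal sequence features plus condition-specific RNA-seq features), and training and test conditions never overlap.

```python
from typing import Iterable, List, Sequence, Set, Tuple

BASES = "ACGT"

# Hypothetical sample layout:
# (condition, sequence window around candidate site, condition-specific expression features, label)
Sample = Tuple[str, str, List[float], int]


def one_hot(seq: str) -> List[float]:
    """Universal (genome-derived) features: flattened one-hot encoding of a sequence window.

    Bases other than A/C/G/T (e.g. N) are encoded as all zeros.
    """
    vec: List[float] = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def build_features(site_seq: str, expression: Sequence[float]) -> List[float]:
    """Concatenate universal sequence features with condition-specific expression features.

    Every condition uses the same layout, so a model fitted on seen conditions
    can be applied unchanged to an unseen one.
    """
    return one_hot(site_seq) + list(expression)


def split_by_condition(
    samples: Iterable[Sample], test_conditions: Set[str]
) -> Tuple[List[Sample], List[Sample]]:
    """Zero-shot split: no condition appears in both the training and test sets.

    Labels of test samples would only be used to evaluate predictions, never for training.
    """
    train: List[Sample] = []
    test: List[Sample] = []
    for sample in samples:
        (test if sample[0] in test_conditions else train).append(sample)
    return train, test


# Toy usage with made-up expression values for three hypothetical conditions.
expression_by_condition = {"liver": [5.2, 0.1], "brain": [1.3, 7.8], "heart": [2.0, 2.0]}
samples: List[Sample] = [(cond, "GGACT", expr, 1) for cond, expr in expression_by_condition.items()]
train, test = split_by_condition(samples, {"brain"})
features = [build_features(seq, expr) for _, seq, expr, _ in train]
print(len(train), len(test), len(features[0]))  # 2 1 22
```

Keeping the feature layout identical across conditions is what allows a model trained on "liver" and "heart" to be applied to "brain" using only brain RNA-seq-derived features.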
This innovation means that researchers can now explore RNA modifications using only RNA-seq data, which is much easier and cheaper to generate. RNA-seq is already a standard tool in biology and medicine, routinely used in cancer studies, stem cell research, and disease diagnostics. By enabling RNA modification predictions from these commonly available datasets, ExpressRM dramatically expands the range of biological questions that can be addressed.
ExpressRM was tested on a benchmark dataset containing matched RNA-seq and epitranscriptome data from 37 different human tissues. Despite never seeing epitranscriptome data from some tissues during training, the model accurately predicted modification patterns in these "unseen" tissues. Its performance matched or even exceeded that of conventional methods that rely on paired training data.
The tool can also distinguish between "housekeeping" RNA methylation sites, which stay constant across many conditions, and dynamic sites that may appear or disappear depending on the cell's state. In a case study on glioblastoma, a deadly brain cancer, ExpressRM uncovered m6A methylation sites that included both previously validated sites and new, potentially important disease-related changes.
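The article does not say how ExpressRM separates housekeeping from dynamic sites. Purely as an illustration of the distinction, the Python sketch below assumes you already have a predicted modification probability for each site in each condition and labels a site by how many conditions call it modified. The function name classify_sites, the 0.5 probability threshold, the 0.9 "housekeeping" fraction and the toy probabilities are all hypothetical choices, not parameters from ExpressRM.

```python
from typing import Dict, Mapping


def classify_sites(
    probs: Mapping[str, Mapping[str, float]],
    threshold: float = 0.5,          # hypothetical cutoff for calling a site modified
    housekeeping_frac: float = 0.9,  # hypothetical fraction of conditions for "housekeeping"
) -> Dict[str, str]:
    """Label each site from its per-condition predicted modification probabilities.

    probs[site][condition] is the predicted probability that the site is modified
    in that condition. A site called modified (p >= threshold) in at least
    housekeeping_frac of conditions is "housekeeping"; one called modified in
    some but fewer conditions is "dynamic"; otherwise it is "unmodified".
    """
    labels: Dict[str, str] = {}
    for site, by_condition in probs.items():
        calls = [p >= threshold for p in by_condition.values()]
        if not calls:
            continue  # no predictions for this site; leave it unlabeled
        frac_modified = sum(calls) / len(calls)
        if frac_modified >= housekeeping_frac:
            labels[site] = "housekeeping"
        elif frac_modified > 0:
            labels[site] = "dynamic"
        else:
            labels[site] = "unmodified"
    return labels


# Toy predicted probabilities (made up) for three sites across three conditions.
predictions = {
    "site_1": {"liver": 0.92, "brain": 0.95, "heart": 0.88},
    "site_2": {"liver": 0.90, "brain": 0.10, "heart": 0.20},
    "site_3": {"liver": 0.05, "brain": 0.10, "heart": 0.02},
}
print(classify_sites(predictions))
# {'site_1': 'housekeeping', 'site_2': 'dynamic', 'site_3': 'unmodified'}
```

A simple fraction-of-conditions rule is only one way to frame the idea; a real analysis would likely use statistical tests and many more conditions.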
For researchers studying cancer, development, or any complex biological system, this method opens up new possibilities. It offers a way to study RNA regulation at the modification level without expensive and time-consuming assays. In contexts such as rare diseases or poorly characterized tissues, where large sample sets are hard to collect, ExpressRM can still provide meaningful insights.
It also paves the way for more personalized and scalable studies, since it enables researchers to explore the epitranscriptomic landscape of patient samples, experimental conditions, or developmental stages that have never before been analyzed.
ExpressRM is a powerful step forward in decoding RNA biology. By removing the need for specialized datasets, it puts the ability to explore RNA modifications within reach of many more researchers. Whether you’re investigating the roots of cancer, studying how brain cells develop, or exploring the mysteries of RNA in your favorite model organism, ExpressRM makes it possible to ask new questions—and get meaningful answers—with only RNA sequencing data in hand.
Availability – The ExpressRM framework was implemented with PyTorch 2.0.1, and the code is available on GitHub: https://github.com/yiyousong/ExpressRM.