Which tool tells the truth? A...

July 18, 2025

Tumors are often genetic mosaics, harboring diverse cell populations shaped by mutations like copy number variations (CNVs). These genomic alterations drive cancer evolution, but detecting them at single-cell resolution remains challenging. In this study, scientists systematically evaluated five widely used tools that infer CNVs from single-cell RNA sequencing (scRNA-seq) data. The analysis uncovered dramatic differences in performance among methods, depending on data type and experimental design. Notably, CaSpER and CopyKAT emerged as top performers for CNV inference, while inferCNV and CopyKAT excelled in identifying tumor subpopulations. This comprehensive comparison provides a valuable roadmap for researchers aiming to decode cancer heterogeneity more precisely.

Cancer is not a single disease but a dynamic ecosystem of genetically distinct cells, each evolving under selective pressures. One of the key genomic mechanisms fueling this diversity is copy number variation (CNV), where large DNA segments are duplicated or deleted, altering gene dosage and function. Although bulk sequencing can detect CNVs, it masks important single-cell differences. With the rise of scRNA-seq, researchers now have an opportunity to infer CNVs indirectly through transcriptomic signals. Yet, this task is far from straightforward. Inference methods vary widely in sensitivity, specificity, and robustness, and the lack of systematic evaluation has left researchers guessing which tool to trust. Due to these limitations, a rigorous benchmarking study was needed to guide accurate CNV detection using scRNA-seq.
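To make the idea of inferring CNVs "indirectly through transcriptomic signals" concrete, here is a minimal sketch of the general principle that expression-based approaches rely on: compare a cell's expression with a normal reference, then smooth along the genome so that only changes spanning many neighboring genes stand out. It is a toy illustration, not the algorithm of any of the five benchmarked tools. The function name, the input format, and the window size are all assumptions made for this example.

```python
from statistics import mean

def smoothed_relative_expression(tumor_cell, reference_cells, window=3):
    """Toy CNV signal for a single cell (hypothetical helper).

    tumor_cell: log-scale expression values, one per gene, with genes
        already sorted by chromosomal position (assumed input format).
    reference_cells: list of normal cells in the same gene order.
    window: number of neighboring genes averaged together (odd number).
    """
    n_genes = len(tumor_cell)
    # Baseline: mean expression of each gene across the normal reference cells.
    baseline = [mean(cell[g] for cell in reference_cells) for g in range(n_genes)]
    # Relative expression: positive values hint at a gain, negative at a loss.
    relative = [tumor_cell[g] - baseline[g] for g in range(n_genes)]
    # A moving average along the genome damps gene-level noise, so only
    # shifts shared by adjacent genes survive.
    half = window // 2
    smoothed = []
    for g in range(n_genes):
        lo, hi = max(0, g - half), min(n_genes, g + half + 1)
        smoothed.append(mean(relative[lo:hi]))
    return smoothed

# Made-up numbers: the last three genes are over-expressed in the tumor cell.
reference = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
             [1.2, 0.8, 1.0, 1.0, 1.2, 0.8]]
tumor = [1.1, 0.9, 1.0, 2.0, 2.1, 1.9]
signal = smoothed_relative_expression(tumor, reference)
print([round(x, 2) for x in signal])
```

In this invented example the smoothed signal stays near zero over the first genes and rises toward one over the last ones, which is the kind of regional shift a real tool would then segment and call as a gain. Real methods add normalization, noise modeling, and, in some cases, allele information on top of this basic idea.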

A multidisciplinary team led by researchers from Loma Linda University, the NCBI, and international partners addressed this knowledge gap with a large-scale benchmarking effort. Their findings (DOI: 10.1093/pcmedi/pbaf011) were published on June 4, 2025, in Precision Clinical Medicine. The team rigorously tested five popular CNV inference methods (HoneyBADGER, inferCNV, sciCNV, CaSpER, and CopyKAT) across diverse scRNA-seq platforms and tumor models, including a newly generated clinical dataset from a small cell lung cancer patient. Their goal was to identify which tools deliver the most accurate and reliable CNV insights in real-world research settings.


Study design of the scCNV benchmark analysis. The top panel (A) illustrates the evaluation scheme for the sensitivity and specificity of scCNV detection, using scRNA-seq datasets (10x, C1-HT, C1, and ICELL8 full-length) of a breast cancer cell line versus the paired B-cell line derived from the same donor; these datasets were generated in the authors' previous multicenter benchmarking study. The middle panel (B) illustrates the evaluation scheme for the accuracy of subclone identification, using mixed scRNA-seq data from the Tian et al. study derived from mixtures of either three or five human lung adenocarcinoma cell lines. Drop-seq_3cl, Drop-seq scRNA-seq data from the mixture of three human lung adenocarcinoma cell lines; CEL-seq2_3cl, CEL-seq2 scRNA-seq data from the mixture of three human lung adenocarcinoma cell lines; 10x_3cl, 10x scRNA-seq data from the mixture of three human lung adenocarcinoma cell lines; and 10x_5cl, 10x scRNA-seq data from the mixture of five human lung adenocarcinoma cell lines. The lower panel (C) illustrates the application of the scCNV methods to a human small cell lung cancer (SCLC) scRNA-seq dataset (20 million reads per cell, full-length transcripts, SMART-seq2) comprising 92 primary SCLC single cells and 39 relapsed SCLC single cells, plus scWES and bulk WGS from primary and relapsed SCLC tumor tissues as well as peri-tumoral normal tissues.


The study applied each method to datasets from several distinct scRNA-seq platforms, paired tumor-normal cell lines, artificial mixtures of lung cancer cells, and clinical samples. Performance was measured using criteria such as sensitivity, specificity, and accuracy in identifying tumor subpopulations. CaSpER and CopyKAT consistently delivered the most balanced CNV inference results, though their effectiveness varied with sequencing depth and platform type. In contrast, inferCNV and sciCNV excelled in distinguishing tumor subclones when analyzing data from a single platform.
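For readers less familiar with these metrics, the sketch below shows the standard definitions of sensitivity and specificity applied to CNV calls. It assumes, purely for illustration, that each genomic segment has a yes/no ground-truth label and a yes/no prediction; the article does not describe the study's exact scoring scheme, and the function name and example data are hypothetical.

```python
def sensitivity_specificity(predicted, truth):
    """Standard sensitivity and specificity for binary CNV calls.

    predicted, truth: equal-length lists of booleans, one per genomic
    segment, where True means "copy number altered" (assumed encoding).
    """
    pairs = list(zip(predicted, truth))
    tp = sum(p and t for p, t in pairs)              # altered, called altered
    tn = sum((not p) and (not t) for p, t in pairs)  # normal, called normal
    fp = sum(p and (not t) for p, t in pairs)        # normal, called altered
    fn = sum((not p) and t for p, t in pairs)        # altered, missed
    # Guard against division by zero when one class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Made-up example: 3 truly altered segments and 5 truly normal ones.
truth     = [True, True, True,  False, False, False, False, False]
predicted = [True, True, False, False, False, False, True,  False]
sens, spec = sensitivity_specificity(predicted, truth)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A method that calls almost everything altered would score high on sensitivity but low on specificity, which is why the benchmark's emphasis on "balanced" results matters.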

The team also tested each method’s ability to detect rare tumor populations. The inferCNV method showed strong sensitivity, especially when enough cells were sequenced, while sciCNV and HoneyBADGER fell short in this regard. When datasets were combined across platforms, batch effects severely impaired most methods unless they were corrected with tools such as ComBat. The allele-based version of HoneyBADGER, although less sensitive overall, proved more resilient to such batch-related distortions. Lastly, validation using clinical small cell lung cancer samples confirmed that CaSpER and CopyKAT yielded the most accurate CNV calls, while inferCNV and CopyKAT best identified relapsed subclones, underscoring that each method has its own strengths.

“Understanding cancer at the single-cell level is essential for tackling tumor evolution and therapy resistance,” said Prof. Charles Wang, co-corresponding author of the study. “Our benchmarking work provides the field with a clear reference point, highlighting not only which tools work best, but also under what conditions. It’s a step toward more reliable and personalized cancer genomics.”

This study offers a practical guide for selecting CNV inference tools tailored to specific scRNA-seq platforms and research goals. As single-cell genomics transitions from the bench to the clinic, the ability to accurately profile genetic alterations like CNVs is becoming critical for diagnostics, biomarker discovery, and tracking treatment response. These findings may also inform the next generation of algorithm development, aimed at overcoming current limitations such as platform bias and batch variability. Ultimately, better tools mean sharper insights into cancer’s genetic landscape, paving the way for more targeted and effective therapies.

Source – Newswise
