Effectively Preserving Biological Variations in Multi-Batch and Multi-Condition Single-Cell Data Integration

DOI:

https://doi.org/10.4208/csiam-ls.SO-2025-0025

Keywords:

Biological variations, data integration, batch correction, deep learning

Abstract

Understanding phenotypic differences at the cellular level is critical for comprehending the pathogenesis of complex diseases. However, biological variations are often obscured by batch effects, which makes integrating multi-batch and multi-condition single-cell datasets challenging. Here, we present scFLASH, a deep learning-based model specifically designed to explore single-cell biological variations while correcting undesired batch effects. scFLASH employs a conditional variational autoencoder with adversarial training to separate biological variations from technical noise, and introduces a penalized condition classifier to preserve condition-specific biological signals. In comprehensive benchmarking evaluations, scFLASH achieves superior integration performance compared with other state-of-the-art methods. By applying scFLASH to Alzheimer’s disease, COVID-19, and diabetes datasets, we demonstrate that it is applicable to a variety of scenarios, effectively integrating datasets with two or more conditions and different batch sources. scFLASH can enhance gene expression profiles and identify condition-related cell subpopulations, facilitating downstream analyses and offering biological insights into the cellular mechanisms of disease pathology.
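The abstract does not state scFLASH's training objective. For orientation only, here is a minimal sketch of how a conditional VAE objective with an adversarial batch discriminator and an auxiliary condition classifier is commonly assembled for a single cell. Everything in it is an assumption rather than the authors' formulation: the function names are hypothetical, the mean-squared-error reconstruction term stands in for the count likelihoods (such as the negative binomial) that single-cell models more often use, the weights `beta`, `lam_adv` and `lam_cond` are hypothetical, and the paper's "penalized" form of the condition classifier is not reproduced.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, using log-sum-exp for numerical stability."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_z - logits[target]


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared reconstruction error (a simplifying stand-in for a count likelihood)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def encoder_decoder_loss(
    x, x_hat, mu, logvar,
    batch_logits, batch_id,   # output of a hypothetical batch discriminator on the latent code
    cond_logits, cond_id,     # output of a hypothetical condition classifier on the latent code
    beta: float = 1.0, lam_adv: float = 1.0, lam_cond: float = 1.0,
) -> float:
    """Hypothetical per-cell loss minimized by the encoder/decoder (not scFLASH's objective)."""
    recon = mse(x, x_hat)
    kl = gaussian_kl(mu, logvar)
    # Adversarial term: negating the discriminator's loss rewards latent codes
    # from which the batch label cannot be predicted.
    adv = -cross_entropy(batch_logits, batch_id)
    # Condition term: rewards latent codes from which the condition can be predicted.
    cond = cross_entropy(cond_logits, cond_id)
    return recon + beta * kl + lam_adv * adv + lam_cond * cond
```

In this kind of setup the discriminator is typically updated in alternation to minimize its own cross-entropy, while the encoder minimizes the combined loss above. The net effect is latent codes that retain condition information and shed batch information, which is the separation the abstract describes.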

Authors and Affiliations

  • Qingbin Zhou

    School of Mathematics, Shandong University, Jinan 250100, China

  • Tao Ren

    State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

    School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China

  • Fan Yuan

    School of Mathematics and Information Science, Yantai University, Yantai 264005, China

  • Jiating Yu

    School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China

  • Jiacheng Leng

    Zhejiang Lab, Hangzhou 311121, China

  • Jiahao Song

    School of Mathematics, Shandong University, Jinan 250100, China

  • Duanchen Sun

    School of Mathematics, Shandong University, Jinan 250100, China

    Shandong Key Laboratory of Cancer Digital Medicine, Jinan 250033, China

  • Ling-Yun Wu

    State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

    School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China

Published

2025-12-01

Section

Research Articles

How to Cite

Effectively Preserving Biological Variations in Multi-Batch and Multi-Condition Single-Cell Data Integration. (2025). CSIAM Transactions on Life Sciences. https://doi.org/10.4208/csiam-ls.SO-2025-0025