Monday, March 30

Empowering AI data scientists using a multi-agent LLM framework with self-evolving capabilities for autonomous, tool-aware biomedical data analyses


  • Agrawal, R. & Prabakaran, S. Big data in digital healthcare: lessons learnt and recommendations for general practice. Heredity 124, 525–534 (2020).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Shilo, S., Rossman, H. & Segal, E. Axes of a revolution challenges and promises of big data in healthcare. Nat. Med. 26, 29–38 (2020).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Woldemariam, M. T. & Jimma, W. Adoption of electronic health record systems to enhance the quality of healthcare in low-income countries: a systematic review. BMJ Health Care Inform. 30, e100704 (2023).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Liang, H. et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat. Med. 25, 433–438 (2019).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Feinberg, D. A. et al. Next-generation MRI scanner designed for ultra-high-resolution human brain imaging at 7 Tesla. Nat. Methods 20, 2048–2057 (2023).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Schuijf, J. D. et al. CT imaging with ultra-high-resolution: opportunities for cardiovascular imaging in clinical practice. J. Cardiovasc. Comput. Tomogr. 16, 388–396 (2022).

    Article 
    PubMed 

    Google Scholar
     

  • Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Metzker, M. L. Sequencing technologies—the next generation. Nat. Rev. Genet. 11, 31–46 (2010).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Karczewski, K. J. & Snyder, M. P. Integrative omics for health and disease. Nat. Rev. Genet. 19, 299–310 (2018).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Van de Sande, B. et al. Applications of single-cell RNA sequencing in drug discovery and development. Nat. Rev. Drug Discov. 22, 496–520 (2023).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wratten, L., Wilm, A. & Göke, J. Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nat. Methods 18, 1161–1168 (2021.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Cao, Y. et al. Ensemble deep learning in bioinformatics. Nat. Mach. Intell. 2, 500–508 (2020).

    Article 

    Google Scholar
     

  • Dubay, C. et al. Delivering bioinformatics training: bridging the gaps between computer science and biomedicine. Proc. AMIA Symp. 2002, 220–224 (2002).


    Google Scholar
     

  • Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Li, J. et al. Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin. Chem. 48, 1296–1304 (2002).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Sadybekov, A. V. & Katritch, V. Computational approaches streamlining drug discovery. Nature 616, 673–685 (2023).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Fitzgerald, R. C. et al. The future of early cancer detection. Nat. Med. 28, 666–677 (2022).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • McDonald, T. O. et al. Computational approaches to modelling and optimizing cancer treatment. Nat. Rev. Bioeng. 1, 695–711 (2023).

    Article 
    CAS 

    Google Scholar
     

  • Misra, B. B. et al. Integrated omics: tools, advances, and future approaches. J. Mol. Endocrinol. 62, R21–R45 (2019).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Brooks, T. G. et al. Challenges and best practices in omics benchmarking. Nat. Rev. Genet. 25, 326–339 (2024).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).

    Article 
    PubMed 

    Google Scholar
     

  • Goecks, J. et al. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, 1–13 (2010).

    Article 

    Google Scholar
     

  • Malhotra, R. et al. Using the seven bridges Cancer Genomics Cloud to access and analyze petabytes of cancer data. Curr. Protoc. Bioinformatics 60, 11.16.1–11.16.32 (2017).

    PubMed 
    PubMed Central 

    Google Scholar
     

  • Kaur, S. & Kaur, S. Genomics with cloud computing. Int. J. Sci. Technol. Res. 4, 146–148 (2015).


    Google Scholar
     

  • Wu, T. et al. A brief overview of ChatGPT: the history, status quo and potential future development. IEEE/CAA J. Autom. Sin. 10, 1122–1136 (2023).

    Article 

    Google Scholar
     

  • Brown, T. B. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).


    Google Scholar
     

  • Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Tang, X. et al. MedAgents: large language models as collaborators for zero-shot medical reasoning. Find. Assoc. Comput. Linguist: ACL 2024, 599–621 (2024).


    Google Scholar
     

  • Guo, T. et al. Large language model based multi-agents: a survey of progress and challenges. Proc. Int. Joint Conf. Artif. Intell. 33, 8048–8057 (2024).


    Google Scholar
     

  • Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Gao, S. et al. Empowering biomedical discovery with AI agents. Cell 187, 6125–6151 (2024).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Boiko, D. A. et al. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Dai, T. et al. Autonomous mobile robots for exploratory synthetic chemistry. Nature 635, 890–897 (2024).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Hou, W. & Ji, Z. Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis. Nat. Methods 21, 1462–1465 (2024).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Lobentanzer, S. et al. A platform for the biomedical application of large language models. Nat. Biotechnol. 43, 166–169 (2025).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Tayebi Arasteh, S. et al. Large language models streamline automated machine learning for clinical studies. Nat. Commun. 15, 1603 (2024).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Zhou, J. et al. An AI agent for fully automated multi-omic analyses. Adv. Sci. 11, e2407094 (2024).

    Article 

    Google Scholar
     

  • Xiao, Y. et al. CellAgent: an LLM-driven multi-agent framework for automated single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2024.05.13.593861 (2024).

  • Liu, H. & Wang, H. GenoTEX: a benchmark for evaluating LLM-based exploration of gene expression data in alignment with bioinformaticians. Preprint at arXiv https://doi.org/10.48550/arXiv.2406.15341 (2024).

  • Mitchener, L. et al. Bixbench: a comprehensive benchmark for LLM-based agents in computational biology. Preprint at arXiv https://doi.org/10.48550/arXiv.2503.00096 (2025).

  • Gómez-López, G. et al. Precision medicine needs pioneering clinical bioinformaticians. Brief. Bioinform. 20, 752–766 (2019).

    Article 
    PubMed 

    Google Scholar
     

  • Hou, X. et al. Large language models for software engineering: A systematic literature review. ACM Trans. Softw. Eng. Methodol. (2023).

  • Xin, Q. et al. BioInformatics Agent (BIA): unleashing the power of large language models to reshape bioinformatics workflow. Preprint at bioRxiv https://doi.org/10.1101/2024.05.22.595240 (2024).

  • Su, H., Long, W. & Zhang, Y. BioMaster: multi-agent system for automated bioinformatics analysis workflow. Preprint at bioRxiv https://doi.org/10.1101/2025.01.23.634608 (2025).

  • Afzal, M. et al. Precision medicine informatics: principles, prospects, and challenges. IEEE Access 8, 13593–13612 (2020).

    Article 

    Google Scholar
     

  • Zhang, W. & Mei, H. A constructive model for collective intelligence. Natl Sci. Rev. 7, 1273–1277 (2020).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Qian, C. et al. Iterative experience refinement of software-developing agents. Preprint at arXiv https://doi.org/10.48550/arXiv.2405.04219 (2024).

  • Riffle, D. et al. OLAF: an open life science analysis framework for conversational bioinformatics powered by large language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2504.03976 (2025).

  • Xie, E. et al. CASSIA: a multi-agent large language model for automated and interpretable cell annotation. Nat. Commun. 17, 389 (2025).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Tsyganov, M. M. et al. Influence of DNA copy number aberrations in ABC transporter family genes on the survival of patients with primary operatable non-small cell lung cancer. Curr. Cancer Drug Targets (2025).

  • Sun, Y. et al. SERINC2-mediated serine metabolism promotes cervical cancer progression and drives T cell exhaustion. Int. J. Biol. Sci. 21, 1361–1377 (2025).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wang, X., Jiang, C. & Li, Q. Serinc2 drives the progression of cervical cancer through regulating Myc pathway. Cancer Med. 13, e70296 (2024).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Lee, J. S. et al. SEZ6L2 is an important regulator of drug-resistant cells and tumor spheroid cells in lung adenocarcinoma. Biomedicines 8 (2020).

  • Ishikawa, N. et al. Characterization of SEZ6L2 cell-surface protein as a novel prognostic marker for lung cancer. Cancer Sci. 97, 737–745 (2006).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Jee, J. et al. DNA liquid biopsy-based prediction of cancer-associated venous thromboembolism. Nat. Med. 30, 2499–2507 (2024).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Xu, Z. et al. MiHATP: a multi-hybrid attention super-resolution network for pathological image based on transformation pool contrastive learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2024).

  • Yang, K. et al. If LLM is the wizard, then code is the wand: a survey on how code empowers large language models to serve as intelligent agents. Preprint at arXiv https://doi.org/10.48550/arXiv.2401.00812 (2024).

  • Qian, C. et al. Investigate-consolidate-exploit: A general strategy for inter-task agent self-evolution. Preprint at arXiv https://doi.org/10.48550/arXiv.2401.13996 (2024).

  • Gao, C. et al. Large language models empowered agent-based modeling and simulation: a survey and perspectives. Humanit. Soc. Sci. Commun. 11, 1–24 (2024).

    Article 
    CAS 

    Google Scholar
     

  • Zhong, W. et al. Memorybank: enhancing large language models with long-term memory. Proc. AAAI Conf. Artif. Intell. 38, 19724–19731 (2024).


    Google Scholar
     

  • Liu, L. et al. Think-in-memory: Recalling and post-thinking enable llms with long-term memory. Preprint at arXiv https://doi.org/10.48550/arXiv.2311.08719 (2023).

  • Li, Z. et al. VarBen: generating in silico reference data sets for clinical next-generation sequencing bioinformatics pipeline evaluation. J. Mol. Diagn. 23, 285–299 (2021).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Yang, Y. et al. Comprehensive landscape of resistance mechanisms for neoadjuvant therapy in esophageal squamous cell carcinoma by single-cell transcriptomics. Signal Transduct. Target. Ther. 8, 298 (2023).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 1–5 (2018).

    Article 

    Google Scholar
     

  • Bu, D. et al. KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Res. 49, W317–W325 (2021).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Jiang, X. et al. Long term memory: the foundation of AI self-evolution. Preprint at arXiv https://doi.org/10.48550/arXiv.2410.15665 (2024).

  • Sun, J. Biomedical dataset files collection. Zenodo https://doi.org/10.5281/zenodo.17430550 (2025).



  • Source link

    Leave a Reply

    Your email address will not be published. Required fields are marked *