Thesis Information

Thesis title (Chinese):

 空间分辨代谢组学分析中化学式标注及超分辨率成像方法的研究    

Name:

 Cao Yinghao (曹英豪)

Thesis language:

 chi    

Degree:

 Doctoral

Degree type:

 Academic degree

Institution:

 Peking Union Medical College

Department:

 Institute of Basic Medical Sciences, Peking Union Medical College

Major:

 Biology - Biochemistry and Molecular Biology

Supervisor:

 Wang Lin (王琳)

Thesis completion date:

 2025-03-05    

Thesis title (foreign language):

 The Research of Formula Assignment and Super-Resolution Imaging Methods in Spatially-Resolved Metabolomics    

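Keywords (translated from Chinese):

 Spatially resolved metabolomics; chemical formula annotation; super-resolution; deep learning; bioinformatics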

Keywords (foreign language):

 Spatial Resolved Metabolomics; Formula Assignment; Super resolution; Deep learning; Bioinformatics

Abstract (translated from Chinese):

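    Metabolomics, a core branch of systems biology, aims to comprehensively characterize the types and concentrations of metabolites in living organisms and how they change over time. Spatially resolved metabolomics (SRM) uses in situ detection techniques such as mass spectrometry imaging (MSI) to reveal the spatial localization of metabolites at the tissue, cellular, and even subcellular level, and has become one of the key technologies for deciphering the complexity of life. Although a growing number of studies use MSI to investigate the spatial distribution of metabolites, several key steps of the data analysis remain immature. The main difficulties are inaccurate assignment of chemical formulas to measured m/z values, insufficient spatial resolution, and incomplete high-throughput analysis tools, all of which limit the development and application of the technology.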

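    To address these three difficulties in MSI data analysis, this study pursued three lines of work. It developed SMART, an efficient and accurate tool for metabolite chemical formula annotation. It systematically examined several deep learning-based super-resolution models and used transfer learning on a very small number of MSI images to establish a super-resolution algorithm suited to SRM. Building on these, it developed MS-Loop, an integrated analysis platform for SRM data.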

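    Chemical formula annotation is the first problem to solve in metabolomics data analysis. In conventional metabolomics, formula annotation relies on accurately measured isotope information and on the acquisition of tandem mass spectrometry (MS/MS) data. Because SRM currently cannot acquire MS/MS data at high throughput and large scale, and isotope information is partly missing, SMART builds a chemical formula evidence database and annotates formulas accurately from the measured m/z value alone. SMART integrates existing metabolite and chemical formula databases, including HMDB, ChEMBL, and PubChem, for a total of more than 2.8 million chemical formulas. On this basis, the differences between these formulas (formula shifts) were analyzed systematically, yielding 8,814 high-frequency potential formula differences (ChemEdges). Guided by the biological reactions in the KEGG reaction database, 1,787 reaction-derived formula differences (BioEdges) were collected from KEGG. Linking the 2.8 million formulas through ChemEdges and BioEdges produced the SMART chemical formula evidence database. SMART then defines four feature combinations from the database to build a multiple linear regression model that performs formula annotation. Evaluated on several reference datasets, SMART reached an annotation accuracy of 92.4%, outperforming traditional formula annotation tools such as those developed for LC-MS. Applied to spatially resolved metabolomics data from mouse kidney and embryo tissue, SMART annotated 2,194 and 986 chemical formulas, respectively. These formulas were validated by LC-MS and then used to analyze the heterogeneity of metabolite spatial distribution within the tissues. SMART can be downloaded from https://github.com/bioinfo-ibms-pumc/SMART.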
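    As a rough illustration of what annotating from the m/z value alone involves, the sketch below lists the formulas in a small candidate set whose theoretical ion mass falls within a mass-error window of a measured m/z. It is not SMART's implementation: the ppm window, the [M+H]+ adduct, the function names, and the three-formula candidate list are all assumptions made for this example.

```python
import re

# Monoisotopic masses (Da) of a few common elements.
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON = 1.007276466812  # proton mass, used for an assumed [M+H]+ adduct


def parse_formula(formula):
    """Turn 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[elem] = counts.get(elem, 0) + (int(num) if num else 1)
    return counts


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a formula built from elements in MONO."""
    return sum(MONO[e] * n for e, n in parse_formula(formula).items())


def candidates(mz, formulas, ppm=5.0, adduct_mass=PROTON):
    """Return (formula, ppm_error) pairs within the window, best match first.

    The 5 ppm default is an illustrative choice, not a value from the thesis.
    """
    hits = []
    for f in formulas:
        theo = monoisotopic_mass(f) + adduct_mass
        err = (mz - theo) / theo * 1e6
        if abs(err) <= ppm:
            hits.append((f, err))
    return sorted(hits, key=lambda h: abs(h[1]))


if __name__ == "__main__":
    # Hypothetical candidate list; a real database would be far larger.
    print(candidates(181.0707, ["C6H12O6", "C5H10O5", "C9H8O4"]))
```

    With a real database, many formulas typically fall inside the same window, which is why additional evidence such as the ChemEdges and BioEdges links is needed to rank them.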

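    In spatially resolved metabolomics, the spatial distribution of metabolites is essential for revealing biological mechanisms and disease biomarkers. Higher spatial resolution makes finer spatial variation in metabolites observable, and super-resolution techniques raise image resolution algorithmically, in software. This study evaluated several existing deep learning-based super-resolution models and selected ResShift, a diffusion-model super-resolution framework, as the most promising. Its pre-trained model was fine-tuned by transfer learning on 10 existing MSI images of sagittal mouse brain sections. The fine-tuned model improved performance by 41.5% relative to the pre-trained model and by 14.0% relative to MOSR (an SRM super-resolution algorithm built on the enhanced generative adversarial network framework ESRGAN). Generalization was then assessed on MSI data from horizontal mouse brain sections and mouse kidney tissue. Compared with the pre-trained model and MOSR, the fine-tuned model showed better reconstruction performance and good generalization. The work demonstrates the effectiveness and flexibility of the transfer learning strategy and provides theoretical grounding and technical support for SRM analysis. By introducing a composite weighted metric, the evaluation takes multiple objectives into account, which further improves the credibility and comprehensiveness of the results.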

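    Finally, building on these tools, the integrated analysis platform MS-Loop was developed. The platform automates SRM data analysis, covering raw data format conversion, data preprocessing, extraction of potential metabolites, and chemical formula annotation, and it also provides differential analysis and plotting.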

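    In summary, this study offers solutions to several technical difficulties in SRM data analysis. SMART lays the foundation for accurate identification of spatially resolved metabolites. The deep learning-based diffusion model, adapted by transfer learning, effectively raises spatial resolution and so supports the exploration of spatial expression patterns of metabolites. The integrated MS-Loop platform provides professional and convenient metabolomics analysis tools and workflows. Together, these results will further advance spatially resolved metabolomics.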

Abstract (foreign language):

    Metabolomics, a core branch of systems biology, aims to comprehensively characterize the diversity, concentrations, and dynamic regulation of metabolites within biological systems. Spatially resolved metabolomics (SRM) employs in situ techniques such as mass spectrometry imaging (MSI) to map metabolite distributions at the tissue, cellular, and subcellular levels, and has become one of the key technologies for deciphering biological complexity. Although a growing number of studies use MSI to examine the spatial distribution of metabolites, several critical aspects of the data analysis remain underdeveloped. The main challenges are inaccurate assignment of chemical formulas to measured m/z values, insufficient spatial resolution, and incomplete high-throughput analysis tools, which together hinder the development and practical application of the technology.

    To address these three challenges in MSI data analysis, this study pursued three lines of work. It developed SMART, an efficient and accurate tool for metabolite chemical formula annotation. It systematically investigated several deep learning-based super-resolution models and applied transfer learning with a very small number of MSI images to establish a super-resolution algorithm tailored to SRM. Building on this foundation, it developed MS-Loop, an integrated platform for SRM data analysis.

    Chemical formula annotation is the foremost challenge in metabolomics data analysis. In conventional metabolomics, accurate annotation depends on precisely measured isotopic information and on the acquisition of tandem mass spectrometry (MS/MS) data. Because SRM currently cannot acquire MS/MS data at high throughput and large scale, and isotopic information is partly missing, SMART builds a chemical formula evidence database that allows accurate annotation with the measured m/z value as the only input. SMART integrates existing metabolite and chemical formula databases such as HMDB, ChEMBL, and PubChem, encompassing over 2.8 million chemical formulas. Building on this, the differences between these formulas (formula shifts) were analyzed systematically, identifying 8,814 high-frequency potential formula shifts (ChemEdges). Guided by biochemical reactions from the KEGG reaction database, 1,787 reaction-based formula shifts (BioEdges) were also collected from KEGG. Using ChemEdges and BioEdges as connections, the 2.8 million chemical formulas were interlinked to construct the SMART chemical formula evidence database. SMART then defines four feature combinations from the database to build a multiple linear regression model for chemical formula annotation. Evaluation on several reference datasets shows that SMART achieves an annotation accuracy of 92.4%, outperforming traditional formula annotation tools such as those developed for LC-MS. We applied SMART to spatially resolved metabolomics data from mouse kidney and embryonic tissues, annotating 2,194 and 986 chemical formulas, respectively. These annotations were validated by LC-MS and then used to examine the spatial heterogeneity of metabolite distribution within the tissues. SMART is available at https://github.com/bioinfo-ibms-pumc/SMART.
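    To make the notion of a formula shift concrete, here is a minimal sketch that computes the element-wise difference between two formulas and counts how often each difference recurs across a set. The function names, the orientation rule, and the `min_count` threshold are hypothetical choices for this illustration. The abstract does not describe how SMART itself derives ChemEdges.

```python
import re
from collections import Counter
from itertools import combinations


def parse_formula(formula):
    """Turn 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[elem] = counts.get(elem, 0) + (int(num) if num else 1)
    return counts


def formula_shift(a, b):
    """Element-wise difference between two formulas as a sorted tuple.

    The result is oriented so that the first listed element has a positive
    count, which makes the shift the same for (a, b) and (b, a).
    """
    ca, cb = parse_formula(a), parse_formula(b)
    diff = {e: cb.get(e, 0) - ca.get(e, 0) for e in set(ca) | set(cb)}
    items = sorted((e, d) for e, d in diff.items() if d != 0)
    if items and items[0][1] < 0:
        items = [(e, -d) for e, d in items]
    return tuple(items)


def frequent_shifts(formulas, min_count=2):
    """Count shifts over all formula pairs and keep the recurring ones.

    min_count is an illustrative cut-off, not a threshold from the thesis.
    """
    counts = Counter()
    for a, b in combinations(sorted(set(formulas)), 2):
        shift = formula_shift(a, b)
        if shift:
            counts[shift] += 1
    return {s: n for s, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    demo = ["C6H12O6", "C6H10O5", "C5H10O5", "C5H8O4"]
    for shift, n in frequent_shifts(demo).items():
        print(shift, n)
```

    The all-pairs loop is quadratic, so it only suits small demonstrations. At the scale of millions of formulas a different strategy would be required.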

    In spatially resolved metabolomics, the spatial distribution of metabolites is critical for revealing biological mechanisms and disease biomarkers. Higher spatial resolution makes finer spatial variation in metabolites observable, and super-resolution techniques raise image resolution algorithmically. We systematically evaluated existing deep learning-based super-resolution models and selected the diffusion-based framework ResShift for further development. Its pre-trained model was fine-tuned by transfer learning on 10 MSI images of sagittal mouse brain sections, improving performance by 41.5% over the pre-trained model. Compared with MOSR (an ESRGAN-based super-resolution algorithm for SRM), the fine-tuned model performed 14.0% better. Generalization tests on horizontal mouse brain sections and mouse kidney tissue showed better reconstruction quality than the pre-trained model and MOSR, along with good cross-tissue generalization. This study demonstrates the effectiveness and flexibility of transfer learning in SRM and provides theoretical grounding and technical support for high-resolution imaging analysis. A composite weighted metric allowed the evaluation to take multiple objectives into account, further improving the reliability and comprehensiveness of the results.
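    The evaluation described above combines several metrics into one weighted score, but the abstract does not say which metrics or weights were used. The sketch below shows one plausible shape for such an evaluation: PSNR as an example image-quality metric, a weighted mean of metrics already scaled to [0, 1], and a relative improvement in percent. The metric names, weights, and numbers in it are placeholders and do not come from the thesis.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized 2-D images
    given as nested lists of pixel intensities."""
    ref = [v for row in reference for v in row]
    est = [v for row in estimate for v in row]
    if len(ref) != len(est) or not ref:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def composite_score(metrics, weights):
    """Weighted mean of metric values, each assumed to be pre-scaled to
    [0, 1] with higher meaning better."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[k] * metrics[k] for k in weights) / total


def relative_improvement(new, old):
    """Percentage gain of `new` over `old`."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    truth = [[0.0, 1.0], [1.0, 0.0]]
    recon = [[0.0, 1.0], [1.0, 0.5]]
    print(round(psnr(truth, recon), 2))

    # Placeholder metric values and weights, for illustration only.
    base = composite_score({"psnr": 0.5, "ssim": 0.6}, {"psnr": 1, "ssim": 1})
    tuned = composite_score({"psnr": 0.7, "ssim": 0.8}, {"psnr": 1, "ssim": 1})
    print(round(relative_improvement(tuned, base), 1))
```

    A single weighted score makes models easy to rank, at the cost of hiding trade-offs between individual metrics, so the per-metric values are usually reported alongside it.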

    Building on these technological advancements, we present MS-Loop, a comprehensive SRM data analysis pipeline supporting automated processing from raw data conversion to final visualization. The platform encompasses functions including file format conversion, data preprocessing, feature extraction, chemical annotation, differential analysis, and interactive visualization, establishing a one-stop solution for SRM researchers. In summary, this study provides different solutions to the multiple technical difficulties in spatially resolved metabolomics data analysis, aiming to provide professional and convenient analysis tools and algorithms to further promote the development of spatially resolved metabolomics.

Public release date:

 2025-06-03    
