Thesis information

Thesis title (Chinese):

 人脑胶质瘤蛋白质组学结合转录组学数据分析及数据库系统建设    

Name:

 张宁宁    

Thesis language:

 chi    

Degree:

 Master's

Degree type:

 Professional degree

University:

 Peking Union Medical College (北京协和医学院)

Department:

 Institute of Basic Medical Sciences, Peking Union Medical College (北京协和医学院基础医学研究所)

Major:

 Biomedical Engineering (Engineering) - Biomedical Engineering

Supervisor name:

 杨啸林    

On-campus supervisory committee members (comma-separated):

 王志刚, 许雪梅

Thesis completion date:

 2018-05-30    

Thesis title (foreign language):

 Human Glioma Proteomics Combined with Transcriptome Data Analysis and Database System    

Keywords (Chinese):

 Glioma, Proteomics, Data analysis, Data platform, Transcriptomics

Keywords (foreign language):

 Glioma, Proteomics, Data Analysis, Data Platform, Transcriptomics

Abstract (Chinese):

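Glioma is the most common malignant intracranial tumor in adults. It is divided into astrocytoma, oligodendroglioma, and mixed glioma, with pathological grades Ⅰ to Ⅳ; grade Ⅳ glioblastoma is the most common and most malignant glioma. Glioma diagnosis has traditionally relied on histological identification, but studying the molecular characteristics of glioma is expected to offer new insight into the mechanisms of its malignancy.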

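With the development of high-throughput technologies, omics approaches such as genomics, transcriptomics, and proteomics have increasingly been applied to the study of the molecular mechanisms of glioma. Proteomic analysis investigates biological function by studying proteins, the direct effectors of most biological processes. As the technology advances and research deepens, more and more proteomics data are being generated. There is therefore an urgent need for a technical platform that supports effective management, storage, annotation, analysis, and visualization of these data.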

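This thesis comprises two parts. First, we performed bioinformatics analysis of mass spectrometry proteomics data from human glioma and combined it with transcriptome data from TCGA to further explore the molecular mechanisms associated with the onset and progression of astrocytoma. Second, we built a human glioma proteomics database system that supports standardized management, retrieval, and visualization of glioma proteome data, together with real-time bioinformatics analysis.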

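In the first part, we focused on the molecular differences between glioblastoma (GBM, grade Ⅳ) and low-grade astrocytoma (LGA, grade Ⅱ) in order to identify molecules and mechanisms related to tumor malignancy. We obtained relative protein expression values for LGA and GBM samples from iTRAQ-labeled quantitative mass spectrometry experiments, performed differential analysis on the data, and then applied Gene Ontology enrichment analysis and pathway enrichment analysis to obtain the related functional ontology terms and pathways. Using the RNA-seq data for LGA and GBM in the TCGA database, we combined the transcriptome with the proteome to further identify key genes associated with glioma progression, providing valuable leads for the study of important genes and markers related to glioma progression. A total of 3226 proteins were identified in the proteomics experiment, of which 42 were significantly differentially expressed between the two sample types: 22 were up-regulated and 20 were down-regulated in GBM samples. In the transcript data, 1002 differentially expressed genes were found, of which 456 protein-coding genes were up-regulated and 546 protein-coding genes were down-regulated. Intersecting the results of the two omics analyses yielded 13 shared differential factors. Conclusion: BST2, HLA-DRB1, and PSMB9, which can promote the immune response, were up-regulated in this study, while SEZ6L, PLP1, ERMN, and MOG, which are associated with myelination, were down-regulated. We therefore speculate that the malignant progression of glioma may enhance the immune response and affect the physiological function of myelin. This study offers new ideas for follow-up research and supports the early diagnosis and treatment of the disease.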

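To enable effective management and reuse of proteomics data from clinical research, we built the Human Glioma Proteome Database System (hgPDS, http://hgPDS.bmicc.cn). hgPDS supports standardized management, retrieval, and visualization of proteomics data, together with real-time bioinformatics analysis. To manage human glioma proteomics metadata and experimental data effectively, we first defined metadata standards and terminology standards for the system. The sample and experimental metadata standards follow the Minimum Information About a Proteomics Experiment (MIAPE) guidelines and the PSI-MS controlled vocabulary, respectively, and clinical metadata terms are standardized with the National Cancer Institute Thesaurus (NCIT). Based on these standards we designed a data model that manages data from multiple sources in four parts: clinical metadata, experimental metadata, experimental data, and annotation data. The system was developed with Bootstrap, Java, MySQL, and related technologies, and provides effective storage, retrieval, and visualization of these data. The online analysis functions are implemented in R and allow users to screen for differentially expressed proteins online and to run clustered heatmap analysis, Gene Ontology analysis, and pathway analysis in real time in order to uncover the biological significance of the molecules. In summary, hgPDS provides standardization, efficient management, annotation, visualization, and bioinformatics analysis of glioma proteomics data, helping researchers carry out simple and efficient automated data analysis, reducing manual work, and improving research efficiency. The system as a whole is simple and practical, easy to query and access, extensible, and easy to maintain.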

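This study first carried out an integrated proteomic and transcriptomic comparison of GBM and LGA and identified 13 genes closely associated with the malignant mechanism of glioma. Three of the up-regulated genes take part in the immune response, and four of the down-regulated genes affect the growth and development of the neuronal myelin sheath. Building on this work, the Human Glioma Proteome Database System provides standardized data storage as well as tools for targeted data retrieval and proteomic bioinformatics analysis.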

Abstract (foreign language):

Glioma is one of the most common malignant brain tumors in adults and is classified into astrocytomas, oligodendrogliomas, and mixed gliomas, with pathological grades Ⅰ to Ⅳ. Glioblastoma multiforme (GBM), which is grade Ⅳ, is the most common and most malignant glioma. Glioma diagnosis is traditionally based on histological identification, but studies of the molecular characteristics of gliomas are expected to provide new ideas for diagnosis and treatment.

With the development of high-throughput technology, genomics, transcriptomics, proteomics, and other omics methods have gradually been applied to the study of the molecular mechanisms of glioma. Proteomic analysis focuses on proteins, which play a direct role in almost all biological processes. As technology has developed and research has deepened, more and more proteomics data have accumulated. It is therefore necessary to develop a technology platform for the standardized management, storage, annotation, analysis, and visualization of proteomics data.

This research falls into two parts. First, we conducted bioinformatics analysis of human glioma proteomics data and further identified possible biomarkers in combination with transcriptome data. Second, we established a human glioma proteomics database system that supports standardized data management, retrieval, visualization, and online analysis.

In the first part, we focused on the molecular differences between GBM (grade Ⅳ) and low-grade astrocytoma (LGA, grade Ⅱ) in order to discover the mechanisms related to tumor development. We used iTRAQ-labeled quantitative mass spectrometry to obtain relative protein expression values in LGA and GBM. Proteins with a significant fold change were selected for Gene Ontology enrichment analysis and pathway enrichment analysis. Using the RNA-seq data for LGA and GBM in the TCGA database, we combined the transcriptome and the proteome to further determine the key genes involved in the development of gliomas and to provide valuable insight into important biomarkers of glioma development. A total of 3226 proteins were identified in the proteome experiments, of which 42 differed significantly between the two sample types: 22 proteins were up-regulated and 20 were down-regulated in the GBM samples. In the transcript data, 1002 differentially expressed genes were found, of which 456 were up-regulated and 546 were down-regulated. Intersecting the two result sets yielded 13 genes common to the proteomic and transcriptomic analyses. Conclusion: BST2, HLA-DRB1, and PSMB9, which can promote the immune response, were up-regulated in our study, and SEZ6L, PLP1, ERMN, and MOG, which are associated with myelination, were down-regulated, suggesting that the malignant progression of glioma may enhance the immune response and affect the physiological function of myelin. This study provides new ideas for follow-up research and supports the early diagnosis and treatment of the disease.
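The intersection step described above can be illustrated with a minimal Python sketch. It is not the thesis code: the thresholds (`min_abs_log2_fc`, `max_p`), the requirement that the direction of change agree in both data sets, the names `Change`, `call_direction`, and `concordant_genes`, and the placeholder genes with their values are all assumptions made for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Change(NamedTuple):
    gene: str
    log2_fc: float  # log2(GBM / LGA); positive means higher in GBM
    p_value: float


def call_direction(changes: Iterable[Change],
                   min_abs_log2_fc: float = 1.0,  # assumed cutoff, not from the thesis
                   max_p: float = 0.05) -> Dict[str, str]:
    """Return {gene: "up" | "down"} for entries passing both thresholds."""
    calls = {}
    for c in changes:
        if abs(c.log2_fc) >= min_abs_log2_fc and c.p_value <= max_p:
            calls[c.gene] = "up" if c.log2_fc > 0 else "down"
    return calls


def concordant_genes(protein_calls: Dict[str, str],
                     transcript_calls: Dict[str, str]) -> Dict[str, str]:
    """Keep genes called in both data sets with the same direction of change."""
    shared = protein_calls.keys() & transcript_calls.keys()
    return {g: protein_calls[g] for g in sorted(shared)
            if protein_calls[g] == transcript_calls[g]}


# Invented toy values with placeholder gene names; not data from the study.
proteins = [Change("GENE_A", 1.8, 0.01), Change("GENE_B", -1.5, 0.02),
            Change("GENE_C", 0.3, 0.01), Change("GENE_D", 2.0, 0.03)]
transcripts = [Change("GENE_A", 2.4, 0.001), Change("GENE_B", -2.1, 0.004),
               Change("GENE_D", -1.2, 0.01), Change("GENE_E", 3.0, 0.001)]

shared = concordant_genes(call_direction(proteins), call_direction(transcripts))
print(shared)
```

In this toy input, GENE_C fails the fold-change cutoff, GENE_E has no protein measurement, and GENE_D changes in opposite directions, so only GENE_A and GENE_B remain.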

To explore a methodology for managing and using massive proteomics data effectively, we built the Human Glioma Proteome Database System (hgPDS, http://hgPDS.bmicc.cn). The system supports standardized data management, retrieval, visualization, and online analysis. To manage human glioma proteomic metadata and experimental data effectively, we first developed metadata standards and terminology standards for the system. The sample and experimental metadata conform to the Minimum Information About a Proteomics Experiment (MIAPE) guidelines and the PSI-MS controlled vocabulary of the Proteomics Standards Initiative (PSI). The clinical metadata are standardized with the National Cancer Institute Thesaurus (NCIT). A data model was then developed on the basis of these standards. The data schema is modularized to manage different data types: clinical, sample, and experimental metadata, as well as raw and annotated experimental data. hgPDS is built with Bootstrap, Java, and MySQL, and its bioinformatics analysis functions are implemented in R. Online analyses include clustered heatmaps, Gene Ontology enrichment, and pathway enrichment. In summary, hgPDS provides standardization, efficient management, annotation, visualization, and bioinformatics analysis of glioma proteomics data, helping scientists analyze data, reducing manual operations, and improving research efficiency. The system is practical, extensible, easy to access, and easy to maintain.
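As an illustration of what an enrichment analysis such as the Gene Ontology and pathway analyses above typically computes, the following minimal Python sketch evaluates a one-sided hypergeometric (over-representation) test. The abstract states only that hgPDS performs these analyses in R; the choice of test, the function name `hypergeom_upper_tail`, and the numbers in the example are assumptions and do not come from the thesis.

```python
from math import comb


def hypergeom_upper_tail(population: int, annotated: int,
                         selected: int, overlap: int) -> float:
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, of which `annotated` carry the term."""
    total = comb(population, selected)
    max_k = min(annotated, selected)
    # Sum the probability mass for every overlap at least as large as observed.
    hits = sum(comb(annotated, k) * comb(population - annotated, selected - k)
               for k in range(overlap, max_k + 1))
    return hits / total


# Invented example: 20 background genes, 5 annotated with a term,
# 4 differentially expressed genes, 2 of which carry the term.
p = hypergeom_upper_tail(population=20, annotated=5, selected=4, overlap=2)
print(round(p, 4))
```

A small p-value would indicate that the term appears among the selected genes more often than expected by chance. A real analysis would also correct for testing many terms, which this sketch omits.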

In this study, we conducted a combined proteomic and transcriptomic comparison of GBM and LGA. We identified 13 genes that may be closely related to the malignant mechanism of glioma, and found that three of the up-regulated genes are involved in the immune response and four of the down-regulated genes affect the growth and development of the neuronal myelin sheath. We also analyzed the pathogenic molecular mechanism. We then established the Human Glioma Proteome Database System, which provides standardized data storage and tools for targeted data retrieval and proteomic bioinformatics analysis.

Release date:

 2018-06-08    
