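| Thesis title (Chinese): | 两种姜黄种质转录组和代谢组比较研究 |
| Name: | |
| Thesis language: | chi |
| Degree: | Master's |
| Degree type: | Academic degree |
| Institution: | Peking Union Medical College |
| Department: | |
| Major: | |
| Supervisor name: | |
| On-campus advisory group members (comma-separated): | |
| Off-campus advisory group members (comma-separated): | |
| Thesis completion date: | 2023-05-15 |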
| Thesis title (foreign language): | A comparative study of the transcriptome and metabolome of two turmeric germplasms |
| Keywords (Chinese): | |
| Keywords (foreign language): | turmeric (Curcuma longa L.) variety; curcumin; untargeted metabolomics; transcriptome; SNP/InDel; transcription factor |
| Abstract (Chinese): |
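CHINESE ABSTRACT Turmeric (Curcuma longa L.) is a perennial herb of the genus Curcuma in the family Zingiberaceae. It is rich in curcumin and volatile oil compounds and is a functional plant used as both medicine and food, with very high medicinal and economic value. Mixed provenance and impure varieties have led to uneven quality and large differences in curcumin content among turmeric varieties. Elucidating the mechanism behind the differences in curcuminoid content among turmeric germplasms, and breeding new varieties with high curcumin content, high yield, and strong adaptability, is therefore one of the priorities of turmeric germplasm research. This study used two turmeric germplasms that differ significantly in curcumin content, “GH” (high curcumin content) and “BD” (low curcumin content), as research materials. Metabolite changes in the rhizomes were analyzed by UPLC-Q-TOF-MS/MS, and the rhizome transcriptome was analyzed by RNA-Seq, in order to mine key enzyme genes and regulatory factors related to curcumin formation and to analyze their mechanisms of action in curcumin synthesis. The main results are as follows:
1. Untargeted metabolomics was used to identify the metabolites in the rhizomes of the two turmeric germplasms at the same harvest stage, and 42 compounds were identified in total: 33 curcuminoids and 9 volatile oil compounds. Multivariate statistical analysis was applied and an OPLS-DA model was built to identify the metabolites that distinguish the two germplasms. A total of 31 significantly different metabolites were screened out, 28 of which were curcuminoids. With the exception of 1,7-bis(3,4-dimethoxyphenyl)hept-1-ene-3,5-dione, the curcuminoids, represented by curcumin, dihydrocurcumin, dihydrodemethoxycurcumin, tetrahydrobisdemethoxycurcumin, and demethoxycurcumin, were up-regulated in GH, whereas α-zingiberene, spathulenol, and α-curcumene were down-regulated in GH.
2. High-throughput transcriptome sequencing of GH and BD turmeric yielded 132.50 Gb of data in total. Clean data reached at least 11.61 Gb per sample, the Q30 base percentage was 94.34% or higher, and the mapping rate of reads to the reference genome ranged from 69.44% to 82.79% across samples. A total of 15 885 differentially expressed genes were obtained, of which 12 597 had functional definitions in the GO database and 5 577 were annotated in the KEGG database. KEGG functional annotation linked these genes to 216 metabolic pathways. The pathways related to curcuminoid metabolism were mainly phenylpropanoid biosynthesis and stilbenoid, diarylheptanoid, and gingerol biosynthesis. Forty-nine differentially expressed structural genes related to curcuminoid biosynthesis were screened out, and 46 differentially expressed genes were involved in the mevalonate pathway and the methylerythritol 4-phosphate pathway, which underlie the synthesis of turmeric volatile oil compounds. Eight candidate genes were randomly selected from the 49 differentially expressed genes related to curcuminoid metabolism for real-time quantitative PCR. Their expression trends were broadly consistent with those from RNA sequencing, indicating that the RNA sequencing results are accurate and reliable and can be used to determine the dynamic changes of genes related to curcuminoid synthesis in turmeric and to study the molecular mechanisms of quality formation.
3. Statistical analysis of the SNP/InDel data derived from the transcriptome showed that, across the 10 samples, the total number of SNP sites ranged from 579 269 (GH1) to 1 098 623 (BD4), with an average of 779 119. Functional annotation of the mined SNP/InDel-Unigenes identified 15 key enzyme genes involved in the curcumin biosynthesis pathway. These are key genes of the phenylpropanoid metabolic pathway and include C4H, CCOMT, COMT, CURS, HCT, and PAL. InDels occurred in the sequences of all 15 genes, and seven of them carried both SNPs and InDels. These key SNP/InDel sites are likely to underlie the differences in curcumin content among turmeric varieties, and they provide a theoretical basis for developing SNP/InDel sites within turmeric functional genes as molecular markers.
4. A phylogenetic tree of MYB transcription factors showed that CurMYB24 and CurMYB29 are closely related to maize ZmMYB31, which regulates phenylpropanoid biosynthesis. The expression levels of CurMYB24 and CurMYB29 differed significantly in GH turmeric, and it is inferred that these two MYB transcription factors may be involved in regulating the curcuminoid synthesis pathway in GH. Bioinformatics analysis predicted that CurMYB24 encodes 234 amino acids and CurMYB29 encodes 213 amino acids. The proteins encoded by CurMYB24 and CurMYB29 both contain four conserved SANT domains, a structure characteristic of R2R3-MYB transcription factors, so they are inferred to be R2R3-type transcription factors. Both are hydrophilic proteins localized in the nucleus. |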
| Abstract (foreign language): |
ABSTRACT Curcuma longa L. (turmeric) is a perennial herbaceous plant of the genus Curcuma in the ginger family (Zingiberaceae). It is rich in curcumin and volatile oil compounds, making it a functional plant with high medicinal and economic value. However, mixed provenance and impure varieties have led to uneven quality and large differences in curcumin content among turmeric varieties. Studying the mechanism behind the differences in curcuminoid content among turmeric germplasms, and breeding new varieties with high curcumin content, high yield, and strong adaptability, is therefore one of the key focuses of turmeric germplasm research. In this study, two turmeric germplasms with significantly different curcumin content, “GH” (high curcumin content) and “BD” (low curcumin content), were selected as research materials. UPLC-Q-TOF-MS/MS was used to analyze metabolite changes in the rhizomes, and RNA-Seq was used to analyze the rhizome transcriptome. Key enzyme genes and regulatory factors related to curcumin formation were identified, and their roles in curcumin synthesis were analyzed. The main results are as follows:
1. Untargeted metabolomics identified 42 compounds in the rhizomes of the two turmeric germplasms collected at the same harvest stage: 33 curcuminoids and 9 volatile oil compounds. Multivariate statistical analysis was used to build an OPLS-DA model and identify the metabolites that distinguish the two germplasms. A total of 31 significantly different metabolites were screened out, 28 of which were curcuminoids. With the exception of 1,7-bis(3,4-dimethoxyphenyl)hept-1-ene-3,5-dione, the curcuminoids, represented by curcumin, dihydrocurcumin, dihydrodemethoxycurcumin, tetrahydrobisdemethoxycurcumin, and demethoxycurcumin, were up-regulated in “GH”. The contents of α-zingiberene, spathulenol, and α-curcumene were down-regulated in “GH”.
2. High-throughput transcriptome sequencing of the “GH” and “BD” turmeric germplasms yielded 132.50 Gb of data in total, with at least 11.61 Gb of clean data per sample and a Q30 base percentage of 94.34% or higher. The mapping rate of reads to the reference genome ranged from 69.44% to 82.79% across samples. A total of 15 885 differentially expressed genes were identified, of which 12 597 had functional definitions in the GO database and 5 577 were annotated in the KEGG database. KEGG functional annotation linked these genes to 216 metabolic pathways. The pathways related to curcuminoid metabolism were mainly phenylpropanoid biosynthesis and stilbenoid, diarylheptanoid, and gingerol biosynthesis. A total of 49 differentially expressed structural genes related to curcuminoid biosynthesis were screened out. Forty-six differentially expressed genes were involved in the mevalonate and methylerythritol 4-phosphate pathways, which underlie the synthesis of turmeric volatile oil compounds. Eight candidate genes randomly selected from the 49 differentially expressed genes related to curcuminoid metabolism were analyzed by real-time quantitative PCR. Their expression trends were broadly consistent with the RNA-Seq expression patterns, indicating that the RNA sequencing results are accurate and reliable. The sequencing results can therefore be used to determine the dynamic changes of genes related to curcuminoid biosynthesis in turmeric and to study the molecular mechanisms of quality formation.
3. Analysis of the transcriptome data showed that the total number of SNP sites across the 10 samples ranged from 579 269 (in “GH1”) to 1 098 623 (in “BD4”), with an average of 779 119. Functional annotation of the mined SNP/InDel-Unigenes identified 15 key enzyme genes involved in the curcuminoid biosynthesis pathway, including C4H, CCOMT, COMT, CURS, HCT, and PAL, all of which are key genes in the phenylpropanoid metabolic pathway. InDels occurred in the sequences of all 15 genes, and seven of them harbored both SNPs and InDels. These key SNP/InDel sites are likely to be responsible for the differences in curcuminoid content among turmeric varieties, and they provide a theoretical basis for using SNP/InDel sites within functional genes as molecular markers in turmeric.
4. A phylogenetic tree of MYB transcription factors showed that CurMYB24 and CurMYB29 are closely related to ZmMYB31, which regulates phenylpropanoid biosynthesis in maize. The expression levels of CurMYB24 and CurMYB29 differed significantly in “GH”, and it is speculated that they may be involved in regulating the curcuminoid synthesis pathway in “GH”. Bioinformatics analysis predicted that CurMYB24 encodes a protein of 234 amino acids and CurMYB29 a protein of 213 amino acids. The proteins encoded by CurMYB24 and CurMYB29 both contain four SANT domains, a conserved structure characteristic of R2R3-MYB transcription factors, suggesting that they are R2R3-type transcription factors. Both are hydrophilic proteins localized in the cell nucleus. |
| Public release date: | 2023-06-13 |