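Thesis title (Chinese): | 甘草属植物基因组进化及甘草酸差异分布机制研究 |
Author name: | |
Thesis language: | chi |
Degree: | Doctoral |
Degree type: | Academic degree |
Institution: | Peking Union Medical College |
Department: | |
Major: | |
Supervisor name: | |
Thesis completion date: | 2025-05-21 |
Thesis title (foreign language): | Genomic evolution and mechanisms of glycyrrhizic acid differential distribution in Glycyrrhiza species |
Keywords (Chinese): | |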
Keywords (foreign language): | Glycyrrhiza; comparative genomics; glycyrrhizic acid; CYP88D6; functional validation |
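Abstract (Chinese; English translation): |
Licorice (Glycyrrhiza) is a traditional Chinese medicinal material rich in active constituents. Glycyrrhizic acid, one of its main pharmacologically active components, has anti-inflammatory, anticancer, antiviral, and cardioprotective activities. The distribution of glycyrrhizic acid differs among Glycyrrhiza species. The genus is divided into sect. Glycyrrhiza and sect. Pseudoglycyrrhiza; species of sect. Glycyrrhiza contain glycyrrhizic acid, whereas species of sect. Pseudoglycyrrhiza do not. Although the biosynthetic pathway of glycyrrhizic acid has been fully elucidated, the molecular mechanism behind its differential distribution among Glycyrrhiza species has not yet been clarified. Using integrated multi-omics analysis, this study systematically characterizes the expression patterns and catalytic functions of CYP88D subfamily genes in Glycyrrhiza and clarifies the molecular mechanism underlying the differential distribution of glycyrrhizic acid among Glycyrrhiza species. The main research contents and results are as follows:
1. Chloroplast genome-based analysis of the genetic background and phylogeny of Glycyrrhiza
Comparative genomics of chloroplast genomes showed that Glycyrrhiza chloroplast genomes generally lack the inverted repeat region and that genome length, structure, GC content, codon usage, and gene distribution are highly conserved. Combining the mVISTA and nucleotide diversity results, four hypervariable regions (accD-psaI, ndhD-ccsA, rpl36-rps8, and rrn5-trnR-ACG) were selected as candidate fragments for species identification and studies of relatedness. Selective pressure analysis indicated overall negative selection on Glycyrrhiza chloroplast genomes, with a small number of positively selected genes associated with environmental adaptation. Phylogenetic analysis showed that legume (Fabaceae) species formed distinct clusters, consistent with the six-subfamily classification scheme. Within Fabaceae, Detarioideae diverged first, followed by Cercidoideae, Duparquetioideae, Dialioideae, Papilionoideae, and Caesalpinioideae; within Papilionoideae, the inverted repeat-lacking clade, which includes Glycyrrhiza, was the last to diverge. Synteny analysis of Fabaceae chloroplast genomes confirmed the conservation of Glycyrrhiza chloroplast genomes, whereas Papilionoideae showed gene rearrangement and inversion events relative to the other subfamilies.
2. Chromosome-level genome assembly and evolutionary analysis of six Glycyrrhiza species
Chromosome-level genome assemblies were produced for six Glycyrrhiza species. The assembly sizes were approximately 373.08 to 433.02 Mb, with contig N50 lengths of approximately 18.53 to 30.72 Mb and scaffold N50 lengths of 49.43 to 58.42 Mb; BUSCO assessment gave assembly completeness of 97.9% to 99.4%. Genome annotation identified 36,157 to 39,758 protein-coding genes, with BUSCO annotation completeness of 95.6% to 98.2%. Interspecies chromosome alignments revealed high levels of synteny among homologous chromosomes, with inversions found in some chromosomes. A Glycyrrhiza pan-genome covering the six species was constructed, consisting of 18,731 core gene families, 2,870 soft-core gene families, 11,272 dispensable gene families, and 1,471 species-specific gene families. Using the G. uralensis genome as the reference, comparison of genomic variation among Glycyrrhiza species identified 35,708,906 single nucleotide polymorphisms, 8,104,361 small insertions and deletions, and 381,499 structural variations. Whole-genome duplication analysis showed two duplication events in Glycyrrhiza: the γ event of core angiosperms and one event shared with other Papilionoideae species. No recent whole-genome duplication event was observed in the licorice genome.
3. Expression pattern analysis and functional validation of CYP88D subfamily genes
Analysis of the six Glycyrrhiza genomes identified the genes encoding glycyrrhizic acid biosynthetic enzymes. Among them, CYP88D6 differed in distribution and expression between sect. Glycyrrhiza and sect. Pseudoglycyrrhiza, whereas the copy numbers and expression patterns of the other genes were highly conserved across Glycyrrhiza species; these genes were mainly highly expressed in roots, consistent with the accumulation of glycyrrhizic acid mainly in roots. CYP88D6 has two copies in sect. Glycyrrhiza (CYP88D6 and CYP88D6-2), both highly expressed in roots, whereas sect. Pseudoglycyrrhiza has only one lowly expressed copy. CYP88D subfamily genes in Glycyrrhiza were then identified and a phylogenetic tree was constructed. Based on amino acid sequence similarity to CYP88D6, the tree divides into three clades: CYP88D6/CYP88D6-2, CYP88D15, and CYP88D16. The catalytic functions of CYP88D subfamily genes were validated in a yeast expression system. CYP88D6 and CYP88D6-2 catalyze two-step oxidation at the C-11 position of β-amyrin, forming a keto group and producing 11-oxo-β-amyrin; CYP88D15 catalyzes only the first step at C-11, forming a hydroxyl group and producing 11α-hydroxy-β-amyrin; CYP88D16 has no catalytic activity at C-11. For in vivo functional validation based on a genetic transformation system, the CYP88D6 genes GyCYP88D6, GpCYP88D6, and GuCYP88D6 were each overexpressed in G. yunnanensis, a species of sect. Pseudoglycyrrhiza. G. yunnanensis, which does not otherwise produce glycyrrhizic acid, then produced it, confirming that sect. Pseudoglycyrrhiza cannot form glycyrrhizic acid because of low in vivo expression of CYP88D6. Further work showed that the difference in CYP88D6 expression between the two sections is not caused by differences in promoter activity.
4. Evolutionary analysis of the CYP88D6 gene
Evolutionary analysis showed that CYP88D6 homologs occur only in Supertr. Fabodae and Supertr. Millettiodae of Papilionoideae. In Glycyrrhiza, CYP88D6 and CYP88D15 appear to be paralogs that diverged from an ancestral gene, and CYP88D6 underwent a further duplication in sect. Glycyrrhiza. The ability of Glycyrrhiza CYP88D6 to catalyze oxidation of β-amyrin at C-11 is inferred to be a new function acquired after gene divergence. Selective pressure analysis indicated that all CYP88D6 homologs evolved under purifying selection.
This study is the first to release chromosome-level genomes of G. glabra, G. inflata, G. pallidiflora, G. squamulosa, and G. yunnanensis and to construct a Glycyrrhiza pan-genome, which supports research on the evolution of genetic diversity in Glycyrrhiza and provides molecular tools for the sustainable use of Glycyrrhiza plant resources. The functional divergence of CYP88D subfamily genes found within Glycyrrhiza provides new evidence for understanding the diversity and biosynthetic pathways of oleanane-type pentacyclic triterpenoids in plants. Clarifying the molecular mechanism behind the differential distribution of glycyrrhizic acid within Glycyrrhiza supports licorice germplasm innovation and molecular breeding, and provides a theoretical basis for the scientific and rational use of medicinal plant resources in the genus. |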
Abstract (foreign language): |
Glycyrrhiza species are traditional Chinese medicinal herbs rich in bioactive compounds. Glycyrrhizic acid, one of their main active constituents, has anti-inflammatory, anticancer, antiviral, and cardioprotective activities. The distribution of glycyrrhizic acid varies among Glycyrrhiza species: the genus is divided into sect. Glycyrrhiza and sect. Pseudoglycyrrhiza, and glycyrrhizic acid is found in sect. Glycyrrhiza but is absent from sect. Pseudoglycyrrhiza. Although the biosynthetic pathway of glycyrrhizic acid has been fully elucidated, the molecular mechanisms underlying its differential distribution among Glycyrrhiza species remain unclear. This study uses integrated multi-omics analyses to systematically investigate the expression patterns and catalytic functions of CYP88D subfamily genes in Glycyrrhiza, with the aim of elucidating the molecular basis for the differential distribution of glycyrrhizic acid across species. The main research contents and findings are as follows:
1. Chloroplast genome-based analysis of genetic background and phylogenetic relationships in the genus Glycyrrhiza
Comparative genomics of chloroplast genomes revealed that Glycyrrhiza species generally lack inverted repeat (IR) regions and have highly conserved genome sizes, structures, GC content, codon usage, and gene distribution. Four highly variable regions (accD-psaI, ndhD-ccsA, rpl36-rps8, and rrn5-trnR-ACG) were identified as candidate markers for species identification and phylogenetic analysis. Selective pressure analysis indicated overall purifying selection, with a few positively selected genes potentially related to environmental adaptation. Phylogenetic analysis and divergence time estimation showed that legume species formed distinct clusters consistent with the six-subfamily classification scheme. The subfamily Detarioideae was the first to diverge, followed by Cercidoideae, Duparquetioideae, Dialioideae, Papilionoideae, and Caesalpinioideae. Within Papilionoideae, the inverted repeat-lacking clade, which includes Glycyrrhiza, was the last to diverge. Synteny analysis confirmed that chloroplast genomes are conserved within Glycyrrhiza, whereas gene rearrangements and inversions were observed in Papilionoideae relative to the other subfamilies.
2. Chromosome-level genome assembly and evolutionary analysis of six Glycyrrhiza species
Chromosome-level genome assemblies were completed for six Glycyrrhiza species, with assembly sizes ranging from 373.08 to 433.02 Mb. Contig N50 lengths ranged from 18.53 to 30.72 Mb, and scaffold N50 lengths from 49.43 to 58.42 Mb. BUSCO analysis indicated assembly completeness between 97.9% and 99.4%. Genome annotation identified 36,157 to 39,758 protein-coding genes, with annotation completeness between 95.6% and 98.2%. Interspecies chromosomal comparisons revealed high levels of synteny among homologous chromosomes, with inversions detected in certain chromosomes. A pan-genome of the six species was constructed, comprising 18,731 core gene families, 2,870 soft-core gene families, 11,272 dispensable gene families, and 1,471 species-specific gene families. Using G. uralensis as the reference, comparative genomic analyses identified 35,708,906 single nucleotide polymorphisms, 8,104,361 small insertions and deletions, and 381,499 structural variations among Glycyrrhiza species. Whole-genome duplication analysis revealed two duplication events: an ancient γ event shared by core angiosperms and a more recent event shared with other Papilionoideae species. No recent whole-genome duplication events were observed in Glycyrrhiza genomes.
3. Expression pattern analysis and functional validation of CYP88D subfamily genes
Analysis of the six Glycyrrhiza genomes identified the enzyme-coding genes of the glycyrrhizic acid biosynthetic pathway. Among them, CYP88D6 differed in distribution and expression between sect. Glycyrrhiza and sect. Pseudoglycyrrhiza, whereas the copy numbers and expression patterns of the other genes were highly conserved across Glycyrrhiza species. These genes were predominantly expressed in roots, consistent with the major accumulation site of glycyrrhizic acid. CYP88D6 underwent duplication in sect. Glycyrrhiza, resulting in two copies (CYP88D6 and CYP88D6-2) that were both highly expressed in roots. In contrast, sect. Pseudoglycyrrhiza retained only a single, lowly expressed copy, which may explain why these species are unable to produce glycyrrhizic acid. Phylogenetic analysis grouped the CYP88D subfamily genes into three clades based on amino acid sequence similarity to CYP88D6: CYP88D6/CYP88D6-2, CYP88D15, and CYP88D16. A yeast expression system was used to validate the catalytic functions of the CYP88D subfamily genes. CYP88D6 and CYP88D6-2 catalyzed the two-step oxidation at the C-11 position of β-amyrin, producing 11-oxo-β-amyrin. CYP88D15 catalyzed only the first step, hydroxylation at C-11, yielding 11α-hydroxy-β-amyrin, while CYP88D16 lacked catalytic activity at C-11. For in vivo functional validation, GyCYP88D6, GpCYP88D6, and GuCYP88D6 were each overexpressed in G. yunnanensis, a species of sect. Pseudoglycyrrhiza that does not naturally produce glycyrrhizic acid. The transformed plants produced glycyrrhizic acid, confirming that the absence of glycyrrhizic acid in sect. Pseudoglycyrrhiza is due to low in vivo expression of the CYP88D6 gene. Further investigation showed that the differential expression of CYP88D6 between the two sections was not attributable to differences in promoter activity, suggesting the involvement of other regulatory mechanisms.
4. Evolutionary analysis of the CYP88D6 gene
Evolutionary analysis of CYP88D6 revealed that its homologs are present only in Supertr. Fabodae and Supertr. Millettiodae of the subfamily Papilionoideae. In Glycyrrhiza, CYP88D6 and CYP88D15 were identified as paralogous genes that diverged from an ancestral gene, with CYP88D6 undergoing an additional duplication in sect. Glycyrrhiza. The ability of CYP88D6 to catalyze oxidation of β-amyrin at the C-11 position appears to be a new function acquired after gene divergence. Selection pressure analysis indicated that all CYP88D6 homologs evolved under purifying selection.
This study is the first to release chromosome-level genomes of G. glabra, G. inflata, G. pallidiflora, G. squamulosa, and G. yunnanensis and to construct a Glycyrrhiza pan-genome, which facilitates the study of genetic diversity and evolution in Glycyrrhiza species and offers molecular tools for the sustainable utilization of Glycyrrhiza plant resources. The discovery of functional divergence within the CYP88D subfamily in Glycyrrhiza provides new evidence for understanding the diversity and biosynthetic pathways of oleanane-type triterpenoids in plants. Elucidating the molecular mechanisms underlying the differential distribution of glycyrrhizic acid in Glycyrrhiza species supports germplasm innovation and molecular breeding, and offers a theoretical foundation for the scientific and rational utilization of medicinal plant resources in the genus. |
Open access date: | 2025-06-10 |