Thesis information

Thesis title (Chinese):

 五倍体绞股蓝基因组的解析及达玛烯二醇-II 12-羟化酶的进化研究    

Author name:

 张楚熠    

Thesis language:

 chi    

Degree:

 Master's

Degree type:

 Academic degree

Institution:

 Peking Union Medical College (北京协和医学院)

Department:

 Institute of Medicinal Plant Development, Peking Union Medical College (北京协和医学院药用植物研究所)

Major:

 Chinese Materia Medica - Chinese Materia Medica

Supervisor name:

 李滢    

Members of the on-campus supervisory group (comma-separated):

 孙超, 沈晓凤

Thesis completion date:

 2025-03-30    

Thesis title (foreign language):

 Analysis of the Pentaploid Gynostemma pentaphyllum Genome and Evolutionary Study of Dammarenediol-II 12-Hydroxylase    

Keywords (Chinese):

 Gynostemma pentaphyllum; polyploid genome; 20S-protopanaxadiol; cytochrome P450 enzyme; convergent evolution

Keywords (foreign language):

 Gynostemma pentaphyllum; Polyploid genome; Cytochrome P450; 20S-protopanaxadiol; Convergent evolution

Abstract (Chinese):

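Jiaogulan is a traditional Chinese medicine with a long history of use; its source plant is Gynostemma pentaphyllum (Thunb.) Makino, a perennial vine of the family Cucurbitaceae. Modern clinical research shows that jiaogulan extracts have anticancer, anti-inflammatory, lipid-lowering, immunomodulatory, and other biological activities, and thus high medicinal value. To date, Gynostemma is the only plant group other than Panax reported to produce 20S-protopanaxadiol-type saponins. The aglycone of these saponins is formed by hydroxylation of dammarenediol-II at C12. Working with an individual of the Jiangxi line of G. pentaphyllum that is rich in this type of saponin, this study built a high-quality chromosome-level genome using third-generation sequencing and assembly methods; screened and identified a cytochrome P450 (CYP450) gene encoding dammarenediol-II 12-hydroxylase on the basis of the genome, the transcriptome, and phylogenetic trees; and, by cloning and functionally verifying the sequences neighboring the functionally characterized dammarenediol-II 12-hydroxylases in the phylogenetic tree, established that the enzyme found in G. pentaphyllum is species-specific, supplementing and refining the theory of convergent evolution of dammarenediol-II 12-hydroxylase between G. pentaphyllum and ginseng. The specific results are as follows: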

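1. The sequenced Jiangxi plant of G. pentaphyllum, rich in 20S-protopanaxadiol-type saponins, was determined to be pentaploid; the first haplotype-resolved pentaploid genome was sequenced and assembled, and a workflow suitable for polyploid genome assembly was established. Karyotype analysis of actively dividing root-tip cells found 55 chromosomes; given the previously reported chromosome number of 11 per haplotype, the Jiangxi line was determined to be pentaploid (2n=5x=55). Using PacBio HiFi sequencing and high-throughput chromatin conformation capture (Hi-C), and after comparing multiple contig-assembly and chromosome-level assembly tools and strategies, a workflow currently best suited to polyploid assembly of G. pentaphyllum was established, and a haplotype-resolved pentaploid genome was successfully constructed and named GpJX. The total genome size is approximately 2.48 Gb. Genome annotation yielded 106,794 protein-coding genes, 19,609 tRNAs, 21,552 rRNAs, 603 miRNAs, and 2,662 snRNAs. Repeat annotation showed that 70.39% of the genome consists of transposable element (TE) sequences. Assessments of completeness, continuity, and accuracy indicate that GpJX is of high quality, which also validates the suitability of the assembly workflow. Based on features such as the characteristic distribution of TEs on each chromosome, GpJX is inferred to be an autopentaploid.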

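2. The phylogenetic relationships of Cucurbitaceae species were studied and the ancestral karyotype was reconstructed; comparative analysis with the diploid T2T genome (GpT2T) revealed abundant structural variation and rapid evolution of centromeric DNA in G. pentaphyllum of different ploidy levels, which is speculated to be a main driving force of ploidy variation. A phylogenetic tree of Cucurbitaceae was built from recently sequenced representative species, and comparison of synonymous substitution rates (Ks) showed that bitter gourd is the slowest-evolving of these species since divergence from the ancestor and retains more traces of the ancestral chromosomes. With bitter gourd as the reference, the common ancestral karyotype of Cucurbitaceae was reconstructed with a chromosome number of 14, and the chromosome fusion and fission events that occurred during evolution were shown. Comparison of the G. pentaphyllum genomes of different ploidy levels showed that long terminal repeat retrotransposon (LTR-RT) insertions occurred relatively recently, and the subtle differences in the timing of the insertion bursts may be related to adaptive evolution under the environmental pressures of their respective geographic regions. Structural variation analysis found a large number of chromosomal inversions between GpJX and GpT2T, concentrated in regions of low gene density and high TE density. Some large inversions are concentrated in regions corresponding to the GpT2T centromeres, indicating that the centromeres are highly variable. Analysis of the T2T centromeric regions and some of the centromeric regions of the Jiangxi plant confirmed substantial variation among the repeat unit sequences, indicating rapid DNA evolution. In both ploidy levels of G. pentaphyllum, the number of centromeric tandem repeats is very low compared with Arabidopsis thaliana and other species, and the higher-order repeats formed by tandem repeat units have often been interrupted by large numbers of recent LTR-RT insertions, which may affect centromere stability. It is therefore speculated that pronounced variation in centromeric regions, such as structural rearrangement, may be one of the driving forces of intraspecific ploidy diversity in G. pentaphyllum.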

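3. A dammarenediol-II 12-hydroxylase, CYP88G2, was screened and identified using domain searches and a phylogenetic tree of CYP450 enzymes; functional characterization of neighboring sequences in its clade, together with molecular docking, indicates that the dammarenediol-II 12-hydroxylases of G. pentaphyllum and ginseng arose by convergent evolution, each lineage having independently recruited different key amino acid sites. CYP450 enzymes were identified by domain search in the GpJX genome, which comes from a plant rich in 20S-protopanaxadiol, a phylogenetic tree of CYP716 and CYP88 family sequences was built, and five CYP450 candidate genes were selected with the help of transcriptome data. Cloning was attempted for all five genes, and Gpjx10G001898 and Gpjx10G001842 were successfully cloned. They were then inserted into plant expression vectors and introduced into Nicotiana benthamiana leaves by Agrobacterium-mediated transformation, and the products were extracted for GC-MS analysis. Compared with the control, a new product was detected in the Gpjx10G001898 samples. Comparison with the chromatographic and mass spectrometric data of an authentic standard identified this compound as 20S-protopanaxadiol, whereas it was not detected in the Gpjx10G001842 samples, indicating that Gpjx10G001898 encodes a dammarenediol-II 12-hydroxylase; the enzyme was named CYP88G2. In the phylogeny, the dammarenediol-II 12-hydroxylase sequences of the two distantly related species cluster in different branches and evolved independently. Molecular docking showed that the substrate is held in the active pocket in opposite conformations in the two enzymes, implying that different key amino acid interactions are involved. Together with the very low amino acid sequence similarity, this suggests that the ancient common ancestor of the two enzymes lacked this function, and that the dammarenediol-II 12-hydroxylases of G. pentaphyllum and ginseng arose by convergent evolution through independent recruitment of different key amino acid sites in each species.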

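In summary, this thesis constructed a pentaploid genome of the Jiangxi line of G. pentaphyllum, providing a polyploid genetic resource for research on the species. The reconstructed phylogenetic relationships and ancestral karyotype of Cucurbitaceae offer a new perspective on the evolution of the family. Comparative analysis of G. pentaphyllum genomes of different ploidy levels suggests that rapid evolution of centromeric DNA may be a driving force behind the abundant ploidy variation within the species. The dammarenediol-II 12-hydroxylase CYP88G2 was screened and identified in the Jiangxi line, further enriching the genetic elements available for the biosynthesis of 20S-protopanaxadiol. Phylogenetic analysis and molecular docking showed that this type of CYP450 enzyme in G. pentaphyllum and ginseng evolved convergently by independently recruiting different key amino acid sites. This study refines the understanding of the convergent evolution of dammarenediol-II 12-hydroxylase, provides a basis for engineering enzyme genetic elements, and offers a case study of convergent evolution of secondary metabolic pathways in distantly related species.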

Abstract (foreign language):

Gynostemma pentaphyllum (Thunb.) Makino, a perennial vine of the family Cucurbitaceae, is the source plant of a traditional Chinese medicine with a long history of use. Modern clinical studies have shown that G. pentaphyllum extracts possess anticancer, anti-inflammatory, lipid-lowering, and immunomodulatory activities, giving the plant high medicinal value. To date, Gynostemma is the only plant genus other than Panax reported to produce 20S-protopanaxadiol-type saponins. The aglycone of these saponins is formed by hydroxylation of dammarenediol-II at C12. In this study, a high-quality chromosome-level genome was assembled from an individual of the Jiangxi line of G. pentaphyllum, which is rich in this type of saponin, using third-generation sequencing and assembly methods. A cytochrome P450 (CYP450) gene encoding dammarenediol-II 12-hydroxylase was screened and identified on the basis of the genome, the transcriptome, and phylogenetic trees. Cloning and functional verification of the sequences neighboring the functionally characterized dammarenediol-II 12-hydroxylases in the phylogenetic tree showed that the enzyme found in G. pentaphyllum is species-specific, which supplements and refines the theory that the dammarenediol-II 12-hydroxylases of G. pentaphyllum and Panax ginseng arose by convergent evolution. The specific results are as follows:

1. The sequenced Jiangxi plant of G. pentaphyllum, which is rich in 20S-protopanaxadiol-type saponins, was determined to be pentaploid. The first haplotype-resolved pentaploid genome was sequenced and assembled, and a workflow suitable for polyploid genome assembly was established. Karyotype analysis of actively dividing root-tip cells revealed a total of 55 chromosomes. Given the previously reported chromosome number of 11 per haplotype, the Jiangxi line of G. pentaphyllum was determined to be pentaploid (2n=5x=55). PacBio HiFi sequencing and high-throughput chromatin conformation capture (Hi-C) were used, and multiple contig-assembly and chromosome-level assembly tools and strategies were compared. This produced a workflow currently best suited to polyploid assembly of G. pentaphyllum, with which a haplotype-resolved pentaploid genome was successfully constructed and named GpJX. The total genome size is approximately 2.48 Gb. Genome annotation identified 106,794 protein-coding genes, 19,609 tRNAs, 21,552 rRNAs, 603 miRNAs, and 2,662 snRNAs. Repeat annotation showed that 70.39% of the genome consists of transposable element (TE) sequences. Assessments of completeness, continuity, and accuracy indicated that GpJX is of high quality, which also validates the suitability of the assembly workflow. Based on features such as the characteristic distribution of TEs on each chromosome, GpJX is inferred to be an autopentaploid.
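
The ploidy inference in result 1 is simple arithmetic: 55 observed chromosomes divided by the reported number of 11 per haplotype gives five chromosome sets. A minimal Python sketch of that check follows; the function name `infer_ploidy` is hypothetical and is not part of the thesis workflow.

```python
def infer_ploidy(chromosome_count: int, base_number: int) -> int:
    """Return the ploidy level x such that chromosome_count == x * base_number.

    Illustrative helper (hypothetical name). Raises ValueError when the count
    is not a whole multiple of the base number, e.g. for aneuploid counts.
    """
    if base_number <= 0:
        raise ValueError("base_number must be positive")
    ploidy, remainder = divmod(chromosome_count, base_number)
    if remainder != 0:
        raise ValueError("chromosome count is not a multiple of the base number")
    return ploidy


# Values reported in the abstract: 55 chromosomes in root-tip cells, 11 per haplotype.
jiangxi_ploidy = infer_ploidy(55, 11)
print(f"2n = {jiangxi_ploidy}x = 55")
```

A count that is not a whole multiple of 11 raises an error rather than being rounded, since such a count would point to aneuploidy or a miscount rather than a clean ploidy level.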

2. The phylogenetic relationships of Cucurbitaceae species were studied, and the ancestral karyotype was reconstructed. Comparative analysis with the diploid T2T genome (GpT2T) revealed abundant structural variation and rapid evolution of centromeric DNA in G. pentaphyllum of different ploidy levels, which is speculated to be a main driving force of ploidy variation. A phylogenetic tree of Cucurbitaceae was constructed from recently sequenced representative species of the family. Comparison of synonymous substitution rates (Ks) revealed that Momordica charantia is the slowest-evolving of these species since divergence from the ancestor and retains more traces of the ancestral chromosomes. With M. charantia as the reference species, the ancestral karyotype of Cucurbitaceae was reconstructed with a chromosome number of 14, and multiple chromosome fusion and fission events during evolution were identified. Comparative analysis of G. pentaphyllum genomes of different ploidy levels showed that long terminal repeat retrotransposon (LTR-RT) insertions occurred relatively recently, but the timing of the insertion bursts differed slightly between the two ploidy levels, which may be related to adaptive evolution under their respective geographic and environmental pressures. Structural variation analysis found a large number of chromosomal inversions between GpJX and GpT2T, concentrated in regions of low gene density and high TE density. Some large inversions are concentrated in regions corresponding to the GpT2T centromeres, suggesting that the centromeres are highly variable. Analysis of the T2T centromeric regions and some of the centromeric regions of the Jiangxi plant confirmed substantial variation among the repeat unit sequences, indicating rapid DNA evolution. The number of centromeric tandem repeats in both ploidy levels of G. pentaphyllum is very low compared with Arabidopsis thaliana and other species. The higher-order repeats formed by tandem repeat units have often been interrupted by large numbers of recent LTR-RT insertions, which may affect centromere stability. It is therefore speculated that pronounced variation in centromeric regions, such as structural rearrangement, may be one of the driving forces of the abundant ploidy variation within G. pentaphyllum.

3. A dammarenediol-II 12-hydroxylase, CYP88G2, was screened and identified using domain searches and a phylogenetic tree of CYP450 enzymes. Functional characterization of neighboring sequences in its clade, together with molecular docking, indicated that the dammarenediol-II 12-hydroxylases of G. pentaphyllum and P. ginseng arose by convergent evolution, each lineage having independently recruited different key amino acid sites. CYP450 enzymes were identified by domain search in the GpJX genome, which comes from a plant rich in 20S-protopanaxadiol, and a phylogenetic tree of CYP716 and CYP88 family sequences was constructed. Five CYP450 candidate genes were selected with the help of transcriptome data. Cloning was attempted for all five genes, and Gpjx10G001898 and Gpjx10G001842 were successfully cloned. They were then inserted into plant expression vectors and introduced into Nicotiana benthamiana leaves by Agrobacterium-mediated transformation, and the products were extracted for GC-MS analysis. Compared with the control, a new product was detected in the Gpjx10G001898 samples. Comparison with the chromatographic and mass spectrometric data of an authentic standard identified this compound as 20S-protopanaxadiol, whereas it was not detected in the Gpjx10G001842 samples, indicating that Gpjx10G001898 encodes a dammarenediol-II 12-hydroxylase; the enzyme was named CYP88G2. In the phylogeny, the dammarenediol-II 12-hydroxylase sequences of the two distantly related species cluster in different branches and evolved independently. Molecular docking showed that the substrate is held in the active pocket in opposite conformations in the two enzymes, suggesting that different key amino acid interactions are involved. Together with the very low amino acid sequence similarity, this suggests that the ancient common ancestor of the two enzymes lacked this function, and that the dammarenediol-II 12-hydroxylases of G. pentaphyllum and P. ginseng arose by convergent evolution through independent recruitment of different key amino acid sites in each species.

In conclusion, this thesis constructed a pentaploid genome of the Jiangxi line of G. pentaphyllum, providing a polyploid genetic resource for research on the species. The reconstructed phylogenetic relationships and ancestral karyotype of Cucurbitaceae offer a new perspective on the evolution of the family. Comparative analysis of G. pentaphyllum genomes of different ploidy levels suggests that rapid evolution of centromeric DNA may be a driving force behind the abundant ploidy variation within the species. The dammarenediol-II 12-hydroxylase CYP88G2 was screened and identified in the Jiangxi line, further enriching the genetic elements available for the biosynthesis of 20S-protopanaxadiol. Phylogenetic analysis and molecular docking showed that this type of CYP450 enzyme in G. pentaphyllum and P. ginseng evolved convergently by independently recruiting different key amino acid sites. This study refines the understanding of the convergent evolution of dammarenediol-II 12-hydroxylase, provides a basis for engineering enzyme genetic elements, and offers a case study of convergent evolution of secondary metabolic pathways in distantly related species.

Release date:

 2025-06-11    
