Thesis information

Thesis title (Chinese):

 基于柴胡、狭叶柴胡和窄竹叶柴胡比较基因组学的根皂苷含量差异分子机制研究    

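Name:

 温东 (Wen Dong)    

Thesis language:

 chi    

Degree:

 Doctoral    

Degree type:

 Academic degree    

University:

 Peking Union Medical College (北京协和医学院)    

Department:

 Institute of Medicinal Plant Development, Peking Union Medical College (北京协和医学院药用植物研究所)    

Major:

 Pharmacy - Pharmacognosy    

Supervisor:

 魏建和 (Wei Jianhe)    

Thesis completion date:

 2025-06-20    

Thesis title (English):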

 Study on the molecular mechanisms underlying root saikosaponin content differences in Bupleurum chinense, B. scorzonerifolium, and B. marginatum var. stenophyllum based on comparative genomics analysis    

Keywords (Chinese):

 柴胡 狭叶柴胡 窄竹叶柴胡 D值 柴胡皂苷类化合物 比较基因组 生物合成基因簇    

Keywords (English):

 B. chinense; B. scorzonerifolium; B. marginatum var. stenophyllum; D-value; saikosaponins; comparative genomics; biosynthesis-related gene cluster    

Abstract (Chinese):

柴胡属(Bupleurum L.)植物是伞形科(Apiaceae)多年生草本植物,中国该属大部分植物以根入药,始载于《神农本草经》。2025版《中国药典》收载柴胡B. chinense DC.和狭叶柴胡B. scorzonerifolium Willd.二种基原物种,按照药材性状不同,分别习称“北柴胡”和“南柴胡”,且规定柴胡皂苷a+d的总量不得少于0.30%。藏柴胡来源于窄竹叶柴胡B. marginatum Wall.ex DC. var. stenophyllum (Wolff) Shan et Y. Li的干燥根,非药典种,在《贵州省中药材、民族药材质量标准》中收录。三种柴胡均含有柴胡皂苷类化合物,是柴胡药材的主要药效成分,但含量差异大,藏柴胡中柴胡皂苷a+d的含量是北柴胡的4倍、南柴胡的8倍。柴胡皂苷类化合物的生物合成途径已基本阐明,但为什么藏柴胡中皂苷类化合物含量远高于北柴胡和南柴胡,其分子机制是什么?该问题的阐释有助于了解柴胡属植物柴胡皂苷类化合物合成途径的进化及指导品种的改良,但迄今未清晰阐释。

柴胡(B. chinense)基因组数据已在Zhang等发表的文章中报道,但由于质控后的HiFi数据未公开,暂无法获得该基因组数据。为此本论文测序并组装了柴胡(B. chinense,本课题组选育新品种“中柴2号”)、窄竹叶柴胡(B. marginatum var. stenophyllum)的基因组,结合课题组前期测序、组装完成的狭叶柴胡(B. scorzonerifolium)基因组,通过比较基因组、转录组、代谢组以及miRNA测序等多组学联合分析探究不同柴胡中皂苷类化合物含量差异的分子机制。主要研究结果如下:

构建柴胡和窄竹叶柴胡高质量染色体水平基因组,发现柴胡基因组具有超高的多拷贝基因数(Duplicated BUSCOs,D值),D值高达92.5%。核型分析确定柴胡(2n=2x=12)和窄竹叶柴胡(2n=2x=16)均为二倍体。数据经去冗余和杂合后,按二倍体进行组装。柴胡基因组大小882.46 Mb,contig N50为132.92 Mb,96.74%的序列挂载到6条染色体。窄竹叶柴胡基因组大小489.82 Mb,contig N50为63.15 Mb,95.74%的序列挂载到8条染色体。二种柴胡基因组的BUSCO评估结果分别为99.20%和98.90%,基因组结构完整度高。二种柴胡基因组D值分别为92.5%和4.2%,柴胡基因组的D值是目前已报道的二倍体物种中最高的(二倍体物种D值通常在20%以内)。

通过同义替换率(Ks)、共线性和基因丢失率等分析解析柴胡基因组超高D值的原因为其基因组内发生了额外的、更近期的加倍(WGD)事件。Ks分析结果显示,柴胡和狭叶柴胡基因组与同为伞形科的窄竹叶柴胡和胡萝卜基因组相比,分别在~0.055和~0.039处出现额外的峰值,表明柴胡和狭叶柴胡基因组分别在3.1-5.2百万年前(Mya)和2.2-3.6 Mya发生了一次额外的、更近期的WGD事件。柴胡基因组与葡萄(Vitis vinifera)、胡萝卜以及窄竹叶柴胡基因组之间的共线性关系分别为8:2、4:2和2:1,进一步证明柴胡基因组经历了额外的WGD事件。柴胡(D值92.5%)和狭叶柴胡(D值37.4%)基因组分别与葡萄基因组相比计算基因丢失率,分别为51.70%和56.28%,表明柴胡基因组内基因丢失速度慢于狭叶柴胡,且基因丢失方式单一。因此推测由于柴胡和狭叶柴胡基因组在发生WGD事件后重回二倍化过程的速度和方式不同,导致柴胡基因组中重复基因多于狭叶柴胡,D值更高。

比较基因组学进一步揭示柴胡和狭叶柴胡亲缘关系更近,柴胡基因组多拷贝基因数多于窄竹叶柴胡,且基因家族发生明显扩张。柴胡、狭叶柴胡和窄竹叶柴胡与胡萝卜、葡萄等14个物种的系统进化树显示,柴胡、狭叶柴胡和窄竹叶柴胡聚为一支,大约在36.3 Mya从伞形科分化,窄竹叶柴胡大约在12.3 Mya从柴胡属分化,柴胡和狭叶柴胡亲缘关系最近,大约在3.8 Mya分化。柴胡基因组中多拷贝基因数最多,22,144个,是窄竹叶柴胡的2.8倍,其次是狭叶柴胡,是窄竹叶柴胡的1.7倍。柴胡基因组中的基因家族发生明显扩张,三种柴胡基因组发生扩张的基因家族数分别为7,471个、1,616个和374个。因此推测柴胡基因组多拷贝基因数增加、基因家族扩张可能与其经历额外的、更近期的WGD事件有关。

基于比较基因组、转录组和代谢组联合分析,在窄竹叶柴胡基因组的5号染色体中鉴定到皂苷类化合物合成基因簇β-AS、CYP450s和UGTs基因,但这些基因在柴胡和狭叶柴胡基因组中分散在不同染色体。柴胡、狭叶柴胡和窄竹叶柴胡根的代谢组结果显示,窄竹叶柴胡中总皂苷以及柴胡皂苷a+d的相对含量都是柴胡的1倍、狭叶柴胡的3倍,狭叶柴胡中单萜类化合物的相对含量最高,分别是柴胡和窄竹叶柴胡的1.6和9.8倍。柴胡皂苷类化合物含量检测结果显示,窄竹叶柴胡中柴胡皂苷a+d的含量分别是柴胡和狭叶柴胡的4倍和8倍。三种柴胡根的转录组结果显示,窄竹叶柴胡中皂苷合成途径下游基因FPPS、SS、SE以及β-AS的表达量显著高于柴胡和狭叶柴胡。比较基因组结合转录组分析显示,窄竹叶柴胡中与高表达的β-AS基因共表达的CYP450s和UGTs基因在其5号染色体上形成基因簇,但这些基因分散在柴胡的4和5号染色体,狭叶柴胡的1和5号染色体。窄竹叶柴胡基因簇中基因的表达量与柴胡皂苷a、b1、d和f的含量显著正相关。且基因簇中CYP450和UGT基因可能作用于β-香树脂醇的C-16、C-28和C-3位。因此窄竹叶柴胡根中柴胡皂苷类化合物含量高于柴胡和狭叶柴胡可能与其基因组中存在皂苷合成相关的基因簇有关。

基于miRNA测序和转录组联合分析在柴胡和狭叶柴胡基因组中鉴定到可能负调控皂苷合成途径基因的miRNAs,而窄竹叶柴胡中未鉴定到负调控皂苷合成途径基因的miRNAs。柴胡、狭叶柴胡和窄竹叶柴胡中分别检测到422,732和224个miRNAs,并分别预测到13,939,15,095和3,966个靶基因。三种柴胡中皂苷合成途径基因的表达量与miRNAs的表达量关联分析结果显示,柴胡中有3个miRNAs分别负调控AACT、HMGS、SE以及DXS基因。狭叶柴胡中有9个miRNAs分别负调控HMGS、MK、SS、SE、β-AS、CMK以及GPPS基因,且在β-AS基因(窄竹叶柴胡基因簇的同源基因)与Bsco_miR489之间预测到明确的切割位点,窄竹叶柴胡中没有负调控皂苷合成途径基因的miRNAs。因此推测miRNAs可能通过负调控抑制柴胡和狭叶柴胡中皂苷合成途径基因的翻译,使二种柴胡根中皂苷类化合物含量低于窄竹叶柴胡。

综上,本研究良好组装了二倍体、D值高达92.5%的柴胡选育品种“中柴2号”的染色体水平参考基因组。同时首次构建了二倍体窄竹叶柴胡染色体水平参考基因组。解析出柴胡基因组高D值是由于其发生了额外的、更近期的WGD事件,而该WGD事件的发生与柴胡基因组内多拷贝基因数增多,基因家族发生大量扩张,以及仍处于重回二倍化过程有关。基于比较基因组学等多组学联合分析,解析出柴胡、狭叶柴胡和窄竹叶柴胡根中柴胡皂苷类化合物含量差异可能与窄竹叶柴胡5号染色体上的皂苷合成基因簇(β-AS、CYP450s和UGTs基因)以及其皂苷生物合成途径基因未受miRNAs的负调控有关。本论文不仅组装出高D值柴胡基因组丰富了柴胡属植物基因组数据,同时还初步解析出了三种柴胡皂苷类化合物含量差异的分子原因,对认识柴胡的起源、进化和挖掘控制化学成分的关键基因具有重要意义。

Abstract (English):

The genus Bupleurum L. comprises perennial herbs of the family Apiaceae. In China, the roots of most species in the genus are used medicinally, a use first recorded in the Shennong Ben Cao Jing. The 2025 edition of the Chinese Pharmacopoeia lists two source species, B. chinense DC. and B. scorzonerifolium Willd., which are traditionally called "Bei Chaihu" and "Nan Chaihu", respectively, according to the characteristics of the crude drug. The pharmacopoeia stipulates that the combined content of saikosaponins a and d must not be less than 0.30%. Zang Chaihu, the dried root of B. marginatum Wall. ex DC. var. stenophyllum (Wolff) Shan et Y. Li, is not a pharmacopoeial species but is included in the Quality Standards of Chinese Medicinal Materials and Ethnic Medicinal Materials of Guizhou Province. All three species contain saikosaponins, the main active constituents of the crude drug, but the contents differ greatly: the saikosaponin a+d content of B. marginatum var. stenophyllum is four times that of B. chinense and eight times that of B. scorzonerifolium. The biosynthetic pathway of saikosaponins has been largely elucidated, yet the molecular mechanism behind the much higher saikosaponin content of B. marginatum var. stenophyllum has not been clearly explained. Answering this question would improve our understanding of how the saikosaponin biosynthetic pathway evolved in Bupleurum and would guide cultivar improvement.

A B. chinense genome has been reported by Zhang et al., but because the quality-controlled HiFi data have not been made public, that genome could not be obtained for this work. We therefore sequenced and assembled the genomes of B. chinense (a new cultivar, "Zhongchai No. 2", bred by our research group) and B. marginatum var. stenophyllum. Together with the B. scorzonerifolium genome previously sequenced and assembled by our group, these assemblies were analyzed by comparative genomics, transcriptomics, metabolomics, and miRNA sequencing to investigate the molecular mechanisms underlying the differences in saikosaponin content among the three Bupleurum species. The main findings are as follows:

High-quality chromosome-level genomes were constructed for B. chinense and B. marginatum var. stenophyllum, and the B. chinense genome was found to have an exceptionally high proportion of duplicated genes (Duplicated BUSCOs, D-value), reaching 92.5%. Karyotype analysis confirmed that both B. chinense (2n=2x=12) and B. marginatum var. stenophyllum (2n=2x=16) are diploid. After redundant and heterozygous sequences were removed, both genomes were assembled as diploids. The B. chinense genome is 882.46 Mb in size, with a contig N50 of 132.92 Mb and 96.74% of sequences anchored to 6 chromosomes. The B. marginatum var. stenophyllum genome is 489.82 Mb in size, with a contig N50 of 63.15 Mb and 95.74% of sequences anchored to 8 chromosomes. The BUSCO completeness scores of the two genomes were 99.20% and 98.90%, respectively, indicating highly complete assemblies. The D-values of the two genomes were 92.5% and 4.2%, respectively. The D-value of B. chinense is the highest reported so far for a diploid species (D-values of diploids are typically below 20%).
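The two assembly statistics quoted above can be illustrated with a minimal Python sketch. It assumes the usual definitions: N50 is the length at which the sequences of that length or longer cover at least half of the assembly, and the D-value is the share of duplicated BUSCOs among all BUSCOs searched. The function names and the counts in the usage note are hypothetical and are not taken from the thesis.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths in bp."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


def busco_percentages(single, duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages of all BUSCOs searched."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one BUSCO is required")

    def pct(count):
        return 100.0 * count / total

    return {
        "C": pct(single + duplicated),  # complete = single-copy + duplicated
        "S": pct(single),
        "D": pct(duplicated),           # the "D-value" discussed in the text
        "F": pct(fragmented),
        "M": pct(missing),
    }
```

For example, with made-up counts of 67 single-copy, 925 duplicated, 3 fragmented and 5 missing BUSCOs out of 1,000, `busco_percentages(67, 925, 3, 5)` gives a D-value of 92.5% alongside 99.2% completeness, which shows how a genome can be nearly complete and almost entirely duplicated at the same time.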

Analyses of synonymous substitution rates (Ks), collinearity, and gene loss rates attributed the exceptionally high D-value of the B. chinense genome to an additional, more recent whole-genome duplication (WGD) event. Compared with the genomes of B. marginatum var. stenophyllum and carrot (Daucus carota), both also members of the Apiaceae, the genomes of B. chinense and B. scorzonerifolium showed additional Ks peaks at approximately 0.055 and 0.039, respectively. These peaks indicate that B. chinense and B. scorzonerifolium each underwent an additional, more recent WGD event, 3.1-5.2 million years ago (Mya) and 2.2-3.6 Mya, respectively. The collinearity ratios between the B. chinense genome and the genomes of grape (Vitis vinifera), D. carota, and B. marginatum var. stenophyllum were 8:2, 4:2, and 2:1, respectively, further supporting an additional WGD event in B. chinense. The gene loss rates of B. chinense (D-value 92.5%) and B. scorzonerifolium (D-value 37.4%), each calculated relative to the V. vinifera genome, were 51.70% and 56.28%, respectively. This indicates that genes are lost more slowly in B. chinense than in B. scorzonerifolium and that gene loss in B. chinense follows a single mode. We therefore speculate that the two genomes differ in the speed and mode of rediploidization after their WGD events, so that B. chinense retains more duplicated genes than B. scorzonerifolium and has a higher D-value.
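The conversion from a Ks peak to an age range can be sketched with the standard relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The abstract does not state which rate was used, so r is left as a parameter here and both function names are hypothetical. Back-calculating from the reported peaks and ages suggests rates of roughly 5e-9 to 9e-9 substitutions per site per year, but that range is an inference and not a figure from the thesis.

```python
def wgd_age_mya(ks_peak, rate_per_site_per_year):
    """Estimate the age of a duplication event in million years: T = Ks / (2r)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks_peak / (2.0 * rate_per_site_per_year) / 1e6


def implied_rate(ks_peak, age_mya):
    """Invert the same relation: the rate implied by a Ks peak and an age in Mya."""
    if age_mya <= 0:
        raise ValueError("age must be positive")
    return ks_peak / (2.0 * age_mya * 1e6)
```

Calling `implied_rate(0.055, 5.2)` and `implied_rate(0.055, 3.1)` brackets the rates consistent with the B. chinense interval, and feeding either rate back into `wgd_age_mya` recovers the corresponding end of the interval.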

Comparative genomics further showed that B. chinense and B. scorzonerifolium are the most closely related of the three species, that the B. chinense genome contains more multi-copy genes than that of B. marginatum var. stenophyllum, and that its gene families have expanded markedly. In a phylogenetic tree of B. chinense, B. scorzonerifolium, and B. marginatum var. stenophyllum with 14 species such as carrot and grape, the three Bupleurum species formed a single clade that diverged from the rest of the Apiaceae approximately 36.3 Mya. B. marginatum var. stenophyllum diverged from the other Bupleurum species approximately 12.3 Mya, and B. chinense and B. scorzonerifolium, the closest relatives, diverged approximately 3.8 Mya. Among the three species, the B. chinense genome has the most multi-copy genes (22,144), 2.8 times as many as B. marginatum var. stenophyllum, followed by B. scorzonerifolium with 1.7 times as many as B. marginatum var. stenophyllum. The numbers of expanded gene families in B. chinense, B. scorzonerifolium, and B. marginatum var. stenophyllum were 7,471, 1,616, and 374, respectively. We therefore speculate that the increase in multi-copy genes and the expansion of gene families in the B. chinense genome are related to its additional, more recent WGD event.

Integrated analysis of comparative genomics, transcriptomics, and metabolomics identified a saikosaponin biosynthesis gene cluster (containing β-AS, CYP450 and UGT genes) on chromosome 5 of B. marginatum var. stenophyllum, whereas the corresponding genes are dispersed across different chromosomes in B. chinense and B. scorzonerifolium. Metabolomic profiling of roots from the three species showed that the relative contents of total saikosaponins and of saikosaponins a+d in B. marginatum var. stenophyllum were 1-fold higher than in B. chinense and 3-fold higher than in B. scorzonerifolium. B. scorzonerifolium had the highest relative content of monoterpenes, 1.6 and 9.8 times that of B. chinense and B. marginatum var. stenophyllum, respectively. Quantitative determination confirmed that the saikosaponin a+d content of B. marginatum var. stenophyllum was 4 and 8 times that of B. chinense and B. scorzonerifolium, respectively. Root transcriptomes showed that the downstream genes of the saikosaponin biosynthesis pathway (FPPS, SS, SE, and β-AS) were expressed at significantly higher levels in B. marginatum var. stenophyllum than in the other two species. Combining comparative genomics with transcriptomics showed that the CYP450 and UGT genes co-expressed with the highly expressed β-AS gene of B. marginatum var. stenophyllum form a gene cluster on its chromosome 5. In contrast, these genes are dispersed over chromosomes 4 and 5 in B. chinense and over chromosomes 1 and 5 in B. scorzonerifolium. The expression levels of the genes in the B. marginatum var. stenophyllum cluster were significantly and positively correlated with the contents of saikosaponins a, b1, d, and f. Moreover, the CYP450 and UGT genes in the cluster may act on the C-16, C-28, and C-3 positions of β-amyrin. The higher saikosaponin content of B. marginatum var. stenophyllum roots, compared with B. chinense and B. scorzonerifolium, may therefore be related to the presence of this saikosaponin biosynthesis gene cluster and to the higher expression of key pathway genes (FPPS, SS, SE, and β-AS).
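The contrast between clustered and dispersed pathway genes can be made concrete with a simple positional heuristic. The abstract does not describe how the cluster was delimited, so the sketch below is only a generic illustration: it groups genes that lie on the same chromosome within a maximum gap of each other and keeps groups that contain enough genes from enough enzyme families. The gene names, coordinates, and thresholds are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    family: str  # e.g. "bAS", "CYP450", "UGT"
    chrom: str
    start: int   # start coordinate in bp


def find_clusters(genes, max_gap=200_000, min_genes=3, min_families=2):
    """Group neighbouring genes per chromosome and keep groups that look like a cluster."""
    by_chrom = defaultdict(list)
    for gene in genes:
        by_chrom[gene.chrom].append(gene)

    clusters = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda g: g.start)
        runs, current = [], [ordered[0]]
        for gene in ordered[1:]:
            # Extend the current run while consecutive genes stay within max_gap.
            if gene.start - current[-1].start <= max_gap:
                current.append(gene)
            else:
                runs.append(current)
                current = [gene]
        runs.append(current)
        for run in runs:
            families = {g.family for g in run}
            if len(run) >= min_genes and len(families) >= min_families:
                clusters.append([g.name for g in run])
    return clusters


# Hypothetical layouts: one genome with neighbouring genes, one with scattered genes.
clustered = [
    Gene("bAS_1", "bAS", "chr5", 1_000_000),
    Gene("CYP_a", "CYP450", "chr5", 1_050_000),
    Gene("UGT_a", "UGT", "chr5", 1_120_000),
    Gene("CYP_b", "CYP450", "chr4", 5_000_000),
]
dispersed = [
    Gene("bAS_1", "bAS", "chr4", 1_000_000),
    Gene("CYP_a", "CYP450", "chr5", 1_050_000),
    Gene("UGT_a", "UGT", "chr5", 9_000_000),
]
```

With these made-up coordinates, `find_clusters(clustered)` reports one group of three genes on chr5, while `find_clusters(dispersed)` reports none. A real analysis would also use co-expression evidence, as the text describes, and not position alone.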

Combined analysis of miRNA sequencing and transcriptome data identified miRNAs that may negatively regulate saikosaponin biosynthesis pathway genes in B. chinense and B. scorzonerifolium, but none in B. marginatum var. stenophyllum. A total of 422, 732, and 224 miRNAs were detected in B. chinense, B. scorzonerifolium, and B. marginatum var. stenophyllum, with 13,939, 15,095, and 3,966 predicted target genes, respectively. Association analysis between the expression levels of pathway genes and miRNAs showed that 3 miRNAs in B. chinense negatively regulate the AACT, HMGS, SE, and DXS genes, and that 9 miRNAs in B. scorzonerifolium negatively regulate the HMGS, MK, SS, SE, β-AS, CMK, and GPPS genes. In B. scorzonerifolium, a clear cleavage site was predicted between Bsco_miR489 and the β-AS gene, which is the homolog of the β-AS gene in the B. marginatum var. stenophyllum gene cluster. No miRNAs negatively regulating pathway genes were found in B. marginatum var. stenophyllum. We therefore speculate that miRNAs suppress the translation of saikosaponin biosynthesis pathway genes in B. chinense and B. scorzonerifolium, which lowers the saikosaponin content of their roots relative to B. marginatum var. stenophyllum.
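The idea of screening predicted miRNA-target pairs for negative regulation can be illustrated by correlating expression across samples. The abstract does not specify the statistic or cutoff used, so this sketch assumes a Pearson correlation with an arbitrary cutoff of -0.8. The identifiers `miR_x`, `gene_a`, and `gene_b` and all expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def negative_pairs(mirna_expr, target_expr, predicted_pairs, r_cutoff=-0.8):
    """Keep predicted (miRNA, gene) pairs whose expression is strongly anti-correlated."""
    hits = []
    for mirna, gene in predicted_pairs:
        r = pearson(mirna_expr[mirna], target_expr[gene])
        if r <= r_cutoff:  # the cutoff is an assumed value, not one from the thesis
            hits.append((mirna, gene, r))
    return hits


# Hypothetical expression values across five samples.
mirna_expr = {"miR_x": [1, 2, 3, 4, 5]}
target_expr = {"gene_a": [10, 8, 6, 4, 2], "gene_b": [1, 2, 3, 4, 6]}
pairs = [("miR_x", "gene_a"), ("miR_x", "gene_b")]
```

Here `negative_pairs(mirna_expr, target_expr, pairs)` keeps only the pair with `gene_a`, whose expression falls as the miRNA rises. An anti-correlation of this kind is suggestive only, which is why the text pairs it with target prediction and a predicted cleavage site.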

In conclusion, this study assembled a chromosome-level reference genome of the diploid B. chinense cultivar "Zhongchai No. 2" with a D-value of 92.5%, far higher than the D-value of 14.4% of the published B. chinense genome and the highest among published genomes of diploid species. It also constructed, for the first time, a chromosome-level reference genome of the diploid B. marginatum var. stenophyllum. The high D-value of the B. chinense genome is attributed to an additional, more recent WGD event, accompanied by an increase in multi-copy genes, extensive gene family expansion, and a rediploidization process that is still under way. Multi-omics analysis centered on comparative genomics was used to explain the differences in root saikosaponin content among B. chinense, B. scorzonerifolium, and B. marginatum var. stenophyllum. In B. marginatum var. stenophyllum, the expanded gene families were significantly enriched in the sesquiterpenoid and triterpenoid biosynthesis pathways; the β-AS, CYP450 and UGT genes formed a saikosaponin biosynthesis gene cluster on chromosome 5; and FPPS, SS, SE, and β-AS were expressed at significantly higher levels than in B. chinense and B. scorzonerifolium and were not negatively regulated by miRNAs. This work therefore adds a high-D-value B. chinense genome to the genomic resources for Bupleurum and gives a preliminary molecular explanation for the differences in saikosaponin content among the three Bupleurum species. These results are important for understanding the origin and evolution of Bupleurum and for mining the key genes that control its chemical constituents.

Open-access date:

 2025-06-20    
