Thesis Information


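Title (Chinese):	基于全基因组分析的药用植物物种鉴定研究
Author:	Hao Lijun (郝利军)
Language:	chi
Degree:	Doctoral
Degree type:	Academic degree
Institution:	Peking Union Medical College (北京协和医学院)
Department:	Institute of Medicinal Plant Development, Peking Union Medical College
Major:	Pharmacy - Pharmacognosy
Supervisor:	Song Jingyuan (宋经元)
Completion date:	2024-04-30
Title (foreign language):	Identification of medicinal plant species based on Analysis of whole-GEnome
Keywords (Chinese, translated):	Whole-genome analysis; medicinal plant identification; genome editing; closely related species; Polygonaceae
Keywords (foreign language):	Analysis of whole-GEnome; Identification of medicinal plant species; Genome editing; Closely related species; Polygonaceae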
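Abstract (Chinese, translated):	According to WHO statistics, there are about 70,000 medicinal plant species in the world, and more than 80% of the population relies on medicinal plants for primary health needs. Accurate identification of medicinal plants is a prerequisite for research on innovative plant-based drugs and bears directly on the safety and efficacy of clinical medication. Developing an accurate, universal method for identifying this large variety of medicinal plants is therefore both urgent and significant. The rise of genomics offers an innovative solution: the whole genomes of different species necessarily differ, so in theory any species can be identified by finding the genomic differences between species. Although the relevant sequence analysis techniques are mature, three questions remain open: whether whole-genome analysis can be used to identify medicinal plant species; how to obtain whole-genome data for medicinal plants at low cost; and whether the method can resolve closely related species, which are hard to distinguish in medicinal plant identification. This study first established a whole-genome-analysis-based method for identifying medicinal plant species, Analysis of whole-GEnome (AGE), using the identification of the source plants of saffron and its adulterants as an example. It then confirmed the broad universality of AGE in six medicinal plants from different higher plant phyla, focused on four Polygonaceae medicinal plants without public genomes to explore AGE with lower-cost high-throughput sequencing data, and finally applied AGE to three closely related Rheum species as a worked example for closely related medicinal plants. The main research content and results are as follows:
1. Constructing a method for identifying medicinal plant species based on whole-genome analysis. AGE consists of two parts, bioinformatics analysis and experimental validation, used respectively to screen and to detect species-specific target sequences for species identification. AGE was established using the identification of Crocus sativus, the source plant of saffron, as an example. The bioinformatics analysis involved cutting the C. sativus genome into 25 bp fragments and extracting the sequences containing a PAM (Protospacer Adjacent Motif) to build a candidate target sequence library, which yielded 59,282,259 candidate target sequences. One of these was aligned against the genomes of the source plants of common saffron adulterants (safflower, lotus stamen, and corn silk), confirming it as a species-specific target sequence. Experimental validation involved detecting this sequence with genome editing technology to verify the feasibility and specificity of the method, and testing its sensitivity. Saffron and its adulterants were ultimately identified successfully with AGE. To test universality, six representative medicinal plants from the main higher plant phyla (angiosperms, gymnosperms, ferns, lycophytes, and mosses) were selected. Six species-specific target sequences were obtained following the established bioinformatics strategy, and detecting them with genome editing technology identified all six plants accurately, demonstrating that AGE is applicable to medicinal plants from the main higher plant phyla.
2. Formulating a key bioinformatics strategy for screening species-specific target sequences from high-throughput sequencing data. Obtaining genomic data is the first task of AGE. Because relatively few medicinal plant genomes are publicly available, high-throughput sequencing was used to sequence four Polygonaceae medicinal plants without public genomes: Fallopia aubertii, F. multiflora, Bistorta officinalis, and B. vivipara. An analysis strategy for high-throughput sequencing data was formulated: the sequencing data were cut into 25 bp fragments, which were then screened by occurrence frequency, PAM, GC content, number of consecutive identical nucleotides, and sequence alignment. This yielded 12,053,493, 11,394,032, 2,462,609, and 5,441,188 species-specific target sequences, respectively. One sequence was randomly selected from each set and detected with genome editing technology; the selected sequences successfully identified the four Polygonaceae plants, and alignment analysis showed that all four were novel, unannotated sequences that had not been studied before. This demonstrates that the strategy screens species-specific target sequences effectively, so genomic data can be obtained at low cost by sequencing and used with AGE for species identification.
3. Providing a worked example of AGE identification for closely related medicinal plant species. Closely related species within a genus are both a focus and a difficulty in identification, and the problem was explored using three closely related medicinal Rheum species. The genomes of Rheum officinale, R. palmatum, and R. tanguticum were sequenced. Data at a sequencing depth of 80X were extracted and screened with the strategy formulated for high-throughput sequencing data, yielding 1,827,449, 999,948, and 508,387 species-specific sequences, respectively. One sequence was randomly selected from each set and detected with genome editing technology; the selected sequences successfully identified the three Rheum plants, and alignment analysis showed that these sequences were likewise novel and unannotated. The three species-specific target sequences were then used to identify the source plants of 36 rhubarb decoction-piece samples and of the rhubarb decoction pieces contained in 7 Chinese patent medicines. The raw material of 26 decoction-piece samples and of all 7 Chinese patent medicines came from the three rhubarb source plants, whereas the source plants of the other 10 decoction-piece samples were confirmed not to belong to these three species. These results demonstrate that AGE can accurately identify three closely related medicinal Rheum species and can also be used to identify the source plants of decoction pieces and Chinese patent medicines.
The innovations of this thesis are: (1) a method for identifying medicinal plant species based on whole-genome analysis, which lays a foundation for developing new identification methods for Chinese medicines; (2) a key bioinformatics strategy for screening species-specific target sequences from high-throughput sequencing data, which overcomes the bottleneck of insufficient genomes in AGE identification research; and (3) a worked example of AGE identification for closely related medicinal plant species, which introduces a new approach to source species of Chinese medicines that are hard to distinguish. In summary, this study confirms that whole-genome analysis can be used to identify medicinal plant species, that usable whole-genome data can be obtained at low cost by high-throughput sequencing, and that AGE can resolve closely related species that are hard to distinguish, providing an important tool for medicinal plant species identification.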
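The candidate-library step in point 1 (cut the genome into 25 bp fragments, keep those containing a PAM) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's pipeline: the abstract does not name the PAM or say whether it lies inside the 25 bp window, so the `TTTV` motif at the 5' end, the function name `candidate_targets`, and the forward-strand-only scan are all hypothetical choices.

```python
import re

# IUPAC nucleotide codes, enough to spell common PAMs as regular expressions.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "V": "[ACG]", "R": "[AG]", "Y": "[CT]"}


def candidate_targets(genome, k=25, pam="TTTV"):
    """Return every k-bp window of `genome` that begins with the PAM.

    Assumptions (not stated in the abstract): the PAM is TTTV, it sits at
    the 5' end of the window, and only the forward strand is scanned.
    """
    pattern = re.compile("".join(IUPAC[base] for base in pam))
    genome = genome.upper()
    targets = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        # Skip windows with ambiguous bases (e.g. N) and windows without a PAM.
        if set(window) <= set("ACGT") and pattern.match(window):
            targets.append(window)
    return targets


# Toy "genome": exactly one window starts with a TTTV motif.
demo = "GG" + "TTTC" + "ACGTACGTACGTACGTACGTA" + "CC"
library = candidate_targets(demo)
```

A real genome would be streamed from a FASTA file and both strands would be scanned; the list returned here plays the role of the candidate target sequence library.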
Abstract (foreign language):	According to WHO statistics, there are about 70,000 medicinal plant species in the world, serving as the primary healthcare source for over 80% of the world's population. Accurate identification of medicinal plants is a prerequisite for research on innovative plant-based medicines and bears directly on the safety and efficacy of clinical medication. It is therefore both urgent and highly significant to develop an accurate and universal method for identifying this wide variety of medicinal plants. The rise of genomics provides an innovative solution: the genomes of different species inevitably differ, so any species can in theory be identified by searching for genomic differences between species. However, three challenges remain. First, whether whole-genome analysis can be used to identify medicinal plant species needs to be confirmed. Second, how to obtain whole-genome data cost-effectively is still unknown. Third, the ability of this method to distinguish closely related species requires further exploration. Using the identification of saffron and its adulterants as an example, we first established a whole-genome-analysis-based method for identifying medicinal plant species, Analysis of whole-GEnome (AGE). We next confirmed the broad universality of AGE in six medicinal plants from different higher plant phyla, then focused on four Polygonaceae medicinal plants without public genome data to explore the use of cost-effective high-throughput sequencing data for AGE, and finally applied AGE to three closely related species of the genus Rheum as an example of identifying closely related medicinal plants. The main research content and results are as follows:
1. Developing a novel method based on Analysis of whole-GEnome for the identification of medicinal plant species. AGE consists of bioinformatics analysis and experimental validation, which are used to select and to detect species-specific target sequences for species identification. Using the identification of Crocus sativus as an example, the bioinformatics analysis involved cutting the C. sativus genome into 25 bp fragments, extracting the sequences containing a PAM (Protospacer Adjacent Motif) to build a candidate target sequence library, and aligning one of these sequences against the genomes of saffron adulterants. In total, 59,282,259 candidate target sequences and one species-specific sequence were obtained. Experimental validation involved detecting this species-specific sequence with genome editing technology to test feasibility, specificity, and sensitivity, which gave a detection limit for AGE of 0.01 ng/μL. Saffron and its adulterants were successfully identified by AGE. To test the universality of AGE, six representative medicinal plants from the main higher plant phyla (angiosperms, gymnosperms, ferns, lycophytes, and mosses) were selected, and six species-specific target sequences were obtained with the established bioinformatics analysis strategy. Detecting the selected target sequences with genome editing technology accurately identified the six medicinal plants, showing that AGE is suitable for identifying medicinal plant species from the main higher plant phyla.
2. Formulating a key strategy for screening species-specific target sequences from high-throughput sequencing data. Obtaining genomic data is the primary task of AGE. Because relatively few medicinal plant genomes are publicly available, high-throughput sequencing was used to sequence four Polygonaceae medicinal plants as an example: Fallopia aubertii, F. multiflora, Bistorta officinalis, and B. vivipara. To analyze the sequencing data, a key screening strategy was formulated: the sequencing data were cut into 25 bp fragments, and species-specific sequences were screened by occurrence frequency, PAM, GC content, number of consecutive identical nucleotides, and sequence alignment. This yielded 12,053,493, 11,394,032, 2,462,609, and 5,441,188 species-specific target sequences, respectively. One sequence was randomly selected from each set for detection with genome editing technology; the four Polygonaceae plants were successfully identified with these sequences, and sequence analysis showed all four to be novel, unannotated sequences. This shows that the formulated bioinformatics analysis strategy screens species-specific sequences effectively, so genomic data for medicinal plants without publicly available genomes can be obtained at low cost by sequencing.
3. Providing a case study for the identification of closely related species of medicinal plants. Identifying closely related species within the same genus is both a focus and a challenge in the field, and it was explored using three closely related species of the genus Rheum as an example. The genomes of Rheum officinale, R. palmatum, and R. tanguticum were sequenced. Sequencing data at a depth of 80X were extracted and, using the same bioinformatics analysis strategy, 1,827,449, 999,948, and 508,387 species-specific sequences were obtained, respectively. One sequence was randomly selected from each set for detection with genome editing technology, and the three rhubarb plants were successfully identified. Sequence analysis showed that these sequences were also novel and unannotated. The three species-specific target sequences were then used to identify the source plants of thirty-six rhubarb decoction-piece samples and of the rhubarb decoction pieces in seven Chinese patent medicines. Twenty-six decoction-piece samples and all seven Chinese patent medicines were sourced from the three rhubarb plants, whereas the source plants of the other ten decoction-piece samples were confirmed not to belong to the three rhubarb plants. These results show that AGE can accurately identify three closely related species of the genus Rheum and can be used to identify the source plants of raw materials in decoction pieces and Chinese patent medicines.
These results address the three initial challenges. First, we confirmed that AGE is effective for identifying medicinal plant species. Second, we developed a bioinformatics analysis strategy that allows high-throughput sequencing data to be used directly for AGE. Third, we demonstrated that AGE can accurately identify three closely related species within the genus Rheum, offering a valuable reference for identifying other closely related species.
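The screening rules listed in the abstract for high-throughput sequencing data (occurrence frequency, PAM, GC content, consecutive nucleotides, comparison with other species) can be illustrated with the sketch below. The abstract gives no thresholds and does not name the PAM, so every default here (`min_count`, the `TTTV` PAM, `gc_range`, `max_run`) and every function name is a hypothetical assumption. Exact 25-mer set membership also stands in for the sequence-alignment step, which is a simplification rather than the method used in the thesis.

```python
from collections import Counter
from itertools import groupby


def kmers(seq, k=25):
    """Yield every k-bp window of a read."""
    seq = seq.upper()
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq):
    """Length of the longest stretch of one repeated nucleotide."""
    return max(len(list(group)) for _, group in groupby(seq))


def screen(reads, other_species_kmers, k=25, min_count=2,
           gc_range=(0.4, 0.6), max_run=4):
    """Keep k-mers that pass all filters and are absent from other species.

    All thresholds are illustrative assumptions, not values from the thesis.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = set()
    for km, n in counts.items():
        if n < min_count:                      # assumed: rare k-mers are likely errors
            continue
        if set(km) - set("ACGT"):              # ambiguous bases
            continue
        if not (km.startswith("TTT") and km[3] in "ACG"):   # assumed 5' TTTV PAM
            continue
        if not gc_range[0] <= gc_fraction(km) <= gc_range[1]:
            continue
        if longest_run(km) > max_run:
            continue
        kept.add(km)
    # Exact-match set difference as a stand-in for alignment to other species.
    return kept - other_species_kmers


target = "TTTC" + "ACGTACGTACGTACGTACGTA"    # GC 11/25, longest run 3
low_gc = "TTTA" + "ATATATATATATATATATATA"    # GC 0/25, fails the GC filter
reads = [target, target, low_gc, low_gc]
specific = screen(reads, other_species_kmers=set())
```

For genome-scale data, a dedicated k-mer counter and an aligner would replace the in-memory `Counter` and the set difference; the sketch only shows the order and intent of the filters.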
Release date:	2024-06-24
