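Thesis title (Chinese): | Studies on the evolution of emerging pathogens and their interactions with hosts based on omics data |
Name: | |
Thesis language: | chi |
Degree: | Doctoral |
Degree type: | Academic degree |
Institution: | Peking Union Medical College |
Department: | |
Major: | |
Supervisor name: | |
Thesis completion date: | 2023-05-30 |
Thesis title (foreign language): | Studies on the evolution of emerging pathogens and their interactions with hosts based on omics data |
Keywords (Chinese): | |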
Keywords (foreign language): | Klebsiella pneumoniae; Whole genome sequencing; Comparative genomics analysis; SARS-CoV-2; Multi-omics integration analysis |
Abstract (Chinese): |
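Since the 1970s, new pathogens have been discovered almost every year. Infections caused and spread by the viruses and bacteria among them seriously threaten human life and property and have inflicted heavy economic losses. Over the past decade, omics approaches have become effective tools for basic research on emerging pathogens. Genomic analysis quickly reveals the basic genomic features of an emerging pathogen, which supports targeted prevention and control and allows close monitoring of the pathogen's evolution and variation. Integrated multi-omics analysis provides a more comprehensive understanding, which supports in-depth study of the complex interactions between emerging pathogens and their hosts. Using Klebsiella pneumoniae ST967, K. pneumoniae ST307, and SARS-CoV-2 as examples, this thesis illustrates the application of omics analysis to the evolution of emerging pathogens and their interactions with hosts, at the genomic level and at the level of systematic transcriptome and interactome integration, respectively.
K. pneumoniae is a common cause of hospital- and community-acquired infections worldwide, yet genomic surveillance of it remains severely lacking in many regions, particularly in low- and middle-income countries. Here, we report for the first time whole-genome sequencing data for ARM01, a multidrug-resistant K. pneumoniae isolate recovered from a patient in Armenia. Antibiotic susceptibility testing showed that ARM01 was resistant to 7 of the 11 antibiotics tested. Genome sequencing analysis showed that ARM01 belongs to sequence type ST967, capsular serotype K18, and antigen type O1. ARM01 carried 13 antimicrobial resistance (AMR) genes, but only one known virulence factor, yagZ/ecpA, and one plasmid replicon, IncFIB(K)(pCAV1099-114), were detected. Bayesian evolutionary analysis dated the most recent common ancestor of the ST967 lineage to 2004 (95% confidence interval: 1998-2009), indicating that ST967 is a clone that emerged within the last 20 years. Multilevel comparative analysis showed that ARM01 has high genomic similarity to isolates from Qatar (SRR11267909 and SRR1126796). ARM01, SRR11267909, and SRR1126796 shared a common ancestor in 2017 (95% confidence interval: 2017-2018), from which they evolved.
In addition, we report for the first time whole-genome sequencing data for four multidrug-resistant, hypervirulent K. pneumoniae isolates of sequence type ST307 recovered from patients in two hospitals in Armenia, and we performed a comparative genomic analysis of the local Armenian isolates and ST307 isolates collected worldwide. The analysis showed that the core and accessory genomes of the four Armenian isolates were closely related to one another, with a maximum pairwise SNP distance of 39 nucleotide differences. We found that a clade of four isolates (SRR9854284, SRR10615702, SRR11460696, and SRR11460688) was very closely related to the Armenian isolates in the phylogenetic tree, and all of them carried the ICEkp4 mobile element bearing the yersiniabactin (ybt) locus. This clade shared an evolutionary origin with the Armenian isolates, with the most recent divergence dated to 2005 (95% confidence interval: 1999-2011). Antibiotic susceptibility testing showed that the Armenian isolates were resistant to 8 (n=1) or 9 (n=3) antibiotics. Further analysis showed that the Armenian isolates carried 11 (n=2) or 18 (n=2) AMR genes. The colistin resistance gene mcr-8.1 was unique to ARM47 and ARM83 and absent from all other ST307 isolates. In addition to ybt (irp1-2, ybtAEPQSTUX, and fyuA), ARM47 and ARM83 had acquired aerobactin-related genes (iucABCD and iutA) and the mucoid phenotype regulator genes rmpADC (rmpA was incomplete in ARM47). Our results suggest that multidrug-resistant, hypervirulent ST307 isolates may already have spread between hospitals in Armenia. We also speculate that, during transmission in one of the hospitals, some isolates may have acquired plasmids carrying hypervirulence and AMR genes and have established stable transmission within that hospital.
With the outbreak of the COVID-19 pandemic, SARS-CoV-2 has seriously threatened global public health and caused huge economic losses. Omics studies of SARS-CoV-2 help to elucidate virus-host interactions and thereby offer new perspectives for intervention and treatment. Because a large amount of SARS-CoV-2 omics data has accumulated in public databases, this thesis aims to identify key host factors associated with SARS-CoV-2 infection by systematically integrating transcriptome and interactome data. By manually curating published studies, we obtained a comprehensive SARS-CoV-2–human protein-protein interaction network comprising direct interactions between 31 SARS-CoV-2 proteins and 3591 human proteins. Using the RobustRankAggreg method, we identified 123 genes that changed consistently across multiple cell lines (CLCGs), of which the 115 up-regulated CLCGs showed a signature of enhanced host innate immunity and chemotactic response. Combining network analysis, co-expression analysis, and functional enrichment analysis, we identified four key host factors: IFITM1, SERPINE1, DDX60, and TNFAIP2. We further found that SERPINE1 interacts with the SARS-CoV-2 ORF8 protein, alleviates ORF8-induced endoplasmic reticulum stress, and promotes SARS-CoV-2 replication.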
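In summary, omics data analysis plays an important role in research on emerging pathogens. The first two parts of this thesis used genome-level analysis to reveal the genomic features and phylogeny of two types of emerging K. pneumoniae in Armenia (ST967 and ST307). The K. pneumoniae ARM01 (ST967) isolate is multidrug resistant, carries multiple AMR genes, and may be related to strains circulating in Qatar. The four ST307 isolates carry multiple AMR genes and hypervirulence factors; two of them have also acquired the colistin resistance gene mcr-8.1, and intra- and inter-hospital transmission may already have occurred. These two parts underline the importance of genomic and epidemiological surveillance of emerging multidrug-resistant pathogens, both for early warning and for informing national prevention and control recommendations. The third part, through integrated analysis of SARS-CoV-2 interactome and transcriptome data from public databases, identified four key host factors involved in SARS-CoV-2 infection. In-depth functional study of SERPINE1 showed that it interacts with the SARS-CoV-2 ORF8 protein, alleviates ORF8-induced endoplasmic reticulum stress, and promotes SARS-CoV-2 replication. This part highlights the value of systematic integrative analysis for understanding how emerging pathogens infect their hosts and offers new insights for future research on effective therapeutic targets against SARS-CoV-2. |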
Abstract (foreign language): |
Since the 1970s, emerging pathogens have been discovered almost every year. Infections caused and transmitted by the viruses and bacteria among them have seriously threatened human life and property and caused huge economic losses. Over the past decade, omics approaches have emerged as powerful tools for basic research on emerging pathogens. Genomic analysis allows us to quickly understand the basic genomic features of emerging pathogens, which facilitates their targeted prevention and control and allows close monitoring of their evolution and mutation. Integrated multi-omics analysis can provide a more comprehensive understanding, thus facilitating in-depth studies of the complex interactions between emerging pathogens and their hosts. This thesis illustrates the application of omics analysis to the evolution of emerging pathogens and their interactions with hosts through three examples: Klebsiella pneumoniae ST967, K. pneumoniae ST307, and SARS-CoV-2. We studied them at the genomic level and at the level of transcriptome and interactome integration, respectively.
K. pneumoniae is a common cause of hospital- and community-acquired infections globally, yet genomic surveillance data remain scarce in many regions, particularly in low- and middle-income countries. Here, we report for the first time whole-genome sequencing (WGS) data for a multidrug-resistant (MDR) K. pneumoniae isolate, ARM01, recovered from a patient in Armenia. Antibiotic susceptibility testing revealed that ARM01 was resistant to 7 of the 11 antibiotics tested. Genome sequencing analysis revealed that ARM01 belonged to sequence type (ST) 967, capsule type K18, and antigen type O1. ARM01 carried 13 antimicrobial resistance (AMR) genes, but only one known virulence factor, yagZ/ecpA, and one plasmid replicon, IncFIB(K) (pCAV1099-114), were detected. Bayesian evolutionary analysis dated the most recent common ancestor of the K. pneumoniae ST967 lineage to 2004 (95% CI: 1998-2009), implying that ST967 is a newly emerged clone that has spread across countries in the last two decades. Multilevel comparative analysis revealed high genomic similarity between ARM01 and isolates from Qatar (SRR11267909 and SRR1126796). ARM01, SRR11267909, and SRR1126796 descended from a common ancestor dated to 2017 (95% CI: 2017-2018).
In addition, we report for the first time WGS data for four MDR, hypervirulent K. pneumoniae isolates of sequence type 307 recovered from patients in two hospitals in Armenia in 2019, and we performed a comparative genomic analysis with global ST307 isolates. The analysis revealed that the core and accessory genomes of all four Armenian isolates were closely related to one another, with a maximum pairwise SNP distance of 39. A phylogenetic clade of four isolates (SRR9854284, SRR10615702, SRR11460696, and SRR11460688) was very closely related to the Armenian isolates in the phylogenetic tree, and all of these isolates carried the integrative and conjugative element ICEkp4 bearing the yersiniabactin (ybt) locus. They also shared the same evolutionary origin, with the most recent divergence dated to 2005 (95% CI: 1999-2011). Antibiotic susceptibility testing revealed that the Armenian isolates were resistant to 8 (n=1) or 9 (n=3) of the antibiotics tested. Further analysis revealed that the Armenian isolates carried 11 (n=2) or 18 (n=2) AMR genes. The colistin-resistance gene mcr-8.1, found only in ARM47 and ARM83, was absent from all other ST307 isolates. In addition to ybt (irp1-2, ybtAEPQSTUX, and fyuA), ARM47 and ARM83 had also acquired multiple virulence loci, including aerobactin (iucABCD and iutA) and the hypermucoidy locus rmpADC (ARM47 has an incomplete rmpA). Our findings suggest that transmission of K. pneumoniae ST307 may have occurred between multiple hospitals across Armenia. 
At the same time, we speculate that some isolates may have acquired plasmids carrying hypervirulence and AMR genes during transmission in one of the hospitals and have established stable transmission within that hospital.
Since the outbreak of the COVID-19 pandemic, SARS-CoV-2 has seriously threatened global public health and caused huge economic losses. Omics studies of SARS-CoV-2 can help us understand the interaction between virus and host, thereby providing new perspectives for intervention and treatment. Because a large amount of SARS-CoV-2 omics data has accumulated in public databases, this study aims to identify key host factors involved in SARS-CoV-2 infection through systematic integration of transcriptome and interactome data. By manually curating published studies, we obtained a comprehensive SARS-CoV-2–human protein-protein interaction network comprising 3591 human proteins that interact with 31 SARS-CoV-2 proteins. Using the RobustRankAggreg method, we identified 123 genes that changed consistently across multiple cell lines (CLCGs), of which the 115 up-regulated CLCGs showed a signature of enhanced host innate immunity and chemotactic response. Combining network analysis, co-expression analysis, and functional enrichment analysis, we identified four key host factors: IFITM1, SERPINE1, DDX60, and TNFAIP2. Furthermore, SERPINE1 was found to alleviate the endoplasmic reticulum (ER) stress induced by the ORF8 protein through interaction with ORF8, and to facilitate SARS-CoV-2 replication.
In summary, omics-scale data analysis has played an important role in the study of emerging pathogens. The first two parts of this thesis revealed the genomic features and phylogeny of two emerging K. pneumoniae sequence types (ST967 and ST307) in Armenia through genome-level analysis. K. pneumoniae ARM01 (ST967) was multidrug resistant, carried multiple AMR genes, and might be related to strains prevalent in Qatar. 
The four ST307 isolates carried multiple AMR genes and hypervirulence-associated factors, and two of them had also acquired the colistin-resistance gene mcr-8.1; our findings suggest that both intra-hospital and inter-hospital transmission of K. pneumoniae ST307 may have occurred in Armenia. These two parts of the study highlight the importance of genomic and epidemiological surveillance of these emerging multidrug-resistant pathogens for providing early warning and informing prevention and control measures. The third part of this thesis identified four key host factors involved in SARS-CoV-2 infection through an integrated analysis of SARS-CoV-2 interactome and transcriptome data from public databases. In-depth functional studies of SERPINE1 revealed that it interacts with the ORF8 viral protein, attenuates ORF8-induced ER stress, and promotes SARS-CoV-2 replication. This part of the study highlights the value of systematic integrative analysis in understanding host infection by emerging pathogens and provides new insights for future research on effective therapeutic targets against SARS-CoV-2. |
Open access date: | 2023-06-02 |