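Thesis title (Chinese): | 先天性脊柱侧凸二代测序数据分析系统的研发 |
Name: | |
Thesis language: | chi |
Degree: | Doctorate |
Degree type: | Academic degree |
Institution: | Peking Union Medical College |
Department: | |
Major: | |
Supervisor name: | |
On-campus advisory committee members (comma-separated): | |
Thesis completion date: | 2021-04-28 |
Thesis title (foreign language): | Design and development of second-generation sequencing data analysis system for congenital scoliosis |
Keywords (Chinese): | |
Keywords (foreign language): | Congenital scoliosis; CSF1R; TBX6; cost-effectiveness analysis; second-generation sequencing data analysis |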
Abstract (Chinese, translated): |
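Background: Congenital scoliosis (CS) is a complex, multifactorial three-dimensional deformity of the spine caused by abnormal somite development during the embryonic period. It typically presents as a lateral curvature of the spine, to the left or right, exceeding 10 degrees in the coronal plane, and may be accompanied by low back pain, razor back, thoracic deformity, and reduced cardiopulmonary function. In severe cases it can lead to permanent paralysis or even death from heart or lung failure. The etiology of CS is not yet fully understood, and there is no means of predicting its onset and progression. Clinical treatment consists mainly of conservative management or corrective surgery to control progression of the curve. However, patients often already have severe deformity when they first present, so they face high surgical risk and sometimes need several corrective operations, which places a heavy financial and psychological burden on patients and their families. Exploring the etiology of CS and building an early-warning screening and precision diagnosis and treatment system for CS has therefore become an international research focus.
In recent years, second-generation sequencing (next-generation sequencing, NGS) has advanced rapidly. Variant detection and analysis software for different steps and different sample types keeps emerging internationally, and NGS plays an increasingly important role in clinical diagnosis. However, most existing NGS data analysis systems were built abroad, and there is no domestically developed analysis system dedicated to skeletal deformity diseases. Existing systems are also limited in function: none combines detection, interpretation, and clinical evaluation while supporting both single-sample analysis and population cohort analysis.
As more candidate pathogenic genes and variants for CS are discovered, the role of genetic factors in CS is receiving growing attention. Previous studies reported that a compound heterozygous inheritance model of TBX6 explains about 10% of CS patients. Based on the identification of this important pathogenic gene, such patients were defined as having TBX6-associated congenital scoliosis (TACS), and TACScore can be used to evaluate them. Other CS pathogenic genes, such as MESP2, have also been discovered. Even so, about 80% of CS patients still have no molecular diagnosis. On the one hand, as sample cohorts accumulate, cohort-based association analysis that aggregates rare-variant signals across known genes is expected to uncover rare mutations associated with CS. On the other hand, from the perspective of novel genes in single samples, CSF1R causes clinical phenotypes such as osteosclerosis and platyspondyly through a recessive mechanism; together with the pleiotropic effects of Csf1r mutations in mice, this suggests that CSF1R mutations may be a potential cause of CS. No study has yet assessed the pathogenicity of CSF1R mutations in a CS cohort.
In this study, we independently built an NGS data analysis system to reveal more pathogenic genes and mutations in CS and their mechanisms, thereby expanding the mutational spectrum of CS and improving the clinical diagnostic rate. We also used cost-effectiveness analysis to explore how basic research into the genetic etiology of CS affects clinical diagnosis and treatment, providing new guidance for the screening, diagnosis, and treatment of CS.
Objectives: To independently build an NGS data analysis system for skeletal deformities that covers detection, interpretation, and clinical evaluation; to use the CS cohort of Peking Union Medical College Hospital to evaluate the cost-effectiveness of a risk prediction model based on known genes; to identify pathogenic mutations in TBX6-mediated genes and in CSF1R; and to explore the pathogenic mechanism of CSF1R in CS through in vivo and in vitro functional experiments, thereby providing technical and economic support for research into the genetic etiology of CS and its clinical application.
Methods: A variant detection and analysis system for NGS data from sporadic and familial samples was built from a file renaming module, a quality control module, a sequence alignment module, a variant detection module, a variant annotation module, a scoring and rating module, a filtering module, and a variant comment module. A variant burden analysis system for population cohort samples was built from four modules: variant detection, variant filtering, sample filtering, and burden analysis. The downstream clinical evaluation module of the system was then completed and used to evaluate, in terms of diagnostic cost and time to diagnosis, the cost-effectiveness of using WES as the first-line test versus using TACScore as a screening test. Under strict inclusion and exclusion criteria, whole exome sequencing (WES) and data analysis were performed on CS patients from the DISCO cohort of Peking Union Medical College Hospital; variants in TBX6-mediated genes and pathogenic mutations in CSF1R were evaluated and screened, then verified by Sanger sequencing. In vitro, changes in CSF1R protein expression were assessed by Western blot. In vivo, human CSF1R variants were overexpressed in zebrafish and the spinal phenotype was observed.
Results: The NGS data analysis system showed good sensitivity and specificity. In the cost-effectiveness analysis, from both the healthcare payer perspective and the individual budget perspective, the strategy of using WES as the first test required a longer time to diagnosis and a higher diagnostic cost than screening with TACScore. In the burden analysis of TBX6-mediated genes, truncating mutations in RIPPLY1 and MYF5 were identified in two samples, respectively, along with five deleterious missense mutations in MYOD1 and RIPPLY2. Potential oligogenic pathogenic patterns involving MYOD1/MEOX1 and MYOD1/RIPPLY1 were also observed. Three heterozygous CSF1R variants, two frameshift mutations and one novel missense mutation, were identified in three CS patients. In vitro experiments showed that the accumulated protein level of all three variants was higher than that of the wild type, indicating increased stability of the mutant CSF1R protein. In vivo experiments showed that overexpression of CSF1R mRNA (NM_005211.3: c.2749_2758delGACAGGAGAG) in zebrafish caused vertebral fusion, hemivertebrae, vertebral arch fusion, and other malformations, recapitulating the human CS phenotype.
Conclusions: We built an efficient and sensitive NGS data analysis system for congenital scoliosis and demonstrated the cost-effectiveness of the system's risk prediction model. For mutations in TBX6-mediated genes, we found a potential oligogenic pathogenic pattern. For CSF1R, the cell-based in vitro and zebrafish in vivo functional experiments indicate that CSF1R mutations can disrupt the carboxyl-terminal region of the protein and exert a gain-of-function effect on CSF1R biological function, leading to congenital scoliosis. This study expands the mutational spectrum of congenital scoliosis, provides an initial explanation of the pathogenic mechanism of CSF1R mutations in CS, improves the molecular diagnostic yield, and provides economic evidence for the clinical application of genetic screening in CS. It also offers a systematic, scientific method and strategy for analyzing and interpreting NGS data in skeletal diseases and even diseases of other systems. |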
Abstract (foreign language): |
Background: Congenital scoliosis (CS) is a complex three-dimensional deformity of the spine caused by abnormal somitogenesis during the embryonic period, typically with a coronal Cobb angle greater than 10°. As a result of abnormal vertebral development, patients may have low back pain, razor back, thoracic deformity, reduced cardiopulmonary function, and other clinical manifestations. In severe cases, the disease can lead to permanent paralysis and even death from heart or lung failure. The etiology of CS is not yet fully understood, and there is no means of predicting its onset and progression. The main clinical treatments are conservative management or surgical correction to control progression of the curve. However, patients often already have severe deformity when they present for diagnosis and treatment, so they face a greater surgical risk. Multiple operations are sometimes needed, placing a heavy economic and psychological burden on patients and their families. Exploring early diagnostic methods and effective etiological intervention targets for CS is therefore a current focus of international research.
In recent years, genetic sequencing technology, especially second-generation sequencing, has developed by leaps and bounds. A wide range of variant detection and analysis software exists for different analysis steps and sample types, and second-generation sequencing plays an increasingly important role in clinical diagnosis. However, most existing second-generation sequencing data analysis systems were built abroad, and there is no independently developed domestic analysis system for skeletal deformity diseases. Existing systems are also limited in function: none covers detection, interpretation, and clinical evaluation together, or supports both single-sample analysis and population analysis.
With the discovery of more and more candidate pathogenic genes, genetic factors are now considered an important cause of CS. Previous studies have reported that about 10% of CS patients in the Chinese Han population are explained by a compound heterozygous inheritance pattern of TBX6. Based on the identification of this important disease-causing gene, such patients were defined as having TBX6-associated congenital scoliosis (TACS), and TACScore can be used to assess them. Other CS-causing genes, such as MESP2, have since been discovered. Still, about 80% of patients with CS do not receive a molecular diagnosis. On the one hand, as sample cohorts accumulate, population-based association analysis that aggregates rare-variant signals across known genes is expected to identify rare mutations related to CS. On the other hand, from the perspective of novel genes in single samples, CSF1R causes clinical phenotypes such as osteosclerosis and flattened vertebrae through a recessive genetic mechanism. Together with the pleiotropic effects of Csf1r mutations in mice, this suggests that CSF1R mutations may be a potential cause of CS. No study has yet assessed the pathogenicity of CSF1R variants in a CS cohort.
In this study, we built a second-generation sequencing data analysis system to reveal more pathogenic genes and mutations in CS and their mechanisms, so as to expand the mutational spectrum of CS and improve the clinical diagnostic rate. We also used cost-effectiveness analysis to explore the clinical application of research into the genetic etiology of CS, in order to provide new guidance for the screening, diagnosis, and treatment of CS.
Objectives: We aim to provide technical and economic support for research into the genetic etiology of CS and for its clinical diagnosis. A second-generation sequencing data analysis system for skeletal deformities, covering detection, interpretation, and clinical evaluation, was built independently. The CS cohort from Peking Union Medical College Hospital (PUMCH) was used to evaluate the cost-effectiveness of a risk prediction model based on known genes and to identify pathogenic mutations in TBX6-mediated genes and in CSF1R. In vivo and in vitro functional experiments were conducted to explore the pathogenic mechanism of CSF1R in CS.
Methods: A file renaming module, quality control module, sequence alignment module, mutation detection module, mutation annotation module, rating module, filtering module, and mutation comment module were used to build the mutation detection and analysis system for second-generation sequencing data from sporadic and trio samples. A mutation detection module, mutation filtering module, sample filtering module, and burden analysis module were used to build the burden analysis system for population cohort samples. The downstream clinical evaluation module of the system was then completed and used to evaluate the cost-effectiveness of WES as the first-line genetic test versus TACScore as a screening test, in terms of diagnostic cost and time to diagnosis. Under strict inclusion and exclusion criteria, we performed whole exome sequencing (WES) and data analysis on CS patients from PUMCH to evaluate and screen for pathogenic variants in TBX6-mediated genes and in CSF1R, and verified them by Sanger sequencing. In vitro experiments were used to evaluate the expression of CSF1R protein. In vivo functional experiments were used to observe zebrafish spinal phenotypes.
Results: The second-generation sequencing data analysis system has good sensitivity and specificity. In the cost-effectiveness analysis, the strategy of using WES as the first test required a longer time to diagnosis and a higher diagnostic cost than using TACScore for screening, from both the healthcare payer perspective and the individual budget perspective. In the burden analysis of TBX6-mediated genes, we identified truncating mutations in RIPPLY1 and MYF5 and five deleterious missense mutations in MYOD1 and RIPPLY2. Potential oligogenic patterns involving MYOD1/MEOX1 and MYOD1/RIPPLY1 were also observed. We identified three heterozygous CSF1R variants in 3 unrelated CS patients, comprising 2 frameshift mutations and 1 novel missense mutation. In vitro experiments showed that the accumulated protein level of all three variants was higher than that of the wild type, indicating increased stability of the mutant CSF1R protein. In vivo experiments showed that overexpression of CSF1R mRNA (NM_005211.3: c.2749_2758delGACAGGAGAG) in zebrafish produced vertebral fusion, hemivertebrae, vertebral arch fusion, and other malformations, recapitulating the human CS phenotype.
Conclusion: We constructed an efficient and sensitive second-generation sequencing data analysis system for congenital scoliosis, and the risk prediction model of the system proved to be cost-effective. We identified a potential oligogenic pathogenic pattern for mutations in TBX6-mediated genes. For CSF1R frameshift and missense mutations, we found evidence that disruption of the protein's C-terminal region and a gain-of-function effect constitute the molecular mechanism. This study also expanded the mutational spectrum of congenital scoliosis, improved the yield of its molecular diagnosis, and provided economic evidence for the clinical application of genetic screening in CS. In addition, it provides systematic, scientific methods and strategies for analyzing and interpreting second-generation sequencing data in skeletal diseases and even diseases of other systems. |
Open-access date: | 2021-06-01 |
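The abstract describes a population-cohort burden analysis built from variant filtering, sample filtering, and a burden analysis module, but it does not say which statistical test is used. The following is only a minimal sketch of one common approach, assuming a gene-level collapsing test: count the distinct carriers of qualifying rare variants per gene in cases and controls, then apply a one-sided Fisher's exact test. The function names, the input layout, and the gene and sample identifiers are hypothetical and are not taken from the thesis.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test.

    Returns the probability, under the hypergeometric null, of seeing at
    least `case_carriers` carriers among the cases.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denom = comb(total, case_total)
    upper = min(carriers, case_total)
    numer = 0
    for k in range(case_carriers, upper + 1):
        # k carriers fall among the cases, the remaining cases are non-carriers
        numer += comb(carriers, k) * comb(total - carriers, case_total - k)
    return numer / denom


def burden_by_gene(qualifying_variants, cases, controls):
    """Collapse qualifying variants to per-gene carrier sets and test each gene.

    qualifying_variants: iterable of (sample_id, gene) pairs that already
        passed the variant and sample filters (hypothetical layout).
    cases, controls: sets of sample ids.
    """
    carriers_by_gene = {}
    for sample_id, gene in qualifying_variants:
        carriers_by_gene.setdefault(gene, set()).add(sample_id)

    p_values = {}
    for gene, carriers in carriers_by_gene.items():
        # A sample counts once per gene, however many variants it carries
        p_values[gene] = fisher_exact_greater(
            len(carriers & cases), len(cases),
            len(carriers & controls), len(controls),
        )
    return p_values


if __name__ == "__main__":
    # Made-up toy cohort, for illustration only
    cases = {"case1", "case2", "case3"}
    controls = {"ctrl1", "ctrl2", "ctrl3"}
    variants = [("case1", "GENE_A"), ("case2", "GENE_A"),
                ("case3", "GENE_A"), ("case1", "GENE_A")]
    print(burden_by_gene(variants, cases, controls))
```

Collapsing to carrier status keeps the test simple and makes it robust to a single sample carrying several variants in the same gene. A real analysis would also need multiple-testing correction across genes and control of population stratification, neither of which is shown here.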