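| Thesis title (Chinese): | 基于AI辅助的精准紧凑型腺嘌呤碱基编辑器的设计 (AI-Assisted Design of a Precise and Compact Adenine Base Editor) |
| Name: | |
| Thesis language: | chi |
| Degree: | Master's |
| Degree type: | Academic degree |
| Institution: | Peking Union Medical College |
| Department: | |
| Major: | |
| Supervisor: | |
| Thesis completion date: | 2025-05-30 |
| Thesis title (foreign language): | Protein-Nucleic Acid Constrained Language Model Assisted Design of Precise and Compact Adenine Base Editor |
| Keywords (Chinese): | |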
| Keywords (foreign language): | Adenine base editor (ABE); Protein-Nucleic Acid Language Model; precision editing; off-target effects; gene therapy |
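| Abstract (Chinese): |
Research background Base editors (BEs) are gene editing tools developed from the CRISPR system that can install base transitions or transversions at specific positions without introducing DNA double-strand breaks, which markedly reduces the frequency of insertions and deletions (indels) caused by the non-homologous end joining (NHEJ) repair pathway. Among them, adenine base editors (ABEs) achieve efficient A-to-G conversion in genomic DNA and can correct single nucleotide variants (SNVs), showing great promise for the clinical treatment of monogenic diseases. Although ABEs have advanced considerably in basic research, practical application still faces major challenges: ABEs still show relatively frequent bystander editing and off-target effects, and their large size limits the efficiency of in vivo delivery by viral vectors. Conventional evolution and screening are time-consuming and laborious, whereas artificial intelligence can lower the barrier to design and is expected to enable the development of new ABEs.
Research objectives These limitations greatly restrict the clinical prospects of ABEs. Rational redesign of the engineered adenine deaminase (TadA-8e) in ABE8e, currently the most efficient ABE, can reduce bystander editing and off-target effects by introducing mutations, but it also lowers editing efficiency at the target base. This study uses artificial intelligence to develop a Protein-Nucleic Acid constrained Language Model (PNLM) for structural re-engineering of TadA-8e, with the aim of developing new ABEs that minimize off-target effects and molecular size without compromising editing efficiency.
Research methods First, a protein-nucleic acid language model was developed with artificial intelligence, treating the structures of the deaminase and its nucleic acid substrate as a whole. To improve model accuracy, all currently available deaminase mutants and deaminase homologs were added to the training set. A set of TadA-8e variants was generated from ABE8e, the truncated variants were screened out and evaluated computationally, and the top twenty TadA-8e variants were selected for experimental validation. The actual editing activity of the ABE variants was analyzed by cell transfection and high-throughput sequencing, and the variants that met expectations were combined in different arrangements to obtain the ABE variant with the highest on-target editing efficiency, the least bystander editing, and the smallest size. The resulting ABE variant was then characterized at a large number of endogenous target sites, and off-target activity was assessed by GUIDE-seq (Genome-wide, Unbiased Identification of DSBs Enabled by Sequencing), ChIP-seq (Chromatin Immunoprecipitation Sequencing), and whole-transcriptome sequencing (RNA Sequencing, RNA-Seq) to verify whether the variant effectively reduces off-target effects. Next, suitable single nucleotide variant genetic diseases (SNVGDs) were selected from the ClinVar database, and the corresponding lentiviral disease point-mutation cell lines were constructed to verify, at the cellular level, the ability of the ABE variant to correct single nucleotide variants in genetic diseases. Finally, mouse disease models were constructed to verify whether the ABE variant also achieves efficient and precise base editing in vivo.
Research results A protein-nucleic acid language model was developed with artificial intelligence, starting from the overall structure of the protein and the nucleic acid. The model was used to design a series of entirely new TadA-8e mutants, 25.3% of which were truncated variants that offer advantages for delivery. The 20 top-scoring truncated TadA-8e variants were therefore tested in HEK293T cells. Three of them (Δ2-8aa, Δ147-152aa, and Δ158-167aa) showed editing efficiency comparable to ABE8e, and deletion of residues 147-152 narrowed the main editing window of ABE8e. Further combination and validation yielded an ABE with editing efficiency almost identical to ABE8e, an editing window narrowed from A3-A9 to A5-A7, and a size reduced by 27% (45 aa fewer), which was named PNLM-pcABE. The editing precision of PNLM-pcABE was 1.2- to 127-fold that of ABE8e. Compared with ABE8e, PNLM-pcABE also produced fewer indels, and its off-target effects were closer to background levels. In addition, the precision of PNLM-pcABE in correcting pathogenic mutations was 134-fold that of ABE8e. When the mouse Tyr gene was targeted by embryo microinjection, almost all mice displayed an albino phenotype with precisely edited genotypes. Delivery of PNLM-pcABE into mice with lipid nanoparticles (LNPs) enabled in vivo gene editing for the treatment of hypercholesterolemia.
Research conclusions In summary, this thesis developed a new protein-nucleic acid language model. Compared with ABE8e, the PNLM-pcABE developed with this model has a comparable level of on-target editing and efficiently converts the target base; it has a narrower editing window, which effectively reduces bystander editing; and it significantly lowers indel and off-target levels, which improves the safety of the tool. Its good editing performance in disease cell lines and in animals indicates that PNLM-pcABE is a safer and more effective gene editing tool with broader application prospects in gene therapy and disease modeling. |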
| Abstract (foreign language): |
Research background Base editors (BEs), gene editing tools developed on the basis of the CRISPR system, can install base transitions or transversions at specific positions without causing DNA double-strand breaks. This type of editing significantly reduces the frequency of insertions and deletions (indels) produced by the non-homologous end joining (NHEJ) repair pathway. Among them, adenine base editors (ABEs) achieve highly efficient A-to-G conversion in genomic DNA and can correct single nucleotide variants (SNVs), showing great potential for the clinical treatment of monogenic diseases. Although ABEs have made significant progress in basic research, they still face challenges on the way to clinical application. For example, ABEs still show a high frequency of bystander editing and off-target effects, and their relatively large size limits the efficiency of in vivo delivery via viral vectors. Traditional evolution and screening are time-consuming and laborious, whereas AI can lower the design threshold and is expected to provide a route to new ABEs.
Research objectives The limitations of ABEs greatly restrict their clinical prospects. Rational redesign of the engineered adenine deaminase (TadA-8e) in ABE8e, currently the most efficient ABE, can reduce bystander editing and off-target effects by introducing mutations, but it also reduces editing efficiency at the target base. In this study, we used artificial intelligence to develop a Protein-Nucleic Acid constrained Language Model (PNLM) and applied it to structurally modify TadA-8e, aiming for a novel adenine base editor that minimizes off-target effects and molecular size without compromising editing efficiency.
Research methods First, a set of TadA-8e variants was generated from ABE8e using a pre-trained Protein-Nucleic Acid constrained Language Model (PNLM). The truncated variants among them were screened out and evaluated computationally, and the top twenty TadA-8e variants were selected for experimental validation. The actual editing ability of the ABE variants was analyzed by cell transfection and high-throughput sequencing, and the variants that met expectations were combined in different arrangements to obtain the ABE variant with the highest on-target editing efficiency, the least bystander editing, and the smallest size. The resulting ABE variant was then characterized at a large number of endogenous target sites, and off-target activity was assessed by GUIDE-seq (Genome-wide, Unbiased Identification of DSBs Enabled by Sequencing), ChIP-seq (Chromatin Immunoprecipitation Sequencing), and whole-transcriptome sequencing (RNA Sequencing, RNA-Seq) to verify whether the variant effectively reduces off-target editing and improves the safety of ABE application. Next, several suitable single nucleotide variant genetic diseases (SNVGDs) were selected from the ClinVar database, and the corresponding disease point-mutation cell lines were constructed to verify, at the cellular level, the ability of the ABE variant to correct single nucleotide variants in genetic diseases. Finally, mouse disease models were constructed to verify whether the ABE variant is also capable of efficient and precise base editing in vivo.
Results Using artificial intelligence, we developed a protein-nucleic acid language model that treats the structures of the deaminase and its nucleic acid substrate as a whole. The model was used to design a series of new TadA-8e mutants, 25.3% of which were truncated variants that offer advantages for delivery. The twenty top-scoring truncated TadA-8e variants were tested in HEK293T cells. Three of them (Δ2-8aa, Δ147-152aa, and Δ158-167aa) showed editing efficiency comparable to ABE8e, and deletion of residues 147-152 narrowed the main editing window of ABE8e. Further combination and validation yielded an ABE with editing efficiency almost identical to ABE8e, an editing window narrowed from A3-A9 to A5-A7, and a size reduced by 27% (45 aa fewer), which we named PNLM-pcABE. The editing precision of PNLM-pcABE was 1.2- to 127-fold that of ABE8e. Compared with ABE8e, PNLM-pcABE also produced fewer indels, and its off-target effects were closer to background levels. In addition, the precision of PNLM-pcABE in correcting pathogenic mutations was 134-fold that of ABE8e. Targeting the mouse Tyr gene by embryo microinjection produced an albino phenotype in almost all mice, with precisely edited genotypes. Delivering PNLM-pcABE into mice via lipid nanoparticles (LNPs) enabled in vivo gene editing for the treatment of hypercholesterolemia.
Conclusion In summary, we developed a new protein-nucleic acid language model and used it to develop a novel adenine base editor, PNLM-pcABE. Compared with ABE8e, PNLM-pcABE has a comparable level of on-target editing and efficiently converts the target base; it has a narrower editing window, which effectively reduces bystander editing; and it significantly reduces indel and off-target levels, which improves the safety of the tool. The good editing performance of PNLM-pcABE in disease cell lines and in animals indicates that it is a safer and more effective gene editing tool with broader application prospects in gene therapy and disease modeling. |
| Open-access date: | 2025-11-07 |
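
The screening step described in the abstract (keep the truncated variants, rank them, take the top twenty) can be sketched roughly as follows. This is a minimal illustration only: the function names, the toy sequences, and the numeric scores are hypothetical, the thesis's actual computational evaluation is not described in this record, and the sketch assumes each truncation is a single contiguous deletion written in the Δstart-end style used above (1-indexed, inclusive).

```python
from typing import List, Optional, Tuple


def apply_deletion(parent: str, start: int, end: int) -> str:
    """Remove residues start..end (1-indexed, inclusive) from the parent sequence."""
    if not (1 <= start <= end <= len(parent)):
        raise ValueError("deletion range lies outside the parent sequence")
    return parent[:start - 1] + parent[end:]


def find_single_deletion(parent: str, variant: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) if the variant equals the parent minus one contiguous
    block of residues, otherwise None.

    When repeated residues make several ranges equivalent, the range that
    follows the longest shared prefix is reported.
    """
    gap = len(parent) - len(variant)
    if gap <= 0:
        return None  # same length or longer: not a truncation
    # Length of the longest common prefix.
    p = 0
    while p < len(variant) and parent[p] == variant[p]:
        p += 1
    # The rest of the variant must match the parent after skipping `gap` residues.
    if parent[p + gap:] == variant[p:]:
        return (p + 1, p + gap)
    return None


def select_top_truncations(
    parent: str,
    scored_variants: List[Tuple[str, float]],
    k: int = 20,
) -> List[Tuple[str, float]]:
    """Keep single-deletion variants and return the k highest-scoring ones.

    `scored_variants` holds (sequence, score) pairs; the score stands in for
    whatever computational evaluation is used (hypothetical here), with higher
    values assumed to be better.
    """
    hits = []
    for seq, score in scored_variants:
        span = find_single_deletion(parent, seq)
        if span is not None:
            hits.append((f"Δ{span[0]}-{span[1]}", score))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:k]


if __name__ == "__main__":
    toy_parent = "ABCDEFGH"  # placeholder, not a real TadA-8e sequence
    toy_variants = [
        ("ABFGH", 0.4),     # internal deletion
        ("ABCDEFGH", 0.9),  # full length, filtered out
        ("BCDEFGH", 0.7),   # N-terminal deletion
        ("ABCDEF", 0.1),    # C-terminal deletion
    ]
    for label, score in select_top_truncations(toy_parent, toy_variants, k=2):
        print(label, score)
```

With real data, `toy_parent` would be replaced by the TadA-8e sequence and `toy_variants` by the model-generated variants and their scores; variants that combine several deletions or carry substitutions are deliberately ignored by this sketch.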