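Thesis title (Chinese): | 面向慢性糖脂代谢疾病诊疗的医学大数据挖掘与人工智能分析研究 |
Name: | |
Thesis language: | chi |
Degree: | Doctoral |
Degree type: | Academic degree |
Institution: | Peking Union Medical College |
Department: | |
Major: | |
Supervisor name: | |
Names of internal advisory committee members (comma-separated): | |
Thesis completion date: | 2024-04-24 |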
Thesis title (foreign language): | Research on medical big data mining and artificial intelligence analysis for the diagnosis and treatment of chronic metabolic diseases involving glucose and lipid metabolism |
Keywords (Chinese): | |
Keywords (foreign language): | Chronic metabolic diseases involving glucose and lipid metabolism; Adverse drug reactions; Medical big data mining; Natural language processing; Artificial intelligence analysis |
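Abstract (Chinese): |
[Background] The global prevalence of chronic glucose and lipid metabolic diseases continues to rise. Long-term use of multiple drugs increases patients' risk of adverse drug reactions, and lifestyle and related factors are important influences on these diseases. With the development of information technology and medical databases, the application of medical big data and artificial intelligence analysis offers new possibilities for preventing and managing these diseases. [Objectives] This study aims to use existing medical big data mining techniques and artificial intelligence analysis methods to explore in depth the adverse drug reactions and related influencing factors of chronic glucose and lipid metabolic diseases, in order to improve the safety of treatment and the effectiveness of management and to optimize diagnosis, treatment, and management plans for chronic metabolic diseases. [Methods] The study uses natural language processing to mine massive unstructured data from Chinese electronic medical records. It adopts a medical-knowledge-enhanced approach with deep-learning large language models and draws on existing large language model resources to build a technical framework for extracting adverse reaction information on commonly used hypoglycemic drugs from unstructured text in Chinese electronic medical records. The study also mines randomized controlled trial (Randomized controlled clinical trial, RCT) data from the medical literature and real-world data from a pharmacovigilance database (the FAERS database) to detect adverse reactions of the newer incretin-based hypoglycemic drugs (GLP-1 receptor agonists and DPP-4 inhibitors). It further mines and analyzes electronic medical records from Harvard Medical School in the United States and the China Health and Nutrition Survey (CHNS) database to explore factors influencing chronic glucose and lipid metabolic diseases. [Results] The study successfully built a large language model based adverse drug reaction information extraction system, enhanced with medical knowledge, for unstructured data in Chinese electronic medical records; the F1 score for extracting metformin adverse reactions reached 0.8698. In mining RCT data, the results showed that GLP-1 receptor agonists and DPP-4 inhibitors increase the risk of gallbladder or biliary diseases, although the absolute risk is small; the results obtained from real-world data were consistent with this finding. Analysis of the Harvard Medical School electronic medical record data showed that discussion of weight loss surgery may be a protective factor against cardiovascular events and diabetes. Based on Chinese population data from the CHNS database, the study found that longer sleep duration is associated with elevated inflammatory factors, while subcutaneous fat and work-related physical activity may be protective factors against all-cause mortality and diabetes, respectively. [Conclusions] A large language model enhanced with medical knowledge can extract adverse drug reaction information from unstructured Chinese electronic medical records relatively effectively, which helps enable active monitoring of adverse drug reactions and improves the individualization and safety of drug treatment. Combining large RCT data with post-marketing real-world data helps detect unknown adverse reactions of newly marketed drugs for chronic metabolic diseases. Thoroughly mining electronic medical record data and large-population public medical databases can effectively identify key lifestyle factors in chronic metabolic diseases and helps optimize and improve lifestyle intervention management plans for patients with chronic glucose and lipid metabolic diseases. |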
Abstract (foreign language): |
Background: The global prevalence of chronic metabolic diseases involving glucose and lipid metabolism is rising. Patients often require long-term use of multiple medications, which increases the risk of adverse drug reactions, and lifestyle factors strongly influence these diseases. Advances in information technology and medical databases have enabled the application of medical big data and artificial intelligence analysis, offering new possibilities for the prevention and management of these conditions. Objectives: This study aims to use existing medical big data mining technologies and artificial intelligence analysis methods to explore in depth the adverse drug reactions and related influencing factors of chronic metabolic diseases involving glucose and lipid metabolism, in order to improve the safety of treatment and the effectiveness of management and to optimize the management of these diseases. Methods: The study uses natural language processing to mine a large volume of unstructured data from Chinese electronic medical records. It augments deep-learning-based large language models with medical knowledge, drawing on existing large language model resources, to construct a system for extracting adverse reaction information on commonly used hypoglycemic drugs from these records. The study also mines data from randomized controlled trials (RCTs) in public medical databases and real-world data from the FDA's Adverse Event Reporting System (FAERS) to detect adverse reactions of the newer incretin-based hypoglycemic drugs (GLP-1 receptor agonists and DPP-4 inhibitors). Additionally, the study analyzes the impact of lifestyle and body composition on chronic metabolic diseases using electronic health records from Harvard Medical School and the China Health and Nutrition Survey (CHNS). 
Results: The study successfully developed a large language model based system, enhanced with medical knowledge, for extracting adverse drug reaction information from unstructured Chinese electronic medical records; it achieved an F1-score of 0.8698 in extracting metformin's adverse reactions. In mining data from RCTs, the results showed that GLP-1 receptor agonists and DPP-4 inhibitors increase the risk of gallbladder or biliary diseases, although the absolute risk is small. The findings from real-world data mining were consistent with these results. Analysis of electronic medical records from Harvard Medical School suggests that discussion of weight loss surgery may be a protective factor against cardiovascular events and diabetes. Based on data from the CHNS database, we observed that longer sleep duration is associated with higher levels of inflammatory markers, and that subcutaneous fat and physical activity at work may be protective factors against all-cause mortality and diabetes, respectively. Conclusions: A large language model enhanced with medical knowledge proved relatively effective in extracting adverse drug reaction information from unstructured Chinese electronic medical records. This facilitates active monitoring of adverse drug reactions and enhances the individualization and safety of drug treatment. Integrating large RCT data with post-marketing real-world data helps detect unknown adverse reactions of newly marketed drugs for chronic metabolic diseases. 
Thoroughly mining electronic health records and large-scale public medical databases can effectively identify key lifestyle factors in chronic metabolic diseases, which helps optimize and improve lifestyle intervention management strategies for patients with these diseases. |
Release date: | 2024-06-20 |