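Thesis title (Chinese, translated): | Construction of a Tumor Knowledge Graph Based on Chinese Electronic Medical Records: A Case Study of Digestive System Tumors |
Name: | |
Thesis language: | chi |
Degree: | Master's |
Degree type: | Academic degree |
Institution: | Peking Union Medical College |
Department: | |
Major: | |
Supervisor: | |
Thesis completion date: | 2019-04-29 |
Thesis title (foreign language): | Construction of Tumor Knowledge Graph based on Chinese Electronic Medical Records--A Case Study of Digestive System Tumor |
Keywords (Chinese): | |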
Keywords (foreign language): | Chinese Electronic Medical Records; Knowledge Graph; Digestive System Tumor; Graph Drawing; Graph Evaluation |
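Abstract (Chinese, translated into English): |
In recent years, the global incidence and mortality of malignant tumors have continued to rise. How to summarize existing diagnosis and treatment experience and mine latent, effective diagnosis-treatment relationships, so as to strengthen the prevention and treatment of malignant tumors, has become an urgent problem for medical workers. With the development of health informatization in China, major hospitals have accumulated a wealth of Chinese tumor electronic medical records. These records contain rich medical facts, but their unstructured text and heavy use of medical terminology and abbreviations make them very difficult to organize and use in a big-data environment. As an important component of artificial intelligence, the knowledge graph has strong information-processing and knowledge-organization capabilities and offers a new approach to this problem.
To meet the need for tumor knowledge graphs built from Chinese electronic medical records, this study draws on the structural and linguistic characteristics of tumor diseases and of Chinese tumor electronic medical records to propose a complete construction framework, providing ideas for tumor knowledge graph construction. Taking digestive system tumors as an example, the thesis designs and builds a digestive system tumor knowledge graph and assesses its quality by combining quantitative evaluation with expert evaluation. The main work comprises four parts:
(1) A systematic review of knowledge graph research in China and abroad, drawing on existing research ideas and techniques and summarizing the limitations of existing work: (a) in data sources, real clinical text from hospitals is rarely used; (b) most work focuses on the data level, and graph schema construction is under-studied; (c) the semantic relations defined are relatively simple and cannot accurately express the complex associations among medical facts in diagnosis and treatment; (d) efficient natural language processing tools for Chinese medical text are lacking.
(2) A complete framework for constructing a tumor knowledge graph from Chinese electronic medical records. The structural and linguistic characteristics of tumor diseases and Chinese tumor electronic medical records are analyzed in detail. After defining the design principles and clarifying the design approach, the framework targets two shortcomings of existing work: insufficient research on tumor knowledge graph schema construction and insufficient attention to semantics.
(3) A digestive system tumor knowledge graph with rich semantic relations. To verify the feasibility and scientific soundness of the framework, the study takes an empirical approach with digestive system tumors as the example. First, based on characteristics of digestive system tumors such as pathological staging and histological typing standards, it applies the seven-step schema construction method proposed at Stanford University, refers to i2b2 2010, and reuses resources such as SNOMED CT, the NCI Thesaurus, ICD-10, and the WHO classification of digestive system tumors to build a schema containing 7 entity types and 15 semantic relations. Next, because tumor electronic medical records contain many habitual expressions, follow fixed grammar and syntax, and tend to mention entities of the same type in pairs, it introduces the concept of an entity group; named entity recognition is performed with a combination of rules and a BiLSTM-CRF model, and semantic relation extraction with a BiGRU-Attention model. Finally, a hierarchical, batch-wise entity alignment strategy is used to align the graph data, which is stored in the Neo4j graph database, completing the construction of the knowledge graph.
(4) A quality evaluation of the digestive system tumor knowledge graph. Quantitative evaluation and expert evaluation are combined to assess the graph at the data layer, the schema layer, and the application layer. The results indicate that the graph's data is fairly comprehensive and reliable, that its schema is reasonably structured, that it presents the content of electronic medical record text comprehensively and clearly, and that it supports semantic search by users. The proposed construction framework therefore has a certain degree of scientific soundness and practicality. |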
Abstract (foreign language): |
In recent years, the incidence and mortality of malignant tumors have been increasing worldwide. How to draw on existing diagnosis and treatment experience to uncover latent, effective relationships between diagnosis and treatment, and thereby strengthen the prevention and treatment of malignant tumors, has become an urgent problem for medical workers. With the development of medical and health informatization in China, major hospitals have accumulated a wealth of electronic medical records (EMRs). EMRs contain rich medical facts, but their unstructured text and their large number of medical terms and abbreviations pose great challenges to the organization and use of EMRs in a big-data environment. As an important part of artificial intelligence, the knowledge graph has strong information-processing and knowledge-organization capabilities and provides a new way to solve this problem.
To meet the need for tumor knowledge graphs built from Chinese electronic medical records (CEMRs), this study proposes a complete framework for constructing a tumor knowledge graph from CEMRs, based on the structural and linguistic characteristics of tumor diseases and Chinese tumor electronic medical records. The framework provides ideas for tumor knowledge graph construction. Taking digestive system tumors as an example, the study designs and constructs a digestive system tumor knowledge graph and evaluates its quality by combining quantitative evaluation with expert evaluation. Specifically, the main work of this study includes the following four parts:
(1) Systematically review the state of knowledge graph research in China and abroad, draw on existing research ideas and related technologies, and summarize the limitations of existing research: (a) in data sources, clinical data, especially CEMRs, are rarely used; (b) current research focuses mostly on the data level, with too little work on knowledge graph schema construction; (c) the semantic relations defined are relatively simple and cannot accurately express the complex relationships between medical facts in the process of disease diagnosis and treatment; (d) efficient natural language processing tools for Chinese medical text are lacking.
(2) Propose a framework for constructing a tumor knowledge graph from CEMRs. The structural and linguistic characteristics of tumor diseases and CEMRs are analyzed in detail. After defining the design principles and clarifying the design approach, the framework addresses two shortcomings of existing research: the lack of work on tumor knowledge graph schema construction and the lack of semantic consideration.
(3) Construct a digestive system tumor knowledge graph with rich semantic relations. To verify the feasibility and scientific soundness of the framework, the study takes an empirical approach and constructs a knowledge graph with digestive system tumors as the example. First, based on the characteristics of digestive system tumors, such as pathological staging and histological classification, the knowledge graph schema is built with the seven-step schema construction method proposed at Stanford University, referring to i2b2 2010 and reusing SNOMED CT, the NCI Thesaurus, ICD-10, and the WHO classification of digestive system tumors. The schema includes 7 entity types and 15 semantic relations. Second, because tumor electronic medical records contain many habitual expressions, follow fixed grammar and syntax, and tend to mention tumor entities of the same type in pairs, the study introduces the concept of an entity group. A combination of rules and a BiLSTM-CRF model is used for named entity recognition, and a BiGRU-Attention model is used to extract semantic relations from the EMRs of digestive system tumors. Finally, data alignment is achieved with a hierarchical, batch-wise entity alignment strategy, and the Neo4j graph database is used to store and manage the data.
(4) Evaluate the quality of the digestive system tumor knowledge graph. Combining quantitative evaluation with expert evaluation, the graph is assessed at three levels: the data layer, the schema layer, and the application layer. The results show that the data in the knowledge graph is fairly comprehensive and reliable, that the schema is reasonably structured, that the content of electronic medical record text can be displayed comprehensively and clearly, and that the graph is convenient for semantic search. The proposed framework for constructing a tumor knowledge graph from CEMRs therefore has a certain degree of scientific soundness and practicality. |
Public release date: | 2019-06-04 |
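The last construction step named in the abstract, entity alignment followed by storage in Neo4j, can be illustrated with a minimal Python sketch. It is not the thesis's hierarchical, batch-wise alignment strategy, which the abstract does not detail; it assumes a much simpler dictionary-based normalization. The synonym table, the `Disease` and `Symptom` labels, and the `HAS_SYMPTOM` relation are hypothetical, since the abstract does not list the 7 entity types or 15 relations.

```python
from dataclasses import dataclass

# Hypothetical synonym table (lower-cased surface form -> canonical name).
# A real system would derive this from resources such as SNOMED CT or ICD-10.
SYNONYMS = {
    "gastric ca": "gastric carcinoma",
    "gastric cancer": "gastric carcinoma",
}


@dataclass(frozen=True)
class Triple:
    head: str
    head_type: str   # node label, e.g. "Disease" (illustrative only)
    relation: str    # relationship type, e.g. "HAS_SYMPTOM" (illustrative only)
    tail: str
    tail_type: str


def align(name: str) -> str:
    """Map a surface form to its canonical name; unknown names pass through."""
    cleaned = name.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)


def align_triples(triples):
    """Align both ends of every triple and drop duplicates, keeping order."""
    seen, aligned = set(), []
    for t in triples:
        a = Triple(align(t.head), t.head_type, t.relation,
                   align(t.tail), t.tail_type)
        if a not in seen:
            seen.add(a)
            aligned.append(a)
    return aligned


def to_cypher(t: Triple):
    """Build a parameterised Cypher MERGE statement for one triple.

    Labels and relationship types cannot be passed as Cypher parameters,
    so they are interpolated after a basic identifier check.
    """
    for ident in (t.head_type, t.relation, t.tail_type):
        if not ident.isidentifier():
            raise ValueError(f"unsafe label or relation name: {ident!r}")
    query = (
        f"MERGE (h:{t.head_type} {{name: $head}}) "
        f"MERGE (t:{t.tail_type} {{name: $tail}}) "
        f"MERGE (h)-[:{t.relation}]->(t)"
    )
    return query, {"head": t.head, "tail": t.tail}


if __name__ == "__main__":
    raw = [
        Triple("gastric Ca", "Disease", "HAS_SYMPTOM", "abdominal pain", "Symptom"),
        Triple("Gastric Cancer", "Disease", "HAS_SYMPTOM", "abdominal pain", "Symptom"),
    ]
    for triple in align_triples(raw):
        print(to_cypher(triple))
```

Using `MERGE` rather than `CREATE` means that loading the same aligned triple twice does not duplicate nodes or edges. The returned query and parameter dictionary could be passed to a Neo4j driver session, for example through the Python driver's `session.run(query, params)`. Connection handling is left out here.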