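Thesis title (Chinese): | Research on Data Sensitivity Assessment for Population Health Scientific Data Sharing Needs |
Name: | |
Thesis language: | chi |
Degree: | Master's |
Degree type: | Academic degree |
Institution: | Peking Union Medical College |
Department: | |
Major: | |
Supervisor name: | |
On-campus advisory group member names (comma-separated): | |
Thesis completion date: | 2021-05-24 |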
Thesis title (foreign language): | Research on Data Sensitivity Assessment for Population Health Data Sharing Needs |
Keywords (Chinese): | |
Keywords (foreign language): | Population Health Scientific Data; Sensitive Information Identification; Data Sensitivity; Data Sharing |
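Abstract (Chinese): |
Scientific data are an important strategic resource for national scientific and technological innovation and sustainable development. As one of the most active fields of scientific research, population health produces scientific data that are widely used in drug development, epidemic surveillance, public health monitoring, analysis of clinical trial data, evaluation of the safety and effectiveness of drugs and medical devices, and health economics evaluation. Data sharing is one of the important means of realizing the scientific, social, and economic value of population health scientific data, and sharing platforms are the key infrastructure through which such data are generally organized and sharing services are provided. However, population health scientific data are complex in type and diverse in form, and they usually contain sensitive information such as individual identifiers, diagnosis and treatment results, and medical expenses. Because the data are highly private, a leak could cause great harm to the country, society, and individuals. Sensitive information protection and data security have therefore become key constraints on the sharing of population health scientific data, which in turn places higher demands on how sharing platforms control sensitive information.
In recent years, China has attached great importance to the management and sharing of population health scientific data, and the construction of sharing platforms has made great progress. The National Population Health Data Center, accredited by the Ministry of Science and Technology and the Ministry of Finance, has established the Population Health Data Archive (PHDA), the largest platform in China for sharing population health science and technology resources. PHDA is responsible for the integration, submission, and sharing of national scientific data in the population health field. To achieve tiered data management and secure sharing, it needs to detect sensitive information in the data that users submit and to assess the sensitivity of those data.
This study therefore addresses the protection of sensitive information in the secure sharing of population health scientific data. In line with the construction goals of PHDA, it examines in depth the requirements for sensitive information detection and sensitivity assessment, and aims to develop a data sensitivity assessment method that can serve as a reference for tiered data management and secure sharing on population health scientific data sharing platforms. Specifically, the work comprises the following three parts.
First, the requirements for data sensitivity assessment in population health scientific data sharing were analyzed for PHDA, and the research content was defined on that basis. The characteristics of PHDA data were analyzed in terms of data type, volume, and organizational structure, and the requirements for sensitivity detection and assessment were analyzed through the sensitive information control process and a survey of stakeholder needs. The objects, content, and scenarios of the data sensitivity assessment were then defined, laying the foundation for the subsequent work.
Second, a methodological framework for data sensitivity assessment oriented to population health scientific data sharing needs was proposed. Based on policies and regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the Information Security Technology: Personal Information Security Specification, the method refines the scope of sensitive information in population health scientific data; builds dictionary and rule bases for sensitive information identification according to the content and formal characteristics of that information; identifies and analyzes sensitive features at the metadata, data item, and data value levels; and sets a unified sensitivity assessment standard based on two features, the degree of identifiability and the degree of loss on disclosure, from which data sensitivity is calculated. Finally, a data sensitivity assessment report is generated for each dataset to mark, describe, and reveal the sensitive information it contains, providing a reference for the tiered management and secure sharing of population health scientific data.
Third, an experiment and an expert evaluation were conducted. Real-world datasets from PHDA were used for data sensitivity assessment, and experts were invited to score the generated assessment reports, in order to verify the application effect of the proposed method and to evaluate its feasibility, scientific soundness, practicality, and effectiveness.
The innovation of this study lies in constructing a complete data sensitivity assessment framework oriented to the needs of population health scientific data sharing in China, which assesses the sensitivity of PHDA population health scientific data on the basis of sensitive information identification and feature analysis. Specifically, the study determined the scope of sensitive information identification, built dictionary and rule bases for sensitive information identification, and set a unified sensitivity assessment standard for population health scientific data. Its theoretical value is that it provides theoretical support for research on secure data sharing and sensitive information protection; its practical value is that it can identify sensitive information in PHDA population health scientific data and provide a reference for data managers conducting data sensitivity assessment. |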
Abstract (foreign language): |
Scientific data is an important strategic resource for national scientific and technological innovation and sustainable development. Scientific data in the field of population health, one of the most active fields of scientific research, is widely used in drug research and development, epidemic monitoring, public health monitoring, clinical trial data analysis, the evaluation of drug and medical device safety and effectiveness, and health economics evaluation. Data sharing is one of the important means of giving full play to the scientific, social, and economic value of population health scientific data. Scientific data is generally organized and shared through sharing platforms, which serve as key infrastructure. However, population health scientific data has complex types and diverse forms. It usually contains sensitive information such as individual identifiers, diagnosis and treatment results, and medical expenses, and it is highly private. Once leaked, it may bring great harm to the country, society, and individuals. The protection of sensitive information and data security have therefore become key constraints on population health scientific data sharing, which also places higher requirements on the control of sensitive information by sharing platforms.
In recent years, China has paid increasing attention to the management and sharing of population health scientific data, and great progress has been made in the construction of sharing platforms. The Population Health Data Archive (PHDA), established by the National Population Health Data Center, which is certified by the Ministry of Science and Technology and the Ministry of Finance of the People's Republic of China, is the largest population health science and technology resource sharing service platform in China. It undertakes the integration, collection, and sharing of national scientific data in the field of population health. To realize hierarchical management and secure sharing of data, it needs to detect sensitive information in the data submitted by users and to assess the sensitivity of that data.
Therefore, this study focuses on the protection of sensitive information in the secure sharing of population health scientific data. In line with the construction goals of PHDA, it studies in depth the needs for sensitive information detection and sensitivity assessment, and aims to explore a data sensitivity assessment method that provides a reference for hierarchical data management and secure sharing on population health scientific data sharing platforms. Specifically, the main work of this study includes the following three parts.
Firstly, a requirements analysis of data sensitivity assessment was carried out for PHDA, and the research content of the assessment was clarified on this basis. The data characteristics of PHDA were analyzed in terms of data type, volume, and organizational structure, and the requirements for data sensitivity detection and assessment were analyzed from the aspects of the sensitive information control process and a survey of stakeholder needs. Finally, the assessment object, assessment content, and assessment scenarios of the data sensitivity assessment research were defined, which laid the foundation for the orderly development of the follow-up work.
Secondly, a data sensitivity assessment method framework for the sharing needs of population health scientific data was proposed. Based on the requirements of policies and regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the Information Security Technology: Personal Information Security Specification, this method refines the scope of sensitive information in population health scientific data and constructs a sensitive information identification dictionary base and rule base according to the content and formal characteristics of sensitive information. Sensitive features are identified and analyzed at the levels of metadata, data item, and data value. In addition, a unified sensitivity assessment standard is set based on two features, the degree of identification and the degree of loss on leakage, and data sensitivity is calculated on that basis. Finally, a data sensitivity assessment report is generated for each dataset to mark, describe, and reveal the sensitive information in the dataset, providing a reference for the hierarchical management and secure sharing of population health scientific data.
Thirdly, an experiment and an expert evaluation were conducted. This study used real-world datasets in PHDA to assess data sensitivity, and experts were invited to score the generated sensitivity assessment reports, so as to verify the application effect of the proposed data sensitivity assessment method and to evaluate its feasibility, scientific soundness, practicability, and effectiveness.
The innovation of this research lies in the construction of a complete data sensitivity assessment framework for population health scientific data sharing needs in China, which assesses the sensitivity of PHDA population health scientific data based on sensitive information identification and feature analysis. Specifically, the scope of sensitive information identification was determined, the sensitive information identification dictionary base and rule base were constructed, and a unified sensitivity assessment standard for population health scientific data was set. The theoretical value of this study is to provide theoretical support for research on secure data sharing and sensitive information protection, and its application value is to identify sensitive information in PHDA population health scientific data and to provide a reference for data managers assessing data sensitivity. |
Release date: | 2021-06-08 |