Thesis information

Thesis title (Chinese):

 基于在线学习的ICU患者重度急性呼吸窘迫综合征预测研究    

Author name:

 武俊伟    

Thesis language:

 chi    

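Degree:

 Master's

Degree type:

 Academic degree

Institution:

 Peking Union Medical College

Department:

 Institute of Medical Information, Peking Union Medical College

Major:

 Library, Information and Archives Management - Information Science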

Supervisor name:

 李姣    

Thesis completion date:

 2022-11-01    

Thesis title (foreign language):

 Prediction of Severe Acute Respiratory Distress Syndrome in ICU based on Online Learning    

Keywords (Chinese):

 在线机器学习 急性呼吸窘迫综合征 生命体征数据 早期预测模型    

Keywords (foreign language):

 Online machine learning; Acute respiratory distress syndrome; Vital sign data; Early prediction model

Abstract (Chinese):

重症医学领域的快速发展及大数据时代的到来,使得相关领域的学者越来越重视数据的收集、整理、共享及应用。其中关于急性呼吸窘迫综合征(ARDS),人们已经通过不同的机器学习方法,从疾病的分类、诊断、危险因素分析、治疗、住院时间、死亡率预测等方面进行研究且各项模型有着不俗的表现。但也存在一些不足之处:针对ARDS疾病严重程度的早期预测方法研究较少,而其中大多数研究是基于固定或静态数据建立相应的模型,忽略了对患者实时产生和波动较大的生命体征数据的分析。由于经典学习算法上的局限性,导致其无法胜任对处于实时波动状态的生命体征时间序列数据的分析处理。另一项不足之处在于,以往的研究选取的特征维度较多,大多数临床指标需要通过有创检查或医疗仪器进行数据采集,建立模型时会存在大量的缺失数据,且无法对患者的疾病情况进行动态评价。因此,本研究将采用AMF在线学习算法构建ARDS疾病严重程度的早期预测模型。以患者每次测量的氧合指数(P/F)为观测原点,倒推不同时间窗口内三种生命体征HR、RR、Temp及其波动值(极值、极差、平均值),用于提前预测患者是否会进展为重度ARDS(P/F≤100mmHg)。该模型结合了在线学习算法对动态数据实时预测的特点,以及生命体征时序数据的连续性、无创且易获取等优势,更加快速准确地实现对重度ARDS的早期预测。

本研究主要包括以下内容:                

(1)基于在线学习的ARDS预测模型构建。通过文献调研法,对已有的在线学习算法进行调研,并通过目前在线学习算法在医学领域的应用情况,选取在线聚合蒙德里安森林AMF构建本研究的预测模型。通过对AMF模型的原理及相关应用情况的回顾,为后续建立早期ARDS预测模型做进一步的铺垫。结合ARDS患者的不同时间窗口数据,构建基于在线学习的ARDS预测模型。

(2)ARDS患者及其时间窗口数据的提取处理方法。各数据集之间会有包含或不包含ARDS诊断结果的情况存在,因此需要对不同数据集ARDS患者的提取方法进行研究。其次,所筛选患者的三种生命体征特征变量均为时间序列数据,需要充分考虑时间序列数据的记录间隔、在各个时间段内的波动情况,划分最佳的预测时间窗口。为明确时间序列的选择范围以及预测时间窗口长短,需要将不同时序范围与时间窗口相结合,并通过在线学习模型进行验证,进而筛选出最优的时序范围与最佳的时间窗口。

(3)在线学习模型的对比评价及验证。根据已构建的在线学习模型,将在线学习方法与经典机器学习方法从多个维度(决策树棵数、各评价指标、校准曲线等方面)进行对比分析。其次,应用MIMIC-III数据集对在线学习模型进行独立验证,以此来评价模型的泛化能力。最后,通过模拟生成的错误数据,探讨数据质量对在线学习模型的影响,进而对构建在线学习模型提出相应的安全防范措施。

综上,本研究应用在线学习AMF模型,通过对重症医学数据集中的ARDS患者的生命体征数据进行提取和处理,建立ARDS的早期预警模型。模型的构建结合了在线学习的预测实时性以及生命体征数据的连续性、易获取性等优点,对于患者疾病严重程度进行提前预测,以达到快速分级救治、合理分配医疗资源的目的。

Abstract (foreign language):

The rapid development of critical care medicine and the advent of the big data era have led researchers to pay increasing attention to the collection, organization, sharing, and application of data. For acute respiratory distress syndrome (ARDS), a range of machine learning methods have been applied to disease classification, diagnosis, risk factor analysis, treatment, length of stay, and mortality prediction, and the resulting models generally perform well. However, some shortcomings remain. Few studies address early prediction of ARDS severity, and most of those build their models on fixed or static data, neglecting patients' vital sign data, which are generated in real time and fluctuate considerably. Because of their inherent limitations, classical learning algorithms are poorly suited to analyzing and processing vital sign time series that fluctuate in real time. Another shortcoming is that previous studies selected a large number of feature dimensions, and most of the clinical indicators must be collected through invasive examinations or medical devices; this leaves a large amount of missing data at modeling time and does not allow dynamic evaluation of the patient's condition. Therefore, this study uses the AMF online learning algorithm to construct an early prediction model of ARDS severity. Taking each measurement of the patient's oxygenation index (P/F ratio) as the observation origin, three vital signs, heart rate (HR), respiratory rate (RR), and temperature (Temp), together with their fluctuation values (extrema, range, and mean), are traced back over different time windows and used to predict in advance whether the patient will progress to severe ARDS (P/F ≤ 100 mmHg). The model combines the ability of online learning algorithms to make real-time predictions on dynamic data with the continuous, non-invasive, and easily obtained nature of vital sign time series, so as to achieve faster and more accurate early prediction of severe ARDS.

This study includes the following main contents:

(1) Construction of an ARDS prediction model based on online learning. Existing online learning algorithms and their current applications in medicine are surveyed through a literature review, and on that basis the online aggregated Mondrian forest (AMF) is selected to build the prediction model of this study. A review of the principles and related applications of AMF lays the groundwork for the subsequent early ARDS prediction model. The online-learning-based ARDS prediction model is then constructed using data from different time windows of ARDS patients.

(2) Methods for extracting and processing ARDS patients and their time-window data. Some datasets include an ARDS diagnosis and others do not, so methods for identifying ARDS patients in different datasets need to be studied. In addition, the three vital sign features of the selected patients are all time series, so the recording interval and the fluctuation within each time period must be fully considered when delineating the best prediction time window. To determine the range of the time series and the length of the prediction time window, different time-series ranges are combined with different time windows and verified with the online learning model, so as to select the optimal time-series range and the best time window.

(3) Comparative evaluation and validation of the online learning model. Based on the constructed model, the online learning method is compared with classical machine learning methods along several dimensions (number of decision trees, evaluation metrics, calibration curves, etc.). Next, the online learning model is independently validated on the MIMIC-III dataset to evaluate its generalization ability. Finally, the impact of data quality on the online learning model is explored using simulated erroneous data, and corresponding safeguards are proposed for building online learning models.
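The look-back windowing described in this abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names, the data layout (lists of timestamp/value pairs), the choice to end the window exactly at the P/F measurement time, and the sample readings are all assumptions made for illustration. Only the three vital signs, the four summary statistics, and the P/F ≤ 100 mmHg threshold come from the abstract.

```python
from datetime import datetime, timedelta
from statistics import mean

SEVERE_PF_THRESHOLD = 100.0  # mmHg; severe ARDS cut-off stated in the abstract


def window_features(vitals, origin, lookback_hours):
    """Summarise one vital sign over the window [origin - lookback, origin).

    vitals: list of (timestamp, value) pairs for a single sign (assumed layout).
    Returns None when the window holds no readings.
    """
    start = origin - timedelta(hours=lookback_hours)
    values = [v for t, v in vitals if start <= t < origin]
    if not values:
        return None
    return {
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "mean": mean(values),
    }


def build_example(signs, pf_time, pf_value, lookback_hours):
    """Build one (features, label) pair anchored at a single P/F measurement."""
    features = {}
    for name, series in signs.items():
        stats = window_features(series, pf_time, lookback_hours)
        if stats is None:
            return None  # skip origins with an empty window for any sign
        for stat_name, value in stats.items():
            features[f"{name}_{stat_name}"] = value
    label = int(pf_value <= SEVERE_PF_THRESHOLD)
    return features, label


# Made-up readings taken 5, 3, and 1 hours before a P/F measurement at t0.
t0 = datetime(2022, 1, 1, 12, 0)
hours_before = [5, 3, 1]
signs = {
    "HR": list(zip([t0 - timedelta(hours=h) for h in hours_before], [88.0, 96.0, 110.0])),
    "RR": list(zip([t0 - timedelta(hours=h) for h in hours_before], [18.0, 22.0, 28.0])),
    "Temp": list(zip([t0 - timedelta(hours=h) for h in hours_before], [36.8, 37.4, 38.0])),
}
example = build_example(signs, t0, pf_value=95.0, lookback_hours=4)
```

Calling `build_example` with several `lookback_hours` values is one way to compare candidate time windows, in the spirit of item (2).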

In summary, this study applies the AMF online learning model to build an early-warning model for severe ARDS by extracting and processing the vital sign data of ARDS patients in critical care datasets. The model combines the real-time prediction capability of online learning with the continuity and easy availability of vital sign data to predict disease severity in advance for large numbers of patients during public emergencies or infectious disease outbreaks, with the aim of enabling rapid triage and treatment and rational allocation of medical resources.
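The real-time character of online learning mentioned above is commonly evaluated with a test-then-train (prequential) loop, in which each example is scored before the model learns from it. The sketch below shows that loop only. It does not implement AMF: a plain stochastic-gradient logistic regression is used as a stand-in learner, and the class name, feature names, and synthetic stream are hypothetical. An AMF implementation exposing comparable predict and update steps could be substituted.

```python
import math
import random


class OnlineLogisticRegression:
    """Stand-in online learner (SGD logistic regression); this is not AMF."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}
        self.bias = 0.0

    def predict_proba(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # One gradient step on the log loss for a single example.
        error = self.predict_proba(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.learning_rate * error * v
        self.bias -= self.learning_rate * error


def prequential_accuracy(model, stream):
    """Test-then-train: score each example before the model learns from it."""
    correct = 0
    total = 0
    for x, y in stream:
        predicted = int(model.predict_proba(x) >= 0.5)
        correct += int(predicted == y)
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


# Synthetic, already-scaled features; the labelling rule is invented for the demo.
rng = random.Random(0)
stream = []
for _ in range(500):
    rr = rng.uniform(-1.0, 1.0)
    hr = rng.uniform(-1.0, 1.0)
    stream.append(({"rr_mean": rr, "hr_mean": hr}, int(rr + hr > 0)))

accuracy = prequential_accuracy(OnlineLogisticRegression(), stream)
```

Because every prediction is made before the corresponding label is seen, the running accuracy reflects how the model would have behaved at the bedside as data arrived.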

Release date:

 2022-11-24    
