Thesis information

Thesis title (Chinese):

 基于机器学习构建老年糖尿病患者轻度认知障碍风险评估模型    

Name:

 张海鑫    

Thesis language:

 chi    

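Degree:

 Master's

Degree type:

 Academic degree

Institution:

 Peking Union Medical College (北京协和医学院)

Department:

 School of Population Medicine and Public Health, Peking Union Medical College

Major:

 Public Health and Preventive Medicine - Epidemiology and Health Statistics

Supervisor: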

 王宇萍    

Thesis completion date:

 2024-05-10    

Thesis title (foreign language):

 Risk assessment of mild cognitive impairment in elderly patients with diabetes mellitus based on machine learning    

Keywords (Chinese):

 2型糖尿病 轻度认知障碍 BP神经网络 随机森林 XGBoost    

Keywords (foreign language):

 Type 2 diabetes; Mild cognitive impairment; BP neural network; Random forest; XGBoost

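Abstract (translated from Chinese):

Objectives:

To construct machine learning models for assessing the risk of MCI in elderly patients with diabetes using readily obtainable influencing factors, and to compare the risk assessment performance of the models, so as to give medical staff a reference for rapidly assessing MCI risk in these patients, raise patients' awareness of MCI risk, and avoid or slow the onset of dementia.

Methods:

Patients with type 2 diabetes aged 60 years and older who attended the Department of Endocrinology of Penglai People's Hospital, Yantai City, Shandong Province, between October 2021 and May 2022 were enrolled. Data on general demographics, disease status, lifestyle, mental health, and physical examination indicators were collected, and cognitive function was assessed with the Montreal Cognitive Assessment (MoCA). All data were split into a training set and a test set at a ratio of 7:3. Logistic regression, BP neural network, random forest, and XGBoost models were built on the training set in R 4.1.3, with parameters tuned by grid search under ten-fold cross-validation. Finally, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and AUC with its 95% CI were calculated on the test set, and the DeLong test was performed.

Results:

1. A total of 1319 patients with type 2 diabetes aged 60 years and older were included. The mean age was 68.0 (±5.1) years, 724 (54.9%) were male, and 657 (49.8%) were assessed as having MCI.

2. Univariate analysis showed that, compared with the cognitively normal group, the MCI group was older, had a lower education level and a longer disease duration, had higher proportions of family history of dementia, hearing impairment, sleep disorders, depression, and anxiety, had lower proportions of physical exercise, household chores, reading books and newspapers, social participation, and central obesity, and had higher diastolic blood pressure and fasting blood glucose levels; the differences were statistically significant (P<0.05).

3. The logistic regression model had a sensitivity of 56.78%, a specificity of 75.13%, and an AUC of 0.715 (95%CI: 0.666~0.765) on the test set.

4. The three-layer BP neural network model fitted best with 6 hidden-layer units; on the test set it had a sensitivity of 57.79%, a specificity of 78.17%, and an AUC of 0.746 (95%CI: 0.698~0.794).

5. The optimal mtry value for the random forest model was 3; on the test set it had a sensitivity of 77.89%, a specificity of 60.41%, and an AUC of 0.755 (95%CI: 0.708~0.802).

6. For the XGBoost model, the optimal tree depth was 9 and the optimal learning rate was 0.01, and the model converged within 500 iterations; on the test set it had a sensitivity of 80.40%, a specificity of 61.42%, and an AUC of 0.756 (95%CI: 0.709~0.803).

7. The influencing-factor analysis from the logistic regression model showed that, after adjustment for other potential confounders, diabetes duration (OR=1.016, 95%CI: 1.001~1.031, P=0.040), diastolic blood pressure (OR=1.017, 95%CI: 1.003~1.031, P=0.021), and fasting blood glucose (OR=1.078, 95%CI: 1.013~1.147, P=0.019) were significantly and positively associated with MCI in elderly patients with diabetes. Feature importance was calculated and ranked with the random forest and XGBoost models; the top 4 features were fasting blood glucose, diastolic blood pressure, disease duration, and age.

Conclusion:

This study selected 15 feature variables as model inputs: age, education level, family history of dementia, disease duration, central obesity, hearing impairment, physical exercise, household chores, reading books and newspapers, social participation, sleep disorders, depression, anxiety, diastolic blood pressure, and fasting blood glucose. Using machine learning techniques, risk assessment models for MCI in elderly patients with diabetes were constructed (logistic regression, BP neural network, random forest, and XGBoost). In terms of performance, the logistic regression and BP neural network models had higher specificity, whereas the random forest and XGBoost models had higher sensitivity. The latter better meet the need for early identification and screening of people at high risk of MCI, and show some promise for MCI risk assessment in elderly patients with diabetes.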

Abstract (foreign language):

Objectives:

This study aimed to construct machine learning models for assessing the risk of mild cognitive impairment (MCI) in elderly patients with diabetes using readily available influencing factors, and to compare the performance of these models. The models are intended to give healthcare professionals a reference for rapidly assessing MCI risk in these patients, to raise patients' awareness of their MCI risk, and to support targeted prevention and delay of the onset and progression of MCI, thereby avoiding or slowing the onset of dementia and improving quality of life.

Methods:

Patients with type 2 diabetes aged 60 years and older who were treated in the Endocrinology Department of Penglai People's Hospital, Yantai City, Shandong Province, between October 2021 and May 2022 were included as research subjects. The study collected data on the patients' general demographics, disease status, lifestyle, psychological health, and physical examination indicators, and assessed their cognitive function using the Montreal Cognitive Assessment (MoCA). All data were divided into training and testing sets at a 7:3 ratio. On the training set, logistic regression, BP neural network, random forest, and XGBoost models were constructed in R 4.1.3, and parameters were tuned by grid search with ten-fold cross-validation. Finally, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and AUC with its 95%CI were calculated on the testing set, and the DeLong test was conducted.
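The evaluation workflow described above can be sketched in a few lines. The study itself used R 4.1.3; the Python below is only an illustration, and the function names (`split_indices`, `k_fold`, `classification_metrics`), the random seed, and the confusion-matrix counts are hypothetical. It uses a plain random split, which may differ from the partitioning procedure the author actually applied.

```python
import random


def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle 0..n-1 and split into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary, hypothetical choice
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def k_fold(indices, k=10):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validate = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, validate


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from a 2x2 confusion matrix (positive class = MCI)."""
    sensitivity = tp / (tp + fn)  # share of true MCI cases flagged
    specificity = tn / (tn + fp)  # share of non-MCI cases cleared
    ppv = tp / (tp + fp)          # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# 7:3 split of a cohort the size of the one in this study.
train, test = split_indices(1319)

# Ten folds over the training set; a grid search would fit each candidate
# parameter setting on `fit` and score it on `validate` for every fold.
folds = list(k_fold(train, k=10))

# Hypothetical confusion-matrix counts, NOT taken from the thesis.
metrics = classification_metrics(tp=80, fp=30, tn=70, fn=20)
```

In a grid search, the parameter setting with the best average validation score across the ten folds would be refitted on the full training set and then evaluated once on the held-out test set.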

Results:

1. A total of 1319 patients with type 2 diabetes aged 60 years and older were included, with a mean age of 68.0 (±5.1) years. There were 724 males (54.9%), and 657 individuals (49.8%) were assessed as having MCI by the MoCA scale.

2. Univariate analysis showed that, compared with the cognitively normal group, the MCI group was older, had a lower education level and a longer disease duration, had higher proportions of family history of dementia, hearing impairment, sleep disorders, depression, and anxiety, had lower proportions of physical exercise, household chores, reading books and newspapers, social participation, and central obesity, and had higher diastolic blood pressure and fasting blood glucose levels; the differences were statistically significant (P<0.05).

3. The logistic regression model constructed achieved a sensitivity of 56.78% and a specificity of 75.13% in the test set, with an area under the ROC curve of 0.715 (95%CI: 0.666~0.765).

4. The three-layer BP neural network model fitted best when the number of hidden layer units was 6. With this setting, the sensitivity in the testing set was 57.79%, the specificity was 78.17%, and the AUC was 0.746 (95%CI: 0.698~0.794).

5. The optimal mtry value for the random forest model was 3, with a sensitivity of 77.89% and specificity of 60.41% in the testing set, and the AUC was 0.755 (95%CI: 0.708~0.802).

6. For the XGBoost model, the optimal tree depth was 9 and the optimal learning rate was 0.01, and the model converged within 500 iterations. With these settings, the sensitivity in the testing set was 80.40%, the specificity was 61.42%, and the AUC was 0.756 (95%CI: 0.709~0.803).

7. The influencing-factor analysis from the logistic regression model showed that, after adjusting for other potential confounding factors, the duration of diabetes (OR=1.016, 95%CI: 1.001~1.031, P=0.040), diastolic blood pressure (OR=1.017, 95%CI: 1.003~1.031, P=0.021), and fasting blood glucose (OR=1.078, 95%CI: 1.013~1.147, P=0.019) were significantly and positively associated with MCI in elderly diabetic patients. When feature importance was calculated and ranked using the random forest and XGBoost models, the top 4 feature variables were fasting blood glucose, diastolic blood pressure, duration of illness, and age.
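The odds ratios in item 7 are per one-unit increases, so each looks small on its own. Under the log-linear assumption built into logistic regression, an OR compounds multiplicatively over larger increments. The sketch below illustrates that arithmetic with the reported estimates. The helper name `scaled_odds_ratio` is hypothetical, and the abstract does not state the measurement units; treating diabetes duration as years is an assumption here.

```python
import math


def scaled_odds_ratio(or_per_unit, units):
    """Odds ratio for a `units`-unit increase, assuming a log-linear effect."""
    return math.exp(math.log(or_per_unit) * units)


# Reported estimates as (OR, lower 95% bound, upper 95% bound) per one unit.
reported = {
    "diabetes duration": (1.016, 1.001, 1.031),
    "diastolic blood pressure": (1.017, 1.003, 1.031),
    "fasting blood glucose": (1.078, 1.013, 1.147),
}

# Implied OR and interval for a 10-unit longer duration
# (10 years, if duration was recorded in years; an assumption).
ten_unit_duration = tuple(
    scaled_odds_ratio(value, 10) for value in reported["diabetes duration"]
)

for label, value in zip(("OR", "lower", "upper"), ten_unit_duration):
    print(f"diabetes duration, +10 units, {label}: {value:.3f}")
```

The same scaling applies to the confidence bounds because they are symmetric on the log-odds scale; it does not hold if the true relationship is non-linear.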

Conclusion:

This study included 15 feature variables (age, education level, family history of dementia, duration of illness, hearing impairment, physical exercise, household chores, reading books and newspapers, social participation, sleep disorders, depression, anxiety, central obesity, diastolic blood pressure, and fasting blood glucose) as model inputs. Using machine learning techniques, risk assessment models for MCI in elderly diabetic patients were constructed (logistic regression, BP neural network, random forest, and XGBoost). In terms of performance, the logistic regression and BP neural network models had higher specificity, while the random forest and XGBoost models had higher sensitivity. The latter better meet the need for early identification and screening of individuals at high risk of MCI, and show some promise for risk assessment of MCI in elderly diabetic patients.
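The screening argument in the conclusion can be made concrete by converting the reported sensitivity and specificity into expected counts. The sketch below is illustrative only: it assumes a screened population of 1000 in which MCI prevalence equals the 49.8% observed in this cohort, and that test-set performance carries over unchanged. The function name `screening_outcomes` is hypothetical.

```python
def screening_outcomes(sensitivity, specificity, prevalence, n=1000):
    """Expected screening counts in a population of size n."""
    cases = n * prevalence
    non_cases = n - cases
    return {
        "detected": sensitivity * cases,
        "missed": (1 - sensitivity) * cases,
        "false_alarms": (1 - specificity) * non_cases,
    }


# Test-set (sensitivity, specificity) as reported in the abstract.
models = {
    "logistic regression": (0.5678, 0.7513),
    "BP neural network": (0.5779, 0.7817),
    "random forest": (0.7789, 0.6041),
    "XGBoost": (0.8040, 0.6142),
}

PREVALENCE = 0.498  # share assessed as MCI in the study cohort

for name, (sens, spec) in models.items():
    out = screening_outcomes(sens, spec, PREVALENCE)
    print(
        f"{name}: missed {out['missed']:.0f}, "
        f"false alarms {out['false_alarms']:.0f} per 1000 screened"
    )
```

Under these assumptions the tree-based models miss far fewer MCI cases at the cost of more false alarms, which is the trade-off the conclusion weighs in favor of screening use.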

Public release date:

 2024-07-05    
