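View thesis information

Thesis title (Chinese):

 亚厘米实性肺结节的人工智能辅助CT影像诊断研究

Name:

 Liu Jianing (刘嘉宁)

Thesis language:

 chi

Degree:

 Doctoral

Degree type:

 Academic degree

Institution:

 Peking Union Medical College (北京协和医学院)

Department:

 Cancer Hospital, Peking Union Medical College

Major:

 Clinical Medicine - Imaging Medicine and Nuclear Medicine

Supervisor:

 Wang Jianwei (王建卫)

Thesis completion date:

 2024-05-27

Thesis title (foreign language):

 Artificial intelligence assisted CT imaging diagnosis of subcentimeter solid pulmonary nodules

Keywords (Chinese):

 pulmonary nodule; subcentimeter solid pulmonary nodule; differential diagnosis; deep learning; radiomics

Keywords (foreign language):

 pulmonary nodule; subcentimeter solid pulmonary nodule; differential diagnosis; deep learning; radiomics

Abstract (Chinese):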

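Part I

 

Establishment of a malignancy risk prediction model for subcentimeter solid pulmonary nodules based on CT imaging features

 

Objective: To explore the risk factors for malignant subcentimeter solid pulmonary nodules, and to establish and validate a risk prediction model based on computed tomography (CT) imaging features to assist the early diagnosis of malignant subcentimeter (≤10 mm) solid pulmonary nodules.

Methods: Pulmonary nodules whose benign or malignant status was confirmed by surgical pathology or follow-up between January 2015 and July 2022 were randomly selected from our research group's database of subcentimeter solid pulmonary nodules. In total, 257 nodules from 257 patients met the inclusion and exclusion criteria and were included in this part. Patients were randomly assigned at a 7:3 ratio to a training set (n = 181) and a validation set (n = 76). Malignant nodules were confirmed by surgical pathology, and benign nodules were confirmed by surgical pathology or follow-up. Clinical data and CT imaging features were collected, independent predictors of malignancy were identified by multivariate logistic regression analysis, and a logistic regression model was built. A further 69 subcentimeter solid pulmonary nodules (69 patients) detected at another center between January 2022 and December 2022 were retrospectively included as the external validation set. Diagnostic performance was assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.

Results: The training, internal validation, and external validation sets contained 113 (62.4%), 47 (61.8%), and 28 (40.6%) malignant nodules, respectively. Multivariate logistic regression analysis showed that a coarse nodule-lung interface (P < 0.001), an unclear nodule-lung interface/halo sign (P = 0.011), short spiculation (P = 0.015), air bronchogram (P = 0.034), and invisibility on the mediastinal window (P = 0.003) were independent predictors of malignant subcentimeter solid pulmonary nodules. For the prediction model built on these features, the AUC in the training set was 0.885 (95% confidence interval [CI]: 0.830, 0.940), with an accuracy of 83.4%, a sensitivity of 94.7%, and a specificity of 64.7%; in the internal validation set the AUC was 0.769 (95% CI: 0.653, 0.886), with an accuracy of 76.3%, a sensitivity of 89.4%, and a specificity of 55.2%; in the external validation set the AUC was 0.830 (95% CI: 0.720, 0.941), with an accuracy of 75.4%, a sensitivity of 85.7%, and a specificity of 68.3%.

Conclusion: The CT feature-based risk prediction model built in this part has some reference value for differentiating benign from malignant subcentimeter solid pulmonary nodules, but its diagnostic performance needs further improvement.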

 

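Part II

 

Application value of CT radiomics in differentiating benign from malignant subcentimeter solid pulmonary nodules

 

Objective: To preliminarily explore the performance and application value of CT-based radiomics in diagnosing benign and malignant subcentimeter solid pulmonary nodules.

Methods: Pulmonary nodules detected on contrast-enhanced CT between March 2020 and January 2023 were retrospectively collected from our research group's database of subcentimeter solid pulmonary nodules. Malignant nodules were confirmed by surgical pathology, and benign nodules were confirmed by surgical pathology or follow-up. Lesion segmentation and radiomics feature extraction were performed on the Deepwise Multimodal Research Platform. Feature dimensionality was reduced using the intraclass correlation coefficient, correlation analysis between features, and the least absolute shrinkage and selection operator (LASSO) algorithm, and the models were validated by five-fold cross-validation. Support vector machine, logistic regression, linear support vector classifier, random forest, and gradient boosting radiomics models were built and receiver operating characteristic (ROC) curves were plotted. The DeLong test was used to compare diagnostic performance between classifiers, and the best-performing model was compared with the diagnoses of mid-level and senior radiologists.

Results: A total of 303 pulmonary nodules (136 malignant) were included, and radiomics models were built after feature extraction and selection. On the validation set, the areas under the ROC curve of the support vector machine, logistic regression, linear support vector classifier, random forest, and gradient boosting models were 0.922 (95% CI: 0.893, 0.950), 0.910 (95% CI: 0.878, 0.942), 0.905 (95% CI: 0.872, 0.938), 0.899 (95% CI: 0.865, 0.933), and 0.896 (95% CI: 0.862, 0.930), respectively. The DeLong test indicated no statistically significant difference in diagnostic performance among the five models; the support vector machine had the highest accuracy and F1 score. Compared with the radiologists, the support vector machine had higher accuracy (83.8% vs. 55.5%, P < 0.001).

Conclusion: Radiomics models diagnose benign and malignant subcentimeter solid pulmonary nodules with high accuracy and may help reduce the uncertainty of radiologists' diagnoses.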

 

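Part III

 

Construction of a diagnostic model for subcentimeter solid pulmonary nodules based on radiomics combined with CT features

 

Objective: To build a combined model based on radiomics and CT imaging features for the differential diagnosis of benign and malignant subcentimeter solid pulmonary nodules.

Methods: Pulmonary nodules detected on contrast-enhanced CT between April 2017 and June 2022 in our research group's database of subcentimeter solid pulmonary nodules were retrospectively analyzed. In total, 324 subcentimeter solid pulmonary nodules from 324 patients met the inclusion and exclusion criteria and were included in this part. Malignant nodules (n = 158) were all confirmed by surgical pathology, and benign nodules (n = 166) were confirmed by surgical pathology or follow-up. The nodules were randomly assigned at a 7:3 ratio to a training set (n = 226) and a test set (n = 98). Clinical data and CT imaging features were recorded, and the clinical and CT features retained after univariate analysis and multivariate logistic regression analysis were used to build the clinical model. A total of 2107 radiomics features were extracted from the contrast-enhanced CT images, and the F test was used for feature selection to build the radiomics model. The radiomics features were combined with the clinical and CT imaging features by logistic regression to build the combined model, and the diagnostic performance of each model was evaluated by the AUC.

Results: Six CT imaging features (diameter, unclear nodule-lung interface/halo sign, spiculation, vacuole sign, pleural indentation, and air bronchogram) were independent predictors of malignant subcentimeter solid pulmonary nodules. Four radiomics features were retained after dimensionality reduction and used to calculate a radiomics score. On this basis, the clinical model, radiomics model, and combined model were built. In the test set, the AUC, sensitivity, and specificity of the three models were 0.835 (95% CI: 0.758, 0.912), 60.4%, and 88.0%; 0.900 (95% CI: 0.867, 0.932), 84.8%, and 78.3%; and 0.930 (95% CI: 0.902, 0.957), 86.7%, and 84.9%, respectively. Decision curve analysis indicated that the combined model has clinical application value.

Conclusion: Compared with the single-type models, the combined model that fuses radiomics features and CT features had the best diagnostic performance, with both high sensitivity and high specificity, and made up for the shortcomings of the single-type models. It is a promising non-invasive method for improving the diagnostic efficiency of early lung cancer.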

 

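Part IV

 

Diagnostic value of a deep learning-based method for benign and malignant subcentimeter solid pulmonary nodules

 

Objective: To evaluate the performance of a deep learning model proposed in our research group's previous work in diagnosing benign and malignant subcentimeter solid pulmonary nodules, and to compare it with senior radiologists who have more than 10 years of experience in thoracic imaging diagnosis.

Methods: A total of 200 pulmonary nodules (100 benign and 100 malignant) were randomly selected from our research group's database of subcentimeter solid pulmonary nodules. Malignancy was confirmed by surgical pathology, and benignity was confirmed by surgical pathology or follow-up. The deep learning method is a filter-guided pyramid network (FGPN) with DenseNet as its framework. CT images of the nodules were uploaded to the model, which output a probability of malignancy (0 ~ 100%) for each nodule. Based on the diagnoses of the deep learning model and the radiologists, nodules were classified as benign, indeterminate, or malignant. The McNemar-Bowker test was used to compare the accuracy and the composition of diagnostic results between the model and the radiologists. The enrolled nodules were divided by mean diameter into 3 ~ 6 mm, 6 ~ 8 mm, and 8 ~ 10 mm subgroups, and within each subgroup the model's diagnoses were compared with those of the radiologists.

Results: The diagnostic accuracy of the deep learning model for subcentimeter solid pulmonary nodules was significantly higher than that of the radiologists (71.5% vs. 38.5%, P < 0.001), and the model gave more definite benign or malignant results and fewer indeterminate results. In the subgroup analysis by nodule size, the deep learning model likewise showed higher diagnostic performance than the radiologists in every subgroup and gave fewer indeterminate results. In the 3 ~ 6 mm, 6 ~ 8 mm, and 8 ~ 10 mm subgroups, the diagnostic accuracy of the model and the radiologists was 75.5% vs. 28.3% (P < 0.001), 62.0% vs. 28.2% (P < 0.001), and 77.6% vs. 55.3% (P = 0.001), respectively, and the proportion of indeterminate results was 3.8% vs. 66.0%, 8.5% vs. 66.2%, and 2.6% vs. 35.5% (all P < 0.001).

Conclusion: Compared with the radiologists, the deep learning model performed better in diagnosing benign and malignant subcentimeter solid pulmonary nodules, improving accuracy while effectively reducing indeterminate diagnoses, especially for solid pulmonary nodules smaller than 8 mm. The model has good application value, and its diagnostic performance still has room for improvement.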

 

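Part V

 

Application value of a self-supervised fine-grained classification network in differentiating benign from malignant subcentimeter solid pulmonary nodules

 

Objective: To build and validate a deep learning model dedicated to diagnosing benign and malignant subcentimeter solid pulmonary nodules.

Methods: This part retrospectively collected subcentimeter solid pulmonary nodules detected on chest CT at our hospital between January 2015 and October 2021 as the internal dataset. Malignant nodules were confirmed by surgical pathology, and benign nodules were confirmed by surgical pathology or follow-up evaluation. A deep self-supervised fine-grained classification model (DeepSF) was developed for classifying pulmonary nodules as benign or malignant. The pre-trained model was built on the National Lung Screening Trial (NLST), the LUNA 16 dataset, and a database of 5478 pulmonary nodules from a previous study, and was then fine-tuned on the internal dataset of subcentimeter solid pulmonary nodules. The generalization ability of the DeepSF model was tested on an external dataset from another center, and the model was compared with the filter-guided pyramid network (FGPN) model from our research group's previous work and with the diagnoses of radiologists. Accuracy, sensitivity, specificity, and AUC were used to evaluate the diagnostic performance of the deep learning models.

Results: This part included 1423 patients with subcentimeter solid pulmonary nodules (558 male and 865 female; mean age 55.9 ± 10.4 years) and 1537 pulmonary nodules (699 benign and 838 malignant). In the internal test set (316 nodules), the DeepSF model achieved an AUC of 0.964 (95% CI: 0.942, 0.986), an accuracy of 93.4%, a sensitivity of 96.5%, and a specificity of 90.8%; in the external test set (202 nodules), it achieved an AUC of 0.945 (95% CI: 0.910, 0.979), an accuracy of 91.1%, a sensitivity of 97.7%, and a specificity of 86.0%. In addition, the diagnostic accuracy of the DeepSF model was higher than that of the previously proposed FGPN model and of the radiologists (92.6% vs. 70.3%, P < 0.001; 92.6% vs. 56.8%, P < 0.001), and the DeepSF model gave fewer indeterminate results than the FGPN model and the radiologists (2.0% vs. 10.1%, P = 0.008; 2.0% vs. 35.1%, P < 0.001).

Conclusion: This part built the DeepSF model around the characteristics of subcentimeter solid pulmonary nodules. The model showed satisfactory diagnostic performance in both the internal and external test sets, exceeded the previously proposed FGPN model and the radiologists, and significantly reduced diagnostic uncertainty. The method may raise the overall diagnostic level for subcentimeter solid pulmonary nodules, improve the prognosis of patients with malignant nodules, and avoid misdiagnosis and mistreatment of benign nodules, giving radiologists a reliable and clinically practical diagnostic aid.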

 

Abstract (foreign language):

Part Ⅰ

 

A prediction model based on CT characteristics for differentiating malignant from benign subcentimeter solid pulmonary nodules

 

Objectives: To investigate the risk factors of malignant subcentimeter (≤10 mm) solid pulmonary nodules and establish and validate a prediction model based on CT characteristics to assist in their early diagnosis.

Methods: Pulmonary nodules detected from January 2015 to July 2022 were randomly selected from our research group's database of subcentimeter solid pulmonary nodules. In total, 257 pulmonary nodules from 257 patients met the inclusion and exclusion criteria and were included in this part. Patients were randomly assigned to the training set (n = 181) and validation set (n = 76) at a 7:3 ratio. Malignant nodules were confirmed by pathology, and benign nodules were confirmed by follow-up or pathology. Clinical data and CT features were collected, and independent predictors of malignancy were identified by multivariate logistic regression analysis. A prediction model was then established by logistic regression. In addition, 69 patients with 69 subcentimeter solid pulmonary nodules examined at a second center between January 2022 and December 2022 were retrospectively included as an external cohort to validate the model. The performance of the prediction model was assessed by sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC).
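The AUC used as the headline metric here can be read as the probability that a randomly chosen malignant nodule receives a higher model score than a randomly chosen benign one. The following is a minimal sketch of that rank-based (Mann-Whitney) calculation; the function name and the toy scores are illustrative assumptions, not the thesis code or data.

```python
def auc_mann_whitney(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    labels: 1 for malignant, 0 for benign. Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up scores for six nodules (three malignant, three benign).
    toy_labels = [1, 1, 1, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
    print(auc_mann_whitney(toy_labels, toy_scores))
```

In the toy data, eight of the nine malignant-benign pairs are ordered correctly, so the function returns 8/9.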

Results: There were 113 (62.4%), 47 (61.8%), and 28 (40.6%) malignant nodules in the training, internal validation, and external validation sets, respectively. Multivariate logistic regression analysis revealed five independent predictors of malignancy: a coarse tumor-lung interface (P < 0.001), an unclear tumor-lung interface/halo sign (P = 0.011), short spiculation (P = 0.015), air bronchogram (P = 0.034), and invisibility on the mediastinal window (P = 0.003). The AUC of the prediction model in the training set was 0.885 (95% CI: 0.830, 0.940); the accuracy, sensitivity, and specificity were 83.4%, 94.7%, and 64.7%, respectively. The AUCs in the internal and external validation sets were 0.769 (95% CI: 0.653, 0.886) and 0.830 (95% CI: 0.720, 0.941), respectively; the accuracy, sensitivity, and specificity were 76.3%, 89.4%, and 55.2% in the internal validation set, and 75.4%, 85.7%, and 68.3% in the external validation set.
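As a reading aid, the reported accuracy, sensitivity, and specificity can be reproduced from a 2×2 confusion matrix. The sketch below assumes one set of counts per cohort that is consistent with the cohort sizes and malignant counts given above; these counts are back-calculated for illustration and are not reported in the abstract.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as fractions."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # malignant nodules called malignant
    specificity = tn / (tn + fp)   # benign nodules called benign
    return accuracy, sensitivity, specificity


# ASSUMPTION: counts inferred from the reported percentages and cohort
# sizes (181/113, 76/47, 69/28 total/malignant); not taken from the thesis.
COHORTS = {
    "training": dict(tp=107, fn=6, tn=44, fp=24),
    "internal validation": dict(tp=42, fn=5, tn=16, fp=13),
    "external validation": dict(tp=24, fn=4, tn=28, fp=13),
}


def as_percentages(counts):
    """Metrics rounded to one decimal place, in percent."""
    return tuple(round(100 * m, 1) for m in diagnostic_metrics(**counts))


if __name__ == "__main__":
    for name, counts in COHORTS.items():
        print(name, as_percentages(counts))
```

Under these assumed counts the three cohorts give (83.4, 94.7, 64.7), (76.3, 89.4, 55.2), and (75.4, 85.7, 68.3), matching the reported values.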

Conclusions: The prediction model established in this part can assess the risk of malignant subcentimeter solid pulmonary nodules, but the diagnostic efficacy of the model needs to be further improved.

 

Part Ⅱ

 

Application value of CT radiomics models in differentiating malignant from benign subcentimeter solid pulmonary nodules

 

Objectives: To investigate the performance and application value of CT radiomics models for differentiating malignant from benign subcentimeter solid pulmonary nodules.

Methods: Nodules detected on contrast-enhanced CT from March 2020 to January 2023 in our database of subcentimeter solid pulmonary nodules were retrospectively analyzed. Malignancy was confirmed by surgical pathology, and benignity was confirmed by pathology or follow-up. Image segmentation and radiomics feature extraction were performed using the Deepwise Multimodal Research Platform. Feature dimensionality was reduced using the intraclass correlation coefficient, correlation analysis between features, and the least absolute shrinkage and selection operator (LASSO), and 5-fold cross-validation was used to validate the models. Support vector machine, logistic regression, linear support vector classifier, random forest, and gradient boosting models were built on the selected features, and receiver operating characteristic (ROC) curves were drawn. The DeLong test was used to compare the diagnostic performance of the five classifiers, and the best-performing model was compared with mid-level and senior radiologists.
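The 5-fold cross-validation step can be sketched as follows. This is a minimal standard-library illustration that assumes the folds are stratified by benign/malignant label; the abstract does not say whether stratification was used, and the function name and seed are hypothetical.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) for k stratified folds.

    ASSUMPTION: stratification by class label; the source only states
    that 5-fold cross-validation was used.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


if __name__ == "__main__":
    # Class sizes follow the reported cohort: 136 malignant, 167 benign.
    labels = [1] * 136 + [0] * 167
    for train, val in stratified_kfold(labels):
        print(len(train), len(val), sum(labels[i] for i in val))
```

Each nodule appears in exactly one validation fold, so every model is scored once on data it was not fitted to.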

Results: Overall, 303 nodules (136 malignant) were retrospectively collected, and radiomics models were established after feature extraction and selection. On the validation set, the areas under the ROC curve of the support vector machine, logistic regression, linear support vector classifier, random forest, and gradient boosting models were 0.922 (95% CI: 0.893, 0.950), 0.910 (95% CI: 0.878, 0.942), 0.905 (95% CI: 0.872, 0.938), 0.899 (95% CI: 0.865, 0.933), and 0.896 (95% CI: 0.862, 0.930), respectively. The DeLong test indicated no statistically significant difference in performance among the five radiomics models, and the support vector machine showed the highest accuracy and F1 score. The support vector machine was significantly more accurate than the radiologists (83.8% vs. 55.5%, P < 0.001).

Conclusion: The radiomics model achieved a high accuracy, which may help to reduce the uncertainty of radiologists in diagnosis of malignant and benign subcentimeter solid nodules.

 

Part Ⅲ

 

Development of a combined radiomics and CT feature-based model for differentiating malignant from benign subcentimeter solid pulmonary nodules

 

Objectives: We aimed to develop a combined model based on radiomics and CT imaging features for use in differential diagnosis of benign and malignant subcentimeter solid pulmonary nodules.

Methods: Nodules detected on contrast-enhanced CT from April 2017 to June 2022 in our database of subcentimeter solid pulmonary nodules were retrospectively analyzed. In total, 324 pulmonary nodules from 324 patients met the inclusion and exclusion criteria and were included in this part. Malignant nodules (n = 158) were confirmed by pathology, and benign nodules (n = 166) were confirmed by follow-up or pathology. The nodules were randomly divided at a 7:3 ratio into training (n = 226) and testing (n = 98) cohorts. A total of 2107 radiomics features were extracted from contrast-enhanced CT, and the F test was used to select features for the radiomics model. The clinical and CT characteristics retained after univariate and multivariable logistic regression analyses were used to develop the clinical model. The combined model was established by combining the radiomics features with the clinical and CT imaging features using logistic regression. The performance of each model was evaluated using the AUC.
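To make the F-test screening step concrete, the sketch below ranks features by a one-way ANOVA F statistic computed between the benign and malignant groups. This is an assumption about which F test is meant (the abstract does not specify the variant), and the feature names and values are made up for illustration.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        ss_between += len(g) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def rank_features(feature_table, labels):
    """Return feature names sorted from most to least discriminative.

    feature_table: dict mapping a feature name to one value per nodule.
    labels: 1 for malignant, 0 for benign, in the same nodule order.
    """
    scores = {}
    for name, values in feature_table.items():
        benign = [v for v, y in zip(values, labels) if y == 0]
        malignant = [v for v, y in zip(values, labels) if y == 1]
        scores[name] = anova_f([benign, malignant])
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical features for six nodules (three benign, three malignant).
    toy_labels = [0, 0, 0, 1, 1, 1]
    toy_features = {
        "feat_a": [1, 2, 3, 2, 3, 4],
        "feat_b": [1, 2, 3, 7, 8, 9],
    }
    print(rank_features(toy_features, toy_labels))
```

A larger F value means the group means differ more relative to the within-group spread, so a screening step would keep the top-ranked features and pass them to the classifier.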

Results: Six CT imaging features (diameter, unclear tumor-lung interface/halo sign, spiculation, vacuole sign, pleural indentation, and air bronchogram) were independent predictors of malignant subcentimeter solid pulmonary nodules, and four radiomics features were retained after dimensionality reduction and used to calculate a radiomics score. On this basis, the clinical model, radiomics model, and combined model were constructed. The AUC, sensitivity, and specificity of the three models in the testing set were 0.835 (95% CI: 0.758, 0.912), 60.4%, and 88.0%; 0.900 (95% CI: 0.867, 0.932), 84.8%, and 78.3%; and 0.930 (95% CI: 0.902, 0.957), 86.7%, and 84.9%, respectively. Decision curve analysis showed that the combined model had clinical application value.

Conclusions: The combined model incorporating radiomics and CT imaging features shows good discriminative ability with high sensitivity and specificity, which makes up for the shortcomings of the single type models. The combined model is expected to be a non-invasive method to improve the diagnostic efficiency of early lung cancer.

Part Ⅳ

Diagnostic performance of a deep learning-based method in differentiating malignant from benign subcentimeter solid pulmonary nodules

 

Objectives: This part assessed the diagnostic performance of a deep learning-based model proposed by our research group for differentiating malignant subcentimeter solid pulmonary nodules from benign ones in CT images compared against senior radiologists with more than 10 years of experience in thoracic imaging diagnosis.

Methods: Overall, 200 nodules (100 benign and 100 malignant) were retrospectively collected from the subcentimeter solid pulmonary nodules database of our research group. Malignancy was confirmed by pathology, and benignity was confirmed by follow-up or pathology. The deep learning model is based on the filter-guided pyramid network (FGPN) with DenseNet as the framework. CT images were fed into the model to obtain the probability of malignancy (range, 0 ~ 100%) for each nodule. According to the diagnostic results, enrolled nodules were classified into benign, malignant, or indeterminate. The accuracy and diagnostic composition of the model were compared with those of the radiologists using the McNemar-Bowker test. Enrolled nodules were divided into 3–6-, 6–8-, and 8–10-mm subgroups. For each subgroup, the diagnostic results of the model were compared with those of the radiologists.
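For readers unfamiliar with the McNemar-Bowker test, the sketch below computes its symmetry statistic for a paired 3×3 table in which rows are one rater's category (benign, indeterminate, malignant) and columns are the other rater's category for the same nodule. The table values are invented for illustration, and the paired layout is an assumption about how the comparison was set up.

```python
def bowker_statistic(table):
    """Bowker symmetry statistic and degrees of freedom for a square table.

    table[i][j] counts cases placed in category i by rater A and in
    category j by rater B. Pairs of cells whose sum is zero are skipped.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    stat = 0.0
    df = 0
    for i in range(k):
        for j in range(i + 1, k):
            off_diagonal = table[i][j] + table[j][i]
            if off_diagonal == 0:
                continue
            stat += (table[i][j] - table[j][i]) ** 2 / off_diagonal
            df += 1
    return stat, df


if __name__ == "__main__":
    # Made-up paired counts; NOT data from the thesis.
    toy_table = [
        [10, 2, 1],
        [6, 8, 3],
        [5, 7, 12],
    ]
    print(bowker_statistic(toy_table))
```

Under the null hypothesis of symmetry the statistic is compared against a chi-square distribution with the returned degrees of freedom; for a 2×2 table it reduces to the McNemar statistic without continuity correction.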

Results: The accuracy of the FGPN model in differentiating malignant from benign subcentimeter solid pulmonary nodules was significantly higher than that of the radiologists (71.5% vs. 38.5%, P < 0.001). The FGPN model gave more definite benign or malignant results and fewer indeterminate results. In the subgroup analysis by nodule size, the FGPN model also outperformed the radiologists and gave fewer indeterminate results. The accuracy of the model and the radiologists in the 3–6-, 6–8-, and 8–10-mm subgroups was 75.5% vs. 28.3% (P < 0.001), 62.0% vs. 28.2% (P < 0.001), and 77.6% vs. 55.3% (P = 0.001), respectively, and the proportion of indeterminate results was 3.8% vs. 66.0%, 8.5% vs. 66.2%, and 2.6% vs. 35.5% (all P < 0.001), respectively.
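The overall accuracy is the size-weighted pool of the subgroup accuracies, which can be checked with a short calculation. The subgroup sizes and correct-diagnosis counts below are back-calculated assumptions chosen to match the reported percentages and the total of 200 nodules; they are not stated in the abstract.

```python
# ASSUMPTION: subgroup sizes and counts inferred from the reported
# percentages; "reader" stands for the radiologists.
SUBGROUPS = {
    "3-6 mm": {"n": 53, "model_correct": 40, "reader_correct": 15},
    "6-8 mm": {"n": 71, "model_correct": 44, "reader_correct": 20},
    "8-10 mm": {"n": 76, "model_correct": 59, "reader_correct": 42},
}


def pooled_accuracy(subgroups, key):
    """Overall accuracy: total correct diagnoses over total nodules."""
    correct = sum(group[key] for group in subgroups.values())
    total = sum(group["n"] for group in subgroups.values())
    return correct / total


def subgroup_accuracy(subgroups, name, key):
    """Accuracy within one size subgroup, in percent to one decimal."""
    group = subgroups[name]
    return round(100 * group[key] / group["n"], 1)


if __name__ == "__main__":
    for name in SUBGROUPS:
        print(name,
              subgroup_accuracy(SUBGROUPS, name, "model_correct"),
              subgroup_accuracy(SUBGROUPS, name, "reader_correct"))
    print(round(100 * pooled_accuracy(SUBGROUPS, "model_correct"), 1))
    print(round(100 * pooled_accuracy(SUBGROUPS, "reader_correct"), 1))
```

With these assumed counts the pooled values come to 71.5% for the model and 38.5% for the radiologists, in line with the overall figures reported above.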

Conclusions: The FGPN model outperformed the radiologists in differentiating malignant from benign subcentimeter solid pulmonary nodules; it reduced diagnostic uncertainty and improved accuracy, especially for nodules smaller than 8 mm. However, the model's diagnostic performance can be further improved.

 

Part Ⅴ

 

A fine-grained classification model for distinguishing malignant from benign subcentimeter solid pulmonary nodules using self-supervised learning

 

Objectives: To develop and validate a deep learning model for differentiating malignant subcentimeter solid pulmonary nodules from benign ones on CT images.

Methods: This part retrospectively reviewed patients with subcentimeter solid pulmonary nodules detected in our hospital between January 2015 and October 2021 as an internal dataset. Malignancy was confirmed pathologically, and benignity was confirmed by follow-up evaluations or pathology. A deep self-supervised fine-grained classification model (DeepSF) was developed for predicting malignancy among the nodules. The pre-trained model was established based on the National Lung Screening Trial (NLST), Lung Nodule Analysis 2016 (LUNA16), and a database of 5478 pulmonary nodules from our previous study, and was then fine-tuned on the internal dataset of subcentimeter solid pulmonary nodules. The generalization ability of the model was tested in an external cohort from another center, and the diagnostic results of the DeepSF model were compared with those of the filter-guided pyramid network (FGPN) proposed in our previous study and with those of the radiologists. The performance of the model was assessed by accuracy, sensitivity, specificity, and AUC.

Results: Overall, 1423 patients (mean age 55.9 ± 10.4 years; 558 male, 865 female) with 1537 pulmonary nodules (699 benign, 838 malignant) were enrolled in this part. The DeepSF model achieved an AUC of 0.964 (95% CI: 0.942, 0.986) with an accuracy of 93.4%, a sensitivity of 96.5%, and a specificity of 90.8% in the internal testing set (316 nodules), and an AUC of 0.945 (95% CI: 0.910, 0.979) with an accuracy of 91.1%, a sensitivity of 97.7%, and a specificity of 86.0% in the external testing set (202 nodules). The diagnostic accuracy of the DeepSF model was higher than that of the FGPN model proposed in our previous study and that of the radiologists (92.6% vs. 70.3%, P < 0.001; 92.6% vs. 56.8%, P < 0.001), and the DeepSF model gave fewer indeterminate results than the FGPN model and the radiologists (2.0% vs. 10.1%, P = 0.008; 2.0% vs. 35.1%, P < 0.001).

Conclusions: The deep learning model developed in this part provided satisfactory efficacy and robustness in predicting malignancy among subcentimeter solid pulmonary nodules. The DeepSF model outperformed the FGPN model and the radiologists while significantly reducing diagnostic uncertainty. The DeepSF model shows promise as an effective tool for the early diagnosis of lung cancer, potentially improving the overall diagnostic level for subcentimeter solid pulmonary nodules, avoiding mistreatment of benign nodules, and improving the prognosis of patients with malignant nodules.

 

Open-access date:

 2024-05-31    
