论文题名(中文): | 基于多组学构建非小细胞肺癌患者新辅助免疫治疗后主要病理缓解及远期生存的预测模型 |
姓名: | |
论文语种: | chi |
学位: | 博士 |
学位类型: | 专业学位 |
学校: | 北京协和医学院 |
院系: | |
专业: | |
指导教师姓名: | |
校内导师组成员姓名(逗号分隔): | |
论文完成日期: | 2025-03-28 |
论文题名(外文): | Construction of Multi-Omics-Based Predictive Models for Major Pathological Response and Long-Term Survival in NSCLC Patients Undergone Neoadjuvant Immunotherapy |
关键词(中文): | |
关键词(外文): | Non-small cell lung cancer; Neoadjuvant immunotherapy; Radiomics; Single-cell transcriptomics; Major pathological response; B cell; Programmed cell death; Recurrence-free survival; Predictive model |
论文文摘(中文): |
第一部分 基于影像组学构建非小细胞肺癌新辅助免疫治疗后主要病理缓解的预测模型
目的:本研究旨在结合多中心数据构建基于影像组学的预测模型,以预测接受新辅助免疫治疗的非小细胞肺癌(non-small cell lung cancer,NSCLC)患者是否达到主要病理缓解(major pathological response,MPR),探索其作为无创性疗效评估工具的临床应用价值。 方法:我们回顾性收集了来自3个医疗中心,自2018年2月至2024年6月的263例接受新辅助免疫治疗的NSCLC患者的影像资料、临床数据以及病理资料。根据新辅助免疫治疗后患者是否达到MPR,将其分为MPR组和非MPR(Non-MPR)组。将1个医疗中心的199例患者作为总训练集(MPR:Non-MPR=98:101),将其余2个医疗中心的64例患者作为外部验证集(MPR:Non-MPR=34:30)。在胸部CT上勾画病灶的感兴趣区(region of interest,ROI)并从中提取1037个影像组学特征。接下来,我们将总训练集以3:1的比例随机分为内部训练组(n=149)和内部验证组(n=50),并使用显著性检验和机器学习构建如下3种预测模型:(1)基于临床特征构建临床模型;(2)基于影像组学特征构建影像组学模型;(3)基于前两者构建临床-影像组学复合模型。绘制每种模型的受试者工作特征曲线(receiver operating characteristic curve,ROC)并计算相应的曲线下面积(area under curve,AUC),并依次使用Delong检验、决策曲线分析(decision curve analysis,DCA)评估各模型的预测效果后,选取最佳模型构建列线图,同时使用校准曲线(Hosmer-Lemeshow检验)评估其拟合优度,并在外部验证集中验证其预测效能。 结果:在总训练集中,基于显著性检验结果,我们使用“性别”、“N分期”、“吸烟史”、“病理类型”等4个临床特征构建临床模型;使用“original_shape_Flatness”等8个最佳的影像组学特征进行影像组学模型构建;使用这两部分特征构建临床-影像组学复合模型。经过对训练组和内部验证组的200次随机分组,我们发现与临床模型相比,影像组学模型及临床-影像组学复合模型的预测效能均明显更优。训练组和外部验证集的DCA结果显示,临床-影像组学复合模型的净获益更好且更稳定。在内部训练组中,选取预测效果最好的复合模型(AUC:0.875),发现此时其预测效能显著优于临床模型(AUC:0.750,p=0.005)及影像组学模型(AUC:0.797,p<0.001)。在内部验证组中,复合模型的预测效能(AUC:0.837)明显好于临床模型(AUC:0.692,p=0.011),与影像组学模型无显著差异(AUC:0.814,p=0.719)。在外部验证集中发现,复合模型的预测效能(AUC:0.782)依旧优于临床模型(AUC:0.678,p=0.019),与影像组学模型无显著差异(AUC:0.687,p=0.106),但复合模型的AUC值高于影像组学模型。基于此复合模型构建列线图,校准曲线结果提示此复合模型具有较好的拟合度(p=0.192)。据此,选择此复合模型作为预测NSCLC患者新辅助免疫治疗后MPR的最佳模型,共包含4个临床特征和8个影像组学特征(准确度:0.805,灵敏度:0.789,特异度:0.822)。 结论:本研究构建了一种基于影像组学特征的模型,可有效预测NSCLC患者接受新辅助免疫治疗后的MPR,但不可忽视临床特征的重要价值。基于两类特征构建临床-影像组学复合模型,可为NSCLC患者新辅助免疫治疗的疗效评估提供新的参考。
第二部分 基于单细胞转录组学构建预测非小细胞肺癌新辅助免疫治疗后远期生存的B细胞标志基因模型
目的:本研究旨在整合公共数据库及本地数据,基于单细胞转录组学联合常规转录组学构建B细胞标志基因模型(B cell marker gene signature,BCMGS),以预测NSCLC患者接受新辅助免疫治疗后行根治性手术的无复发生存期(recurrence-free survival, RFS),探索其在预后评估中的应用价值。 方法:NSCLC患者的单细胞RNA测序 (single-cell RNA sequencing, scRNA-seq)数据和常规转录组测序数据从肿瘤基因组图谱(The Cancer Genome Atlas, TCGA)数据库和基因表达综合(Gene Expression Omnibus, GEO)数据库进行获取。基于对 scRNA-seq 数据的综合分析,鉴定与NSCLC相关的 B 细胞标志基因。基于单因素Cox回归分析初步筛选潜在预后相关基因,并进一步通过多因素Cox回归分析明确与预后呈显著相关的标志基因,应用机器学习算法构建预测模型。以TCGA数据集作为训练集,以3个独立的GEO数据集作为外部验证集,使用时间依赖性ROC曲线和AUC值验证模型的预测效能,根据模型评分的中位数进行生存风险分层,使用Kaplan‒Meier 生存曲线及log-rank 检验比较不同风险亚组之间的预后差异。接下来,分析不同风险亚组之间免疫微环境的差异,从而评估模型与免疫治疗应答的相关性。最后,我们收集本地接受新辅助免疫治疗后行根治性手术切除的 NSCLC 患者队列的临床信息及随访数据,并对手术切除标本进行转录组测序,从而验证模型对远期生存的预测能力。 结果:基于GSE131907队列的scRNA-seq 数据,我们鉴定出603个与NSCLC相关的B细胞标志基因。基于TCGA的训练集数据,单因素及多因素Cox回归分析的结果表明,其中49个基因与预后显著相关(p<0.01),并进一步应用最小绝对收缩和选择算子(least absolute shrinkage and selection operator, LASSO)算法构建了由18个B细胞标志基因构成的BCMGS。在内部验证中,时间依赖性ROC曲线显示此模型在1年、3年和5年总体生存期(overall survival,OS)的AUC值分别为0.746、0.739和0.671,提示具有较好的预测效能;Kaplan‒Meier 生存曲线提示基于此模型的生存风险分层,低风险组的OS显著高于高风险组(p<0.0001)。在GSE30219、GSE30210和GSE50081这3个队列进行外部验证时也得到了同样的结果。肿瘤免疫微环境(tumor immune microenvironment,TIME)分析提示,与高风险组相比,低风险组的ESTIMATE免疫评分较高(p=0.0044),程序性死亡配体1(programmed death ligand 1,PD-L1)的免疫表型评分(immunophenoscore,IPS)也较高(p<0.0001),提示其更可能从免疫治疗中获益。基于上述结论,我们在接受新辅助免疫治疗后行根治性手术切除的国家癌症中心(National Cancer Center,NCC)队列(n=27)中验证模型对新辅助免疫治疗反应及远期预后的预测能力。我们根据是否达到MPR将所有患者分为MPR组(n=12)和非MPR组(n=15),发现MPR组的风险评分显著低于非MPR组(p<0.001);根据是否达到病理完全缓解(pathological complete response, pCR)将所有患者分为pCR组(n=7)和非pCR组(n=20),发现pCR组的风险评分显著低于非pCR组(p=0.019)。同时根据模型评分的中位数进行生存风险分层,结果显示低风险组的RFS显著高于高风险组(p=0.037)。 结论:基于单细胞转录组学结合常规转录组学,本研究构建了一个由18个B细胞标志基因构成的预测模型,可准确预测NSCLC患者接受新辅助免疫治疗后行根治性切除术后的远期生存。此模型可辅助临床医师对NSCLC患者新辅助治疗后的远期生存风险进行早期识别,从而实现治疗决策的精准化。
第三部分 基于转录组学构建预测非小细胞肺癌新辅助免疫治疗后远期生存的程序性细胞死亡模型
目的:本研究旨在整合公共数据库及本地数据,基于转录组学构建程序性细胞死亡(programmed cell death,PCD)预测模型,以预测NSCLC患者接受新辅助免疫治疗后行根治性手术切除的 RFS,探索其在预后评估中的应用价值。 方法:我们从TCGA和GEO数据库获取NSCLC患者的RNA-seq数据和临床信息。从PCD相关基因中筛选在肿瘤组织和正常组织的差异表达基因(differentially expressed genes,DEGs)。通过单因素、多因素Cox回归分析筛选与预后相关的DEGs,应用机器学习算法构建预测模型。以TCGA数据集作为训练集,以3个独立的GEO数据集作为外部验证集,根据模型计算出的风险评分中位数进行生存风险分层,使用Kaplan‒Meier 生存曲线比较不同风险亚组之间的预后差异。基于PCD模型和临床特征构建预测生存风险的列线图并比较列线图与其它临床特征的预测效果。之后我们使用多种算法评估PCD模型与免疫治疗应答的相关性。接下来,我们收集本地接受新辅助免疫治疗后行根治性手术切除的 NSCLC 患者队列的临床信息及随访数据,并对手术切除标本进行转录组测序,从而验证模型对远期生存的预测能力。最后,我们对上述标本进行多重免疫荧光(multiplex immunofluorescence,mIF)染色,对不同风险亚组之间的肿瘤微环境差异进行进一步探讨。 结果:通过对TCGA-LUAD队列的差异分析,鉴定出423个DEGs。基于单因素Cox回归分析结果,筛选出对预后有显著影响的87个DEGs行进一步分析。随后,应用LASSO算法筛选出ERO1A 等7个基因用于构建PCD模型。无论是在内部验证还是外部验证中,Kaplan‒Meier 生存曲线提示基于此模型进行生存风险分层后,低风险组的OS均显著优于高风险组。基于年龄、临床分期和风险评分构建了预测1年、3年及5年OS的列线图,校准曲线显示该列线图具有较高的预测准确性。DCA分析结果表明列线图的临床净收益显著高于其它临床特征。在训练集及外部验证集中,时间依赖性ROC曲线均提示列线图有良好的预测效能。肿瘤免疫微环境分析结果表明,低风险组的免疫浸润水平显著高于高风险组,提示其更可能从免疫治疗中获益。基于上述结论,我们在接受新辅助免疫治疗后行根治性手术切除的NCC队列(n=27)中验证PCD模型的预测能力。根据模型公式计算风险评分,将NCC队列分为高风险组(n=13)和低风险组(n=14)。我们发现高风险组患者的RFS显著短于低风险组患者,且较低的风险评分与较高的pCR/MPR率相关。此外,mIF染色结果提示低风险组肿瘤内浸润的非耗竭型CD8+T细胞的比例显著高于高风险组。上述结果均表明低风险组患者显示出更好的新辅助免疫治疗疗效。 结论:本研究基于转录组学数据构建了一个由7个PCD相关基因构成的预测模型,可准确预测NSCLC患者接受新辅助免疫治疗后的远期生存,有助于临床医师对患者进行生存风险分层,从而实现治疗的精准化和个体化。
|
论文文摘(外文): |
Part 1 Constructing a Radiomics-based Risk Model to Predict Major Pathological Response in Non-small Cell Lung Cancer Patients Undergoing Neoadjuvant Immunotherapy
Objective: Using multi-center data, this study aims to construct a radiomics-based risk model to predict whether non-small cell lung cancer (NSCLC) patients receiving neoadjuvant immunotherapy achieve major pathological response (MPR), and to explore its potential value as a non-invasive tool for assessing therapeutic efficacy. Methods: We retrospectively collected imaging, clinical, and pathological data from 263 NSCLC patients who underwent surgery following neoadjuvant immunochemotherapy at three medical centers in China between February 2018 and June 2024. Patients were divided into an MPR group and a Non-MPR group according to whether they achieved MPR after neoadjuvant immunotherapy. Of these, 199 patients (MPR: Non-MPR=98:101) from one medical center formed the total training cohort, and 64 patients (MPR: Non-MPR=34:30) from the other two medical centers formed the external validation cohort. After the regions of interest (ROIs) of all tumors were segmented on chest CT, a total of 1037 radiomics features were extracted. The total training cohort was then randomly divided in a 3:1 ratio into an internal training cohort (n=149) and an internal validation cohort (n=50). Using significance testing and machine learning methods, three predictive models were developed: (1) a clinical model based on clinical features; (2) a radiomics model based on radiomics features; (3) a combined model integrating clinical and radiomics features. We then evaluated the predictive performance of each model using receiver operating characteristic (ROC) curves, the area under the curve (AUC), DeLong's test, and decision curve analysis (DCA). The best-performing model was used to construct a nomogram. The goodness of fit of the nomogram was assessed with calibration curves (Hosmer-Lemeshow test), and its predictive performance was further validated in the external validation cohort. 
Results: In the total training cohort, four clinical features that differed significantly between groups (age, N stage, smoking history, and pathological type) were used to construct the clinical model. A radiomics model was built from 8 optimal radiomics features, including original_shape_Flatness. A combined model was then developed by integrating both sets of features. Across 200 random splits into training and internal validation groups, both the radiomics model and the combined clinical-radiomics model showed higher predictive performance than the clinical model. DCA in the training group and the external validation cohort indicated that the combined clinical-radiomics model provided a higher and more stable net benefit. In the internal training cohort, the best-performing combined model (AUC: 0.875) significantly outperformed both the clinical model (AUC: 0.750, p=0.005) and the radiomics model (AUC: 0.797, p<0.001). In the internal validation cohort, the combined model (AUC: 0.837) performed significantly better than the clinical model (AUC: 0.692, p=0.011) and did not differ significantly from the radiomics model (AUC: 0.814, p=0.719). In the external validation cohort, the combined model (AUC: 0.782) again outperformed the clinical model (AUC: 0.678, p=0.019) and did not differ significantly from the radiomics model (AUC: 0.687, p=0.106), although its AUC was numerically higher than that of the radiomics model. A nomogram was constructed from this combined model, and the calibration curve indicated a good fit (p=0.192). 
Therefore, this combined model, comprising 4 clinical features and 8 radiomics features, was chosen as the optimal model for predicting whether NSCLC patients achieve MPR after neoadjuvant immunotherapy (accuracy: 0.805, sensitivity: 0.789, specificity: 0.822). Conclusion: This study developed a radiomics-based model that can effectively predict whether NSCLC patients receiving neoadjuvant immunotherapy achieve MPR. However, the value of clinical features should not be overlooked. The combined model integrating radiomics and clinical features provides a new reference for assessing the therapeutic efficacy of neoadjuvant immunotherapy in NSCLC patients.
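The performance metrics reported in Part 1 (AUC, accuracy, sensitivity, specificity) can be illustrated with a minimal Python sketch. It is not the thesis's analysis code: the labels and scores are made-up toy values, the function names are hypothetical, and the 0.5 classification threshold is an assumption, since the abstract does not state the cutoff used.

```python
def auc_from_scores(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive case has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_sensitivity_specificity(labels, scores, threshold=0.5):
    """Call a case positive when score >= threshold (assumed cutoff) and
    summarize the confusion matrix. Assumes both classes are present."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical toy data: 1 = MPR, 0 = Non-MPR; scores are invented model outputs.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

toy_auc = auc_from_scores(toy_labels, toy_scores)
toy_acc, toy_sens, toy_spec = accuracy_sensitivity_specificity(toy_labels, toy_scores)
print(f"AUC={toy_auc:.3f} acc={toy_acc:.3f} sens={toy_sens:.3f} spec={toy_spec:.3f}")
```

The pairwise definition makes explicit that AUC does not depend on any threshold, whereas accuracy, sensitivity, and specificity do.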
Part 2 Constructing a Single-cell Transcriptomics-based B Cell Marker Gene Signature to Predict Long-term Survival in Non-small Cell Lung Cancer Patients Undergoing Neoadjuvant Immunotherapy
Objective: By integrating public databases and local data, this study aims to construct a B cell marker gene signature (BCMGS) based on single-cell transcriptomics combined with bulk transcriptomics to predict recurrence-free survival (RFS) in non-small cell lung cancer (NSCLC) patients who undergo radical surgery after neoadjuvant immunotherapy, and to explore its potential value in prognostic evaluation. Methods: Single-cell RNA sequencing (scRNA-seq) data and bulk RNA sequencing (RNA-seq) data from NSCLC patients were obtained from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) databases. NSCLC-related B cell marker genes were identified through comprehensive analysis of the scRNA-seq data. Univariate and multivariate Cox regression analyses were performed to select marker genes associated with prognosis, and machine learning algorithms were applied to construct a predictive model. The TCGA dataset served as the training cohort, and three independent GEO datasets served as external validation cohorts. The predictive performance of the BCMGS was validated using time-dependent receiver operating characteristic (ROC) curves and the area under the curve (AUC). Based on the median risk score calculated by the model, each cohort was divided into a high-risk group and a low-risk group, and Kaplan-Meier survival curves with the log-rank test were used to compare prognosis between the two groups. We then analyzed differences in the tumor immune microenvironment between the risk subgroups to assess the correlation between the BCMGS and the response to immunotherapy. Finally, we collected clinical information and follow-up data from a local cohort of NSCLC patients who underwent radical surgical resection after neoadjuvant immunotherapy. Transcriptomic sequencing of the surgical resection specimens was performed to validate the model's ability to predict long-term survival. 
Results: A total of 603 B cell marker genes associated with NSCLC were identified from the scRNA-seq data of the GSE131907 cohort. Based on the results of univariate Cox regression analysis, 49 significant B cell marker genes were selected for further analysis (p<0.01). Next, using the least absolute shrinkage and selection operator (LASSO) algorithm, an 18-gene prognostic signature based on B cell marker genes was established. In internal validation, the time-dependent ROC curves of the BCMGS yielded AUC values for overall survival (OS) at 1, 3, and 5 years of 0.746, 0.739, and 0.671, respectively, indicating relatively good predictive accuracy. The TCGA-LUAD cohort was divided into high-risk and low-risk groups using the median risk score as the cutoff. Kaplan-Meier survival curves showed significantly better OS in the low-risk group than in the high-risk group (p<0.0001). Similar results were obtained in the external validation cohorts GSE30219, GSE30210, and GSE50081. Patients with lower risk scores had significantly higher immune scores (p=0.0044). The immunophenoscore (IPS) for programmed death ligand 1 (PD-L1) was compared between the two groups in the TCGA-LUAD cohort and was higher in the low-risk group (p<0.0001), suggesting that the low-risk group may benefit more from immunotherapy. Based on these findings, we further evaluated the performance of the BCMGS in predicting response to neoadjuvant immunotherapy and long-term prognosis in the National Cancer Center (NCC) cohort of 27 NSCLC patients who underwent radical surgery after neoadjuvant immunotherapy. Patients in the major pathological response (MPR) group had lower risk scores than those in the non-MPR group (p<0.001). Similarly, patients who achieved pathological complete response (pCR) had lower risk scores than those who did not (p=0.019). 
Furthermore, the NCC cohort was classified into high-risk and low-risk groups using the median risk score as the cutoff. Kaplan‒Meier curves showed that patients in the high-risk group had significantly shorter RFS (p=0.037). Conclusion: By integrating scRNA-seq and bulk RNA-seq data, this study constructed an 18-gene prognostic signature derived from B cell marker genes, which has the potential to predict the prognosis and immunotherapy response of NSCLC patients receiving neoadjuvant immunotherapy. This signature can help clinicians stratify survival risk in NSCLC patients and support more precise, individualized treatment.
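The median-based risk stratification described in Part 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the thesis's implementation: the gene names, coefficients, and expression values are hypothetical placeholders (the 18 genes and their LASSO coefficients are not listed in this abstract), the risk score is assumed to be the usual linear combination of coefficient times expression, and patients whose score equals the median are assigned to the low-risk group by convention.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum over signature genes of coefficient * expression.
    `expression` maps gene name -> expression value for one patient."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median,
    otherwise 'low' (ties with the median go to 'low' by assumption)."""
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return cutoff, groups


# Hypothetical two-gene signature; the real signature has 18 genes.
toy_coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}

# Hypothetical expression profiles for four patients.
toy_cohort = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 2.0},
    "P4": {"GENE_A": 6.0, "GENE_B": 4.0},
}

toy_scores = {pid: risk_score(expr, toy_coefficients) for pid, expr in toy_cohort.items()}
toy_cutoff, toy_groups = stratify_by_median(toy_scores)
print(toy_scores, toy_cutoff, toy_groups)
```

With an odd cohort size, this tie convention places the median patient in the low-risk group, so the low-risk group is one patient larger than the high-risk group.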
Part 3 Constructing a Transcriptomics-based Programmed Cell Death Risk Model to Predict Long-term Survival in Non-small Cell Lung Cancer Patients Undergoing Neoadjuvant Immunotherapy
Objective: By integrating public databases and local data, this study aims to construct a transcriptomics-based programmed cell death (PCD) risk model to predict recurrence-free survival (RFS) in NSCLC patients who undergo radical surgery after neoadjuvant immunotherapy, and to explore its potential value in prognostic evaluation. Methods: RNA sequencing (RNA-seq) data and clinical information for NSCLC patients were collected from the TCGA and GEO databases. Among PCD-related genes, those differentially expressed between tumor and normal tissue samples (differentially expressed genes, DEGs) were identified. Univariate and multivariate Cox regression analyses were performed to identify DEGs associated with prognosis, and machine learning algorithms were applied to construct a predictive model. The TCGA dataset served as the training cohort, and three independent GEO datasets served as external validation cohorts. Based on the median risk score calculated by the model, each cohort was divided into a high-risk group and a low-risk group. Kaplan-Meier survival curves with the log-rank test were used to compare prognosis between the two groups. A nomogram for predicting survival risk was established from the PCD model and clinical features, and its predictive performance was compared with that of other clinical features. We then used multiple algorithms to evaluate the correlation between the PCD model and immunotherapy response. Next, we collected clinical information and follow-up data from a local cohort of NSCLC patients who underwent radical surgical resection after neoadjuvant immunotherapy. Transcriptomic sequencing of the surgical resection specimens was performed to validate the model's ability to predict long-term survival. Finally, multiplex immunofluorescence (mIF) staining was performed on these specimens to further explore differences in the tumor microenvironment between risk subgroups. Results: A total of 423 DEGs between tumor and normal samples were identified in the TCGA-LUAD cohort. 
Based on the results of univariate Cox regression analysis, 87 DEGs significantly associated with prognosis were selected. Subsequently, 7 genes, including ERO1A, were chosen by LASSO regression analysis to build the PCD model. Kaplan-Meier survival curves showed significantly longer OS in the low-risk group than in the high-risk group in both internal and external validation. A nomogram comprising age, clinical stage, and risk score was established to predict OS at 1, 3, and 5 years. Calibration curves showed that this nomogram had favorable predictive accuracy. DCA showed that the clinical net benefit of the nomogram was significantly higher than that of other clinical features. In both the training cohort and the external validation cohorts, time-dependent ROC curves indicated that the nomogram had good predictive performance. Tumor immune microenvironment (TIME) analysis showed higher immune cell infiltration in the low-risk group, suggesting that this group may be more likely to benefit from immunotherapy. Based on these findings, we assessed the predictive ability of the PCD model in the National Cancer Center (NCC) cohort (n=27) of NSCLC patients who underwent radical surgical resection after neoadjuvant immunotherapy. The RFS of patients in the high-risk group (n=13) was significantly shorter than that of patients in the low-risk group (n=14). Additionally, lower risk scores were associated with higher pCR/MPR rates. Furthermore, mIF staining showed that the proportion of tumor-infiltrating non-exhausted CD8+ T cells was significantly higher in the low-risk group than in the high-risk group. Together, these results indicate that patients in the low-risk group responded better to neoadjuvant immunotherapy. Conclusion: This study constructed a predictive model of 7 PCD-related genes based on transcriptomic data, which has the potential to accurately predict the long-term survival of NSCLC patients receiving neoadjuvant immunotherapy. The model can help clinicians stratify survival risk in NSCLC patients and support more precise, individualized treatment.
|
开放日期: | 2025-05-30 |