Thesis information

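Thesis title (Chinese):

 Part I: Construction of a Lung Adenocarcinoma Prognostic Model Based on Transcriptome Data; Part II: Function and Prognostic Value of FABP5 in EMT and Sarcomatoid Transformation of Non-Small Cell Lung Cancer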

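Name:

 Liu Lei (刘雷)

Thesis language:

 chi

Degree:

 Doctorate

Degree type:

 Professional degree

University:

 Peking Union Medical College (北京协和医学院)

Department:

 Cancer Hospital, Peking Union Medical College (北京协和医学院肿瘤医院)

Major:

 Clinical Medicine - Oncology

Advisor:

 Gao Shugeng (高树庚)

Thesis completion date: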

 2022-05-10    

Thesis title (foreign language):

 Construction of Lung Adenocarcinoma Prognostic Model Based on Transcriptome Data; The Function and Prognosis of FABP5 in EMT and Sarcomatoid Transformation of Non-small Cell Lung Cancer

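Keywords (Chinese):

 Lung adenocarcinoma; Prediction model; Prognostic signature; Risk score; Epithelial-mesenchymal transition (EMT); Pulmonary sarcomatoid carcinoma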

Keywords (foreign language):

 Lung adenocarcinoma (LUAD); Prognostic model; Prognostic signature; Risk score; Epithelial-mesenchymal transition (EMT); Pulmonary sarcomatoid carcinoma (PSC)

Abstract (Chinese):

Part I:

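Background: The prognosis of patients with lung adenocarcinoma (LUAD) varies widely; even for pathological stage IA disease, the 5-year overall survival rate is only 63-79%. To identify high-risk patients and support clinical decision-making, new prognostic markers need to be explored and combined with TNM staging to achieve better risk stratification. Gene sequencing technology has developed rapidly since the start of this century, allowing an increasing number of tumor-related molecular prognostic markers to be identified, and it has broad prospects for application.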

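Methods: Transcriptome data from The Cancer Genome Atlas lung adenocarcinoma cohort (TCGA-LUAD) were used as the training set to screen for prognosis-related genes, which were combined with clinical data to develop a new prognostic model for LUAD. The model was validated in an independent dataset (GSE50081) from the GEO database. Kaplan-Meier survival curves, Lasso regression, and multivariable Cox regression were used to identify prognosis-related genes. The area under the receiver operating characteristic curve (AUC-ROC) was used to assess the discrimination of the prognostic model, and calibration curves were used to assess the accuracy of the predicted probabilities. The online analysis tool TIMER2.0 was used to determine the relative abundance of immune-infiltrating lymphocytes.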

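Results: This study established a four-gene prognostic model (CENPH, MYLIP, PITX3, and TRAF3IP3). CENPH (HR=1.31, p<0.001) and PITX3 (HR=1.24, p<0.001) were risk genes, while MYLIP (HR=0.62, p<0.001) and TRAF3IP3 (HR=0.75, p=0.017) were protective genes. A four-gene risk score was calculated from the risk coefficients of the four genes. Univariate analysis showed that the four-gene risk score predicted overall survival (OS) in the TCGA-LUAD and GSE50081 cohorts, with HRs for the high-risk versus the low-risk group of 2.73 (p<0.001) and 2.72 (p<0.001), respectively. Multivariable Cox regression including clinical factors showed that the four-gene risk score was an independent prognostic factor for OS in the TCGA-LUAD and GSE50081 cohorts, with HRs for the high-risk versus the low-risk group of 2.34 (p<0.001) and 2.10 (p=0.017), respectively. Combining risk group and clinical factors, we built a comprehensive prognostic model for predicting OS. The AUCs for 1-year, 3-year, and 5-year OS were 0.750, 0.737, and 0.719 in the training set and 0.645, 0.766, and 0.725 in the validation dataset. Calibration curves showed good agreement between predicted and observed probabilities. In addition, the relative abundance of immune-infiltrating lymphocytes determined with TIMER2.0 showed that, in both cohorts, CD4+ T cells were relatively less abundant in the high-risk group than in the low-risk group (both p<0.001).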

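Conclusions: We identified four prognostic genes (CENPH, MYLIP, PITX3, and TRAF3IP3). The risk score based on their mRNA expression is an independent prognostic factor for OS in LUAD and can be used to identify high-risk patients across TNM stages. Compared with conventional TNM staging, the comprehensive prognostic model combining the prognostic genes with clinical features showed better predictive ability for OS.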

Part II:

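Background: Lung cancer has the highest mortality of any cancer. Although treatment has advanced considerably, outcomes are often unsatisfactory because of distant metastasis or drug resistance, and the 5-year survival rate of non-small cell lung cancer (NSCLC) is only about 4-17%. Epithelial-mesenchymal transition (EMT) is a dynamic process associated with invasion, metastasis, and drug resistance in many tumors; after EMT, expression of its marker E-cadherin typically decreases and/or expression of vimentin increases. Pulmonary sarcomatoid carcinoma (PSC) is a highly malignant form of NSCLC characterized by coexisting epithelial and mesenchymal components, a tendency to metastasize, and poor response to chemotherapy. Studies have confirmed that PSC often contains both classic carcinoma components and sarcomatoid components, and that EMT is an important factor in the development of the sarcomatoid component. NSCLC is a tumor of epithelial origin that acquires greater invasiveness, metastatic capacity, and drug resistance during EMT and sarcomatoid transformation, which makes this transformation process an important subject of research.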

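Methods: Differentially expressed genes were screened using our group's PSC transcriptome data (with the different components distinguished) and TCGA lung adenocarcinoma transcriptome data, and were then filtered for genes negatively correlated with CDH1 and positively correlated with VIM (Spearman correlation) to obtain candidate marker genes. The GEPIA2 online analysis platform was then used to find genes associated with LUAD prognosis, narrowing the candidates down to a target gene. The correlation between this gene and EMT was then validated in transcriptome data from TIMER, LCR, GEPIA2, K-M Plotter, and other databases. Finally, validation was performed at the transcript level and at the protein level (immunohistochemical staining) in independent LUAD and PSC cohorts from our hospital, respectively, and the effect of protein expression on prognosis was analyzed. R software (Version 4.0.3) was used for differential expression screening, gene correlation validation (Spearman correlation), and plotting. SPSS (Version 25.0) was used for data analysis, including the independent-samples t-test, KM survival analysis, and multivariable Cox regression. Survival time was measured in months. The Log-rank test was used for survival analysis of prognostic factors. Fisher's exact test and the Wilcoxon test were used for categorical and continuous variables, respectively. P<0.05 was considered statistically significant.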

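Results: The PSC transcriptome data came from 14 patients and comprised 7 adenocarcinoma-component samples and 14 sarcomatoid-component samples. Differential expression analysis between the sarcomatoid and adenocarcinoma components yielded 1986 DEGs (differentially expressed genes), of which 575 were upregulated in the sarcomatoid component. TCGA-LUAD transcriptome data were downloaded, and the 575 genes were filtered for those whose expression was negatively correlated with CDH1 and positively correlated with VIM, giving 22 EMT signature genes. Analysis of the candidate genes in the GEPIA2 database showed that only the expression of FABP5 (HR=1.6, p=0.001) and CTSL (HR=1.6, p=0.003) was significantly associated with prognosis. FABP5, which had the stronger correlation with the EMT markers (correlation coefficient -0.194 with CDH1 and 0.247 with VIM), was selected as the focus of the study. In validation with public databases, analysis of the GSE72094 LUAD dataset gave similar results, and FABP5 also tended to correlate positively with the other important EMT-related genes SNAI1, SNAI2, ZEB1, and ZEB2. In a further pan-cancer analysis, FABP5 showed similar correlation trends with EMT-related markers in breast, kidney, liver, and pancreatic cancers. In a further validation in our hospital's independent LUAD cohort, FABP5 expression was higher in the VIM high-expression group and lower in the CDH1 high-expression group, but neither difference reached statistical significance. Finally, immunohistochemical staining in our hospital's independent PSC cohort showed that FABP5 expression was negatively correlated with E-Cadherin (R= -0.17, p=0.173) and positively correlated with Vimentin (R=0.39, p=0.001). Univariate and multivariable proportional hazards analyses showed that FABP5 expression was an independent prognostic factor in PSC (HR=2.17, CI:1.11-4.30, p=0.024).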

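Conclusions: This study analyzed and mined transcriptome data from the different components of PSC together with TCGA lung adenocarcinoma transcriptome data, and identified FABP5 as a gene closely associated with EMT and sarcomatoid transformation. After validation in multiple public databases and in our hospital's independent cohorts, we conclude the following: 1. FABP5 is closely associated with EMT and sarcomatoid transformation in NSCLC, but the specific molecular mechanism requires further study. 2. FABP5 is an independent prognostic factor in NSCLC (including PSC), and its high expression is associated with poor prognosis.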

 

Abstract (foreign language):

Part I:

Background: The prognosis of patients with lung adenocarcinoma (LUAD) is highly variable. Even for stage IA LUAD, the 5-year overall survival rate is only 63-79%. To identify high-risk patients and support clinical decision-making, it is therefore necessary to explore new prognostic markers that, combined with the TNM staging system, achieve better risk stratification. Gene sequencing technologies have developed rapidly over this century, enabling us to identify an increasing number of cancer-related molecular prognostic markers with promising applications.

Method: We used transcriptomic data from The Cancer Genome Atlas lung adenocarcinoma cohort (TCGA-LUAD) as a training set to identify prognostic genes, and combined these genes with clinical data to develop a new prognostic model for LUAD. We validated the model in an independent dataset (GSE50081) from the GEO database. Kaplan-Meier survival curves, Lasso regression, and multivariable Cox regression were used to identify prognostic genes. The area under the receiver operating characteristic curve (AUC-ROC) was used to assess the discrimination of the prognostic model, and calibration curves were used to assess the accuracy of the predicted probabilities. An online analysis tool (TIMER 2.0) was applied to determine the relative abundance of immune-infiltrating lymphocytes.
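As an illustration of the discrimination measure named above, the following is a minimal Python sketch of an AUC at a fixed time horizon. It is a simplification under stated assumptions, not the procedure used in the thesis: patients censored before the horizon are simply dropped, the function name `auc_at_horizon` is hypothetical, and the toy numbers are invented for the example.

```python
from typing import Sequence


def auc_at_horizon(scores: Sequence[float], times: Sequence[float],
                   events: Sequence[int], horizon: float) -> float:
    """Rank-based AUC for 'death by `horizon`' versus 'alive beyond `horizon`'.

    Assumption: patients censored before the horizon are excluded, which
    ignores the censoring adjustments used by dedicated time-dependent
    ROC methods.
    """
    # Cases: an observed event at or before the horizon.
    cases = [s for s, t, e in zip(scores, times, events)
             if e == 1 and t <= horizon]
    # Controls: follow-up extends beyond the horizon (event or censoring later).
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0      # higher score for the patient who died
            elif case_score == control_score:
                concordant += 0.5      # ties count as half
    return concordant / (len(cases) * len(controls))


# Invented toy data: risk scores, follow-up in months, event flag (1 = death).
toy_scores = [2.1, 0.5, 0.4, 0.9, 1.5, 0.2]
toy_times = [10, 30, 70, 65, 20, 80]
toy_events = [1, 1, 0, 0, 0, 1]
toy_auc = auc_at_horizon(toy_scores, toy_times, toy_events, horizon=36)
```

In the toy data the fifth patient is censored at 20 months and is therefore left out of the 36-month comparison; an AUC of 0.5 would correspond to no discrimination and 1.0 to perfect ranking.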

Result: This study identified a four-gene prognostic model (CENPH, MYLIP, PITX3 and TRAF3IP3), with CENPH (HR=1.31, p<0.001) and PITX3 (HR=1.24, p<0.001) as risk genes and MYLIP (HR=0.62, p<0.001) and TRAF3IP3 (HR=0.75, p=0.017) as protective genes. A four-gene risk score was calculated based on the coefficients and expression levels. Using univariate Cox regression analysis, we found that the four-gene risk score could serve as a prognostic predictor in the TCGA-LUAD cohort and the GSE50081 cohort. For the high-risk group compared to the low-risk group, HRs were 2.73 (p<0.001) and 2.72 (p<0.001) in the TCGA-LUAD and GSE50081 cohorts, respectively. After adjustment for clinical factors, multivariable Cox regression analysis showed that the risk score was an independent prognostic factor in the TCGA-LUAD cohort and the GSE50081 cohort. For the high-risk group compared to the low-risk group, HRs were 2.34 (p<0.001) and 2.10 (p=0.017) in the TCGA-LUAD and GSE50081 cohorts, respectively. We combined the risk score and clinical factors to develop a comprehensive prognostic model. The AUCs for 1-year, 3-year, and 5-year OS were 0.750, 0.737, and 0.719, respectively, in the training cohort and 0.645, 0.766, and 0.725, respectively, in the validation dataset. Calibration curves showed a good match between predicted and actual probabilities. In addition, we determined the relative abundance of immune-infiltrating lymphocytes with an online analysis tool (TIMER 2.0), which revealed a higher abundance of CD4+ T cells in the low-risk group than in the high-risk group in both cohorts (both p<0.001).
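The risk score described above can be sketched as a weighted sum of expression values. The Python example below is only an illustration under explicit assumptions: it takes ln(HR) of the hazard ratios reported in this abstract as the per-gene coefficients (the thesis's actual coefficients are not given here), it splits patients into high- and low-risk groups at the median score (the cut-off is not stated in the abstract), and the expression values are invented.

```python
import math
from statistics import median
from typing import Dict, List

# Hazard ratios as reported in the abstract. Using ln(HR) as the model
# coefficient is an assumption made for this sketch.
REPORTED_HR: Dict[str, float] = {
    "CENPH": 1.31,
    "MYLIP": 0.62,
    "PITX3": 1.24,
    "TRAF3IP3": 0.75,
}
COEF: Dict[str, float] = {gene: math.log(hr) for gene, hr in REPORTED_HR.items()}


def risk_score(expression: Dict[str, float]) -> float:
    """Linear predictor: sum over genes of coefficient * expression level."""
    return sum(COEF[gene] * expression[gene] for gene in COEF)


def split_by_median(scores: List[float]) -> List[str]:
    """Label each score 'high' or 'low' relative to the cohort median
    (a hypothetical cut-off chosen for illustration)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Invented expression values for four hypothetical patients.
patients = [
    {"CENPH": 3.0, "MYLIP": 1.0, "PITX3": 2.0, "TRAF3IP3": 1.0},
    {"CENPH": 1.0, "MYLIP": 3.0, "PITX3": 0.5, "TRAF3IP3": 2.5},
    {"CENPH": 2.0, "MYLIP": 2.0, "PITX3": 1.0, "TRAF3IP3": 2.0},
    {"CENPH": 0.5, "MYLIP": 4.0, "PITX3": 0.2, "TRAF3IP3": 3.0},
]
scores = [risk_score(p) for p in patients]
groups = split_by_median(scores)
```

Risk genes (HR above 1) get positive coefficients and protective genes (HR below 1) get negative ones, so higher expression of CENPH or PITX3 raises the score while higher expression of MYLIP or TRAF3IP3 lowers it.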

Conclusions: We identified four prognostic genes (CENPH, MYLIP, PITX3 and TRAF3IP3) in this study. The four-gene risk score we developed is an independent prognostic factor for LUAD and can be used to identify high-risk patients across different TNM stages. Compared to the conventional TNM staging system, a comprehensive prognostic model combining the prognostic genes with clinical features showed better predictive ability for OS.

 

Part II:

BACKGROUND: Lung cancer is the cancer with the highest mortality rate. Despite great advances in the treatment of lung cancer, outcomes are often suboptimal because of the susceptibility to distant metastasis or drug resistance, resulting in a 5-year survival rate of only approximately 4-17% for non-small cell lung cancer (NSCLC). Epithelial-mesenchymal transition (EMT) is a dynamic process associated with invasion, metastasis and drug resistance in a variety of tumors. EMT is often manifested by a decrease in the expression of its marker E-Cadherin and/or an increase in the expression of Vimentin. Pulmonary sarcomatoid carcinoma (PSC) is a highly malignant non-small cell lung cancer that contains both epithelial and mesenchymal components. It metastasizes easily and responds poorly to chemotherapy. Studies have confirmed that PSCs often contain both classic carcinoma components and sarcomatoid components, and that EMT is an important factor in the occurrence of the sarcomatoid components. NSCLC is a tumor of epithelial origin that acquires greater invasiveness, metastatic ability and drug resistance during EMT and sarcomatoid transformation, so this transformation process has important research value.

METHODS: Transcriptomic data of PSCs (distinguishing different components) and TCGA lung adenocarcinoma were used to screen differentially expressed genes, which were then filtered for genes negatively correlated with CDH1 and positively correlated with VIM (using the Spearman correlation coefficient) to obtain candidate gene markers. The GEPIA2 online analysis platform was then used to find genes associated with the prognosis of LUAD, narrowing the candidates down to a target gene. The correlation between the target gene and EMT was then validated at the transcriptome level in TIMER, LCR, GEPIA2, K-M Plotter and other databases. Finally, validation at the transcript level and at the protein level (immunohistochemical staining) was performed in our independent LUAD and pulmonary sarcomatoid carcinoma cohorts, respectively, and the prognostic impact of protein expression was evaluated. R software (Version 4.0.3) was used for differentially expressed gene screening, gene correlation validation (using the Spearman correlation coefficient) and plotting. SPSS (Version 25.0) software was used for data analysis, including the independent-samples t-test, KM survival analysis, and multivariable Cox regression analysis. Survival time was measured in months. Survival analysis of prognostic factors was performed using the Log-rank test. Fisher's exact test and the Wilcoxon test were used for statistical analysis of categorical and continuous variables, respectively. P<0.05 was considered statistically significant.
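The correlation filter in the methods above (negative Spearman correlation with CDH1, positive with VIM) can be sketched in a few lines. This Python example is a hedged illustration rather than the thesis's R code: the helper names, the gene names `GENE_A` to `GENE_C`, and the expression values are invented, and no significance or effect-size threshold is applied because the abstract does not state one.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom else 0.0  # constant input: treat as no correlation


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def emt_candidates(expr: Dict[str, List[float]], genes: List[str]) -> List[str]:
    """Keep genes negatively correlated with CDH1 and positively with VIM."""
    return [g for g in genes
            if spearman(expr[g], expr["CDH1"]) < 0
            and spearman(expr[g], expr["VIM"]) > 0]


# Invented expression values across five hypothetical samples.
expr = {
    "CDH1": [9, 7, 5, 3, 1],
    "VIM": [1, 2, 4, 6, 9],
    "GENE_A": [2, 3, 5, 8, 9],
    "GENE_B": [8, 6, 5, 2, 1],
    "GENE_C": [1, 5, 2, 4, 3],
}
selected = emt_candidates(expr, ["GENE_A", "GENE_B", "GENE_C"])
```

A sign-only filter like this also keeps weakly correlated genes such as the toy `GENE_C`; a real analysis would typically add a cut-off on the coefficient or its p-value.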

RESULTS: Transcriptome data of pulmonary sarcomatoid carcinoma were obtained from 14 patients and comprised 7 adenocarcinoma-component samples and 14 sarcomatoid-component samples. Differential expression analysis between the sarcomatoid and adenocarcinoma components yielded 1986 DEGs (differentially expressed genes), of which 575 were upregulated in the sarcomatoid component. The TCGA-LUAD transcriptome data were downloaded, and the 575 genes were filtered for those negatively correlated with CDH1 and positively correlated with VIM, yielding 22 candidate genes associated with EMT. The GEPIA2 database was used to analyze the effect of candidate gene expression on prognosis, and only the expression of FABP5 (HR=1.6, p=0.001) and CTSL (HR=1.6, p=0.003) was significantly associated with prognosis. FABP5, which had the stronger correlation with the EMT markers (correlation coefficient -0.194 with CDH1 and 0.247 with VIM), was selected as the key gene for study. In validation with public databases, analysis of the GSE72094 dataset gave similar results, and FABP5 also showed positive correlation trends with other important EMT-associated genes, SNAI1, SNAI2, ZEB1 and ZEB2. Further pan-cancer analysis revealed that FABP5 showed similar trends of correlation with EMT-related indicators in breast, kidney, liver and pancreatic cancers. The finding was then re-validated in our independent adenocarcinoma cohort, where FABP5 expression was high in the VIM high-expression group and low in the CDH1 high-expression group, but neither difference reached statistical significance. Finally, immunohistochemical staining in an independent cohort of pulmonary sarcomatoid carcinomas at our institution revealed that FABP5 expression was negatively correlated with E-Cadherin (R= -0.17, p=0.173) and positively correlated with Vimentin (R=0.39, p=0.001). Multivariable proportional hazards model analysis showed that the expression of FABP5 was an independent prognostic factor for pulmonary sarcomatoid carcinoma (HR=2.17, CI:1.11-4.30, p=0.024).
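A reported hazard ratio, confidence interval and p-value can be cross-checked against one another. The Python sketch below does this for the FABP5 result, assuming the interval is a 95% Wald interval on the log-hazard scale; the abstract does not state the confidence level, so that is an assumption, and the function name `wald_p_from_ci` is hypothetical.

```python
import math
from typing import Tuple


def wald_p_from_ci(hr: float, lower: float, upper: float,
                   z_crit: float = 1.959964) -> Tuple[float, float]:
    """Recover the Wald z statistic and two-sided p-value from HR and its CI.

    Assumes the CI is symmetric on the log scale: ln(HR) +/- z_crit * SE,
    with z_crit defaulting to the 95% normal quantile.
    """
    standard_error = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / standard_error
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Values as reported in the abstract for FABP5 in pulmonary sarcomatoid carcinoma.
z_stat, p_value = wald_p_from_ci(hr=2.17, lower=1.11, upper=4.30)
```

Under that assumption the recovered p-value falls between 0.02 and 0.03, which agrees with the reported p=0.024 to within rounding of the published figures.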

CONCLUSION: In this study, we analyzed the transcriptomic data of different components of pulmonary sarcomatoid carcinoma together with TCGA-LUAD data and identified FABP5 as a gene closely associated with EMT and sarcomatoid transformation. After validation in multiple public databases and our independent cohorts, we concluded that: 1. FABP5 in NSCLC is closely associated with EMT and sarcomatoid transformation, but the exact molecular mechanism still needs further study. 2. FABP5 is an independent prognostic factor in NSCLC, including pulmonary sarcomatoid carcinoma, and its high expression is associated with poor prognosis.

Release date:

 2022-06-03    
