Thesis information


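Thesis title (Chinese):	基于多组学检测的食管鳞状细胞癌及其癌前病变早期诊断的临床研究
Author:	Liu Siyao (刘思瑶)
Thesis language:	chi
Degree:	Doctorate
Degree type:	Professional degree
University:	Peking Union Medical College (北京协和医学院)
Department:	Cancer Hospital, Peking Union Medical College
Major:	Clinical Medicine - Oncology
Supervisor:	Wang Guiqi (王贵齐)
On-campus supervisory team members (comma-separated):	Zhao Dongbing (赵东兵), He Shun (贺舜)
Thesis completion date:	2024-04-09
Thesis title (foreign language):	Clinical Study on Early Diagnosis of Esophageal Squamous Cell Carcinoma and its Precancerous Lesions Based on Multi-Omics Analysis
Keywords (Chinese):	Esophageal tumor; squamous cell carcinoma; cfDNA; terminal motifs; isomiRs
Keywords (foreign language):	Esophageal tumor; Squamous cell carcinoma; cfDNA; Terminal motifs; IsomiR variations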
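Abstract (Chinese):	︿ Objectives: Through genomic, transcriptomic, and proteomic molecular testing of populations with esophageal squamous cell carcinoma (ESCC) and esophageal squamous precancerous lesions (ESPL), to screen for differential markers and reveal how molecular changes at the multi-omics level are associated with ESCC and its precancerous lesions; to explore and establish diagnostic models for ESCC and its precancerous lesions based on circulating cell-free DNA (cfDNA), microRNA (miRNA), and protein markers in blood, and to evaluate their feasibility for early ESCC detection and for triaging populations at high risk of esophageal cancer; and to provide multi-omics evidence for non-invasive screening, early diagnosis, and etiological research on ESCC and its precancerous lesions.

Materials and methods: The study was conducted in three parts, which examined the performance and application value in early ESCC diagnosis of, respectively, a prediction model based on cfDNA terminal motif detection, a multi-omics prediction model combining cfDNA terminal motifs with protein markers, and a diagnostic model based on microRNA isoforms (miRNA isoforms, isomiRs) detected in plasma miRNA.

Establishment and validation of a prediction model for ESCC and its precancerous lesions based on cfDNA terminal motif detection
Between August 2021 and November 2022, plasma samples were prospectively collected at the Cancer Hospital, Chinese Academy of Medical Sciences from 201 patients with ESCC, 46 with esophageal high-grade intraepithelial neoplasia (HGIN), 46 with low-grade intraepithelial neoplasia (LGIN), 176 with benign esophageal lesions, and 29 healthy controls for cfDNA extraction, library construction, sequencing, and data analysis. ESCC patients and control subjects were randomly divided into a training set (n=284) and a validation set (n=122). In the training set, differential features were screened from the cfDNA terminal motifs of the ESCC and control groups after alignment to the human genome hg19. Following dimensionality reduction by principal component analysis, ten-fold cross-validation, and random forest recursive feature elimination, the random forest classification model Motif-1 was trained, and its performance was validated in the validation set and in the precancerous lesion group (n=92). After patients with precancerous lesions were added to the dataset, all samples were re-randomized into a training set (n=243), a validation set (n=105), and a test set (n=150). Using the same method as for Motif-1, the random forest model Motif-2 was trained on the training set, the optimal threshold was determined in the validation set, and diagnostic performance for ESCC and ESPL was assessed in the test set.

Establishment and validation of a prediction model for ESCC and its precancerous lesions based on cfDNA terminal motifs combined with tumor-associated protein markers
Between August 2021 and January 2023, plasma samples were prospectively collected at the Cancer Hospital, Chinese Academy of Medical Sciences from 199 patients with ESCC, 45 with HGIN, 46 with LGIN, 157 with benign esophageal lesions, and 44 healthy controls for cfDNA extraction, library construction, and sequencing. Matched testing was performed for nine digestive-system protein markers in common clinical use (AFP, CA19-9, CA24-2, CA72-4, CEA, Cyfra21-1, SCC, PG I, and PG II), and the results were subjected to bioinformatics analysis. Samples from the ESCC, ESPL, and control groups were randomly divided into a training set (n=240), a validation set (n=103), and a test set (n=148). In the training set, the cfDNA terminal motif matrices of the ESCC, ESPL, and control groups were standardized and screened for differential features. Following principal component analysis, ten-fold cross-validation, and random forest recursive feature elimination, the random forest Motif model was trained; the optimal threshold was determined in the validation set, and diagnostic performance was assessed in the test set. The matrix of concentrations of the nine protein markers and the PG I/PG II ratio (PGR) in the training set underwent random forest recursive feature elimination and ten-fold cross-validation to train the random forest 8-Proteins model; its performance was validated and its optimal threshold determined in the validation set, and its detection performance in the precancerous lesion and ESCC subgroups was examined in the test set. By matching the cfDNA terminal motif results and protein concentration results of each sample, the 30 principal component dimensions obtained during Motif modeling were combined with the 10 protein markers and likewise passed through random forest recursive feature elimination and ten-fold cross-validation to train the random forest Motif-Protein model; the threshold was then determined in the validation set, and diagnostic performance was tested and validated in the test set.

Establishment and validation of a prediction model for ESCC and its precancerous lesions based on isomiR detection
The sample source was the same as in the second part; 10 samples were excluded because the plasma volume was insufficient for miRNA detection. In total, plasma samples from 195 patients with ESCC, 44 with HGIN, 45 with LGIN, 154 with benign esophageal lesions, and 43 healthy controls underwent miRNA extraction, cDNA library construction, sequencing, data analysis, and counts-per-million normalization. All samples were randomly divided into a training set (n=235), a validation set (n=101), and a test set (n=145). In the training set, isomiRs confirmed by alignment to the human genome hg38 were analyzed by t-test to screen features that differed between ESCC and its precancerous lesions and the control group, and the IsomiRs random forest model was constructed using ten-fold cross-validation and random forest recursive feature elimination. Model parameters were optimized and validated using the validation and test sets. The miRNA families of the isomiRs used for model construction were further annotated with target genes in three databases (miRecords, miRTarBase, and TarBase), followed by pathway enrichment analysis and prediction of disease-related pathways.

Statistical analysis: Statistical analysis and data visualization were performed using R (version 4.2.0) and Python (version 3.8.0). Continuous variables are described as median (interquartile range). Categorical variables are described as number (percentage) and were compared using the chi-square test or the continuity-corrected chi-square test. Numerical variables were compared between two groups using the independent-samples t-test or the Wilcoxon rank-sum test, and among three groups using the Kruskal-Wallis test. Receiver operating characteristic curves were plotted with the R package pROC (version 1.18.0), and the area under the curve (AUC), sensitivity, specificity, and 95% confidence intervals (CI) were calculated to evaluate model performance (two-sided tests, α = 0.05). AUCs were compared using the DeLong test. Model cut-off values were determined by the Youden index.

Results: At the level of cfDNA genomics, the first part of the study tested plasma cfDNA from 498 ESCC, precancerous lesion, and control samples and identified 102 cfDNA terminal motif features that differed between ESCC and controls, and 74 that differed between ESCC plus its precancerous lesions and controls. Based on these differential features, two random forest prediction models for ESCC and its precancerous lesions were constructed. The Motif-1 model, built without precancerous lesions, had an overall sensitivity of 90.0%, a specificity of 77.4%, and an AUC of 0.88 in the validation set. Its average sensitivities for correctly identifying LGIN, HGIN, and T1aN0 esophageal cancer as positive were 76.1%, 80.4%, and 91.2%, respectively; its average sensitivity for HGIN and T1aN0 esophageal cancer with an indication for endoscopic treatment was 85.0%; and its prediction score and sensitivity increased with clinical stage (P < 0.001). The Motif-2 model, built with precancerous lesions included, had an overall average sensitivity of 87.5%, an average specificity of 77.4%, and an AUC of 0.86 in the test set. Its average sensitivities for correctly identifying esophageal precancerous lesions and ESCC were 80.0% and 89.7%, respectively, and its average sensitivity for the combined prediction of HGIN and T1aN0 esophageal cancer was 89.4%.

At the level of cfDNA genomics combined with proteomics, the second part of the study jointly tested plasma cfDNA and commonly used clinical digestive-system protein markers in 491 ESCC, precancerous lesion, and control samples, and found 203 cfDNA terminal motif features and 5 protein markers (CEA, Cyfra21-1, PG I, PG II, and PGR) that differed significantly between ESCC plus its precancerous lesions and controls. The Motif model, a random forest model based on the differential plasma cfDNA terminal motif features, distinguished ESCC and its precancerous lesions from controls in the test set with an overall average sensitivity of 89.7%, an average specificity of 55.7%, and an AUC of 0.84. Its average sensitivities for identifying LGIN, HGIN, and ESCC as positive were 93.3%, 100%, and 86.9%, respectively, and its average sensitivity for the subgroup of HGIN and T1aN0 esophageal cancer with an indication for endoscopic treatment was 90.2%. The 8-Proteins model, a random forest model based on eight digestive tract tumor-associated protein markers (CA24-2, CA72-4, CEA, Cyfra21-1, SCC, PG I, PG II, and PGR), accurately distinguished plasma from the esophageal cancer and precancerous lesion groups from control samples in the training, test, and validation sets (average AUC = 0.80, 0.86, and 0.82). Its overall sensitivity was 81.6% (95% CI: 73.5%-89.7%) and its specificity was 68.9% (95% CI: 57.2%-80.5%); the average sensitivity was 65.4% in the precancerous lesion group and 88.5% across all esophageal cancers. In the HGIN+T1aN0 group of patients with an indication for endoscopic surgery, which was of particular interest, the sensitivity of the 8-Proteins model was 80.5% (95% CI: 68.4%-92.6%). The Motif-Protein model, a multi-omics random forest model combining cfDNA terminal motifs and protein markers, showed comparatively superior ability to identify patients with esophageal cancer and precancerous lesions in the test set, with an AUC of 0.90, an overall sensitivity of 88.5%, and a specificity of 75.4%. The model correctly identified 90.9% (95% CI: 73.9%-100%) of HGIN, 86.8% (95% CI: 76.1%-97.6%) of stage I ESCC, and 87.8% (95% CI: 77.8%-97.8%) of HGIN or T1aN0 ESCC. Its prediction scores effectively separated patients with precancerous lesions, stage I-II esophageal cancer, and stage III-IV esophageal cancer from controls, and in particular separated patients with low-grade esophageal squamous intraepithelial neoplasia from controls, highlighting its potential as a promising early diagnostic model.

The AUC values of the Motif-Protein model (AUC = 0.90), the Motif model (AUC = 0.84), and the 8-Proteins model (AUC = 0.82) did not differ significantly (all P > 0.05). However, compared with the areas under the curve of the commonly used clinical digestive-system protein markers CEA, SCC, CA19-9, CA24-2, PG I, PG II, and PGR, all three models had a significant advantage (all P < 0.05), and their sensitivities were higher than those of the single protein markers for diagnosing ESCC and its precancerous lesions. The Motif-Protein model, comprising 20 cfDNA terminal motif principal components and 6 protein features, not only accurately distinguished ESCC and its precancerous lesions from controls but also showed the best detection performance for ESCC and ESPL compared with the cfDNA terminal motif model and the 8-Proteins model.

At the level of miRNA omics, the third part of the study tested plasma miRNA and aligned isomiRs in 481 ESCC, precancerous lesion, and control samples, and found 1607 isomiRs that were upregulated and 1454 that were downregulated in the ESCC and ESPL groups. Using these differentially expressed isomiRs, the IsomiRs random forest prediction model was constructed for diagnosing ESCC and its precancerous lesions. Its AUC was 0.88, 0.75, and 0.74 in the training, validation, and test sets, respectively. In the test set, its average sensitivity for distinguishing ESCC and ESPL from controls was 82.6%, but its average specificity was only 49.2%. In the test set, the model's sensitivity reached 92.0% (95% CI: 81.4%-100%) for the ESPL group, 100% for LGIN, and 85.7% (95% CI: 67.4%-100%) for HGIN. Target gene prediction for the miRNA families of the 30 isomiRs used to build the IsomiRs model showed that the 149 target genes were enriched in cancer-related signaling pathways, including the PI3K-AKT pathway, the p53 pathway, cell cycle-related pathways, and the intrinsic apoptotic signaling pathway. Disease gene network prediction indicated that these predicted target genes were also enriched in pathways related to malignancies such as cancers of the tongue, larynx, stomach, colorectum, pancreas, and gallbladder, and intrahepatic cholangiocarcinoma.

Conclusions: The three plasma detection models based on cfDNA terminal motifs (Motif-1, Motif-2, and Motif) have relatively high sensitivity and specificity for ESCC and its precancerous lesions and may be useful for early ESCC detection. For identifying esophageal squamous precancerous lesions, including precancerous lesions in model construction effectively improves the sensitivity of the prediction model, especially for HGIN+T1aN0 ESCC amenable to endoscopic resection.

The 8-Proteins model, which jointly tests eight protein markers (CA24-2, CA72-4, CEA, Cyfra21-1, SCC, PG I, PG II, and PGR), distinguishes patients from controls better than single tumor marker tests in predicting ESCC and its precancerous lesions. This strategy could promote the development of diagnostic kits and provides a direction for clinical translation.

Combined detection of cfDNA terminal motifs and protein markers can assist the early detection of ESCC and its precancerous lesions. The Motif-Protein model, comprising 20 cfDNA terminal motif principal components and 6 protein features, showed the best detection performance for ESCC and its precancerous lesions compared with the cfDNA terminal motif model and the combined tumor protein marker model, and could be used for ESCC screening and for triaging populations at high risk of ESCC before endoscopy.

Using differential isomiRs to detect ESCC and its precancerous lesions may lead to a relatively high false-positive rate. The 30 isomiRs used to build the IsomiRs model will support further research on the etiology of ESCC and its precancerous lesions, the construction of combined multi-omics models, disease diagnosis, and pan-cancer screening. ﹀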
Abstract (foreign language):	︿ Objectives: Through genomic, transcriptomic, and proteomic analyses of esophageal squamous cell carcinoma (ESCC) and esophageal squamous precancerous lesions (ESPL), this study aims to identify differential biomarkers and characterize the molecular changes associated with ESCC and ESPL. We construct and test diagnostic models for these conditions based on circulating cell-free DNA (cfDNA), microRNA (miRNA), and protein biomarkers in plasma, and evaluate their feasibility for early ESCC detection and for triaging high-risk individuals before endoscopy. The overall goal is to provide multi-omics evidence for non-invasive screening, early diagnosis, and etiological research on ESCC and ESPL.

Materials and methods: The study comprises three parts, each addressing a different approach to early ESCC diagnosis. The first part develops and validates prediction models based on cfDNA terminal motifs. The second part examines a multi-omics prediction model that integrates cfDNA terminal motifs with protein markers. The third part examines a diagnostic model based on miRNA isoforms (isomiRs) detected in plasma miRNA, assessing its performance and potential application value in early ESCC detection.

Development and Validation of Predictive Models for Esophageal Squamous Cell Carcinoma and its Precancerous Lesions Using Terminal Motif Analysis in Circulating Cell-Free DNA (cfDNA)
Between August 2021 and November 2022, we prospectively collected plasma samples from 498 individuals at the Department of Endoscopy, Cancer Hospital, Chinese Academy of Medical Sciences for cfDNA extraction, library construction, and sequencing. This cohort included patients with ESCC, patients with ESPL, and control subjects.
We analyzed samples from 201 patients with ESCC, 46 with high-grade intraepithelial neoplasia (HGIN), 46 with low-grade intraepithelial neoplasia (LGIN), 176 with benign esophageal lesions, and 29 healthy controls. ESCC patients and control subjects were randomly assigned to a training set (n=284) and a validation set (n=122). In the training set, cfDNA terminal motif matrices were z-score normalized, and features that differed between ESCC cases and controls were selected. The random forest classifier Motif-1 (M1) was then developed using principal component analysis (PCA), ten-fold cross-validation, and random forest recursive feature elimination (RF-RFE). M1’s performance was validated in the validation set and in the precancerous lesion group. Subsequently, individuals with precancerous lesions were added to the dataset, and all participants were randomly reallocated to new training (n=243), validation (n=105), and test (n=150) cohorts. Using the same procedure as for M1, we developed the Motif-2 (M2) random forest model on the training cohort. M2 was evaluated in the validation cohort to determine the optimal threshold, and its diagnostic performance was then assessed in the test cohort.

Development and Validation of Predictive Models for Esophageal Squamous Cell Carcinoma and its Precancerous Lesions Based on Combined Detection of cfDNA Terminal Motifs and Tumor-Associated Protein Markers
From August 2021 to January 2023, we prospectively collected plasma samples from 491 individuals at the Cancer Hospital, Chinese Academy of Medical Sciences for cfDNA extraction, library preparation, and sequencing. This cohort included 199 patients with ESCC, 46 with HGIN, 45 with LGIN, 157 with benign esophageal lesions, and 44 healthy controls. In matched samples, we also measured nine digestive-system protein markers in common clinical use: AFP, CA19-9, CA24-2, CA72-4, CEA, Cyfra21-1, SCC, PG I, and PG II.
The subsequent bioinformatics analysis evaluated the diagnostic value of these markers individually and in combination. Samples from the ESCC, ESPL, and control groups were randomized into a training set (n=240), a validation set (n=103), and a test set (n=148). In the training set, cfDNA terminal motif matrices for ESCC, ESPL, and controls were z-score normalized, and differential features were selected using Student’s t-test. Following PCA, ten-fold cross-validation, and RF-RFE, the random forest Motif model was trained on the training set. The model was then evaluated on the validation set to determine the cut-off value, and its diagnostic performance was tested in the test set. Similarly, the concentration matrices of the nine protein markers and the PG I/PG II ratio (PGR) in the training set underwent RF-RFE and ten-fold cross-validation to develop the 8-Proteins model, which was then evaluated in the validation set and the test set. By matching the cfDNA terminal motif results with the protein concentration data from the same samples, the random forest Motif-Protein model was developed. The 30 dimensions obtained from PCA during Motif modeling were combined with the 10 protein features (the nine markers plus PGR), and this integrated dataset likewise underwent RF-RFE and ten-fold cross-validation to train the model. The model was evaluated in the validation set to determine the optimal threshold, and its diagnostic performance was then assessed in the test set.

Development and Validation of a Predictive Model for Esophageal Squamous Cell Carcinoma and its Precancerous Lesions Based on IsomiR Detection in Circulating Cell-Free RNA (cfRNA)
The sample sources were identical to those in part two, with 10 samples excluded due to insufficient plasma volume for miRNA detection.
Thus, miRNA extraction, cDNA library construction, sequencing, and data analysis, with counts-per-million (CPM) normalization, were performed on plasma samples from 195 patients with ESCC, 44 with HGIN, 45 with LGIN, 154 with benign esophageal lesions, and 43 healthy controls. All samples were randomly divided into a training set (n=235), a validation set (n=101), and a test set (n=145). In the training set, isomiRs confirmed by alignment to the human genome hg38 were screened for differential features using Student’s t-test. An IsomiRs random forest model was constructed using ten-fold cross-validation and RF-RFE. Model parameters were optimized with data from the validation set and then evaluated in the test set. Additionally, the miRNA families of the isomiRs used in model construction were annotated with target mRNAs in three databases (miRecords, miRTarBase, and TarBase), followed by pathway enrichment analysis and disease-related pathway prediction.

Statistical analysis: Statistical analysis and data visualization were performed using R (version 4.2.0) and Python (version 3.8.0). Continuous variables were described by the median and interquartile range (IQR), and categorical variables as numbers and percentages. Categorical variables were compared using Chi-square or Fisher's exact tests as applicable. For numerical variables, the independent-samples Student’s t-test or the Wilcoxon test was used for two-group comparisons, and the Kruskal-Wallis test for three-group comparisons. Model performance was assessed by plotting receiver operating characteristic (ROC) curves and calculating the area under the curve (AUC), sensitivity, specificity, and 95% confidence intervals (CIs) using the pROC package (version 1.18.0) in R, with a significance level of α = 0.05 for two-sided tests.
AUCs were compared using DeLong’s test, and the optimal cut-off value for each model was determined by the Youden index.

Results:

cfDNA genomic analysis: In the first part of the study, analysis of 498 plasma cfDNA samples revealed 102 cfDNA terminal motif features that differed between the ESCC and control groups, and 74 features that distinguished ESCC and its precancerous lesions from controls. We developed two cfDNA terminal motif-based predictive models for ESCC and its precancerous lesions. The first model, M1, achieved a sensitivity of 90.0%, a specificity of 77.4%, and an AUC of 0.88 in the validation cohort. For LGIN, HGIN, and T1aN0 esophageal cancer, M1’s sensitivities were 76.1%, 80.4%, and 91.2%, respectively. The sensitivity for the combined group of HGIN and T1aN0 esophageal cancer, which are indications for endoscopic treatment, reached 85.0%. Both the prediction score and the sensitivity increased with advancing clinical stage (P < 0.001). The second model, M2, had a sensitivity of 87.5%, a specificity of 77.4%, and an AUC of 0.86 in the test cohort. M2’s sensitivities for detecting precancerous lesions and ESCC were 80.0% and 89.7%, respectively, and its combined sensitivity for HGIN and T1aN0 esophageal cancer was 89.4%.

Integrated genomic and proteomic analysis: The second part of the study analyzed plasma cfDNA and 10 digestive-system protein features (nine markers plus PGR) in 491 samples. This integrated approach identified 203 cfDNA terminal motif features and 5 differentially expressed protein markers (CEA, Cyfra21-1, PG I, PG II, and PGR) that differed significantly between ESCC and its precancerous lesions and the control group.
We built a random forest Motif model from the differential motif features above to detect ESCC and its precancerous lesions. In the test set, this model achieved an overall sensitivity of 89.7%, a specificity of 55.7%, and an AUC of 0.84. It identified LGIN, HGIN, and ESCC with average sensitivities of 93.3%, 100%, and 86.9%, respectively, and achieved a sensitivity of 90.2% in the subgroup of HGIN and T1aN0 ESCC suitable for endoscopic treatment. An 8-Proteins model, a random forest model based on eight tumor-associated protein markers (CA24-2, CA72-4, CEA, Cyfra21-1, SCC, PG I, PG II, PGR), was built to predict ESCC and its precancerous lesions. The model distinguished ESCC and precancerous plasma samples from controls, with average AUC values of 0.80, 0.86, and 0.82 in the training, validation, and test cohorts, respectively. It showed an overall sensitivity of 81.6% (95% CI: 73.5%-89.7%) and a specificity of 68.9% (95% CI: 57.2%-80.5%). The sensitivity was 65.4% for precancerous lesions and 88.5% for esophageal cancers. Among patients with HGIN or T1aN0 ESCC, who have an indication for endoscopic treatment, the model attained a sensitivity of 80.5% (95% CI: 68.4%-92.6%). We also developed a combined Motif-Protein model, which showed the strongest ability to identify patients with ESCC and ESPL in the test set, with an AUC of 0.90, an overall sensitivity of 88.5%, and a specificity of 75.4%. The model correctly identified 90.9% (95% CI: 73.9%-100%) of HGIN cases, 86.8% (95% CI: 76.1%-97.6%) of stage I ESCC, and 87.8% (95% CI: 77.8%-97.8%) of HGIN or T1aN0 ESCC.
Moreover, its prediction scores effectively separated patients with ESPL, stage I-II esophageal cancer, and stage III-IV esophageal cancer from the control group, and in particular separated patients with LGIN from controls. This underscores its potential as a promising model for early diagnosis. The AUC values of the Motif-Protein model (AUC=0.90), the Motif model (AUC=0.84), and the 8-Proteins model (AUC=0.82) did not differ significantly (all P > 0.05). However, compared with the AUCs of the conventional digestive-system protein markers (CEA, SCC, CA19-9, CA24-2, PG I, PG II, and PGR), each of the three models showed a significant advantage (all P < 0.05). The sensitivities of the three models were also consistently higher than those of the individual protein markers in diagnosing ESCC and ESPL. The Motif-Protein model, which incorporates 20 principal components of cfDNA terminal motifs and 6 protein features, accurately distinguished ESCC and its precancerous lesions from controls and showed the best detection performance for ESCC and ESPL among the three models.

miRNA omics analysis: The third part of the study analyzed plasma miRNA and aligned isomiRs in 481 samples. This analysis revealed 1,607 upregulated and 1,454 downregulated isomiRs in the ESCC and ESPL groups. The IsomiRs model, a random forest model built from these differentially expressed isomiRs, had AUCs of 0.88, 0.75, and 0.74 in the training, validation, and test datasets, respectively. In the test set, the model distinguished ESCC/ESPL from controls with an average sensitivity of 82.6% but an average specificity of only 49.2%.
Specifically, the model's sensitivity reached 92.0% (95% CI: 81.4%-100%) for ESPL, 100% for LGIN, and 85.7% (95% CI: 67.4%-100%) for HGIN. Target gene prediction for the miRNA families of the 30 isomiRs used in the IsomiRs model identified 149 target mRNAs from database intersections. Enrichment analyses indicated that these target genes are enriched in cancer-related signaling pathways, including the PI3K-AKT and p53 pathways, as well as pathways related to the cell cycle and intrinsic apoptosis. Disease gene network predictions further indicated that these target mRNAs are enriched in pathways implicated in various malignancies, such as cancers of the tongue, larynx, stomach, colorectum, pancreas, gallbladder, and intrahepatic bile ducts.

Conclusions: The three cfDNA terminal motif-based models (Motif-1, Motif-2, and Motif) show relatively high sensitivity and specificity for ESCC and its precancerous lesions and may be useful for early ESCC detection. For detecting precancerous lesions of ESCC, including these lesions in model construction improves the sensitivity of the predictive model. The improvement is particularly notable in HGIN+T1aN0 ESCC, which is amenable to endoscopic treatment.

The 8-Proteins model, which combines eight protein markers (CA24-2, CA72-4, CEA, Cyfra21-1, SCC, PG I, PG II, PGR), distinguishes patients with ESCC or ESPL from controls better than single tumor marker tests. This strategy could support the development of diagnostic kits and provides a direction for clinical translation.

Integrating cfDNA terminal motifs with protein marker detection can assist the early detection of ESCC and its precancerous lesions.
The Motif-Protein model, which combines 20 principal components of cfDNA terminal motifs and 6 protein features, showed the best detection performance for both ESCC and ESPL compared with the model based solely on cfDNA terminal motifs and the model based on the combined tumor protein markers. This model could therefore be used for ESCC screening and for triaging individuals at high risk of ESCC before endoscopy.

The use of differential isomiRs for detecting ESCC and its precancerous lesions may result in an elevated false-positive rate. The 30 isomiRs used in the IsomiRs model will support further research on the etiology of ESCC and its precancerous lesions, the construction of combined multi-omics models, disease diagnosis, and pan-cancer screening. ﹀
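Illustrative note (not part of the thesis record): the modelling workflow named in the abstract (z-score normalization, t-test feature screening, PCA, random forest recursive feature elimination, ten-fold cross-validation) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses Python with scikit-learn on synthetic data, the 256-column matrix, `k=100`, the tree counts, and keeping 20 components after elimination are all hypothetical choices, and `f_classif` stands in for the t-test (for two groups the ANOVA F-test is equivalent to an equal-variance two-sided t-test). Only the 30 PCA dimensions and the training-set size of 240 echo numbers given in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a samples x motif-frequency matrix (hypothetical shape).
X, y = make_classification(
    n_samples=240, n_features=256, n_informative=20, random_state=0
)

pipeline = Pipeline([
    # z-score normalization of every feature column
    ("zscore", StandardScaler()),
    # univariate screening; keeping the top 100 features is an assumption
    ("screen", SelectKBest(f_classif, k=100)),
    # reduce the screened features to 30 principal components
    ("pca", PCA(n_components=30, random_state=0)),
    # recursive feature elimination driven by random forest importances
    ("rfe", RFE(
        RandomForestClassifier(n_estimators=50, random_state=0),
        n_features_to_select=20,
        step=5,
    )),
    # final random forest classifier trained on the retained components
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Ten-fold stratified cross-validation, scored by ROC AUC.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print(f"mean cross-validated AUC: {auc_scores.mean():.3f}")
```

Wrapping every step in one `Pipeline` means the scaling, screening, and PCA are refitted inside each fold, so no information from the held-out fold leaks into feature selection.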
Open access date:	2024-05-31
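Illustrative note (not part of the thesis record): the abstract states that model cut-offs were chosen by the Youden index and that sensitivity and specificity were reported with 95% confidence intervals. The sketch below shows both calculations in Python with scikit-learn, whereas the thesis used the R package pROC. The labels and scores are toy values, and the counts 71 of 87 are an assumption chosen only because they reproduce a sensitivity of about 81.6%; the record gives neither the underlying counts nor the interval method, so the normal-approximation (Wald) interval here is likewise an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def youden_cutoff(y_true, scores):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, scores, drop_intermediate=False)
    best = int(np.argmax(tpr - fpr))  # J = tpr - fpr at each candidate threshold
    return float(thresholds[best]), float(tpr[best]), float(1.0 - fpr[best])


def wald_ci(successes, total, z=1.96):
    """Normal-approximation 95% CI for a proportion, truncated to [0, 1]."""
    p = successes / total
    half_width = z * np.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Toy data: 4 controls (label 0) and 5 cases (label 1) with model scores.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.6, 0.5, 0.55, 0.7, 0.8, 0.9])

cutoff, sens, spec = youden_cutoff(y_true, scores)
auc = roc_auc_score(y_true, scores)
ci_low, ci_high = wald_ci(71, 87)  # hypothetical counts, see note above

print(f"cutoff={cutoff}, sensitivity={sens}, specificity={spec}, AUC={auc}")
print(f"sensitivity 71/87 with Wald 95% CI: {ci_low:.3f} to {ci_high:.3f}")
```

A sample is called positive when its score is at or above the returned cut-off. Intervals truncated at 100%, as several in the abstract are, are what a Wald interval produces when the estimate is close to 1 and the subgroup is small.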