Development and validation of a prediction model for spread through air spaces (STAS) in stage IA lung cancer presenting as subsolid nodules

  • Abstract:
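    Background For stage IA lung cancer presenting as subsolid nodules (SSNs) on CT, accurate preoperative assessment of the risk of spread through air spaces (STAS) would provide strong support for optimizing the surgical plan and improving patient prognosis.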
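    Objective To develop and validate a machine learning model based on CT data and laboratory test results that identifies patients at high risk of STAS among those with stage IA lung cancer presenting as subsolid nodules.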
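    Methods Clinical data of 2 047 patients with stage IA lung cancer presenting as subsolid nodules on CT, treated at the First and Fourth Medical Centers of PLA General Hospital between May 2021 and September 2025, were retrospectively analyzed. The 1 600 patients from the First Medical Center were randomly assigned, using the random number method, to a training set (n=1 120) and an internal validation set (n=480) at a ratio of 7:3; the 447 patients from the Fourth Medical Center formed the external validation set. Features were selected using univariable logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and multivariable logistic regression. Seven machine learning models were then developed: Naive Bayes (NB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), Single-Layer Neural Network (SLNN), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Model performance was evaluated with receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA). The models were interpreted with Shapley Additive exPlanations (SHAP), and a prediction nomogram was built from the SHAP feature-importance ranking and deployed as a web page.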
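The center-based design can be sketched as follows: one center's cohort is split at random 7:3 into training and internal validation sets, while the other center is held out whole for external validation. This is a minimal sketch under stated assumptions: the helper `split_cohort`, the seed, and the patient identifiers are hypothetical, and a seeded pseudo-random shuffle stands in for the study's random number method, whose exact procedure the abstract does not describe.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=2021):
    """Hypothetical helper: simple random split of one center's cohort
    into a training set and an internal validation set."""
    rng = random.Random(seed)             # dedicated RNG so the split is reproducible
    shuffled = list(patient_ids)          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the two centers' cohorts.
first_center = [f"FMC-{i:04d}" for i in range(1600)]   # development center, split 7:3
fourth_center = [f"4MC-{i:03d}" for i in range(447)]   # external center, never split

train, internal_val = split_cohort(first_center)
external_val = fourth_center
print(len(train), len(internal_val), len(external_val))  # 1120 480 447
```

Keeping the second center entirely out of the split is what makes its validation "external": no patient from that center influences feature selection or model fitting.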
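    Results Among the seven machine learning models, the RF model showed the best predictive performance, with an area under the curve (AUC) of 0.934 (95% CI: 0.902 - 0.966) in the training cohort, 0.929 (95% CI: 0.900 - 0.958) in the internal validation cohort, and 0.873 (95% CI: 0.837 - 0.909) in the external validation cohort. Calibration curves showed good agreement between predicted and observed outcomes, and DCA indicated a high clinical net benefit. SHAP analysis identified the most important predictors of STAS in stage IA lung cancer presenting as subsolid nodules on CT as the consolidation-to-tumor ratio (0.189), maximum tumor diameter (0.079), spiculation (0.037), and tumor-lung interface (0.035).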
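Each AUC above is reported with a 95% CI, but the abstract does not say how the intervals were obtained. As an illustration only, the sketch below computes the AUC as the Mann-Whitney probability that a STAS-positive case receives a higher predicted risk than a STAS-negative case, and attaches a percentile-bootstrap interval. The bootstrap is an assumed choice (DeLong's method is a common alternative), and the function names and toy data are hypothetical, not study data.

```python
import random

def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: probability that a random
    positive (label 1) case scores higher than a random negative; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) CI for the AUC, resampling patients with replacement."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need both positive and negative cases")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[round(alpha / 2 * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi

# Hypothetical toy data: 1 = STAS present, scores = predicted STAS risk.
labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90, 0.05, 0.30]
print(round(auc(labels, scores), 3))  # 0.917
print(bootstrap_auc_ci(labels, scores))
```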
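    Conclusion The RF model based on preoperative imaging semantic features and clinical indicators accurately predicts the risk of STAS in stage IA lung cancer presenting as subsolid nodules and showed good generalizability in external validation. By identifying predictors of STAS before surgery, the model may provide clinical decision support for guiding treatment planning and improving patient prognosis.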

     

    Abstract:
    Background Accurate preoperative assessment of the risk of spread through air spaces (STAS) in stage IA lung cancer manifesting as subsolid nodules (SSNs) on CT would provide crucial support for optimizing surgical strategies and improving patient outcomes.
    Objective To develop and validate a machine learning-based clinical prediction model using CT data and laboratory test results to identify patients at high risk of STAS among those with stage IA lung cancer presenting as subsolid nodules.
    Methods A retrospective analysis was performed on 2 047 patients with stage IA lung cancer presenting as subsolid nodules on CT who were treated at the First and Fourth Medical Centers of PLA General Hospital from May 2021 to September 2025. The 1 600 cases from the First Medical Center were randomly divided, using the random number method, into a training set (n=1 120) and an internal validation set (n=480) at a ratio of 7:3; the 447 cases from the Fourth Medical Center formed the external validation set. Feature selection was performed using univariable logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and multivariable logistic regression. Seven machine learning models were then developed: Naive Bayes (NB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), Single-Layer Neural Network (SLNN), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Discrimination, calibration, and clinical utility were assessed with receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA), respectively. The models were interpreted using Shapley Additive exPlanations (SHAP), and a prediction nomogram built from the SHAP feature-importance ranking was deployed as a web-based application.
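LASSO is usable for feature selection because its L1 penalty drives the coefficients of uninformative predictors exactly to zero. Below is a minimal, dependency-free sketch of L1-penalized logistic regression fitted by proximal gradient descent (ISTA). The solver, step size, penalty `lam`, function names, and toy data are all assumptions for illustration: the abstract does not state how the LASSO model was fitted or how its penalty was chosen (cross-validation is a typical choice).

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def lasso_logistic(X, y, lam, step=0.5, n_iter=2000):
    """Hypothetical helper: L1-penalized logistic regression via ISTA.

    Minimizes mean(log(1 + exp(z_i)) - y_i * z_i) + lam * sum(|w_j|), z_i = b + w . x_i.
    The intercept b is not penalized. step must not exceed 1 / L, where L is the
    Lipschitz constant of the loss gradient (0.5 is safe for the toy data below).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        z = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]
        resid = [1.0 / (1.0 + math.exp(-zi)) - yi for zi, yi in zip(z, y)]  # sigmoid(z) - y
        grad_w = [sum(r * xi[j] for r, xi in zip(resid, X)) / n for j in range(p)]
        grad_b = sum(resid) / n
        # Gradient step on the smooth loss, then the L1 proximal step on w only.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b

# Hypothetical toy data: feature 0 tracks the outcome, feature 1 is unrelated to it.
X = [[-1, 1], [-1, -1], [-1, 0], [1, 1], [1, -1], [1, 0]]
y = [0, 0, 0, 1, 1, 1]
names = ["informative", "uninformative"]
w, b = lasso_logistic(X, y, lam=0.05)
print([name for name, wj in zip(names, w) if wj != 0.0])  # ['informative']
```

Raising `lam` shrinks more coefficients to zero; with a large enough penalty every feature is dropped, which is why the penalty strength has to be tuned rather than fixed by hand.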
    Results Among the seven machine learning models developed, the RF model demonstrated the best predictive performance, with an area under the curve (AUC) of 0.934 (95% CI: 0.902 - 0.966) in the training cohort, 0.929 (95% CI: 0.900 - 0.958) in the internal validation cohort, and 0.873 (95% CI: 0.837 - 0.909) in the external validation cohort. Calibration curves indicated good agreement between predicted and observed outcomes, and DCA showed that the model provided a high clinical net benefit. SHAP analysis identified the following key factors for predicting STAS in stage IA lung cancer presenting as subsolid nodules on CT: consolidation-to-tumor ratio (0.189), maximum tumor diameter (0.079), spiculation (0.037), and tumor-lung interface (0.035).
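Decision curve analysis compares a model's net benefit with the "treat all" and "treat none" (net benefit 0) strategies across threshold probabilities p_t. A short sketch of the standard net-benefit formula, NB = TP/n - FP/n * p_t / (1 - p_t), follows. The function names and toy labels and risks are hypothetical, and "treat" here simply means acting on a positive STAS prediction.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold:
    TP/n - FP/n * p_t / (1 - p_t)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: act as if every patient were STAS-positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical toy data: 1 = STAS present, probs = predicted STAS risk.
labels = [0, 0, 1, 1]
probs = [0.10, 0.40, 0.35, 0.80]
for pt in (0.2, 0.3, 0.5):
    model_nb = net_benefit(labels, probs, pt)
    all_nb = net_benefit_treat_all(labels, pt)
    print(f"p_t={pt}: model={model_nb:.4f}, treat all={all_nb:.4f}, treat none=0.0000")
```

A model is clinically useful over the threshold range where its curve lies above both reference strategies.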
    Conclusion The RF model based on preoperative imaging semantic features and clinical indicators can accurately predict the risk of STAS in stage IA lung cancer presenting as subsolid nodules and showed good generalizability in external validation. By identifying predictors of STAS preoperatively, it may support treatment decision-making and help improve patient prognosis.

     
