Differentiating prostate cancer from prostatic hyperplasia: classification prediction and analysis of cancer risk factors

    Abstract:
      Background  Prostate cancer is one of the most common malignant tumors of the male genitourinary system. Rapidly differentiating prostate cancer from prostatic hyperplasia remains a clinical challenge, and reliable methods for classification and prediction are needed.
      Objective  To construct an XGBoost-based classification model that distinguishes prostate cancer from prostatic hyperplasia, to identify risk indicators for prostate cancer, and to evaluate their value in early diagnosis.
      Methods  Clinical data on patients with prostate cancer or prostatic hyperplasia were obtained from the “Prostate Cancer Dataset” (provided in 2019 by the National Clinical Medical Science Data Center of the Chinese PLA General Hospital and the National Population and Health Science Data Sharing Platform). After preprocessing, the data were split into a training set and a test set at a ratio of 7:3. An XGBoost model classifying prostate cancer versus prostatic hyperplasia was built with its parameters determined on the training set, and its effectiveness was then validated on the test set. Finally, the clinical significance of the model's features was analyzed using the SHapley Additive exPlanations (SHAP) method.
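As a minimal sketch of the 7:3 split described in the Methods, the helper below performs a seeded, unstratified random shuffle. The abstract does not report whether the split was stratified or which random seed was used, so both are assumptions, and the function name `split_7_3` is hypothetical. The cohort sizes in the usage example come from the Results; XGBoost training and SHAP interpretation would then operate on the two returned subsets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_7_3(records: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly split records into a 70% training set and a 30% test set.

    Hypothetical helper: the seed and the unstratified shuffle are assumptions,
    not details reported in the abstract.
    """
    shuffled = list(records)  # copy so the caller's sequence is left untouched
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_train = round(len(shuffled) * 0.7)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Cohort sizes from the Results: 1 224 cancer and 1 255 hyperplasia patients.
    labels = [1] * 1224 + [0] * 1255  # 1 = prostate cancer, 0 = prostatic hyperplasia
    train, test = split_7_3(labels)
    print(len(train), len(test))
```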
      Results  A total of 1 224 patients with prostate cancer (mean age 65.86 years) and 1 255 patients with prostatic hyperplasia (mean age 67.70 years) were included. Twenty-three features, including age, body mass index (BMI), prostate specific antigen (PSA)-related indicators and other biochemical test indicators, were selected to construct the classification model. For predicting prostate cancer, the model's AUC, accuracy, recall, precision and F1 score were 0.81, 0.74, 0.70, 0.72 and 0.74, respectively. Free PSA/total PSA, total PSA, inorganic phosphorus and free PSA were the four most important indicators for the early diagnosis of prostate cancer. SHAP analysis showed that free PSA/total PSA ≤ 0.132 and inorganic phosphorus ≥ 1.09 mmol/L were cut-off values that warrant attention in the diagnosis of prostate cancer.
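The evaluation metrics listed in the Results can be computed from test-set labels and model scores as in the stdlib-only sketch below. It assumes prostate cancer is the positive class and that hard predictions use a 0.5 probability threshold (the abstract does not state the threshold). AUC is computed as the Mann-Whitney statistic, i.e. the probability that a randomly chosen cancer case scores higher than a randomly chosen hyperplasia case, and F1 is the harmonic mean of precision and recall. The function name and the toy data are hypothetical and do not reproduce the study's results.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_score: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """AUC, accuracy, precision, recall and F1 with 1 = prostate cancer (positive class)."""
    # Hard predictions at an assumed probability threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # AUC as the fraction of (cancer, hyperplasia) pairs in which the cancer
    # case receives the higher score; tied pairs count as one half.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"auc": auc, "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy data, not the study's test set.
    y_true = [1, 1, 1, 0, 0, 0]
    y_score = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
    print(classification_metrics(y_true, y_score))
```

Computing AUC from ranked scores rather than from thresholded predictions reflects the fact that AUC summarizes performance across all thresholds, while the other four metrics depend on the single chosen cut-off.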
      Conclusion  The XGBoost algorithm can be used to construct an effective model for classifying patients with prostate cancer versus prostatic hyperplasia, and the cut-off values of key risk indicators obtained from SHAP analysis provide a useful reference for early clinical screening and diagnosis of prostate cancer.
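To illustrate how the two SHAP-derived cut-off values could be checked for an individual record, the sketch below assumes a hypothetical lab record holding free PSA and total PSA in the same unit (e.g. ng/mL) and serum inorganic phosphorus in mmol/L. The abstract does not specify how the two thresholds should be combined, so the function reports each criterion separately instead of defining a diagnostic rule; it illustrates the thresholds only and does not reproduce the XGBoost model. The class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# Cut-off values reported from the SHAP analysis.
FPSA_TPSA_CUTOFF = 0.132
PHOSPHORUS_CUTOFF_MMOL_L = 1.09


@dataclass
class LabResult:
    """Hypothetical lab record; field names are illustrative."""
    free_psa: float              # free PSA, same unit as total_psa (e.g. ng/mL)
    total_psa: float             # total PSA
    inorganic_phosphorus: float  # serum inorganic phosphorus, mmol/L


def cutoff_flags(lab: LabResult) -> Dict[str, bool]:
    """Report which SHAP-derived cut-offs a record meets, each checked separately."""
    if lab.total_psa <= 0:
        raise ValueError("total PSA must be positive to compute the fPSA/tPSA ratio")
    ratio = lab.free_psa / lab.total_psa
    return {
        "low_fpsa_tpsa_ratio": ratio <= FPSA_TPSA_CUTOFF,
        "high_inorganic_phosphorus": lab.inorganic_phosphorus >= PHOSPHORUS_CUTOFF_MMOL_L,
    }


if __name__ == "__main__":
    # Hypothetical patient values, not taken from the dataset.
    print(cutoff_flags(LabResult(free_psa=1.0, total_psa=10.0, inorganic_phosphorus=1.20)))
```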
