宋亚男, 武惠韬, 应俊, 李琬悦, 陈康, 刘铁城, 张卯年, 张颖. 基于机器学习算法探讨糖尿病视网膜病变的风险因素[J]. 解放军医学院学报, 2021, 42(9): 906-912. DOI: 10.3969/j.issn.2095-5227.2021.09.003
SONG Ya'nan, WU Huitao, YING Jun, LI Wanyue, CHEN Kang, LIU Tiecheng, ZHANG Maonian, ZHANG Ying. Risk factors analysis of diabetic retinopathy based on machine learning[J]. ACADEMIC JOURNAL OF CHINESE PLA MEDICAL SCHOOL, 2021, 42(9): 906-912. DOI: 10.3969/j.issn.2095-5227.2021.09.003

基于机器学习算法探讨糖尿病视网膜病变的风险因素

Risk factors analysis of diabetic retinopathy based on machine learning

  • 摘要:
      背景  糖尿病视网膜病变(diabetic retinopathy,DR)是糖尿病患者主要并发症之一,其病程进行性发展可致视功能损伤甚至失明。探索影响DR进展的临床因素对糖尿病患者预防、控制和管理DR具有重要意义。
      目的  通过机器学习算法和沙普利可加性特征解释方法(SHAP)分析探讨2型糖尿病患者并发DR的风险因素。
      方法  回顾性分析“国家人口与健康科学数据共享平台”公布的“解放军总医院糖尿病并发症预警数据集”3000例2型糖尿病患者的临床资料,对58项观察变量在无DR并发症(non diabetic retinopathy,NDR)患者和并发DR患者两组组间进行基线分析以及差异性检验;评判XGBoost、随机森林、logistic回归三种机器学习算法,采用递归特征消除(RFE)和XGBoost机器学习算法选取最优模型预测变量,并对变量特征权重值排序;应用SHAP方法对模型的风险因子进行解释分析。
      结果  DR组的高血压症(收缩压/舒张压)、糖化血红蛋白、血脂水平(总胆固醇、低密度脂蛋白)、脑卒中、肾病(血尿素、血肌酐、血尿酸)、肾衰、下肢动脉病变等并发比例或指标水平高于NDR组(P < 0.05),而年龄、冠心病、心肌梗死、高脂血症、动脉粥样硬化症等低于NDR组(P < 0.05)。XGBoost较其他模型表现更佳,模型中排在前十位的重要区分特征为肾病、冠心病、下肢动脉病变、身高、其他肿瘤、糖化血红蛋白、血尿素、血清白蛋白、肾衰、高脂血症。SHAP集成散点图解释XGBoost模型中变量的重要性依次为糖化血红蛋白(0.59)、肾病(0.44)、血尿素(0.32)、下肢动脉病变(0.25),四项的SHAP值 > 0且绝对值均高。同时SHAP值分布呈现明显分类,即DR的显著危险因素。糖化血红蛋白、肾病、血尿素对DR病程影响呈现潜在交互关系,且血尿素 > 5 mmol/L时DR风险显著升高。
      结论  XGBoost算法和SHAP模型可用于预测糖尿病患者DR的风险因素及解释特征变量交互关系,提示糖化血红蛋白、合并肾病、血尿素水平对DR这一2型糖尿病微血管并发症的高风险预测性。

     

    Abstract:
      Background  Diabetic retinopathy (DR) is one of the main complications of diabetes. Its progressive course can lead to visual impairment and even blindness. Identifying the clinical factors that affect the progression of DR is therefore important for its prevention, control and management in diabetic patients.
      Objective  To explore the risk factors for DR in patients with type 2 diabetes mellitus using machine learning algorithms and SHapley Additive exPlanations (SHAP) analysis.
      Methods  A retrospective analysis was performed on the clinical data of 3000 patients with type 2 diabetes mellitus from the ‘Chinese PLA General Hospital early-warning dataset of diabetes complications’, published by the ‘National Population and Health Science Data Sharing Platform’. Baseline analysis and difference tests were carried out on 58 observed variables between the non-diabetic-retinopathy (NDR) group and the DR group. Three machine learning algorithms (XGBoost, random forest and logistic regression) were evaluated. Recursive feature elimination (RFE) and XGBoost were used to select the predictor variables of the optimal model and to rank them by feature weight. SHAP was then applied to interpret the risk factors in the model.
      Results  The incidences or index levels of hypertension (systolic/diastolic blood pressure), glycosylated hemoglobin (HbA1c), blood lipids (total cholesterol, low-density lipoprotein), stroke, kidney disease (blood urea, serum creatinine, serum uric acid), renal failure and lower extremity artery disease were higher in the DR group than in the NDR group (all P < 0.05), whereas mean age and the incidences of coronary heart disease, myocardial infarction, hyperlipidemia and atherosclerosis were lower than in the NDR group (P < 0.05). XGBoost performed better than the other models. The ten most important distinguishing features in the XGBoost model were kidney disease, coronary heart disease, lower extremity artery disease, height, other tumors, HbA1c, blood urea, serum albumin, renal failure and hyperlipidemia. In the SHAP summary scatter plot, the most important variables in the XGBoost model were, in order, HbA1c (0.59), kidney disease (0.44), blood urea (0.32) and lower extremity artery disease (0.25). All four had SHAP values > 0 with high mean absolute values, and their SHAP value distributions separated clearly between classes, suggesting that they were significant risk factors for DR. HbA1c, kidney disease and blood urea showed a potential interaction in their effect on the development of DR, and the risk of DR increased significantly when blood urea was > 5 mmol/L.
      Conclusion  The XGBoost algorithm combined with SHAP performs well in predicting risk factors for DR in patients with diabetes and in explaining the interactions between feature variables. The results suggest that HbA1c, concomitant kidney disease and blood urea level are strong predictive indicators of DR, a microvascular complication of type 2 diabetes.
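
To illustrate what a per-feature SHAP value of the kind reported in the Results represents, the sketch below computes exact Shapley values for a small toy risk model in plain Python. Everything in it is an assumption made for illustration: the `toy_risk` function, its coefficients, the `BASELINE` reference patient and the example `patient` are hypothetical and are not taken from the study, which fitted XGBoost on 58 variables and would normally rely on a tree-specific SHAP implementation with a background dataset rather than a single reference point.

```python
from itertools import permutations
from math import exp

# Hypothetical toy logistic risk model; coefficients are invented for
# illustration and are NOT the fitted XGBoost model from the study.
def toy_risk(x):
    z = -6.0 + 0.5 * x["hba1c"] + 1.0 * x["kidney_disease"] + 0.2 * x["blood_urea"]
    return 1.0 / (1.0 + exp(-z))

# Hypothetical reference patient that plays the role of the "expected" input.
BASELINE = {"hba1c": 6.5, "kidney_disease": 0, "blood_urea": 4.5}

def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features are switched from baseline to x."""
    features = list(x)
    phi = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        current = dict(baseline)      # start from the reference patient
        previous = model(current)
        for f in order:
            current[f] = x[f]         # reveal this feature's actual value
            updated = model(current)
            phi[f] += updated - previous
            previous = updated
    return {f: total / len(orders) for f, total in phi.items()}

# Hypothetical patient with elevated HbA1c, kidney disease and blood urea > 5 mmol/L.
patient = {"hba1c": 9.0, "kidney_disease": 1, "blood_urea": 7.0}
phi = shapley_values(toy_risk, patient, BASELINE)

for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
print("sum of contributions:", sum(phi.values()))
print("risk minus baseline risk:", toy_risk(patient) - toy_risk(BASELINE))
```

The contributions add up to the difference between the patient's predicted risk and the baseline risk, which is the additivity property that makes SHAP values interpretable as per-feature shares of a prediction. Averaging the absolute values of such contributions over many patients gives a global importance ranking of the kind summarized in a SHAP summary plot.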

     
