Citation: WANG Yifei, KANG Jihuai, YING Jun, YANG Junjie, CHEN Kang. Risk assessment of coronary heart disease based on big data modeling[J]. ACADEMIC JOURNAL OF CHINESE PLA MEDICAL SCHOOL, 2019, 40(8): 725-729. DOI: 10.3969/j.issn.2095-5227.2019.08.005

Risk assessment of coronary heart disease based on big data modeling

    Abstract:
      Objective  To quantify the risk of developing coronary heart disease and to screen risk indicators, using large-sample epidemiological survey data.
      Methods  Data were drawn from a community epidemiological survey of chronic diseases conducted by Chinese PLA General Hospital in 2015, covering 19 021 participants and comprising personal information and lifestyle habits, medical and family history, laboratory test indicators, and electrocardiogram indicators. Samples with less than 70% data completeness were excluded. Missing values were imputed with a stepwise K-nearest neighbor method, the AdaBoost algorithm was used for risk assessment, and the model was validated by 10-fold cross-validation.
      Results  Age, duration of hypertension, dyslipidemia, presence of other comorbidities, duration of diabetes, and low-density lipoprotein cholesterol were important indicators for assessing the risk of developing coronary heart disease. The model's recall, accuracy, AUC, and F1 score for assessing this risk were 0.727, 0.741, 0.796, and 0.796, respectively.
      Conclusion  The model established in this study can serve as a reference for predicting an individual's risk of coronary heart disease.
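
The sample-exclusion, 10-fold validation, and metric steps named in the Methods and Results can be sketched as below. This is a minimal illustration and not the authors' code: the function names, the use of `None` for missing values, and the unshuffled interleaved fold assignment are all assumptions, and the K-nearest neighbor imputation and AdaBoost training steps are omitted.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[float]]  # assumption: None marks a missing value


def completeness(row: Row) -> float:
    """Fraction of fields in one record that are not missing."""
    return sum(v is not None for v in row) / len(row)


def drop_incomplete(rows: list[Row], threshold: float = 0.70) -> list[Row]:
    """Keep records whose completeness is at least `threshold` (70% in the abstract)."""
    return [r for r in rows if completeness(r) >= threshold]


def k_fold_indices(n: int, k: int = 10) -> list[tuple[list[int], list[int]]]:
    """Return k (train, test) index pairs; every index is tested exactly once.

    Folds are assigned by interleaving without shuffling (an assumption made
    for brevity; shuffled or stratified folds are common in practice).
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Recall, accuracy, precision, and F1 for 0/1 labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "recall": recall,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "f1": f1,
    }
```

In a full pipeline, each `(train, test)` pair would be used to fit a classifier on the training indices and score it on the held-out indices, with the per-fold metrics averaged across the 10 folds.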

     
