Abstract:
Background: Electronic Medical Records (EMRs) play a pivotal role in training large language models (LLMs) in the medical domain.
Objective: To explore the value of EMR data through a three-stage training paradigm built on a general-purpose large language model.
Methods: First, in the continued pre-training phase, extensive EMR texts were used to further train the general-purpose base model, enriching its medical linguistic knowledge. Second, in the supervised fine-tuning phase, annotated EMR data were used to adapt the model to specific clinical tasks, such as medical named entity recognition and clinical trial screening, equipping it with specialized task-oriented skills. Finally, in the reinforcement learning phase, feedback from doctors was incorporated to optimize the model's outputs, improving the accuracy and interpretability of its decision-making.
Results: Experimental results showed that the model's performance on clinical tasks improved significantly, hallucinations were reduced, and the reliability of its outputs increased.
Conclusion: This study offers an effective approach to building standardized and trustworthy medical LLMs, with substantial practical application value.