Abstract:
Background With the development of medical informatization, medical big data platforms have become an important foundation for clinical research and a key entry point for the reanalysis and reuse of data resources. However, the multi-source heterogeneity of medical data, inconsistent data standards, and the need to protect patient privacy and data security have made data collection and application more difficult. Objective To analyze the requirements for establishing a medical big data platform, develop a self-service, full-process data governance platform and tools, and construct a multi-center medical big data platform. Methods Building on advances in big data and artificial intelligence, we comprehensively reviewed hospital data application needs. We designed the platform architecture using a modular, component-based approach, extracted universally applicable components and management tools that are relatively independent of specific application systems, and used them to build a multi-center, multi-source heterogeneous medical big data platform. Results The electronic medical record data of the outpatient, emergency and inpatient departments of the General Hospital of the People's Liberation Army were aggregated and processed, and a full-process, visual data governance tool was developed. The defined event schema graph covers 29 ontology categories, 128 concepts, 1,009 relationships and 3,022 attributes, spanning clinical evidence-based medical knowledge, clinical diagnosis and treatment, Internet of Things, medical imaging and other data. The consistency and traceability of data governance reached 99.99%, and the knowledge accuracy rate exceeded 95%.
A one-stop intelligent data retrieval and research analysis system and an intelligent analysis system for specialized disease databases, together covering the entire research process and different research types, were also constructed. Conclusion The platform provides data retrieval and analysis capabilities for clinical researchers as well as data governance and platform maintenance tools for data engineers, which improves the scalability and flexibility of the platform.