PENG, JinZe YU, Zijing ZENG, Tian YUAN, SHI, PAN, ZHANG. Video-Based differential diagnosis of Parkinson's disease and essential tremor using RTMPose and PatchTST[J]. ACADEMIC JOURNAL OF CHINESE PLA MEDICAL SCHOOL. DOI: 10.12435/j.issn.2095-5227.25010501

Video-based differential diagnosis of Parkinson's disease and essential tremor using RTMPose and PatchTST

  • Background: Parkinson's disease (PD) and essential tremor (ET) share overlapping clinical manifestations. Current diagnostic approaches rely on subjective rating scales administered by neurologists, which are time-consuming and show limited inter-rater consistency.
Objective: To develop an intelligent classification model based on video analysis and deep learning that differentiates PD from ET efficiently and automatically, thereby offering a novel approach for non-invasive diagnosis.
Methods: A total of 14 PD and 63 ET patients from the Outpatient Department of Chinese PLA General Hospital (2021 to 2024) were enrolled. A dataset of 1,136 video clips was collected while patients performed three standardized upper limb motor tasks: finger-to-nose, hand pronation-supination, and fist opening-closing. Keypoint coordinates of the wrist and fingers were extracted using the RTMPose model within the MMPose framework. Kinematic features such as displacement, velocity, and acceleration were computed to construct a dataset of spatiotemporal trajectories and statistical descriptors. A Transformer-based PatchTST model was developed, in which temporal sequences were segmented into patches and processed via global attention mechanisms. Model performance was compared against logistic regression, XGBoost, random forest, support vector machine, Informer, and long short-term memory (LSTM) networks.
Results: The PatchTST model achieved the best average performance when keypoint coordinates were combined with kinematic features. Its highest AUC was observed in the finger-to-nose task (AUC = 0.957), and its average AUC across all three tasks was 0.897. Among the 21 model-feature combinations, the LSTM model using only kinematic features performed the worst, with an average AUC of 0.691.
Conclusion: The video-based intelligent diagnostic method for differentiating PD from ET leverages human pose estimation and deep learning, enabling high-precision and efficient contactless remote diagnosis and providing a reference for the early diagnosis and management of movement disorders.
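
The two preprocessing ideas named in the Methods (kinematic features derived from keypoint trajectories, and segmentation of a time series into patches) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kinematics` and `make_patches`, the use of simple frame-to-frame finite differences, and the parameters `fps`, `patch_len`, and `stride` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def kinematics(xs, ys, fps):
    """Per-frame displacement, speed and acceleration of one keypoint.

    xs, ys: coordinates of the keypoint in consecutive video frames.
    fps: frame rate of the video (assumed known and constant).
    """
    dt = 1.0 / fps
    # Displacement: Euclidean distance moved between consecutive frames.
    disp = [math.hypot(x1 - x0, y1 - y0)
            for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]
    # Velocity (as speed): displacement per unit time.
    vel = [d / dt for d in disp]
    # Acceleration: change in speed per unit time.
    acc = [(v1 - v0) / dt for v0, v1 in zip(vel, vel[1:])]
    return disp, vel, acc


def make_patches(series, patch_len, stride):
    """Split a 1-D series into fixed-length, possibly overlapping patches.

    Simplified: trailing samples that do not fill a whole patch are dropped
    (no padding is applied).
    """
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]
```

For example, `make_patches(list(range(10)), patch_len=4, stride=2)` yields four overlapping patches. In a patch-based Transformer, each such patch would then be embedded as one token before attention is applied; that embedding step is outside the scope of this sketch.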