Recognition of human motion using time series from mobile and wearable devices is commonly used as key contextual information for a variety of applications, from health condition monitoring to sports activity analysis to user habit studies. However, collecting large-scale motion time series data remains challenging due to security and privacy concerns. In the motion time series domain, the lack of datasets and of an effective pre-training task makes it difficult to develop similarly capable models that can operate with limited data. Typically, existing models train and test on the same dataset, and they struggle to generalize across different datasets given three unique challenges in the motion time series problem domain. First, placing devices at different locations on the body, such as on the wrist versus on the leg, produces very different data, which makes it hard to reuse a model trained for one placement on another body part. Second, devices can be held in various orientations, and models trained with a device in one orientation often struggle when the device is held differently. Finally, different datasets often focus on different types of activities, making it hard to compare or combine them effectively.
Conventional motion time series classification relies on separate classifiers for each dataset, built with techniques such as statistical feature extraction, CNNs, RNNs, and attention models. General-purpose models like TimesNet and SHARE aim for task versatility, but they still require training and testing on the same dataset, which limits adaptability. Self-supervised learning helps with representation learning, though generalization across varied datasets remains challenging. Pretrained models like ImageBind and IMU2CLIP incorporate motion and text data, but they are constrained by device-specific training. Methods based on large language models (LLMs) rely on prompts and struggle to recognize complex activities accurately because they are not trained on raw motion time series.
A team of researchers from UC San Diego, Amazon, and Qualcomm proposed UniMTS as the first unified pre-training procedure for motion time series that generalizes across diverse device latent factors and activities. UniMTS uses a contrastive learning framework to align motion time series data with enriched text descriptions produced by large language models (LLMs). This helps the model capture the semantics of different activities and allows it to generalize across varied activities. For large-scale pre-training, UniMTS generates motion time series data from existing detailed skeleton data, which covers various body parts. The generated data is then processed with graph networks to capture both spatial and temporal relationships across different device locations, helping the model generalize to data from different device placements.
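To make the contrastive pre-training idea concrete, the sketch below pairs a simple graph-based motion encoder with a placeholder text encoder under a symmetric InfoNCE objective, in the spirit of the description above. It is a minimal illustration, not the authors' implementation: the module names, joint count, channel layout, GRU-based temporal encoder, and bag-of-words text encoder are all assumptions.

```python
# Minimal sketch (not the released UniMTS code) of CLIP-style contrastive alignment
# between a graph-based motion encoder and a text encoder. Shapes and modules are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGraphMotionEncoder(nn.Module):
    """Encodes (batch, joints, time, channels) motion windows using a joint-adjacency graph."""
    def __init__(self, num_joints=22, in_channels=6, hidden=64, embed_dim=128):
        super().__init__()
        self.proj = nn.Linear(in_channels, hidden)
        # Learnable normalized adjacency stands in for the skeleton graph (assumption).
        self.adj = nn.Parameter(torch.eye(num_joints) + 0.01 * torch.randn(num_joints, num_joints))
        self.temporal = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, embed_dim)

    def forward(self, x):                                     # x: (B, J, T, C)
        h = self.proj(x)                                      # (B, J, T, H)
        a = torch.softmax(self.adj, dim=-1)                   # row-normalized adjacency
        h = torch.einsum("ij,bjtc->bitc", a, h)               # spatial message passing across joints
        b, j, t, c = h.shape
        _, last = self.temporal(h.reshape(b * j, t, c))       # temporal encoding per joint
        h = last[-1].reshape(b, j, -1).mean(dim=1)            # pool over joints
        return F.normalize(self.head(h), dim=-1)

class TextEncoder(nn.Module):
    """Placeholder text encoder; UniMTS pairs motion with LLM-enriched activity descriptions."""
    def __init__(self, vocab=10000, embed_dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, embed_dim)

    def forward(self, token_ids, offsets):
        return F.normalize(self.emb(token_ids, offsets), dim=-1)

def contrastive_loss(motion_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched motion/text pairs are positives, the rest of the batch negatives."""
    logits = motion_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy forward pass
motion = torch.randn(8, 22, 50, 6)            # 8 windows, 22 joints, 50 steps, acc+gyro channels
tokens = torch.randint(0, 10000, (8 * 5,))    # 5 tokens per description (illustrative)
offsets = torch.arange(0, 8 * 5, 5)
loss = contrastive_loss(SimpleGraphMotionEncoder()(motion), TextEncoder()(tokens, offsets))
print(loss.item())
```

Because every joint is encoded jointly through the graph, signals from any subset of device locations can be embedded into the same space as the text descriptions, which is what enables the cross-placement generalization described above.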
The method begins by generating motion data from skeleton movements and adjusting it for different orientations. It also uses a graph encoder to model how joints connect, so that it can work well across different devices. The text descriptions are enriched using large language models. To create the motion data, the method calculates the velocities and accelerations of each joint from their positions and orientations, adding noise to mimic real-world sensor errors. To handle inconsistencies in device orientation, UniMTS applies data augmentation that samples random orientations during pre-training, accounting for variations in device positions and axis conventions. By aligning motion data with text descriptions, the model adapts well to different orientations and activity types. For training, UniMTS employs rotation-invariant data augmentation to handle device positioning variations. It was tested on the HumanML3D dataset and 18 other real-world motion time series benchmark datasets, with performance improvements of 340% in the zero-shot setting, 16.3% in the few-shot setting, and 9.2% in the full-shot setting compared with the respective best-performing baselines. The model's performance was compared with baselines such as ImageBind and IMU2CLIP, and the results showed that UniMTS outperformed the other models, particularly in zero-shot settings, with statistical tests confirming significant improvements.
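The simulation and augmentation steps can be illustrated with a short sketch: differentiate skeleton joint trajectories twice to obtain accelerations, add Gaussian noise to mimic sensor error, and apply a random rotation so the pre-training data covers arbitrary device orientations. The frame rate, noise level, single shared rotation, and QR-based rotation sampling are illustrative assumptions rather than the paper's exact settings, and the angular-velocity channels derived from joint orientations are omitted for brevity.

```python
# Illustrative sketch (assumptions, not the released pipeline): simulate device readings
# from skeleton joint trajectories and randomize orientation for pre-training.
import numpy as np

def simulated_acceleration(joint_positions, dt=1.0 / 30.0, noise_std=0.05):
    """joint_positions: (T, J, 3) joint trajectories; returns (T-2, J, 3) noisy accelerations."""
    velocity = np.diff(joint_positions, axis=0) / dt                       # first derivative -> velocity
    acceleration = np.diff(velocity, axis=0) / dt                          # second derivative -> acceleration
    acceleration += np.random.normal(0.0, noise_std, acceleration.shape)   # mimic sensor noise
    return acceleration

def random_rotation_matrix():
    """Random 3x3 rotation via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(np.random.randn(3, 3))
    q *= np.sign(np.diag(r))      # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:      # ensure a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q

def rotation_augment(signal):
    """Apply one random orientation to the whole (T, J, 3) window; per-device rotations extend this."""
    R = random_rotation_matrix()
    return signal @ R.T

# Toy usage: 90 frames of a 22-joint skeleton at 30 Hz
skeleton = np.cumsum(np.random.randn(90, 22, 3) * 0.01, axis=0)
acc = simulated_acceleration(skeleton)
acc_aug = rotation_augment(acc)
print(acc.shape, acc_aug.shape)
```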
In conclusion, the proposed pre-trained model UniMTS is based purely on physics-simulated data, yet it shows remarkable generalization across diverse real-world motion time series datasets featuring different device locations, orientations, and activities. While it improves on conventional methods, UniMTS has some limitations as well. More broadly, this pre-trained motion time series classification model can serve as a potential foundation for future research in human motion recognition.
Check out the Paper, GitHub, and Model on Hugging Face. All credit for this research goes to the researchers of this project.
Divyesh is a consulting intern at Marktechpost. He is pursuing a BTech in Agricultural and Food Engineering from the Indian Institute of Technology, Kharagpur. He is a Data Science and Machine Learning enthusiast who wants to integrate these leading technologies into the agricultural domain and solve its challenges.