Multiple Human Motion Understanding

Authors

  • Lei Li, VitaSight; University of Washington
  • Sen Jia, VitaSight; University of Washington
  • Jenq-Neng Hwang, University of Washington, Seattle

DOI:

https://doi.org/10.1609/aaai.v40i8.37556

Abstract

We introduce LLaMMo (Large Language and Multi-Person Motion Assistant), the first instruction-tuned multimodal framework tailored for multi-human motion analysis. LLaMMo incorporates a novel human-centric, social-temporal learner that models and fuses both intra-person dynamics and inter-person dependencies, yielding robust, context-aware representations of complex group behaviors while maintaining low computational overhead. To support LLaMMo, we construct LLaVerse, a large-scale dataset with fine-grained manual annotations covering diverse multi-person activities, spanning daily social interactions and professional team sports. Built on top of LLaVerse, we also propose LLaMI-Bench, a dedicated benchmark for evaluating multi-human behavior understanding across motion and video modalities. Extensive experiments demonstrate that LLaMMo consistently outperforms baselines in understanding multi-person interactions under low-latency settings, with notable gains in both social and sport-specific contexts.
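
To make the factorized intra-/inter-person modeling described in the abstract concrete, the sketch below shows one plausible way to build such a social-temporal block in PyTorch: temporal self-attention within each person's motion-token sequence, followed by attention across persons at each timestep, fused through residual connections. The class name, the (batch, persons, time, dim) tensor layout, and this particular attention factorization are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a human-centric, social-temporal learner.
# NOT the authors' implementation; shapes and module names are assumed.
import torch
import torch.nn as nn

class SocialTemporalLearner(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Intra-person dynamics: attention over each person's own timeline.
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Inter-person dependencies: attention across people per timestep.
        self.social_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch B, persons P, time T, dim D) per-person motion tokens.
        B, P, T, D = x.shape
        # Intra-person pass: fold persons into the batch, attend over time.
        t = x.reshape(B * P, T, D)
        q = self.norm1(t)
        t = t + self.temporal_attn(q, q, q)[0]
        x = t.reshape(B, P, T, D)
        # Inter-person pass: fold time into the batch, attend over persons.
        s = x.transpose(1, 2).reshape(B * T, P, D)
        q = self.norm2(s)
        s = s + self.social_attn(q, q, q)[0]
        # Restore the (B, P, T, D) layout.
        return s.reshape(B, T, P, D).transpose(1, 2)

tokens = torch.randn(2, 3, 16, 256)  # 2 clips, 3 people, 16 frames
out = SocialTemporalLearner()(tokens)
print(out.shape)  # torch.Size([2, 3, 16, 256])
```

Factorizing attention this way keeps cost at O(T^2) per person plus O(P^2) per frame, rather than O((PT)^2) for joint attention over all tokens, which is consistent with the abstract's low-overhead claim.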

Published

2026-03-14

How to Cite

Li, L., Jia, S., & Hwang, J.-N. (2026). Multiple Human Motion Understanding. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6297–6305. https://doi.org/10.1609/aaai.v40i8.37556

Section

AAAI Technical Track on Computer Vision V