BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241204T113100
DTEND;TZID=Asia/Tokyo:20241204T114300
UID:siggraphasia_SIGGRAPH Asia 2024_sess114_papers_382@linklings.com
SUMMARY:Autonomous Character-Scene Interaction Synthesis from Text Instruction
DESCRIPTION:Technical Papers\n\nNan Jiang (Peking University, Beijing Institute for General Artificial Intelligence); Zimo He (Peking University); Zi Wang (Beijing University of Posts and Telecommunications); Hongjie Li (Peking University); Yixin Chen and Siyuan Huang (Beijing Institute for General Artificial Intelligence); and Yixin Zhu (Peking University)\n\nSynthesizing human motions in 3D environments, particularly those with complex activities such as locomotion, hand-reaching, and human-object interaction, presents substantial demands for user-defined waypoints and stage transitions. These requirements pose challenges for current models, leading to a notable gap in automating the animation of characters from simple human inputs. This paper addresses this challenge by introducing a comprehensive framework for synthesizing multi-stage scene-aware interaction motions directly from a single text instruction and goal location. Our approach employs an auto-regressive diffusion model to synthesize the next motion segment, along with an autonomous scheduler predicting the transition for each action stage. To ensure that the synthesized motions are seamlessly integrated within the environment, we propose a scene representation that considers the local perception both at the start and the goal location. We further enhance the coherence of the generated motion by integrating frame embeddings with language input. Additionally, to support model training, we present a comprehensive motion-captured dataset comprising 16 hours of motion sequences in 120 indoor scenes covering 40 types of motions, each annotated with precise language descriptions. Experimental results demonstrate the efficacy of our method in generating high-quality, multi-stage motions closely aligned with environmental and textual conditions.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Kai Wang (Amazon)
URL:https://asia.siggraph.org/2024/program/?id=papers_382&sess=sess114
END:VEVENT
END:VCALENDAR