BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070250Z
LOCATION:Meeting Room C4.11\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231215T155000
DTEND;TZID=Australia/Melbourne:20231215T160000
UID:siggraphasia_SIGGRAPH Asia 2023_sess139_papers_222@linklings.com
SUMMARY:Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture
DESCRIPTION:Technical Papers\n\nShaohua Pan, Qi Ma, and Xinyu Yi (Tsinghua University); Weifeng Hu, Xiong Wang, Xingkang ZHOU, and Jijunnan LI (OPPO Research Institute); and Feng Xu (Tsinghua University)\n\nEither RGB images or inertial signals have been used for the task of motion capture (mocap), but combining them is a new and interesting topic. We believe the combination is complementary and able to solve the inherent difficulties of using a single input modality, including occlusions, extreme lighting/texture, and out-of-view subjects for visual mocap, and global drift for inertial mocap. To this end, we propose a method that fuses monocular images and sparse IMUs for real-time human motion capture. Our method contains a dual coordinate strategy to fully exploit the IMU signals for different goals in motion capture. Specifically, besides one branch that transforms the IMU signals to the camera coordinate system to combine with the image information, another branch learns from the IMU signals in the body root coordinate system to better estimate body poses. Furthermore, a hidden state feedback mechanism is proposed for both branches to compensate for their respective drawbacks in extreme input cases. Our method can thus easily switch between the two kinds of signals, or combine them, in different cases to achieve robust mocap. Quantitative and qualitative results demonstrate that, by carefully designing the fusion method, our technique significantly outperforms the state-of-the-art vision, IMU, and combined methods on both global orientation and local pose estimation.\n\nRegistration Category: Full Access\n\nSession Chair: Yuting Ye (Reality Labs Research, Meta)
URL:https://asia.siggraph.org/2023/full-program?id=papers_222&sess=sess139
END:VEVENT
END:VCALENDAR