BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023313Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241206T131400
DTEND;TZID=Asia/Tokyo:20241206T132800
UID:siggraphasia_SIGGRAPH Asia 2024_sess147_papers_977@linklings.com
SUMMARY:SIGGesture: Generalized Co-Speech Gesture Synthesis via Semantic Injection with Large-Scale Pre-Training Diffusion Models
DESCRIPTION:Technical Papers\n\nQingrong Cheng (Tencent AI Lab, Tencent TIMI L1 Studio) and Xu Li and Xinghui Fu (Tencent AI Lab)\n\nThe automated synthesis of high-quality 3D gestures from speech holds significant value for virtual humans and gaming. Previous methods primarily focus on synchronizing gestures with speech rhythm, often neglecting semantic gestures. These semantic gestures are sparse and follow a long-tailed distribution across the gesture sequence, making them challenging to learn in an end-to-end manner. Additionally, generating rhythmically aligned gestures that generalize well to in-the-wild speech remains a significant challenge. To address these issues, we introduce SIGGesture, a novel diffusion-based approach for synthesizing realistic gestures that are both high-quality and semantically pertinent. Specifically, we first build a robust diffusion-based foundation model for rhythmic gesture synthesis by pre-training it on a large-scale collected dataset with pseudo labels. Secondly, we leverage the powerful generalization capabilities of Large Language Models (LLMs) to generate appropriate semantic gestures for various speech transcripts. Finally, we propose a semantic injection module to infuse semantic information into the synthesized results during the reverse diffusion process. Extensive experiments demonstrate that SIGGesture significantly outperforms existing baselines, exhibiting excellent generalization and controllability.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Yi Zhou (Adobe)
URL:https://asia.siggraph.org/2024/program/?id=papers_977&sess=sess147
END:VEVENT
END:VCALENDAR