BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023313Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241206T130000
DTEND;TZID=Asia/Tokyo:20241206T131400
UID:siggraphasia_SIGGRAPH Asia 2024_sess147_tog_106@linklings.com
SUMMARY:Speed-Aware Audio-Driven Speech Animation using Adaptive Windows
DESCRIPTION:Technical Papers\n\nSunjin Jung (KAIST, Visual Media Lab); Yeongho Seol (NVIDIA); Kwanggyoon Seo and Hyeonho Na (KAIST, Visual Media Lab); Seonghyeon Kim (KAIST, Visual Media Lab; Anigma Technologies); and Vanessa Tan and Junyong Noh (KAIST, Visual Media Lab)\n\nWe present a novel method that can generate realistic speech animations of a 3D face from audio using multiple adaptive windows. In contrast to previous studies that use a fixed size audio window, our method accepts an adaptive audio window as input, reflecting the audio speaking rate to use consistent phonemic information. Our system consists of three parts. First, the speaking rate is estimated from the input audio using a neural network trained in a self-supervised manner. Second, the appropriate window size that encloses the audio features is predicted adaptively based on the estimated speaking rate. Another key element lies in the use of multiple audio windows of different sizes as input to the animation generator: a small window to concentrate on detailed information and a large window to consider broad phonemic information near the center frame. Finally, the speech animation is generated from the multiple adaptive audio windows. Our method can generate realistic speech animations from in-the-wild audios at any speaking rate, i.e., fast raps, slow songs, as well as normal speech. We demonstrate via extensive quantitative and qualitative evaluations including a user study that our method outperforms state-of-the-art approaches.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Yi Zhou (Adobe)
URL:https://asia.siggraph.org/2024/program/?id=tog_106&sess=sess147
END:VEVENT
END:VCALENDAR