BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241205T170500
DTEND;TZID=Asia/Tokyo:20241205T171600
UID:siggraphasia_SIGGRAPH Asia 2024_sess138_papers_215@linklings.com
SUMMARY:TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
DESCRIPTION:Technical Papers\n\nJiazhi Guan (Tsinghua University); Quanwei Yang (University of Science and Technology of China); Kaisiyuan Wang, Hang Zhou, Shengyi He, Zhiliang Xu, Haocheng Feng, Errui Ding, and Jingdong Wang (Baidu); Hongtao Xie (University of Science and Technology of China); Youjian Zhao (Tsinghua University); and Ziwei Liu (Nanyang Technological University (NTU))\n\nRecently, 2D speaking avatars have increasingly participated in everyday scenarios due to the fast development of facial animation techniques. However, most existing works neglect the explicit control of human bodies. In this paper, we propose to drive not only the faces but also the torso and gesture movements of a speaking figure. Inspired by recent advances in diffusion models, we propose the Motion-Enhanced Textural-Aware ModeLing for SpeaKing Avatar Reenactment (TALK-Act) framework, which enables high-fidelity avatar reenactment from only short footage of monocular video. Our key idea is to enhance the textural awareness with explicit motion guidance in diffusion modeling. Specifically, we carefully construct 2D and 3D structural information as intermediate guidance. While recent diffusion models adopt a side network for control information injection, they fail to synthesize temporally stable results even with person-specific fine-tuning. We propose a Motion-Enhanced Textural Alignment module to enhance the bond between driving and target signals. Moreover, we build a Memory-based Hand-Recovering module to help with the difficulties in hand-shape preserving. After pre-training, our model can achieve high-fidelity 2D avatar reenactment with only 30 seconds of person-specific data. Extensive experiments demonstrate the effectiveness and superiority of our proposed framework.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Hongbo Fu (Hong Kong University of Science and Technology)
URL:https://asia.siggraph.org/2024/program/?id=papers_215&sess=sess138
END:VEVENT
END:VCALENDAR