BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241204T130000
DTEND;TZID=Asia/Tokyo:20241204T131100
UID:siggraphasia_SIGGRAPH Asia 2024_sess117_papers_891@linklings.com
SUMMARY:FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model
DESCRIPTION:Technical Papers\n\nFeng Qiu and Wei Zhang (Netease); Chen Liu (University of Queensland, Netease); Rudong An, Lincheng Li, Yu Ding, Changjie Fan, and Zhipeng Hu (Netease); and Xin Yu (University of Queensland)\n\nVideo-driven 3D facial animation transfer aims to drive avatars to reproduce the expressions of actors. Existing methods have achieved remarkable results by constraining both geometric and perceptual consistency. However, geometric constraints (such as those defined on facial landmarks) are insufficient to capture subtle emotions, while expression features trained on classification tasks lack fine granularity for complex emotions. To address this, we propose FreeAvatar, a robust facial animation transfer method that relies solely on our learned expression representation. Specifically, FreeAvatar consists of two main components: the expression foundation model and the facial animation transfer model. In the first component, we initially construct a facial feature space through a face reconstruction task and then optimize the expression feature space by exploring the similarities among different expressions. Benefiting from training on large amounts of unlabeled facial images and a re-collected expression comparison dataset, our model adapts freely and effectively to any in-the-wild input facial image. In the facial animation transfer component, we propose a novel Expression-driven Multi-avatar Animator, which first maps expressive semantics to the facial control parameters of 3D avatars and then imposes perceptual constraints between the input and output images to maintain expression consistency. To make the entire process differentiable, we employ a trained neural renderer to translate rig parameters into corresponding images. Furthermore, unlike previous methods that require separate decoders for each avatar, we propose a dynamic identity injection module that allows for the joint training of multiple avatars within a single network. Comparisons show that our method achieves strong performance even without introducing any geometric constraints, highlighting the robustness of FreeAvatar.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Jungdam Won (Seoul National University)
URL:https://asia.siggraph.org/2024/program/?id=papers_891&sess=sess117
END:VEVENT
END:VCALENDAR