BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070240Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_482@linklings.com
SUMMARY:AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars
DESCRIPTION:Technical Papers\n\nMohit Mendiratta, Xingang Pan, Mohamed Elgharib, Kartik Teotia, and Mallikarjun B R (Max Planck Institute for Informatics); Ayush Tewari (MIT CSAIL); Vladislav Golyanik (Max Planck Institute for Informatics); Adam Kortylewski (Max Planck Institute for Informatics, University of Freiburg); and Christian Theobalt (Max Planck Institute for Informatics)\n\nCapturing and editing full head performances enables the creation of virtual characters with various applications such as extended reality and media production. The past few years have witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs and others. While these data modalities provide effective means of control, they mostly focus on editing the head movements such as the facial expressions, head pose and/or camera viewpoint. In this paper, we propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar. Our approach builds on existing work to capture dynamic performances of human heads using a neural radiance field (NeRF) and edits this representation with a text-to-image diffusion model. Specifically, we introduce an optimization strategy for incorporating multiple keyframes representing different camera viewpoints and time stamps of a video performance into a single diffusion model. Using this personalized diffusion model, we edit the dynamic NeRF by introducing view-and-time-aware Score Distillation Sampling (VT-SDS) following a model-based guidance approach. Our method edits the full head in a canonical space, and then propagates these edits to the remaining time steps via a pretrained deformation network. We evaluate our method visually and numerically via a user study, and the results show that our method outperforms existing approaches. Our experiments validate the design choices of our method and highlight that our edits are genuine, personalized, as well as 3D- and time-consistent.\n\nRegistration Category: Full Access, Enhanced Access, Trade Exhibitor, Experience Hall Exhibitor
URL:https://asia.siggraph.org/2023/full-program?id=papers_482&sess=sess209
END:VEVENT
END:VCALENDAR