BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070246Z
LOCATION:Meeting Room C4.11\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231214T092500
DTEND;TZID=Australia/Melbourne:20231214T094000
UID:siggraphasia_SIGGRAPH Asia 2023_sess124_papers_482@linklings.com
SUMMARY:AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars
DESCRIPTION:Technical Papers, TOG\n\nMohit Mendiratta, Xingang Pan, Mohamed Elgharib, Kartik Teotia, and Mallikarjun B R (Max Planck Institute for Informatics); Ayush Tewari (MIT CSAIL); Vladislav Golyanik (Max Planck Institute for Informatics); Adam Kortylewski (Max Planck Institute for Informatics, University of Freiburg); and Christian Theobalt (Max Planck Institute for Informatics)\n\nCapturing and editing full head performances enables the creation of virtual characters with various applications such as extended reality and media production. The past few years witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs and others. While these data modalities provide effective means of control, they mostly focus on editing the head movements such as the facial expressions, head pose and/or camera viewpoint. In this paper, we propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar. Our approach builds on existing work to capture dynamic performances of human heads using neural radiance field (NeRF) and edits this representation with a text-to-image diffusion model. Specifically, we introduce an optimization strategy for incorporating multiple keyframes representing different camera viewpoints and time stamps of a video performance into a single diffusion model. Using this personalized diffusion model, we edit the dynamic NeRF by introducing view-and-time-aware Score Distillation Sampling (VT-SDS) following a model-based guidance approach. Our method edits the full head in a canonical space, and then propagates these edits to remaining time steps via a pretrained deformation network. We evaluate our method visually and numerically via a user study, and results show that our method outperforms existing approaches. Our experiments validate the design choices of our method and highlight that our edits are genuine, personalized, as well as 3D- and time-consistent.\n\nRegistration Category: Full Access\n\nSession Chair: Lin Gao (University of Chinese Academy of Sciences)
URL:https://asia.siggraph.org/2023/full-program?id=papers_482&sess=sess124
END:VEVENT
END:VCALENDAR
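
For reference, below is a minimal Python sketch of how the VEVENT above could be read and its TZID-qualified start time converted to UTC using only the standard library. The filename siggraph_session.ics is an assumption, and the simple property split is not a full RFC 5545 parser; a production consumer should use a proper iCalendar library instead.

from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def unfold(text):
    """Undo RFC 5545 line folding: a line beginning with a space or tab
    continues the previous line."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


# "siggraph_session.ics" is a hypothetical filename for the calendar above.
with open("siggraph_session.ics", encoding="utf-8") as fh:
    ics_lines = unfold(fh.read())

# Collect only the properties inside the VEVENT component.
props, in_event = {}, False
for line in ics_lines:
    if line == "BEGIN:VEVENT":
        in_event = True
    elif line == "END:VEVENT":
        in_event = False
    elif in_event and ":" in line:
        name, value = line.split(":", 1)
        props[name] = value

# DTSTART;TZID=Australia/Melbourne:20231214T092500 is local wall-clock time;
# attach the named IANA zone, then convert to UTC.
start_key = next(k for k in props if k.startswith("DTSTART"))
tzid = start_key.split("TZID=", 1)[1] if "TZID=" in start_key else "UTC"
local = datetime.strptime(props[start_key], "%Y%m%dT%H%M%S").replace(
    tzinfo=ZoneInfo(tzid))

print(props["SUMMARY"])
print("local start:", local)
print("UTC start:  ", local.astimezone(timezone.utc))

Run against this file, the sketch would print the session title and show the 09:25 Melbourne start as 22:25 UTC on the previous day, which is the main subtlety a consumer of this feed has to handle.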