BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Linklings LLC//NONSGML Linklings//EN
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721001T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19730401T030000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163645Z
LOCATION:Meeting Room C4.11\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231214T092500
DTEND;TZID=Australia/Melbourne:20231214T094000
UID:siggraphasia_SIGGRAPH Asia 2023_sess124_papers_482@linklings.com
SUMMARY:AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars
DESCRIPTION:Mohit Mendiratta\, Xingang Pan\, Mohamed Elgharib\, Kartik
  Teotia\, and Mallikarjun B R (Max Planck Institute for Informatics)\;
  Ayush Tewari (MIT CSAIL)\; Vladislav Golyanik (Max Planck Institute
  for Informatics)\; Adam Kortylewski (Max Planck Institute for
  Informatics\, University of Freiburg)\; and Christian Theobalt (Max
  Planck Institute for Informatics)\n\nCapturing and editing full head
  performances enables the creation of virtual characters for
  applications such as extended reality and media production. The past
  few years have witnessed a steep rise in the photorealism of human
  head avatars. Such avatars can be controlled through different input
  data modalities\, including RGB\, audio\, depth\, IMUs\, and others.
  While these data modalities provide effective means of control\, they
  mostly focus on editing head movements such as facial expressions\,
  head pose\, and/or camera viewpoint. In this paper\, we propose
  AvatarStudio\, a text-based method for editing the appearance of a
  dynamic full head avatar. Our approach builds on existing work that
  captures dynamic performances of human heads with a neural radiance
  field (NeRF) and edits this representation using a text-to-image
  diffusion model. Specifically\, we introduce an optimization strategy
  for incorporating multiple keyframes representing different camera
  viewpoints and timestamps of a video performance into a single
  diffusion model. Using this personalized diffusion model\, we edit the
  dynamic NeRF by introducing view-and-time-aware Score Distillation
  Sampling (VT-SDS)\, following a model-based guidance approach. Our
  method edits the full head in a canonical space and then propagates
  these edits to the remaining time steps via a pretrained deformation
  network. We evaluate our method visually and numerically via a user
  study\, and the results show that it outperforms existing approaches.
  Our experiments validate the design choices of our method and
  highlight that our edits are genuine\, personalized\, and both 3D- and
  time-consistent.\n\nRegistration Category: Full Access\n\nSession
  Chair: Lin Gao (University of Chinese Academy of Sciences)
URL:https://asia.siggraph.org/2023/full-program?id=papers_482&sess=sess124
END:VEVENT
END:VCALENDAR
