BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070249Z
LOCATION:Meeting Room C4.11\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231215T110000
DTEND;TZID=Australia/Melbourne:20231215T111500
UID:siggraphasia_SIGGRAPH Asia 2023_sess135_tog_105@linklings.com
SUMMARY:CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing
DESCRIPTION:Technical Papers, TOG\n\nAbdul Basit Anees and Ahmet Canberk Baykal (Koç University), Duygu Ceylan (Adobe Research), Erkut Erdem (Hacettepe University), and Aykut Erdem and Deniz Yuret (Koç University)\n\nResearchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the Contrastive Language-Image Pre-training (CLIP) embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.\n\nRegistration Category: Full Access\n\nSession Chair: Chongyang Ma (ByteDance)
URL:https://asia.siggraph.org/2023/full-program?id=tog_105&sess=sess135
END:VEVENT
END:VCALENDAR
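
As a sanity check on the entry above, the sketch below reads the calendar and prints the event's summary, room, and session times. It is a minimal example, assuming the third-party icalendar package is installed; the filename is hypothetical and not part of the calendar itself.

from icalendar import Calendar

# Hypothetical filename; any file holding the VCALENDAR above will do.
with open("siggraph_asia_2023_tog_105.ics", "rb") as f:
    cal = Calendar.from_ical(f.read())

for event in cal.walk("VEVENT"):
    # DTSTART/DTEND decode to timezone-aware datetimes via the
    # Australia/Melbourne VTIMEZONE defined earlier in the calendar.
    start = event.decoded("DTSTART")
    end = event.decoded("DTEND")
    print(str(event.get("SUMMARY")))
    print(str(event.get("LOCATION")))
    print(start.isoformat(), "->", end.isoformat())

For this event the loop should report a 15-minute slot on 2023-12-15 starting at 11:00 AEDT (UTC+11, since mid-December falls inside Melbourne's daylight-saving period).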
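
The DESCRIPTION above outlines CLIPInverter's key idea: lightweight text-conditioned adapter layers inside a pretrained GAN-inversion encoder. The sketch below illustrates one plausible reading of such an adapter as FiLM-style feature modulation driven by a CLIP text embedding; the class name, layer sizes, and modulation scheme are all assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class TextConditionedAdapter(nn.Module):
    """Illustrative adapter: modulates inversion-encoder features with a
    CLIP text embedding (FiLM-style scale/shift). An assumption-based
    sketch, not the CLIPInverter architecture from the paper."""

    def __init__(self, feat_channels: int, clip_dim: int = 512):
        super().__init__()
        # Project the text embedding to per-channel scale and shift.
        self.to_scale = nn.Linear(clip_dim, feat_channels)
        self.to_shift = nn.Linear(clip_dim, feat_channels)

    def forward(self, feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) encoder features; text_emb: (B, clip_dim).
        scale = self.to_scale(text_emb)[:, :, None, None]
        shift = self.to_shift(text_emb)[:, :, None, None]
        return feats * (1 + scale) + shift

# Toy usage: condition 64-channel features on a 512-d CLIP embedding.
adapter = TextConditionedAdapter(feat_channels=64)
feats = torch.randn(2, 64, 32, 32)
text_emb = torch.randn(2, 512)
out = adapter(feats, text_emb)  # same shape as feats: (2, 64, 32, 32)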