BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163643Z
LOCATION:Meeting Room C4.11\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231215T110000
DTEND;TZID=Australia/Melbourne:20231215T111500
UID:siggraphasia_SIGGRAPH Asia 2023_sess135_tog_105@linklings.com
SUMMARY:CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing
DESCRIPTION:Abdul Basit Anees and Ahmet Canberk Baykal (Koç University), Duygu Ceylan (Adobe Research), Erkut Erdem (Hacettepe University), and Aykut Erdem and Deniz Yuret (Koç University)\n\nResearchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the Contrastive Language-Image Pre-training (CLIP) embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.\n\nRegistration Category: Full Access\n\nSession Chair: Chongyang Ma (ByteDance)\n\n
URL:https://asia.siggraph.org/2023/full-program?id=tog_105&sess=sess135
END:VEVENT
END:VCALENDAR