BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163643Z
LOCATION:Meeting Room C4.11\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231215T110000
DTEND;TZID=Australia/Melbourne:20231215T111500
UID:siggraphasia_SIGGRAPH Asia 2023_sess135_tog_105@linklings.com
SUMMARY:CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing
DESCRIPTION:Abdul Basit Anees and Ahmet Canberk Baykal (Koç University), Duygu Ceylan (Adobe Research), Erkut Erdem (Hacettepe University), and Aykut Erdem and Deniz Yuret (Koç University)\n\nResearchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the Contrastive Language-Image Pre-training (CLIP) embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.\n\nRegistration Category: Full Access\n\nSession Chair: Chongyang Ma (ByteDance)\n\n
URL:https://asia.siggraph.org/2023/full-program?id=tog_105&sess=sess135
END:VEVENT
END:VCALENDAR