BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070249Z
LOCATION:Meeting Room C4.11\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231215T110000
DTEND;TZID=Australia/Melbourne:20231215T111500
UID:siggraphasia_SIGGRAPH Asia 2023_sess135_tog_105@linklings.com
SUMMARY:CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing
DESCRIPTION:Technical Papers, TOG\n\nAbdul Basit Anees and Ahmet Canberk Baykal (Koç University), Duygu Ceylan (Adobe Research), Erkut Erdem (Hacettepe University), and Aykut Erdem and Deniz Yuret (Koç University)\n\nResearchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the Contrastive Language-Image Pre-training (CLIP) embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.\n\nRegistration Category: Full Access\n\nSession Chair: Chongyang Ma (ByteDance)
URL:https://asia.siggraph.org/2023/full-program?id=tog_105&sess=sess135
END:VEVENT
END:VCALENDAR
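
As a sanity check on the entry above, the sketch below reads the calendar and prints the event's summary, room, and session times. It is a minimal example, assuming the third-party icalendar package is installed; the filename is hypothetical and not part of the calendar itself.

from icalendar import Calendar

# Hypothetical filename; any file holding the VCALENDAR above will do.
with open("siggraph_asia_2023_tog_105.ics", "rb") as f:
    cal = Calendar.from_ical(f.read())

for event in cal.walk("VEVENT"):
    # DTSTART/DTEND decode to timezone-aware datetimes via the
    # Australia/Melbourne VTIMEZONE defined earlier in the calendar.
    start = event.decoded("DTSTART")
    end = event.decoded("DTEND")
    print(str(event.get("SUMMARY")))
    print(str(event.get("LOCATION")))
    print(start.isoformat(), "->", end.isoformat())

For this event the loop should report a 15-minute slot on 2023-12-15 starting at 11:00 AEDT (UTC+11, since mid-December falls inside Melbourne's daylight-saving period).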
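
The DESCRIPTION above outlines CLIPInverter's key idea: lightweight text-conditioned adapter layers inside a pretrained GAN-inversion encoder. The sketch below illustrates one plausible reading of such an adapter as FiLM-style feature modulation driven by a CLIP text embedding; the class name, layer sizes, and modulation scheme are all assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class TextConditionedAdapter(nn.Module):
    """Illustrative adapter: modulates inversion-encoder features with a
    CLIP text embedding (FiLM-style scale/shift). An assumption-based
    sketch, not the CLIPInverter architecture from the paper."""

    def __init__(self, feat_channels: int, clip_dim: int = 512):
        super().__init__()
        # Project the text embedding to per-channel scale and shift.
        self.to_scale = nn.Linear(clip_dim, feat_channels)
        self.to_shift = nn.Linear(clip_dim, feat_channels)

    def forward(self, feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) encoder features; text_emb: (B, clip_dim).
        scale = self.to_scale(text_emb)[:, :, None, None]
        shift = self.to_shift(text_emb)[:, :, None, None]
        return feats * (1 + scale) + shift

# Toy usage: condition 64-channel features on a 512-d CLIP embedding.
adapter = TextConditionedAdapter(feat_channels=64)
feats = torch.randn(2, 64, 32, 32)
text_emb = torch.randn(2, 512)
out = adapter(feats, text_emb)  # same shape as feats: (2, 64, 32, 32)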