BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070240Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_tog_105@linklings.com
SUMMARY:CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing
DESCRIPTION:Technical Papers\n\nAbdul Basit Anees and Ahmet Canberk Baykal (Koç University), Duygu Ceylan (Adobe Research), Erkut Erdem (Hacettepe University), and Aykut Erdem and Deniz Yuret (Koç University)\n\nResearchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the Contrastive Language-Image Pre-training (CLIP) embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.\n\nRegistration Category: Full Access, Enhanced Access, Trade Exhibitor, Experience Hall Exhibitor
URL:https://asia.siggraph.org/2023/full-program?id=tog_105&sess=sess209
END:VEVENT
END:VCALENDAR
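
For reference, a minimal sketch of how an entry like the one above can be read programmatically, assuming Python with the third-party icalendar package; the file path event.ics is hypothetical and not part of the export:

# Minimal sketch: parse the VEVENT above with the third-party
# `icalendar` package (pip install icalendar). Assumes the calendar
# text has been saved as `event.ics` (a hypothetical path).
from icalendar import Calendar

with open("event.ics", "rb") as f:
    cal = Calendar.from_ical(f.read())

for event in cal.walk("VEVENT"):
    summary = str(event.get("SUMMARY"))
    location = str(event.get("LOCATION"))
    # DTSTART/DTEND carry a TZID parameter; .decoded() yields
    # timezone-aware datetimes in Australia/Melbourne time.
    start = event.decoded("DTSTART")
    end = event.decoded("DTEND")
    print(summary)
    print(location)
    print(start, "-", end)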
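
The abstract describes lightweight text-conditioned adapter layers inserted into a pretrained GAN-inversion encoder. The following PyTorch sketch illustrates that general idea only; it is not the paper's implementation, and the module names, dimensions, and FiLM-style scale-and-shift conditioning are all assumptions made for illustration:

# Illustrative sketch only -- not CLIPInverter's actual code.
# Shows one plausible form of a lightweight adapter that modulates
# encoder features with a CLIP text embedding (FiLM-style scale and
# shift is an assumption; dimensions are invented).
import torch
import torch.nn as nn

class TextConditionedAdapter(nn.Module):
    def __init__(self, feat_dim: int, clip_dim: int = 512):
        super().__init__()
        # Map the CLIP text embedding to per-channel scale and shift.
        self.to_scale = nn.Linear(clip_dim, feat_dim)
        self.to_shift = nn.Linear(clip_dim, feat_dim)

    def forward(self, feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) encoder features; text_emb: (B, clip_dim)
        scale = self.to_scale(text_emb)[:, :, None, None]
        shift = self.to_shift(text_emb)[:, :, None, None]
        return feats * (1 + scale) + shift

# Hypothetical usage inside a pretrained inversion encoder:
#   feats = encoder_block(feats)
#   feats = adapter(feats, clip_text_embedding)  # steer the inversion
# Per the abstract, the residual latent codes produced this way are
# then corrected in a CLIP-guided refinement step that improves
# alignment between the edited image and the text prompt.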