BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163633Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_230@linklings.com
SUMMARY:A Neural Space-Time Representation for Text-to-Image Personalization
DESCRIPTION:Yuval Alaluf\, Elad Richardson\, Gal Metzer\, and Daniel Cohen-Or (Tel Aviv University)\n\nA key aspect of text-to-image personalization methods is the manner in which the target concept is represented within the generative process. This choice greatly affects the visual fidelity\, downstream editability\, and disk space needed to store the learned concept. In this paper\, we explore a new text-conditioning space that is dependent on both the denoising process timestep (time) and the denoising U-Net layers (space) and showcase its compelling properties. A single concept in the space-time representation is composed of hundreds of vectors\, one for each combination of time and space\, making this space challenging to optimize directly. Instead\, we propose to implicitly represent a concept in this space by optimizing a small neural mapper that receives the current time and space parameters and outputs the matching token embedding. In doing so\, the entire personalized concept is represented by the parameters of the learned mapper\, resulting in a compact\, yet expressive\, representation. Similarly to other personalization methods\, the output of our neural mapper resides in the input space of the text encoder. We observe that one can significantly improve the convergence and visual fidelity of the concept by introducing a textual bypass\, where our neural mapper additionally outputs a residual that is added to the output of the text encoder. Finally\, we show how one can impose an importance-based ordering over our implicit representation\, providing users control over the reconstruction and editability of the learned concept using a single trained model. We demonstrate the effectiveness of our approach over a range of concepts and prompts\, showing our method's ability to generate high-quality and controllable compositions without fine-tuning any parameters of the generative model itself.\n\nRegistration Category: Full Access\, Enhanced Access\, Trade Exhibitor\, Experience Hall Exhibitor\n\n
URL:https://asia.siggraph.org/2023/full-program?id=papers_230&sess=sess209
END:VEVENT
END:VCALENDAR
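
The entry above is serialized with RFC 5545 line folding: content lines longer than 75 octets are wrapped, and each continuation line begins with a single space or tab, which is why words appear split mid-word in a flattened copy. The following minimal, stdlib-only Python sketch unfolds the file and lists its properties; the file name session.ics is a hypothetical stand-in for wherever this entry is saved.

import re

def unfold_ical(text: str) -> list[str]:
    """Join folded iCalendar lines back into logical content lines."""
    # Normalize line endings, then delete each newline followed by one
    # space or tab (the RFC 5545 fold marker).
    text = text.replace("\r\n", "\n")
    return [line for line in re.sub(r"\n[ \t]", "", text).split("\n") if line]

with open("session.ics", encoding="utf-8") as f:  # hypothetical path
    for line in unfold_ical(f.read()):
        # Each logical line has the shape NAME[;PARAM=VALUE...]:VALUE.
        name, _, value = line.partition(":")
        print(f"{name} => {value[:60]}")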
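
The DESCRIPTION field summarizes the paper's core mechanism: a small neural mapper takes the denoising timestep (time) and U-Net layer index (space) and emits the token embedding for the learned concept, plus a textual-bypass residual added to the text encoder's output. The PyTorch sketch below illustrates that interface only; the layer count, hidden width, 768-dimensional embedding (as in Stable Diffusion's CLIP text encoder), and plain-MLP body are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class SpaceTimeMapper(nn.Module):
    """Maps a (timestep, U-Net layer) pair to a token embedding plus a
    textual-bypass residual. All sizes here are illustrative assumptions."""

    def __init__(self, embed_dim: int = 768, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(2, hidden),        # input: normalized (time, space)
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        self.to_embedding = nn.Linear(hidden, embed_dim)  # token embedding
        self.to_bypass = nn.Linear(hidden, embed_dim)     # bypass residual

    def forward(self, timestep: torch.Tensor, layer: torch.Tensor):
        # Assumed ranges: timesteps 0..999, 16 cross-attention layers.
        coords = torch.stack([timestep / 1000.0, layer / 16.0], dim=-1)
        h = self.body(coords)
        return self.to_embedding(h), self.to_bypass(h)

# The whole personalized concept is just this module's parameters: during
# denoising, each (timestep, layer) query yields the embedding fed into the
# text encoder and the residual added to the encoder's output.
mapper = SpaceTimeMapper()
emb, residual = mapper(torch.tensor([500.0]), torch.tensor([8.0]))
print(emb.shape, residual.shape)  # torch.Size([1, 768]) for both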