BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070247Z
LOCATION:Meeting Room C4.9+C4.10\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231214T140000
DTEND;TZID=Australia/Melbourne:20231214T141500
UID:siggraphasia_SIGGRAPH Asia 2023_sess132_papers_230@linklings.com
SUMMARY:A Neural Space-Time Representation for Text-to-Image Personalization
DESCRIPTION:Technical Papers\n\nYuval Alaluf, Elad Richardson, Gal Metzer, and Daniel Cohen-Or (Tel Aviv University)\n\nA key aspect of text-to-image personalization methods is the manner in which the target concept is represented within the generative process. This choice greatly affects the visual fidelity, downstream editability, and disk space needed to store the learned concept. In this paper, we explore a new text-conditioning space that is dependent on both the denoising process timestep (time) and the denoising U-Net layers (space) and showcase its compelling properties. A single concept in the space-time representation is composed of hundreds of vectors, one for each combination of time and space, making this space challenging to optimize directly. Instead, we propose to implicitly represent a concept in this space by optimizing a small neural mapper that receives the current time and space parameters and outputs the matching token embedding. In doing so, the entire personalized concept is represented by the parameters of the learned mapper, resulting in a compact, yet expressive, representation. Similarly to other personalization methods, the output of our neural mapper resides in the input space of the text encoder. We observe that one can significantly improve the convergence and visual fidelity of the concept by introducing a textual bypass, where our neural mapper additionally outputs a residual that is added to the output of the text encoder. Finally, we show how one can impose an importance-based ordering over our implicit representation, providing users control over the reconstruction and editability of the learned concept using a single trained model. We demonstrate the effectiveness of our approach over a range of concepts and prompts, showing our method's ability to generate high-quality and controllable compositions without fine-tuning any parameters of the generative model itself.\n\nRegistration Category: Full Access\n\nSession Chair: Jun-Yan Zhu (Carnegie Mellon University)
URL:https://asia.siggraph.org/2023/full-program?id=papers_230&sess=sess132
END:VEVENT
END:VCALENDAR