BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070247Z
LOCATION:Meeting Room C4.9+C4.10\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231214T140000
DTEND;TZID=Australia/Melbourne:20231214T141500
UID:siggraphasia_SIGGRAPH Asia 2023_sess132_papers_230@linklings.com
SUMMARY:A Neural Space-Time Representation for Text-to-Image Personalization
DESCRIPTION:Technical Papers\n\nYuval Alaluf, Elad Richardson, Gal Metzer, and Daniel Cohen-Or (Tel Aviv University)\n\nA key aspect of text-to-image personalization methods is the manner in which the target concept is represented within the generative process. This choice greatly affects the visual fidelity, downstream editability, and disk space needed to store the learned concept. In this paper, we explore a new text-conditioning space that is dependent on both the denoising process timestep (time) and the denoising U-Net layers (space) and showcase its compelling properties. A single concept in the space-time representation is composed of hundreds of vectors, one for each combination of time and space, making this space challenging to optimize directly. Instead, we propose to implicitly represent a concept in this space by optimizing a small neural mapper that receives the current time and space parameters and outputs the matching token embedding. In doing so, the entire personalized concept is represented by the parameters of the learned mapper, resulting in a compact, yet expressive, representation. Similarly to other personalization methods, the output of our neural mapper resides in the input space of the text encoder. We observe that one can significantly improve the convergence and visual fidelity of the concept by introducing a textual bypass, where our neural mapper additionally outputs a residual that is added to the output of the text encoder. Finally, we show how one can impose an importance-based ordering over our implicit representation, providing users control over the reconstruction and editability of the learned concept using a single trained model. We demonstrate the effectiveness of our approach over a range of concepts and prompts, showing our method's ability to generate high-quality and controllable compositions without fine-tuning any parameters of the generative model itself.\n\nRegistration Category: Full Access\n\nSession Chair: Jun-Yan Zhu (Carnegie Mellon University)
URL:https://asia.siggraph.org/2023/full-program?id=papers_230&sess=sess132
END:VEVENT
END:VCALENDAR