BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163633Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_230@linklings.com
SUMMARY:A Neural Space-Time Representation for Text-to-Image Personalization
DESCRIPTION:Yuval Alaluf\, Elad Richardson\, Gal Metzer\, and Daniel Cohen-Or (Tel Aviv University)\n\nA key aspect of text-to-image personalization methods is the manner in which the target concept is represented within the generative process. This choice greatly affects the visual fidelity\, downstream editability\, and disk space needed to store the learned concept. In this paper\, we explore a new text-conditioning space that is dependent on both the denoising process timestep (time) and the denoising U-Net layers (space) and showcase its compelling properties. A single concept in the space-time representation is composed of hundreds of vectors\, one for each combination of time and space\, making this space challenging to optimize directly. Instead\, we propose to implicitly represent a concept in this space by optimizing a small neural mapper that receives the current time and space parameters and outputs the matching token embedding. In doing so\, the entire personalized concept is represented by the parameters of the learned mapper\, resulting in a compact\, yet expressive\, representation. Similarly to other personalization methods\, the output of our neural mapper resides in the input space of the text encoder. We observe that one can significantly improve the convergence and visual fidelity of the concept by introducing a textual bypass\, where our neural mapper additionally outputs a residual that is added to the output of the text encoder. Finally\, we show how one can impose an importance-based ordering over our implicit representation\, providing users control over the reconstruction and editability of the learned concept using a single trained model. We demonstrate the effectiveness of our approach over a range of concepts and prompts\, showing our method's ability to generate high-quality and controllable compositions without fine-tuning any parameters of the generative model itself.\n\nRegistration Category: Full Access\, Enhanced Access\, Trade Exhibitor\, Experience Hall Exhibitor\n\n
URL:https://asia.siggraph.org/2023/full-program?id=papers_230&sess=sess209
END:VEVENT
END:VCALENDAR
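
The entry above is serialized with RFC 5545 line folding: content lines longer than 75 octets are wrapped, and each continuation line begins with a single space or tab, which is why words appear split mid-word in a flattened copy. The following minimal, stdlib-only Python sketch unfolds the file and lists its properties; the file name session.ics is a hypothetical stand-in for wherever this entry is saved.

import re

def unfold_ical(text: str) -> list[str]:
    """Join folded iCalendar lines back into logical content lines."""
    # Normalize line endings, then delete each newline followed by one
    # space or tab (the RFC 5545 fold marker).
    text = text.replace("\r\n", "\n")
    return [line for line in re.sub(r"\n[ \t]", "", text).split("\n") if line]

with open("session.ics", encoding="utf-8") as f:  # hypothetical path
    for line in unfold_ical(f.read()):
        # Each logical line has the shape NAME[;PARAM=VALUE...]:VALUE.
        name, _, value = line.partition(":")
        print(f"{name} => {value[:60]}")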
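
The DESCRIPTION field summarizes the paper's core mechanism: a small neural mapper takes the denoising timestep (time) and U-Net layer index (space) and emits the token embedding for the learned concept, plus a textual-bypass residual added to the text encoder's output. The PyTorch sketch below illustrates that interface only; the layer count, hidden width, 768-dimensional embedding (as in Stable Diffusion's CLIP text encoder), and plain-MLP body are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class SpaceTimeMapper(nn.Module):
    """Maps a (timestep, U-Net layer) pair to a token embedding plus a
    textual-bypass residual. All sizes here are illustrative assumptions."""

    def __init__(self, embed_dim: int = 768, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(2, hidden),        # input: normalized (time, space)
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        self.to_embedding = nn.Linear(hidden, embed_dim)  # token embedding
        self.to_bypass = nn.Linear(hidden, embed_dim)     # bypass residual

    def forward(self, timestep: torch.Tensor, layer: torch.Tensor):
        # Assumed ranges: timesteps 0..999, 16 cross-attention layers.
        coords = torch.stack([timestep / 1000.0, layer / 16.0], dim=-1)
        h = self.body(coords)
        return self.to_embedding(h), self.to_bypass(h)

# The whole personalized concept is just this module's parameters: during
# denoising, each (timestep, layer) query yields the embedding fed into the
# text encoder and the residual added to the encoder's output.
mapper = SpaceTimeMapper()
emb, residual = mapper(torch.tensor([500.0]), torch.tensor([8.0]))
print(emb.shape, residual.shape)  # torch.Size([1, 768]) for both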