BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070240Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_304@linklings.com
SUMMARY:Break-A-Scene: Extracting Multiple Concepts from a Single Image
DESCRIPTION:Technical Papers\n\nOmri Avrahami (The Hebrew University of Jerusalem), Kfir Aberman (Google Research), Ohad Fried (Reichman University), Daniel Cohen-Or (Tel Aviv University), and Dani Lischinski (The Hebrew University of Jerusalem)\n\nText-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images with variations in backgrounds and poses, and struggle when adapted to a different scenario. In this work, we introduce the task of textual scene decomposition: given a single image of a scene that may contain several concepts, we aim to extract a distinct text token for each concept, enabling fine-grained control over the generated scenes. To this end, we propose augmenting the input image with masks that indicate the presence of target concepts. These masks can be provided by the user or generated automatically by a pre-trained segmentation model. We then present a novel two-phase customization process that optimizes a set of dedicated textual embeddings (handles), as well as the model weights, striking a delicate balance between accurately capturing the concepts and avoiding overfitting. We employ a masked diffusion loss to enable handles to generate their assigned concepts, complemented by a novel loss on cross-attention maps to prevent entanglement. We also introduce union-sampling, a training strategy aimed at improving the ability to combine multiple concepts in generated images. We use several automatic metrics to quantitatively compare our method against several baselines, and further affirm the results using a user study. Finally, we showcase several applications of our method.\n\nRegistration Category: Full Access, Enhanced Access, Trade Exhibitor, Experience Hall Exhibitor
URL:https://asia.siggraph.org/2023/full-program?id=papers_304&sess=sess209
END:VEVENT
END:VCALENDAR
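
As a reference for consuming a feed entry like the one above, here is a minimal Python sketch: it unfolds RFC 5545 line folding (long properties such as DESCRIPTION are normally wrapped onto continuation lines that begin with a space), collects the VEVENT properties, and resolves DTSTART through the IANA TZID via zoneinfo rather than the embedded VTIMEZONE. The filename break_a_scene.ics is hypothetical and the parser is deliberately incomplete; a real application would use a full library such as the icalendar package, which also handles text escaping and recurrence rules.

    import re
    from datetime import datetime
    from zoneinfo import ZoneInfo

    def unfold(ics_text):
        # RFC 5545 folding: a CRLF followed by a space or tab continues the previous line.
        return re.sub(r"\r?\n[ \t]", "", ics_text).splitlines()

    def vevent_properties(lines):
        # Collect NAME;PARAMS:VALUE properties that appear inside BEGIN:VEVENT ... END:VEVENT.
        props, inside = {}, False
        for line in lines:
            if line == "BEGIN:VEVENT":
                inside = True
            elif line == "END:VEVENT":
                inside = False
            elif inside and ":" in line:
                name_and_params, value = line.split(":", 1)
                name, _, params = name_and_params.partition(";")
                props[name] = (params, value)
        return props

    with open("break_a_scene.ics", encoding="utf-8") as f:  # hypothetical filename
        props = vevent_properties(unfold(f.read()))

    # DTSTART;TZID=Australia/Melbourne:20231212T093000 -> timezone-aware datetime
    params, value = props["DTSTART"]
    tzid = dict(p.split("=", 1) for p in params.split(";"))["TZID"]
    start = datetime.strptime(value, "%Y%m%dT%H%M%S").replace(tzinfo=ZoneInfo(tzid))

    print(props["SUMMARY"][1])  # Break-A-Scene: Extracting Multiple Concepts from a Single Image
    print(start.isoformat())    # 2023-12-12T09:30:00+11:00 (AEDT)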
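
The abstract in the DESCRIPTION mentions a masked diffusion loss that restricts the reconstruction objective to the regions covered by each concept's mask. Below is one plausible form of such a loss, sketched in PyTorch; it is an illustration inferred from the abstract, not the authors' released implementation, and the tensor shapes (noise predictions of shape B x C x H x W with a single-channel mask) are assumptions.

    import torch

    def masked_diffusion_loss(eps_pred, eps_true, mask):
        # eps_pred, eps_true: (B, C, H, W) predicted and ground-truth noise
        # mask: (B, 1, H, W) binary mask marking the target concept(s)
        err = (eps_pred - eps_true) ** 2 * mask          # mask broadcast over channels
        denom = mask.expand_as(err).sum().clamp(min=1.0) # number of masked elements
        return err.sum() / denom

    # Usage sketch: loss = masked_diffusion_loss(model_output, noise, concept_mask)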