BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241204T130000
DTEND;TZID=Asia/Tokyo:20241204T131100
UID:siggraphasia_SIGGRAPH Asia 2024_sess116_papers_308@linklings.com
SUMMARY:DiffUHaul: A Training-Free Method for Object Dragging in Images
DESCRIPTION:Technical Papers\n\nOmri Avrahami (Hebrew University of Jerusa
 lem), Rinon Gal (Tel Aviv University), Gal Chechik (NVIDIA), Ohad Fried (T
 he Interdisciplinary Center Herzliya), Dani Lischinski (Hebrew Universit
 y of Jerusalem), and Arash Vahdat and Weili Nie (NVIDIA)\n\nText-to-imag
 e diffusion models have proven effective for solving many image editing t
 asks. However, the seemingly straightforward task of seamlessly relocatin
 g objects within a scene remains surprisingly challenging. Existing metho
 ds addressing this problem often struggle to function reliably in real-wo
 rld scenarios due to a lack of spatial reasoning. In this work, we propos
 e a training-free method, dubbed DiffUHaul, that harnesses the spatial un
 derstanding of a localized text-to-image model for the object dragging ta
 sk. Blindly manipulating the layout inputs of the localized model tends t
 o cause low editing performance due to the intrinsic entanglement of obje
 ct representations in the model. To this end, we first apply attention ma
 sking in each denoising step to make the generation more disentangled acr
 oss different objects, and we adopt a self-attention sharing mechanism t
 o preserve the high-level object appearance. Furthermore, we propose a ne
 w diffusion anchoring technique: in the early denoising steps, we interpo
 late the attention features between source and target images to smoothl
 y fuse new layouts with the original appearance; in the later denoising s
 teps, we pass the localized features from the source images to the interp
 olated images to retain fine-grained object details. To adapt DiffUHaul t
 o real-image editing, we apply DDPM self-attention bucketing, which bette
 r reconstructs real images with the localized model. Finally, we introduc
 e an automated evaluation pipeline for this task and showcase the efficac
 y of our method. Our results are reinforced through a user preference stu
 dy.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLangu
 age Format: English Language\n\nSession Chair: Dani Lischinski (Hebrew Un
 iversity of Jerusalem, Google)
URL:https://asia.siggraph.org/2024/program/?id=papers_308&sess=sess116
END:VEVENT
END:VCALENDAR
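
Note (outside the calendar data): the abstract above outlines DiffUHaul's
diffusion anchoring only at a high level, so here is a minimal illustrative
Python/NumPy sketch of that interpolation idea. The function and parameter
names (anchored_attention, anchor_cutoff, num_steps) and the linear ramp
schedule are assumptions for illustration, not the paper's implementation.

import numpy as np

def anchored_attention(attn_src, attn_tgt, step, num_steps,
                       anchor_cutoff=0.5):
    """Sketch of the anchoring idea from the abstract: in early denoising
    steps, interpolate attention features between the source and target
    images so the new layout fuses with the original appearance; in later
    steps, pass the source's localized features through to retain
    fine-grained object detail. All names here are hypothetical."""
    if step < anchor_cutoff * num_steps:
        # Early steps: blend features, ramping toward the target layout.
        # The linear schedule is an assumption; the paper does not
        # specify one in this abstract.
        alpha = step / (anchor_cutoff * num_steps)
        return (1.0 - alpha) * attn_src + alpha * attn_tgt
    # Later steps: reuse the source features unchanged.
    return attn_src

# Toy usage with random stand-ins for attention features of shape
# (batch, tokens, dim).
rng = np.random.default_rng(0)
src = rng.standard_normal((1, 64, 32))
tgt = rng.standard_normal((1, 64, 32))
for t in range(10):
    blended = anchored_attention(src, tgt, step=t, num_steps=10)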
