BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241204T132300
DTEND;TZID=Asia/Tokyo:20241204T133400
UID:siggraphasia_SIGGRAPH Asia 2024_sess116_papers_479@linklings.com
SUMMARY:Consolidating Attention Features for Multi-view Image Editing
DESCRIPTION:Technical Papers\n\nOr Patashnik (Tel Aviv University); Rinon 
 Gal (Tel Aviv University, NVIDIA Research); Daniel Cohen-Or (Tel Aviv Univ
 ersity); and Jun-Yan Zhu and Fernando De La Torre (Carnegie Mellon Univers
 ity)\n\nLarge-scale text-to-image models enable a wide range of image edit
 ing techniques, using text prompts or even spatial controls. However, appl
 ying these editing methods to multi-view images depicting a single scene l
 eads to 3D-inconsistent results. In this work, we focus on spatial control
 -based geometric manipulations and introduce a method to consolidate the e
 diting process across various views. We build on two insights: (1) maintai
 ning consistent features throughout the generative process helps attain co
 nsistency in multi-view editing, and (2) the queries in self-attention lay
 ers significantly influence the image structure. Hence, we propose to impr
 ove the geometric consistency of the edited images by enforcing the consis
 tency of the queries. To do so, we introduce QNeRF, a neural radiance fiel
 d trained on the internal query features of the edited images. Once traine
 d, QNeRF can render 3D-consistent queries, which are then softly injected 
 back into the self-attention layers during generation, greatly improving m
 ulti-view consistency. We refine the process through a progressive, iterat
 ive method that better consolidates queries across the diffusion timesteps
 . We compare our method to a range of existing techniques and demonstrate 
 that it can achieve better multi-view consistency and higher fidelity to t
 he input scene. These advantages allow us to train NeRFs with fewer visual
  artifacts that are better aligned with the target geometry.\n\nRegistrat
 ion Category: Full Access, Full Access Supporter\n\nLanguage Format: Engli
 sh Language\n\nSession Chair: Dani Lischinski (Hebrew University of Jerusa
 lem, Google)
URL:https://asia.siggraph.org/2024/program/?id=papers_479&sess=sess116
END:VEVENT
END:VCALENDAR
