BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241204T104500
DTEND;TZID=Asia/Tokyo:20241204T105600
UID:siggraphasia_SIGGRAPH Asia 2024_sess113_papers_683@linklings.com
SUMMARY:Quark: Real-time, High-resolution, and General Neural View Synthes
 is
DESCRIPTION:Technical Papers\n\nJohn Flynn, Michael Broxton, Lukas Murmann
 , Lucy Chai, Matthew DuVall, Clément Godard, Kathryn Heal, Srinivas Kaza, 
 Stephen Lombardi, Xuan Luo, Supreeth Achar, Kira Prabhu, Tiancheng Sun, Ly
 nn Tsai, and Ryan Overbeck (Google)\n\nWe present a novel neural algorithm
  for performing high-quality, high-resolution, real-time novel view synthe
 sis. From a sparse set of input RGB images or video streams, our network 
 both reconstructs the 3D scene and renders novel views at 1080p resolution
  at 30fps on an NVIDIA A100. Our feed-forward network generalizes across a
  wide variety of datasets and scenes and produces state-of-the-art quality
  for a real-time method. Our quality approaches, and in some cases surpass
 es, the quality of some of the top offline methods. In order to achieve th
 ese results we use a novel combination of several key concepts, and tie th
 em together into a cohesive and effective algorithm. We build on previous 
 works that represent the scene using semi-transparent layers and use an it
 erative learned render-and-refine approach to improve those layers. Instea
 d of flat layers, our method reconstructs layered depth maps (LDMs) that e
 fficiently represent scenes with complex depth and occlusions. The iterati
 ve update steps are embedded in a multi-scale, UNet-style architecture to 
 perform as much compute as possible at reduced resolution. Within each upd
 ate step, to better aggregate the information from multiple input views, w
 e use a specialized Transformer-based network component. This allows the m
 ajority of the per-input image processing to be performed in the input ima
 ge space, as opposed to layer space, further increasing efficiency. Finall
 y, due to the real-time nature of our reconstruction and rendering, we dyn
 amically create and discard the internal 3D geometry for each frame, gener
 ating the LDM for each view. Taken together, this produces a novel and eff
 ective algorithm for view synthesis. Through extensive evaluation, we demo
 nstrate that we achieve state-of-the-art quality at real-time rates.\n\nRe
 gistration Category: Full Access, Full Access Supporter\n\nLanguage Format
 : English Language\n\nSession Chair: Forrester Cole (Google)
URL:https://asia.siggraph.org/2024/program/?id=papers_683&sess=sess113
END:VEVENT
END:VCALENDAR
