BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241204T104500
DTEND;TZID=Asia/Tokyo:20241204T105600
UID:siggraphasia_SIGGRAPH Asia 2024_sess113_papers_683@linklings.com
SUMMARY:Quark: Real-time, High-resolution, and General Neural View Synthesis
DESCRIPTION:Technical Papers\n\nJohn Flynn, Michael Broxton, Lukas Murmann, Lucy Chai, Matthew DuVall, Clément Godard, Kathryn Heal, Srinivas Kaza, Stephen Lombardi, Xuan Luo, Supreeth Achar, Kira Prabhu, Tiancheng Sun, Lynn Tsai, and Ryan Overbeck (Google)\n\nWe present a novel neural algorithm for performing high-quality, high-resolution, real-time novel view synthesis. From a sparse set of input RGB images or video streams, our network both reconstructs the 3D scene and renders novel views at 1080p resolution at 30fps on an NVIDIA A100. Our feed-forward network generalizes across a wide variety of datasets and scenes and produces state-of-the-art quality for a real-time method. Our quality approaches, and in some cases surpasses, the quality of some of the top offline methods. In order to achieve these results we use a novel combination of several key concepts, and tie them together into a cohesive and effective algorithm. We build on previous works that represent the scene using semi-transparent layers and use an iterative learned render-and-refine approach to improve those layers. Instead of flat layers, our method reconstructs layered depth maps (LDMs) that efficiently represent scenes with complex depth and occlusions. The iterative update steps are embedded in a multi-scale, UNet-style architecture to perform as much compute as possible at reduced resolution. 
Within each update step, to better aggregate the information from multiple input views, we use a specialized Transformer-based network component. This allows the majority of the per-input image processing to be performed in the input image space, as opposed to layer space, further increasing efficiency. Finally, due to the real-time nature of our reconstruction and rendering, we dynamically create and discard the internal 3D geometry for each frame, generating the LDM for each view. Taken together, this produces a novel and effective algorithm for view synthesis. Through extensive evaluation, we demonstrate that we achieve state-of-the-art quality at real-time rates.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Forrester Cole (Google)
URL:https://asia.siggraph.org/2024/program/?id=papers_683&sess=sess113
END:VEVENT
END:VCALENDAR