BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241205T150800
DTEND;TZID=Asia/Tokyo:20241205T151900
UID:siggraphasia_SIGGRAPH Asia 2024_sess134_papers_485@linklings.com
SUMMARY:Lumiere: A Space-Time Diffusion Model for Video Generation
DESCRIPTION:Technical Papers\n\nOmer Bar-Tal (Google Research, Weizmann Institute of Science); Hila Chefer (Google Research, Tel Aviv University); Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, and Michael Rubinstein (Google Research); Tomer Michaeli (Google Research, Technion – Israel Institute of Technology); Oliver Wang and Deqing Sun (Google Research); Tali Dekel (Google Research, Weizmann Institute of Science); and Inbar Mosseri (Google Research)\n\nWe introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Nanxuan Zhao (Adobe Research)
URL:https://asia.siggraph.org/2024/program/?id=papers_485&sess=sess134
END:VEVENT
END:VCALENDAR