BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241205T144500
DTEND;TZID=Asia/Tokyo:20241205T145600
UID:siggraphasia_SIGGRAPH Asia 2024_sess134_papers_608@linklings.com
SUMMARY:Still-Moving: Customized Video Generation without Customized Video
  Data
DESCRIPTION:Technical Papers\n\nHila Chefer (Google Research\, Tel Aviv
  University)\; Shiran Zada\, Roni Paiss\, Ariel Ephrat\, Omer Tov\, and
  Michael Rubinstein (Google Research)\; Lior Wolf (Tel Aviv
  University)\; Tali Dekel (Google Research\, Weizmann Institute of
  Science)\; Tomer Michaeli (Google Research\, Technion – Israel
  Institute of Technology)\; and Inbar Mosseri (Google Research)\n\n
 Customizing text-to-image (T2I) models has seen tremendous progress
  recently\, particularly in areas such as personalization\,
  stylization\, and conditional generation. However\, expanding this
  progress to video generation is still in its infancy\, primarily due
  to the lack of customized video data.\nIn this work\, we introduce
  Still-Moving\, a novel generic framework for customizing a
  text-to-video (T2V) model\, without requiring any customized video
  data. The framework applies to the prominent T2V design where the
  video model is built over a text-to-image (T2I) model (e.g.\, via
  inflation). We assume access to a customized version of the T2I
  model\, trained only on still image data (e.g.\, using DreamBooth or
  StyleDrop).\nNaively plugging in the weights of the customized T2I
  model into the T2V model often leads to significant artifacts or
  insufficient adherence to the customization data.\nTo overcome this
  issue\, we train lightweight Spatial Adapters that adjust the
  features produced by the injected T2I layers.\nImportantly\, our
  adapters are trained on "frozen videos" (i.e.\, repeated images)\,
  constructed from image samples generated by the customized T2I
  model. This training is facilitated by a novel Motion Adapter
  module\, which allows us to train on such static videos while
  preserving the motion prior of the video model. At test time\, we
  remove the Motion Adapter modules and leave in only the trained
  Spatial Adapters. This restores the motion prior of the T2V model
  while adhering to the spatial prior of the customized T2I model.\nWe
  demonstrate the effectiveness of our approach on diverse tasks
  including personalized\, stylized\, and conditional generation. In
  all evaluated scenarios\, our method seamlessly integrates the
  spatial prior of the customized T2I model with a motion prior
  supplied by the T2V model.\n\nRegistration Category: Full Access\,
  Full Access Supporter\n\nLanguage Format: English Language\n\n
 Session Chair: Nanxuan Zhao (Adobe Research)
URL:https://asia.siggraph.org/2024/program/?id=papers_608&sess=sess134
END:VEVENT
END:VCALENDAR
