BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241205T170500
DTEND;TZID=Asia/Tokyo:20241205T171600
UID:siggraphasia_SIGGRAPH Asia 2024_sess137_papers_915@linklings.com
SUMMARY:StyleCrafter: Taming Stylized Video Diffusion with Reference-Augmented Adapter Learning
DESCRIPTION:Technical Papers\n\nGongye Liu (Tsinghua University); Menghan Xia, Yong Zhang, and Haoxin Chen (Tencent AI lab); Jinbo Xing (Chinese University of Hong Kong); Yibo Wang (Tsinghua University); Xintao Wang and Ying Shan (Tencent); and Yujiu Yang (Tsinghua University)\n\nText-to-video (T2V) models have shown remarkable capabilities in generating diverse videos. However, they struggle to produce user-desired artistic videos due to (i) text's inherent clumsiness in expressing specific styles and (ii) the generally degraded style fidelity. To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, allowing video generation in any style by feeding a reference image. Considering the scarcity of artistic video data, we propose to first train a style control adapter using style-rich image datasets, then transfer the learned stylization ability to video generation through a tailor-made finetuning paradigm. To promote content-style disentanglement, we employ carefully designed data augmentation strategies to enhance decoupled learning. Additionally, we propose a scale-adaptive fusion module to balance the influences of text-based content features and image-based style features, which helps generalization across various text and style combinations. StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images. Experiments demonstrate that our approach is more flexible and efficient than existing competitors.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Michael Rubinstein (Google)
URL:https://asia.siggraph.org/2024/program/?id=papers_915&sess=sess137
END:VEVENT
END:VCALENDAR