BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241205T170500
DTEND;TZID=Asia/Tokyo:20241205T171600
UID:siggraphasia_SIGGRAPH Asia 2024_sess137_papers_915@linklings.com
SUMMARY:StyleCrafter: Taming Stylized Video Diffusion with Reference-Augme
 nted Adapter Learning
DESCRIPTION:Technical Papers\n\nGongye Liu (Tsinghua University); Menghan 
 Xia, Yong Zhang, and Haoxin Chen (Tencent AI Lab); Jinbo Xing (Chinese Uni
 versity of Hong Kong); Yibo Wang (Tsinghua University); Xintao Wang and Yi
 ng Shan (Tencent); and Yujiu Yang (Tsinghua University)\n\nText-to-video (
 T2V) models have shown remarkable capabilities in generating diverse video
 s. However, they struggle to produce user-desired artistic videos due to (
 i) text's inherent clumsiness in expressing specific styles and (ii) the g
 enerally degraded style fidelity. To address these challenges, we introduc
 e StyleCrafter, a generic method that enhances pre-trained T2V models with
  a style control adapter, allowing video generation in any style by feedin
 g a reference image. Considering the scarcity of artistic video data, we p
 ropose to first train a style control adapter using style-rich image datas
 ets, then transfer the learned stylization ability to video generation thr
 ough a tailor-made finetuning paradigm. To promote content-style disentang
 lement, we employ carefully designed data augmentation strategies to enhan
 ce decoupled learning. Additionally, we propose a scale-adaptive fusion mo
 dule to balance the influences of text-based content features and image-ba
 sed style features, which helps generalization across various text and sty
 le combinations. StyleCrafter efficiently generates high-quality stylized 
 videos that align with the content of the texts and resemble the style of 
 the reference images. Experiments demonstrate that our approach is more fl
 exible and efficient than existing competitors.\n\nRegistration Category: 
 Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\
 nSession Chair: Michael Rubinstein (Google)
URL:https://asia.siggraph.org/2024/program/?id=papers_915&sess=sess137
END:VEVENT
END:VCALENDAR
