BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241204T114300
DTEND;TZID=Asia/Tokyo:20241204T115400
UID:siggraphasia_SIGGRAPH Asia 2024_sess114_papers_1085@linklings.com
SUMMARY:Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
DESCRIPTION:Technical Papers\n\nYunxin Li, Haoyuan Shi, and Baotian Hu (Harbin Institute of Technology); Longyue Wang (Alibaba Group); Jiashun Zhu and Jinyi Xu (Jilin University); Zhen Zhao (Tencent AILab); and Min Zhang (Harbin Institute of Technology)\n\nTraditional animation generation methods depend on training generative models with human-labelled data, entailing a sophisticated multi-stage pipeline that demands substantial human effort and incurs high training costs. Due to limited prompting plans, these methods typically produce brief, information-poor, and context-incoherent animations. To overcome these limitations and automate the animation process, we pioneer the introduction of large multimodal models (LMMs) as the core processor to build an autonomous animation-making agent, named Anim-Director. This agent mainly harnesses the advanced understanding and reasoning capabilities of LMMs and generative AI tools to create animated videos from concise narratives or simple instructions. Specifically, it operates in three main stages: Firstly, the Anim-Director generates a coherent storyline from user inputs, followed by a detailed director’s script that encompasses settings of character profiles and interior/exterior descriptions, and context-coherent scene descriptions that include appearing characters, interiors or exteriors, and scene events. Secondly, we employ LMMs with the image generation tool to produce visual images of settings and scenes. These images are designed to maintain visual consistency across different scenes using a visual-language prompting method that combines scene descriptions and images of the appearing character and setting. Thirdly, scene images serve as the foundation for producing animated videos, with LMMs generating prompts to guide this process. The whole process is notably autonomous without manual intervention, as the LMMs interact seamlessly with generative tools to generate prompts, evaluate visual quality, and select the best one to optimize the final output. To assess the effectiveness of our framework, we collect varied short narratives and incorporate various image/video evaluation metrics including visual consistency and video quality. The experimental results and case studies demonstrate the Anim-Director’s versatility and significant potential to streamline animation creation.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Kai Wang (Amazon)
URL:https://asia.siggraph.org/2024/program/?id=papers_1085&sess=sess114
END:VEVENT
END:VCALENDAR