BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241204T114300
DTEND;TZID=Asia/Tokyo:20241204T115400
UID:siggraphasia_SIGGRAPH Asia 2024_sess114_papers_1085@linklings.com
SUMMARY:Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
DESCRIPTION:Technical Papers\n\nYunxin Li, Haoyuan Shi, and Baotian Hu (Harbin Institute of Technology); Longyue Wang (Alibaba Group); Jiashun Zhu and Jinyi Xu (Jilin University); Zhen Zhao (Tencent AILab); and Min Zhang (Harbin Institute of Technology)\n\nTraditional animation generation methods depend on training generative models with human-labelled data, entailing a sophisticated multi-stage pipeline that demands substantial human effort and incurs high training costs. Due to limited prompting plans, these methods typically produce brief, information-poor, and context-incoherent animations. To overcome these limitations and automate the animation process, we pioneer the introduction of large multimodal models (LMMs) as the core processor to build an autonomous animation-making agent, named Anim-Director. This agent mainly harnesses the advanced understanding and reasoning capabilities of LMMs and generative AI tools to create animated videos from concise narratives or simple instructions. Specifically, it operates in three main stages: Firstly, the Anim-Director generates a coherent storyline from user inputs, followed by a detailed director’s script that encompasses settings of character profiles and interior/exterior descriptions, and context-coherent scene descriptions that include appearing characters, interiors or exteriors, and scene events. Secondly, we employ LMMs with the image generation tool to produce visual images of settings and scenes. These images are designed to maintain visual consistency across different scenes using a visual-language prompting method that combines scene descriptions and images of the appearing character and setting. Thirdly, scene images serve as the foundation for producing animated videos, with LMMs generating prompts to guide this process. The whole process is notably autonomous without manual intervention, as the LMMs interact seamlessly with generative tools to generate prompts, evaluate visual quality, and select the best one to optimize the final output. To assess the effectiveness of our framework, we collect varied short narratives and incorporate various image/video evaluation metrics including visual consistency and video quality. The experimental results and case studies demonstrate the Anim-Director’s versatility and significant potential to streamline animation creation.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Kai Wang (Amazon)
URL:https://asia.siggraph.org/2024/program/?id=papers_1085&sess=sess114
END:VEVENT
END:VCALENDAR