BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721001T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19730401T030000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070241Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_504@linklings.com
SUMMARY:Interactive Story Visualization with Multiple Characters
DESCRIPTION:Technical Papers\n\nYuan Gong (Tsinghua University); Youxin Pang (MAIS & NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences); Xiaodong Cun and Menghan Xia (Tencent); Yingqing He (Hong Kong University of Science and Technology); Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, and Ying Shan (Tencent); and Yujiu Yang (Tsinghua University)\n\nAccurate story visualization requires several elements, such as identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images.\nThis paper proposes a system for generic interactive story visualization, capable of handling multiple novel characters and supporting the editing of layout and local structure. It is developed by leveraging the prior knowledge of large language and T2I models trained on massive corpora. The system comprises four interconnected components: story-to-prompt generation (S2P), text-to-layout generation (T2L), controllable text-to-image generation (C-T2I), and image-to-video animation (I2V). First, the S2P module converts concise story information into the detailed prompts required by subsequent stages. Next, T2L generates diverse and reasonable layouts from the prompts, offering users the ability to adjust and refine the layout to their preference. The core component, C-T2I, creates images guided by layouts, sketches, and actor-specific identifiers to maintain consistency and detail across visualizations. Finally, I2V enriches the visualization process by animating the generated images.\nExtensive experiments and a user study validate the effectiveness of the proposed system and the flexibility of its interactive editing.\n\nRegistration Category: Full Access, Enhanced Access, Trade Exhibitor, Experience Hall Exhibitor
URL:https://asia.siggraph.org/2023/full-program?id=papers_504&sess=sess209
END:VEVENT
END:VCALENDAR
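For anyone consuming this feed programmatically: RFC 5545 folds long property lines at roughly 75 octets and marks each continuation with a leading space or tab, so a parser must unfold lines before splitting each property on its first colon, and must resolve DTSTART/DTEND through the zone named by the TZID parameter (or the embedded VTIMEZONE). Below is a minimal sketch, assuming Python 3.9+ with an IANA tz database available to zoneinfo; the file name session.ics and the helper names unfold and parse_first_vevent are illustrative, not part of the feed.

```python
# Minimal sketch: unfold an RFC 5545 .ics file and resolve the event's
# zoned start time. "session.ics" is a placeholder file name.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def unfold(ics_text: str) -> list[str]:
    """Undo RFC 5545 line folding: a line beginning with a space or tab
    continues the previous line (the fold marker itself is dropped)."""
    lines: list[str] = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines

def parse_first_vevent(lines: list[str]) -> dict[str, str]:
    """Collect the first VEVENT's properties, keyed by property name
    with any ;PARAM=... qualifiers stripped (e.g. DTSTART;TZID=...)."""
    props: dict[str, str] = {}
    in_event = False
    for line in lines:
        if line == "BEGIN:VEVENT":
            in_event = True
        elif line == "END:VEVENT":
            break
        elif in_event and ":" in line:
            key, value = line.split(":", 1)
            props[key.split(";", 1)[0]] = value
    return props

with open("session.ics", encoding="utf-8") as f:
    event = parse_first_vevent(unfold(f.read()))

# DTSTART;TZID=Australia/Melbourne:20231212T093000 -> aware datetime
start = datetime.strptime(event["DTSTART"], "%Y%m%dT%H%M%S").replace(
    tzinfo=ZoneInfo("Australia/Melbourne"))
print(event["SUMMARY"])
print("local:", start.isoformat())
print("UTC:  ", start.astimezone(timezone.utc).isoformat())
```

On this event, 20231212T093000 in Australia/Melbourne falls under AEDT (UTC+11:00), so the sketch would print a local start of 2023-12-12T09:30:00+11:00 and a UTC start of 2023-12-11T22:30:00+00:00.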
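The abstract above outlines a four-stage architecture: S2P turns a story into prompts, T2L proposes an editable layout per prompt, C-T2I renders a layout- and identity-conditioned image, and I2V animates it. The toy sketch below only makes that dataflow and the user-editing hook concrete; every signature, type, and placeholder body is hypothetical and not taken from the paper, whose stages are large pretrained models.

```python
# Toy sketch of the four-stage dataflow described in the abstract.
# All signatures, types, and placeholder bodies are hypothetical.
from dataclasses import dataclass

@dataclass
class Layout:
    # (object label, normalized bounding box (x, y, w, h))
    boxes: list[tuple[str, tuple[float, float, float, float]]]

def s2p(story: str) -> list[str]:
    """Story-to-prompt: expand concise story text into per-scene prompts."""
    return [f"detailed prompt for: {s.strip()}" for s in story.split(".") if s.strip()]

def t2l(prompt: str) -> Layout:
    """Text-to-layout: propose object boxes that the user may then edit."""
    return Layout(boxes=[("character", (0.1, 0.2, 0.4, 0.7))])

def c_t2i(prompt: str, layout: Layout, identifiers: dict[str, str]) -> str:
    """Controllable T2I: image conditioned on layout and actor identifiers."""
    return f"image[{prompt} | {layout.boxes} | {identifiers}]"

def i2v(image: str) -> str:
    """Image-to-video: animate the generated frame."""
    return f"video[{image}]"

def visualize(story, identifiers, edit=lambda layout: layout):
    for prompt in s2p(story):                        # stage 1: prompts
        layout = edit(t2l(prompt))                   # stage 2: layout + user edits
        frame = c_t2i(prompt, layout, identifiers)   # stage 3: image
        yield i2v(frame)                             # stage 4: animation

for clip in visualize("Fred waves. Wilma laughs.", {"Fred": "<id-0>"}):
    print(clip)
```

Modeling the user's layout revision as a function passed into the loop keeps the interactive step explicit without committing to any particular editing interface.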