BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721001T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19730401T030000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070241Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_504@linklings.com
SUMMARY:Interactive Story Visualization with Multiple Characters
DESCRIPTION:Technical Papers\n\nYuan Gong (Tsinghua University); Youxin Pang (MAIS & NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences); Xiaodong Cun and Menghan Xia (Tencent); Yingqing He (Hong Kong University of Science and Technology); Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, and Ying Shan (Tencent); and Yujiu Yang (Tsinghua University)\n\nAccurate story visualization requires several elements, such as identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images.\nThis paper proposes a system for generic interactive story visualization, capable of handling multiple novel characters and supporting the editing of layout and local structure. It is developed by leveraging the prior knowledge of large language and T2I models trained on massive corpora. The system comprises four interconnected components: story-to-prompt generation (S2P), text-to-layout generation (T2L), controllable text-to-image generation (C-T2I), and image-to-video animation (I2V). First, the S2P module converts concise story information into the detailed prompts required by subsequent stages. Next, T2L generates diverse and reasonable layouts from the prompts, offering users the ability to adjust and refine the layout to their preference. The core component, C-T2I, creates images guided by layouts, sketches, and actor-specific identifiers to maintain consistency and detail across visualizations. Finally, I2V enriches the visualization process by animating the generated images.\nExtensive experiments and a user study validate the effectiveness of the proposed system and the flexibility of its interactive editing.\n\nRegistration Category: Full Access, Enhanced Access, Trade Exhibitor, Experience Hall Exhibitor
URL:https://asia.siggraph.org/2023/full-program?id=papers_504&sess=sess209
END:VEVENT
END:VCALENDAR
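For anyone consuming this feed programmatically: RFC 5545 folds long property lines at roughly 75 octets and marks each continuation with a leading space or tab, so a parser must unfold lines before splitting each property on its first colon, and must resolve DTSTART/DTEND through the zone named by the TZID parameter (or the embedded VTIMEZONE). Below is a minimal sketch, assuming Python 3.9+ with an IANA tz database available to zoneinfo; the file name session.ics and the helper names unfold and parse_first_vevent are illustrative, not part of the feed.

```python
# Minimal sketch: unfold an RFC 5545 .ics file and resolve the event's
# zoned start time. "session.ics" is a placeholder file name.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def unfold(ics_text: str) -> list[str]:
    """Undo RFC 5545 line folding: a line beginning with a space or tab
    continues the previous line (the fold marker itself is dropped)."""
    lines: list[str] = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines

def parse_first_vevent(lines: list[str]) -> dict[str, str]:
    """Collect the first VEVENT's properties, keyed by property name
    with any ;PARAM=... qualifiers stripped (e.g. DTSTART;TZID=...)."""
    props: dict[str, str] = {}
    in_event = False
    for line in lines:
        if line == "BEGIN:VEVENT":
            in_event = True
        elif line == "END:VEVENT":
            break
        elif in_event and ":" in line:
            key, value = line.split(":", 1)
            props[key.split(";", 1)[0]] = value
    return props

with open("session.ics", encoding="utf-8") as f:
    event = parse_first_vevent(unfold(f.read()))

# DTSTART;TZID=Australia/Melbourne:20231212T093000 -> aware datetime
start = datetime.strptime(event["DTSTART"], "%Y%m%dT%H%M%S").replace(
    tzinfo=ZoneInfo("Australia/Melbourne"))
print(event["SUMMARY"])
print("local:", start.isoformat())
print("UTC:  ", start.astimezone(timezone.utc).isoformat())
```

On this event, 20231212T093000 in Australia/Melbourne falls under AEDT (UTC+11:00), so the sketch would print a local start of 2023-12-12T09:30:00+11:00 and a UTC start of 2023-12-11T22:30:00+00:00.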
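The abstract above outlines a four-stage architecture: S2P turns a story into prompts, T2L proposes an editable layout per prompt, C-T2I renders a layout- and identity-conditioned image, and I2V animates it. The toy sketch below only makes that dataflow and the user-editing hook concrete; every signature, type, and placeholder body is hypothetical and not taken from the paper, whose stages are large pretrained models.

```python
# Toy sketch of the four-stage dataflow described in the abstract.
# All signatures, types, and placeholder bodies are hypothetical.
from dataclasses import dataclass

@dataclass
class Layout:
    # (object label, normalized bounding box (x, y, w, h))
    boxes: list[tuple[str, tuple[float, float, float, float]]]

def s2p(story: str) -> list[str]:
    """Story-to-prompt: expand concise story text into per-scene prompts."""
    return [f"detailed prompt for: {s.strip()}" for s in story.split(".") if s.strip()]

def t2l(prompt: str) -> Layout:
    """Text-to-layout: propose object boxes that the user may then edit."""
    return Layout(boxes=[("character", (0.1, 0.2, 0.4, 0.7))])

def c_t2i(prompt: str, layout: Layout, identifiers: dict[str, str]) -> str:
    """Controllable T2I: image conditioned on layout and actor identifiers."""
    return f"image[{prompt} | {layout.boxes} | {identifiers}]"

def i2v(image: str) -> str:
    """Image-to-video: animate the generated frame."""
    return f"video[{image}]"

def visualize(story, identifiers, edit=lambda layout: layout):
    for prompt in s2p(story):                        # stage 1: prompts
        layout = edit(t2l(prompt))                   # stage 2: layout + user edits
        frame = c_t2i(prompt, layout, identifiers)   # stage 3: image
        yield i2v(frame)                             # stage 4: animation

for clip in visualize("Fred waves. Wilma laughs.", {"Fred": "<id-0>"}):
    print(clip)
```

Modeling the user's layout revision as a function passed into the loop keeps the interactive step explicit without committing to any particular editing interface.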