BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163645Z
LOCATION:Meeting Room C4.8\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231215T120300
DTEND;TZID=Australia/Melbourne:20231215T121300
UID:siggraphasia_SIGGRAPH Asia 2023_sess156_papers_504@linklings.com
SUMMARY:Interactive Story Visualization with Multiple Characters
DESCRIPTION:Yuan Gong (Tsinghua University); Youxin Pang (MAIS & NLPR, Ins
 titute of Automation, Chinese Academy of Sciences, Beijing, China; School 
 of Artificial Intelligence, University of Chinese Academy of Sciences); Xi
 aodong Cun and Menghan Xia (Tencent); Yingqing He (Hong Kong University of
  Science and Technology); Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wa
 ng, and Ying Shan (Tencent); and Yujiu Yang (Tsinghua University)\n\nAccur
 ate story visualization requires several necessary elements, such as ident
 ity consistency across frames, the alignment between plain text and visual
  content, and a reasonable layout of objects in images. Most previous work
 s endeavor to meet these requirements by fitting a text-to-image (T2I) mod
 el on a set of videos in the same style and with the same characters, e.g.
 , the FlintstonesSV dataset. However, the learned T2I models typically str
 uggle to adapt to new characters, scenes, and styles, and often lack the f
 lexibility to revise the layout of the synthesized images.\nThis paper pro
 poses a system for generic interactive story visualization, capable of han
 dling multiple novel characters and supporting the editing of layout and l
 ocal structure. It is developed by leveraging the prior knowledge of large
  language and T2I models, trained on massive corpora. The system comprises
  four interconnected components: story-to-prompt generation (S2P), text-to
 -layout generation (T2L), controllable text-to-image generation (C-T2I), a
 nd image-to-video animation (I2V). First, the S2P module converts concise 
 story information into detailed prompts required for subsequent stages. Ne
 xt, T2L generates diverse and reasonable layouts based on the prompts, off
 ering users the ability to adjust and refine the layout to their preferenc
 e. The core component, C-T2I, enables the creation of images guided by lay
 outs, sketches, and actor-specific identifiers to maintain consistency and
  detail across visualizations. Finally, I2V enriches the visualization pro
 cess by animating the generated images.\nExtensive experiments and a user 
 study are conducted to validate the effectiveness of the proposed system a
 nd the flexibility of its interactive editing.\n\nRegistration Category: F
 ull Acce
 ss\n\nSession Chair: Sergi Pujades (National Institute for Research in Com
 puter Science and Automation (INRIA), Université Grenoble Alpes)\n\n
URL:https://asia.siggraph.org/2023/full-program?id=papers_504&sess=sess156
END:VEVENT
END:VCALENDAR
