BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070250Z
LOCATION:Meeting Room C4.8\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231215T120300
DTEND;TZID=Australia/Melbourne:20231215T121300
UID:siggraphasia_SIGGRAPH Asia 2023_sess156_papers_504@linklings.com
SUMMARY:Interactive Story Visualization with Multiple Characters
DESCRIPTION:Technical Communications, Technical Papers\n\nYuan Gong (Tsinghua University); Youxin Pang (MAIS & NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences); Xiaodong Cun and Menghan Xia (Tencent); Yingqing He (Hong Kong University of Science and Technology); Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, and Ying Shan (Tencent); and Yujiu Yang (Tsinghua University)\n\nAccurate story visualization requires several necessary elements, such as identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model to a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images.\nThis paper proposes a system for generic interactive story visualization, capable of handling multiple novel characters and supporting the editing of layout and local structure. It is developed by leveraging the prior knowledge of large language and T2I models, trained on massive corpora. The system comprises four interconnected components: story-to-prompt generation (S2P), text-to-layout generation (T2L), controllable text-to-image generation (C-T2I), and image-to-video animation (I2V). First, the S2P module converts concise story information into the detailed prompts required by subsequent stages. Next, T2L generates diverse and reasonable layouts based on the prompts, offering users the ability to adjust and refine the layout to their preference. The core component, C-T2I, creates images guided by layouts, sketches, and actor-specific identifiers to maintain consistency and detail across visualizations. Finally, I2V enriches the visualization process by animating the generated images.\nExtensive experiments and a user study validate the effectiveness of the proposed system and the flexibility of its interactive editing.\n\nRegistration Category: Full Access\n\nSession Chair: Sergi Pujades (National Institute for Research in Computer Science and Automation (INRIA), Université Grenoble Alpes)
URL:https://asia.siggraph.org/2023/full-program?id=papers_504&sess=sess156
END:VEVENT
END:VCALENDAR
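
For reference, a minimal sketch of how the VEVENT above could be read programmatically. It assumes Python with the third-party icalendar package installed and a hypothetical local filename; it is illustrative only and is not part of the calendar data.

# Parse the .ics file and print the key fields of each event.
from icalendar import Calendar

with open("siggraphasia2023_papers_504.ics", "rb") as f:  # hypothetical filename
    cal = Calendar.from_ical(f.read())

for event in cal.walk("VEVENT"):
    print(event.get("SUMMARY"))      # talk title
    print(event.decoded("DTSTART"))  # start time as a datetime (Australia/Melbourne per TZID)
    print(event.get("LOCATION"))     # room and level

Because DTSTART/DTEND carry a TZID parameter resolved by the embedded VTIMEZONE, a conforming parser returns local Melbourne times rather than UTC.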