BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241205T154300
DTEND;TZID=Asia/Tokyo:20241205T155400
UID:siggraphasia_SIGGRAPH Asia 2024_sess134_papers_785@linklings.com
SUMMARY:TrailBlazer: Trajectory Control for Diffusion-Based Video Generati
 on
DESCRIPTION:Technical Papers\n\nWan-Duo Kurt Ma (Victoria University of We
 llington), J. P. Lewis (NVIDIA Research), and W. Bastiaan Kleijn (Victoria
  University of Wellington)\n\nLarge text-to-video (T2V) models such as Sor
 a have the potential to revolutionize visual effects and the creation of s
 ome types of movies. However, current T2V models require tedious trial-and
 -error experimentation to achieve desired results. This motivates the sear
 ch for methods to directly control desired attributes. In this work, we ta
 ke a step toward this goal, introducing a method for high-level, temporall
 y-coherent control over the basic trajectories and appearance of objects. 
 Our algorithm, TrailBlazer, allows the general positions and (optionally) 
 appearance of objects to be controlled simply by keyframing approximate bo
 unding boxes and (optionally) their corresponding prompts. Importantly, ou
 r method does not require a pre-existing control video signal that already
  contains an accurate outline of the desired motion, yet the synthesized m
 otion is surprisingly natural with emergent effects including perspective 
 and movement toward the virtual camera as the box size increases. The meth
 od is efficient, making use of a pre-trained T2V model and requiring no tr
 aining or fine-tuning, with negligible additional computation. Specificall
 y, the bounding box controls are used as soft masks to guide manipulation 
 of the self-attention and cross-attention modules in the video diffusion m
 odel. While our visual results are limited by those of the underlying mode
 l, the algorithm may generalize to future models that use standard self- a
 nd cross-attention components.\n\nRegistration Category: Full Access, Full
  Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: N
 anxuan Zhao (Adobe Research)
URL:https://asia.siggraph.org/2024/program/?id=papers_785&sess=sess134
END:VEVENT
END:VCALENDAR
