BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023313Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241206T134200
DTEND;TZID=Asia/Tokyo:20241206T135600
UID:siggraphasia_SIGGRAPH Asia 2024_sess147_papers_168@linklings.com
SUMMARY:Dance-to-Music Generation with Encoder-based Textual Inversion
DESCRIPTION:Technical Papers\n\nSifei Li, Weiming Dong, and Yuxin Zhang (M
 AIS, Institute of Automation, Chinese Academy of Sciences; School of Artif
 icial Intelligence, University of Chinese Academy of Sciences); Fan Tang (
 University of Chinese Academy of Sciences); Chongyang Ma (Kuaishou Technol
 ogy); Oliver Deussen (University of Konstanz); Tong-Yee Lee (National Chen
 g-Kung University); and Changsheng Xu (MAIS, Institute of Automation, Chin
 ese Academy of Sciences; School of Artificial Intelligence, University of 
 Chinese Academy of Sciences)\n\nThe seamless integration of music with dan
 ce movements is essential for communicating the artistic intent of a dance
  piece. This alignment also significantly improves the immersive quality o
 f gaming experiences and animation productions. Although there has been re
 markable advancement in creating high-fidelity music from textual descript
 ions, current methodologies mainly focus on modulating overall characteris
 tics such as genre and emotional tone. They often overlook the nuanced man
 agement of temporal rhythm, which is indispensable in crafting music for d
 ance, since it intricately aligns the musical beats with the dancers' move
 ments. Recognizing this gap, we propose an encoder-based textual inversion
  technique to augment text-to-music models with visual control, facilitati
 ng personalized music generation. Specifically, we develop dual-path rhyth
 m-genre inversion to effectively integrate the rhythm and genre of a dance
  motion sequence into the textual space of a text-to-music model. Contrary
  to traditional textual inversion methods, which directly update text embe
 ddings to reconstruct a single target object, our approach utilizes separa
 te rhythm and genre encoders to obtain text embeddings for two pseudo-word
 s, adapting to the varying rhythms and genres. We collect a new dataset ca
 lled In-the-wild Dance Videos (InDV) and demonstrate that our approach out
 performs state-of-the-art methods across multiple evaluation metrics. Furt
 hermore, our method is able to adapt to changes in tempo and effectively 
 integrates with the inherent text-guided generation capability of the pre-
 trained model. Our source code and demo videos are available at https://gi
 thub.com/lsfhuihuiff/Dance-to-music_Siggraph_Asia_2024.\n\nRegistration Ca
 tegory: Full Access, Full Access Supporter\n\nLanguage Format: English Lan
 guage\n\nSession Chair: Yi Zhou (Adobe)
URL:https://asia.siggraph.org/2024/program/?id=papers_168&sess=sess147
END:VEVENT
END:VCALENDAR
