BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070242Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_509@linklings.com
SUMMARY:What is the Best Automated Metric for Text to Motion Generation?
DESCRIPTION:Technical Papers\n\nJordan Voas\, Yili Wang\, Qixing Huang\, and Raymond Mooney (University of Texas at Austin)\n\nThere is growing interest in generating skeleton-based human motions from natural language descriptions. While most efforts have focused on developing better neural architectures for this task\, there has been no significant work on determining the proper evaluation metric. Human evaluation is the ultimate accuracy measure for this task\, and automated metrics should correlate well with human quality judgments. Since descriptions are compatible with many motions\, determining the right metric is critical for evaluating and designing effective generative models. This paper systematically studies which metrics best align with human evaluations and proposes new metrics that align even better. Our findings indicate that none of the metrics currently used for this task show even a moderate correlation with human judgments on a sample level. However\, for assessing average model performance\, commonly used metrics such as R-Precision and less-used coordinate errors show strong correlations. Additionally\, several recently developed metrics are not recommended due to their low correlation compared to alternatives. We also introduce a novel metric based on a multimodal BERT-like model\, MoBERT\, which offers strongly human-correlated sample-level evaluations while maintaining near-perfect model-level correlation. Our results demonstrate that this new metric exhibits extensive benefits over all current alternatives.\n\nRegistration Category: Full Access\, Enhanced Access\, Trade Exhibitor\, Experience Hall Exhibitor
URL:https://asia.siggraph.org/2023/full-program?id=papers_509&sess=sess209
END:VEVENT
END:VCALENDAR