BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19701004T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19700405T030000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070245Z
LOCATION:Meeting Room C4.8\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231213T181100
DTEND;TZID=Australia/Melbourne:20231213T182100
UID:siggraphasia_SIGGRAPH Asia 2023_sess147_papers_509@linklings.com
SUMMARY:What is the Best Automated Metric for Text to Motion Generation?
DESCRIPTION:Technical Communications, Technical Papers\n\nJordan Voas, Yil
 i Wang, Qixing Huang, and Raymond Mooney (University of Texas at Austin)\n
 \nThere is growing interest in generating skeleton-based human motions fro
 m natural language descriptions. While most efforts have focused on develo
 ping better neural architectures for this task, there has been no signific
 ant work on determining the proper evaluation metric. Human evaluation is 
 the ultimate accuracy measure for this task, and automated metrics should 
 correlate well with human quality judgments. Since descriptions are compat
 ible with many motions, determining the right metric is critical for evalu
 ating and designing effective generative models. This paper systematically
  studies which metrics best align with human evaluations and proposes new 
 metrics that align even better. Our findings indicate that none of the met
 rics currently used for this task show even a moderate correlation with hu
 man judgments on a sample level. However, for assessing average model perf
 ormance, commonly used metrics such as R-Precision and less-used coordinat
 e errors show strong correlations. Additionally, several recently develope
 d metrics are not recommended due to their low correlation compared to alt
 ernatives. We also introduce a novel metric based on a multimodal BERT-lik
 e model, MoBERT, which offers strongly human-correlated sample-level eval
 uations while maintaining near-perfect model-level correlation. Our result
 s demonstrate that this new metric exhibits extensive benefits over all cu
 rrent alternatives.\n\nRegistration Category: Full Access\n\nSession Chair
 : Sheng Li (Peking University)
URL:https://asia.siggraph.org/2023/full-program?id=papers_509&sess=sess147
END:VEVENT
END:VCALENDAR