BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163708Z
LOCATION:Meeting Room C4.8\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231213T181100
DTEND;TZID=Australia/Melbourne:20231213T182100
UID:siggraphasia_SIGGRAPH Asia 2023_sess147_papers_509@linklings.com
SUMMARY:What is the Best Automated Metric for Text to Motion Generation?
DESCRIPTION:Jordan Voas, Yili Wang, Qixing Huang, and Raymond Mooney (University of Texas at Austin)\n\nThere is growing interest in generating skeleton-based human motions from natural language descriptions. While most efforts have focused on developing better neural architectures for this task, there has been no significant work on determining the proper evaluation metric. Human evaluation is the ultimate accuracy measure for this task, and automated metrics should correlate well with human quality judgments. Since descriptions are compatible with many motions, determining the right metric is critical for evaluating and designing effective generative models. This paper systematically studies which metrics best align with human evaluations and proposes new metrics that align even better. Our findings indicate that none of the metrics currently used for this task show even a moderate correlation with human judgments on a sample level. However, for assessing average model performance, commonly used metrics such as R-Precision and less-used coordinate errors show strong correlations. Additionally, several recently developed metrics are not recommended due to their low correlation compared to alternatives.
  We also introduce a novel metric based on a multimodal BERT-like model, MoBERT, which offers strongly human-correlated sample-level evaluations while maintaining near-perfect model-level correlation. Our results demonstrate that this new metric exhibits extensive benefits over all current alternatives.\n\nRegistration Category: Full Access\n\nSession Chair: Sheng Li (Peking University)\n\n
URL:https://asia.siggraph.org/2023/full-program?id=papers_509&sess=sess147
END:VEVENT
END:VCALENDAR