BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070245Z
LOCATION:Meeting Room C4.8\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231213T181100
DTEND;TZID=Australia/Melbourne:20231213T182100
UID:siggraphasia_SIGGRAPH Asia 2023_sess147_papers_509@linklings.com
SUMMARY:What is the Best Automated Metric for Text to Motion Generation?
DESCRIPTION:Technical Communications, Technical Papers\n\nJordan Voas, Yili Wang, Qixing Huang, and Raymond Mooney (University of Texas at Austin)\n\nThere is growing interest in generating skeleton-based human motions from natural language descriptions. While most efforts have focused on developing better neural architectures for this task, there has been no significant work on determining the proper evaluation metric. Human evaluation is the ultimate accuracy measure for this task, and automated metrics should correlate well with human quality judgments. Since descriptions are compatible with many motions, determining the right metric is critical for evaluating and designing effective generative models. This paper systematically studies which metrics best align with human evaluations and proposes new metrics that align even better. Our findings indicate that none of the metrics currently used for this task show even a moderate correlation with human judgments on a sample level. However, for assessing average model performance, commonly used metrics such as R-Precision and less-used coordinate errors show strong correlations. Additionally, several recently developed metrics are not recommended due to their low correlation compared to alternatives. We also introduce a novel metric based on a multimodal BERT-like model, MoBERT, which offers strongly human-correlated sample-level evaluations while maintaining near-perfect model-level correlation. Our results demonstrate that this new metric exhibits extensive benefits over all current alternatives.\n\nRegistration Category: Full Access\n\nSession Chair: Sheng Li (Peking University)
URL:https://asia.siggraph.org/2023/full-program?id=papers_509&sess=sess147
END:VEVENT
END:VCALENDAR