BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070242Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_509@linklings.com
SUMMARY:What is the Best Automated Metric for Text to Motion Generation?
DESCRIPTION:Technical Papers\n\nJordan Voas\, Yili Wang\, Qixing Huang\, and Raymond Mooney (University of Texas at Austin)\n\nThere is growing interest in generating skeleton-based human motions from natural language descriptions. While most efforts have focused on developing better neural architectures for this task\, there has been no significant work on determining the proper evaluation metric. Human evaluation is the ultimate accuracy measure for this task\, and automated metrics should correlate well with human quality judgments. Since descriptions are compatible with many motions\, determining the right metric is critical for evaluating and designing effective generative models. This paper systematically studies which metrics best align with human evaluations and proposes new metrics that align even better. Our findings indicate that none of the metrics currently used for this task show even a moderate correlation with human judgments on a sample level. However\, for assessing average model performance\, commonly used metrics such as R-Precision and less-used coordinate errors show strong correlations. Additionally\, several recently developed metrics are not recommended due to their low correlation compared to alternatives. We also introduce a novel metric based on a multimodal BERT-like model\, MoBERT\, which offers strongly human-correlated sample-level evaluations while maintaining near-perfect model-level correlation. Our results demonstrate that this new metric exhibits extensive benefits over all current alternatives.\n\nRegistration Category: Full Access\, Enhanced Access\, Trade Exhibitor\, Experience Hall Exhibitor
URL:https://asia.siggraph.org/2023/full-program?id=papers_509&sess=sess209
END:VEVENT
END:VCALENDAR