BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241204T110800
DTEND;TZID=Asia/Tokyo:20241204T111900
UID:siggraphasia_SIGGRAPH Asia 2024_sess114_papers_508@linklings.com
SUMMARY:CPoser: An Optimization-after-Parsing Approach for Text-to-Pose Generation Using Large Language Models.
DESCRIPTION:Technical Papers\n\nYumeng Li, Bohong Chen, Zhong Ren, and Yao-Xiang Ding (Zhejiang University); Libin Liu (Peking University); and Tianjia Shao and Kun Zhou (Zhejiang University)\n\nText-to-pose generation is challenging due to the complexity of natural language and human posture semantics. Utilizing large language models (LLMs) for text-to-pose generation is appealing due to their strong capabilities in text understanding and reasoning. However, as LLMs are designed for general-purpose language processing and not specifically trained for pose generation, it remains nontrivial to generate precise articulation targets for the full body using LLMs directly. To this end, we propose CPoser, a novel approach to harness the power of LLMs for text-to-pose generation, featuring a prompt parsing stage and a pose optimization stage. The parsing stage utilizes LLMs to turn text prompts into pose intermediate representations (Pose-IRs) through a set of predefined structured queries. These Pose-IRs explicitly describe specific pose conditions, such as squatting depth and knee bending angle, naturally forming an objective function that a target pose should satisfy. The optimization stage solves for expressive poses and hand gestures based on the Pose-IR objective function via robust optimization in a quantized pose prior space. The results are further refined to enhance naturalness and incorporate facial expressions. Experiments show that our approach effectively understands diverse text prompts for pose generation, surpassing existing text-to-pose methods.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Kai Wang (Amazon)
URL:https://asia.siggraph.org/2024/program/?id=papers_508&sess=sess114
END:VEVENT
END:VCALENDAR