BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070248Z
LOCATION:Meeting Room C4.11\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231214T154500
DTEND;TZID=Australia/Melbourne:20231214T160000
UID:siggraphasia_SIGGRAPH Asia 2023_sess123_papers_316@linklings.com
SUMMARY:Scene-aware Activity Program Generation with Language Guidance
DESCRIPTION:Technical Communications, Technical Papers, TOG\n\nZejia Su (Shenzhen University), Qingnan Fan (Vivo), Xuelin Chen (Tencent AI Lab), Oliver van Kaick (Carleton University), and Hui Huang and Ruizhen Hu (Shenzhen University)\n\nWe address the problem of scene-aware activity program generation, which requires decomposing a given activity task into instructions that can be sequentially performed within a target scene to complete the activity. While existing methods have shown the ability to generate rational or executable programs, generating programs with both high rationality and executability remains a challenge. Hence, we propose a novel method whose key idea is to explicitly combine the language rationality of a powerful language model with dynamic perception of the target scene where instructions are executed, to generate programs with high rationality and executability. Our method iteratively generates instructions for the activity program. Specifically, a two-branch feature encoder operates on a language-based and graph-based representation of the current generation progress to extract category-aware language features and instance-aware scene graph features, respectively. These features are then used by a predictor to generate the next instruction in the program. Subsequently, another module performs the predicted action and updates the scene for perception in the next iteration. Extensive evaluations are conducted on the VirtualHome-Env dataset, showing the advantages of our method over previous work. Key algorithmic designs are validated through ablation studies, and results on other types of inputs are also presented to show the generalizability of our method.\n\nRegistration Category: Full Access\n\nSession Chair: Sai-Kit Yeung (Hong Kong University of Science and Technology)
URL:https://asia.siggraph.org/2023/full-program?id=papers_316&sess=sess123
END:VEVENT
END:VCALENDAR