BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070241Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_316@linklings.com
SUMMARY:Scene-aware Activity Program Generation with Language Guidance
DESCRIPTION:Technical Papers\n\nZejia Su (Shenzhen University)\, Qingnan Fan (Vivo)\, Xuelin Chen (Tencent AI Lab)\, Oliver van Kaick (Carleton University)\, and Hui Huang and Ruizhen Hu (Shenzhen University)\n\nWe address the problem of scene-aware activity program generation\, which requires decomposing a given activity task into instructions that can be sequentially performed within a target scene to complete the activity. While existing methods have shown the ability to generate rational or executable programs\, generating programs with both high rationality and executability still remains a challenge. Hence\, we propose a novel method where the key idea is to explicitly combine the language rationality of a powerful language model with dynamic perception of the target scene where instructions are executed\, to generate programs with high rationality and executability. Our method iteratively generates instructions for the activity program. Specifically\, a two-branch feature encoder operates on a language-based and graph-based representation of the current generation progress to extract category-aware language features and instance-aware scene graph features\, respectively. These features are then used by a predictor to generate the next instruction in the program. Subsequently\, another module performs the predicted action and updates the scene for perception in the next iteration. Extensive evaluations are conducted on the VirtualHome-Env dataset\, showing the advantages of our method over previous work. Key algorithmic designs are validated through ablation studies\, and results on other types of inputs are also presented to show the generalizability of our method.\n\nRegistration Category: Full Access\, Enhanced Access\, Trade Exhibitor\, Experience Hall Exhibitor
URL:https://asia.siggraph.org/2023/full-program?id=papers_316&sess=sess209
END:VEVENT
END:VCALENDAR