BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163632Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_316@linklings.com
SUMMARY:Scene-aware Activity Program Generation with Language Guidance
DESCRIPTION:Zejia Su (Shenzhen University), Qingnan Fan (Vivo), Xuelin Chen (Tencent AI Lab), Oliver van Kaick (Carleton University), and Hui Huang and Ruizhen Hu (Shenzhen University)\n\nWe address the problem of scene-aware activity program generation, which requires decomposing a given activity task into instructions that can be sequentially performed within a target scene to complete the activity. While existing methods have shown the ability to generate rational or executable programs, generating programs with both high rationality and executability remains a challenge. Hence, we propose a novel method whose key idea is to explicitly combine the language rationality of a powerful language model with dynamic perception of the target scene where instructions are executed, to generate programs with high rationality and executability. Our method iteratively generates instructions for the activity program. Specifically, a two-branch feature encoder operates on a language-based and a graph-based representation of the current generation progress to extract category-aware language features and instance-aware scene graph features, respectively. These features are then used by a predictor to generate the next instruction in the program. Subsequently, another module performs the predicted action and updates the scene for perception in the next iteration. Extensive evaluations are conducted on the VirtualHome-Env dataset, showing the advantages of our method over previous work. Key algorithmic designs are validated through ablation studies, and results on other types of inputs are also presented to show the generalizability of our method.\n\nRegistration Category: Full Access, Enhanced Access, Trade Exhibitor, Experience Hall Exhibitor\n\n
URL:https://asia.siggraph.org/2023/full-program?id=papers_316&sess=sess209
END:VEVENT
END:VCALENDAR