BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070248Z
LOCATION:Meeting Room C4.11\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231214T154500
DTEND;TZID=Australia/Melbourne:20231214T160000
UID:siggraphasia_SIGGRAPH Asia 2023_sess123_papers_316@linklings.com
SUMMARY:Scene-aware Activity Program Generation with Language Guidance
DESCRIPTION:Technical Communications, Technical Papers, TOG\n\nZejia Su (Shenzhen University), Qingnan Fan (Vivo), Xuelin Chen (Tencent AI Lab), Oliver van Kaick (Carleton University), and Hui Huang and Ruizhen Hu (Shenzhen University)\n\nWe address the problem of scene-aware activity program generation, which requires decomposing a given activity task into instructions that can be sequentially performed within a target scene to complete the activity. While existing methods have shown the ability to generate rational or executable programs, generating programs with both high rationality and executability remains a challenge. Hence, we propose a novel method whose key idea is to explicitly combine the language rationality of a powerful language model with dynamic perception of the target scene where instructions are executed, to generate programs with high rationality and executability. Our method iteratively generates instructions for the activity program. Specifically, a two-branch feature encoder operates on a language-based and graph-based representation of the current generation progress to extract category-aware language features and instance-aware scene graph features, respectively. These features are then used by a predictor to generate the next instruction in the program. Subsequently, another module performs the predicted action and updates the scene for perception in the next iteration. Extensive evaluations are conducted on the VirtualHome-Env dataset, showing the advantages of our method over previous work. Key algorithmic designs are validated through ablation studies, and results on other types of inputs are also presented to show the generalizability of our method.\n\nRegistration Category: Full Access\n\nSession Chair: Sai-Kit Yeung (Hong Kong University of Science and Technology)
URL:https://asia.siggraph.org/2023/full-program?id=papers_316&sess=sess123
END:VEVENT
END:VCALENDAR