BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023313Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241206T153100
DTEND;TZID=Asia/Tokyo:20241206T154300
UID:siggraphasia_SIGGRAPH Asia 2024_sess150_papers_211@linklings.com
SUMMARY:PuzzleAvatar: Assembling 3D Avatars from Personal Albums
DESCRIPTION:Technical Papers\n\nYuliang Xiu (Max Planck Institute for Inte
 lligent Systems); Yufei Ye (Carnegie Mellon University); Zhen Liu (Max Pla
 nck Institute for Intelligent Systems; Mila, Université de Montréal); Dimi
 tris Tzionas (University of Amsterdam); and Michael J. Black (Max Planck I
 nstitute for Intelligent Systems)\n\nGenerating personalized 3D avatars is
  crucial for AR/VR. However, recent text-to-3D methods that generate avata
 rs for celebrities or fictional characters struggle with everyday people.
  Methods for faithful reconstruction typically require full-body images in
  controlled settings. What if a user could just upload their personal "OOT
 D" (Outfit Of The Day) photo collection and get a faithful avatar in retur
 n? The challenge is that such casual photo collections contain diverse pos
 es, challenging viewpoints, cropped views, and occlusion (albeit with a co
 nsistent outfit, accessories and hairstyle). We address this novel "Album2
 Human" task by developing PuzzleAvatar, a novel model that generates a fai
 thful 3D avatar (in a canonical pose) from a personal OOTD album, while by
 passing the challenging estimation of body and camera pose. To this end, w
 e fine-tune a foundational vision-language model (VLM) on such photos, enc
 oding the appearance, identity, garments, hairstyles, and accessories of a
  person into (separate) learned tokens and instilling these cues into the 
 VLM. In effect, we exploit the learned tokens as "puzzle pieces" from whic
 h we assemble a faithful, personalized 3D avatar. Importantly, we can cust
 omize avatars by simply interchanging tokens. As a benchmark for this new
  task, we collect a new dataset, called PuzzleIOI, with 41 subjects in a t
 otal of nearly 1K OOTD configurations, in challenging partial photos with 
 paired ground-truth 3D bodies. Evaluation shows that PuzzleAvatar not only
  has high reconstruction accuracy, outperforming TeCH and MVDreamBooth, bu
 t also a unique scalability to album photos, and strong robustness. Our mo
 del and data will be public.\n\nRegistration Category: Full Access, Full A
 ccess Supporter\n\nLanguage Format: English Language\n\nSession Chair: Li-
 Yi Wei (Adobe Research)
URL:https://asia.siggraph.org/2024/program/?id=papers_211&sess=sess150
END:VEVENT
END:VCALENDAR