BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (1)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241206T091100
DTEND;TZID=Asia/Tokyo:20241206T092300
UID:siggraphasia_SIGGRAPH Asia 2024_sess139_papers_1298@linklings.com
SUMMARY:SPARK: Self-supervised Personalized Real-time Monocular Face Captu
 re
DESCRIPTION:Technical Papers\n\nKelian Baert (Technicolor Group, Institut 
 national de recherche en informatique et en automatique (INRIA) Rennes); S
 hrisha Bharadwaj (Max Planck Institute for Intelligent Systems); Fabien Ca
 stan and Benoit Maujean (Technicolor Group); Marc Christie (Institut natio
 nal de recherche en informatique et en automatique (INRIA)); Victoria Fern
 ández Abrevaya (Max Planck Institute for Intelligent Systems); and Adnane 
 Boukhayma (Institut national de recherche en informatique et en automatiqu
 e (INRIA))\n\nFeedforward monocular face capture methods seek to reconstru
 ct posed faces from a single image of a person. Current state-of-the-art a
 pproaches have the ability to regress parametric 3D face models in real-ti
 me across a wide range of identities, lighting conditions and poses by lev
 eraging large image datasets of human faces. These methods however suffer 
 from clear limitations in that the underlying parametric face model only p
 rovides a coarse estimation of the face shape, thereby limiting their prac
 tical applicability in tasks that require precise 3D reconstruction (aging
 , face swapping, digital make-up,...).\n\nIn this paper, we propose a meth
 od for high-precision 3D face capture taking advantage of a collection of 
 unconstrained videos of a subject as prior information. Our proposal build
 s on a two-stage approach. We start with the reconstruction of a detailed
  3D face avatar of the person, capturing both precise geometry and appeara
 nce from a collection of videos. We then use the encoder from a pre-traine
 d monocular face reconstruction method, substituting its decoder with our 
 personalized model, and proceed with transfer learning on the video collec
 tion. Using our pre-estimated image formation model, we obtain a more prec
 ise self-supervision objective, enabling improved expression and pose alig
 nment. This results in a trained encoder capable of efficiently regressing
  pose and expression parameters in real-time from previously unseen images
 , which combined with our personalized geometry model yields more accurate
  and high-fidelity mesh inference.\n\nThrough extensive qualitativ
 e and quantitative evaluation, we showcase the superiority of our final mo
 del as compared to state-of-the-art baselines, and demonstrate its general
 ization ability to unseen pose, expression and lighting.\n\nRegistration C
 ategory: Full Access, Full Access Supporter\n\nLanguage Format: English La
 nguage\n\nSession Chair: Kui Wu (LightSpeed Studios)
URL:https://asia.siggraph.org/2024/program/?id=papers_1298&sess=sess139
END:VEVENT
END:VCALENDAR
