BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (1)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241206T091100
DTEND;TZID=Asia/Tokyo:20241206T092300
UID:siggraphasia_SIGGRAPH Asia 2024_sess139_papers_1298@linklings.com
SUMMARY:SPARK: Self-supervised Personalized Real-time Monocular Face Capture
DESCRIPTION:Technical Papers\n\nKelian Baert (Technicolor Group, Institut national de recherche en informatique et en automatique (INRIA) Rennes); Shrisha Bharadwaj (Max Planck Institute for Intelligent Systems); Fabien Castan and Benoit Maujean (Technicolor Group); Marc Christie (Institut national de recherche en informatique et en automatique (INRIA)); Victoria Fernández Abrevaya (Max Planck Institute for Intelligent Systems); and Adnane Boukhayma (Institut national de recherche en informatique et en automatique (INRIA))\n\nFeedforward monocular face capture methods seek to reconstruct posed faces from a single image of a person. Current state-of-the-art approaches can regress parametric 3D face models in real time across a wide range of identities, lighting conditions and poses by leveraging large image datasets of human faces. These methods, however, suffer from a clear limitation: the underlying parametric face model provides only a coarse estimate of the face shape, limiting their practical applicability in tasks that require precise 3D reconstruction (aging, face swapping, digital make-up, ...).\n\nIn this paper, we propose a method for high-precision 3D face capture that takes advantage of a collection of unconstrained videos of a subject as prior information. Our proposal builds on a two-stage approach. We start by reconstructing a detailed 3D face avatar of the person, capturing both precise geometry and appearance from a collection of videos. We then use the encoder from a pre-trained monocular face reconstruction method, substituting its decoder with our personalized model, and proceed with transfer learning on the video collection. Using our pre-estimated image formation model, we obtain a more precise self-supervision objective, enabling improved expression and pose alignment. The result is a trained encoder capable of efficiently regressing pose and expression parameters in real time from previously unseen images, which, combined with our personalized geometry model, yields more accurate and higher-fidelity mesh inference.\n\nThrough extensive qualitative and quantitative evaluation, we showcase the superiority of our final model over state-of-the-art baselines and demonstrate its ability to generalize to unseen pose, expression and lighting.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Kui Wu (LightSpeed Studios)
URL:https://asia.siggraph.org/2024/program/?id=papers_1298&sess=sess139
END:VEVENT
END:VCALENDAR