BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070250Z
LOCATION:Meeting Room C4.8\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231215T112500
DTEND;TZID=Australia/Melbourne:20231215T114000
UID:siggraphasia_SIGGRAPH Asia 2023_sess156_papers_265@linklings.com
SUMMARY:Decaf: Monocular Deformation Capture for Face and Hand Interactions
DESCRIPTION:Technical Communications, Technical Papers\n\nSoshi Shimada (Max-Planck-Institut für Informatik; Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence); Vladislav Golyanik (Max-Planck-Institut für Informatik); Patrick Pérez (Valeo); and Christian Theobalt (Max-Planck-Institut für Informatik; Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence)\n\nExisting methods for 3D tracking from monocular RGB videos predominantly consider articulated and rigid objects (e.g., two hands, or humans interacting with rigid environments). Modelling dense non-rigid object deformations in this setting (e.g., when hands are interacting with a face) has remained largely unaddressed, although such effects can improve the realism of downstream applications such as AR/VR, 3D virtual avatar communication and character animation. This is due to the severe ill-posedness of the monocular setting and its associated challenges (e.g., acquiring a dataset for training and evaluation, or obtaining reasonable non-uniform stiffness values for the deformable object). While it is possible to naïvely track multiple non-rigid objects independently using 3D templates or parametric 3D models, such an approach suffers from multiple artefacts in the resulting 3D estimates, such as depth ambiguity, unnatural intra-object collisions, and missing or implausible deformations.\n\nHence, this paper introduces the first method that addresses the fundamental challenges described above and allows tracking human hands interacting with human faces in 3D from single monocular RGB videos. We model hands as articulated objects inducing non-rigid face deformations during an active interaction. Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system. As a pivotal step in its creation, we process the reconstructed raw 3D shapes with position-based dynamics and an approach for non-uniform stiffness estimation of the head tissues, which results in plausible annotations of the surface deformations, hand-face contact regions and head-hand positions. At the core of our neural approach are a variational auto-encoder supplying the hand-face depth prior, and modules that guide the 3D tracking by estimating the contacts and the deformations.\n\nRegistration Category: Full Access\n\nSession Chair: Sergi Pujades (National Institute for Research in Computer Science and Automation (INRIA), Université Grenoble Alpes)
URL:https://asia.siggraph.org/2023/full-program?id=papers_265&sess=sess156
END:VEVENT
END:VCALENDAR
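
For reference, a minimal sketch of how an entry like the one above can be consumed programmatically, assuming it is saved as decaf_session.ics and the third-party icalendar package is installed (the filename is illustrative, not part of the feed):

# Minimal parsing sketch, not an official Linklings tool.
# Assumes: pip install icalendar, and the entry above saved as decaf_session.ics.
from icalendar import Calendar

with open("decaf_session.ics", "rb") as f:
    cal = Calendar.from_ical(f.read())

# Walk every VEVENT and print the properties used in the entry above.
for event in cal.walk("VEVENT"):
    print("Summary: ", event.get("SUMMARY"))
    print("Location:", event.get("LOCATION"))
    # decoded() returns timezone-aware datetimes; the TZID here
    # (Australia/Melbourne) is an IANA name, so it resolves cleanly.
    print("Starts:  ", event.decoded("DTSTART"))
    print("Ends:    ", event.decoded("DTEND"))
    print("URL:     ", event.get("URL"))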