BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070240Z
LOCATION:Darling Harbour Theatre\, Level 2 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231212T093000
DTEND;TZID=Australia/Melbourne:20231212T124500
UID:siggraphasia_SIGGRAPH Asia 2023_sess209_papers_265@linklings.com
SUMMARY:Decaf: Monocular Deformation Capture for Face and Hand Interactions
DESCRIPTION:Technical Papers\n\nSoshi Shimada (Max-Planck-Institut für Informatik; Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence); Vladislav Golyanik (Max-Planck-Institut für Informatik); Patrick Pérez (Valeo); and Christian Theobalt (Max-Planck-Institut für Informatik; Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence)\n\nExisting methods for 3D tracking from monocular RGB videos predominantly consider articulated and rigid objects (e.g., two hands or humans interacting with rigid environments). Modelling dense non-rigid object deformations in this setting (e.g., when hands are interacting with a face) has remained largely unaddressed so far, although such effects can improve the realism of downstream applications such as AR/VR, 3D virtual avatar communications, and character animations. This is due to the severe ill-posedness of the monocular view setting and the associated challenges (e.g., in acquiring a dataset for training and evaluation, or obtaining reasonable non-uniform stiffness values for the deformable object). While it is possible to naïvely track multiple non-rigid objects independently using 3D templates or parametric 3D models, such an approach would suffer from multiple artefacts in the resulting 3D estimates, such as depth ambiguity, unnatural intra-object collisions, and missing or implausible deformations.\n\nHence, this paper introduces the first method that addresses the fundamental challenges described above and that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos. We model hands as articulated objects inducing non-rigid face deformations during an active interaction. Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system. As a pivotal step in its creation, we process the reconstructed raw 3D shapes with position-based dynamics and an approach for non-uniform stiffness estimation of the head tissues, which results in plausible annotations of the surface deformations, hand-face contact regions, and head-hand positions. At the core of our neural approach are a variational auto-encoder supplying the hand-face depth prior and modules that guide the 3D tracking by estimating the contacts and the deformations.\n\nRegistration Category: Full Access, Enhanced Access, Trade Exhibitor, Experience Hall Exhibitor
URL:https://asia.siggraph.org/2023/full-program?id=papers_265&sess=sess209
END:VEVENT
END:VCALENDAR