BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023309Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241203T134600
DTEND;TZID=Asia/Tokyo:20241203T135800
UID:siggraphasia_SIGGRAPH Asia 2024_sess105_papers_181@linklings.com
SUMMARY:Customizing Text-to-Image Diffusion with Object Viewpoint Control
DESCRIPTION:Technical Papers\n\nNupur Kumari and Grace Su (Carnegie Mellon University)\; Richard Zhang\, Taesung Park\, and Eli Shechtman (Adobe Research)\; and Jun-Yan Zhu (Carnegie Mellon University)\n\nModel customization introduces new concepts to existing text-to-image models\, enabling the generation of these new concepts/objects in novel contexts.\nHowever\, such methods lack accurate camera view control with respect to the new object\, and users must resort to prompt engineering (e.g.\, adding "top-view") to achieve coarse view control. In this work\, we introduce a new task -- enabling explicit control of the object viewpoint in the customization of text-to-image diffusion models. This allows us to modify the custom object's properties and generate it in various background scenes via text prompts\, all while incorporating the object viewpoint as an additional control. This new task presents significant challenges\, as one must harmoniously merge a 3D representation from the multi-view images with the 2D pre-trained model. To bridge this gap\, we propose to condition the diffusion process on the 3D object features rendered from the target viewpoint. During training\, we fine-tune the 3D feature prediction modules to reconstruct the object's appearance and geometry\, while reducing overfitting to the input multi-view images. Our method outperforms existing image editing and model customization baselines in preserving the custom object's identity while following the target object viewpoint and the text prompt.\n\nRegistration Category: Full Access\, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Kfir Aberman (Snap)
URL:https://asia.siggraph.org/2024/program/?id=papers_181&sess=sess105
END:VEVENT
END:VCALENDAR