BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023309Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241203T134600
DTEND;TZID=Asia/Tokyo:20241203T135800
UID:siggraphasia_SIGGRAPH Asia 2024_sess105_papers_181@linklings.com
SUMMARY:Customizing Text-to-Image Diffusion with Object Viewpoint Control
DESCRIPTION:Technical Papers\n\nNupur Kumari and Grace Su (Carnegie Mellon
  University); Richard Zhang, Taesung Park, and Eli Shechtman (Adobe Resear
 ch); and Jun-Yan Zhu (Carnegie Mellon University)\n\n
 Model customization int
 roduces new concepts to existing text-to-image models, enabling the genera
 tion of these new concepts/objects in novel contexts. However, such metho
 ds lack accurate camera view control with respect to the new object, and u
 sers must resort to prompt engineering (e.g., adding "top-view") to achie
 ve coarse view control. In this work, we introduce a new task -- enabling 
 explicit control of the object viewpoint in the customization of text-to-i
 mage diffusion models. This allows us to modify the custom object's proper
 ties and generate it in various background scenes via text prompts, all wh
 ile incorporating the object viewpoint as an additional control. This new 
 task presents significant challenges, as one must harmoniously merge a 3D 
 representation from the multi-view images with the 2D pre-trained model. T
 o bridge this gap, we propose to condition the diffusion process on the 3D
  object features rendered from the target viewpoint. During training, we f
 ine-tune the 3D feature prediction modules to reconstruct the object's app
 earance and geometry, while reducing overfitting to the input multi-view i
 mages. Our method outperforms existing image editing and model customizati
 on baselines in preserving the custom object's identity while following th
 e target object viewpoint and the text prompt.\n\nRegistration Category: F
 ull Access, Full Access Supporter\n\nLanguage Format: English Language\n\n
 Session Chair: Kfir Aberman (Snap)
URL:https://asia.siggraph.org/2024/program/?id=papers_181&sess=sess105
END:VEVENT
END:VCALENDAR