BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (1)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241204T134600
DTEND;TZID=Asia/Tokyo:20241204T135800
UID:siggraphasia_SIGGRAPH Asia 2024_sess115_papers_702@linklings.com
SUMMARY:BlobGEN-3D: Compositional 3D-Consistent Freeview Image Generation with 3D Blobs
DESCRIPTION:Technical Papers\n\nChao Liu\, Weili Nie\, Sifei Liu\, Abhishek Badki\, Hang Su\, Morteza Mardani\, Benjamin Eckart\, and Arash Vahdat (NVIDIA)\n\nRecent advances in text-to-image diffusion models have significantly enhanced image generation quality when trained on internet-scale data. However\, existing methods are constrained by their reliance on image- or scene-level conditions\, limiting their ability to synthesize composable 3D objects in a complex scene. To address these limitations\, we propose BlobGEN-3D\, a novel approach that decouples compositional 3D scene representation from 2D image generation\, enabling direct controllability in the 3D space while fully leveraging the capabilities of 2D diffusion models. Specifically\, BlobGEN-3D utilizes object-level 3D blobs with rich textual descriptions as the 3D scene representation\, which is amenable to 2D projection and is seamlessly integrable with 2D diffusion models. Based on this representation\, we introduce an auto-regressive pipeline for freeview image generation by conditioning the pretrained blob-grounded 2D text-to-image diffusion model on the previously generated image. Our method has three key features: (i) modular representation of 3D scene elements\; (ii) coherent cross-view 2D generation\; and (iii) manipulation of object appearance in the generated image sequences. Our method not only competes with existing multi-view and optimization-based approaches but also offers object-level appearance control\, which was not previously possible with alternatives that rely solely on scene-level descriptions or image captions.\n\nRegistration Category: Full Access\, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Peng-Shuai Wang (Peking University)
URL:https://asia.siggraph.org/2024/program/?id=papers_702&sess=sess115
END:VEVENT
END:VCALENDAR