BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (1)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241204T134600
DTEND;TZID=Asia/Tokyo:20241204T135800
UID:siggraphasia_SIGGRAPH Asia 2024_sess115_papers_702@linklings.com
SUMMARY:BlobGEN-3D: Compositional 3D-Consistent Freeview Image Generation with 3D Blobs
DESCRIPTION:Technical Papers\n\nChao Liu\, Weili Nie\, Sifei Liu\, Abhishek Badki\, Hang Su\, Morteza Mardani\, Benjamin Eckart\, and Arash Vahdat (NVIDIA)\n\nRecent advances in text-to-image diffusion models have significantly enhanced image generation quality when trained on internet-scale data. However\, existing methods are constrained by their reliance on image- or scene-level conditions\, limiting their ability to synthesize composable 3D objects in a complex scene. To address these limitations\, we propose BlobGEN-3D\, a novel approach that decouples compositional 3D scene representation from 2D image generation\, enabling direct controllability in the 3D space while fully leveraging the capabilities of 2D diffusion models. Specifically\, BlobGEN-3D utilizes object-level 3D blobs with rich textual descriptions as the 3D scene representation\, which is amenable to 2D projection and is seamlessly integrable with 2D diffusion models. Based on this representation\, we introduce an auto-regressive pipeline for freeview image generation by conditioning the pretrained blob-grounded 2D text-to-image diffusion model on the previously generated image. Our method has three key features: (i) modular representation of 3D scene elements\; (ii) coherent cross-view 2D generation\; and (iii) manipulation of object appearance in the generated image sequences. Our method not only competes with existing multi-view and optimization-based approaches but also offers object-level appearance control\, which was not previously possible with alternatives that rely solely on scene-level descriptions or image captions.\n\nRegistration Category: Full Access\, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Peng-Shuai Wang (Peking University)
URL:https://asia.siggraph.org/2024/program/?id=papers_702&sess=sess115
END:VEVENT
END:VCALENDAR