BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241204T172800
DTEND;TZID=Asia/Tokyo:20241204T174000
UID:siggraphasia_SIGGRAPH Asia 2024_sess122_papers_667@linklings.com
SUMMARY:Camera Settings as Tokens: Modeling Photography on Latent Diffusion Models
DESCRIPTION:Technical Papers\n\nI-Sheng Fang, Yue-Hua Han, and Jun-Cheng Chen (Academia Sinica)\n\nText-to-image models have revolutionized content creation, enabling users to generate images from natural language prompts. While recent advancements in conditioning these models offer more control over the generated results, photography, a significant artistic domain, remains inadequately integrated into these systems. Our research identifies critical gaps in modeling camera settings and photographic terms within text-to-image synthesis. Vision-language models (VLMs) like CLIP and OpenCLIP, which typically drive the text conditions through cross-attention mechanisms of conditional diffusion models, struggle to represent numerical data like camera settings effectively in their textual space. To address these challenges, we present CameraSettings20k, a new dataset aggregated from RAISE, DDPD, and PPR10K. Our curated dataset offers normalized camera settings for over 20,000 raw-format images, providing equivalent values standardized to a full-frame sensor. Furthermore, we introduce Camera Settings as Tokens, an embedding approach leveraging the LoRA adapter of Latent Diffusion Models (LDMs) to numerically control image generation based on photographic principles like focal length, aperture, film speed, and exposure time. Our experimental results demonstrate that the proposed approach generates promising synthesized images that obey photographic principles under the specified numerical camera settings. Furthermore, our work not only bridges the gap between camera settings and user-friendly photographic control in image synthesis but also sets the stage for future explorations into more physics-aware generative models.\n\nRegistration Category: Full Access, Full Access Supporter\n\nLanguage Format: English Language\n\nSession Chair: Minhyuk Sung (Korea Advanced Institute of Science and Technology (KAIST))
URL:https://asia.siggraph.org/2024/program/?id=papers_667&sess=sess122
END:VEVENT
END:VCALENDAR