BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B5 (2)\, B Block\, Level 5
DTSTART;TZID=Asia/Tokyo:20241204T172800
DTEND;TZID=Asia/Tokyo:20241204T174000
UID:siggraphasia_SIGGRAPH Asia 2024_sess122_papers_667@linklings.com
SUMMARY:Camera Settings as Tokens: Modeling Photography on Latent Diffusio
 n Models
DESCRIPTION:Technical Papers\n\nI-Sheng Fang, Yue-Hua Han, and Jun-Cheng C
 hen (Academia Sinica)\n\nText-to-image models have revolutionized content 
 creation, enabling users to generate images from natural language prompts.
  While recent advancements in conditioning these models offer more control
  over the generated results, photography—a significant artistic domain—rem
 ains inadequately integrated into these systems. Our research identifies c
 ritical gaps in modeling camera settings and photographic terms within tex
 t-to-image synthesis. Vision-language models (VLMs) like CLIP and OpenCLIP
 , which typically drive the text conditions through cross-attention mechan
 isms of conditional diffusion models, struggle to represent numerical data
  like camera settings effectively in their textual space. To address these
  challenges, we present CameraSettings20k, a new dataset aggregated from R
 AISE, DDPD, and PPR10K. Our curated dataset offers normalized camera settin
 gs for over 20,000 raw-format images, providing equivalent values standard
 ized to a full-frame sensor. Furthermore, we introduce Camera Settings as 
 Tokens, an embedding approach leveraging the LoRA adapter of Latent Diffus
 ion Models (LDMs) to numerically control image generation based on photogr
 aphic principles like focal length, aperture, film speed, and exposure tim
 e. Our experimental results demonstrate the effectiveness of the proposed 
 approach to generate promising synthesized images obeying the photographic
  principles given the specified numerical camera settings. Furthermore, ou
 r work not only bridges the gap between camera settings and user-friendly 
 photographic control in image synthesis but also sets the stage for future
  explorations into more physics-aware generative models.\n\nRegistration C
 ategory: Full Access, Full Access Supporter\n\nLanguage Format: English La
 nguage\n\nSession Chair: Minhyuk Sung (Korea Advanced Institute of Science
  and Technology (KAIST))
URL:https://asia.siggraph.org/2024/program/?id=papers_667&sess=sess122
END:VEVENT
END:VCALENDAR
