BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19701004T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19700405T030000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163641Z
LOCATION:Meeting Room C4.9+C4.10\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231214T120000
DTEND;TZID=Australia/Melbourne:20231214T121500
UID:siggraphasia_SIGGRAPH Asia 2023_sess170_papers_742@linklings.com
SUMMARY:Enhancing Diffusion Models with 3D Perspective Geometry Constraint
 s
DESCRIPTION:Rishi Upadhyay and Howard Zhang (University of California\,
  Los Angeles)\; Yunhao Ba (University of California\, Los Angeles\;
  Sony)\; Ethan Yang\, Blake Gella\, and Sicheng Jiang (University of
  California\, Los Angeles)\; Alex Wong (Yale University)\; and Achuta
  Kadambi (University of California\, Los Angeles)\n\nWhile perspective
  is a well-studied topic in art\, it is generally taken for granted in
  images. However\, for the recent wave of high-quality image synthesis
  methods such as latent diffusion models\, perspective accuracy is not
  an explicit requirement. Since these methods are capable of outputting
  a wide gamut of possible images\, it is difficult for these synthesized
  images to adhere to the principles of linear perspective. We introduce
  a novel geometric constraint in the training process of generative
  models to enforce perspective accuracy. We show that outputs of models
  trained with this constraint both appear more realistic and improve
  performance of downstream models trained on generated images.
  Subjective human trials show that images generated with latent
  diffusion models trained with our constraint are preferred over images
  from the Stable Diffusion V2 model 70% of the time. SOTA monocular
  depth estimation models such as DPT and PixelFormer\, fine-tuned on
  our images\, outperform the original models trained on real images by
  up to 7.03% in RMSE and 19.3% in SqRel on the KITTI test set for
  zero-shot transfer.\n\nRegistration Category: Full Access\n\nSession
  Chair: Xiangyu Xu (Xi'an Jiaotong University)\n\n
URL:https://asia.siggraph.org/2023/full-program?id=papers_742&sess=sess170
END:VEVENT
END:VCALENDAR