BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070247Z
LOCATION:Meeting Room C4.9+C4.10\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231214T120000
DTEND;TZID=Australia/Melbourne:20231214T121500
UID:siggraphasia_SIGGRAPH Asia 2023_sess170_papers_742@linklings.com
SUMMARY:Enhancing Diffusion Models with 3D Perspective Geometry Constraints
DESCRIPTION:Technical Papers\n\nRishi Upadhyay and Howard Zhang (University of California, Los Angeles); Yunhao Ba (University of California, Los Angeles; Sony); Ethan Yang, Blake Gella, and Sicheng Jiang (University of California, Los Angeles); Alex Wong (Yale University); and Achuta Kadambi (University of California, Los Angeles)\n\nWhile perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Since these methods are capable of outputting a wide gamut of possible images, it is difficult for these synthesized images to adhere to the principles of linear perspective. We introduce a novel geometric constraint in the training process of generative models to enforce perspective accuracy. We show that outputs of models trained with this constraint both appear more realistic and improve performance of downstream models trained on generated images. Subjective human trials show that images generated with latent diffusion models trained with our constraint are preferred over images from the Stable Diffusion V2 model 70% of the time. SOTA monocular depth estimation models such as DPT and PixelFormer, fine-tuned on our images, outperform the original models trained on real images by up to 7.03% in RMSE and 19.3% in SqRel on the KITTI test set for zero-shot transfer.\n\nRegistration Category: Full Access\n\nSession Chair: Xiangyu Xu (Xi'an Jiaotong University)
URL:https://asia.siggraph.org/2023/full-program?id=papers_742&sess=sess170
END:VEVENT
END:VCALENDAR