BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:19721003T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:19721003T020000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240214T070247Z
LOCATION:Meeting Room C4.9+C4.10\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231214T120000
DTEND;TZID=Australia/Melbourne:20231214T121500
UID:siggraphasia_SIGGRAPH Asia 2023_sess170_papers_742@linklings.com
SUMMARY:Enhancing Diffusion Models with 3D Perspective Geometry Constraints
DESCRIPTION:Technical Papers\n\nRishi Upadhyay and Howard Zhang (University of California, Los Angeles); Yunhao Ba (University of California, Los Angeles; Sony); Ethan Yang, Blake Gella, and Sicheng Jiang (University of California, Los Angeles); Alex Wong (Yale University); and Achuta Kadambi (University of California, Los Angeles)\n\nWhile perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Since these methods are capable of outputting a wide gamut of possible images, it is difficult for these synthesized images to adhere to the principles of linear perspective. We introduce a novel geometric constraint in the training process of generative models to enforce perspective accuracy. We show that outputs of models trained with this constraint both appear more realistic and improve performance of downstream models trained on generated images. Subjective human trials show that images generated with latent diffusion models trained with our constraint are preferred over images from the Stable Diffusion V2 model 70% of the time. SOTA monocular depth estimation models such as DPT and PixelFormer, fine-tuned on our images, outperform the original models trained on real images by up to 7.03% in RMSE and 19.3% in SqRel on the KITTI test set for zero-shot transfer.\n\nRegistration Category: Full Access\n\nSession Chair: Xiangyu Xu (Xi'an Jiaotong University)
URL:https://asia.siggraph.org/2023/full-program?id=papers_742&sess=sess170
END:VEVENT
END:VCALENDAR