BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
X-LIC-LOCATION:Asia/Tokyo
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
DTSTART:18871231T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250110T023312Z
LOCATION:Hall B7 (1)\, B Block\, Level 7
DTSTART;TZID=Asia/Tokyo:20241204T105600
DTEND;TZID=Asia/Tokyo:20241204T110800
UID:siggraphasia_SIGGRAPH Asia 2024_sess114_papers_716@linklings.com
SUMMARY:SGEdit: Bridging LLM with Text2Image Generative Model for Scene
  Graph-based Image Editing
DESCRIPTION:Technical Papers\n\nZhiyuan Zhang (City University of Hong
  Kong)\, DongDong Chen (Microsoft GenAI)\, and Jing Liao (City University
  of Hong Kong)\n\nScene graphs offer a structured\, hierarchical
  representation of images\, with nodes and edges symbolizing objects and
  the relationships among them. They can serve as a natural interface for
  image editing\, dramatically improving precision and flexibility.
  Leveraging this benefit\, we introduce a new framework that integrates a
  large language model (LLM) with a Text2Image generative model for scene
  graph-based image editing. This integration enables precise
  modifications at the object level and creative recomposition of scenes
  without compromising overall image integrity. Our approach involves two
  primary stages: 1) Utilizing an LLM-driven scene parser\, we construct
  an image's scene graph\, capturing key objects and their
  interrelationships\, as well as parsing fine-grained attributes such as
  object masks and descriptions. These annotations facilitate concept
  learning with a fine-tuned diffusion model\, representing each object
  with an optimized token and detailed description prompt. 2) During the
  image editing phase\, an LLM editing controller guides the edits towards
  specific areas. These edits are then implemented by an
  attention-modulated diffusion editor\, utilizing the fine-tuned model to
  perform object additions\, deletions\, replacements\, and adjustments.
  Through extensive experiments\, we demonstrate that our framework
  significantly outperforms existing image editing methods in terms of
  editing precision and scene aesthetics. Our code will be made publicly
  available.\n\nRegistration Category: Full Access\, Full Access
  Supporter\n\nLanguage Format: English Language\n\nSession Chair: Kai
  Wang (Amazon)
URL:https://asia.siggraph.org/2024/program/?id=papers_716&sess=sess114
END:VEVENT
END:VCALENDAR