AI for Content Creation Workshop

June 12th @ CVPR 2025

Karl F. Dean Grand Ballroom A1, 4th Floor, Music City Center, Nashville, TN, USA

Remote (Zoom): Via CVPR site



Summary

Content creation plays a crucial role in domains such as photography, videography, virtual reality, gaming, art, design, fashion, and advertising. Recent progress in machine learning and AI has transformed hours of manual, painstaking content creation work into minutes or seconds of automated or interactive work. For instance, generative modeling approaches can produce photorealistic 2D and 3D content such as humans, landscapes, interior scenes, virtual environments, clothing, or even industrial designs. New large text, image, and video models that share latent spaces let us imaginatively describe scenes and have them realized automatically, with new multi-modal approaches able to generate consistent video and audio across long timeframes. Such approaches can also super-resolve and temporally upsample videos, interpolate and extrapolate between photos and videos with intermediate novel views, decompose scene objects and appearance, and transfer styles to convincingly render and reinterpret content. Learned priors of images, videos, and 3D data can also be combined with explicit appearance and geometric constraints, perceptual understanding, or even functional and semantic constraints of objects. While often producing awe-inspiring artistic imagery, such techniques also offer unique opportunities for generating diverse synthetic training data for downstream computer vision tasks across image, video, and 3D domains.

The AI for Content Creation workshop explores this exciting and fast-moving research area. We bring together world-class invited speakers in content creation, up-and-coming researchers, and authors of submitted workshop papers for a day of learning, discussion, and networking.

Welcome! -
Deqing Sun (Google)
Lingjie Liu (University of Pennsylvania)
Krishna Kumar Singh (Adobe)
Lu Jiang (ByteDance)
Jun-Yan Zhu (Carnegie Mellon University)
James Tompkin (Brown University)



Firefly Video (Adobe, 2025), Genie 2 (DeepMind, 2024), Sora (OpenAI, 2024).

2025 Schedule

Morning session:
Time CDT
08:45 Welcome and introductions 👋
09:00 Maneesh Agrawala (Stanford University)
09:30 Kai Zhang (Adobe)
10:00 Coffee break
10:30 Charles Herrmann (Google)
11:00 Mark Boss (Stability AI)
11:30 Poster session 1 - ExHall D #412-431
  1. Parallel Rescaling: Rebalancing Consistency Guidance for Personalized Diffusion Models
    JungWoo Chae (Nexon Korea); Jiyoon Kim (LGCNS); Sangheum Hwang (Seoul National University of Science and Technology)
  2. EOPose: Exemplar-based object reposing using Generalized Pose Correspondences
    Sarthak Mehrotra (Indian Institute of Technology Bombay); Rishabh Jain (Adobe); Mayur Hemani (Adobe); Balaji Krishnamurthy (Adobe); Mausoom Sarkar (Adobe)
  3. Don't Mesh with Me: Generating Constructive Solid Geometry Instead of Meshes by Fine-Tuning a Code-Generation LLM
    Maximilian Mews (HU Berlin); Ansar Aynetdinov (HU Berlin); Vivian Schiller (RWTH Aachen); Peter Eisert (HU Berlin); Alan Akbik (HU Berlin)
  4. Harnessing the Power of Training-Free Techniques in Text-to-2D Generation for Text-to-3D Generation via Score Distillation Sampling
    Junhong Lee (POSTECH); Seungwook Kim (POSTECH, ByteDance); Minsu Cho (POSTECH)
  5. MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
    Sihyun Yu (KAIST); Meera Hahn (Google DeepMind); Dan Kondratyuk (Luma AI); Jinwoo Shin (KAIST); Agrim Gupta (Google DeepMind); José Lezama (Google DeepMind); Irfan Essa (Google DeepMind); David Ross (Google DeepMind); Jonathan Huang (Scaled Foundations)
  6. Revisiting Diffusion Autoencoder Training for Image Reconstruction Quality
    Pramook Khungurn (pixiv, Inc.); Phonphrm Thawatdamrongkit (VISTEC); Sukit Seripanitkarn (VISTEC); Supasorn Suwajanakorn (VISTEC)
  7. Generating Animated Layouts as Structured Text Representations
    Yeonsang Shin (Seoul National University); Jihwan Kim (Seoul National University); Yumin Song (Seoul National University); Kyungseung Lee (SK telecom); Hyunhee Chung (SK telecom); Taeyoung Na (SK telecom)
  8. CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
    Dejia Xu (University of Texas at Austin); Weili Nie (NVIDIA); Chao Liu (NVIDIA); Sifei Liu (NVIDIA); Jan Kautz (NVIDIA); Zhangyang Wang (University of Texas at Austin); Arash Vahdat (NVIDIA) [https://ir1d.github.io/CamCo/]
  9. LiftRefine: Progressively Refined View Synthesis from 3D Lifting with Volume-Triplane Representations
    Tung Do (Movian Research); Thuan Nguyen (MBZUAI); Anh Tran (Movian Research); Rang Nguyen (VinAI Research); Binh-Son Hua (Trinity College Dublin)
  10. Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation
    Liu He (Purdue University); Yizhi Song (Purdue University); Hejun Huang (University of Michigan); Pinxin Liu (University of Rochester); Yunlong Tang (University of Rochester); Daniel Aliaga (Purdue University); Xin Zhou (Baidu USA)
  11. VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment
    Wenyan Cong (University of Texas at Austin); Hanqing Zhu (University of Texas at Austin); Kevin Wang (University of Texas at Austin); Jiahui Lei (University of Pennsylvania); Colton Stearns (Stanford University); Yuanhao Cai (Johns Hopkins University); Dilin Wang (Meta); Rakesh Ranjan (Meta); Matt Feiszli (Meta); Leonidas Guibas (Stanford University); Atlas Wang (University of Texas at Austin); Weiyao Wang (Meta); Zhiwen Fan (University of Texas at Austin) [https://videolifter.github.io/]
  12. HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
    Jinbin Bai (National University of Singapore); Wei Chow (National University of Singapore); Ling Yang (Peking University); Xiangtai Li (Skywork AI); Juncheng Li (National University of Singapore); Hanwang Zhang (Nanyang Technological University); Shuicheng Yan (National University of Singapore) [https://github.com/viiika/HumanEdit]
  13. DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description
    Adrienne Deganutti (University of Surrey); Simon Hadfield (University of Surrey); Andrew Gilbert (University of Surrey) [https://andrewjohngilbert.github.io/DANTE-AD/]
  1. Stable Flow: Vital Layers for Training-Free Image Editing
    Omri Avrahami (The Hebrew University of Jerusalem); Or Patashnik (Tel Aviv University); Ohad Fried (Reichman University); Egor Nemchinov (Snap); Kfir Aberman (Snap); Dani Lischinski (The Hebrew University of Jerusalem); Daniel Cohen-Or (Tel Aviv University) — CVPR 2025
  2. HyperGS: Hyperspectral 3D Gaussian Splatting
    Christopher Thirgood (University of Surrey); Oscar Mendez (University of Surrey); Erin Ling (University of Surrey); Jon Storey (i3D Robotics); Simon Hadfield (University of Surrey) — CVPR 2025
  3. Tiled Diffusion
    Or Madar (Reichman University); Ohad Fried (Reichman University) [https://madaror.github.io/tiled-diffusion.github.io/] — CVPR 2025
  4. DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models
    Shwetha Ram (Amazon) — WACV 2025
  5. VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors
    Juil Koo (KAIST); Paul Guerrero (Adobe Research); Chun-Hao Huang (Adobe Research); Duygu Ceylan (Adobe Research); Minhyuk Sung (KAIST) — CVPR 2025
  6. HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks
    Maria Pilligua Costa (Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB)); Danna Xue (Northwestern Polytechnical University, Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB)); Javier Vazquez-Corral (Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB)) — CVPR 2025
  7. Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation
    Rajeev Goel (Arizona State University); Utkarsh Nath (Arizona State University); Eun Som Jeon (Seoul National University of Science and Technology); Kyle Min (Intel Labs); Changhoon Kim (Arizona State University); Pavan Turaga (Arizona State Univerisity) [https://moment-3d.github.io/] — WACV 2025
12:30 Lunch break - ExHall C 🥪


Cat4D (Google, 2024), AssetGen (Meta, 2024), DreamFusion (Google, 2022).


Afternoon session:
Time CDT
13:30 Oral session + best paper announcement + best presentation competition
14:00 Yutong Bai (UC Berkeley)
14:30 Nanxuan (Cherry) Zhao (Adobe)
15:00 Coffee break
15:30 Ishan Misra (with Rohit Girdhar) (Meta)
16:00 Panel discussion — Open Source in AI and the Creative Industry 🗣️
17:00 Poster session 2 - ExHall D #412-431
  1. Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks
    Shiyu Xia (AI Lab, Giant Network); Junjie Zheng (AI Lab, Giant Network); Chaoyi Wang (Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences); Zihao Chen (AI Lab, Giant Network); Chaofan Ding (AI Lab, Giant Network); Xiaohao Zhang (AI Lab, Giant Network); Xi Tao (AI Lab, Giant Network); Xiaoming He (School of Life Sciences, Fudan University); Xinhan Di (Deepearthgo)
  2. Comparison Reveals Commonality: Customized Image Generation through Contrastive Inversion
    Minseo Kim (KAIST); Minchan Kwon (KAIST); Dongyeun Lee (KAIST); Yunho Jeon (Hanbat University); Junmo Kim (KAIST)
  3. Vectorized Region Based Brush Strokes for Artistic Rendering
    Jeripothula Prudviraj (TCS Research); Vikram Jamwal (TCS Research)
  4. GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting
    Anushka Agarwal (University of Massachusetts Amherst); Yusuf Hassan (University of Massachusetts Amherst); Talha Chafekar (University of Massachusetts Amherst)
  5. DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing
    Qi Li (Amazon); Shuwen Qiu (UCLA); Kee Kiat Koo (Amazon); Julien Han (UCLA); Karim Bouyarmane (Amazon)
  6. Is Concatenation Really All You Need? Efficient Concatenation-Based Pose Conditioning and Pose Control for Virtual Try On
    Qi Li (Amazon); Shuwen Qiu (UCLA); Kee Kiat Koo (Amazon); Julien Han (Amazon)
  7. Art3D: Training-Free 3D Generation from Flat-Colored Illustration
    Xiaoyan Cong (Brown University); Jiayi Shen (Brown University); Zekun Li (Brown University); Rao Fu (Brown University); Tao Lu (Brown University); Srinath Sridhar (Brown University) [https://joy-jy11.github.io/]
  8. Is Your Text-to-Image Model Robust to Caption Noise?
    Weichen Yu (University of Chinese Academy of Sciences); Ziyang Yang (ByteDance); Shanchuan Lin (ByteDance); Qi Zhao (ByteDance); Jianyi Wang (ByteDance); Liangke Gui (ByteDance); Matt Fredrikson (CMU); Lu Jiang (ByteDance)
  9. Training-Free Sketch-Guided Diffusion with Latent Optimization
    Sandra Zhang Ding (The University of Tokyo); Kiyoharu Aizawa (The University of Tokyo); Jiafeng Mao (The University of Tokyo)
  10. InstructVTON: Optimal Auto-Masking and Natural-Language-Guided Interactive Style Control for Inpainting-Based Virtual Try-On
    Meng Han (Amazon.com); Shuwen Qiu (UCLA); Qi Li (Amazon.com); Xingzi Xu (Duke University); Kavosh Asadi (Amazon.com); Karim Bouyarmane (Amazon.com)
  11. Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
    Ketan Suhaas Saichandran (Boston University); Xavier Thomas (Boston University); Prakhar Kaushik (Johns Hopkins University); Deepti Ghadiyaram (Boston University)
  1. Enhancing Creative Generation on Stable Diffusion-based Models
    Jiyeon Han (Korea Advanced Institute of Science and Technology); Dahee Kwon (Korea Advanced Institute of Science and Technology); Gayoung Lee (NAVER AI Lab); Junho Kim (NAVER AI Lab); Jaesik Choi (Korea Advanced Institute of Science and Technology) — CVPR 2025
  2. NamedCurves: Learned Image Enhancement via Color Naming
    David Serrano-Lozano (Computer Vision Center); Luis Herranz (Universidad Autónoma de Madrid); Michael S. Brown (York University); Javier Vazquez-Corral (Computer Vision Center) — ECCV 2024
  3. ScribbleLight: Single Image Indoor Relighting with Scribbles
    Jun Myeong Choi (University of North Carolina at Chapel Hill); Annie Wang (University of North Carolina at Chapel Hill); Pieter Peers (College of William & Mary); Anand Bhattad (Toyota Technological Institute at Chicago); Roni Sengupta (University of North Carolina at Chapel Hill) — CVPR 2025
  4. 4K4DGen: Panoramic 4D Generation at 4K Resolution
    Renjie Li (Texas A&M University); Bangbang Yang (ByteDance); Zhiwen Fan (The University of Texas at Austin); Dejia Xu (The University of Texas at Austin); Tingting Shen (XMU); Xuanyang Zhang (StepFun AI); Shijie Zhou (UCLA); Zeming Li (ByteDance); Achuta Kadambi (UCLA); Zhangyang Wang (The University of Texas at Austin); Zhengzhong Tu (Texas A&M University); Panwang Pan (ByteDance) — ICLR 2025
  5. LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
    Xiaoyan Xing (University of Amsterdam); Konrad Groh (Bosch); Sezer Karaoglu (University of Amsterdam); Theo Gevers (University of Amsterdam); Anand Bhattad (Toyota Technological Institute at Chicago) — CVPR 2025
  6. Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
    Lee Chae-Yeon (POSTECH); Oh Hyun-Bin (POSTECH); Han EunGi (POSTECH); Kim Sung-Bin (POSTECH); Suekyeong Nam (KRAFTON); Tae-Hyun Oh (KAIST) — CVPR 2025
  7. T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
    Kaiyue Sun (The University of Hong Kong) [https://t2v-compbench-2025.github.io/] — CVPR 2025
  8. MixerMDM: Learnable Composition of Human Motion Diffusion Models
    Pablo Ruiz Ponce (University of Alicante) — CVPR 2025
  9. Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
    Yu Yuan (Purdue University); Xijun Wang (Purdue University); Yichen Sheng (NVIDIA); Prateek Chennuri (Purdue University); Xingguang Zhang (Purdue University); Stanley Chan (Purdue University) [https://generative-photography.github.io/project/] — CVPR 2025


DALL-E 2 (OpenAI, 2022), Imagen (Google, 2022), GauGAN2 (NVIDIA, 2021).

Previous Workshops (including session videos)