AI for Content Creation Workshop

June 12th @ CVPR 2025

Music City Center, Nashville, TN, USA



Summary

The AI for Content Creation (AI4CC) workshop at CVPR brings together researchers in computer vision, machine learning, and AI. Content creation is required for simulation and training data generation, media like photography and videography, virtual reality and gaming, art and design, and documents and advertising (to name just a few application domains). Recent progress in machine learning, deep learning, and AI techniques has allowed us to turn hours of manual, painstaking content creation work into minutes or seconds of automated or interactive work. For instance, generative adversarial networks (GANs) can produce photorealistic images of 2D and 3D content such as humans, landscapes, interior scenes, virtual environments, and even industrial designs. Neural networks can super-resolve and super-slo-mo videos, interpolate between photos with intermediate novel views and even extrapolate beyond them, and transfer styles to convincingly render and reinterpret content. Beyond creating awe-inspiring artistic images, these techniques offer unique opportunities for generating additional and more diverse training data. Learned priors can also be combined with explicit appearance and geometric constraints, perceptual understanding, or even functional and semantic constraints of objects.

AI for content creation lies at the intersection of the graphics, computer vision, and design communities. However, researchers and professionals in these fields may not be aware of its full potential and inner workings. As such, the workshop comprises two parts: techniques for content creation and applications for content creation. The workshop has three goals:

  1. To cover introductory concepts that help interested researchers from other fields get started in this exciting area.
  2. To present success stories that show how deep learning can be used for content creation.
  3. To discuss pain points that designers face when using content creation tools.

More broadly, we hope that the workshop will serve as a forum to discuss the latest topics in content creation and the challenges that vision and learning researchers can help solve.

Welcome!
Deqing Sun (Google)
Lingjie Liu (University of Pennsylvania)
Krishna Kumar Singh (Adobe)
Fitsum Reda (NVIDIA)
Lu Jiang (ByteDance)
Jun-Yan Zhu (Carnegie Mellon University)
James Tompkin (Brown University)



Cat4D (Google, 2024), AssetGen (Meta, 2024), DreamFusion (Google, 2022).

2025 Tentative Speakers



Firefly Video (Adobe, 2025), Genie 2 (DeepMind, 2024), Sora (OpenAI, 2024).

2025 Schedule

Morning session:
Time (CDT)
09:00 Welcome and introductions 👋
09:10 Speaker 1
09:40 Speaker 2
10:10 Coffee break
10:20 Speaker 3
10:50 Speaker 4
11:20 Poster session 1
  1. Parallel Rescaling: Rebalancing Consistency Guidance for Personalized Diffusion Models
    JungWoo Chae (Nexon Korea); Jiyoon Kim (LGCNS); Sangheum Hwang (Seoul National University of Science and Technology)
  2. EOPose: Exemplar-based object reposing using Generalized Pose Correspondences
    Sarthak Mehrotra (Indian Institute of Technology Bombay); Rishabh Jain (Adobe); Mayur Hemani (Adobe); Balaji Krishnamurthy (Adobe); Mausoom Sarkar (Adobe)
  3. Don't Mesh with Me: Generating Constructive Solid Geometry Instead of Meshes by Fine-Tuning a Code-Generation LLM
    Maximilian Mews (HU Berlin); Ansar Aynetdinov (HU Berlin); Vivian Schiller (RWTH Aachen); Peter Eisert (HU Berlin); Alan Akbik (HU Berlin)
  4. Harnessing the Power of Training-Free Techniques in Text-to-2D Generation for Text-to-3D Generation via Score Distillation Sampling
    Junhong Lee (POSTECH); Seungwook Kim (POSTECH, ByteDance); Minsu Cho (POSTECH)
  5. MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
    Sihyun Yu (KAIST); Meera Hahn (Google DeepMind); Dan Kondratyuk (Luma AI); Jinwoo Shin (KAIST); Agrim Gupta (Google DeepMind); José Lezama (Google DeepMind); Irfan Essa (Google DeepMind); David Ross (Google DeepMind); Jonathan Huang (Scaled Foundations)
  6. Revisiting Diffusion Autoencoder Training for Image Reconstruction Quality
    Pramook Khungurn (pixiv, Inc.); Phonphrm Thawatdamrongkit (VISTEC); Sukit Seripanitkarn (VISTEC); Supasorn Suwajanakorn (VISTEC)
  7. Generating Animated Layouts as Structured Text Representations
    Yeonsang Shin (Seoul National University); Jihwan Kim (Seoul National University); Yumin Song (Seoul National University); Kyungseung Lee (SK telecom); Hyunhee Chung (SK telecom); Taeyoung Na (SK telecom)
  8. CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
    Dejia Xu (University of Texas at Austin); Weili Nie (NVIDIA); Chao Liu (NVIDIA); Sifei Liu (NVIDIA); Jan Kautz (NVIDIA); Zhangyang Wang (University of Texas at Austin); Arash Vahdat (NVIDIA) [https://ir1d.github.io/CamCo/]
  9. LiftRefine: Progressively Refined View Synthesis from 3D Lifting with Volume-Triplane Representations
    Tung Do (Movian Research); Thuan Nguyen (MBZUAI); Anh Tran (Movian Research); Rang Nguyen (VinAI Research); Binh-Son Hua (Trinity College Dublin)
  10. Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation
    Liu He (Purdue University); Yizhi Song (Purdue University); Hejun Huang (University of Michigan); Pinxin Liu (University of Rochester); Yunlong Tang (University of Rochester); Daniel Aliaga (Purdue University); Xin Zhou (Baidu USA)
  11. VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment
    Wenyan Cong (University of Texas at Austin); Hanqing Zhu (University of Texas at Austin); Kevin Wang (University of Texas at Austin); Jiahui Lei (University of Pennsylvania); Colton Stearns (Stanford University); Yuanhao Cai (Johns Hopkins University); Dilin Wang (Meta); Rakesh Ranjan (Meta); Matt Feiszli (Meta); Leonidas Guibas (Stanford University); Atlas Wang (University of Texas at Austin); Weiyao Wang (Meta); Zhiwen Fan (University of Texas at Austin) [https://videolifter.github.io/]
  12. HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
    Jinbin Bai (National University of Singapore); Wei Chow (National University of Singapore); Ling Yang (Peking University); Xiangtai Li (Skywork AI); Juncheng Li (National University of Singapore); Hanwang Zhang (Nanyang Technological University); Shuicheng Yan (National University of Singapore) [https://github.com/viiika/HumanEdit]
  13. DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description
    Adrienne Deganutti (University of Surrey); Simon Hadfield (University of Surrey); Andrew Gilbert (University of Surrey) [https://andrewjohngilbert.github.io/DANTE-AD/]
  1. Stable Flow: Vital Layers for Training-Free Image Editing
    Omri Avrahami (The Hebrew University of Jerusalem); Or Patashnik (Tel Aviv University); Ohad Fried (Reichman University); Egor Nemchinov (Snap); Kfir Aberman (Snap); Dani Lischinski (The Hebrew University of Jerusalem); Daniel Cohen-Or (Tel Aviv University) — CVPR 2025
  2. HyperGS: Hyperspectral 3D Gaussian Splatting
    Christopher Thirgood (University of Surrey); Oscar Mendez (University of Surrey); Erin Ling (University of Surrey); Jon Storey (i3D Robotics); Simon Hadfield (University of Surrey) — CVPR 2025
  3. Tiled Diffusion
    Or Madar (Reichman University); Ohad Fried (Reichman University) [https://madaror.github.io/tiled-diffusion.github.io/] — CVPR 2025
  4. DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models
    Shwetha Ram (Amazon) — WACV 2025
  5. VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors
    Juil Koo (KAIST); Paul Guerrero (Adobe Research); Chun-Hao Huang (Adobe Research); Duygu Ceylan (Adobe Research); Minhyuk Sung (KAIST) — CVPR 2025
  6. HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks
    Maria Pilligua Costa (Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB)); Danna Xue (Northwestern Polytechnical University, Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB)); Javier Vazquez-Corral (Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB)) — CVPR 2025
  7. Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation
    Rajeev Goel (Arizona State University); Utkarsh Nath (Arizona State University); Eun Som Jeon (Seoul National University of Science and Technology); Kyle Min (Intel Labs); Changhoon Kim (Arizona State University); Pavan Turaga (Arizona State University) [https://moment-3d.github.io/] — WACV 2025
12:30 Lunch break 🥪


Afternoon session:
Time (CDT)
13:30 Oral session + best paper announcement
14:00 Speaker 5
14:30 Speaker 6
15:00 Coffee break
15:15 Late Breaking Speaker 7
15:45 Panel discussion 🗣️
16:45 Poster session 2
  1. Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks
    Shiyu Xia (AI Lab, Giant Network); Junjie Zheng (AI Lab, Giant Network); Chaoyi Wang (Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences); Zihao Chen (AI Lab, Giant Network); Chaofan Ding (AI Lab, Giant Network); Xiaohao Zhang (AI Lab, Giant Network); Xi Tao (AI Lab, Giant Network); Xiaoming He (School of Life Sciences, Fudan University); Xinhan Di (Deepearthgo)
  2. Comparison Reveals Commonality: Customized Image Generation through Contrastive Inversion
    Minseo Kim (KAIST); Minchan Kwon (KAIST); Dongyeun Lee (KAIST); Yunho Jeon (Hanbat University); Junmo Kim (KAIST)
  3. Vectorized Region Based Brush Strokes for Artistic Rendering
    Jeripothula Prudviraj (TCS Research); Vikram Jamwal (TCS Research)
  4. GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting
    Anushka Agarwal (University of Massachusetts Amherst); Yusuf Hassan (University of Massachusetts Amherst); Talha Chafekar (University of Massachusetts Amherst)
  5. DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing
    Qi Li (Amazon); Shuwen Qiu (UCLA); Kee Kiat Koo (Amazon); Julien Han (UCLA); Karim Bouyarmane (Amazon)
  6. Is Concatenation Really All You Need? Efficient Concatenation-Based Pose Conditioning and Pose Control for Virtual Try On
    Qi Li (Amazon); Shuwen Qiu (UCLA); Kee Kiat Koo (Amazon); Julien Han (Amazon)
  7. Art3D: Training-Free 3D Generation from Flat-Colored Illustration
    Xiaoyan Cong (Brown University); Jiayi Shen (Brown University); Zekun Li (Brown University); Rao Fu (Brown University); Tao Lu (Brown University); Srinath Sridhar (Brown University)
  8. Is Your Text-to-Image Model Robust to Caption Noise?
    Weichen Yu (University of Chinese Academy of Sciences); Ziyang Yang (ByteDance); Shanchuan Lin (ByteDance); Qi Zhao (ByteDance); Jianyi Wang (ByteDance); Liangke Gui (ByteDance); Matt Fredrikson (CMU); Lu Jiang (ByteDance)
  9. Training-Free Sketch-Guided Diffusion with Latent Optimization
    Sandra Zhang Ding (The University of Tokyo); Kiyoharu Aizawa (The University of Tokyo); Jiafeng Mao (The University of Tokyo)
  10. InstructVTON: Optimal Auto-Masking and Natural-Language-Guided Interactive Style Control for Inpainting-Based Virtual Try-On
    Meng Han (Amazon.com); Shuwen Qiu (UCLA); Qi Li (Amazon.com); Xingzi Xu (Duke University); Kavosh Asadi (Amazon.com); Karim Bouyarmane (Amazon.com)
  11. Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
    Ketan Suhaas Saichandran (Boston University); Xavier Thomas (Boston University); Prakhar Kaushik (Johns Hopkins University); Deepti Ghadiyaram (Boston University)
  1. Enhancing Creative Generation on Stable Diffusion-based Models
    Jiyeon Han (Korea Advanced Institute of Science and Technology); Dahee Kwon (Korea Advanced Institute of Science and Technology); Gayoung Lee (NAVER AI Lab); Junho Kim (NAVER AI Lab); Jaesik Choi (Korea Advanced Institute of Science and Technology) — CVPR 2025
  2. NamedCurves: Learned Image Enhancement via Color Naming
    David Serrano-Lozano (Computer Vision Center); Luis Herranz (Universidad Autónoma de Madrid); Michael S. Brown (York University); Javier Vazquez-Corral (Computer Vision Center) — ECCV 2024
  3. ScribbleLight: Single Image Indoor Relighting with Scribbles
    Jun Myeong Choi (University of North Carolina at Chapel Hill); Annie Wang (University of North Carolina at Chapel Hill); Pieter Peers (College of William & Mary); Anand Bhattad (Toyota Technological Institute at Chicago); Roni Sengupta (University of North Carolina at Chapel Hill) — CVPR 2025
  4. 4K4DGen: Panoramic 4D Generation at 4K Resolution
    Renjie Li (Texas A&M University); Bangbang Yang (ByteDance); Zhiwen Fan (The University of Texas at Austin); Dejia Xu (The University of Texas at Austin); Tingting Shen (XMU); Xuanyang Zhang (StepFun AI); Shijie Zhou (UCLA); Zeming Li (ByteDance); Achuta Kadambi (UCLA); Zhangyang Wang (The University of Texas at Austin); Zhengzhong Tu (Texas A&M University); Panwang Pan (ByteDance) — ICLR 2025
  5. LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
    Xiaoyan Xing (University of Amsterdam); Konrad Groh (Bosch); Sezer Karaoglu (University of Amsterdam); Theo Gevers (University of Amsterdam); Anand Bhattad (Toyota Technological Institute at Chicago) — CVPR 2025
  6. Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
    Lee Chae-Yeon (POSTECH); Oh Hyun-Bin (POSTECH); Han EunGi (POSTECH); Kim Sung-Bin (POSTECH); Suekyeong Nam (KRAFTON); Tae-Hyun Oh (KAIST) — CVPR 2025
  7. T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
    Kaiyue Sun (The University of Hong Kong) [https://t2v-compbench-2025.github.io/] — CVPR 2025
  8. MixerMDM: Learnable Composition of Human Motion Diffusion Models
    Pablo Ruiz Ponce (University of Alicante) — CVPR 2025
  9. Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
    Yu Yuan (Purdue University); Xijun Wang (Purdue University); Yichen Sheng (NVIDIA); Prateek Chennuri (Purdue University); Xingguang Zhang (Purdue University); Stanley Chan (Purdue University) [https://generative-photography.github.io/project/] — CVPR 2025


DALL-E 2 (OpenAI, 2022), Imagen (Google, 2022), GauGAN2 (NVIDIA, 2021).

Previous Workshops (including session videos)