AI for Content Creation Workshop

@ CVPR 2024

Mon 17th June 2024 — 9am PDT
Seattle Convention Center — Summit 342



Summary

The AI for Content Creation (AI4CC) workshop at CVPR brings together researchers in computer vision, machine learning, and AI. Content creation is required for simulation and training data generation, media such as photography and videography, virtual reality and gaming, art and design, and documents and advertising (to name just a few application domains). Recent progress in machine learning and deep learning has turned hours of manual, painstaking content creation into minutes or seconds of automated or interactive work. For instance, generative adversarial networks (GANs) can produce photorealistic images of 2D and 3D content such as humans, landscapes, interior scenes, virtual environments, and even industrial designs. Neural networks can super-resolve videos and interpolate them into smooth slow motion, synthesize intermediate novel views between photos and even extrapolate beyond them, and transfer styles to convincingly render and reinterpret content. Beyond creating awe-inspiring artistic images, these techniques offer unique opportunities for generating additional, more diverse training data. Learned priors can also be combined with explicit appearance and geometric constraints, perceptual understanding, or even functional and semantic constraints on objects.
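
To make the "minutes or seconds" claim concrete, here is a minimal text-to-image sketch. It assumes the open-source Hugging Face diffusers library and the public Stable Diffusion v1.5 checkpoint; these are illustrative choices, not workshop materials, and any recent text-to-image model would serve equally well.

    # A minimal sketch of automated content creation: text-to-image in a few
    # seconds on a GPU. Assumes the Hugging Face `diffusers` library and the
    # public Stable Diffusion v1.5 checkpoint (illustrative choices only).
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,  # half precision keeps GPU memory modest
    )
    pipe = pipe.to("cuda")

    # One line of text replaces hours of manual asset creation.
    image = pipe("a photorealistic interior scene, soft morning light").images[0]
    image.save("interior.png")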

AI for content creation lies at the intersection of the graphics, computer vision, and design communities. However, researchers and professionals in these fields may not be aware of its full potential and inner workings. As such, the workshop comprises two parts: techniques for content creation and applications for content creation. The workshop has three goals:

  1. To cover introductory concepts that help interested researchers from other fields get started in this exciting area.
  2. To present success stories showing how deep learning can be used for content creation.
  3. To discuss the pain points that designers face when using content creation tools.

More broadly, we hope that the workshop will serve as a forum to discuss the latest topics in content creation and the challenges that vision and learning researchers can help solve.

Welcome!
Deqing Sun (Google)
Lingjie Liu (University of Pennsylvania)
Yuanzhen Li (Google)
Sergey Tulyakov (Snap)
Huiwen Chang (OpenAI)
Lu Jiang (ByteDance)
Yijun Li (Adobe)
Jun-Yan Zhu (Carnegie Mellon University)
James Tompkin (Brown University)




Image credits: DALL·E 2 (OpenAI, 2022), Super SloMo (NVIDIA, 2018), GauGAN2 (NVIDIA, 2021), Imagen (Google, 2022).



2024 Schedule and Video Recording


Morning session (all times PDT):
09:00 Welcome and introductions 👋
09:10 Tali Dekel (Weizmann Institute)
09:40 Noah Snavely (Cornell)
10:10 Coffee break
10:20 Tim Brooks (OpenAI) — Sora
10:50 Diyi Yang (Stanford)
11:20 Poster session 1 - Arch Building Exhibit Hall 4E #298-322
  1. LocInv: Localization-aware Inversion for Text-Guided Image Editing
    Chuanming Tang, Kai Wang, Fei Yang, Joost van de Weijer [https://github.com/wangkai930418/DPL]
  2. ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer
    Bumsoo Kim, Abdul Muqeet, Kyuchul Lee, Sanghyun Seo [https://gh-bumsookim.github.io/ToonAging/]
  3. Customize Your Own Paired Data via Few-shot Way
    Jinshu Chen, Bingchuan Li, Miao Hua, Panpan Xu, Qian He
  4. NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior
    Gihoon Kim, Kwanggyoon Seo, Sihun Cha, Junyong Noh [https://rlgnswk.github.io/NeRFFaceSpeech_ProjectPage/]
  5. FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
    Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A Sigurdsson, Nanyun Peng, Xin Eric Wang
  6. VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
    Yumeng Li, William H. Beluch, Margret Keuper, Dan Zhang, Anna Khoreva [https://yumengli007.github.io/VSTAR/]
  7. LOVECon: Text-driven Training-free Long Video Editing with ControlNet
    Zhenyi Liao, Zhijie Deng [https://github.com/zhijie-group/LOVECon]
  8. The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective
    Andrew Shin, Yusuke Mori, Kunitake Kaneko
  9. Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap
    Dejia Xu, Xingqian Xu, Wenyan Cong, Humphrey Shi, Zhangyang Wang [https://arxiv.org/abs/2307.10584]
  10. CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers
    Andrew Marmon, Grant Schindler, Jose Lezama, Dan Kondratyuk, Bryan Seybold, Irfan Essa
  11. ICE-G: Image Conditional Editing of 3D Gaussian Splats
    Vishnu Jaganathan, Hanyun Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira [https://ice-gaussian.github.io/]
  12. TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation
    Chengcheng Feng, Mu He, Xiaofang Zhao, Haojie Yin, Qiuyu Tian, Tang Hongwei, Xing Qiang Wei
  13. My Body My Choice: Human-Centric Full-Body Anonymization
    Umur A. Ciftci, Ali Kemal Tanriverdi, Ilke Demir
  Invited posters:
  1. Automated Virtual Product Placement and Assessment in Images using Diffusion Models
    Mohammad Mahmudul Alam, Negin Sokhandan, Emmett D. Goodman — CVIV24
  2. As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
    Seungwoo Yoo, Kunho Kim, Vladimir Kim, Minhyuk Sung [https://as-plausible-as-possible.github.io/] — CVPR24
  3. Posterior Distillation Sampling
    Juil Koo, Chanho Park, Minhyuk Sung [https://posterior-distillation-sampling.github.io/] — CVPR24
  4. Seamless Human Motion Composition with Blended Positional Encodings
    German Barquero, Sergio Escalera, Cristina Palmero [https://barquerogerman.github.io/FlowMDM/] — CVPR24
  5. SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
    Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing [https://github.com/Xiaojiu-z/SSR_Encoder] — CVPR24
  6. DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision
    Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera [https://arxiv.org/pdf/2312.16256] — CVPR24
  7. ElasticDiffusion: Training-free Arbitrary Size Image Generation
    Moayed Haji-Ali, Guha Balakrishnan, Vicente Ordonez [https://elasticdiffusion.github.io/] — CVPR24
12:30 Lunch break 🥪


Afternoon session (all times PDT):
13:30 Oral session + best paper announcement
14:00 Aaron Hertzmann (Adobe)
14:30 Ziwei Liu (Nanyang Technological University)
15:00 Coffee break
15:15 Yuge (Jimmy) Shi and Jack Parker-Holder (DeepMind) — Genie
15:45 Robin Rombach (Stability AI)
16:15 Panel discussion — Surviving (and Thriving) in GenAI Industry 🗣️
17:15 Poster session 2 - Arch Building Exhibit Hall 4E #298-322
  1. Visual Style Prompting with Swapping Self-Attention
    Jaeseok Jeong, Junho Kim, Youngjung Uh, Yunjey Choi, Gayoung Lee
  2. ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
    Ying Jin, Pengyang Ling, Xiaoyi Dong, Pan Zhang, Dahua Lin, Jiaqi Wang
  3. Pix2Gif: Motion-Guided Diffusion for GIF Generation
    Hitesh Kandala, Jianfeng Gao, Jianwei Yang
  4. LoopDraw: a Loop-Based Autoregressive Model for Shape Synthesis and Editing
    Nam Anh Dinh, Haochen Wang, Greg Shakhnarovich, Rana Hanocka [https://threedle.github.io/LoopDraw]
  5. Text Prompting for Multi-Concept Video Customization by Autoregressive Generation
    Divya Kothandaraman, Kihyuk Sohn, Ruben Villegas, Paul Voigtlaender, Dinesh Manocha, Mohammad Babaeizadeh [https://github.com/divyakraman/MultiConceptVideo2024]
  6. Towards Safer AI Content Creation by Immunizing Text-to-image Models
    Amber Yijia Zheng, Raymond A. Yeh [https://arxiv.org/abs/2311.18815]
  7. InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning
    Tiancheng Li, Jinxiu Liu, Chen Huajun, Qi Liu
  8. CustomText: Customized Textual Image Generation using Diffusion Models
    Shubham Paliwal, Arushi Jain, Monika Sharma, Vikram Jamwal, Lovekesh Vig
  9. The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP
    Hidir Yesiltepe, Yusuf Dalva, Pinar Yanardag
  10. Temporally Consistent Object Editing in Videos using Extended Attention
    AmirHossein Zamani, Amir Aghdam, Tiberiu Popa, Eugene Belilovsky
  11. ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing
    Alec Helbling, Seongmin Lee, Duen Horng Chau [https://github.com/poloclub/ClickDiffusion/tree/main]
  12. EraseDraw: Learning to Insert Objects by Erasing Them from Images
    Alper Canberk, Maksym Bondarenko, Ege Ozguroglu, Ruoshi Liu, Carl Vondrick [https://erasedraw.cs.columbia.edu/]
  13. LEAST: Local text-conditioned image style transfer
    Silky Singh, Surgan Jandial, Simra Shahid, Abhinav Java [https://github.com/silky1708/local-style-transfer]
  Invited posters:
  1. VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
    Yuchao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Dechau Tang [https://videoswap.github.io/] — CVPR24
  2. MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
    Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou [https://showlab.github.io/magicanimate/] — CVPR24
  3. Towards a Perceptual Evaluation Framework for Lighting Estimation
    Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy, Javier Vazquez-Corral, Jean-Francois Lalonde [https://github.com/JustineGiroux/Lightsome] — CVPR24
  4. EverLight: Indoor-Outdoor Editable HDR Lighting Estimation
    Mohammad Reza Karimi Dastjerdi, Jonathan Eisenmann, Yannick Hold-Geoffroy, Jean-Francois Lalonde [https://lvsn.github.io/everlight/] — ICCV23
  5. Spatial Steerability of GANs via Self-Supervision from Discriminator
    Jianyuan Wang, Lalit Bhagat, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou [https://genforce.github.io/SpatialGAN/] — CVPR22
  6. Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
    Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll [https://kim-youwang.github.io/paint-it] — CVPR24
  7. ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
    Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi [https://ubc-vision.github.io/vivid123/] — CVPR24

Previous Workshops (including session videos)