Video Generation 【Paper Roundup】 SVD, Sora, Latte, VideoCrafter12, DiT...
创始人
2024-12-01 13:05:23

    • Datasets
    • Metrics
  • 【arXiv 2024】MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
  • 【CVPR 2024】VBench: Comprehensive Benchmark Suite for Video Generative Models
  • 【arXiv 2024】T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
  • 【arXiv 2024】Latte: Latent Diffusion Transformer for Video Generation

Datasets

Metrics

【arXiv 2024】MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

Authors: Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan

Abstract Sora's high-motion intensity and long consistent videos have significantly impacted the field of video generation, attracting unprecedented attention. However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video dataset that surpasses previous ones in video duration, caption detail, motion strength, and visual quality. We curate MiraData from diverse, manually selected sources and meticulously process the data to obtain semantically consistent clips. GPT-4V is employed to annotate structured captions, providing detailed descriptions from four different perspectives along with a summarized dense caption. To better assess temporal consistency and motion intensity in video generation, we introduce MiraBench, which enhances existing benchmarks by adding 3D consistency and tracking-based motion strength metrics. MiraBench includes 150 evaluation prompts and 17 metrics covering temporal consistency, motion strength, 3D consistency, visual quality, text-video alignment, and distribution similarity. To demonstrate the utility and effectiveness of MiraData, we conduct experiments using our DiT-based video generation model, MiraDiT. The experimental results on MiraBench demonstrate the superiority of MiraData, especially in motion strength.

【Paper】 > 【Github_Code】 > 【Project】 > 【Chinese commentary, to be continued】
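
The abstract mentions tracking-based motion strength metrics but does not define them. As a rough illustration only, the sketch below scores a clip by the mean per-frame displacement of tracked points. The function name `motion_strength`, the input layout, and the toy trajectories are all assumptions made for this example and are not taken from MiraBench.

```python
import math


def motion_strength(tracks):
    """Mean per-frame displacement over all tracked points (hypothetical metric).

    tracks: list of trajectories; each trajectory is a list of (x, y)
    positions, one per frame, for a single tracked point.
    """
    total, steps = 0.0, 0
    for trajectory in tracks:
        # Pair each position with the position in the next frame.
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
            total += math.hypot(x1 - x0, y1 - y0)
            steps += 1
    # A clip with no usable frame pairs is treated as having no motion.
    return total / steps if steps else 0.0


# Toy trajectories (made up): two points that never move, two that do.
static_clip = [[(10.0, 10.0)] * 4, [(50.0, 20.0)] * 4]
moving_clip = [
    [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)],
    [(5.0, 5.0), (5.0, 10.0), (5.0, 15.0)],
]

print(motion_strength(static_clip))  # 0.0
print(motion_strength(moving_clip))  # 5.0
```

In practice the trajectories would come from a point tracker run on the generated video, and a real metric would likely normalize by resolution so that scores are comparable across clips.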

【CVPR 2024】VBench: Comprehensive Benchmark Suite for Video Generative Models

Authors: Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

Abstract Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has three appealing properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity inconsistency, motion smoothness, temporal flickering, and spatial relationship, etc). The evaluation metrics with fine-grained levels reveal individual models' strengths and weaknesses. 2) Human Alignment: We also provide a dataset of human preference annotations to validate our benchmarks' alignment with human perception, for each evaluation dimension respectively. 3) Valuable Insights: We look into current models' ability across various evaluation dimensions, and various content types. We also investigate the gaps between video and image generation models. We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

【Paper】 > 【Github_Code】 > 【Project】 > 【Chinese commentary】

【arXiv 2024】T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Authors: Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu

Abstract Text-to-video (T2V) generation models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous text-to-video benchmarks also neglect this important ability for evaluation. In this work, we conduct the first systematic study on compositional text-to-video generation. We propose T2V-CompBench, the first benchmark tailored for compositional text-to-video generation. T2V-CompBench encompasses diverse aspects of compositionality, including consistent attribute binding, dynamic attribute binding, spatial relationships, motion binding, action binding, object interactions, and generative numeracy. We further carefully design evaluation metrics of MLLM-based metrics, detection-based metrics, and tracking-based metrics, which can better reflect the compositional text-to-video generation quality of seven proposed categories with 700 text prompts. The effectiveness of the proposed metrics is verified by correlation with human evaluations. We also benchmark various text-to-video generative models and conduct in-depth analysis across different models and different compositional categories. We find that compositional text-to-video generation is highly challenging for current models, and we hope that our attempt will shed light on future research in this direction.

【Paper】 > 【Github_Code】 > 【Project】 > 【Chinese commentary】
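
The abstract says the metrics are verified by correlation with human evaluations, without naming the statistic. One common choice for this kind of check is Spearman's rank correlation, sketched below in plain Python. Treating Spearman as the statistic is an assumption for illustration, and the scores are made-up toy values, not results from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Toy data (made up): one automatic score and one human rating per video.
metric_scores = [0.2, 0.5, 0.9, 0.4]
human_ratings = [1, 3, 5, 2]
print(spearman(metric_scores, human_ratings))
```

A value near 1 means the automatic metric orders the videos the same way human raters do; a value near 0 means the metric carries little information about human preference.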

【arXiv 2024】Latte: Latent Diffusion Transformer for Video Generation

Authors: Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao

Abstract We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model video distribution in the latent space. In order to model a substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatial and temporal dimensions of input videos. To improve the quality of generated videos, we determine the best practices of Latte through rigorous experimental analysis, including video clip patch embedding, model variants, timestep-class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to text-to-video generation (T2V) task, where Latte achieves comparable results compared to recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation.

【Paper】 > 【Github_Code】 > 【Project】 > 【Chinese commentary, to be continued】
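
To see why decomposing the spatial and temporal dimensions helps with a large number of tokens, the sketch below counts query-key pairs for joint attention over all video tokens versus attention applied separately within each frame and across frames. This is generic arithmetic about factorized attention, not a reproduction of any specific Latte variant; the latent size (16 frames, 32x32, patch size 2) is an illustrative assumption.

```python
def token_counts(frames, height, width, patch):
    """Return (tokens per frame, total tokens) for non-overlapping square patches."""
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")
    per_frame = (height // patch) * (width // patch)
    return per_frame, frames * per_frame


def attention_pairs(frames, per_frame):
    """Return (joint, factorized) counts of query-key pairs for one attention pass."""
    total = frames * per_frame
    joint = total ** 2                   # every token attends to every token
    spatial = frames * per_frame ** 2    # tokens attend within their own frame
    temporal = per_frame * frames ** 2   # tokens attend along time at one location
    return joint, spatial + temporal


# Assumed latent video: 16 frames of 32x32 latents, patch size 2.
per_frame, total = token_counts(frames=16, height=32, width=32, patch=2)
joint, factorized = attention_pairs(16, per_frame)
print(per_frame, total)   # 256 4096
print(joint, factorized)  # 16777216 1114112
```

Under these assumed sizes the factorized scheme touches roughly 15 times fewer query-key pairs than joint attention, and the gap widens as the number of frames or the spatial resolution grows.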
