diffusers
fc775924 - feat: Add Motif-Video model and pipelines (#13551)

Commit

49 days ago

feat: Add Motif-Video model and pipelines (#13551)

* feat: add Motif Video T2V and I2V pipelines with AdaptiveProjectedGuidance support

Add complete Motif Video implementation to diffusers:

New Models:
- Add MotifVideoTransformer3DModel with T5Gemma2Encoder for multimodal conditioning
- Supports text-to-video and image-to-video generation with vision tower integration

New Pipelines:
- Add MotifVideoPipeline for text-to-video generation
  - Default resolution: 736x1280, 121 frames, 25 fps
  - Supports classifier-free guidance and AdaptiveProjectedGuidance
- Add MotifVideoImage2VideoPipeline for image-to-video generation
  - First frame conditioning with vision encoder
  - Same defaults as T2V pipeline

Enhanced Guidance:
- Update AdaptiveProjectedGuidance with normalization_dims parameter
  - Support "spatial" normalization for 5D tensors (per-frame spatial normalization)
  - Support custom dimension lists for flexible normalization
- Update AdaptiveProjectedMixGuidance with same parameter

Documentation & Tests:
- Add comprehensive API documentation for transformer and pipelines
- Add test suites for both T2V and I2V pipelines
- Register all new components in __init__ files
- Add dummy objects for torch and transformers backends

Total: 18 files changed, 3416 insertions(+), 2 deletions(-)

* Remove linear quadratic
* Remove musicldm
* Update docstring
* Address vision_encoder comment
* Add copy source in I2V pippeline
* Refactor _get_prompt_embeds
  Co-authored-by: Beomgyu Kim <beomgyu.kim@motiftech.io>
* Fix a typo
* Refactor MotifVideo transformer to use diffusers Attention conventions
  - Use default Attention class with custom MotifVideoAttnProcessor2_0
  - Inline cross-attention in transformer blocks
  - Use dispatch_attention_fn for backend support
  - Inherit AttentionMixin for attn_processors/set_attn_processor
  - Move TransformerBlockRegistry to _helpers.py
  - Add _repeated_blocks for regional compilation
* Use base classes for scheduler and guider
* Implement MotifVideoAttention
* Update style and quality
* Fix a typo
* Fix a typo
* Fix a typo
* Update year
* Address rope dtype
* Update docstring and remove frame_rate
* Address unused sigmas
* Add available processors
* Address copy from comment
* Remove torch.no_grad()
* Remove use_attention_mask
* Address inline cross-attention
* Address compute dtype
* Remove unused variables
* Merge main APG into this branch and update documentation
* Refactor cross attention processor
* Remove unused timestep
* Inline create_attention_mask
* Make guider required
* Address encode_prompt comment
* Address preprocess_video comment
* Use T5Gemma2Encoder in test cases
* Address None feature_extractor
* Address output type
* Renable skipped tests
* Update style and quality
* Generate standard transformer test case
* Add model test case
* Remove guider in documentation
* Implement cross_attn layer
* Remove prepare_negative_prompt
* Address latent is None
* Clean up feature_extractor
* Fix prepare_latents
* Remove transformers assertion
* Fix style and quality
* Fix python utils/check_copies.py --fix_and_overwrite python utils/check_dummies.py --fix_and_overwrite outputs
* Add dropout rate to text config
* Skip tests requiring guidance_scale
* Fix encode_prompt in test cases
* Fix test_cpu_offload_forward_pass_twice
* Update tests/pipelines/motif_video/test_motif_video.py
  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update tests/pipelines/motif_video/test_motif_video.py
  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update tests/pipelines/motif_video/test_motif_video.py
  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update tests/pipelines/motif_video/test_motif_video_image2video.py
  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Address test_attention_slicing_forward_pass comment
* Update tests/pipelines/motif_video/test_motif_video_image2video.py
  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update tests/pipelines/motif_video/test_motif_video_image2video.py
  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update tests/pipelines/motif_video/test_motif_video_image2video.py
  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Skip I2V test cases
* Fix style and quality
* Add docs to toctree
* Fix docs location in toctree and add link in overview
* Inline gradient checkpointing
* Add _keep_in_fp32_modules for timestep_embedder
* Address num_decoder_layers comment
* Address guider is not None comment
* Remove _keep_in_fp32_modules
* Address parameter_dtype comment

---------

Co-authored-by: Ken Cheung <ken.cheung@motiftech.io>
Co-authored-by: Beomgyu Kim <beomgyu.kim@motiftech.io>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
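
Illustration (not part of the commit): the message says normalization_dims accepts "spatial" for 5D tensors or a custom dimension list, but does not show how the setting is applied. The sketch below is one possible reading, not the diffusers implementation. It uses NumPy rather than PyTorch to stay dependency-light. The helper names resolve_dims and project are hypothetical, as are the assumed (batch, channels, frames, height, width) layout, the choice of height and width as the "spatial" axes, and the all-non-batch-axes default.

```python
import numpy as np


def resolve_dims(ndim, normalization_dims=None):
    """Map a normalization_dims setting to the axes to reduce over (hypothetical helper)."""
    if normalization_dims is None:
        # Assumed default: reduce over every axis except the batch axis.
        return tuple(range(1, ndim))
    if normalization_dims == "spatial":
        if ndim != 5:
            raise ValueError("'spatial' is sketched here only for 5D (B, C, F, H, W) tensors")
        # Assumption: per-frame spatial normalization reduces over height and width only.
        return (-2, -1)
    # Custom dimension list supplied by the caller.
    return tuple(normalization_dims)


def project(diff, pred_cond, dims, eps=1e-12):
    """Split diff into parts parallel and orthogonal to pred_cond along dims."""
    norm = np.sqrt((pred_cond ** 2).sum(axis=dims, keepdims=True))
    unit = pred_cond / np.maximum(norm, eps)
    parallel = (diff * unit).sum(axis=dims, keepdims=True) * unit
    return parallel, diff - parallel


# Toy 5D predictions: batch 1, 4 channels, 3 frames, 8x8 latents.
rng = np.random.default_rng(0)
pred_cond = rng.standard_normal((1, 4, 3, 8, 8))
pred_uncond = rng.standard_normal((1, 4, 3, 8, 8))

dims = resolve_dims(pred_cond.ndim, "spatial")
parallel, orthogonal = project(pred_cond - pred_uncond, pred_cond, dims)
```

Under these assumptions each (channel, frame) slice is projected independently, so the orthogonal component has zero inner product with pred_cond within every slice rather than only across the whole tensor.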

References

#13551 - feat: Add Motif-Video model and pipelines

Author

waitingcheung

Parents

8f14cdef
