Fix QwenImage txt_seq_lens handling (#12702)
* Fix QwenImage txt_seq_lens handling
* formatting
* formatting
* remove txt_seq_lens and use bool mask
* use compute_text_seq_len_from_mask
* add seq_lens to dispatch_attention_fn
* use joint_seq_lens
* remove unused index_block
* WIP: Remove seq_lens parameter and use mask-based approach
- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention
Note: sage_varlen still produces black images and needs further investigation
* fix formatting
* undo sage changes
* xformers support
* hub fix
* fix torch compile issues
* fix tests
* use _prepare_attn_mask_native
* proper deprecation notice
* add deprecate to txt_seq_lens
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Only create the mask if there's actual padding
* fix order of docstrings
* Adds performance benchmarks and optimization details for QwenImage
Enhances documentation with comprehensive performance insights for the QwenImage pipeline
* rope_text_seq_len = text_seq_len
* rename to max_txt_seq_len
* removed deprecated args
* undo unrelated change
* Updates QwenImage performance documentation
Removes detailed attention backend benchmarks and simplifies the torch.compile performance description
Focuses on the key performance improvement from torch.compile, highlighting the speedup from 4.70s to 1.93s on an A100 GPU
Streamlines the documentation to provide more concise and actionable performance insights
* Updates deprecation warnings for txt_seq_lens parameter
Extends the deprecation timeline for txt_seq_lens from version 0.37.0 to 0.39.0 across multiple Qwen image-related models
Adds a new unit test to verify the deprecation warning behavior for the txt_seq_lens parameter
* fix compile
* formatting
* fix compile tests
* rename helper
* remove duplicate
* smaller values
* removed
* use torch.cond for torch compile
* Construct joint attention mask once
* test different backends
* construct joint attention mask once to avoid reconstructing in every block
* Update src/diffusers/models/attention_dispatch.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* formatting
* raise an error from the EditPlus pipeline when batch_size > 1
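
For readers skimming the log, the mask-based approach described above (deriving sequence lengths from a boolean mask, creating the mask only when there is actual padding, and constructing the joint mask once) can be illustrated with a small framework-free sketch. This is not the code from this PR: it uses plain Python lists instead of tensors, the helper names `seq_lens_from_mask` and `build_joint_mask` are hypothetical, and the text-then-image ordering of the joint sequence is an assumption made for illustration.

```python
from typing import List, Optional

Mask = List[List[bool]]  # shape: (batch, seq_len); True marks a real token


def seq_lens_from_mask(mask: Mask) -> List[int]:
    """Hypothetical helper: per-sample length = position of the last True + 1."""
    lens = []
    for row in mask:
        last = 0
        for i, keep in enumerate(row):
            if keep:
                last = i + 1
        lens.append(last)
    return lens


def build_joint_mask(text_mask: Mask, image_len: int) -> Optional[Mask]:
    """Hypothetical helper: build a 2D joint mask once, only if padding exists.

    Assumes the joint sequence is text tokens followed by image tokens,
    and that image tokens are never padded.
    """
    if all(all(row) for row in text_mask):
        return None  # no padding anywhere, so no mask is needed
    return [row + [True] * image_len for row in text_mask]


# One padded sample and one full-length sample.
text_mask = [[True, True, True, False], [True, True, True, True]]
lens = seq_lens_from_mask(text_mask)
joint = build_joint_mask(text_mask, image_len=2)
```

In this sketch the joint mask would be computed once before the transformer blocks and passed to each of them, rather than being rebuilt per block.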
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: cdutr <dutra_carlos@hotmail.com>