[LTX-2.3] Add multi-modal guidance via custom guider with native transformer kwargs

Add LTX2MultiModalGuidance guider that handles all 4 guidance types for LTX-2.3
audiovisual generation (CFG, STG, modality isolation, rescale) with separate
video/audio scales. The guider passes per-batch transformer kwargs via _model_kwargs,
keeping the denoise loop fully generic.
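The contract described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual implementation: the class and attribute names other than LTX2MultiModalGuidance, _model_kwargs, spatio_temporal_guidance_blocks, and isolate_modalities (which come from this commit) are hypothetical.

```python
# Hedged sketch: a guider yields a list of forward passes, each carrying its
# own transformer kwargs, so the denoise loop stays model-agnostic.
from dataclasses import dataclass, field


@dataclass
class GuiderPass:
    """One transformer forward pass requested by the guider."""
    name: str
    # Extra per-batch kwargs forwarded verbatim to the transformer
    # (the commit calls this mechanism _model_kwargs).
    model_kwargs: dict = field(default_factory=dict)


class LTX2MultiModalGuidance:
    """Illustrative shape of the guider; real signatures may differ."""

    def __init__(self, video_scale=3.0, audio_scale=7.0, stg_blocks=(28,)):
        self.video_scale = video_scale
        self.audio_scale = audio_scale
        self.stg_blocks = list(stg_blocks)

    def passes(self):
        # Native transformer kwargs replace forward hooks: the STG and
        # modality-isolation passes just ship different kwargs.
        return [
            GuiderPass("cond"),
            GuiderPass("uncond"),
            GuiderPass("stg", {"spatio_temporal_guidance_blocks": self.stg_blocks}),
            GuiderPass("isolated", {"isolate_modalities": True}),
        ]
```

The denoise loop can then iterate over passes(), run the transformer once per pass with that pass's model_kwargs, and hand all predictions back to the guider for combination.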

Key changes:
- New LTX2MultiModalGuidance guider (inherits BaseGuidance, not SkipLayerGuidance)
with native transformer kwargs (spatio_temporal_guidance_blocks, isolate_modalities)
instead of hooks
- Denoise loop is now generic — no model-specific guidance code, just runs guider passes
and calls guider() for the combination formula
- Separate video/audio guidance scales (video cfg=3.0, audio cfg=7.0 by default)
- Audio sample rate exposed from audio decoder for correct MP4 encoding
- Connector processes positive/negative prompts separately (batch=1 each) to match
reference — batched processing produced different self-attention results
- Removed unused guiders from LTX2TextEncoderStep and LTX2ConnectorStep
- Fixed SkipLayerGuidance._is_slg_enabled step range (< to <=)
- Fixed sigma tensor device placement for GPU models
- Updated parity-testing skill with cross-contamination rules and new pitfalls
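For the combination step, a minimal sketch of CFG with the standard std-matching guidance rescale, applied with separate per-modality scales. This follows the commonly used rescale formula; the helper name and plain-list arithmetic are illustrative, and the real code likely operates on tensors.

```python
# Hedged sketch: classifier-free guidance plus guidance rescale on flat
# lists of floats, using only the standard library.
import statistics


def cfg_with_rescale(cond, uncond, scale, rescale=0.7):
    """guided = uncond + scale * (cond - uncond), then std-match to cond."""
    guided = [u + scale * (c - u) for c, u in zip(cond, uncond)]
    std_cond = statistics.pstdev(cond)
    std_guided = statistics.pstdev(guided)
    if std_guided == 0:
        return guided
    # Rescale the guided prediction so its std matches the cond prediction,
    # then blend with the raw guided prediction by the rescale factor.
    rescaled = [g * (std_cond / std_guided) for g in guided]
    return [rescale * r + (1 - rescale) * g for r, g in zip(rescaled, guided)]


# Separate per-modality scales, matching this commit's defaults.
video_pred = cfg_with_rescale([2.0, 5.0, 9.0], [1.0, 4.0, 7.0], scale=3.0)
audio_pred = cfg_with_rescale([3.0, 6.0], [2.0, 5.0], scale=7.0)
```

With scale=1.0 and rescale=0.0 the function degenerates to the conditional prediction, which is a quick sanity check on the formula.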
Verified pixel-identical against the reference at 960x544x241, 30 steps, full guidance
(CFG=3.0, STG=1.0, blocks=[28], modality=3.0, rescale=0.7, audio_cfg=7.0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: yiyi@huggingface.co