diffusers
095c0c4b - Fix LTX-2.3 pipeline quality issues and add parity notes

Commit
23 days ago
Fix LTX-2.3 pipeline quality issues and add parity notes

Essential fixes (cause quality degradation without them):
- Generate latents in model dtype (self.transformer.dtype) instead of
  float32. Float32 noise introduces ~1.5e-02 quantization error vs
  bfloat16 that compounds over 30 denoising steps via 1/sigma
  amplification, producing washed-out output.
- Use fixed max_image_seq_len (4096) for sigma schedule shift instead of
  actual video_sequence_length. The reference uses a fixed constant
  (MAX_SHIFT_ANCHOR=4096); passing the real sequence length (e.g. 6144)
  produces incorrect sigma schedules.

Seed-level parity (not quality, but needed for reproducibility):
- Generate noise directly in packed [B, S, D] shape to match reference
  which patchifies before noise generation. Different tensor shapes
  produce different RNG draws for the same seed.

Notes only (no behavioral change):
- Add NOTE about x0-space vs velocity-space guidance rounding difference
- Add NOTE about denormalize-after-noise ordering in reference
- Add commented-out code showing where denormalize would move to match
  reference ordering (for future investigation)

Not part of this commit: the upstream dg845/LTX-2.3-Diffusers VAE config
has upsample_residual=[true,true,true,true] but should be [false,...].
Fix submitted as PR#1 to that repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
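To illustrate why the sequence length fed into the shift calculation matters, here is a minimal sketch in plain Python. It assumes the shift parameter mu is a linear interpolation over sequence length and that sigmas are then warped with an exponential time shift, which is how flow-matching schedulers of this kind commonly work. The function names and the constants BASE_SEQ_LEN, BASE_SHIFT, and MAX_SHIFT are hypothetical values chosen for illustration. Only 4096 and 6144 come from the commit message. This is not the pipeline's actual code.

```python
import math

# Illustrative constants (assumptions, not taken from the commit).
BASE_SEQ_LEN = 1024
BASE_SHIFT = 0.95
MAX_SHIFT = 2.05
# From the commit message: the reference anchors the shift at 4096 tokens.
MAX_SHIFT_ANCHOR = 4096


def linear_mu(seq_len: int) -> float:
    """Linearly interpolate the shift parameter mu from a sequence length."""
    slope = (MAX_SHIFT - BASE_SHIFT) / (MAX_SHIFT_ANCHOR - BASE_SEQ_LEN)
    intercept = BASE_SHIFT - slope * BASE_SEQ_LEN
    return seq_len * slope + intercept


def shift_sigma(sigma: float, mu: float) -> float:
    """Exponential time shift: a larger mu pushes sigmas toward 1."""
    return math.exp(mu) / (math.exp(mu) + (1.0 / sigma - 1.0))


# Fixed anchor (behavior after the fix) vs. real sequence length (before).
mu_fixed = linear_mu(MAX_SHIFT_ANCHOR)
mu_actual = linear_mu(6144)

# An unshifted 30-step schedule from 1.0 down to 1/30.
base_sigmas = [1.0 - i / 30 for i in range(30)]
sigmas_fixed = [shift_sigma(s, mu_fixed) for s in base_sigmas]
sigmas_actual = [shift_sigma(s, mu_actual) for s in base_sigmas]

# Largest per-step gap between the two schedules.
max_gap = max(abs(a - b) for a, b in zip(sigmas_fixed, sigmas_actual))
print(f"mu (fixed 4096): {mu_fixed:.4f}, mu (actual 6144): {mu_actual:.4f}")
print(f"largest per-step sigma gap: {max_gap:.4f}")
```

Under these assumed constants, a 6144-token sequence extrapolates mu beyond the value at the 4096 anchor, so every intermediate sigma lands higher than in the anchored schedule. Both schedules still start at 1.0, so the mismatch only shows up in the intermediate steps.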
Author
yiyi@huggingface.co