diffusers
ebaa1871 - Eliminate GPU sync overhead and CPU→GPU transfers across LTX2 pipeline (#13564)

Commit
31 days ago
Eliminate GPU sync overhead and CPU→GPU transfers across LTX2 pipeline (#13564)

* Remove unnecessary CUDA synchronization points and avoid CPU→GPU tensor creation across the LTX2 pipeline, transformer, scheduler, and connector logic.

  - Add set_begin_index(0) to schedulers to eliminate DtoH sync in _init_step_index
  - Replace torch.tensor(..., device=...) with on-device tensor construction for decode scaling
  - Move RoPE-related tensor creation to GPU to avoid memcpy overhead
  - Refactor connector padding logic using vectorized masking instead of list-based ops

* Apply style fixes

* Revert low-impact CUDA synchronization changes and remove redundant `hasattr` check

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
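
For readers unfamiliar with the "vectorized masking instead of list-based ops" item, here is a minimal sketch of the general idea. It is not the code from this commit: `pad_with_loop`, `pad_vectorized`, the right-padded mask layout, and the `fill` vector are all hypothetical names and assumptions chosen for illustration. The loop version calls `.item()` per sample, which would force a device-to-host sync on a CUDA tensor; the vectorized version stays on the tensor's device and uses a single `torch.where`.

```python
import torch


def pad_with_loop(hidden, mask, fill):
    """List-based variant (hypothetical): one Python iteration per sample.

    Assumes `mask` is right-padded, i.e. all valid tokens come first.
    """
    seq_len = hidden.shape[1]
    rows = []
    for b in range(hidden.shape[0]):
        # .item() copies a scalar to the host; on CUDA this blocks until
        # all queued GPU work that produces `mask` has finished.
        n_valid = int(mask[b].sum().item())
        valid = hidden[b, :n_valid]
        pad = fill.repeat(seq_len - n_valid, 1)  # shape [seq_len - n_valid, dim]
        rows.append(torch.cat([valid, pad], dim=0))
    return torch.stack(rows)


def pad_vectorized(hidden, mask, fill):
    """Vectorized variant (hypothetical): no Python loop, no host round trip."""
    # mask: [batch, seq] -> [batch, seq, 1] so it broadcasts over the feature dim;
    # fill: [dim] broadcasts over batch and sequence.
    return torch.where(mask.unsqueeze(-1).bool(), hidden, fill)


# Small CPU demo; the same calls work unchanged on CUDA tensors.
hidden = torch.arange(24, dtype=torch.float32).reshape(2, 4, 3)
mask = torch.tensor([[1, 1, 1, 0], [1, 0, 0, 0]])
fill = torch.full((3,), -1.0)
assert torch.equal(pad_with_loop(hidden, mask, fill), pad_vectorized(hidden, mask, fill))
```

Both variants agree only under the right-padded assumption stated above; the actual connector logic in the commit may handle padding differently.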