transformers
52f2268b - Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active (#45414)

Commit
18 days ago
* Fix `IndexError: pop from an empty deque` under DeepSpeed ZeRO-3

  When `kernels` is installed, `@use_kernelized_func` attaches a `rotary_fn` child `nn.Module` to attention layers. DeepSpeed ZeRO-3's parameter coordinator traces the module graph at init and expects every registered submodule to be invoked during forward. The model's forward, however, still calls the plain Python `apply_rotary_pos_emb`, so `rotary_fn` is never executed, the trace desynchronizes, and the second forward pass raises `IndexError: pop from an empty deque`.

  Skip attaching the kernelized submodule when ZeRO-3 is enabled. Users running under ZeRO-3 fall back to the Python implementation, which is the behavior they had before #41147.

  Fixes #45137

* Add dates to new model cards to satisfy check-repository-consistency
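The failure hinges on one condition: a submodule that is registered on a layer but never called during forward. The toy sketch here models that condition and the guard the fix adds, in pure Python with no torch or DeepSpeed. `ToyModule`, `RotaryKernel`, `Attention`, the `zero3_enabled` flag, and `registered_but_unused` are all hypothetical names for illustration; the sketch does not reproduce DeepSpeed's parameter coordinator or the real `@use_kernelized_func` code. In actual transformers code a ZeRO-3 check would typically go through `is_deepspeed_zero3_enabled()`, but this commit message does not say which helper the fix uses.

```python
class ToyModule:
    """Hypothetical stand-in for torch.nn.Module: tracks registered submodules and call counts."""

    def __init__(self):
        self.submodules = {}
        self.calls = 0

    def add_module(self, name, module):
        self.submodules[name] = module

    def __call__(self, *args):
        # Count every invocation so we can later find submodules that never ran.
        self.calls += 1
        return self.forward(*args)

    def forward(self, *args):
        return args[0] if args else None


class RotaryKernel(ToyModule):
    """Hypothetical kernelized rotary submodule, playing the role of `rotary_fn`."""


def apply_rotary_pos_emb(x):
    """Plain Python rotary path; the identity in this toy."""
    return x


class Attention(ToyModule):
    def __init__(self, zero3_enabled):
        super().__init__()
        # Sketch of the fix: only attach the kernelized child when ZeRO-3 is off.
        if not zero3_enabled:
            self.add_module("rotary_fn", RotaryKernel())

    def forward(self, x):
        # As the commit describes, forward still calls the plain Python function,
        # so an attached `rotary_fn` is registered but never invoked.
        return apply_rotary_pos_emb(x)


def registered_but_unused(root):
    """Return dotted names of registered submodules whose call count is zero."""
    unused = []
    stack = [("", root)]
    while stack:
        prefix, module = stack.pop()
        for name, child in module.submodules.items():
            path = f"{prefix}.{name}" if prefix else name
            if child.calls == 0:
                unused.append(path)
            stack.append((path, child))
    return sorted(unused)


if __name__ == "__main__":
    for zero3 in (False, True):
        attn = Attention(zero3_enabled=zero3)
        attn(1.0)
        print(zero3, registered_but_unused(attn))
```

Without the guard (ZeRO-3 off), `rotary_fn` is registered but never called, which is the state the commit says ZeRO-3's coordinator cannot tolerate; with the guard active under ZeRO-3, no unused submodule remains and the layer runs on the Python implementation.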