DeepSpeed
5ca819bc - Add optional torchembed RoPE backend to apply_rotary_pos_emb (#8052)

Commit
3 days ago
Add optional torchembed RoPE backend to apply_rotary_pos_emb (#8052)

Adds `torchembed` as an optional fused RoPE backend for `deepspeed.sequence.layer.apply_rotary_pos_emb()`, following the same pattern used in transformers and vLLM.

## Changes

- **`deepspeed/sequence/layer.py`**: Add `try/except ImportError` guard for `torchembed._triton.fused_rope_forward`. When `torchembed` is installed, the tensor is on CUDA, and `rotary_dim` is even, the function dispatches to the fused triton kernel instead of the PyTorch reference path.
- **`setup.py`**: Add `torchembed` extras key (`pip install deepspeed[torchembed]`).
- **`tests/unit/sequence/test_apply_rotary_pos_emb.py`**: Numerical correctness vs PyTorch reference across seq_len (1/17/128), dim (32/64/128), and various rotary_dim. Gradient flow test.

## Implementation details

The torchembed kernel processes `(*leading, seq_len, dim)` tensors with `RotaryEmbedding(use_fused=True)`, applying Neox-style RoPE via triton. The helper reshapes arbitrary leading dims, calls the kernel, and restores the original shape, so the change is transparent to callers.

## Testing

```bash
pytest tests/unit/sequence/test_apply_rotary_pos_emb.py -v
```

---------

Signed-off-by: py-ai-dev <py.oss.ml@gmail.com>
Co-authored-by: Claude Sonnet 5 <noreply@anthropic.com>
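The dispatch rule described under Changes and Implementation details can be illustrated with a small, torch-free sketch. This is not the code from the commit: the helper names (`load_fused_kernel`, `can_use_fused`, `flatten_leading`, `choose_backend`) are hypothetical, the device is modelled as a plain string, and collapsing the leading dims into a single batch dimension is an assumption about how the reshape might work. Only the module path `torchembed._triton` and the attribute `fused_rope_forward` come from the commit message.

```python
import importlib
from math import prod


def load_fused_kernel():
    """Return the optional fused kernel, or None when torchembed is absent."""
    try:
        # Module path taken from the commit message; nothing else about
        # torchembed's API is assumed here.
        module = importlib.import_module("torchembed._triton")
    except ImportError:
        return None
    return getattr(module, "fused_rope_forward", None)


# Resolved once at import time, mirroring a module-level try/except guard.
FUSED_KERNEL = load_fused_kernel()


def can_use_fused(kernel, device_type, rotary_dim):
    """Three conditions from the commit: installed, CUDA tensor, even rotary_dim."""
    return kernel is not None and device_type == "cuda" and rotary_dim % 2 == 0


def flatten_leading(shape):
    """Collapse (*leading, seq_len, dim) to (batch, seq_len, dim) (assumed layout)."""
    *leading, seq_len, dim = shape
    return (prod(leading), seq_len, dim)  # prod([]) == 1 for a 2-D input


def choose_backend(shape, device_type, rotary_dim, kernel=FUSED_KERNEL):
    """Hypothetical helper: pick a path and the shape that path would receive."""
    if can_use_fused(kernel, device_type, rotary_dim):
        return "fused", flatten_leading(shape)
    return "reference", tuple(shape)
```

In a real implementation the fused branch would reshape the tensor, call the kernel, and then reshape the result back to the caller's original shape; the sketch only shows which path is selected and what shape it would see.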