Add DeepSpeed NVTX domain support (#7988)
## Summary
Addresses #7912.
This PR adds a DeepSpeed-specific NVTX domain for instrumentation
ranges, while preserving the existing `torch.cuda.nvtx` behavior as the fallback.
## Changes
- Add a `DeepSpeed` NVTX domain name for `instrument_w_nvtx`.
- Extend accelerator `range_push` / `range_pop` APIs with optional
`domain` and `category` arguments.
- Use the NVIDIA `nvtx` package domain API in the CUDA accelerator when
available.
- Fall back to `torch.cuda.nvtx` when the `nvtx` package is unavailable.
- Keep non-CUDA accelerator behavior unchanged by accepting and ignoring
the optional arguments.
- Add focused unit tests for domain instrumentation, CUDA domain usage,
and fallback behavior.
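
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `NvtxRanges`, `instrument`, `_try_import`, and `DEEPSPEED_DOMAIN` are hypothetical names, and the sketch assumes the `nvtx` package's module-level `push_range` / `pop_range` functions and `torch.cuda.nvtx.range_push` / `range_pop` as the two backends.

```python
import functools
import importlib

DEEPSPEED_DOMAIN = "DeepSpeed"  # hypothetical constant name for the domain


def _try_import(name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except Exception:
        return None


class NvtxRanges:
    """Hypothetical helper; stands in for the accelerator's range API."""

    def __init__(self, nvtx_mod, torch_nvtx):
        self.nvtx = nvtx_mod          # NVIDIA `nvtx` package, or None
        self.torch_nvtx = torch_nvtx  # `torch.cuda.nvtx`, or None

    @classmethod
    def detect(cls):
        # Probe for both backends; either may be missing.
        return cls(_try_import("nvtx"), _try_import("torch.cuda.nvtx"))

    def range_push(self, msg, domain=None, category=None):
        if self.nvtx is not None:
            # Domain-aware path: the range is tagged with a domain/category.
            self.nvtx.push_range(message=msg, domain=domain, category=category)
        elif self.torch_nvtx is not None:
            # Fallback path: torch's API takes only the message.
            self.torch_nvtx.range_push(msg)
        # With neither backend available, the call is a no-op.

    def range_pop(self, domain=None):
        if self.nvtx is not None:
            self.nvtx.pop_range(domain=domain)
        elif self.torch_nvtx is not None:
            self.torch_nvtx.range_pop()


def instrument(ranges, domain=DEEPSPEED_DOMAIN):
    """Decorator that wraps a function call in a push/pop pair."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            ranges.range_push(func.__name__, domain=domain)
            try:
                return func(*args, **kwargs)
            finally:
                # Pop even if the wrapped function raises.
                ranges.range_pop(domain=domain)

        return wrapper

    return decorator
```

In this sketch, `instrument(NvtxRanges.detect())` would decorate a function so its range lands in the `DeepSpeed` domain when `nvtx` is importable, and in an undifferentiated `torch.cuda.nvtx` range otherwise. Passing the backends into the constructor keeps the fallback testable without a GPU.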
## Tests
### Compile check
```bash
PYTHONNOUSERSITE=1 /home/xdu/anaconda3/envs/simlingo/bin/python -m py_compile \
deepspeed/utils/nvtx.py \
accelerator/abstract_accelerator.py \
accelerator/cuda_accelerator.py \
accelerator/cpu_accelerator.py \
accelerator/hpu_accelerator.py \
accelerator/mlu_accelerator.py \
accelerator/mps_accelerator.py \
accelerator/npu_accelerator.py \
accelerator/sdaa_accelerator.py \
accelerator/xpu_accelerator.py \
tests/unit/utils/test_nvtx.py
```
Output:
```text
Passed with no output.
```
### Unit tests
```bash
PYTHONNOUSERSITE=1 PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/xdu/anaconda3/envs/simlingo/bin/python -m pytest \
tests/unit/utils/test_nvtx.py \
tests/unit/accelerator/test_accelerator.py -v
```
Key output:
```text
NVTX instrumentation calls: [('push', '_sample_nvtx_function', 'DeepSpeed', None), ('pop', 'DeepSpeed')]
CUDA NVTX domain calls: [('push', 'my_range', 'zero'), ('pop',)]
CUDA torch.nvtx fallback calls: [('push', 'my_range'), ('pop',)]
11 passed, 4 warnings in 1.88s
```
### Pre-commit
```bash
PRE_COMMIT_HOME=/tmp/pre-commit-cache PYTHONNOUSERSITE=1 /home/xdu/anaconda3/envs/simlingo/bin/python -m pre_commit run --files \
accelerator/abstract_accelerator.py \
accelerator/cpu_accelerator.py \
accelerator/cuda_accelerator.py \
accelerator/hpu_accelerator.py \
accelerator/mlu_accelerator.py \
accelerator/mps_accelerator.py \
accelerator/npu_accelerator.py \
accelerator/sdaa_accelerator.py \
accelerator/xpu_accelerator.py \
deepspeed/utils/nvtx.py \
tests/unit/utils/test_nvtx.py
```
Output:
```text
All hooks passed.
```
Signed-off-by: heurry <restart12212022@163.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>