Skip Triton import for AMD (#5110)
When testing DeepSpeed inference on an `AMD Instinct MI250X/MI250` GPU,
the `pytorch-triton-rocm` module would break the `torch.cuda` device
API. To address this, importing `triton` is skipped when the GPU is
determined to be `AMD`.
This change allows DeepSpeed to run on an AMD GPU without kernel
injection in the DeepSpeedExamples [text-generation
example](https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/huggingface/text-generation)
using the following command:
```bash
deepspeed --num_gpus 1 inference-test.py --model facebook/opt-125m
```
TODO: Root-cause the interaction between `pytorch-triton-rocm` and
DeepSpeed to understand why importing it breaks the `torch.cuda` device
API.