[MLU] Fix FA2 check error, remove deepspeed-mlu deps. (#36159)
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
* Cambricon support SDPA and flash_attn
* MLU devices: check if `mlu` is available via a `cndev`-based check which won't trigger the drivers and leave mlu
* Fix mlu FA2 check. Remove deepspeed-mlu check. Add mlu test support.
* fix testing errors.
* Merge branch 'hf/main' into main
* fix get_device_count error.
* fix mlu testing utils.
* fix code quality and style.
* switch to @require_torch_multi_accelerator