fix(num_devices): fix num_shard/num devices auto compute when NVIDIA_VISIBLE_DEVICES == "all" or "void" (#3346)
* fix(num_devices): fix num_shard/num devices auto compute when NVIDIA_VISIBLE_DEVICES == "all"
Previously, the computed num_shards was always 1 in this case, regardless of how many GPUs were available.
* fix(num_devices): make TGI shard auto compute compliant with nvidia-container-toolkit in CDI mode
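
The behaviour described above can be illustrated with a minimal sketch. This is not the launcher's actual code: the function name `count_shards` and the `detected_gpu_count` parameter are hypothetical, and the sketch assumes that "all" and "void" both mean the variable carries no explicit device list, so the count has to come from some other source (for example, querying the driver).

```python
# Values of NVIDIA_VISIBLE_DEVICES that do not enumerate devices explicitly.
# Treating both the same way is an assumption of this sketch.
SENTINELS = {"all", "void"}


def count_shards(visible_devices, detected_gpu_count):
    """Return a shard count derived from NVIDIA_VISIBLE_DEVICES.

    visible_devices: the raw value of the variable, or None if unset.
    detected_gpu_count: hypothetical fallback count obtained elsewhere.
    """
    if visible_devices is None:
        return detected_gpu_count

    value = visible_devices.strip()
    if value in SENTINELS:
        # Counting comma-separated entries here would yield 1 for "all",
        # which is the bug the commit describes; use the fallback instead.
        return detected_gpu_count

    # Explicit list such as "0,1,2" or a list of GPU UUIDs.
    return len([entry for entry in value.split(",") if entry.strip()])
```

With this sketch, `count_shards("all", 4)` returns 4 instead of 1, while an explicit list such as `"0,1"` is still counted entry by entry.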