Fix device selection using CUDA_VISIBLE_DEVICES (#6530)
This PR addresses #5818.
Instead of assigning contiguous numbers based on the device count, this PR
uses the device indices specified in `--include` when selecting devices.
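
A minimal sketch of the difference described above, not the launcher's actual code. The helper names `contiguous_devices` and `included_devices` and the example value `localhost:1,3` are hypothetical and used only for illustration:

```python
def contiguous_devices(selected):
    """Previous behavior as described: number devices 0..N-1 from the count."""
    return ",".join(str(i) for i in range(len(selected)))


def included_devices(selected):
    """Behavior after this PR as described: keep the requested indices."""
    return ",".join(str(i) for i in selected)


# Hypothetical indices, e.g. parsed from `--include localhost:1,3`.
selected = [1, 3]
print(contiguous_devices(selected))  # 0,1
print(included_devices(selected))    # 1,3
```

With contiguous numbering, a request for devices 1 and 3 would resolve to devices 0 and 1. Using the indices from `--include` keeps the selection on the devices that were actually requested.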
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>