Update the use of nvidia-smi for GPU healthcheck (#98036)
This goes together with https://github.com/pytorch/test-infra/pull/3967 to:
* Provide a more accurate health check command with `nvidia-smi`
* Skip the check in the edge case where `nvidia-smi` is missing altogether because of a GitHub outage, e.g. https://github.com/pytorch/pytorch/actions/runs/4591098682/jobs/8107204277
* Also check the number of GPUs as part of the health check. The number of GPUs needs to be a power of 2 on a healthy runner. Fixes https://github.com/pytorch/test-infra/issues/4000
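
The logic described in the list above can be sketched roughly as follows. This is an illustrative sketch only, not the script changed in this PR: the function names (`is_power_of_two`, `count_gpus`, `gpu_health_check`) are hypothetical, and it assumes that `nvidia-smi --list-gpus` prints one line per device starting with `GPU `.

```python
import shutil
import subprocess
from typing import Optional


def is_power_of_two(n: int) -> bool:
    """Return True when n is a positive power of 2 (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def count_gpus(listing: str) -> int:
    """Count devices in `nvidia-smi --list-gpus` style output.

    Assumption: each device is reported on its own line starting with "GPU ".
    """
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def gpu_health_check() -> Optional[bool]:
    """Hypothetical health check.

    Returns None when nvidia-smi is not installed (check skipped),
    True when the runner looks healthy, and False otherwise.
    """
    # Skip the check entirely when the binary is missing.
    if shutil.which("nvidia-smi") is None:
        return None

    result = subprocess.run(
        ["nvidia-smi", "--list-gpus"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code means the driver or a device is in a bad state.
    if result.returncode != 0:
        return False

    # A healthy runner is expected to expose a power-of-2 number of GPUs.
    return is_power_of_two(count_gpus(result.stdout))


if __name__ == "__main__":
    status = gpu_health_check()
    if status is None:
        print("nvidia-smi not found, skipping GPU health check")
    elif status:
        print("GPU health check passed")
    else:
        print("GPU health check failed")
```

With this sketch, a runner reporting 3 GPUs would fail the check because 3 is not a power of 2, while a runner without `nvidia-smi` would be skipped rather than flagged.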
### Testing
Luckily, the PR picked up the broken runner https://github.com/pytorch/pytorch/actions/runs/4640688249/jobs/8213191715, and the script correctly detected that the runner had only 3 of 4 GPUs and shut it down.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98036
Approved by: https://github.com/weiwangmeta