[doc] launcher (#868)
As discussed in https://github.com/microsoft/DeepSpeed/issues/662, this PR updates the launcher doc to:
* explain what to use instead of `CUDA_VISIBLE_DEVICES`
* put the `--hostfile` command-line argument in the correct position in the example launch command
Fixes: https://github.com/microsoft/DeepSpeed/issues/662
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>