Graceful exit on failures for multi-node runs (#2008)
* Use Popen.terminate() to stop the child processes gracefully; kill them if terminate() does not work
* Popen.kill() ends the training processes abruptly. The child processes may then become zombies without properly reporting the kill signal to the parent process. As a result, the ssh session keeps waiting for signals from the child processes and never returns to the pdsh command
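
The terminate-then-kill pattern described above can be sketched roughly as follows. This is a minimal illustration, not the launcher code changed in this commit: the helper name `stop_gracefully` and the `timeout` value are hypothetical, and only standard `subprocess` calls are used.

```python
import subprocess
import sys


def stop_gracefully(proc, timeout=5.0):
    """Ask `proc` to exit via terminate(); fall back to kill() after `timeout` seconds.

    Hypothetical helper for illustration only.
    """
    # Nothing to do if the child has already exited.
    if proc.poll() is not None:
        return proc.returncode

    # Polite request first (SIGTERM on POSIX), so the child can clean up.
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child ignored the request; force it to stop and reap it.
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a long-running training process.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_gracefully(child))
```

Calling `wait()` after either signal reaps the child, so it does not linger as a zombie while the parent is still running.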
Fixes microsoft#1995
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>