moving early termination to common_utils
This could slow down non-GPU tests, since it runs cuda synchronize whenever cuda is available, not only when cuda is the target device
fix flake8
don't stop early for common_distributed
fix tests and address diff comments
use is_initialized() rather than is_available()
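A minimal sketch of the guard described above, assuming the check is `torch.cuda.is_initialized()` and the sync is `torch.cuda.synchronize()`. The helper name `maybe_cuda_synchronize` and its injectable `cuda` parameter are hypothetical and exist only for illustration. The point is that a CPU-only test never initializes CUDA, so it skips the synchronize even on a machine where CUDA is available:

```python
def maybe_cuda_synchronize(cuda=None):
    """Hypothetical helper: synchronize only if CUDA was already initialized.

    `cuda` defaults to `torch.cuda`; it is injectable so the logic can be
    exercised without a GPU (an assumption made for this sketch).
    """
    if cuda is None:
        import torch  # imported lazily so the helper is usable without a GPU
        cuda = torch.cuda
    # is_available() is True on any CUDA machine, even for CPU-only tests;
    # is_initialized() is True only once some code has actually used CUDA.
    if cuda.is_initialized():
        cuda.synchronize()
        return True
    return False
```

Using `is_available()` here instead would force a synchronize (and CUDA initialization) in every test on a GPU machine, which is the slowdown noted above.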