Fix breakpoint API in test_script.py on TPU. (#2263)
* Fix breakpoint API in test_script.py on TPU.
* only call set_trigger on the main process
* The test passed.
* add a comment
* Call mark_step after all_reduce so that torch_xla runs the collective op immediately, as the torch.distributed call below does, rather than deferring the pending operations until the tensor is next referenced.
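The last bullet hinges on lazy execution. The sketch below is a pure-Python toy model under stated assumptions, not torch_xla or accelerate code. `LazyDevice`, `LazyTensor`, `all_reduce_sum` and `check_trigger` are all hypothetical names. Ops are queued and only run on `mark_step()` or when a value is read. Only the main process (rank 0) sets its flag, and without a flush right after the all-reduce the simulated processes disagree about whether the trigger fired.

```python
from typing import Callable, List


class LazyDevice:
    """Toy lazy-execution backend (hypothetical; not the torch_xla API)."""

    def __init__(self) -> None:
        self.pending: List[Callable[[], None]] = []  # queued, not yet run
        self.executed: List[str] = []  # names of ops that actually ran

    def mark_step(self) -> None:
        # Run every queued op in the order it was recorded.
        ops, self.pending = self.pending, []
        for op in ops:
            op()


class LazyTensor:
    """Scalar whose item() forces pending ops to run first (implicit sync)."""

    def __init__(self, device: LazyDevice, value: int) -> None:
        self.device = device
        self._value = value

    def item(self) -> int:
        self.device.mark_step()
        return self._value


def all_reduce_sum(device: LazyDevice, replicas: List[LazyTensor]) -> None:
    """Queue a sum all-reduce across simulated per-process tensors."""

    def op() -> None:
        total = sum(t._value for t in replicas)
        for t in replicas:
            t._value = total
        device.executed.append("all_reduce")

    device.pending.append(op)


def check_trigger(device: LazyDevice, flags: List[LazyTensor],
                  flush: bool = True) -> List[bool]:
    """Hypothetical breakpoint check; returns each process's view of the flag."""
    all_reduce_sum(device, flags)
    if flush:
        # Run the collective now instead of at the next read of the tensor.
        device.mark_step()
    # Read the raw value (no implicit sync) to expose whether the op ran.
    return [t._value >= 1 for t in flags]


# Only the main process (rank 0) calls "set_trigger", i.e. sets its flag to 1.
dev = LazyDevice()
flags = [LazyTensor(dev, 1 if rank == 0 else 0) for rank in range(4)]
print(check_trigger(dev, flags))  # [True, True, True, True]
```

With `flush=False` the call returns `[True, False, False, False]` and `dev.executed` stays empty until some tensor is read, which is the deferred behavior the commit avoids.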