[Distributed] Hide token from Python for _xla_all_reduce path (#4913)
Summary:
Now that the token is cached in the C++ layer, we no longer need to expose it to Python. This is a first effort to route one of the all_reduce paths to use the C++ cached token and remove the token from the Python plumbing. This is part of the work to integrate pytorch/pytorch#93173.
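For context, a minimal sketch of the user-facing call that exercises this path; after this change the token is managed entirely in C++, so nothing token-related is threaded through Python (tensor shapes and the spawn setup here are illustrative):

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    device = xm.xla_device()
    t = torch.ones(2, 2, device=device)
    # No explicit token is passed or returned here; the ordering token
    # for the underlying _xla_all_reduce is cached in the C++ layer.
    xm.all_reduce(xm.REDUCE_SUM, [t])
    print(t)

if __name__ == '__main__':
    xmp.spawn(_mp_fn)
```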
Test Plan:
PJRT_DEVICE=TPU python test/test_mp_replication.py