[10/N] Update barrier with CPU/CUDA implementations (#86368)
### Changes
- Update the barrier collective with CPU and CUDA implementations
- NOTE: this change does not yet achieve dispatching of barrier, since barrier takes no tensor argument from which a backend could be read
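
The note above can be illustrated with a toy sketch. This is not PyTorch's dispatcher: `FakeTensor`, `ToyDispatcher`, and the registered kernels are hypothetical names made up for illustration. It only assumes that a kernel is selected by inspecting the devices of the tensor arguments, which is why a call with no tensors has nothing to select on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only records its device."""
    device: str


class ToyDispatcher:
    """Toy dispatcher (an assumption, not PyTorch's): picks a kernel by
    looking at the device of the tensor arguments."""

    def __init__(self):
        self._kernels = {}

    def register(self, op, device, fn):
        self._kernels[(op, device)] = fn

    def call(self, op, *args):
        # Collect the devices of every tensor argument.
        devices = {a.device for a in args if isinstance(a, FakeTensor)}
        if not devices:
            # The barrier case: nothing to read a device from.
            raise RuntimeError(f"{op}: no tensor argument to read a device from")
        if len(devices) > 1:
            raise RuntimeError(f"{op}: tensors on mixed devices {sorted(devices)}")
        (device,) = devices
        return self._kernels[(op, device)](*args)


dispatcher = ToyDispatcher()
for dev in ("cpu", "cuda"):
    # Bind dev as a default argument so each lambda keeps its own device.
    dispatcher.register("allreduce", dev, lambda *args, dev=dev: f"allreduce on {dev}")
    dispatcher.register("barrier", dev, lambda *args, dev=dev: f"barrier on {dev}")
```

In this sketch `dispatcher.call("allreduce", FakeTensor("cuda"))` reaches the CUDA kernel, while `dispatcher.call("barrier")` raises even though both barrier kernels are registered, because no argument identifies a device.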
### Context
https://github.com/pytorch/pytorch/issues/86225
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @kwen2501 @awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86368
Approved by: https://github.com/kwen2501