[3/n] Thread PG: add threaded PG implementation (#88627)
Summary: With the previous two diffs in place, we can finally add the threaded ProcessGroup implementation.
Test Plan: TBD
Reviewed By: XilunWu
Differential Revision: D40992593
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88627
Approved by: https://github.com/XilunWu, https://github.com/H-Huang