[c10d] Increase socket buffer size to allow ProcessGroup init up to 12k ranks (#107878)
The c10d socket and the gloo listener both set their buffer size to 2048, which causes connection issues at 4k scale. This diff sets the buffer size to `-1`, which uses `somaxconn` as the actual buffer size, aiming to enable 24k PG init without crashing. In our experiment, 12k ranks were created successfully without a crash.
This splits the original diff into separate OSS and internal changes.
Caution: we need the change on both gloo and c10d to enable 12k PG init. Updating only one side may not offer the benefit.
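For illustration, here is a minimal sketch of the idea, assuming the "buffer size" above is the backlog argument passed to `listen()` and that the host is Linux, where a backlog above `net.core.somaxconn` is capped to that value. This is not the actual c10d or gloo code; the helper names `open_listener_with_max_backlog` and `read_somaxconn` are hypothetical.

```cpp
// Sketch only: not the c10d or gloo implementation. Assumes a POSIX/Linux host.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <fstream>

// Hypothetical helper: opens a TCP listener on an ephemeral loopback port and
// passes -1 as the backlog. On Linux the kernel is expected to cap this at
// net.core.somaxconn. Returns the listening fd, or -1 on failure.
int open_listener_with_max_backlog() {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel choose a free port

  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      ::listen(fd, -1 /* backlog */) != 0) {
    ::close(fd);
    return -1;
  }
  return fd;
}

// Hypothetical helper: reads the system-wide backlog cap on Linux.
// Returns -1 if the file is missing or unreadable (e.g. on non-Linux hosts).
int read_somaxconn() {
  std::ifstream in("/proc/sys/net/core/somaxconn");
  int value = -1;
  if (!(in >> value)) {
    return -1;
  }
  return value;
}
```

Under these assumptions, the effective limit is whatever `read_somaxconn()` reports, so hosts with a low `somaxconn` setting may still need it raised to see the benefit at large scale.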
Differential Revision: [D48634654](https://our.internmc.facebook.com/intern/diff/D48634654/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107878
Approved by: https://github.com/H-Huang, https://github.com/fduwjj