improve sort multi-core perf by adjusting grain_size w.r.t. dim_size (#74897)
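
The idea in the title can be sketched as follows. Sorting along one dimension runs one independent sort per slice, and each slice holds `dim_size` elements. If the parallel loop over slices uses a fixed grain size (PyTorch's default `at::internal::GRAIN_SIZE` is 32768 iterations), then 32768 rows of length 1024 end up in a single task, so only one core does any work. Dividing the grain size by `dim_size` keeps each task at roughly the same number of elements, whatever the row length. This is a minimal Python sketch, not the actual ATen kernel: the exact formula (`BASE_GRAIN_SIZE // dim_size`, clamped to at least 1) and the helpers `scaled_grain_size`, `chunk_ranges`, and `parallel_sort_rows` are hypothetical and only illustrate the partitioning.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed base grain size, mirroring the default value of at::internal::GRAIN_SIZE.
BASE_GRAIN_SIZE = 32768


def scaled_grain_size(dim_size: int, base: int = BASE_GRAIN_SIZE) -> int:
    """Hypothetical helper: slices per task so that each task touches ~`base` elements."""
    return max(1, base // max(1, dim_size))


def chunk_ranges(num_slices: int, grain_size: int) -> list[tuple[int, int]]:
    """Split [0, num_slices) into contiguous [begin, end) chunks of at most grain_size."""
    return [(b, min(b + grain_size, num_slices)) for b in range(0, num_slices, grain_size)]


def parallel_sort_rows(rows: list[list[float]], max_workers: int = 4) -> list[list[float]]:
    """Sort each row independently. Rows are assumed to share one length (like a tensor dim)."""
    if not rows:
        return []
    dim_size = len(rows[0])
    grain = scaled_grain_size(dim_size)
    result: list[list[float]] = [[] for _ in rows]

    def work(begin: int, end: int) -> None:
        # Each task sorts a contiguous block of rows; tasks write disjoint indices.
        for i in range(begin, end):
            result[i] = sorted(rows[i])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(work, b, e) for b, e in chunk_ranges(len(rows), grain)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return result
```

With 32768 rows of length 1024, `chunk_ranges(32768, BASE_GRAIN_SIZE)` yields a single task, while `chunk_ranges(32768, scaled_grain_size(1024))` yields 1024 tasks of 32 rows each. Python threads are limited by the GIL, so this sketch shows only how the work is split, not the speedup a native multi-threaded kernel would get.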
Differential Revision: [D37441443](https://our.internmc.facebook.com/intern/diff/D37441443)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74897
Approved by: https://github.com/frank-wei