Choose num_threads in parallel_for based on GRAIN_SIZE (#26886)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24080
The OpenMP implementation of `parallel_for` now chooses the number of threads on a sliding scale between 1 and `OMP_NUM_THREADS`, depending on the amount of work relative to `GRAIN_SIZE`. This avoids occupying cores that have too little work to do on many-core systems, as reported in https://github.com/pytorch/pytorch/issues/24080.
This is also consistent with the comment on GRAIN_SIZE:
https://github.com/pytorch/pytorch/blob/e327df396564f937d17b5f28e2529229260c65bf/aten/src/ATen/Parallel.h#L10-L11
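
For illustration only, the sliding scale can be sketched roughly as below. This is not the code from this PR: `choose_num_threads` and its `max_threads` parameter are hypothetical names, with `max_threads` standing in for whatever OpenMP reports (for example via `omp_get_max_threads()`), and the rounding-up rule is an assumption about how "based on GRAIN_SIZE" might be implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Ceiling division for non-negative x and positive y.
inline int64_t divup(int64_t x, int64_t y) {
  return (x + y - 1) / y;
}

// Hypothetical helper (not part of ATen): pick a thread count in
// [1, max_threads], using at most one thread per (possibly partial)
// chunk of grain_size iterations in the range [begin, end).
inline int64_t choose_num_threads(
    int64_t begin,
    int64_t end,
    int64_t grain_size,
    int64_t max_threads) {
  assert(max_threads >= 1);
  const int64_t num_iter = end - begin;
  if (num_iter <= 0) {
    return 1;  // empty range: nothing to parallelize
  }
  if (grain_size <= 0) {
    return max_threads;  // assumption: no grain size means no cap
  }
  // Small ranges get few threads; large ranges saturate at max_threads.
  return std::max<int64_t>(
      1, std::min(max_threads, divup(num_iter, grain_size)));
}
```

Under these assumptions, with a grain size of 32768 and 64 available threads, a range of 100 elements would run on a single thread, 100000 elements on 4 threads, and 10000000 elements on all 64.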
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26886
Differential Revision: D17610292
Pulled By: ezyang
fbshipit-source-id: 60b9fe4b0eecb41a28c1488e3a575674c8f7000c