histc: Avoid dispatch in parallel region (#68520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68520
Ref #56794
This changes the code from allocating one tensor per thread inside the
parallel region to allocating a single larger tensor outside the parallel
region and manually viewing each thread's slice of the histogram.
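
The pattern can be illustrated with a minimal standalone sketch. This is
not the ATen kernel: it assumes plain std::thread workers and a
std::vector scratch buffer in place of tensors, and histogram_sliced with
its parameters is a hypothetical name chosen for illustration. It also
assumes nbins > 0, nthreads > 0 and hi > lo.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not PyTorch code): counts `data` into `nbins`
// equal-width bins over [lo, hi] using `nthreads` worker threads.
std::vector<int64_t> histogram_sliced(const std::vector<float>& data,
                                      int nbins, float lo, float hi,
                                      int nthreads) {
  // Single allocation made before any thread starts:
  // nthreads rows of nbins counters each.
  std::vector<int64_t> scratch(static_cast<size_t>(nthreads) * nbins, 0);

  const size_t chunk = (data.size() + nthreads - 1) / nthreads;
  std::vector<std::thread> workers;
  for (int t = 0; t < nthreads; ++t) {
    workers.emplace_back([&, t] {
      // Each thread takes its own row by pointer arithmetic;
      // nothing is allocated inside the parallel region.
      int64_t* local = scratch.data() + static_cast<size_t>(t) * nbins;
      const size_t begin = std::min(data.size(), t * chunk);
      const size_t end = std::min(data.size(), begin + chunk);
      for (size_t i = begin; i < end; ++i) {
        const float x = data[i];
        if (x < lo || x > hi) continue;  // skip out-of-range values
        int bin = static_cast<int>((x - lo) / (hi - lo) * nbins);
        bin = std::min(bin, nbins - 1);  // x == hi lands in the last bin
        ++local[bin];
      }
    });
  }
  for (auto& w : workers) w.join();

  // Serial reduction of the per-thread rows into the final histogram.
  std::vector<int64_t> hist(nbins, 0);
  for (int t = 0; t < nthreads; ++t) {
    for (int b = 0; b < nbins; ++b) {
      hist[b] += scratch[static_cast<size_t>(t) * nbins + b];
    }
  }
  return hist;
}
```

Because every thread writes only to its own row, the counters need no
locking, and the only allocation happens once on the calling thread.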
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32929365
Pulled By: ngimel
fbshipit-source-id: e28da2736e849a0282b70f34d11526d3355d5bd5