Fix for num_threads==1 in OpenMP "parallel for" (#36479)
Summary:
Fixes gh-32284.
Move the serial (non-parallel) stanza out of the OpenMP parallel region, and use `num_threads` to avoid nesting `parallel for`s when only one thread is requested. The nesting caused a memory leak in the test script from the issue.
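
A minimal sketch of the pattern described above, for illustration only. It is not the code in this PR: `parallel_for_sketch`, `divup_sketch`, and the `thread_count` parameter are hypothetical names, and the chunking scheme is an assumption. The serial case returns before any `#pragma omp parallel` is reached, so a single-threaded or already-nested call never opens a new parallel region.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: ceiling division for splitting a range into chunks.
inline int64_t divup_sketch(int64_t x, int64_t y) {
  return (x + y - 1) / y;
}

// Hypothetical sketch, not the PyTorch implementation.
// Calls f(lo, hi) on sub-ranges of [begin, end).
template <class F>
void parallel_for_sketch(int64_t begin, int64_t end, int64_t grain_size,
                         int thread_count, const F& f) {
  assert(grain_size >= 0);
  if (begin >= end) {
    return;
  }

  bool in_parallel = false;
#ifdef _OPENMP
  in_parallel = omp_in_parallel() != 0;
#endif

  // Serial stanza, handled outside of any parallel region: one thread
  // requested, already inside a parallel region, or too little work.
  if (thread_count <= 1 || in_parallel || (end - begin) <= grain_size) {
    f(begin, end);
    return;
  }

#ifdef _OPENMP
  // Only reached when more than one thread is wanted and we are not nested.
#pragma omp parallel num_threads(thread_count)
  {
    const int64_t nt = omp_get_num_threads();
    const int64_t tid = omp_get_thread_num();
    const int64_t chunk = divup_sketch(end - begin, nt);
    const int64_t lo = begin + tid * chunk;
    if (lo < end) {
      f(lo, std::min(end, lo + chunk));
    }
  }
#else
  // Built without OpenMP: fall back to the serial path.
  f(begin, end);
#endif
}
```

With `thread_count == 1` the callback runs exactly once over the whole range, which is the case the commit title refers to. Build with `-fopenmp` (or the compiler's equivalent) to exercise the parallel branch.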
This should probably have a test somewhere: are there tests for ParallelOpenMP?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36479
Differential Revision: D21652452
Pulled By: ilia-cher
fbshipit-source-id: 2cda7777c0eafbe268550a82fed306e52fb6eb25