Ensure num_threads is initialized in get_num_threads (#64486)
Summary:
This is a possible source of the recent layernorm CI failures. `lazy_init_num_threads` is called at the top of `parallel_for` and can change the number of threads set, so a `get_num_threads` call made before the first `parallel_for` can return a different value than one made after it. To avoid this, ensure `num_threads` is initialized during `get_num_threads` calls as well. The OpenMP backend already does this, but the other parallel backends do not.
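A minimal sketch of the pattern, for illustration only. Everything in the `sketch` namespace is a hypothetical stand-in (including the `NOT_SET` sentinel, `default_num_threads`, and `get_num_threads_unchecked`); it is not the actual ATen code and only shows why the getter should run the same lazy initialization that `parallel_for` does:

```cpp
#include <atomic>
#include <thread>

namespace sketch {

// Hypothetical sentinel meaning "thread count has not been chosen yet".
constexpr int NOT_SET = -1;
std::atomic<int> num_threads{NOT_SET};

// Assumed default: hardware concurrency, falling back to 1 if unknown.
int default_num_threads() {
  unsigned n = std::thread::hardware_concurrency();
  return n == 0 ? 1 : static_cast<int>(n);
}

// Picks the default exactly once, and only if nothing was set before.
void lazy_init_num_threads() {
  int expected = NOT_SET;
  num_threads.compare_exchange_strong(expected, default_num_threads());
}

// Getter WITHOUT lazy init: may observe the uninitialized sentinel.
int get_num_threads_unchecked() {
  return num_threads.load();
}

// Getter WITH lazy init: agrees with what a later parallel_for would see.
int get_num_threads() {
  lazy_init_num_threads();
  return num_threads.load();
}

}  // namespace sketch
```

In this sketch, calling `sketch::get_num_threads()` before or after any other call to `sketch::lazy_init_num_threads()` yields the same value, whereas `sketch::get_num_threads_unchecked()` depends on call order.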
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64486
Reviewed By: mruberry
Differential Revision: D30752615
Pulled By: ngimel
fbshipit-source-id: 085873ce312edbee1254c0aaae30dec7fcfe2c57