Make mkldnn Stream object thread_local and enable mkldnn thread safety (#17022)
Summary:
This PR fixes the following issue: https://github.com/pytorch/pytorch/issues/16828
The fix combines two changes:
1) MKLDNN streams are not thread-safe but are currently shared between different threads. This change makes them thread_local, so each thread gets its own stream.
2) By default, MKLDNN primitives can share global memory and therefore can't be invoked from multiple threads concurrently. This PR enables the MKLDNN_ENABLE_CONCURRENT_EXEC CMake configuration option, which makes them thread-safe.
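As an illustration of point 1, here is a minimal sketch of the per-thread instance pattern. It does not use the MKLDNN API: `FakeStream`, `local_stream`, and `submit_count_in_new_thread` are hypothetical names invented for this example, and the real stream type and its accessor in the PyTorch sources may look different.

```cpp
#include <thread>

// Hypothetical stand-in for a stream type that is not thread-safe.
// This is NOT the MKLDNN stream class.
struct FakeStream {
  int submitted = 0;
  void submit() { ++submitted; }  // unsynchronized mutation
};

// Each thread gets its own FakeStream, constructed on that thread's
// first call and destroyed when the thread exits.
FakeStream& local_stream() {
  static thread_local FakeStream stream;
  return stream;
}

// Submits once from a freshly spawned thread and reports the count
// that thread observes on its own stream.
int submit_count_in_new_thread() {
  int count = -1;
  std::thread worker([&count] {
    local_stream().submit();
    count = local_stream().submitted;
  });
  worker.join();
  return count;
}
```

Because the instance is `thread_local`, work submitted on one thread never touches another thread's stream, so no locking is needed around the stream itself.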
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17022
Differential Revision: D14069052
Pulled By: ezyang
fbshipit-source-id: f8f7fcb86c40f5d751fb35dfccc2f802b6e137c6