[nnc] Support thread-level parallelism in fused kernels (#63386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30360382
Pulled By: bertmaher
fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6