tracing for intra_op_parallel (#20603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20603
With intra_op_parallel operators, Caffe2 tracing generated a trace only for the master task, giving the false impression that many threads were underutilized.
This diff also traces child tasks.
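
A minimal sketch of the effect, not the Caffe2 implementation: it assumes a thread pool in which task 0 stands in for the master task and the remaining tasks are child tasks. The names `run_parallel_op` and `trace_children` are hypothetical and exist only for this illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_parallel_op(num_tasks, trace_children):
    """Run one hypothetical intra-op-parallel operator and return its trace events."""
    events = []
    lock = threading.Lock()

    def task(task_id):
        start = time.perf_counter()
        sum(range(10_000))  # stand-in for this task's share of the operator's work
        end = time.perf_counter()
        # Task 0 plays the master task (assumption); all others are child tasks.
        if task_id == 0 or trace_children:
            with lock:
                events.append({
                    "task_id": task_id,
                    "thread": threading.current_thread().name,
                    "start": start,
                    "end": end,
                })

    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        list(pool.map(task, range(num_tasks)))
    return events


# Master-only tracing hides the work done by the other tasks.
before = run_parallel_op(num_tasks=4, trace_children=False)
# Tracing child tasks as well records one event per task.
after = run_parallel_op(num_tasks=4, trace_children=True)
print(len(before), len(after))  # prints "1 4"
```

With master-only tracing the trace holds a single event even though four tasks ran, which is what made the other threads look idle.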
Reviewed By: ilia-cher
Differential Revision: D14820008
fbshipit-source-id: ff4ed203804d86d9231c21c99d869f1ddf1d1ef9