[Static Runtime] Use composite op for TE fusion (#74126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74126
When we perform fusion without the composite op `TensorExprDynamicGroup`, the output tensor buffers are not reused. Until we figure out a way to reuse them with the plain `TensorExprGroup` op, it seems strictly better to use the composite op, even though it involves going through the JIT.
ghstack-source-id: 151191941
Test Plan:
Tested locally with `ptvsc2_predictor_bench` on the Video model.
A local performance analysis with `caffe2/caffe2/fb/predictor/bench:limb` on the Video model showed an improvement of ~1% with this change.
Reviewed By: mikeiovine
Differential Revision: D34831280
fbshipit-source-id: e523878364b519ccd51b78d52d9f6c9d3e8def17
(cherry picked from commit 268d3b39fe78e5cf098a292aec580387d5ec8f4e)