[Static Runtime] Benchmark reports native nodes (#63346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63346
We have seen that implementing native ops for ops that we cannot write out variants for yields significant perf wins essentially for free (e.g. TupleUnpack in D30306955 (https://github.com/pytorch/pytorch/commit/078b8004a62a51f75e1fbd8d08eea359af6bb1d7), append in D30326461 (https://github.com/pytorch/pytorch/commit/9d9e7a8d7294834ddad957ddb1f4cd5a0e741e55)). Whether or not Static Runtime (SR) is using a native implementation for a node is therefore valuable information. Reporting this in the benchmarking suite should save time otherwise spent profiling or manually inspecting `native_ops.cpp`.
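For illustration only, a minimal Python sketch of the kind of per-node summary this is meant to surface. It is not the code in this PR (the real change lives in the C++ benchmark); the `Node` record, `classify`, `report`, the "other" label, and the op `my::custom_op` are all hypothetical names chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical stand-in for a Static Runtime node; the real node type is C++.
@dataclass
class Node:
    kind: str              # op name, e.g. "prim::TupleUnpack"
    has_out_variant: bool  # assumed flag: an out variant exists for this op
    has_native_impl: bool  # assumed flag: a native op implementation exists


def classify(node: Node) -> str:
    """Label a node by the kind of implementation it would run with."""
    if node.has_out_variant:
        return "out variant"
    if node.has_native_impl:
        return "native"
    # Assumption: anything else takes some generic, slower path.
    return "other"


def report(nodes):
    """Return per-category node counts, in a fixed label order."""
    counts = Counter(classify(n) for n in nodes)
    return {label: counts.get(label, 0) for label in ("out variant", "native", "other")}


nodes = [
    Node("aten::add", True, False),
    Node("prim::TupleUnpack", False, True),
    Node("aten::append", False, True),
    Node("my::custom_op", False, False),  # hypothetical op with neither
]
summary = report(nodes)
for label, n in summary.items():
    print(f"{label}: {n}/{len(nodes)} nodes")
```

With a breakdown like this in the benchmark output, nodes in the last bucket are the candidates worth checking for a native implementation.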
Reviewed By: hlu1
Differential Revision: D30346752
fbshipit-source-id: 205b090513b6a5a6ce4cb92f75ab0395b15d08f9