[PyTorch] Refactor Dispatcher to inline less code in fast path (#51163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51163
The Dispatcher seems to have been in a precarious local
maximum: I tried several different changes to parameter passing,
and each one regressed because the compiler inlined less code, which
swamped any gains the parameter passing changes might have
delivered.
This diff reduces the amount of inline code on the fast path. It
should both reduce code size and provide a platform for making further
improvements to the dispatcher code.
On its own this diff is a slight performance regression, but it
unblocks the next two diffs in the stack, which seem to get us back
to where we were.
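As a rough illustration of what "less inline code on the fast path" means in general, here is a minimal sketch. It is not the actual Dispatcher code: `DispatchTableSketch`, `KernelFn`, `reportMissingKernel`, `callWithBookkeeping`, and the `SKETCH_NOINLINE` macro are hypothetical names invented for this example. The idea is that the hot function keeps only a table lookup and an indirect call, while rare work lives in separate non-inlined functions so it is emitted once rather than at every call site.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Spellings of "never inline" (assumption: GCC, Clang, or MSVC).
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical stand-in for a registered kernel; not a real c10 type.
using KernelFn = int (*)(int);

constexpr std::size_t kNumKeys = 4;

// Cold path: building the message and throwing stays out of line, so this
// code is emitted once instead of being copied into every call site.
[[noreturn]] SKETCH_NOINLINE void reportMissingKernel(std::size_t key) {
  throw std::runtime_error("no kernel for key " + std::to_string(key));
}

// Rare path: extra bookkeeping around the call, also kept out of line.
SKETCH_NOINLINE int callWithBookkeeping(KernelFn fn, int arg, int& numObserved) {
  ++numObserved;
  return fn(arg);
}

struct DispatchTableSketch {
  KernelFn kernels[kNumKeys] = {};
  bool observing = false;
  int numObserved = 0;

  // Fast path: a lookup, two cheap branches, and an indirect call. This is
  // small enough that inlining it into callers costs little code size.
  inline int call(std::size_t key, int arg) {
    assert(key < kNumKeys);
    KernelFn fn = kernels[key];
    if (fn == nullptr) {
      reportMissingKernel(key);
    }
    if (observing) {
      return callWithBookkeeping(fn, arg, numObserved);
    }
    return fn(arg);
  }
};
```

Each out-of-line call adds a small cost when the rare path is taken, which is one way a change like this can show up as a slight regression while leaving the inliner more room elsewhere.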
ghstack-source-id: 120693163
Test Plan:
CI, framework overhead benchmarks to check the size of the
regression
Compared timing for the `empty` op in the framework overhead benchmark before/after.
Build command: `buck build mode/no-gpu //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark mode/opt-clang --show-output`
Run with `numactl -m 0 -C 3 path/to/cpp_benchmark -op empty -niter 100`
Before:
```
I0126 16:02:04.373075 2135872 bench.cpp:139] Mean 0.266272
I0126 16:02:04.373106 2135872 bench.cpp:140] Median 0.266347
I0126 16:02:04.373111 2135872 bench.cpp:141] Min 0.263585
I0126 16:02:04.373117 2135872 bench.cpp:142] stddev 0.0021264
I0126 16:02:04.373131 2135872 bench.cpp:143] stddev / mean 0.00798581
```
After:
```
I0126 16:02:30.377992 2137048 bench.cpp:139] Mean 0.27579
I0126 16:02:30.378023 2137048 bench.cpp:140] Median 0.275281
I0126 16:02:30.378029 2137048 bench.cpp:141] Min 0.270617
I0126 16:02:30.378034 2137048 bench.cpp:142] stddev 0.00308287
I0126 16:02:30.378044 2137048 bench.cpp:143] stddev / mean 0.0111783
```
Yes, it's a regression (the mean rises by about 3.6%, from 0.266272 to 0.27579), but I also compared D26069629 stacked on this diff vs. stacked without it:
With this diff:
```
I0126 16:02:50.662864 2137574 bench.cpp:139] Mean 0.268645
I0126 16:02:50.662891 2137574 bench.cpp:140] Median 0.267485
I0126 16:02:50.662896 2137574 bench.cpp:141] Min 0.266485
I0126 16:02:50.662901 2137574 bench.cpp:142] stddev 0.00219359
I0126 16:02:50.662915 2137574 bench.cpp:143] stddev / mean 0.00816537
```
Without:
```
I0126 20:40:27.815824 3240699 bench.cpp:139] Mean 0.270755
I0126 20:40:27.815860 3240699 bench.cpp:140] Median 0.268998
I0126 20:40:27.815866 3240699 bench.cpp:141] Min 0.268306
I0126 20:40:27.815873 3240699 bench.cpp:142] stddev 0.00260365
I0126 20:40:27.815886 3240699 bench.cpp:143] stddev / mean 0.00961624
```
D26069629 is slightly faster on top of this diff than without it (mean 0.268645 vs. 0.270755), so we do seem to have accomplished something w.r.t. not overwhelming the inliner.
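For anyone who wants to restate the comparisons above as relative changes, a small helper along these lines would do. The function and constant names are made up for this sketch; the values are the means copied from the benchmark output in this test plan.

```cpp
#include <cassert>

// Relative change of `after` with respect to `before`. For timings, a
// positive result means `after` is slower.
double relativeChange(double before, double after) {
  assert(before != 0.0);
  return (after - before) / before;
}

// Mean timings copied from the benchmark logs in this test plan.
constexpr double kMeanBefore = 0.266272;          // baseline
constexpr double kMeanAfter = 0.27579;            // this diff alone
constexpr double kMeanStackedWithout = 0.270755;  // D26069629 without this diff
constexpr double kMeanStackedWith = 0.268645;     // D26069629 on top of this diff
```

`relativeChange(kMeanBefore, kMeanAfter)` gives the cost of this diff on its own, and `relativeChange(kMeanStackedWithout, kMeanStackedWith)` gives its effect on the stacked diff.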
Reviewed By: ezyang
Differential Revision: D26091377
fbshipit-source-id: c9b7f4e187059fa15452b7c75fc29816022b92b1