[PyTorch] Debug-gate static_assert in KernelFunction::makeFromUnboxedFunctor (#51367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51367
Templight reported that this assertion accounted for about 5% of the build time for RegisterCPU.cpp (a hopefully representative example I picked to shorten my iteration cycle).
I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode, and 2) optimized builds presumably take
longer in general, so we can better afford to pay the build-time cost
in debug builds.
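For illustration, here's a minimal standalone sketch of the debug-gating pattern, assuming an NDEBUG-based gate; the function and trait check below are hypothetical stand-ins, not the actual KernelFunction::makeFromUnboxedFunctor code:

```
#include <type_traits>

// Hypothetical stand-in for a kernel registration entry point. The expensive
// compile-time check only runs when NDEBUG is not defined (i.e. debug builds),
// so optimized builds skip the extra template instantiations.
template <class Functor>
void registerKernel(Functor f) {
#ifndef NDEBUG
  static_assert(std::is_invocable<Functor>::value,
                "registerKernel expects a callable taking no arguments.");
#endif
  f();  // forward to the callable as usual
}

int main() {
  registerKernel([] { /* kernel body */ });
}
```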
The win is not entirely clear; please see the test plan for details.
ghstack-source-id: 121378960
Test Plan:
1) Built RegisterCPU.cpp with -ftime-trace before and after. It doesn't seem to call out any difference in the details, but the overall time is consistently down by closer to 10% (55s before, 49s after).
2) Did a full rebuild of aten-cpu with -ftime-trace before and
after. No significant difference in build times (it reports *after*
as a regression, but it's using wall-time data and the machine is
loaded during builds, so there's some noise).
3) Re-profiled with Templight.
Before:
{F366557311}
After:
{F366557501}
Not sure what to conclude overall. A known problem with Templight is that template instantiations form more of a dependency graph than a tree because they're cached internally, so eliminating the first caller of a template may just move the time to another caller. However, it looks like we have actually reduced is_functor traffic.
UPDATE: I don't think the -ftime-trace measurement was reliable; it seems to skew running times. I built this diff and its base 5 times each and measured the CPU ("user") time for each build. Results (in seconds):
previous diff: [51.97, 50.54, 50.49, 52.89, 51.61]
mean: 51.5 std: 0.906
this diff: [50.53, 50.41, 50.57, 50.67, 50.94]
mean: 50.6 std: 0.179
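For reference, a small self-contained C++ snippet that reproduces the summary statistics above (using the population standard deviation, which matches the quoted numbers):

```
#include <cmath>
#include <cstdio>
#include <vector>

// Recompute mean and population standard deviation of the measured user times.
static void summarize(const char* label, const std::vector<double>& samples) {
  double sum = 0.0;
  for (double s : samples) sum += s;
  const double mean = sum / samples.size();

  double sq = 0.0;
  for (double s : samples) sq += (s - mean) * (s - mean);
  const double stddev = std::sqrt(sq / samples.size());

  std::printf("%s: mean %.1f std %.3f\n", label, mean, stddev);
}

int main() {
  summarize("previous diff", {51.97, 50.54, 50.49, 52.89, 51.61});
  summarize("this diff", {50.53, 50.41, 50.57, 50.67, 50.94});
}
```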
Reviewed By: ezyang
Differential Revision: D26153793
fbshipit-source-id: 9a66912c1b2b068f453e78be57454e4e62b7107b