pytorch
10d40764 - [PyTorch] Reduce template expansion in call_functor_with_args_from_stack (#51313)

[PyTorch] Reduce template expansion in call_functor_with_args_from_stack (#51313)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51313

The problem here is similar to the one described in https://devblogs.microsoft.com/cppblog/build-throughput-series-more-efficient-template-metaprogramming/ : we iterate over an integer sequence of length N, where N is the number of argument types to our function, and specialize `TypeListAt` (which we call `element_t`) for each Ith element of the typelist. Each such lookup instantiates O(I) template specializations, for a total of O(N^2). The solution is also similar: we iterate over the typelist directly. Unlike in the blog post, we do also need the index in the sequence, so we retain the index_sequence.

ghstack-source-id: 121363464

Test Plan: Inspect -ftime-trace output for RegisterCPU.cpp. Before: P168220187 After: P168220294 We can see that we spend less time instantiating call_functor_with_args_from_stack and spend a similar amount of time compiling it. The win is modest, but it's a win and I've already written it, so I'm sending it out. (I was hoping it would reduce compilation time for make_boxed_from_unboxed_functor.)

Reviewed By: bhosmer

Differential Revision: D26136784

fbshipit-source-id: c91a523486e3019bd21dcd03e51a58aa25aa0981
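The pattern the commit describes can be sketched in a few lines. This is a minimal illustration, not PyTorch's actual dispatcher code: `Stack`, `pop_arg`, and `call_with_args_from_stack` are hypothetical stand-ins, and only the expansion technique mirrors the commit. The "before" shape indexes the typelist with a recursive `TypeListAt` for each I (O(I) instantiations each, O(N^2) total); the "after" shape expands the typelist pack `Ts...` in lockstep with an `index_sequence`, keeping the indices only to address stack slots.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only: a toy stack of ints and a
// helper that reads one argument by position.
struct Stack {
  std::vector<int> values;
};

template <typename T>
T pop_arg(Stack& s, std::size_t idx) {
  return static_cast<T>(s.values[idx]);
}

// "Before" pattern: an O(I) recursive element lookup. Instantiating this for
// every I in 0..N-1 costs O(N^2) specializations overall. (Shown for
// contrast; the function below no longer uses it.)
template <std::size_t I, typename... Ts>
struct TypeListAt;
template <typename Head, typename... Tail>
struct TypeListAt<0, Head, Tail...> { using type = Head; };
template <std::size_t I, typename Head, typename... Tail>
struct TypeListAt<I, Head, Tail...> : TypeListAt<I - 1, Tail...> {};

// "After" pattern: Ts... and Is... are expanded in lockstep, so each
// argument costs O(1) instantiations. The index_sequence is retained
// because the indices are still needed to address slots on the stack.
template <typename... Ts, typename Functor, std::size_t... Is>
auto call_with_args_from_stack(Functor f, Stack& s,
                               std::index_sequence<Is...>) {
  return f(pop_arg<Ts>(s, Is)...);
}

int sum3(int a, int b, int c) { return a + b + c; }

int run_demo() {
  Stack s{{1, 2, 3}};
  // Explicit Ts = <int, int, int>; indices 0,1,2 come from the sequence.
  return call_with_args_from_stack<int, int, int>(
      sum3, s, std::make_index_sequence<3>{});
}
```

The key design point is that a pack expansion over `Ts...` paired positionally with `Is...` never walks the typelist, so the quadratic blow-up from repeated `TypeListAt` lookups disappears while the per-argument index information is preserved.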