Optimize perf for calling ops with custom classes (#38257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38257
It seems we're doing a runtime type check for custom classes on each operator call if the operator has custom class arguments.
This does not affect operators without custom class arguments, but it adds overhead to every call of an operator that has them,
for example operators taking an at::native::xnnpack::Conv2dOpContext argument.
The long-term solution would be to move those checks to op registration time instead of doing them at call time,
but as an intermediate fix, we can at least make the check fast by:
- Using ska::flat_hash_map instead of std::unordered_map
- Using std::type_index instead of std::string (i.e. avoid calling std::hash on a std::string)
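As a rough illustration of the second point, here is a minimal sketch of a registry keyed by std::type_index, so a lookup does not need to build or hash a std::string key. All names here (CustomClassRegistry, registerClass, find, Conv2dContext, LinearContext) are hypothetical and are not the actual PyTorch code. The sketch uses std::unordered_map to stay standard-library only; the change described above uses ska::flat_hash_map for the map itself. How cheap hashing a std::type_index is depends on the standard library implementation.

```cpp
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for custom classes; not PyTorch's actual types.
struct Conv2dContext {};
struct LinearContext {};

// Hypothetical registry mapping a C++ type to its registered class name.
class CustomClassRegistry {
 public:
  // Record a name for T, keyed by the type itself rather than by a string.
  template <class T>
  void registerClass(std::string qualified_name) {
    names_[std::type_index(typeid(T))] = std::move(qualified_name);
  }

  // Return the registered name for T, or nullptr if T was never registered.
  // No std::string key is constructed or hashed for the lookup.
  template <class T>
  const std::string* find() const {
    auto it = names_.find(std::type_index(typeid(T)));
    return it == names_.end() ? nullptr : &it->second;
  }

 private:
  // Assumption: std::unordered_map stands in for ska::flat_hash_map here.
  std::unordered_map<std::type_index, std::string> names_;
};
```

A caller would register each custom class once and then call find<T>() on the hot path; a nullptr result corresponds to a failed type check.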
ghstack-source-id: 106805209
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D21507226
fbshipit-source-id: bd120d5574734be843c197673ea4222599fee7cb