DispatchKeySet perf improvements (#70364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70364
A bunch of optimizations I made while staring at callgrind, after the DispatchKeySet changes further down in this stack.
There are basically three optimizations in this PR:
- Making `DispatchKeySet`s constexpr (previously they weren't)
- Condensing multiple keyset membership calls into a single function call
- Giving `TensorImpl::layout()` a fast path. The common case is to return `kStrided`, but we were running a bunch of checks before getting to that return.
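For readers without the diff at hand, here is a rough sketch of how the three ideas fit together. It is not the code from this PR: `Key`, `Layout`, `KeySet`, `kNonStridedKeys` and `layout_of` are hypothetical stand-ins invented for illustration, and the real `c10` types have many more keys and a different layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; NOT the real c10 definitions.
enum class Key : uint8_t { Dense = 0, Sparse = 1, SparseCsr = 2, Mkldnn = 3 };
enum class Layout { Strided, Sparse, SparseCsr, Mkldnn };

class KeySet {
 private:
  struct Raw {};  // tag type to select the raw-bits constructor
  constexpr KeySet(Raw, uint64_t r) : repr_(r) {}
  uint64_t repr_;

 public:
  // (1) Everything is constexpr, so masks can be built at compile time.
  constexpr KeySet() : repr_(0) {}
  constexpr explicit KeySet(Key k)
      : repr_(uint64_t{1} << static_cast<uint8_t>(k)) {}
  constexpr KeySet operator|(KeySet other) const {
    return KeySet(Raw{}, repr_ | other.repr_);
  }
  constexpr bool has(Key k) const { return (repr_ & KeySet(k).repr_) != 0; }
  // (2) One call answers "is any of these keys present?" with a single AND,
  // instead of chaining several has() calls.
  constexpr bool has_any(KeySet ks) const { return (repr_ & ks.repr_) != 0; }
};

// Compile-time mask of every key that implies a non-strided layout.
constexpr KeySet kNonStridedKeys =
    KeySet(Key::Sparse) | KeySet(Key::SparseCsr) | KeySet(Key::Mkldnn);

// (3) Fast path: the common case is decided by one mask test; the per-key
// checks only run for the uncommon layouts.
constexpr Layout layout_of(KeySet ks) {
  return !ks.has_any(kNonStridedKeys) ? Layout::Strided
       : ks.has(Key::Sparse)          ? Layout::Sparse
       : ks.has(Key::SparseCsr)       ? Layout::SparseCsr
                                      : Layout::Mkldnn;
}

// Because the whole chain is constexpr, this is checked by the compiler.
static_assert(layout_of(KeySet(Key::Dense)) == Layout::Strided,
              "a dense-only key set should take the strided fast path");
```

The design point in this sketch is that the mask is a compile-time constant, so the common path should reduce to a single AND and compare.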
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33301590
Pulled By: bdhirsh
fbshipit-source-id: 6ec28e66e7fe21f9decae317e8a4013dcf44e2fb
(cherry picked from commit 5defa1676e12ba1538ffbf7a7a0fe79eff504210)