Functionalize and compute joint simultaneously. (#88063)
This also comes with some bug fixes that were uncovered in the
process of doing this:
- Forward device() calls to the inner tensor in FunctionalTensorWrapper
  (see the device sketch below).
- Make legacyExtractDispatchKey exclude Functionalize, so that it can
  get at the real device type key (see the key-set sketch below). This
  is noncontroversial.
- Stop stripping Dense from the key set. FunctionalTensorWrapper may be
  used in contexts where callers query whether it is dense or not; if
  it doesn't report this correctly (from the dispatch keys), it will
  cause errors (see the key-set sketch below). This caused some
  torchbench models to fail when I did one-pass tracing.
- Save and restore the reapply-views TLS correctly (see the
  reapply_views sketch below).
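
A minimal sketch of the device-forwarding behavior, checked from
Python. torch._to_functional_tensor is an internal helper and not part
of this PR; the check is illustrative only:

```python
import torch

# Illustrative only: the functional wrapper should answer .device by
# forwarding to the tensor it wraps.
dev = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.randn(3, device=dev)
wrapped = torch._to_functional_tensor(t)  # wraps t in a FunctionalTensorWrapper
assert wrapped.device == t.device         # device() forwarded to the inner tensor
```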
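
For the two dispatch-key items (legacyExtractDispatchKey and keeping
Dense in the key set), a rough illustration of the kind of query that
broke. It assumes the internal introspection helper
torch._C._dispatch_keys is available; exact key names depend on the
build:

```python
import torch

t = torch.randn(3)
wrapped = torch._to_functional_tensor(t)

# The wrapper's key set should still advertise the backend/Dense keys
# alongside Functionalize, so "is this tensor dense?" checks keep working.
print(torch._C._dispatch_keys(wrapped))  # e.g. DispatchKeySet(CPU, ..., Functionalize)
assert wrapped.layout == torch.strided   # still reports a dense (strided) layout
```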
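
For the reapply-views TLS, a sketch of the flag being saved and
restored. torch._enable_functionalization / torch._disable_functionalization
are the internal entry points that toggle this TLS; the snippet shows
the flag's effect, not the save/restore path fixed here:

```python
import torch

x = torch._to_functional_tensor(torch.randn(4))

# reapply_views controls whether functionalization re-emits view ops
# or replaces them with copying variants (e.g. view_copy).
torch._enable_functionalization(reapply_views=True)
try:
    y = x.view(2, 2)  # with reapply_views=True this is re-applied as a real view
finally:
    torch._disable_functionalization()  # the TLS must be restored afterwards
```
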
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88063
Approved by: https://github.com/bdhirsh