[PyTorch] Add NestedTensorCPU and NestedTensorCUDA dispatch keys (#75808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75808
Just as it is often difficult to write a single kernel that can handle both CPU and CUDA, so can it be difficult to do the same for NestedTensor.
ghstack-source-id: 154171542
(Note: this ignores all push blocking failures!)
Test Plan: CI?
Reviewed By: bdhirsh
Differential Revision: D35603836
fbshipit-source-id: fb0ebb19d34531ed96ce176aca325f8e2b5f90e6
(cherry picked from commit 0bcd753f93c04256c1b745f84a74ecccf0dceef5)