Fixes PackedSequence.to (and unifies PackedSequence conversions) (#27245)
Summary:
PackedSequence.to(device) incorrectly places only one of its three tensors on the device and leaves the other two where they are. If the devices differ, further operations on the PackedSequence will fail. This behavior is inconsistent with Tensor.to and with PackedSequence's own behavior when .cuda() is called.
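A quick way to check for the problem described above is sketched below. It assumes a PyTorch build that includes this fix. The shapes and lengths are arbitrary illustrative values, and the script falls back to CPU when CUDA is unavailable, in which case the check passes trivially.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Illustrative padded batch: 3 sequences, max length 4, feature size 2.
padded = torch.zeros(3, 4, 2)
# Lengths are deliberately unsorted so that sorted_indices and
# unsorted_indices are populated (enforce_sorted=False).
lengths = torch.tensor([2, 4, 1])
packed = pack_padded_sequence(
    padded, lengths, batch_first=True, enforce_sorted=False
)

# Use CUDA when present; otherwise stay on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
moved = packed.to(device)

# After the fix, data and both index tensors should share one device.
index_devices = {
    moved.data.device.type,
    moved.sorted_indices.device.type,
    moved.unsorted_indices.device.type,
}
print(index_devices)
```

On a build without the fix and with a CUDA device, the set would be expected to contain both device types, which is the mismatch that makes later operations fail.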
Additionally, PackedSequence defines multiple other conversion functions that were independently and inconsistently implemented.
This PR unifies all implementations and makes the behavior of PackedSequence.to more consistent with Tensor.to. It is still not completely consistent, as noted in the comments. test_device_mask in test_nn.py is updated to validate the new functionality.
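The unification idea can be illustrated with a toy model in plain Python. This is only a sketch of the general pattern (every conversion helper delegating to a single `to`), not the PyTorch implementation. `Buf` and `Packed` are hypothetical stand-ins, and the rule that a dtype change applies to the data only is an assumption of this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Buf:
    """Hypothetical stand-in for a tensor: a device tag and a dtype tag."""
    device: str = "cpu"
    dtype: str = "float32"

    def to(self, device=None, dtype=None):
        # Return a copy with the requested device and/or dtype.
        return replace(
            self,
            device=device or self.device,
            dtype=dtype or self.dtype,
        )


@dataclass(frozen=True)
class Packed:
    """Hypothetical container holding three buffers that must share a device."""
    data: Buf
    sorted_indices: Buf
    unsorted_indices: Buf

    def to(self, device=None, dtype=None):
        # Single conversion path. Assumption of this sketch: dtype applies
        # to data only, while the index buffers follow data's device.
        data = self.data.to(device=device, dtype=dtype)
        return Packed(
            data,
            self.sorted_indices.to(device=data.device),
            self.unsorted_indices.to(device=data.device),
        )

    # Every other conversion delegates to `to`, so they cannot drift apart.
    def cuda(self):
        return self.to(device="cuda")

    def cpu(self):
        return self.to(device="cpu")

    def double(self):
        return self.to(dtype="float64")


p = Packed(Buf(), Buf(dtype="int64"), Buf(dtype="int64"))
g = p.cuda()
d = p.double()
```

Because `cuda`, `cpu`, and `double` contain no logic of their own, a fix to `to` reaches all of them at once.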
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27245
Differential Revision: D17757850
Pulled By: mruberry
fbshipit-source-id: 58f0bd40f1aa300fb0a91ee743483d645f977dc5