[sparsity] Fix GPU training for sparsity (#66412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66412
The GPU training was not supported in the sparsifier.
The reason was that when the sparsifier was created the masks would default to the CPU.
Attaching a GPU model to the sparsifier would throw an error.
The solution is to create the masks on the same device as the weight.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31590675
Pulled By: z-a-f
fbshipit-source-id: 98c2c1cedc7c60aecea4076e5254ef6b3443139e