[PyTorch] Round T up to next multiple of 8 in NestedTensor case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77903
The code comment explains why; in brief, rounding T up to a multiple of 8 lets us use Tensor Cores.
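
For illustration only, the rounding described above can be sketched as integer arithmetic. This is not the code from this PR: `round_up_to_multiple` is a hypothetical helper name, and the default of 8 is simply taken from the title.

```python
def round_up_to_multiple(t: int, multiple: int = 8) -> int:
    """Round a non-negative integer up to the next multiple of `multiple`.

    Hypothetical helper for illustration; not the function used in the PR.
    """
    if t < 0 or multiple <= 0:
        raise ValueError("t must be >= 0 and multiple must be > 0")
    # Adding (multiple - 1) before the floor division bumps any value that
    # is not already aligned into the next bucket; aligned values stay put.
    return (t + multiple - 1) // multiple * multiple


# Example: a padded length T of 13 would become 16; 16 stays 16.
padded_t = round_up_to_multiple(13)
```

Values that are already multiples of 8 are left unchanged, so the rounding adds at most 7 extra positions along T.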
Differential Revision: [D36527773](https://our.internmc.facebook.com/intern/diff/D36527773/)
Approved by: https://github.com/ngimel, https://github.com/cpuhrsch