Adding output_size to to_padded_tensor (#76640)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76640
- Add an output_size argument to to_padded_tensor
- Modify add_padding_kernelLauncher and its kernels to iterate over the padded tensor's batch size instead of the nested tensor's batch size
- The CPU version has no fast path
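
For illustration only, the intended padding semantics can be sketched in plain Python. This is not the PyTorch implementation: the helper name to_padded_reference is hypothetical, components are limited to 1-D lists, and the sketch assumes that output_size covers the batch dimension as well, so that batch rows beyond the nested tensor's batch are filled entirely with the padding value.

```python
def to_padded_reference(components, padding, output_size=None):
    """Pad a list of 1-D sequences into a dense list of equal-length rows.

    Hypothetical reference helper; output_size is assumed to be
    (batch, length) and must be large enough to hold every component.
    """
    max_len = max((len(c) for c in components), default=0)
    if output_size is None:
        # Default: tightest shape that fits every component.
        output_size = (len(components), max_len)
    batch, length = output_size
    if batch < len(components) or length < max_len:
        raise ValueError("output_size is too small to hold every component")

    out = []
    # Iterate over the *output* batch size, not the input batch size,
    # so extra rows past the last component are pure padding.
    for i in range(batch):
        comp = components[i] if i < len(components) else []
        out.append(list(comp) + [padding] * (length - len(comp)))
    return out


padded = to_padded_reference([[1, 2, 3], [4]], 0, output_size=(3, 4))
tight = to_padded_reference([[1, 2, 3], [4]], 0)
```

Here padded has three rows of length four even though only two components were supplied, while tight falls back to the smallest shape that fits.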
Test Plan:
buck test mode/dev-nosan //caffe2/test:nested
Performance test using N1763981:
{F728168808}
Reviewed By: cpuhrsch
Differential Revision: D36056902
fbshipit-source-id: d6df2939d6649128a7f43a2ef32d227870a8e583
(cherry picked from commit 09465f36f09d4d74c9b3303981d8cce0c7c1092a)