[vulkan] Pad channels when using texture storage instead of "tight packing" (#95251)
Currently, the Vulkan backend represents 4D tensors in GPU textures by simply combining the batch and channel dimensions into the depth axis. However, if the number of channels is not a multiple of 4, data belonging to the same batch can cross texel boundaries, and a single texel can hold data from two different batches.
For instance, consider a tensor with `N=2`, `C=3`. The depth axis of the texture would contain the data:
```
|tex1|tex2|
-----------
|AAAB|BB00|
```
Here A represents data from `n=1` and B represents data from `n=2`.
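The tight-packing layout can be sketched with a little index arithmetic. This is only an illustration, not code from this diff: the function names are hypothetical, indices are 0-based (so `n=0` is the "A" batch in the diagram), and it assumes each texel holds 4 consecutive values along the flattened batch-channel axis.

```python
TEXEL_WIDTH = 4  # assumption: one texel stores 4 values


def tight_texel_and_lane(n, c, C):
    """Map 0-based (batch n, channel c) to (texel index, lane) under tight packing."""
    flat = n * C + c  # batches are laid out back to back with no padding
    return flat // TEXEL_WIDTH, flat % TEXEL_WIDTH


def tight_depth(N, C):
    """Number of texels along the depth axis (rounded up)."""
    return (N * C + TEXEL_WIDTH - 1) // TEXEL_WIDTH


def render_tight(N, C):
    """Render the depth axis as in the diagram; '0' marks unused lanes."""
    cells = ["0"] * (tight_depth(N, C) * TEXEL_WIDTH)
    for n in range(N):
        for c in range(C):
            texel, lane = tight_texel_and_lane(n, c, C)
            cells[texel * TEXEL_WIDTH + lane] = chr(ord("A") + n)
    return "|".join(
        "".join(cells[i:i + TEXEL_WIDTH]) for i in range(0, len(cells), TEXEL_WIDTH)
    )


print(render_tight(2, 3))  # AAAB|BB00
```

Note that the first channel of the second batch lands in lane 3 of the first texel, so locating the start of a batch requires both a division and a remainder on `n * C`.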
This packing structure ("tight packing") makes some ops that care about batch boundaries more complex and less efficient to implement. Therefore, this diff introduces channel padding when storing tensors as image textures.
The same tensor with `N=2`, `C=3` would now have the depth axis contain:
```
|tex1|tex2|
-----------
|AAA0|BBB0|
```
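For comparison, here is a minimal sketch of the padded layout under the same assumptions: 4 values per texel, 0-based indices, and hypothetical function names that do not come from this diff. Each batch is rounded up to a whole number of texels, so every batch starts at lane 0.

```python
TEXEL_WIDTH = 4  # assumption: one texel stores 4 values


def padded_texel_and_lane(n, c, C):
    """Map 0-based (batch n, channel c) to (texel index, lane) with channel padding."""
    texels_per_batch = (C + TEXEL_WIDTH - 1) // TEXEL_WIDTH  # C rounded up to texels
    return n * texels_per_batch + c // TEXEL_WIDTH, c % TEXEL_WIDTH


def padded_depth(N, C):
    """Number of texels along the depth axis when every batch is padded."""
    return N * ((C + TEXEL_WIDTH - 1) // TEXEL_WIDTH)


def render_padded(N, C):
    """Render the depth axis as in the diagram; '0' marks padding lanes."""
    cells = ["0"] * (padded_depth(N, C) * TEXEL_WIDTH)
    for n in range(N):
        for c in range(C):
            texel, lane = padded_texel_and_lane(n, c, C)
            cells[texel * TEXEL_WIDTH + lane] = chr(ord("A") + n)
    return "|".join(
        "".join(cells[i:i + TEXEL_WIDTH]) for i in range(0, len(cells), TEXEL_WIDTH)
    )


print(render_padded(2, 3))  # AAA0|BBB0
```

With this layout the lane depends only on `c`, and the first texel of batch `n` is simply `n * texels_per_batch`. The trade-off is that the padded depth can exceed the tightly packed one: for `N=2`, `C=5` this sketch gives 4 texels, whereas packing the same 10 values tightly would need 3.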
Differential Revision: [D43068669](https://our.internmc.facebook.com/intern/diff/D43068669/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D43068669/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95251
Approved by: https://github.com/salilsdesai