Fixed output memory format mismatch for bicubic2d (#90470)
Description:
- output memory format is matching input for bicubic2d
Problem: output tensor's memory format does not match input format for bicubic2d
```python
import torch
i = torch.rand(1, 3, 32, 32).contiguous(memory_format=torch.channels_last)
assert i.is_contiguous(memory_format=torch.channels_last)
o = torch.nn.functional.interpolate(i, size=(4, 4), mode="bicubic")
assert o.is_contiguous(memory_format=torch.channels_last), f"Should be channels last but given channels first ({o.is_contiguous(memory_format=torch.contiguous_format)})"
> AssertionError: Should be channels last but given channels first (True)
```
Related PR fixing bilinear ops: https://github.com/pytorch/pytorch/pull/53535 (cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @bdhirsh )
Discovered together with @NicolasHug while working on https://github.com/pytorch/pytorch/tree/interpolate_uint8_images_linear_cpu_support_dev
- Updated code to match grad input / output memory formats
- temporary tensor creation matches memory format in `separable_upsample_generic_Nd_kernel_impl`
- Updated tests
- Added missing forward AD support for bicubic with antialiasing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90470
Approved by: https://github.com/NicolasHug, https://github.com/lezcano