add channels last with mixed data type support for GroupNorm backward (#89485)
### Motivation
1. Add channels last support for GroupNorm backward so that GroupNorm fully supports channels last.
2. As in #88663, mixed data type support (bfloat16 input with float32 weight and bias) is also needed for the channels last implementation of GroupNorm backward.
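
A minimal sketch of the code path this change targets, assuming a CPU build of PyTorch that includes it. A bfloat16 input stored in channels-last memory format is passed to `nn.GroupNorm`, whose weight and bias are float32 by default (the mixed case), and then backpropagated. The sizes, group count, and float32 reference comparison are illustrative choices, not taken from the PR's tests.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (not from the PR): batch, channels, height, width, groups.
N, C, H, W, G = 2, 32, 8, 8, 8

# bfloat16 input laid out in channels-last (NHWC) memory order; a leaf tensor.
x = torch.randn(N, C, H, W).to(torch.bfloat16).to(memory_format=torch.channels_last)
x.requires_grad_()

# nn.GroupNorm creates float32 weight/bias by default, so pairing it with a
# bfloat16 input exercises the mixed data type path.
gn = nn.GroupNorm(G, C)

grad_out = torch.randn(N, C, H, W).to(torch.bfloat16).to(memory_format=torch.channels_last)
y = gn(x)
y.backward(grad_out)

# float32 reference on the same bf16-representable values, in contiguous layout.
x_ref = x.detach().float().contiguous().requires_grad_()
gn_ref = nn.GroupNorm(G, C)
gn_ref.load_state_dict(gn.state_dict())
gn_ref(x_ref).backward(grad_out.float().contiguous())

print(y.dtype, x.grad.dtype, gn.weight.grad.dtype)
print(x.grad.is_contiguous(memory_format=torch.channels_last))
print(torch.allclose(x.grad.float(), x_ref.grad, rtol=1e-2, atol=1e-1))
```

The input gradient keeps the input's dtype and channels-last layout, while the weight and bias gradients stay float32; the loose tolerance accounts for rounding the input gradient to bfloat16.
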
### Testing
Single socket (28 cores):
* Contiguous:
shape | forward fp32 (s) | forward mixed fp32/bf16 (s) | backward fp32 (s) | backward mixed fp32/bf16 (s)
-- | -- | -- | -- | --
[10, 128, 20, 20] | 3.20E-05 | 3.60E-05 | 8.31E-05 | 8.13E-05
[10, 128, 50, 50] | 0.000126 | 0.000115 | 0.000356 | 0.000257
* Channels Last:
shape | forward fp32 (s) | forward mixed fp32/bf16 (s) | backward fp32 (s) | backward mixed fp32/bf16 (s)
-- | -- | -- | -- | --
[10, 128, 20, 20] | 4.11E-05 | 4.12E-05 | 9.74E-05 | 9.66E-05
[10, 128, 50, 50] | 0.000179 | 0.000178 | 0.000393 | 0.000317
Single core:
* Contiguous:
shape | forward fp32 (s) | forward mixed fp32/bf16 (s) | backward fp32 (s) | backward mixed fp32/bf16 (s)
-- | -- | -- | -- | --
[10, 128, 20, 20] | 2.47E-04 | 2.53E-04 | 5.92E-04 | 4.50E-04
[10, 128, 50, 50] | 0.001559 | 0.001384 | 0.004343 | 0.002436
* Channels Last:
shape | forward fp32 (s) | forward mixed fp32/bf16 (s) | backward fp32 (s) | backward mixed fp32/bf16 (s)
-- | -- | -- | -- | --
[10, 128, 20, 20] | 2.27E-04 | 3.24E-04 | 0.0006224 | 0.000459
[10, 128, 50, 50] | 0.00167 | 0.001278 | 0.0041858 | 0.003027
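
The PR does not include its benchmark script. The following is a hypothetical harness, sketched with `torch.utils.benchmark.Timer`, showing one way to collect numbers in the same form as the tables. The helper name `time_group_norm`, the group count of 32 (the PR does not state it), and timing backward alone via `torch.autograd.grad(..., retain_graph=True)` are all assumptions.

```python
import torch
import torch.nn as nn
from torch.utils import benchmark


def time_group_norm(shape, num_groups, dtype, channels_last, num_threads=1, min_run_time=0.2):
    """Hypothetical helper: median forward and backward times (seconds) of nn.GroupNorm on CPU.

    dtype=torch.float32 corresponds to the "fp32" columns; dtype=torch.bfloat16 with the
    default float32 GroupNorm parameters corresponds to the "mixed fp32/bf16" columns.
    """
    fmt = torch.channels_last if channels_last else torch.contiguous_format
    x = torch.randn(shape).to(dtype).to(memory_format=fmt).requires_grad_()
    grad_out = torch.randn(shape).to(dtype).to(memory_format=fmt)
    gn = nn.GroupNorm(num_groups, shape[1])

    # Forward timing; x requires grad, so each call also records the autograd graph.
    fwd = benchmark.Timer(
        stmt="gn(x)",
        globals={"gn": gn, "x": x},
        num_threads=num_threads,
    ).blocked_autorange(min_run_time=min_run_time)

    # Build the graph once and time only the backward pass; retain_graph=True
    # keeps the graph alive so it can be reused across timing iterations.
    y = gn(x)
    inputs = (x, gn.weight, gn.bias)
    bwd = benchmark.Timer(
        stmt="torch.autograd.grad(y, inputs, grad_outputs=grad_out, retain_graph=True)",
        globals={"torch": torch, "y": y, "inputs": inputs, "grad_out": grad_out},
        num_threads=num_threads,
    ).blocked_autorange(min_run_time=min_run_time)

    return fwd.median, bwd.median


if __name__ == "__main__":
    for channels_last in (False, True):
        for dtype in (torch.float32, torch.bfloat16):
            f, b = time_group_norm((10, 128, 20, 20), 32, dtype, channels_last)
            print(f"channels_last={channels_last} dtype={dtype}: "
                  f"forward {f:.3e} s, backward {b:.3e} s")
```

With `num_threads=1` this roughly matches the single-core setting. The single-socket runs would need a larger thread count (for example 28) plus suitable CPU affinity, which this sketch does not configure.
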
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89485
Approved by: https://github.com/jgong5, https://github.com/malfet